I've been distracted by other things lately and only just got around to reading the "Why oh why in this day and age" thread, which wandered off-topic into the subject of Spin compilers. It appears there's some confusion about spin2cpp and its role in compiling Spin, so I wanted to point out a few things:
(1) spin2cpp started off as a converter from Spin to C++, but it is now a full compiler. It parses Spin code into an intermediate tree, and from that tree can produce C, C++, PASM (P1 or P2), or binary output. The name is misleading now. There is actually a different front end for spin2cpp (called "fastspin") that mimics the interface of openspin, so perhaps we should start using that name for the compiler suite.
(2) The spin2cpp parser is written in YACC, so it is in some sense a formal definition of the Spin grammar (at least, of the Spin grammar accepted by spin2cpp; so far I haven't found any programs that openspin accepts but spin2cpp doesn't). There are actually two phases to parsing: lexical analysis (converting character strings into tokens) and then the grammar proper (figuring out how the tokens go together to make a program). The grammar is in YACC; the lexical analyzer is hand-written in C to make some things easier (e.g. for indentation we emit INDENT and OUTDENT tokens).
(3) fastspin/spin2cpp does do optimization, both on the intermediate tree (language-independent optimizations) and on the PASM output. Examples of the optimizations it can perform include: dead code elimination, function inlining (any sufficiently small function is inlined, as is any function that is called only once), common subexpression elimination, and loop strength reduction.
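As an illustration of what loop strength reduction means, here is a before/after pair in C. This is a generic example of the technique, not fastspin's actual output; the optimizer works on its intermediate tree, so the exact transformations may differ.

```c
/* Before: the loop recomputes i * 12 (a multiply) on every iteration. */
int scaled_sum_naive(int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i * 12;      /* multiply inside the loop */
    return total;
}

/* After strength reduction: the multiply is replaced by a running
   addition, which is cheaper on most targets (including the P1). */
int scaled_sum_reduced(int n) {
    int total = 0, step = 0;
    for (int i = 0; i < n; i++) {
        total += step;        /* add instead of multiply */
        step += 12;           /* step tracks i * 12 incrementally */
    }
    return total;
}
```

Both functions compute the same value; the second just trades a per-iteration multiply for an add, which is the essence of strength reduction.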
(4) fastspin/spin2cpp can produce both LMM and plain COG programs. There are options to place data in either COG or HUB memory, and likewise for code. On the P1, code in HUB is executed via the usual LMM mechanism, augmented with FCACHE: small loops are compiled as plain COG code and loaded into COG memory at run time, so the whole loop runs at full speed rather than going through the LMM read/execute cycle one instruction at a time.
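For readers unfamiliar with LMM, the idea can be sketched as a tiny fetch/execute loop in C. This is a toy model, not real PASM: "hub" memory holds encoded instructions, the LMM loop fetches and dispatches them one at a time, and FCACHE corresponds to copying a short run of instructions into fast local ("cog") memory once and executing from there. The opcodes here are invented for illustration.

```c
#include <string.h>
#include <stddef.h>

/* Invented opcodes for the toy model. */
enum { OP_ADD1, OP_DBL, OP_HALT };

/* LMM-style loop: fetch each "instruction" from (slow) hub memory and
   execute it, one at a time. */
int lmm_run(const int *hub, int acc) {
    for (size_t pc = 0; ; pc++) {
        switch (hub[pc]) {
        case OP_ADD1: acc += 1; break;
        case OP_DBL:  acc *= 2; break;
        case OP_HALT: return acc;
        }
    }
}

#define COG_SIZE 16

/* FCACHE analogy: copy a small loop body into fast local memory once,
   then execute it there, skipping the per-instruction hub fetch. */
int fcache_run(const int *hub, size_t len, int acc) {
    int cog[COG_SIZE];                    /* fast local memory */
    memcpy(cog, hub, len * sizeof *cog);  /* load once (len <= COG_SIZE) */
    return lmm_run(cog, acc);             /* then run at full speed */
}
```

In the real implementation the "copy once, run locally" step is what lets a small loop execute at native COG speed instead of paying the hub-fetch cost on every instruction.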
(5) fastspin supports inline assembly. In fact, many of the Spin primitives like COGNEW are implemented internally as plain Spin functions that use inline assembly; these are then inlined automatically by the optimizer.
(6) The P2 output routines are a bit out of date; they need to be brought up to the most recent P2 definition.
(7) Performance of the PASM binaries produced by fastspin/spin2cpp is generally decent. PropGCC can usually do better (it has a more sophisticated optimizer) but I think fastspin can beat all of the other languages/compilers out there (other than pure PASM of course) and on some specific tasks like pin manipulation it can even beat PropGCC. As a representative example, here are the results from Heater's fft benchmark with recent builds of fastspin and PropGCC:
gcc LMM -O2:          65247 us
gcc LMM -Os:         138096 us
fastspin 3.2.0 -O:   170822 us
fastspin 3.2.0:      171422 us
Catalina LMM:       ~348000 us
gcc CMM -O2:         520296 us
gcc CMM -Os:         567318 us
PropBasic LMM(*):    690842 us
JDForth:           ~1200000 us
OpenSpin:           1465321 us
(*): PropBasic produced incorrect results, so its time should be taken with a grain of salt.
Catalina and JDForth results are from a slightly older version of the benchmark, as posted on the forums.