Compiling Spin
ersmith
Posts: 6,053
I've been distracted by other things lately and only just got around to reading the "Why oh why in this day and age" thread, which wandered off-topic into the subject of Spin compilers. It appears there's some confusion about spin2cpp and its role in compiling Spin, so I wanted to point out a few things:
(1) spin2cpp started off as a converter from Spin to C++, but now it's a full compiler. It parses Spin code into an intermediate tree, and then from that tree can produce C, C++, PASM (P1 or P2), or binary output. The name is misleading now. Actually there is a different frontend for spin2cpp (called "fastspin") that mimics the interface of openspin, so perhaps we should start using that name for the compiler suite.
(2) The spin2cpp parser is written in YACC, so it is in some sense a formal definition of Spin grammar (at least, the Spin grammar accepted by spin2cpp; but so far I haven't found any programs that openspin takes but spin2cpp doesn't). There are actually two phases to parsing: lexical analysis (converting strings into tokens) and then the grammar (figuring out how the tokens go together to make a program). The grammar is in YACC; the lexical analysis is hand-written in C just to make some things easier (e.g. for indenting we emit INDENT and OUTDENT tokens).
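For example, a fragment like

  repeat i from 0 to 7
    outa[i] := 1

is lexed (roughly) as REPEAT ... INDENT ... OUTDENT, so the grammar can treat the indented block just like a bracketed compound statement. (The token names other than INDENT/OUTDENT are illustrative here.)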
(3) fastspin/spin2cpp does do optimization, both on the intermediate tree (i.e. language-independent optimizations) and on the PASM output. Examples of some of the optimizations it can perform are: dead code elimination, function inlining (any sufficiently small function is inlined, as are all functions that are called only once), common sub-expression elimination, and loop strength reduction.
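For instance, in something like this (illustrative):

  PRI square(x)
    return x * x

  PUB demo(a, b) : r
    r := square(a + b) + (a + b)

square is small enough to be inlined, and a + b then only gets computed once thanks to common sub-expression elimination.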
(4) fastspin/spin2cpp can produce both LMM and plain COG programs. There are options to place data in either COG or HUB memory, and similarly for code. On the P1, code in HUB runs via the usual LMM mechanism, supplemented by FCACHE. FCACHE means that small loops are compiled as plain COG code and loaded at run time into COG memory, so the whole loop runs at full speed (rather than doing the LMM read/execute cycle one instruction at a time over and over).
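For example, a loop like this (illustrative) is a good FCACHE candidate:

  PUB toggle(pin, n)
    dira[pin] := 1
    repeat n
      !outa[pin]

The repeat body is small enough to be copied into COG memory and run at full native speed.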
(5) fastspin supports inline assembly. In fact many of the Spin primitives like COGNEW are implemented internally as plain Spin functions that use inline assembly; these are then inlined automatically by the optimizer.
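For example, something along these lines (using the asm/endasm syntax):

  PUB getcnt : r
    asm
      mov r, cnt    ' read the system counter directly into the result
    endasm

A function like this is small enough that the optimizer will inline the mov straight into the caller, with no call overhead.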
(6) The P2 output routines are a bit out of date; they need to be brought up to the most recent P2 definition.
(7) Performance of the PASM binaries produced by fastspin/spin2cpp is generally decent. PropGCC can usually do better (it has a more sophisticated optimizer) but I think fastspin can beat all of the other languages/compilers out there (other than pure PASM of course) and on some specific tasks like pin manipulation it can even beat PropGCC. As a representative example, here are the results from Heater's fft benchmark with recent builds of fastspin and PropGCC:
gcc LMM -O2:          65247 us
gcc LMM -Os:         138096 us
fastspin 3.2.0 -O:   170822 us
fastspin 3.2.0:      171422 us
Catalina LMM:       ~348000 us
gcc CMM -O2:         520296 us
gcc CMM -Os:         567318 us
PropBasic LMM(*):    690842 us
JDForth:           ~1200000 us
OpenSpin:           1465321 us

(*) PropBasic produced incorrect results, so its time should be taken with a grain of salt.
The Catalina and JDForth results are from a slightly older benchmark version, as posted on the forums.
Comments
Here are the results from spin2cpp. The command line was something like the following (assuming spin2cpp's usual --asm and --binary options), which produces spibase.pasm (the assembly listing) and spibase.binary (the compiled binary):
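  spin2cpp --asm --binary spibase.spin

Here's spibase.pasm: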
The inner loop in the spiout function is pretty good, I think. I do see that the optimizer missed the chance to replace _var_02 with the constant 1, but otherwise I think it's what most PASM programmers would write by hand. It's also slightly better than the code GCC produces for a similar function; in this case fastspin/spin2cpp's knowledge of the Prop architecture was able to beat GCC's better generic optimizers.
Excuse me whilst I transcribe it into JavaScript...
Since Chip asked recently about formal grammars, would it be possible to convert Spin's into readable form (e.g. BNF)?
-Phil
Even better, how hard would it be to make fastspin able to output code in multiple formats in the same binary? You could have all the initialization code and any cogs that don't need to be fast run under PNUT (since you have to boot into PNUT anyway) to save RAM, while cogs that do need to be fast run native PASM compiled from Spin. To specify which target should be used, I would suggest a special builtin function that takes a function and an output format and returns a pointer that can be passed to COGNEW. Obviously, if a single function needs to be compiled to multiple output formats, the compiler should be smart enough to do that.
It would be really neat if you made it so it could do things like compile arithmetic-only functions to floating point command stream scripts, similar to what JasonDorie did for the ELEV-8 firmware in C. This is why I think the special builtin above would be better than a special COGNEW: you might not want to pass the pointer to COGNEW - you might want to pass it to your floating point cog instead.
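Hypothetically, usage might look something like this (the @target builtin and the format names are made up for illustration):

  ' hypothetical: @target(fun, fmt) returns a pointer to "fun" compiled for format "fmt"
  cognew(@target(fast_driver, FMT_PASM), @stack1)     ' native PASM cog
  cognew(@target(ui_loop, FMT_BYTECODE), @stack2)     ' interpreted bytecode, saves hub RAM
  fp.run(@target(flight_math, FMT_FPSCRIPT))          ' or hand a command stream script to a float cog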
EDIT: reworded stuff
This should be a fantastic PASM learning tool for me in Linux.
Haven't got past FTDI drivers yet, "sudo shy".
I'll get over it as soon as I learn to make shell scripts, something like the batch files I've made before. Still learning the Linux landscape.
Quite happy with PropellerIDE and BST. Overthought that one, not knowing BST was part of the basic install.
Just curious, since I have been using Linux for many years now: what do you mean by "haven't got past FTDI drivers yet"? I never have to worry about the drivers, as Linux Mint works out of the box with this sort of thing.
Peter, I'm going by what I read on the FTDI site. I don't have my Linux machine up at the moment, but Parallax directed me to the FTDI site for installation instructions. I couldn't find a .deb package to do an express install. Am I going in the wrong direction?
EDIT: When I can get USB running, I will have the best of all worlds.
I think the Linux advice is well out of date; many serial USB devices just work with Linux these days, and have done so for many years. That includes the popular FTDI drivers too, of course.
Ignore what you saw; just plug it in and it comes up as ttyUSB0 etc. I use minicom for a serial terminal.
This is distribution dependent, so it may not apply to your Linux installation. Debian and its derivatives (and possibly RedHat/Fedora derivatives as well) require that you either run as root or add your user to the "dialout" group. Instructions here: http://askubuntu.com/questions/112568/how-do-i-allow-a-non-default-user-to-use-serial-device-ttyusb0 You will need to log out and back in for the change to take effect.
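On Ubuntu or Mint that usually boils down to something like:

  sudo usermod -a -G dialout $USER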
The intermediate form is an abstract syntax tree (AST), which is a binary tree. But unfortunately there's no documentation of the exact details of the nodes beyond what's in ast.h. Something to put on my todo list.
Below is a vaguely BNF like description that's generated by bison -v, edited a bit by hand to make the nonterminals more obvious. Note that this isn't exactly the same as the language OpenSpin accepts; there are some extensions like inline assembly and IF/THEN/ELSE expressions.
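An excerpt, to give the flavor (rule names simplified here; the real output is much longer):

  ifstmt:    IF expr EOLN INDENT stmtlist OUTDENT elseblock
  elseblock: /* empty */
           | ELSE EOLN INDENT stmtlist OUTDENT
           | ELSEIF expr EOLN INDENT stmtlist OUTDENT elseblock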
Thanks David. My distribution is Linux Mint (Sarah), and I should expand on what I said: when I plugged in the Propeller, the USB device wasn't recognized. I did install an express card USB 3.0 adapter, and I want to be able to take advantage of that. I will look into this a little later.
Note that BST does have an optimizer already.
Mike Y.
Let's say you have the following code:
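  c := a + b
  b := a
  a := c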
The only reason variable "c" exists is to save the value that will get written to "a" so it doesn't get clobbered by "b := a". A store and a load can be shaved off by just leaving the result of "a+b" on the stack until after the "b := a", and then assigning it to "a". Furthermore, the "b := " of "b := a" can have its pop flag cleared, to save having to load "a" twice. BST's optimizer did not do this (in fact, it does very little).
This is sort of a bad example, since the above optimizations can be performed in plain Spin as shown below. However, if any control structures were in between "c := a+b" and "a := c", it would not be possible to leave the values of the partially evaluated expressions on the stack in plain Spin.
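Something like this, using assignment-as-expression:

  a := b + (b := a)

Here b is read first, then (b := a) stores the old a into b while leaving the old a's value on the stack, so a gets the old a + b without c ever existing.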
Thanks,
-Phil
Might be best to open a new thread for this
I will if I can't figure it out; I certainly don't want to derail the OP.