flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler
ersmith
Update October 2020: "fastspin" has been renamed to "flexspin" to better fit in to the naming scheme of the FlexProp GUI for Propeller development.
Please go to the spin2cpp release directory for the latest version of flexspin. There is also a simple GUI for Spin and BASIC development on P2 at FlexProp.
Flexspin is a compiler for Spin (and much of Spin2), BASIC, and C which can produce binaries for P1 and P2. FlexProp is a very simple IDE which uses flexspin.
I won't try to keep this post up to date. The links above always have the most recent versions of flexspin and FlexProp. The compiler has beta C language support now. The Spin support is pretty mature, and BASIC is in good shape. Lots of test programs run on the P2 Eval board.
Updated May 18, 2018: Some more bug fixes both to multiple assignment and to PNut compatibility. Now able to build ROM_137PBJ.spin2 correctly (or at least, the same as PNut).
Updated May 17, 2018: Fixed a couple of problems that were impacting PNut compatibility, and improved the syntax for multiple assignment (so the parentheses are no longer required in most cases).
Updated May 16, 2018: Updated with a new beta of fastspin that has support for multiple assignment and functions that return multiple values. I'd also like to point to a simple GUI for fastspin + P2: https://github.com/totalspectrum/spin2gui/releases. It allows you to edit a .spin or .spin2 file, compile it, and download to the P2 hardware (thanks to Dave Hein for his loadp2 program). It's configurable and could (at least in theory) also be used for P1 development, but I haven't tested that since there are already so many P1 GUIs available. AFAIK the only competitor for spin2gui on P2 is PNut, and at the moment PNut only supports assembly code.
Updated May 10, 2018: My, how time flies... Anyway, now that hardware seems imminent I've gone back and updated fastspin for the v32b FPGA images. fastspin -2 is able to compile all of the samples from Dave Hein's p2asm, including the boot ROM, so the instruction set coverage is pretty good now (but obviously let me know if anything is missing!). The Spin compiler itself is less tested in P2 mode, but I've used it for some emulators and demos on the FPGA. As always, usage is command line only and is just:
    fastspin -2 myfile.spin2
This will produce myfile.pasm2 (which is the converted PASM2) and myfile.binary. You can load myfile.binary with PNut, I think, but I use Dave Hein's excellent loadp2 program, included with his p2gcc package.
Updated March 7, 2017: Updated the compiler for the v16a instruction set. The instruction set coding is not tested very thoroughly, but simple programs do compile and run.
Updated May 18: Fixed several bugs reported in the forums. The new binary is attached as fastspin_beta4.zip. Note that fastspin.exe can produce code for either P1 or P2; to get P2 you need to specify the -2 flag.
Updated May 9: Fixed binary output of jumps in COG mode, and updated the push/pop code in the compiler to use postincrement/predecrement mode. The updated binary is attached. Source code, as always, is at github.com/totalspectrum/spin2cpp.
Updated May 7: Fixed a bug in the >< operator (REV works differently on P2) so that fft_bench works now.
Updated May 6: I've attached the beta version of the compiler. The DAT section parser works better (produces the same result as PNut for the inputs I've tested) but is still incomplete; for example, it doesn't understand the fancy ptra++ syntax for read and write. I've fixed a few bugs in the PASM output too, and added a fastspin specific readme. Usage is pretty simple:
    fastspin -2 fibo.spin
produces fibo.p2asm, which can then be loaded by PNut.
*** Original Message ***
Here's an alpha version of the fastspin compiler for P2. fastspin compiles Spin code to PASM; it otherwise acts very much like openspin, but has a few extensions (such as inline assembly between asm...endasm). This version of fastspin has a -2 option to produce P2 code (the output file will be named with a .p2asm extension).
This is labeled an "alpha" version because assembly code inside DAT sections is not always compiled correctly -- the P2 instruction parser is incomplete and buggy. Inline assembly does (mostly) work, because it's passed through to PNut.
Having said all the caveats, fastspin is able to compile simple programs (like the fibonacci demo), and it may be useful for putting together quick demos and tests of the hardware. I compiled and ran fibo like so:
    fastspin -2 fibo.spin
    PNut_v7 fibo.p2asm
In PNut I selected compile and run, then opened a terminal window to see the output.
Please let me know of any issues you find. I'm still working on the P2 support for DAT sections and hope to have that functional soon.
Eric
Comments
It's in the spin2cpp repo (https://github.com/totalspectrum/spin2cpp). fastspin is just a different front end to spin2cpp, and it's checked into that repo too.
@David, You didn't add a smiley to your post, but I'm sure you were just kidding. cspin could be used to convert C to Spin, and then compile it with fastspin. However, that would be very limited. We really do need GCC for the P2.
but C++
It doesn't do register assignment at all, it just allocates all local variables in unique COG locations. So large programs can run out of space in COG memory. That's something that can be fixed eventually.
Sometimes simpler and more predictable is easier to manage!
How would your optimizer manage this DivMod requirement?
Google suggests GCC will use one mul/div, where native div opcode gives both remainder and quotient.
Prop docs say
32 x 32 unsigned multiply with 64-bit product
64 / 32 unsigned divide with 32-bit quotient and 32-bit remainder
so that's looking like the same opcodes, compilers should be able to optimise the above well ?
I've just done some tests using GCC on intel D2000, with rather erratic results. Close, but no cigar.
(Looks more like GCC issue than D2000 ?)
Tests on GCC give these possible outcomes, (release build) which moves around with editing other code (?!) :
(debug build gives expected inefficient two copies, but it is quite stable)
Function use seems more stable, but has bonus push/pop & bonus xor ?
No idea why clearing the upper-32b before imul is used, but that XOR seems 'mobile' in GCC, sometimes it pops up after imul, which is not where I would want it.
I'm also unclear if this is mixing signed/unsigned opcodes, or the mnemonics are just lazy..
GCC has it in theory, but in practice I've had trouble getting PropGCC to combine a div and mod, even though we've told it that the appropriate functions produce both results. I think PropGCC4 sometimes gets it right, but PropGCC6 has had issues. There is a standard C function (ldiv) that can do both div and mod, though, which helps.
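For anyone who hasn't used it, here is a minimal sketch of the ldiv approach mentioned above. ldiv() and ldiv_t are standard C from <stdlib.h>; the split_time struct and the split_ticks name are made up for this illustration. Whether a given compiler turns this into a single divide instruction is not something the sketch guarantees.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result type for this example: whole seconds plus leftover ticks. */
typedef struct {
    long seconds;
    long ticks;
} split_time;

/* Split a tick count into seconds and leftover ticks with one ldiv() call,
   so quotient and remainder come from the same operation. */
split_time split_ticks(long total_ticks, long ticks_per_second)
{
    assert(ticks_per_second != 0);      /* division by zero is undefined */
    ldiv_t r = ldiv(total_ticks, ticks_per_second);
    split_time out = { r.quot, r.rem }; /* quot truncates toward zero */
    return out;
}
```

Calling split_ticks(1000007L, 1000L) yields 1000 seconds and 7 leftover ticks.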
Would be nice to have stable support of this type of use, as it is something of a HLL blindspot.
Not sure which GCC intel uses, but it seems to get the big steps right, then drop the ball in the details... much like you say..
I needed to have 32*32 -> 64b then 64b/32b -> Div.Mod
P2 should be able to do this in the same number of ASM lines too, I think ?
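To make the requirement concrete, here is a rough C sketch of the 32*32 -> 64b multiply followed by a 64b/32b divide that returns both quotient and remainder. The divmod32 type and the muldivmod_u32 name are hypothetical, invented for this example; it only shows the arithmetic and says nothing about which instructions any particular compiler will emit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: 32-bit quotient and 32-bit remainder. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} divmod32;

/* Compute (a * b) / d and (a * b) % d through a 64-bit intermediate.
   Preconditions: d != 0, and the quotient must fit in 32 bits,
   which holds exactly when the high word of the product is below d. */
divmod32 muldivmod_u32(uint32_t a, uint32_t b, uint32_t d)
{
    assert(d != 0);
    uint64_t product = (uint64_t)a * b;  /* 32 x 32 -> 64, cannot overflow */
    assert((product >> 32) < d);         /* quotient fits in 32 bits */
    divmod32 r;
    r.quot = (uint32_t)(product / d);
    r.rem  = (uint32_t)(product % d);
    return r;
}
```

For example, muldivmod_u32(100000, 100000, 7) gives quotient 1428571428 and remainder 4, even though the product 10,000,000,000 does not fit in 32 bits.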
I can't find numbers yet on those intel opcode speeds, but they are not likely to be fast at 32MHz sysclks.
P2 with Cordic delays, may be similar ? / faster ?
So that labels in the inline assembly could start at the left column. PASM code doesn't use indentation the same way Spin does, so I thought it was more prudent to explicitly mark the end of the assembly (it makes parsing easier too). There probably would be some way to stick with just indentation (maybe require labels to start with ":") but I'm lazy.
Eric
Nice ASM example. How many cycles is that in P2 ?
As a rough check, I can get something like 6*32+4*2 = 200
ie guess 6 cordic lines need 32c each and 4 non cordic lines are 2c, harder to get an odd number tho ?
Can you post the code in another thread, with an empty call as a comparison.
Chip, or someone else may know the answer?
If the reads are identical, it should not matter what phase effects are there, provided they are stable.
The CORDIC takes just as long, no matter how many cogs are using it. Another cog gets a turn each clock, and if a cog doesn't use its turn, it gets wasted and no other cog can use it.
Given P2 hardware says this
32 x 32 unsigned multiply
64 / 32 unsigned divide
to me, that makes supporting unsigned operations quite important on pretty much any P2 language ?
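A small C illustration of why the signed/unsigned distinction matters for division: the same 32-bit pattern gives very different quotients depending on how it is interpreted. The function names are made up for this example, and it is plain C, not anything specific to Spin or the P2.

```c
#include <stdint.h>

/* Unsigned 32-bit divide. Caller must ensure b != 0. */
uint32_t div_unsigned(uint32_t a, uint32_t b)
{
    return a / b;
}

/* Signed 32-bit divide. Caller must ensure b != 0 and must avoid
   INT32_MIN / -1, which overflows. */
int32_t div_signed(int32_t a, int32_t b)
{
    return a / b;
}
```

The pattern $FFFFFFF0 divided by 16 is $0FFFFFFF when treated as unsigned, but the same bits read as a two's complement signed value are -16, and -16 / 16 is -1.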
Sounds like Spin needs an extension for P2 ?
I would think extended Spin for the P2 is a given once the dust settles.
Frankly I'm not too worried about it, but here's the code if you'd like to take a look.
I find this (bold added) - seems 39c is a normal delay thru CORDIC
"When a cog issues a CORDIC instruction, it must wait for its hub slot, which is 0..15 clocks away, in order to hand off the command to the CORDIC solver. Thirty-nine clocks later, results will be available via the GETQX and GETQY instructions, which will wait for the results, in case they haven’t arrived yet.
Because each cog’s hub slot comes around every 16 clocks and the pipeline is 38 clocks long, it is possible to overlap CORDIC commands, where three commands are initially given to the CORDIC solver, and then results are picked up and another command is given, indefinitely, until, at the end, three results are picked up."
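The numbers in that quote can be worked through in a few lines of C. This is only my reading of the quoted text: the constants are copied from it (16-clock hub slot period, 39 clocks to a result, 38-clock pipeline), and the function names are invented for the example.

```c
#include <assert.h>

/* Figures taken from the quoted documentation text. */
enum {
    HUB_SLOT_PERIOD = 16,  /* a cog's hub slot comes around every 16 clocks */
    CORDIC_LATENCY  = 39,  /* clocks from hand-off until results are available */
    CORDIC_PIPELINE = 38   /* pipeline length in clocks */
};

/* Clocks from issuing a CORDIC instruction until its result is ready,
   given how long the cog waits for its hub slot (0..15 clocks). */
unsigned cordic_result_clocks(unsigned slot_wait)
{
    assert(slot_wait < HUB_SLOT_PERIOD);
    return slot_wait + CORDIC_LATENCY;
}

/* Commands that can be issued before the first result is due:
   one per hub slot across the pipeline length, rounded up. */
unsigned cordic_commands_in_flight(void)
{
    return (CORDIC_PIPELINE + HUB_SLOT_PERIOD - 1) / HUB_SLOT_PERIOD;
}
```

So a single, non-overlapped CORDIC operation costs somewhere between 39 and 54 clocks depending on hub slot alignment, and the rounded-up division gives the three overlapped commands the quote describes.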