Spin compiler shootout
In light of my recent work on flexspin's bytecode output, I thought it'd be interesting to do some comparisons between the different Spin compilers.
Important notes:
- The binary size given is what the compiler reports; I think BSTC always reports 4 bytes too few.
- Flexspin's PASM binaries include space for VARiables, whereas the bytecode compilers don't.
- Flexspin version used is current git build (ignore attached binary, use the one from the post below)
- Homespun version used is 0.32p2
- Openspin should be, and is, the same as Propeller Tool and Propellent.
- The default optimization level for flexspin is currently -O1 for PASM and -O0 for bytecode. This may change at some point, IDK.
- BSTC and flexspin have fine-grained optimization control, but I'm only testing the defaults and -Ocgr for BSTC, and the three presets (-O0, -O1, -O2) for flexspin.
If you have something that'd be neat to benchmark, please post it.
Mandelbrot
Simple math and loop benchmark
Compiler | Binary size | Cycles |
---|---|---|
openspin | 876 | 224442480 |
openspin -u | 740 | 224442480 |
homespun | 876 | 224442480 |
bstc | 872 | 224275872 |
bstc -Ocgr | 736 | 223241696 |
flexspin -O0 | 3828 | 45648784 |
flexspin -O1 | 2260 | 33349568 |
flexspin -O2 | 2108 | 33507216 |
flexspin --interp=rom -O0 | 872 | 224272960 |
flexspin --interp=rom -O1 | 720 | 224000864 |
flexspin --interp=rom -O2 | - | FAIL: internal error |
Verdict: On the bytecode front, flexspin wins on code size and bstc wins on speed (edit: see below). Oddly enough, flexspin PASM -O2 is smaller but slower than -O1 in this case.
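To put the raw cycle counts in perspective, here's a small Python sketch (not part of the benchmark setup itself) that turns the Mandelbrot table into speedups relative to openspin. The dictionary just copies numbers from the table; the speedup helper is a hypothetical name used only for illustration.

```python
# Cycle counts copied from the Mandelbrot table (original run, before the
# later flexspin --interp=rom rerun). Lower is faster.
MANDELBROT_CYCLES = {
    "openspin": 224442480,
    "bstc -Ocgr": 223241696,
    "flexspin -O1": 33349568,
    "flexspin -O2": 33507216,
    "flexspin --interp=rom -O1": 224000864,
}


def speedup(baseline_cycles: int, cycles: int) -> float:
    """Return how many times faster a run taking `cycles` is than the baseline."""
    return baseline_cycles / cycles


if __name__ == "__main__":
    base = MANDELBROT_CYCLES["openspin"]
    for name, cycles in MANDELBROT_CYCLES.items():
        print(f"{name:28s} {speedup(base, cycles):5.2f}x")
```

With these numbers, flexspin -O1 PASM comes out at roughly 6.7x the speed of openspin, while all the bytecode compilers stay within about 1% of each other.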
Sort benchmark
More complex benchmark based on the directory sorting code from VentilatorOS.
Compiler | Binary size | Cycles |
---|---|---|
openspin | 3484 | 83486896 |
openspin -u | 3368 | 83486896 |
homespun | 3484 | 83486896 |
bstc | 3480 | 83486896 |
bstc -Ocgr | 3364 | 83396240 |
flexspin -O0 | 8504 | 32150304 |
flexspin -O1 | 6164 | 15531168 |
flexspin -O2 | 5992 | 15491232 |
flexspin --interp=rom -O0 | 3472 | 80822432 |
flexspin --interp=rom -O1 | 3336 | 80822256 |
flexspin --interp=rom -O2 | - | FAIL: internal error |
Verdict: flexspin wins. Not sure why flexspin bytecode is faster than BSTC here (about 3% fewer cycles than bstc -Ocgr).
Spintris
A test for correctness and binary size in a real application.
Compiler | Binary size | Note |
---|---|---|
openspin | 29836 | PASS |
openspin -u | 29516 | PASS |
homespun | 29836 | PASS |
bstc | 29832 | PASS |
bstc -Ocgr | 29272 | PASS |
flexspin -O0 | 55000 | FAIL: Too large |
flexspin -O1 | 44000 | FAIL: Too large |
flexspin -O2 | - | FAIL: out of registers |
flexspin --interp=rom -O0 | 29280 | PASS |
flexspin --interp=rom -O1 | 29072 | PASS |
flexspin --interp=rom -O2 | - | FAIL: internal error |
Verdict: flexspin bytecode wins.
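A quick way to quantify the size win is to compute bytes saved against openspin. This Python sketch only uses the PASS entries from the Spintris table; the savings helper is a hypothetical name, not part of any compiler's tooling.

```python
# Binary sizes (bytes, as reported by each compiler) copied from the
# Spintris table; only the entries that PASS are included.
SPINTRIS_SIZES = {
    "openspin": 29836,
    "openspin -u": 29516,
    "homespun": 29836,
    "bstc": 29832,
    "bstc -Ocgr": 29272,
    "flexspin --interp=rom -O0": 29280,
    "flexspin --interp=rom -O1": 29072,
}


def savings(baseline: int, size: int) -> tuple[int, float]:
    """Return (bytes saved, percent saved) of `size` relative to `baseline`."""
    saved = baseline - size
    return saved, 100.0 * saved / baseline


if __name__ == "__main__":
    base = SPINTRIS_SIZES["openspin"]
    # Sort smallest binary first so the winner is at the top.
    for name, size in sorted(SPINTRIS_SIZES.items(), key=lambda kv: kv[1]):
        saved, pct = savings(base, size)
        print(f"{name:28s} {size:6d} bytes  -{saved:4d} ({pct:4.2f}%)")
```

By this measure flexspin --interp=rom -O1 saves 764 bytes (about 2.6%) over openspin. Sizes are used as reported; if BSTC really reports 4 bytes too few, its entries would grow by 4 bytes, which doesn't change the winner.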
Spin Hexagon
Similar to Spintris, this tests correctness and binary size. It's interesting because it is designed for flexspin's PASM backend (bytecode is not fast enough for useful framerates), but doesn't use any nonstandard features.
The test only concerns hexagon.spin, not hexagon_boot.spin. The flexspin -O1,inline-single,loop-reduce --fcache=86 entry uses the recommended settings from the Spin Hexagon README.
Compiler | Binary size | Note |
---|---|---|
openspin | 15952 | PASS |
openspin -u | - | FAIL: compiler error |
homespun | - | FAIL: parser error |
bstc | 20556 | FAIL: playfield does not spin |
bstc -Ocgr | 20000 | FAIL: playfield does not spin |
flexspin -O0 | 44224 | FAIL: Too large |
flexspin -O1 | 29676 | PASS |
flexspin -O2 | - | FAIL: out of registers |
flexspin -O1,inline-single,loop-reduce --fcache=86 | 29040 | PASS |
flexspin --interp=rom -O0 | 15928 | PASS |
flexspin --interp=rom -O1 | 15096 | PASS |
flexspin --interp=rom -O2 | - | FAIL: internal error |
Verdict: Only useful on flexspin PASM, so eh. BSTC produces oddly large, defective binaries. openspin -u and homespun don't get very far at all.
Template
Template for the table
|Compiler|Binary size|Note|
|-|-:|-:|
|openspin|||
|openspin -u|||
|homespun|||
|bstc|||
|bstc -Ocgr|||
|flexspin -O0|||
|flexspin -O1|||
|flexspin -O2|||
|flexspin --interp=rom -O0|||
|flexspin --interp=rom -O1|||
|flexspin --interp=rom -O2|||
Comments
Aaaaand oops, I forgot to actually use the latest flexspin build. Gotta rerun the benchmarks I guess.
EDIT: yep, done that, slight improvements. New build attached again. Confusingly, it has the same version string, because it doesn't have the dirty flag in it.
Aaaand figured out why BSTC is faster in the Mandelbrot benchmark.
New Mandelbrot results: flexspin --interp=rom -O1 is 222966688 cycles, and flexspin --interp=rom -O0 is 223238784 cycles. Other results unchanged.
Bonus round: Spin Hexagon booter
This is the hexagon_boot.spin from Spin Hexagon. It uses KyeFAT to wrangle the SD card, so it's mostly just that.
Nice overview.
I hang my head in shame.
The particular error is
IDK what it is even trying to do there.
Neat that the error report gives the column, though; that might be worth adding to flexspin. Then again, the reported line numbers are already not very good, so IDK if that would just be more confusing.
I've been able to try flexspin --interp=rom on a few of the standard C/Spin benchmarks.
xxtea (simple compression/decompression algorithm):
fftbench (Heater's original FFT benchmark demo):
It looks like, speed-wise, flexspin bytecode still has some room for improvement on the math-intensive benchmark, but the sizes are pretty good.
The FFT is slower on flexspin than bstc and openspin?
If you shoot me that code, I'll investigate.
Never mind, figured it out.
Also, flexspin PASM seems to give different results on the fft_bench. All the other compilers just print 4 bins, but flexspin PASM prints loads.
Are you sure your frontend code is up to date with mine? I'm not getting that problem, but I do remember seeing it (and fixing it) a little while ago. I think it was in 3c63c44e8.
I've attached the version of fft_bench I used for the tests, although another version I pulled from Heater's github also works for me.
Uhh, oh yes, I think I'm just an idiot and ran it with the change I mentioned on that issue, which breaks the PASM backend.
I've got almost the same, except mine reports us instead of ms.
I just remembered I built my faster Spin interpreter for the P1V ROM, where the vector table was squeezed into the ROM. I will try to find it and post it.
Rerunning some of the tests with the latest flexspin (results should not differ from 5.5.1, though). All of these are for -O1 --interp=rom, because that's the interesting setting.