Spin compiler shootout — Parallax Forums

Spin compiler shootout

Wuerfel_21 Posts: 5,051
edited 2021-06-07 19:05 in Propeller 1

In light of my recent work on flexspin's bytecode output, I thought it'd be interesting to do some comparisons between the different Spin compilers.

Important notes:

  • The binary size given is what the compiler reports. I think BSTC always reports 4 bytes too few.
  • Flexspin's PASM binaries include space for VARiables, whereas the bytecode compilers don't.
  • Flexspin version used is current git build (ignore attached binary, use the one from the post below)
  • Homespun version used is 0.32p2
  • Openspin should be / is the same as Propeller Tool and Propellent.
  • The default O-level for flexspin currently is O1 for PASM and O0 for bytecode. This may change at some point, IDK.
  • BSTC and flexspin have fine-grained optimization control, but I'm only testing default and -Ocgr for BSTC and the three presets for flexspin

If you have something that'd be neat to benchmark, please post it

Mandelbrot

Simple math and loop benchmark

|Compiler|Binary size|Cycles|
|-|-:|-:|
|openspin|876|224442480|
|openspin -u|740|224442480|
|homespun|876|224442480|
|bstc|872|224275872|
|bstc -Ocgr|736|223241696|
|flexspin -O0|3828|45648784|
|flexspin -O1|2260|33349568|
|flexspin -O2|2108|33507216|
|flexspin --interp=rom -O0|872|224272960|
|flexspin --interp=rom -O1|720|224000864|
|flexspin --interp=rom -O2|-|FAIL: internal error|

Verdict: On the bytecode front, flexspin wins on code size, bstc wins on speed (edit: see below). flexspin PASM -O2 is smaller but slower than -O1 in this case, oddly enough.
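
To put the cycle counts in perspective, here is a small sketch that expresses each result as a speedup over plain openspin. The numbers are copied from the Mandelbrot table as listed; the helper name `speedup_vs` is made up for illustration and is not part of any of the compilers.

```python
# Cycle counts copied from the Mandelbrot table (as originally listed).
mandelbrot_cycles = {
    "openspin": 224442480,
    "bstc": 224275872,
    "bstc -Ocgr": 223241696,
    "flexspin -O0": 45648784,
    "flexspin -O1": 33349568,
    "flexspin -O2": 33507216,
    "flexspin --interp=rom -O0": 224272960,
    "flexspin --interp=rom -O1": 224000864,
}


def speedup_vs(baseline, cycles):
    """Return {compiler: baseline_cycles / compiler_cycles}.

    A value above 1.0 means the compiler's output needed fewer cycles
    than the baseline's output.
    """
    base = cycles[baseline]
    return {name: base / count for name, count in cycles.items()}


speedups = speedup_vs("openspin", mandelbrot_cycles)

# Print fastest first.
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {factor:6.3f}x")
```

By this measure flexspin PASM at -O1 comes out roughly 6.7 times faster than the openspin bytecode, while all the bytecode results stay within about one percent of each other.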

Sort benchmark

More complex benchmark based on the directory sorting code from VentilatorOS.

|Compiler|Binary size|Cycles|
|-|-:|-:|
|openspin|3484|83486896|
|openspin -u|3368|83486896|
|homespun|3484|83486896|
|bstc|3480|83486896|
|bstc -Ocgr|3364|83396240|
|flexspin -O0|8504|32150304|
|flexspin -O1|6164|15531168|
|flexspin -O2|5992|15491232|
|flexspin --interp=rom -O0|3472|80822432|
|flexspin --interp=rom -O1|3336|80822256|
|flexspin --interp=rom -O2|-|FAIL: internal error|

Verdict: flexspin wins. Not sure why flexspin bytecode is so much faster than BSTC.
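
For a sense of scale, the sketch below works out how far apart the sort results actually are: flexspin bytecode against bstc -Ocgr, and flexspin PASM against flexspin bytecode. The figures are copied from the table; `Result` and `percent_saved` are illustrative names only.

```python
from typing import NamedTuple


class Result(NamedTuple):
    size_bytes: int
    cycles: int


# Rows copied from the sort benchmark table.
sort_results = {
    "bstc -Ocgr": Result(3364, 83396240),
    "flexspin -O1": Result(6164, 15531168),
    "flexspin --interp=rom -O1": Result(3336, 80822256),
}


def percent_saved(reference: int, candidate: int) -> float:
    """Percentage by which candidate is below reference (negative if above)."""
    return 100.0 * (reference - candidate) / reference


bstc = sort_results["bstc -Ocgr"]
bytecode = sort_results["flexspin --interp=rom -O1"]
pasm = sort_results["flexspin -O1"]

print(
    f"flexspin bytecode vs bstc -Ocgr: "
    f"{percent_saved(bstc.cycles, bytecode.cycles):.1f}% fewer cycles, "
    f"{percent_saved(bstc.size_bytes, bytecode.size_bytes):.1f}% fewer bytes"
)
print(
    f"flexspin PASM vs flexspin bytecode: "
    f"{bytecode.cycles / pasm.cycles:.1f}x faster, "
    f"{pasm.size_bytes / bytecode.size_bytes:.2f}x larger"
)
```

So the bytecode gap to BSTC is about 3% in cycles, whereas going to PASM trades a bit less than double the size for roughly five times the speed.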

Spintris

A test for correctness and binary size in a real application.

|Compiler|Binary size|Note|
|-|-:|-:|
|openspin|29836|PASS|
|openspin -u|29516|PASS|
|homespun|29836|PASS|
|bstc|29832|PASS|
|bstc -Ocgr|29272|PASS|
|flexspin -O0|55000|FAIL: Too large|
|flexspin -O1|44000|FAIL: Too large|
|flexspin -O2|-|FAIL: out of registers|
|flexspin --interp=rom -O0|29280|PASS|
|flexspin --interp=rom -O1|29072|PASS|
|flexspin --interp=rom -O2|-|FAIL: internal error|

Verdict: flexspin bytecode wins.
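
The post does not spell out the limit behind "FAIL: Too large". A likely reading is the Propeller 1's 32 KiB of hub RAM, which the whole image has to fit into. A quick sketch under that assumption (`fits_in_hub_ram` is a made-up helper, and the check ignores the extra room a bytecode program still needs for VARs and stack):

```python
# Assumption: the size limit is the Propeller 1's 32 KiB of hub RAM.
HUB_RAM_BYTES = 32 * 1024

# Sizes copied from the Spintris table (entries without a size are omitted).
spintris_sizes = {
    "openspin": 29836,
    "openspin -u": 29516,
    "homespun": 29836,
    "bstc": 29832,
    "bstc -Ocgr": 29272,
    "flexspin -O0": 55000,
    "flexspin -O1": 44000,
    "flexspin --interp=rom -O0": 29280,
    "flexspin --interp=rom -O1": 29072,
}


def fits_in_hub_ram(size_bytes: int, limit: int = HUB_RAM_BYTES) -> bool:
    """True if the reported binary size is no larger than the assumed limit."""
    return size_bytes <= limit


for name, size in spintris_sizes.items():
    verdict = "fits" if fits_in_hub_ram(size) else "too large"
    print(f"{name:28s} {size:6d}  {verdict}")
```

Under that assumption the two flexspin PASM builds are the only ones over the limit, which matches the failures in the table.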

Spin Hexagon

Similar to Spintris, this tests correctness and binary size. It is interesting because it is designed for flexspin's PASM backend (bytecode is not fast enough for useful framerates), but doesn't use any nonstandard features.

The test only concerns hexagon.spin, not hexagon_boot.spin. The flexspin -O1,inline-single,loop-reduce --fcache=86 entry uses the recommended settings from the Spin Hexagon README.

|Compiler|Binary size|Note|
|-|-:|-:|
|openspin|15952|PASS|
|openspin -u|-|FAIL: compiler error|
|homespun|-|FAIL: parser error|
|bstc|20556|FAIL: playfield does not spin|
|bstc -Ocgr|20000|FAIL: playfield does not spin|
|flexspin -O0|44224|FAIL: Too large|
|flexspin -O1|29676|PASS|
|flexspin -O2|-|FAIL: out of registers|
|flexspin -O1,inline-single,loop-reduce --fcache=86|29040|PASS|
|flexspin --interp=rom -O0|15928|PASS|
|flexspin --interp=rom -O1|15096|PASS|
|flexspin --interp=rom -O2|-|FAIL: internal error|

Verdict: Only useful on flexspin PASM, so eh. BSTC produces oddly large, defective binaries. openspin -u and homespun don't get very far at all.

Template

Template for the table

|Compiler|Binary size|Note|
|-|-:|-:|
|openspin|||
|openspin -u|||
|homespun|||
|bstc|||
|bstc -Ocgr|||
|flexspin -O0|||
|flexspin -O1|||
|flexspin -O2|||
|flexspin --interp=rom -O0|||
|flexspin --interp=rom -O1|||
|flexspin --interp=rom -O2|||
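
If you'd rather generate the empty table than copy it, something like the following should do. It is only a sketch: `COMPILERS` and `make_template` are illustrative names, and the column list can be swapped for the Cycles variant used in the first two benchmarks.

```python
# Compiler configurations in the same order as the template.
COMPILERS = [
    "openspin",
    "openspin -u",
    "homespun",
    "bstc",
    "bstc -Ocgr",
    "flexspin -O0",
    "flexspin -O1",
    "flexspin -O2",
    "flexspin --interp=rom -O0",
    "flexspin --interp=rom -O1",
    "flexspin --interp=rom -O2",
]


def make_template(columns=("Compiler", "Binary size", "Note"), compilers=COMPILERS):
    """Build an empty pipe table: one header row, one alignment row,
    then one row per compiler with every other cell left blank."""
    header = "|" + "|".join(columns) + "|"
    # First column left-aligned, remaining columns right-aligned.
    align = "|-" + "|-:" * (len(columns) - 1) + "|"
    rows = [f"|{name}|" + "|" * (len(columns) - 1) for name in compilers]
    return "\n".join([header, align, *rows])


print(make_template())
print()
print(make_template(columns=("Compiler", "Binary size", "Cycles")))
```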

Comments

  • Wuerfel_21 Posts: 5,051
    edited 2021-06-07 17:00

    Aaaaand ooop, I forgot to actually use the latest flexspin build. Gotta rerun the benchmarks I guess.
    EDIT: yep, done that, slight improvements. New build attached again. Confusingly, it has the same version string, because it doesn't have the dirty flag in it.

  • Wuerfel_21 Posts: 5,051
    edited 2021-06-07 17:47

    Aaaand figured out why BSTC is faster in the mandelbrot benchmark

    New mandelbrot result for flexspin --interp=rom -O1 is 222966688 cycles, and for flexspin --interp=rom -O0 it is 223238784 cycles. Other results unchanged.

  • Wuerfel_21 Posts: 5,051
    edited 2021-06-07 19:18

    Bonus round: Spin Hexagon booter

    This is the hexagon_boot.spin from Spin Hexagon. It uses KyeFAT to wrangle SD stuffs, so it's mostly just that.

    |Compiler|Binary size|Note|
    |-|-:|-:|
    |openspin|11008|PASS|
    |openspin -u|-|FAIL: compiler error|
    |homespun|11008|PASS|
    |bstc|11004|PASS|
    |bstc -Ocgr|9208|PASS|
    |flexspin -O0|-|FAIL: out of registers|
    |flexspin -O1|25228|PASS|
    |flexspin -O2|22836|FAIL: garbled screen, works fine otherwise|
    |flexspin -O1,inline-single,loop-reduce --fcache=86|24880|PASS|
    |flexspin --interp=rom -O0|11112|PASS|
    |flexspin --interp=rom -O1|8476|PASS|
    |flexspin --interp=rom -O2|-|FAIL: internal error|
  • Nice overview

  • mpark Posts: 1,305

    |homespun| - |FAIL: parser error|

    I hang my head in shame.

  • @mpark said:

    |homespun| - |FAIL: parser error|

    I hang my head in shame.

    The particular error is

    Error: hexagon.spin (1877, 6): Variable not allowed in constant expression
    byte W______[5]
         ^
    

    IDK what it is even trying to do there.

    Neat that the error report gives the column though, that may be neat to add into flexspin. Though the reported line numbers are already not very good, so idk if that'd just be more confusing

  • I've been able to try flexspin --interp=rom on a few of the standard C/Spin benchmarks.

    xxtea (simple encryption/decryption algorithm):

    flexspin pasm     (Spin):    22737 cycles / 3452 bytes
    flexspin bytecode (C):      930704 cycles / 1836 bytes
    flexspin bytecode (Spin):  1013968 cycles / 1180 bytes
    bstc -Ogcr        (Spin):  1029920 cycles / 1200 bytes
    openspin bytecode (Spin):  1044640 cycles / 1324 bytes (1200 bytes with -u)
    

    fftbench (Heater's original FFT benchmark demo):

    flexspin pasm     (C):        100 ms / 22560 bytes
    flexspin pasm     (Spin):      92 ms / 13720 bytes
    flexspin bytecode (C):       1682 ms / 12056 bytes
    flexspin bytecode (Spin):    1499 ms /  2788 bytes
    openspin bytecode (Spin):    1465 ms /  3244 bytes
    bstc -Ogcr        (Spin):    1465 ms /  3244 bytes
    

    It looks like speed-wise flexspin bytecode still has some room for improvement on the math-intensive benchmark, but the sizes are pretty good.

  • Wuerfel_21 Posts: 5,051
    edited 2021-06-19 20:43

    The FFT is slower on flexspin than bstc and openspin?

    If you shoot me that code, I'll investigate.

    Never mind, figured it out.

  • Also, flexspin PASM seems to give different results on the fft_bench. All the other compilers just print 4 bins, but flexspin PASM prints loads.

  • @Wuerfel_21 said:
    Also, flexspin PASM seems to give different results on the fft_bench. All the other compilers just print 4 bins, but flexspin PASM prints loads.

    Are you sure your frontend code is up to date with mine? I'm not getting that problem, but I do remember seeing it (and fixing it) a little while ago. I think it was in 3c63c44e8.

    I've attached the version of fft_bench I used for the tests, although another version I pulled from Heater's github also works for me.

  • @ersmith said:

    @Wuerfel_21 said:
    Also, flexspin PASM seems to give different results on the fft_bench. All the other compilers just print 4 bins, but flexspin PASM prints loads.

    Are you sure your frontend code is up to date with mine? I'm not getting that problem, but I do remember seeing it (and fixing it) a little while ago. I think it was in 3c63c44e8.

    Uhh, oh yes, I think I'm just an idiot and ran it with that change which I mentioned on that issue that breaks the PASM backend.

    I've attached the version of fft_bench I used for the tests, although another version I pulled from Heater's github also works for me.

    I've got almost the same, except mine reports us instead of ms.

  • Cluso99 Posts: 18,069

    I just remembered I built my faster spin interpreter for the P1V ROM where the vector table was squeezed in the ROM. I will try and find and post it.

  • Wuerfel_21 Posts: 5,051
    edited 2021-06-23 19:48

    Rerunning some of the tests with latest flexspin (should not differ from 5.5.1 though). All these are for -O1 --interp=rom, because that's the interesting setting.

    |Test|Size|Time|Remark|
    |-|-:|-:|-:|
    |mandelbrot|720|222966688 cycles|Unchanged|
    |sortbench|3336|80629136 cycles|Slightly faster|
    |Spintris|29048|---|Slightly smaller|
    |Spin Hexagon|15048|---|Slightly smaller|
    |Hexagon booter|8464|---|Slightly smaller|
    |fftbench (Spin)|2752|1434 ms|Slightly smaller and faster|
    |fftbench (C)|12040|1682 ms|Unchanged (I think Eric used the binary size, not the reported program size, which is 16 less)|
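
To see what "slightly" amounts to, the sketch below subtracts the rerun numbers from the earlier flexspin --interp=rom -O1 results in this thread. The old values come from the earlier tables (the fftbench baseline is the one posted by ersmith, so it may not be a like-for-like measurement); `delta` is just an illustrative helper.

```python
# (old, new) pairs for flexspin --interp=rom -O1, copied from the thread.
sizes = {
    "Spintris": (29072, 29048),
    "Spin Hexagon": (15096, 15048),
    "Hexagon booter": (8476, 8464),
    "fftbench (Spin)": (2788, 2752),
}
sortbench_cycles = (80822256, 80629136)
fftbench_ms = (1499, 1434)


def delta(pair):
    """Return old - new, so a positive value means the rerun improved."""
    old, new = pair
    return old - new


for name, pair in sizes.items():
    print(f"{name}: {delta(pair)} bytes smaller")

saved = delta(sortbench_cycles)
print(f"sortbench: {saved} fewer cycles ({100 * saved / sortbench_cycles[0]:.2f}%)")
print(f"fftbench (Spin): {delta(fftbench_ms)} ms faster")
```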