Automatically converting Spin objects to PASM

ersmith · 2018-04-16 23:59

Edit: 2018-04-28: there was a nasty bug in assignment statements that's fixed now. I've bumped the version number to reflect this, so it's now fastspin 3.7.1. I do encourage readers to check https://github.com/totalspectrum/spin2cpp/releases for newer releases; I probably won't keep updating this thread now that the .cog.spin feature is in an "official" release.

Edit: 2018-04-25: updated to yet another fastspin.zip that is the 3.7.0 release candidate. Added a number of optimizations and improved the handling of comments in the generated code.

Edit: 2018-04-19: updated to a newer fastspin.zip that adds comments automatically and plays better with openspin (no @@@).

Here are some thoughts I have on a mechanism to automatically convert Spin objects to PASM, along with an implementation. I'd love to get feedback on this.

Automatically Converting Spin To PASM

Introduction

Spin is the standard programming language for the Propeller. It's generally implemented by means of a bytecode compiler on the PC, producing binaries that are interpreted by a bytecode interpreter in the Propeller ROM. This works well -- Spin programs are compact, and perform reasonably well. But sometimes you need more performance than an interpreter can give.

The fastspin compiler can convert Spin programs to PASM automatically. Originally fastspin only worked on whole programs, which was fine if you wanted to speed up a small program. But PASM code is a lot bigger than bytecode, so converting a whole program to PASM increases its size quite a lot, and may not be feasible for larger programs.

The upcoming release of fastspin (3.7.0) will allow individual Spin objects to be converted to PASM and then integrated with regular Spin programs (compiled with the Propeller Tool, openspin, or other Spin bytecode compilers).

Installing Fastspin

NOTE: I've attached a prerelease version of 3.7.0 to this message; use it for now. After 3.7.0 is released, you can download fastspin.zip from https://github.com/totalspectrum/spin2cpp/releases. (Make sure you get version 3.7.0 or later.)

Unzip it to a folder on your hard drive. You'll get two files: fastspin.exe (the program) and fastspin.md (the documentation). For convenience you may want to add the folder containing fastspin.exe to your system PATH; or, you can just copy fastspin.exe into the folder you're working in.

Using Fastspin to Convert to PASM

This is very easy. For example, to convert a Spin object Fibo.spin to a PASM object that can run in another COG, called Fibo.cog.spin, just do:

   fastspin -w Fibo.spin

in a command line (Windows or Linux or Mac). This produces a file Fibo.cog.spin. You now can use "Fibo.cog" in place of "Fibo" and get the speed benefits.

The only other thing you have to do is add a call to the __cognew method to actually start the object up. (We have to use the special __cognew method because the Spin cognew function only supports executing bytecode methods.)

A complete example

Here's Fibo.spin:

PUB fibo(n)
  if (n < 2)
    return n
  return fibo(n-1) + fibo(n-2)

and here is fibodemo.spin, which uses it:

'' fibodemo.spin: demonstrate COGOBJ with a Fibonacci calculator
'' runs both bytecode and PASM versions of the fibo function

CON
  _clkmode = xtal1 + pll16x
  _clkfreq = 80_000_000

OBJ
  ser: "FullDuplexSerial"
  bytecode: "Fibo"
  pasm: "Fibo.cog"
  
PUB hello | e1, e2, i, n, n2
  pasm.__cognew
  ser.start(31, 30, 0, 115200)
  repeat i from 1 to 10
    e1 := CNT
    n := bytecode.fibo(i)
    e1 := CNT - e1
    e2 := CNT
    n2 := pasm.fibo(i)
    e2 := CNT - e2
    ser.str(string("fibo("))
    ser.dec(i)
    ser.str(string(") = "))
    ser.dec(n)
    ser.str(string(" bytecode time "))
    ser.dec(e1)
    ser.str(string(" / pasm time "))
    ser.dec(e2)
    ser.str(string(" cycles", 13, 10))
    if ( n <> n2)
      ser.str(string("  ERROR", 13, 10))

Note that you can mix regular Spin objects and COG objects freely, and in fact can use the original Fibo.spin alongside the converted Fibo.cog.spin.

The __cognew method was automatically added by fastspin when it translated Fibo.spin from Spin to PASM. It does all the housekeeping involved with getting communication going between the Spin code and the PASM code running on another processor. Similarly there's a __cogstop method which will stop the COG. You can also use the regular Spin cogstop function, but this may leave the code in a state which makes it hard to start again, so __cogstop is better.

To compile and run this on the command line I do:

fastspin -w Fibo.spin
openspin fibodemo.spin
propeller-load fibodemo.binary -r -t

You can also use Fibo.cog.spin with any Spin IDE or other Spin tools. It should be legal for all Spin compilers.

The output looks like:

fibo(1) = 1 bytecode time 2272 / pasm time 11536 cycles
fibo(2) = 1 bytecode time 6368 / pasm time 11536 cycles
fibo(3) = 2 bytecode time 10464 / pasm time 12640 cycles
fibo(4) = 3 bytecode time 18656 / pasm time 12640 cycles
fibo(5) = 5 bytecode time 30944 / pasm time 13744 cycles
fibo(6) = 8 bytecode time 51424 / pasm time 15952 cycles
fibo(7) = 13 bytecode time 84192 / pasm time 18160 cycles
fibo(8) = 21 bytecode time 137440 / pasm time 22576 cycles
fibo(9) = 34 bytecode time 223456 / pasm time 29200 cycles
fibo(10) = 55 bytecode time 362720 / pasm time 41344 cycles

Performance

Some things to note about the performance:

(1) There's a fixed overhead on the order of 10000 cycles for managing the inter-COG communication and getting the answer back from the remote COG (the fibo(1) call above takes 11536 cycles via PASM versus 2272 in bytecode). So for small calculations the PASM isn't worth it.

(2) Once we start getting slightly more complicated calculations the PASM speed quickly becomes apparent. In this case we can see that the PASM approaches 10x faster than regular Spin bytecode.
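To put numbers on both points, here is a small Python sketch (Python only because it is handy for quick arithmetic) that works from the cycle counts in the output above. It finds the first fibo(n) for which the PASM call beats bytecode and the speedup at fibo(10). The variable names are made up for this illustration.

```python
# Cycle counts copied from the fibodemo output above, for fibo(1)..fibo(10).
bytecode = [2272, 6368, 10464, 18656, 30944, 51424, 84192, 137440, 223456, 362720]
pasm = [11536, 11536, 12640, 12640, 13744, 15952, 18160, 22576, 29200, 41344]

# First n for which the PASM call (mailbox overhead included) beats bytecode.
crossover = next(
    n for n, (b, p) in enumerate(zip(bytecode, pasm), start=1) if p < b
)

# Speedup of PASM over bytecode for each n (values below 1 mean PASM is slower).
speedup = [b / p for b, p in zip(bytecode, pasm)]

print("PASM wins from fibo(%d) onward" % crossover)
print("speedup at fibo(10): %.2fx" % speedup[-1])
```

For these measurements the crossover is fibo(4), and the fibo(10) speedup comes out at about 8.8x.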

Blinking Leds

Another classic test. Here's a simple pin blinking object:

'' blinker.spin
CON
  pin = 15
  pausetime = 40_000_000

PUB run
  DIRA[pin] := 1
  repeat
    OUTA[pin] ^= 1
    waitcnt(CNT+pausetime)

Tweak pin to be something that's actually wired up to an LED, and pausetime to the number of cycles you want to pause.

To use this in a program do something like:

'' blinkdemo.spin
'' blink an LED in another COG
'' COGSPIN version
''
CON
  _clkmode = xtal1 + pll16x
  _clkfreq = 80_000_000
  
OBJ
  fds: "FullDuplexSerial"
  blink: "blinker.cog"
  
PUB demo | id
  fds.start(31, 30, 0, 115200)
  id := blink.__cognew
  blink.run
  fds.str(string("blink running in cog "))
  fds.dec(id)
  fds.tx(13)
  fds.tx(10)
  repeat

Convert blinker.spin to the PASM blinker.cog.spin as usual...

fastspin -w blinker.spin

...and then you can compile and run blinkdemo.spin in the Propeller tool of your choice. In my case it's the command line:

   fastspin -w blinker.spin
   openspin -q blinkdemo.spin
   propeller-load blinkdemo.binary -r -t

Synchronous and Asynchronous Operation

If you've been watching carefully you've noticed that the fibo demo got results back from the PASM COG (i.e. the Spin COG waited for the PASM COG to finish) but the blink demo did not (the blinking ran alongside the Spin COG). The first case is "synchronous" operation, and the second is "asynchronous". You may wonder how fastspin knew to wait in one case and not in the other. The answer is simple: if a method returns a value, fastspin adds code to make the Spin bytecode wait for the PASM's result. If the method never returns a value (and never assigns to the result variable) then there's no wait, and the Spin bytecode can continue on its way.

You can use asynchronous operation to allow the Spin code to do work even while a calculation is in progress. The trick is to start the computation in a method that returns nothing, and then to provide a "getter" method that actually returns the result. For example:

'' multiply all N elements in an array by a number
'' and return the resulting sum of them
VAR
  long answer
  
'' start the array scaling operation
'' we want this to run asynchronously,
'' so do NOT return the result, just store
'' it in the variable "answer"

PUB scaleArray(arrptr, n, scale) | i, t, sum
  sum := 0
  repeat i from 0 to n-1
    t := long[arrptr][i] * scale
    long[arrptr][i] := t
    sum += t
  answer := sum

'' here's the getter
PUB getAnswer
  return answer

Now on the Spin COG side we can launch the computation with the scaleArray method, then do some work and come back later to get the answer with getAnswer. If instead of answer := sum we had done return sum or result := sum then scaleArray would have caused the Spin COG to wait for the result.
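As a rough illustration of the calling pattern only, here is a toy Python model in which a worker thread stands in for the PASM COG and a queue stands in for the mailbox. It is an analogy under stated assumptions, not how fastspin actually implements the mailbox; the class and method names (CogObject, cognew, scale_array, get_answer) are hypothetical.

```python
import queue
import threading

class CogObject:
    """Toy stand-in for a .cog.spin object: one worker thread plays the PASM COG."""

    def __init__(self):
        self._mailbox = queue.Queue()   # commands are handled one at a time, in order
        self.answer = 0
        self._worker = threading.Thread(target=self._run, daemon=True)

    def cognew(self):
        # Plays the role of __cognew: start the "COG".
        self._worker.start()

    def _run(self):
        while True:
            func, args, reply = self._mailbox.get()
            value = func(*args)
            if reply is not None:       # only methods that return a value reply
                reply.put(value)

    def _scale_array(self, arr, scale):
        total = 0
        for i in range(len(arr)):
            arr[i] *= scale
            total += arr[i]
        self.answer = total             # stored, NOT returned

    def _get_answer(self):
        return self.answer

    def scale_array(self, arr, scale):
        # No result expected, so the caller does not wait (asynchronous).
        self._mailbox.put((self._scale_array, (arr, scale), None))

    def get_answer(self):
        # A result is expected, so the caller blocks until it arrives (synchronous).
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((self._get_answer, (), reply))
        return reply.get()

cog = CogObject()
cog.cognew()
data = [1, 2, 3, 4]
cog.scale_array(data, 10)        # returns immediately
other_work = sum(range(100))     # the caller is free to do something else here
total = cog.get_answer()         # waits until the worker has finished scaling
print(total, data)
```

Because the single worker handles commands in order, the getter cannot run before the scaling has finished, which is the property the getter pattern relies on.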

A More Real World Example

''
'' Trivial Serial port written in Spin
''
'' this is for a very simple serial port
'' (transmit only, and only on the default pin for now)
''
CON
  txpin = 30
  
VAR
  long bitcycles
   
PUB start(baudrate)
  bitcycles := clkfreq / baudrate
  return 1
  
PUB tx(c) | val, nextcnt
  OUTA[txpin] := 1
  DIRA[txpin] := 1

  val := (c | 256) << 1
  nextcnt := CNT
  repeat 10
     waitcnt(nextcnt += bitcycles)
     OUTA[txpin] := val
     val >>= 1

PUB str(s) | c
  REPEAT WHILE ((c := byte[s++]) <> 0)
    tx(c)

This simple serial port, in the original Spin, works correctly and is fine for output at low speeds. But if you convert it to PASM then it will transmit at 921600 baud, and perhaps higher -- I haven't actually tried to see how high it will go.
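To make the framing in tx explicit, here is a short Python model of the same arithmetic. It mirrors val := (c | 256) << 1 and the repeat 10 loop, and computes bitcycles the way start does at the 80 MHz clock used in the demos. The function names are invented for this sketch.

```python
def frame_bits(c):
    """Return the 10 bit values tx(c) drives onto the pin, in transmit order."""
    val = (c | 256) << 1      # bit 0 is the start bit (0), bit 9 the stop bit (1)
    bits = []
    for _ in range(10):       # mirrors "repeat 10" in tx
        bits.append(val & 1)  # OUTA[txpin] := val only uses the low bit of val
        val >>= 1
    return bits

def bit_cycles(clkfreq, baudrate):
    """Mirror of start(): Spin's / is an integer division."""
    return clkfreq // baudrate

print(frame_bits(ord("A")))
print(bit_cycles(80_000_000, 115200), bit_cycles(80_000_000, 921600))
```

So each character goes out as a start bit, eight data bits with the least significant bit first, and a stop bit. At 921600 baud there are only 86 clocks per bit, which is why the bytecode interpreter cannot keep up while the PASM version can.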

Granted, we already have FullDuplexSerial.spin and a host of other serial ports. But this TrivialSerial is really easy to understand and to modify, so if you need to change something to communicate with a non-standard device, well, this would be easier than trying to tweak somebody else's PASM code.

The Generated Code

The .cog.spin produced by fastspin is somewhat readable, and we can look in it to see how the compiler does. For example, the code generated for the tx method of the trivial serial driver looks like:

pasm_Tx
    movs    _Tx_ret, #doreturn
_Tx
    mov _Tx__mask_0001, imm_1073741824_
    or  OUTA, imm_1073741824_
    or  DIRA, imm_1073741824_
    or  arg1, #256
    shl arg1, #1
    mov _Tx_Val, arg1
    mov _Tx_Nextcnt, CNT
    mov _Tx__idx__0002, #10
L__0021
    rdlong  Tx_tmp002_, objptr
    add _Tx_Nextcnt, Tx_tmp002_
    mov arg1, _Tx_Nextcnt
    waitcnt arg1, #0
    shr _Tx_Val, #1 wc
    muxc    OUTA, _Tx__mask_0001
    djnz    _Tx__idx__0002, #L__0021
_Tx_ret
    ret

which isn't too bad. We could improve it by keeping the bitcycles counter in a local variable, which would avoid the read from memory in the inner loop. Or we could tweak the generated assembly by hand.
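A back-of-the-envelope check of that listing, sketched in Python. It confirms that imm_1073741824_ is just the mask for pin 30 and adds up a rough per-bit cycle budget for the loop. The instruction timings are an assumption taken from the Propeller 1 datasheet (4 clocks for ordinary instructions, 8 to 23 for a hub read, at least 6 for waitcnt), so treat the result as an estimate rather than a measurement.

```python
# Rough cycle budget for the inner loop of the generated tx code.
# ASSUMPTION: Propeller 1 timings as listed in the P8X32A datasheet.
CLKFREQ = 80_000_000
TXPIN = 30

pin_mask = 1 << TXPIN    # should match the imm_1073741824_ constant

loop = [                 # (instruction, fewest clocks, most clocks)
    ("rdlong", 8, 23),   # hub access: depends on alignment with the hub window
    ("add", 4, 4),
    ("mov", 4, 4),
    ("waitcnt", 6, 6),   # minimum, i.e. the target count is reached right away
    ("shr", 4, 4),
    ("muxc", 4, 4),
    ("djnz", 4, 4),      # branch taken
]
best = sum(fewest for _, fewest, _ in loop)
worst = sum(most for _, _, most in loop)

print("pin mask:", pin_mask)
print("clocks per bit, best/worst:", best, worst)
print("clocks available at 921600 baud:", CLKFREQ // 921600)
```

Under these assumptions the loop needs between 34 and 49 clocks per bit, which fits inside the 86 clocks available at 921600 baud. Moving bitcycles into a local variable would remove the rdlong and its 8 to 23 clocks from the loop.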

Behind the Scenes

Some other things you will note in the .cog.spin file:

(1) The compiler guesses at the size of stack needed for the object and adds a constant __STACK_SIZE at the top of the file to say how big to make it (in longs). This is a conservative guess, and if memory is tight you may be able to reduce this size. Conversely, if you have recursive functions you may find you need more space and can increase it here. If you add a definition for __STACK_SIZE to your original .spin file, fastspin will notice it and use that value instead of the one it calculates. (__STACK_SIZE is basically the same as the Spin constant _STACK, but is used for PASM code instead of bytecode.)

(2) There's also a define for __MBOX_SIZE, giving the size of the mailbox used to communicate with the COG. This one is exact; don't try to change it.

(3) The first two items in the mailbox are a global lock and a function code. The global lock is to make sure only one Spin COG at a time uses the PASM COG. This feature hasn't been tested very much yet. The function code indicates what function the PASM COG is running. If it is 0 it means the PASM COG is idle and is able to accept new commands.

jmg · 2018-04-17 02:39

ersmith wrote: »

I'd love to get feedback on this.

Looks very nifty.
general comments :
Can you add the source, to the generated PASM as comments, and add a header that says ThisFile Autogenerated by.. PgmName VersionNumber date [param list]
The included files do not seem to exactly match the file names ? That makes it a little bit harder for someone to follow what files are being included where...
What are the limits of this - do you need 1 master COG, for up to 7 PASM COGs ?
How does this compare with PropBASIC ?
eg Could you port this, and compare ? https://forums.parallax.com/discussion/123170/propbasic-reciprocal-frequency-counter-0-5hz-to-40mhz-40mhz-now

How can you launch 7 (8?) copies of the same code, but using different pins, in this flow ?

ersmith · 2018-04-17 09:33

jmg wrote: »

Can you add the source, to the generated PASM as comments, and add a header that says ThisFile Autogenerated by.. PgmName VersionNumber date [param list]

Thanks, that's an excellent idea. I'll add that before the final release.

The included files do not seem to exactly match the file names ? That makes it a little bit harder for someone to follow what files are being included where...

Sorry, I don't quite follow this. I left off the .spin extensions on the OBJ includes; is that what you mean? Or did I introduce some typo in the text somewhere?

What are the limits of this - do you need 1 master COG, for up to 7 PASM COGs ?

This specific mechanism is intended to work in conjunction with a COG running Spin bytecode, so yeah, you'd generally have a master COG and 7 slaves. Of course you could use fastspin (without the -w flag) to compile the master COG too and have it running COG PASM or LMM.

Also, the "slaves" can do anything you can do in Spin, including starting other COGs, so you could end up with a pretty complicated topology if you wanted.

Finally there is inter-COG locking, so in principle you could have multiple Spin COGs using the same PASM COG (e.g. a floating point server for some exotic floating point format). That hasn't been tested very much yet, but it should theoretically work.

How does this compare with PropBASIC ?
eg Could you port this, and compare ? https://forums.parallax.com/discussion/123170/propbasic-reciprocal-frequency-counter-0-5hz-to-40mhz-40mhz-now

Comparison with PropBASIC:

(1) Language: Spin vs. BASIC. That's a flavor question, although PropBASIC does have some restrictions (e.g. on expression syntax and making you explicitly add WRLONG and RDLONG) to make compilation easier, so I personally find Spin a lot easier to write.

(2) Ease of interface to Spin: I think fastspin -w is easier here, because it generates wrappers for all the PUB methods in the Spin you convert, so the .cog.spin gets used in exactly the same way as the original .spin. For drivers with a single entry point that run in a loop the two will be roughly the same.

(3) Speed of generated code: fastspin does a number of optimizations, such as function inlining, dead code elimination, common subexpression elimination, loop strength reduction, and some peephole optimizations. So generally it will be faster than PropBASIC (sometimes much faster). A careful programmer can probably do most of those optimizations by hand, of course.

(4) Readability of generated code: PropBASIC has the edge here; the language is simplified enough so that there's a pretty obvious mapping between the source code and generated PASM, and the PropBASIC output is well commented. I can easily add comments to the PASM output of fastspin (in fact the spinconvert.exe tool already has that option) but because the optimizer can rename variables, move code around, and inline or remove functions, the original source won't line up easily with the generated PASM.

The frequency counter object you pointed to seems to use the hardware counters, so I don't think the performance will be much different in any language. Is there already a Spin version of it somewhere? If so we could just compile that.

I've looked around at benchmarks, and there don't seem to be many for PropBASIC. Heater has a version of fftbench for PropBASIC LMM, but it produces incorrect output so I'm suspicious of it. For what it's worth the timing results for fftbench are:

fastspin 3.7.0 LMM:      170709 us
PropBasic LMM:           690842 us
spin interpreter:       1465321 us

How can you launch 7 (8?) copies of the same code, but using different pins, in this flow ?

This is actually easy, but I see I oversimplified the blinker.spin object in the sample documentation. The pin number and delay don't have to be CON, they could just as well be parameters. This whole mechanism is designed to work with *any* Spin code you can write. So the blink object could be:

'' blinker.spin
PUB run(pin, pausetime)
  DIRA[pin] := 1
  repeat
    OUTA[pin] ^= 1
    waitcnt(CNT+pausetime)

Then you can instantiate multiple copies of the object, with different parameters. So for example:

''
'' blink LEDs in multiple COGs
''
CON
  _clkmode = xtal1 + pll16x
  _clkfreq = 80_000_000
  basepin = 0
  basepause = 20_000_000
  numcogs = 6 '' leave room for FDS and Spin
  
OBJ
  fds: "FullDuplexSerial"
  blink[numcogs]: "blinker.cog"
  
PUB demo | id, i
  fds.start(31, 30, 0, 115200)
  repeat i from 0 to numcogs-1
    id := blink[i].__cognew
    blink[i].run(basepin + i, basepause * (i + 1)) ' i + 1 so that cog 0 does not get a pause of 0
    fds.str(string("blink running in cog "))
    fds.dec(id)
    fds.tx(13)
    fds.tx(10)
  repeat

yeti · 2018-04-17 12:09

$ cat Makefile 
FASTSPIN = /opt/parallax/bin/fastspin
SPINSIM  = /opt/parallax/bin/spinsim

all: mandelbrot32demo.binary mandelbrot32demo.log

mandelbrot32demo.binary: mandelbrot32demo.spin mandelbrot32.cog.spin
        $(FASTSPIN) -O mandelbrot32demo.spin

mandelbrot32.cog.spin: mandelbrot32.spin SimpleSerial.cog.spin
        $(FASTSPIN) -O -w $<

SimpleSerial.cog.spin: SimpleSerial.spin
        $(FASTSPIN) -O -w $<

%.log: %.binary
        $(SPINSIM) $< -b | tee $@

clean:
        rm -rf *.binary *.cog.spin *.pasm
$ make
/opt/parallax/bin/fastspin -O -w SimpleSerial.spin
Propeller Spin/PASM Compiler 'FastSpin' (c) 2011-2018 Total Spectrum Software Inc.
Version 3.7.0-beta Compiled on: Apr 17 2018
SimpleSerial.spin
/opt/parallax/bin/fastspin -O -w mandelbrot32.spin
Propeller Spin/PASM Compiler 'FastSpin' (c) 2011-2018 Total Spectrum Software Inc.
Version 3.7.0-beta Compiled on: Apr 17 2018
mandelbrot32.spin
|-SimpleSerial.cog.spin
/opt/parallax/bin/fastspin -O mandelbrot32demo.spin
Propeller Spin/PASM Compiler 'FastSpin' (c) 2011-2018 Total Spectrum Software Inc.
Version 3.7.0-beta Compiled on: Apr 17 2018
mandelbrot32demo.spin
|-mandelbrot32.cog.spin
mandelbrot32demo.pasm
Done.
Program size is 2976 bytes
/opt/parallax/bin/spinsim mandelbrot32demo.binary -b | tee mandelbrot32demo.log
!!!!!!!!!!!!!!!"""""""""""""####################################""""""""""""""""
!!!!!!!!!!!!!"""""""""#######################$$$$$$$%'0(%%%$$$$$#####"""""""""""
!!!!!!!!!!!"""""""#######################$$$$$$$$%%%&&(++)++&$$$$$$$######""""""
!!!!!!!!!"""""#######################$$$$$$$$$$%%%%&')*A;/*('&%%$$$$$$#######"""
!!!!!!!!""""#####################$$$$$$$$$$%%%&&&''),AAAAAA@+'&%%%%%$$$$########
!!!!!!!"""####################$$$$$$$$%%%&'())((())*-AAAAAA.+))('&&&&+&%$$######
!!!!!!""###################$$$$$%%%%%%&&&'+.AAA08AAAAAAAAAAAAAAA/+,A//A)%%$#####
!!!!!"################$$$%%%%%%%%%%&&&&')-+7AAAAAAAAAAAAAAAAAAAAAAAAA4(&&%$$####
!!!!"##########$$$$$%%&(,('''''''''''((*-5AAAAAAAAAAAAAAAAAAAAAAAAAAA3+)4&%$$###
!!!!####$$$$$$$$%%%%%&'(*-A1.+/A-4+))**AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA3+'&%$$$##
!!!!#$$$$$$$$$%%%%%%'''++.7AAAAAAAAA9/0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA<6'%%$$$$#
!!!#$$$$$$$%&&&&''().-2.6AAAAAAAAAAAAA>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'&%%$$$$#
!!!378<@AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2+)'&&%%$$$$#
!!!#$$$$$$$%&&&&''().-2.6AAAAAAAAAAAAA>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'&%%$$$$#
!!!!#$$$$$$$$$%%%%%%'''++.7AAAAAAAAA9/0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA<6'%%$$$$#
!!!!####$$$$$$$$%%%%%&'(*-A1.+/A-4+))**AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA3+'&%$$$##
!!!!"##########$$$$$%%&(,('''''''''''((*-5AAAAAAAAAAAAAAAAAAAAAAAAAAA3+)4&%$$###
!!!!!"################$$$%%%%%%%%%%&&&&')-+7AAAAAAAAAAAAAAAAAAAAAAAAA4(&&%$$####
!!!!!!""###################$$$$$%%%%%%&&&'+.AAA08AAAAAAAAAAAAAAA/+,A//A)%%$#####
!!!!!!!"""####################$$$$$$$$%%%&'())((())*-AAAAAA.+))('&&&&+&%$$######
!!!!!!!!""""#####################$$$$$$$$$$%%%&&&''),AAAAAA@+'&%%%%%$$$$########
!!!!!!!!!"""""#######################$$$$$$$$$$%%%%&')*A;/*('&%%$$$$$$#######"""
!!!!!!!!!!!"""""""#######################$$$$$$$$%%%&&(++)++&$$$$$$$######""""""
!!!!!!!!!!!!!"""""""""#######################$$$$$$$%'0(%%%$$$$$#####"""""""""""
!!!!!!!!!!!!!!!"""""""""""""####################################""""""""""""""""
85406600 ticks

Do we have boxing day already?
:-D

mpark · 2018-04-17 15:43

Amazing work, @ersmith! Kudos!

ersmith · 2018-04-17 15:47

@jmg: I just noticed that the option to include the original source as comments already exists in fastspin (it's -g). So for now you can use "fastspin -w -g foo.spin" to convert foo.spin to foo.cog.spin and get the comments you asked for. I'll make this the default in future versions of fastspin. Thanks for the suggestion.

ersmith · 2018-04-17 15:51

@yeti: Glad you're having fun. Here's a version of the mandelbrot demo that uses 4 cogs. Output is:

!!!!!!!!!!!!!!!"""""""""""""####################################"""""""""""""""
!!!!!!!!!!!!!"""""""""#######################$$$$$$$%'0(%%%$$$$$#####""""""""""
!!!!!!!!!!!"""""""#######################$$$$$$$$%%%&&(++)++&$$$$$$$######"""""
!!!!!!!!!"""""#######################$$$$$$$$$$%%%%&')*A;/*('&%%$$$$$$#######""
!!!!!!!"""####################$$$$$$$$%%%&'())((())*-AAAAAA.+))('&&&&+&%$$#####
!!!!!!""###################$$$$$%%%%%%&&&'+.AAA08AAAAAAAAAAAAAAA/+,A//A)%%$####
!!!!!"################$$$%%%%%%%%%%&&&&')-+7AAAAAAAAAAAAAAAAAAAAAAAAA4(&&%$$###
!!!!"##########$$$$$%%&(,('''''''''''((*-5AAAAAAAAAAAAAAAAAAAAAAAAAAA3+)4&%$$##
!!!!####$$$$$$$$%%%%%&'(*-A1.+/A-4+))**AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA3+'&%$$$#
!!!!#$$$$$$$$$%%%%%%'''++.7AAAAAAAAA9/0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA<6'%%$$$$
!!!#$$$$$$$%&&&&''().-2.6AAAAAAAAAAAAA>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'&%%$$$$
!!!378<@AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2+)'&&%%$$$$
!!!#$$$$$$$%&&&&''().-2.6AAAAAAAAAAAAA>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'&%%$$$$
!!!!#$$$$$$$$$%%%%%%'''++.7AAAAAAAAA9/0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA<6'%%$$$$
!!!!####$$$$$$$$%%%%%&'(*-A1.+/A-4+))**AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA3+'&%$$$#
!!!!"##########$$$$$%%&(,('''''''''''((*-5AAAAAAAAAAAAAAAAAAAAAAAAAAA3+)4&%$$##
!!!!!"################$$$%%%%%%%%%%&&&&')-+7AAAAAAAAAAAAAAAAAAAAAAAAA4(&&%$$###
!!!!!!""###################$$$$$%%%%%%&&&'+.AAA08AAAAAAAAAAAAAAA/+,A//A)%%$####
!!!!!!!"""####################$$$$$$$$%%%&'())((())*-AAAAAA.+))('&&&&+&%$$#####
!!!!!!!!""""#####################$$$$$$$$$$%%%&&&''),AAAAAA@+'&%%%%%$$$$#######
!!!!!!!!!"""""#######################$$$$$$$$$$%%%%&')*A;/*('&%%$$$$$$#######""
!!!!!!!!!!!"""""""#######################$$$$$$$$%%%&&(++)++&$$$$$$$######"""""
!!!!!!!!!!!!!"""""""""#######################$$$$$$$%'0(%%%$$$$$#####""""""""""
!!!!!!!!!!!!!!!"""""""""""""####################################"""""""""""""""
!!!!!!!!!!!!!!!!!"""""""""""""""""""""################"""""""""""""""""""""""""
!!!!!!!!!!!!!!!!!!!""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
!!!!!!!!!!!!!!!!!!!!!!"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
!!!!!!!!!!!!!!!!!!!!!!!!!!"""""""""""""""""""""""""""""""""""""""""""""""""""""
26804416 ticks

It was pretty easy to convert. I split the calculation code up so it did one line at a time and stored the results in a buffer instead of printing them. Now the main program loops through the lines, starting the calculation up in the background then fetching the results and printing them while the next line is being processed.
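The structure described here (compute each line into a buffer in the background, print the lines in order as they become available) can be sketched in Python as follows. This is not the attached demo: the image size, coordinate ranges and function names are invented for the illustration, and Python threads show only the shape of the code, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, MAXITER = 40, 12, 30   # hypothetical sizes, not those of the demo

def mandel_row(y):
    """Compute one text row into a buffer (a string) instead of printing it."""
    row = []
    ci = -1.2 + 2.4 * y / (HEIGHT - 1)
    for x in range(WIDTH):
        cr = -2.0 + 2.8 * x / (WIDTH - 1)
        zr = zi = 0.0
        n = 0
        while n < MAXITER and zr * zr + zi * zi <= 4.0:
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            n += 1
        row.append(chr(ord("!") + n))  # one printable character per point
    return "".join(row)

def render(workers):
    """Start every row in the background, then collect the rows in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = [pool.submit(mandel_row, y) for y in range(HEIGHT)]
        return [job.result() for job in pending]

lines = render(workers=4)
print("\n".join(lines))
```

Collecting the results in submission order keeps the output identical to a single-worker run, and looping over exactly range(HEIGHT) is the kind of bound that is easy to get wrong by one.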

Hmmm, I see I have an off-by-one error in my loop code and went 4 lines too far. Sorry about that.

ersmith · 2018-04-17 15:52

mpark wrote: »

Amazing work, @ersmith! Kudos!

Thanks! Hope you find it useful, and let me know if you run into any issues.

Mickster · 2018-04-18 17:38

Wow such a feeble response to what is clearly a significant development!

I have only ever used PropBasic and hope to heck that we have a P2 version.

But this is too good not to try...what exactly do I need besides the Prop?

avsa242 · 2018-04-18 23:18

I can attest to its performance - I tried it out on a project I've been working on lately that has a big mix of SPIN and PASM. Using fastspin, it absolutely screws compared to using any of the other traditional compiling methods...definitely well done!

Cheers,
Jesse

Dave Hein · 2018-04-19 00:19

Great work, Eric! This will allow people to write Spin projects completely in Spin -- even down to the low-level peripheral drivers. PASM is required when every single cycle needs to be accounted for, but there are many cases where tight timing is not required, and Spin compiled to PASM will work just fine. There's a lot of value in writing video, serial, I2C and SPI drivers in Spin. It is easier for the general population of programmers to understand and modify.

I think a lot of people will use this new feature of fastspin. However, a lot of people aren't comfortable with using command-line programs. For general acceptance, an IDE with a GUI is needed. Maybe the PropellerIDE can be adapted to run fastspin, and use all its features.

jmg · 2018-04-19 00:28

Dave Hein wrote: »

... However, a lot of people aren't comfortable with using command-line programs. For general acceptance, an IDE with a GUI is needed. Maybe the PropellerIDE can be adapted to run fastspin, and use all its features.

Any IDE worthy of the name, should be able to launch external batch files. Even general editors can do that.

ersmith · 2018-04-19 02:29

Mickster wrote: »

I have only ever used PropBasic and hope to heck that we have a P2 version.

But this is too good not to try...what exactly do I need besides the Prop?

All you need for these demos is some way to run Spin programs. How are you using PropBasic? Do you use PropellerIDE? If so you should be fine. The only place you need the command line is to convert the .spin object you want to run in a COG into .cog.spin.

The .spin that "fastspin -w" generates does sometimes use @@@ (if you're using strings, for example) so bstc is probably the best compiler to use for it. (Edit: the latest version of fastspin at the top of this thread should no longer use @@@, so all Spin compilers should work with it.) Or, you can use fastspin itself if the program isn't too large.

"fastspin -w" produces a wrapped Spin object ready to run in another COG and to be used from a main .spin. (The -w stands for wrapped.) It's intended as a helper for creating device drivers.

"fastspin" without the -w compiles Spin to LMM binary, so its output is generally pretty large. But if you can live with that restriction, it can replace openspin or bstc.

Eric

ersmith · 2018-04-19 02:34

Dave Hein wrote: »

I think a lot of people will use this new feature of fastspin. However, a lot of people aren't comfortable with using command-line programs. For general acceptance, an IDE with a GUI is needed. Maybe the PropellerIDE can be adapted to run fastspin, and use all its features.

Thanks, Dave. Yes, it would be nice to have it integrated into a GUI. I've actually thought that the PASM conversion could perhaps be integrated into the Spin language. So for example if you say:

OBJ
  A: "Fibo.spin"
COGOBJ
  B: "Fibo.spin"

you could get both the bytecode version in A and the PASM version in B. Of course you'd have to explicitly do a cognew on B somewhere. We'd have to change openspin to understand COGOBJ (or have some other way to mark an object as to be run in another COG) and to call out to fastspin to do the conversion. Or, we could perhaps integrate fastspin as a separate module in openspin.

Does this make sense? I'm not sure I've explained it well.

jmg · 2018-04-19 03:41

ersmith wrote: »

jmg wrote: »

The included files do not seem to exactly match the file names ? That makes it a little bit harder for someone to follow what files are being included where...

Sorry, I don't quite follow this. I left off the .spin extensions on the OBJ includes; is that what you mean? Or did I introduce some typo in the text somewhere?

Yes, that makes the connection between the actual filenames and where they are used more cryptic.

I presume this is also supported, ( or could be) ?

OBJ
  ser: "FullDuplexSerial.spin"
  bytecode: "Fibo.spin"
  pasm: "Fibo.cog.spin"

That's now more like usual includes, where the full file name is visible.

Where openspin is used like a mini-linker, can it easily check the file timestamps, e.g. so that if Fibo.spin (source) is newer than fastspin's (compiled) output, it issues a warning line?
I'd expect most users to 'build all', but because this will work without that, a simple protection check could be worth having.
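A check like that is also easy to do outside the compiler. The Python sketch below is one hypothetical way to do it as a pre-build step; neither openspin nor fastspin is claimed to work this way, and the function names are made up.

```python
import os

def is_stale(source, generated):
    """True if `generated` is missing or older than `source`."""
    if not os.path.exists(generated):
        return True
    return os.path.getmtime(source) > os.path.getmtime(generated)

def check(source):
    """Return a warning if e.g. Fibo.cog.spin is out of date relative to Fibo.spin."""
    base, _ext = os.path.splitext(source)
    generated = base + ".cog.spin"
    if is_stale(source, generated):
        return "warning: %s is missing or older than %s; rerun fastspin -w %s" % (
            generated, source, source)
    return None

if __name__ == "__main__":
    if os.path.exists("Fibo.spin"):
        message = check("Fibo.spin")
        if message:
            print(message)
```

A Makefile rule that lists the .spin file as a dependency of the .cog.spin file gives the same protection for command-line builds.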

ersmith · 2018-04-19 17:02

jmg wrote: »
ersmith wrote: »

Sorry, I don't quite follow this. I left off the .spin extensions on the OBJ includes; is that what you mean? Or did I introduce some typo in the text somewhere?

Yes, that makes the connection between the actual filenames and where they are used more cryptic.

I presume this is also supported, ( or could be) ?
OBJ
  ser: "FullDuplexSerial.spin"
  bytecode: "Fibo.spin"
  pasm: "Fibo.cog.spin"
That's now more like usual includes, where the full file name is visible.

I think all the Spin compilers (PropTool, openspin, bstc, fastspin, etc.) support both forms (with and without the .spin extension). So feel free to use whichever version you prefer. I left the .spin off because I found it highlighted the difference between regular objects and the "cog" objects that need to have their __cognew method called before use.

Where openspin is used like a mini-linker, can it easily check the file timestamps, e.g. so that if Fibo.spin (source) is newer than fastspin's (compiled) output, it issues a warning line?

This would be nice. Even nicer would be to have fastspin integrated into openspin so if it sees

OBJ
   pasm: COGNEW "Fibo.spin"

it would know to go and compile Fibo.spin to Fibo.cog.spin and then use that.

ersmith · 2018-04-19 17:38

I've updated the .zip file in the first post to have a new version of fastspin.exe that automatically adds the comments suggested by @jmg. It also has a fix so that instead of emitting @@@ for absolute addresses it uses @ and then fixes them up at runtime (in the first __cognew call). This should allow the output to work more often with PropTool and openspin.

For @mickster's benefit I've bundled David Betz's very handy propeller-load.exe program, so that everything you need to create programs is in the .zip file. For example, for the Fibonacci demo you would do:

fastspin -q -w Fibo.spin
fastspin -q -w SimpleSerial.spin
fastspin fibodemo.spin
propeller-load fibodemo.binary -r -t

The first two lines convert Fibo.spin and SimpleSerial.spin to their .cog equivalents (producing PASM objects that can run in another cog). The last line produces the demo. You could (and normally would) use "bstc -b" or "openspin" on the last line instead of "fastspin", but either one is fine for a demo this small.

ersmith · 2018-04-19 18:16

jmg wrote: »

Can you add the source, to the generated PASM as comments, and add a header that says ThisFile Autogenerated by.. PgmName VersionNumber date [param list]

That's done now.

How does this compare with PropBASIC ?
eg Could you port this, and compare ? https://forums.parallax.com/discussion/123170/propbasic-reciprocal-frequency-counter-0-5hz-to-40mhz-40mhz-now

Attached is a pretty much direct port. I don't have a good frequency source handy, so I had the original Spin COG wiggle the pin that the frequency counter COG is monitoring.

The PASM code generated by PropBASIC (see Freq Counter3.spin in the thread you referenced) and fastspin is pretty similar. I did notice a few things while doing the port. PropBASIC knows to convert 8*x to x<<3, but it doesn't change 10*x into (x<<1) + (x<<3), which fastspin does. (Bean did that optimization by hand in his source code.) OTOH PropBASIC has an easy way to get the remainder after division, whereas fastspin doesn't. PropBASIC has nice string handling and other functions built in. fastspin, on the other hand, allows arbitrary expressions in assignments and doesn't need you to say RDLONG and WRLONG to access HUB variables. So it's kind of a toss-up which one is "better" for COG code. Everyone will probably have his or her own preference.
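To make the 10*x example concrete, here is a small Python sketch of the arithmetic identity behind that rewrite: a multiplication by a non-negative constant becomes one shifted copy of x per set bit of the constant. The function names are invented for illustration, and this says nothing about how fastspin actually decides when the rewrite is worthwhile.

```python
def shift_amounts(multiplier):
    """Return the bit positions that are set in a non-negative constant."""
    shifts = []
    bit = 0
    while multiplier:
        if multiplier & 1:
            shifts.append(bit)
        multiplier >>= 1
        bit += 1
    return shifts


def multiply_by_shifts(x, multiplier):
    """Compute x * multiplier using only shifts and adds."""
    return sum(x << s for s in shift_amounts(multiplier))
```

For a multiplier of 8 this gives the single shift x<<3, and for 10 it gives (x<<1) + (x<<3), matching the two cases mentioned above.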

For LMM code I don't think anything but PropGCC can beat fastspin, because AFAIK PropGCC is the only other compiler to implement FCACHE. With FCACHE small loops run at full COG speed rather than needing the LMM interpreter, which makes for a 5x difference in performance.

jmg · 2018-04-19 20:43

ersmith wrote: »

jmg wrote: »

Can you add the source, to the generated PASM as comments, and add a header that says ThisFile Autogenerated by.. PgmName VersionNumber date [param list]

That's done now.

That looks great. Now anyone can download a .cog.spin file and see what generated it, when, and with what command line, and see the source used too.

One artifact I noticed: the source line is usually before the code line, but sometimes it comes after, and the label was not placed at the repeat.
Not sure if that is deliberate, or hard to fix?

one example

'       digitSum := 0                 ' Sum of all digits so far. Used to create replace leading zeros with spaces
	mov	_Freqcount_Digitsum, #0
	shl	_Freqcount_Sigcnt, #3
'       sigCnt := sigCnt * 8	    ' Scale signal count to get megaHertz digit

..and..
L__0028
	mov	muldiva_, _Freqcount_Sigcnt
	mov	muldivb_, _Freqcount_Cnttime
	call	#divide_
'       repeat
'         digit := sigCnt / cntTime	' calculate this digit of the result
	mov	_Freqcount_Digit, muldivb_

Attached is a pretty much direct port. I don't have a good frequency source handy, so I had the original Spin COG wiggle the pin that the frequency counter COG is monitoring.

The PASM code generated by PropBASIC (see Freq Counter3.spin in the thread you referenced) and fastspin is pretty similar. I did notice a few things while doing the port. PropBASIC knows to convert 8*x to x<<3, but it doesn't change 10*x into (x<<1) + (x<<3), which fastspin does. (Bean did that optimization by hand in his source code.) PropBASIC has nice string handling and other functions built in. fastspin, on the other hand, allows arbitrary expressions in assignments and doesn't need you to say RDLONG and WRLONG to access HUB variables. So it's kind of a toss-up which one is "better" for COG code. Everyone will probably have his or her own preference.

As long as there are no 'brick walls', this gives a very good, practical comparison.

ersmith wrote: »

OTOH PropBASIC has an easy way to get the remainder after division, whereas fastspin doesn't.

Is that easy to add ? - the second divide call will slow things down. Looks like the divide_ code already produces both results ?
eg if these two lines are right next to each other, it could decide to use one divide_ call ?
It could emit a comment explaining this, and maybe a pragma or switch would be needed to disable it, in case anyone really wants or needs two divide calls.

digit := sigCnt / cntTime ' calculate this digit of the result
sigCnt := sigCnt // cntTime 'remainder
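The reason a single call can serve both lines is that a shift-and-subtract divider produces the remainder as a by-product of computing the quotient. The Python sketch below shows a generic unsigned version of that algorithm; it is not a transcription of fastspin's divide_ routine, and the function name and the 32-bit default are assumptions for illustration.

```python
def divide_unsigned(dividend, divisor, bits=32):
    """Shift-and-subtract division returning (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

With both values coming back from one pass, `digit` and the new `sigCnt` in the two lines above would each take one of the results.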

Another suggestion

It looks like fastspin knows exactly how much code:data is used, can it report those values as comments ?
Some compilers include a % used or % free resource note too, which can be useful in packing code...

ie it could be something (roughly) like

Freqcount COG Footprint      -+--    CODE(L)               DATA 

lib:Init                      =      17                    0
lib:divide_                   =      18                    1
lib:TrivialSerial_Start       =      etc
lib:TrivialSerial_Tx          = 
lib:TrivialSerial_Str         = 

user:Freqcount                =      xxx                  yyy
user:Freqcount:String         =                           46 bytes
etc...
Total                         =      XXX         +        YYY   =  ZZZ  (75% of 496L)

Such mapping reports are very useful in figuring what base overhead is, and what user code is using, and how much room is left, in any MCU design !
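As a rough sketch of how such a report could be totalled, the Python below sums per-routine code and data sizes (in longs) and expresses the total as a percentage of the 496 longs mentioned above. The function name, the tuple layout, and any sizes fed to it are hypothetical; nothing here reflects an existing fastspin option.

```python
def footprint_report(entries, capacity_longs=496):
    """entries: list of (name, code_longs, data_longs). Returns (total, percent, lines)."""
    code_total = sum(code for _, code, _ in entries)
    data_total = sum(data for _, _, data in entries)
    total = code_total + data_total
    percent = 100.0 * total / capacity_longs
    lines = ["%-28s = %6d %6d" % (name, code, data) for name, code, data in entries]
    lines.append(
        "%-28s = %6d + %6d = %d (%.0f%% of %dL)"
        % ("Total", code_total, data_total, total, percent, capacity_longs)
    )
    return total, percent, lines
```

Printing the returned lines gives one row per routine followed by a total row, in the spirit of the layout above.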

I see these comments

CON
  outpin = 5  ' pin to toggle
  '' frequency to test at
  '' the regular Spin interpreter can't handle a very
  '' high frequency (20 kHz is about as high as it can go)
  '' fastspin LMM code can go as high as 1 MHz
  '' fastspin COG code can go to 10 MHz
  freq = 20_000
  pause = _clkfreq / (freq*2)

'' toggle 
PRI runtoggle | nextcnt
  DIRA[outpin] := 1
  nextcnt := cnt
  repeat
    nextcnt += pause
    waitcnt(nextcnt)
    OUTA[outpin] ^= 1

You could do a pin-toggle loop, without the wait, and use FreqCtr to give a precise loop-speed report of all 3 code generation choices.
That would also show P2's speeds, when it can run on P2.
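For reference, the `pause` constant in the listing is the half-period of the square wave in system clock ticks, and the same relation run backwards turns a measured toggle frequency into ticks per loop iteration. The Python sketch below works through that arithmetic; the 80 MHz clock is an assumption (the thread does not state `_clkfreq`), and the function names are made up.

```python
def half_period_ticks(clkfreq, freq):
    """Clock ticks between pin toggles, as in pause = _clkfreq / (freq*2)."""
    return clkfreq // (freq * 2)


def ticks_per_toggle(clkfreq, measured_freq):
    """Infer clock ticks per loop iteration from a measured output frequency."""
    return clkfreq / (2.0 * measured_freq)
```

With an assumed 80 MHz clock, `half_period_ticks(80_000_000, 20_000)` is 2000 ticks; a free-running toggle loop measured by the frequency counter could be converted to ticks per iteration with `ticks_per_toggle`.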

dgately · 2018-04-19 20:56

ersmith wrote: »

For @mickster's benefit I've bundled David Betz's very handy propeller-load.exe program, so that everything you need to create programs is in the .zip file.

Could you please add David's propeller-load source as a sub-module within the spin2cpp github sources? Unfortunately, just placing linux & WIN version of the binaries within the github project source is not conducive to macOS users and non-x86 linux OS users (such as: Raspbian, other ARM linux OS's).

Thanks,
dgately

ersmith · 2018-04-20 01:50

jmg wrote: »

One artifact I noticed: the source line is usually before the code line, but sometimes it comes after, and the label was not placed at the repeat.
Not sure if that is deliberate, or hard to fix?

It's not deliberate, and it is hard to fix. The problem is that fastspin has multiple passes, all of which can transform code and move it around. With a single pass compiler like PropBASIC it's pretty clear what code goes with which source statements. Once code generation and parsing are separated, and especially when optimization gets involved, things get murkier.

For example, this code:

VAR
  long a[10]
  
' set contents of a to lesser of b and c
PUB setarray(b, c) | i
  repeat i from 0 to 9
    a[i] := b <# c

gets transformed into

_setarray
        maxs    arg1, arg2
        mov     _var_05, objptr
        mov     _var_03, #10
L__0003
        wrlong  arg1, _var_05
        add     _var_05, #4
        djnz    _var_03, #L__0003
_setarray_ret
        ret

As you can see, the "b <# c" got hoisted out of the loop, and the array index calculation got turned into a pointer dereference (with a new temporary _var_05 introduced as the pointer). The loop got turned around too, and the index counts down from 10 instead of up from 0 (so we can use djnz).
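The three changes just described (hoisting, pointer stepping, and counting down) can be modelled in Python to check that they preserve the result. This is only an illustrative model: the list stands in for the hub array, the pointer advances by one element where the PASM adds 4 bytes, and the function names are invented.

```python
def setarray_original(b, c):
    """Direct reading of the Spin source: a[i] := b <# c for i from 0 to 9."""
    a = [0] * 10
    for i in range(10):
        a[i] = min(b, c)  # <# is "limit maximum", i.e. the lesser value
    return a


def setarray_transformed(b, c):
    """Model of the generated code's structure."""
    memory = [0] * 10
    value = min(b, c)  # maxs arg1, arg2: computed once, outside the loop
    pointer = 0        # mov _var_05, objptr
    count = 10         # mov _var_03, #10
    while True:
        memory[pointer] = value  # wrlong arg1, _var_05
        pointer += 1             # add _var_05, #4 (one long)
        count -= 1               # djnz: decrement, loop while non-zero
        if count == 0:
            break
    return memory
```

Both functions fill all ten elements with the same value, but no single line of the second one corresponds cleanly to `a[i] := b <# c`.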

That's kind of an extreme example, but it gives an idea of the difficulty of keeping the source lines straight.

ersmith wrote: »

OTOH PropBASIC has an easy way to get the remainder after division, whereas fastspin doesn't.

Is that easy to add ? - the second divide call will slow things down. Looks like the divide_ code already produces both results ?

In principle it should be do-able. I'm trying to figure out how to get it done in practice.

It looks like fastspin knows exactly how much code:data is used, can it report those values as comments ?
Some compilers include a % used or % free resource note too, which can be useful in packing code...

Thanks, that's a great suggestion. I'll add it to my TODO list.

ersmith · 2018-04-20 01:53

dgately wrote: »

ersmith wrote: »

For @mickster's benefit I've bundled David Betz's very handy propeller-load.exe program, so that everything you need to create programs is in the .zip file.

Could you please add David's propeller-load source as a sub-module within the spin2cpp github sources?

Unfortunately, just placing linux & WIN version of the binaries within the github project source is not conducive to macOS users and non-x86 linux OS users (such as: Raspbian, other ARM linux OS's).

I was just referring to the .zip I posted at the beginning of this thread. I wasn't really planning to add propeller-load to the fastspin distribution, although if I do then it probably does make sense to build the submodule in the tree, as you suggested.

Thanks,
Eric

PS: So far I think the last fastspin release (3.6.6) has had all of 8 downloads, and of those I'd be surprised if any are Mac or Linux users.

David Betz · 2018-04-20 02:02

ersmith wrote: »

dgately wrote: »

ersmith wrote: »

For @mickster's benefit I've bundled David Betz's very handy propeller-load.exe program, so that everything you need to create programs is in the .zip file.

Could you please add David's propeller-load source as a sub-module within the spin2cpp github sources?

Unfortunately, just placing linux & WIN version of the binaries within the github project source is not conducive to macOS users and non-x86 linux OS users (such as: Raspbian, other ARM linux OS's).

I was just referring to the .zip I posted at the beginning of this thread. I wasn't really planning to add propeller-load to the fastspin distribution, although if I do then it probably does make sense to build the submodule in the tree, as you suggested.

Thanks,
Eric

PS: So far I think the last fastspin release (3.6.6) has had all of 8 downloads, and of those I'd be surprised if any are Mac or Linux users

You might want to use proploader instead of propeller-load. That's what Parallax has been using lately and it knows how to do wireless loading using the WX wi-fi module.

yeti · 2018-04-20 07:08

dgately wrote: »

Could you please add David's propeller-load source as a sub-module within the spin2cpp github sources? Unfortunately, just placing linux & WIN version of the binaries within the github project source is not conducive to macOS users and non-x86 linux OS users (such as: Raspbian, other ARM linux OS's).

I don't think this is a good idea. Adding a loader to every code generating tool's source tree just complicates everything and yields duplicate installs or worse.

And for the ones not building from source, a pointer to where a loader binary can be found is probably enough.

Documenting what exists and where to find it would be all that's needed in this context.

ersmith wrote: »

PS: So far I think the last fastspin release (3.6.6) has had all of 8 downloads, and of those I'd be surprised if any are Mac or Linux users

Hmmm...
Maybe lots of the Linux users are able to type "git pull && make" and prefer binaries to be built on their own digital soil?

In my digital neighbourhood installing binaries not built on or for the installed distribution is a no-no. So most of them prefer sources.

But no Linux downloads at all remains a surprise.

dgately · 2018-04-20 17:45

OK, I agree on the above posts (ersmith, dbetz & yeti)! I'll just use proploader or propeller-load that I build from their respective github sources.

The only issue might be that the linux binary that is currently embedded in the spin2cpp github sources could be mistaken for some other x86 binary (macOS, mostly).

Personally, I think binaries in source trees should be avoided, when possible.

Thanks,
dgately

yeti · 2018-04-20 18:25

dgately wrote: »

The only issue might be that the linux binary that is currently embedded in the spin2cpp github sources could be mistaken for some other x86 binary (macOS, mostly).

On a "pure 64bit" Debian9 it doesn't even run:

$ uname -a
Linux kumari 4.15.0-0.bpo.2-amd64 #1 SMP Debian 4.15.11-1~bpo9+1 (2018-04-07) x86_64 GNU/Linux
$ pwd
/opt/spin2cpp/src/spin2cpp/propeller-load
$ ls -l
total 296
-rwxr-xr-x 1 yeti yeti  98236 Mar 29  2016 propeller-load
-rwxr-xr-x 1 yeti yeti 202658 Mar 30  2016 propeller-load.exe
$ ./propeller-load
bash: ./propeller-load: No such file or directory
$ ldd propeller-load
        not a dynamic executable
$ file propeller-load
propeller-load: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=b39abe06c2e0910af478afbf39fdc10a49900b91, not stripped

Not everyone has 32bit binary compatibility installed on 64bit systems.

dgately wrote: »

Personally, I think binaries in source trees should be avoided, when possible.

Yes!

Let's find a way to document what can be found where; pointers (in)to this are then all that needs to be included in such a situation.

jmg · 2018-04-20 19:16

dgately wrote: »

Personally, I think binaries in source trees should be avoided, when possible.

If you mean literally in a source directory, then I'd agree.

However, compiled binaries do need to be somewhere easy to find, (eg in a /bin area) as users do NOT want to have to wade thru all the build-themselves dance-steps.
If I find someone's project where they did not provide binaries, usually I just move on, as I just don't have the time or patience to reinvent build quirks....

The best projects include an example directory, where you get a full copy of input and output files, that means you can see what to expect, and easily check if that is what you need.

yeti · 2018-04-20 19:20

jmg wrote: »

However, compiled binaries do need to be somewhere easy to find, (eg in a /bin area) as users do NOT want to have to wade thru all the build-themselves dance-steps.

So I am not a user.
What am I then?

...but I'm not against downloadable binaries in archives side by side to sources...

jmg · 2018-04-20 19:30

yeti wrote: »

jmg wrote: »

However, compiled binaries do need to be somewhere easy to find, (eg in a /bin area) as users do NOT want to have to wade thru all the build-themselves dance-steps.

So I am not a user.
What am I then?

Of course, with sources there, everyone has the obvious and implicit ability to build, if that is 'their thing' - 100% their choice.

The point is most users want something 'that just works', with minimal effort, and minimal total downloads.
Download of source and all the correct matching compiler resource(s) is non trivial, which is why binaries make sense.

Heater. · 2018-04-20 19:31

Binaries should never be in any part of a source code repository. It's a "source code" repository after all.

But yes, it's a good idea if that source code repo contains links to pre-compiled binaries, for whatever platforms where available.

Those binaries could be on the same site as the source (GitHub, Bitbucket, etc.) or on some other server.

https://help.github.com/articles/creating-releases/

yeti · 2018-04-20 19:35

Having sources and binaries packed as different downloads sure is ok.

And I hope we'll find a way to document what's available where to avoid situations like distributing a loader with every code generating tool. Explaining where to download the loader(s) should be enough even for the average user.

I really think it all boils down to better documentation of all the tools and providing at least an up-to-date collection of pointers to the sources and downloads.
