
Code size: Spin Vs. Assembly ?

Rayman Posts: 14,162
edited 2007-12-01 11:56 in Propeller 1
Ok, so assembly is 80 to 200 times faster than Spin (so says deSilva). But what about the relative size of the code to do the same things?

It appears to me that SPIN compiles to very small code, perhaps comparable in size to the equivalent assembly code. But I haven't looked at this in any detail.

Anybody looked at this?

Comments

  • Martin Hebel Posts: 1,239
    edited 2007-11-30 14:01
    Spin doesn't 'compile' at all. Tokens representing the Spin program are stored in main RAM. The Cog's interpreter fetches the next token, and processes it using 80 to 200 ASM instructions to perform the specified operation.

    This is one reason why interpreted languages can be argued to be better than compiled ones. It takes very little room to store the tokenized version of the program.

    -Martin

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    SelmaWare Solutions - StampPlot GUI for controllers, XBee and Propeller Application Boards

    Southern Illinois University Carbondale, Electronic Systems Technologies
  • hippy Posts: 1,981
    edited 2007-11-30 14:42
    I'd say Spin is compiled, at least in the sense that it turns source code into instruction-set opcodes for the Spin virtual machine. In the same way, we would still call a compiler a compiler even if the instruction set of a processor is actually interpreted by microcode. Conversely, of course, we could argue that there's no such thing as compilation for a microcoded processor either :)

    I wouldn't really call the Spin bytecode "tokens" either, not in the same way that the BASIC Stamp and other home computers of the '80s tokenised programs.

    Anyway, back to the issue; Spin size versus Assembly size ... I don't think there's any easy comparison to be had in general.

    Spin bytecode is quite efficiently optimised, so common operations like pushing zero or one to the stack take a single byte, and popping it into some global or local (stack) variable is also a byte; that's a 2:1 improvement, and more so when moving a number over 511, which requires two longs in Assembler but can be as short as three bytes in Spin. Then again it depends on the value of the number and where it's going; the advantage moves between the two (see the sketch at the end of this post).

    There are some specialised opcodes which make Case and Repeat very space efficient, but then one would probably use different techniques in Assembler than in Spin. The compiler doesn't optimise, so an unnoticed 'poor choice' of coding style can have an impact on both space and execution time.

    Trying to compare at the opcode level is near impossible (they are very different architectures); only in terms of particular tasks could a more realistic comparison be made, IMO.

    Spin appears to be more compact than Assembler, but I'm not sure if that's simply because it has 32KB instead of 2KB to reside in. My gut feeling is that they both achieve the same code density overall.
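
    As a rough sketch of the large-literal point (the names and byte counts here are an illustration, not measured from an actual compiler listing), compare the two forms:

    ' Spin: a literal over 511 still compiles to just a few bytes of bytecode
    PUB setMask | mask
      mask := 1000                  ' push literal, pop into the local - a handful of bytes

    ' Assembly: 1000 won't fit in a 9-bit #immediate, so it costs a mov
    ' plus a long to hold the constant - two longs (8 bytes) of cog RAM
    DAT
            org     0
    entry   mov     maskReg, bigVal
    here    jmp     #here               ' park the cog; this is only a size sketch
    bigVal  long    1000
    maskReg res     1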
  • Ken Peterson Posts: 806
    edited 2007-11-30 15:25
    Has anyone ever "compiled" a set of guidelines for manually optimizing SPIN?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    The more I know, the more I know I don't know. Is this what they call Wisdom?
  • Rayman Posts: 14,162
    edited 2007-11-30 15:43
    I would like to see an optimization guide too! I had to find fast enough code by trial and error for my wav player...
  • hippy Posts: 1,981
    edited 2007-12-01 04:54
    The problem there is determining what is optimal. The smallest code size or the shortest, simplest-looking instruction sequence may not necessarily mean the fastest execution. My theory has been that shorter bytecode sequences are likely faster, because the work is mainly being done in-cog rather than spread over a longer sequence of bytecodes.

    Things like var++ and var~ look to be more efficient and faster than var+=1 and var:=0, because the latter push then pop while the former work with reference to a pointer, so there should be a lot less interaction with hub memory going on. Accessing byteArray[ i ] may be faster than using 'pointers' in Spin ( byte[ @byteArray + i ] ) because there are specific opcodes to do indexing from an offset. (A combined sketch of these idioms follows at the end of this post.)

    The compiler really is non-optimising, so one instant saving is to get used to using 'constant'.

    var := var + "A" - "0"

    will do three pushes, an add, a subtract and a pop.

    var += constant( "A" - "0" )

    is a lot more efficient.

    Likewise if using CON values to control flow ...

    if PIN_NUMBER => 28

    if constant( PIN_NUMBER => 28 )
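
    Pulling those together, here is a minimal sketch of the idioms above in one place (the CON/VAR names and the method are invented purely for illustration; the relative costs are as argued above, not measured):

    CON
      PIN_NUMBER = 30

    VAR
      long count
      byte byteArray[32]

    PUB demo(i) | c
      count++                          ' in-place increment: no push-then-pop round trip
      count~                           ' in-place clear, likewise
      c := byteArray[i]                ' indexed access rather than byte[ @byteArray + i ]
      c += constant("A" - "0")         ' folded to a single literal at compile time
      if constant(PIN_NUMBER => 28)    ' also folded at compile time
        count += c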
  • Harrison. Posts: 484
    edited 2007-12-01 06:36
    Thanks for the reminder on using the constant() compiler pseudo-function. I saved about 37 longs in my TCP stack just by doing that. I also saved a few longs by removing unneeded bitmasks in long-to-byte conversions.
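
    A minimal sketch of the kind of redundant mask being described (the names are invented; this assumes the point is that storing into a byte-sized variable truncates to the low 8 bits anyway):

    VAR
      long rxLong
      byte checksum

    PUB store
      checksum := rxLong & $FF     ' the mask costs extra bytecode ...
      checksum := rxLong           ' ... but the byte-sized destination truncates anyway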
  • deSilva Posts: 2,967
    edited 2007-12-01 11:56
    I posted this as an example half a year ago:
    ----
    deSilva said...

    It is not widely known that SPIN allows fully recursive calls! This can be emulated within an assembly program by installing an ad-hoc stack mechanism.
    It will be instructive anyhow to have a look at the many "patches" (self-modified instructions). Note that a "patch" must never span a JMPRET (aka CALL), as the code has to stay re-entrant!

    The speed-up is about 40x, which is not so overwhelming compared to the general speed-up from SPIN to hand-made assembly (about 80x according to my experience); this reveals a very efficient stack management within SPIN!

    This innocent looking piece of SPIN.....
    PUB spinFibo(n)
      if n>2
         return spinFibo(n-1)+ spinFibo(n-2)
      else
         return 1
    


    ... has thus created this "assembly-monster":
    DAT
    fiboasm
    ' PAR shall contain a reference to 2 longs
    '  [ 0 ] Argument for fibo (0: result ready)
    '  [ 1 ] Result
     
        mov a, #$1ff
        add a, cnt
        waitcnt a,#0    ' save energy while idling
        rdlong a, par
        tjz a,#fiboasm
      ' organize a stack
        mov stackP, #stack
        
        jmpret retaddr, #fibo   ' result = fibo(a)
    
       ' result available
        mov a, par
        add a, #4
        wrlong result, a
        mov a,#0
        wrlong a, par
        jmp #fiboasm
    
    fibo
    ' if a<3 return 1
        cmps a, #3   wc
        mov result, #1
        if_c jmp retaddr
    
        
        add stackP, #1   ' points to the LAST USED entry
        movd :f1, stackP
        add stackP, #1       
        movd :f2, stackP  
    :f1 mov 0-0, retaddr ' push return address 
    :f2 mov 0-0, a       ' push argument
    
        sub a, #1 
        jmpret retaddr, #fibo ' call fibo(a-1)
    
        movs :f3, stackP 
        movd :f4, stackP 
    :f3 mov a, 0-0      ' get argument
                        '... and substitute by result
    :f4 mov 0-0, result                    
    
        sub a, #2
        jmpret retaddr, #fibo  ' call fibo(a-2)         
    
    ' add both results
        movs :f5, stackP
        sub stackP, #1
    :f5 add result, 0-0  
        movs :f6, stackP   ' return to caller
        sub stackP, #1     ' adjust stack
    :f6 jmp 0-0    
    
    
    retaddr  res 1
    result res 1
    a  res 1
    
    ' The stack runs from lower to higher addresses; stackP always points to the last used entry!
    stackP res 0    ' a little bit over-optimized :)
    stack res 100     ' ... or as long as it will go
    





    ----
    If you are interested in the general timing without trying it yourself:
    fibo(29) needed:
    26 sec with SPIN
    1.8 sec with PHP on my mid-range Windows Notebook
    800 ms with the above posted piece of code
    30 ms with a very efficient FORTH Implementation on my mid-range Windows Notebook
    ----
    BTW: I am well aware that there are simple algorithms to compute the n-th Fibonacci number in O(1) - this is obviously not the point :)


    Space efficiency is a matter of what has been "anticipated" in the design of the SPIN interpreter.
    Recursive calls of procedures (or just ordinary calls of procedures in the first place) have been anticipated.
    Procedure variables (pointers to methods) have not...

    There was a nice example the other day: "How to compute the parity of a word?" In assembly this is just ONE machine instruction; the SPIN code is awkward (see the sketch at the end of this post)...

    In this FIBONACCI example we need:

    36 bytes for the SPIN byte code
    152 bytes for machine code


    This is most likely an extreme situation; however, note that you have to augment your assembly program with many goodies (multiplication, division, searching, ...) that are freely available in the SPIN COG... Such things add to its size...

    A most important consideration is how to work with algorithmically really complex programs... Their maximum size can be 8k machine instructions using a clever swapping technique; realistically it will be no more than 4k instructions...

    This will also be the bottleneck for future C programs, until they provide SD-card overlay swapping...
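
    As a sketch of the parity example mentioned above (an illustration, not code from that thread): presumably the single instruction is TEST with the WC effect, which leaves the parity of the ANDed result in C, so one instruction plus a mask constant does the whole job, while Spin has to fold the bits down by hand.

    DAT
            org     0
    parity  test    value, allbits  wc   ' C := parity of all 32 bits of value
            muxc    result, #1           ' copy C into bit 0 of result
    here    jmp     #here                ' park the cog; this is only a sketch
    allbits long    $FFFFFFFF
    value   long    $DEADBEEF
    result  long    0

    PUB parityOf(x)
      x ^= x >> 16                       ' fold the upper half onto the lower half ...
      x ^= x >> 8
      x ^= x >> 4
      x ^= x >> 2
      x ^= x >> 1
      return x & 1                       ' ... until bit 0 holds the parity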