beginner : about Spin interpreter — Parallax Forums

beginner : about Spin interpreter

henrib75 Posts: 9
edited 2012-05-22 13:40 in Propeller 1
 
Hello,

I'm just starting with the Parallax Propeller and I have a lot of questions about the Spin interpreter. I have programmed microcontrollers (Intel) and know some languages (C, x86 ASM, Basic, ActionScript, etc.). I understand that the interpreter sits in Cog RAM and reads (and interprets) instructions from Main RAM. But since Main RAM depends on the Hub, can a Cog read it only when it is its turn? Does the interpreter spend more time waiting than executing instructions?
Or have I completely misunderstood how it works? If I am right, isn't this the biggest slowdown for the interpreter? Isn't there a "buffer" (in Cog RAM) that avoids reading a single instruction at a time, particularly in loops?
 
THANK YOU !
 
Henri Blum
site : www.opium-bleu.com
blog : www.HenriBlum.com
 
ps: I'm French, so excuse my English ...

Comments

  • Mark_T Posts: 1,981
    edited 2012-05-22 08:44
    Since the timing is completely deterministic on the Prop, you can endeavour to prevent unnecessary waiting: a hub RAM access takes 8 to 23 cycles. This means that if you get 4n+2 cog instructions between two hub instructions, any wait for the first one will synchronize things perfectly for the next one. Cog instructions are basically all 4 clocks, so there are 4 cog instructions per hub round-robin time (16 cycles), and 2 cog instructions + 1 correctly synchronized hub instruction is 16 cycles... By cunning and re-ordering instructions you try to get this to work for you whenever possible.

    The maximum rate of transfer to/from hub RAM at 80 MHz is thus one long every 200 ns, or 20 MB/second. The maximum rate of cog instruction execution is 20 MIPS, not too mismatched in my opinion.
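
The figures in this reply can be checked with a few lines of arithmetic. This is only a sketch in Python (not Propeller code); the constant names are made up for illustration, and the values are the ones quoted above: an 80 MHz clock, a 16-clock hub round-robin, 4 clocks per cog instruction, and 4 bytes per long.

```python
# Values quoted in the post above; the names are illustrative only.
CLOCK_HZ = 80_000_000        # system clock
HUB_ROTATION_CLOCKS = 16     # clocks between one cog's hub access slots
COG_INSTR_CLOCKS = 4         # clocks for a typical cog instruction
BYTES_PER_LONG = 4

# Best case: one long transferred per hub rotation.
ns_per_long = HUB_ROTATION_CLOCKS * 1_000_000_000 // CLOCK_HZ
hub_bytes_per_sec = (CLOCK_HZ // HUB_ROTATION_CLOCKS) * BYTES_PER_LONG

# Best case: one cog instruction every 4 clocks.
cog_instr_per_sec = CLOCK_HZ // COG_INSTR_CLOCKS

print(ns_per_long, "ns per long")
print(hub_bytes_per_sec // 1_000_000, "MB/s hub transfer")
print(cog_instr_per_sec // 1_000_000, "MIPS in the cog")
```

Both rates come out at 20 million per second, which is the "not too mismatched" point made above.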
  • Heater. Posts: 21,230
    edited 2012-05-22 09:03
    I doubt the interpreter spends more time waiting for hub than executing the byte codes.
    Don't forget the interpreter has to spend some time decoding and executing the byte code so there is plenty to do whilst waiting for the next hub access slot to come around.
    The Spin interpreter is published here somewhere, so you can have a look at it and see how it may be optimized.
    I suspect you will have a hard time improving on Chip's implementation though.
  • Mike Green Posts: 23,101
    edited 2012-05-22 09:11
    Also, the Spin interpreter is stack-based and the stack (and all variables) are also stored in the hub RAM, so there's a lot of data being accessed in the hub RAM, not just the instruction codes. As Mark_T mentioned, with some care (not difficult at all), these hub accesses can be optimized so there's essentially no waiting. There is one wasted clock cycle per hub access which occurs every 16 clock cycles once the cog becomes synchronized with the hub (7 cycles for the hub access, 2 x 4 for other instructions + 1 "wasted").
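
The synchronization effect described in these replies can be illustrated with a very simplified timing model. This is a hedged sketch in Python, not a cycle-accurate simulation of the chip: it assumes a cog's hub slot comes around every 16 clocks, that a hub instruction which hits its slot occupies 8 clocks (the minimum quoted by Mark_T, or equivalently Mike Green's 7 + 1 "wasted"), and that every other instruction takes 4 clocks. The function names are hypothetical.

```python
HUB_PERIOD = 16   # clocks between this cog's hub slots (assumed model)
HUB_MIN = 8       # clocks for a hub instruction that hits its slot
COG_INSTR = 4     # clocks for an ordinary cog instruction


def loop_period(cog_instructions, start_offset=0, iterations=8):
    """Steady-state clocks per iteration of a loop made of one hub
    instruction followed by `cog_instructions` ordinary instructions."""
    t = start_offset
    hub_starts = []
    for _ in range(iterations):
        t += (-t) % HUB_PERIOD          # stall until the next hub slot
        hub_starts.append(t)
        t += HUB_MIN + cog_instructions * COG_INSTR
    return hub_starts[-1] - hub_starts[-2]


def wasted_clocks(cog_instructions):
    """Clocks per iteration spent stalled waiting for the hub slot."""
    useful = HUB_MIN + cog_instructions * COG_INSTR
    return loop_period(cog_instructions) - useful


for n in (2, 3, 6):
    print(n, "cog instructions:", loop_period(n), "clocks per loop,",
          wasted_clocks(n), "stalled")
```

Under these assumptions, 2 or 6 cog instructions between hub accesses (the 4n+2 pattern) give loops of 16 and 32 clocks with no stall, while 3 cog instructions also cost 32 clocks, 12 of them spent waiting.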
  • Duane Degn Posts: 10,588
    edited 2012-05-22 11:53
    Heater. wrote: »
    I suspect you will have a hard time improving on Chip's implementation though.

    Didn't Cluso99 make some improvements?

    I thought I read (here on the forum) that he was able to increase the interpreter's speed a bit.
  • Mike Green Posts: 23,101
    edited 2012-05-22 12:08
    I believe Cluso99's version used the time-honored tradeoff of space vs. speed and made use of some space in hub memory to allow for both a speedup of the interpreter and the addition of some features. On the other hand, Chip's version fits completely in cog memory.
  • Cluso99 Posts: 18,069
    edited 2012-05-22 13:40
    My version of Chip's interpreter runs about 20% faster but requires a lookup table (vector table) of IIRC 1KB of hub ram to decode the instructions. The remainder of the code is still contained in the cog.

    The interpreter implements a great set of bytecodes, so for every bytecode a number of instructions are executed within the cog. IIRC it was ~50 instructions on average, although I got the simple maths bytecodes down to ~25. As Mike said, the stacks are in hub, so this also adds to hub delays. However, from my work with the interpreter, hub is not really the bottleneck. I managed to unravel Chip's code because I stole 1KB of hub to do the instruction decoding - this saved both speed and instructions. I also built a zero-footprint (in cog) debugger that can trace all the instructions (either pasm and/or spin bytecodes).

    Having said all of this, the interpreter is quite slow compared to pasm. The advantage is small program space. Therefore, a mix of spin (where speed is not an issue) and pasm (where speed is required) is an ideal mix, and makes for easy code writing.

    I looked at reading longs rather than bytes, but in fact it takes more instructions to work this out, and of course takes valuable pasm instruction space. Much better results can be achieved in speeding up the interpreter code.
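
The vector-table idea mentioned above can be sketched in general terms: instead of decoding each bytecode with a chain of tests, the opcode indexes a 256-entry table of handlers. The example below is a toy stack machine in Python, purely illustrative. The opcodes and handler names are hypothetical and are NOT Spin's real bytecodes, and this is not Cluso99's implementation. A table of 256 entries at 4 bytes each would account for the roughly 1 KB of hub RAM quoted.

```python
# Hypothetical opcodes for illustration only (not Spin's bytecode set).
OP_PUSH, OP_ADD, OP_MUL, OP_HALT = 0x01, 0x02, 0x03, 0xFF

TABLE_ENTRIES = 256                   # one vector per possible bytecode
TABLE_BYTES = TABLE_ENTRIES * 4       # 4-byte entries -> 1024 bytes


def run(code):
    """Interpret `code` (a bytes object) and return the final stack."""
    stack = []
    pc = 0

    def push():
        # The operand is the byte following the opcode.
        nonlocal pc
        stack.append(code[pc])
        pc += 1

    def add():
        b = stack.pop()
        a = stack.pop()
        stack.append(a + b)

    def mul():
        b = stack.pop()
        a = stack.pop()
        stack.append(a * b)

    def illegal():
        raise ValueError("illegal opcode")

    # Vector table: decoding is a single indexed lookup.
    table = [illegal] * TABLE_ENTRIES
    table[OP_PUSH] = push
    table[OP_ADD] = add
    table[OP_MUL] = mul

    while True:
        opcode = code[pc]
        pc += 1
        if opcode == OP_HALT:
            return stack
        table[opcode]()


program = bytes([OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT])
print(run(program))   # computes (2 + 3) * 4
```

The design choice is the space-versus-speed tradeoff discussed in the thread: the table costs memory outside the interpreter's own code, but every bytecode is decoded in the same short, fixed number of steps.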