Propeller internals... shared memory, spin vs. asm etc...
ght_dak
Posts: 15
I've been trawling through the documents and forums, but I guess I'm still not clear on some of the key elements of the inner workings of the Prop chip and how they may affect high-performance applications.
First, there is some discussion of how Spin code works on multiple cogs... some documentation mentions that the Spin interpreter is "copied" into the new cog's RAM during cognew (is this true?). Then, there is mention that the Spin code is read from "main memory" during execution. But if the cog needs to read from main memory to get its Spin code at runtime, isn't there a memory contention issue? Does each long of Spin code to be interpreted need to wait for its cog's access time window?
This would apply to variables as well. If one kicks off a new cog with Spin code, the variables are shared with the parent... does this mean that each read and write of such a variable is also coordinated by the cog access control mechanism?
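For concreteness, the situation being asked about looks roughly like this in Spin (a minimal sketch; the names Worker and shared and the 32-long stack size are made up for illustration, not taken from the documentation):

  VAR
    long  shared                  ' lives in hub (main) RAM, visible to every cog running this object's code
    long  stack[32]               ' workspace for the Spin interpreter in the new cog

  PUB Start
    cognew(Worker, @stack)        ' starts the built-in Spin interpreter in a free cog and points it at Worker
    repeat
      if shared > 1000            ' the parent reads the same hub location the child writes
        shared := 0

  PRI Worker
    repeat
      shared++                    ' every read or write of "shared" goes out to hub RAM

Whether Start polls shared or does something else, both cogs are reading and writing the same long in hub RAM; neither has a private copy.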
Which then begs the question: where does the speed improvement of ASM code come from? Obviously, ASM doesn't need to be interpreted, so it's going to be a lot faster... but there also seems to be a substantial speed improvement because, most of the time, the cog is accessing cog-local memory rather than the 32K of main memory.
Is it simply true that, since Spin code is so much slower, memory access for code and data is a relatively small issue?
And, finally, what is the conventional wisdom on just how much faster ASM is than Spin?
Comments
Spin needs to read code and variables from main (hub) memory, plus it needs to interpret the instructions. Of course the advantage is you are not limited to the 512 longs of RAM that each cog contains.
Bean.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
“The United States is a nation of laws - poorly written and randomly enforced.” - Frank Zappa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
www.hittconsulting.com
I imagine that most of the time this wouldn't be a huge problem since we're waiting for clock countdowns to do our next thing, but it still is something to be considered for some applications.
2) Shared memory can be accessed "simultaneously". The hub is designed to supply each cog with a "turn" to read or write the shared memory. If a cog's "turn" hasn't come up yet, the cog is made to wait until its "turn", so there's no contention for the shared memory. Code can be written so that, once a shared memory location is accessed, the cog stays in sync with the hub and this delay is minimized (a short assembly sketch of this appears after point 4 below).
3) The ratio between native instruction speed and Spin operation speed is probably around 80:1. It depends on the specific operations involved, but that ratio is a good starting point. Multiplication and division are done by subroutine, so that improves the ratio. Other operations in Spin (like subscripting) may take several native instructions to implement (beyond the interpreter overhead), so that also improves the average ratio (some rough numbers are worked out below).
4) I suspect that the Spin interpreter is very tightly optimized in terms of overhead (like the synchronization between cog and hub). In any event, because it would be very hard to handle determinism using the execution times of byte codes, this is done in Spin mostly by using the various WAITxxx instructions (the sketch below shows the usual waitcnt idiom). Tighter control of timing generally has to be done using assembly language.
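To illustrate point 2, here is a minimal PASM sketch of the hub "turn" mechanism (the labels entry, value, and hub_ptr are invented for the example; hub_ptr would be filled in with a hub address by the launching Spin code before cognew):

  DAT
          org     0
  entry   rdlong  value, hub_ptr     ' stalls until this cog's hub slot comes around, then reads the long
          add     value, #1          ' cog-local instruction: 4 clocks, the hub is not involved
          nop                        ' a second 4-clock instruction keeps the cog lined up with its next slot
          wrlong  value, hub_ptr     ' arrives just as the window reopens, so little or no extra stall
          jmp     #entry             ' repeat forever

  hub_ptr long    0                  ' hub address of the shared long, set from Spin before launching this cog
  value   long    0

The "stays in sync" idea is visible here: hub instructions take 8 to 23 clocks depending on where the window is, and spacing them 16 clocks apart (start to start) keeps each one near the 8-clock minimum.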
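To put rough numbers on point 3: at the usual 80 MHz clock, a native cog instruction takes 4 clocks, i.e. 50 ns, so an 80:1 ratio works out to roughly 320 clocks, or about 4 microseconds, per typical Spin operation. (Those figures are just the ratio above applied to an 80 MHz clock, not measurements.)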
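And for point 4, the usual WAITxxx idiom in Spin looks something like this (a sketch; pin and period are caller-supplied parameters, and period must be long enough to cover the Spin overhead of the loop or the waitcnt target will be missed):

  PUB Blink(pin, period) | t
    dira[pin]~~                   ' make the pin an output
    t := cnt                      ' capture the free-running system counter
    repeat
      waitcnt(t += period)        ' sleep until the next deadline, then push the deadline forward
      !outa[pin]                  ' toggle the pin; the period stays fixed even though Spin itself is slow

Because waitcnt targets absolute counter values, the timing error of the slow interpreter does not accumulate from one pass of the loop to the next.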
Post Edited (Mike Green) : 6/30/2007 1:45:28 AM GMT