Assembly Overlay Loader for Cog FAST (renamed & released)
Cluso99
Edited 14 June 2008 - Solved in post below
I am trying to find a way to shift a variable length of longs between (to and from) cog and hub memory while maintaining the "sweet spot". I have looked at the LMM threads.
This code misses the sweet spot by one instruction and so takes 32 cycles instead of 16...
'' loading overlay - going forward
overlay_load  movd    load,#0         'initialise cog ptr
              nop                     'delay for pipeline
load          rdlong  0-0,hptr        'read long from hub ram
              add     load,d_inc      'increment cog pointer
              add     hptr,#4         'increment hub pointer by 1 long
              djnz    hlen,#load      'repeat for entire buffer
              jmp     #0              'go execute the overlay

'' loading overlay - going backwards
overlay_load  movd    load,hlen       'initialise cog ptr
              nop                     'delay for pipeline
load          rdlong  0-0,hptr        'read long from hub ram
              sub     load,d_inc      'decrement buffer pointer
              sub     hptr,#4         'decrement hub memory pointer
              djnz    hlen,#load
              jmp     #0              'jump to overlay (address $000)

hptr          long    0-0             'hub ram overlay xx end address
hlen          long    0-0             'hub ram overlay xx length
d_inc         long    $0200           'decrement destination (source) by 1
The same applies when moving data the other way (using wrlong instead of rdlong).
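The 16-versus-32 figure can be checked with a little arithmetic. The sketch below is a Python model, not Propeller code. It assumes the commonly quoted Propeller 1 timings: a cog gets a hub access window once every 16 clocks, a rdlong or wrlong takes 8 clocks once it is aligned with the window, and every other instruction in the loop (including a taken djnz) takes 4 clocks. The name clocks_per_long and the constants are made up for illustration.

```python
HUB_WINDOW = 16  # assumed: clocks between hub access windows for one cog
HUB_MIN = 8      # assumed: clocks for rdlong/wrlong once aligned with the window
INSTR = 4        # assumed: clocks for any other instruction, incl. a taken djnz


def clocks_per_long(other_instructions):
    """Steady-state clocks per loop pass for one hub access plus N other instructions."""
    busy = HUB_MIN + INSTR * other_instructions
    # The next hub access has to wait for the next window, so the loop
    # period rounds up to a whole multiple of the window spacing.
    windows = -(-busy // HUB_WINDOW)  # integer ceiling division
    return windows * HUB_WINDOW


if __name__ == "__main__":
    for n in range(0, 8):
        print(n, "other instructions:", clocks_per_long(n), "clocks per long")
```

Under these assumptions the loops above (rdlong plus two adds or subs plus djnz, so three other instructions) come out at 32 clocks per long, while a loop with only two other instructions fits in 16. That is the one instruction that has to go.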
I can take into account whether the pipeline is already loaded when the instruction is fetched, if that helps.
So the challenge: can anyone find a way to replace the two adds (or subs) with a single instruction? It may utilise the djnz instruction.
FYI - I am trying to build an assembler overlay model (similar to computers in the 70's, when memory was precious).
Post Edited (Cluso99) : 6/14/2008 10:54:12 AM GMT
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
Thanks. I've been looking at this off and on for a few weeks.
I did get a Logic Analyser sampling at 12.5nS. Your Logic Analyser inspired me to give it a go! It uses 4 cogs, giving 1880 samples and rates of 12.5nS, 50nS, 100nS, 200nS, then incrementally by 50nS thereafter. It outputs in *.spin unicode format so you can see the waveforms in the Propeller Tool.
So once set up, you could leave the counters running and just reload the address at the beginning of each transfer. This way you only pay the setup cost when you start the cog. This obviously depends on how and when you do the transfer.
*) enforced by the hub access window
Post Edited (kuroneko) : 6/10/2008 5:04:58 AM GMT
I'm trying to decide how to have a table of overlays with their hub addresses and length.
So I've posted the code to get any comments. Also, I am not sure whether I want to call the overlay loader or jump to it.
That would be fantastic, thanks
To show how it works, it adds up 8 longs the start address of which is passed in PAR. If the sum is correct, it lights up one LED, if the timing is correct, another (Hydra gamepad LEDs). Timing obviously depends on the hub access window so there is a 0..15 cycle penalty depending on code location. The source also includes the missed-slot-by-one-instruction version for comparison. For real world usage just get rid of the timing code, only keep the counter setup, sync and workload. If anything is not clear, just ask.
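The 0..15 cycle penalty mentioned above can be illustrated with a small step-by-step model. This is a Python sketch, not the attached Propeller source, and it rests on assumed Propeller 1 timings: one hub window per cog every 16 clocks, 8 clocks for a hub read once aligned, 4 clocks for every other instruction. The function run_loop and its parameters are hypothetical, and the extra cost of the final fall-through djnz is ignored.

```python
def run_loop(start_phase, longs, other_instructions):
    """Total clocks to read `longs` longs, starting `start_phase` clocks after a hub window."""
    phase = start_phase % 16  # clocks elapsed since this cog's last hub window
    elapsed = 0
    for _ in range(longs):
        wait = (-phase) % 16                      # stall until the next window
        step = wait + 8 + 4 * other_instructions  # hub read plus the rest of the loop
        elapsed += step
        phase = (phase + step) % 16
    return elapsed


if __name__ == "__main__":
    # A loop with two other instructions stays locked to the window after the first read.
    for p in range(16):
        print("start phase", p, "->", run_loop(p, 8, 2), "clocks for 8 longs")
    # The missed-slot-by-one-instruction case, for comparison.
    print("three other instructions:", run_loop(0, 8, 3), "clocks for 8 longs")
```

In this model the 8-long sum costs 128 clocks plus a one-off alignment stall of 0 to 15 clocks that depends only on where the loop starts relative to the hub window.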
Post Edited (kuroneko) : 6/10/2008 7:26:04 AM GMT
Does anyone know how to code a long so that the value stored by the assembler will be the actual hub address of the assembly routine?
var long @myfunction
that would use the address in HUB if you have that long in a DAT section... if I got you correctly
kuroneko's use of CTR looks very interesting. Thanks.
-Phil
Update: Corrected some errors in the code.
Post Edited (Phil Pilgrim (PhiPi)) : 6/10/2008 9:59:42 PM GMT
Many thanks to kuroneko and Phil Pilgrim (PhiPi) for your code to hit the "sweet spot". For now I wanted to post a sample for comments, so this version does not hit the sweet spot yet and takes 32 cycles per long transferred. The code can also be modified to transfer blocks of data between cog and hub memory.
Kuroneko's option uses the two counters and a pin. Phil's option means the overlay cannot begin at $0 - I am not sure this is a problem anyway.
I was blown away by Kuroneko's use of the counters.
Hippy: Yes, it depends entirely on the application as to how many loops, etc. are actually used. I had a look at the LMM thread while doing this. I liked your concepts.
Here's how to call an overlay...
Here's how the overlay loader works...
Comments please... and enjoy...
Post Edited (Cluso99) : 6/11/2008 1:25:44 PM GMT
This code is a concept for fast overlaying of assembler routines within a COG. It achieves the "sweet spot", loading code at 16 clock cycles per instruction (long), plus an overhead of roughly 36 clocks per overlay (1 clock = 12.5nS @ 80MHz).
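To put those figures in context, here is a short Python sketch of the arithmetic. It takes the 16 clocks per long, the roughly 36 clock overhead and the 80MHz clock from the paragraph above. The function names and the 64-long overlay size are only illustrative.

```python
def overlay_load_clocks(longs, clocks_per_long=16, overhead_clocks=36):
    """Approximate clocks to load an overlay of `longs` longs."""
    return clocks_per_long * longs + overhead_clocks


def clocks_to_microseconds(clocks, clock_hz=80_000_000):
    """Convert a clock count to microseconds at the given system clock."""
    return clocks * 1_000_000 / clock_hz


if __name__ == "__main__":
    longs = 64  # hypothetical overlay size
    clocks = overlay_load_clocks(longs)
    print(longs, "longs:", clocks, "clocks,", clocks_to_microseconds(clocks), "us")
```

For a 64-long overlay this gives 1060 clocks, or 13.25 microseconds at 80MHz. The fixed overhead matters less as the overlay grows.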
The load section is posted below and the complete file OverlayLoader.spin is attached.
Enjoy