Fastest possible memory transfer
Hello,
I need to transfer a block of bytes from HUB to COG memory, with this code:
MOVD :l1, #buffer
MOV count, #64
:l1 RDLONG buffer, srcptr
ADD :l1, increment
ADD srcptr, #4
DJNZ count, #:l1
srcptr LONG $0
count LONG $0
increment LONG 000000000000_000000100_000000000
buffer RES 64
According to the documentation, if I'm not mistaken, it loses a hub slot on every iteration.
Is there a better, faster way to transfer a block of memory without losing slots?
Regards.
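A rough way to see why the loop above loses a slot is to model the hub timing. The Python sketch below is a simplified, hypothetical model, not cycle-exact. It assumes that this cog's hub window opens once every 16 system clocks, that an ordinary instruction takes 4 clocks, and that a hub instruction stalls until its window and then takes 8 clocks. The constant and function names are made up for illustration. For a loop with a given number of ordinary instructions between consecutive RDLONGs, it reports how many clocks pass between hub reads.

```python
from typing import List

# Simplified Propeller 1 hub-timing model. Assumptions (not cycle-exact):
#   - this cog's hub window opens every HUB_PERIOD clocks (t = 0, 16, 32, ...)
#   - an ordinary instruction takes REG_OP_CLOCKS clocks
#   - a hub instruction stalls until its window, then takes HUB_OP_CLOCKS clocks
HUB_PERIOD = 16
REG_OP_CLOCKS = 4
HUB_OP_CLOCKS = 8


def next_window(t: int) -> int:
    """First hub-window time at or after clock t (ceiling to a multiple of 16)."""
    return -(-t // HUB_PERIOD) * HUB_PERIOD


def hub_access_times(regular_between: int, longs: int) -> List[int]:
    """Clock at which each RDLONG gets the hub, for a loop made of one RDLONG
    followed by `regular_between` ordinary instructions."""
    t = 0
    times = []
    for _ in range(longs):
        w = next_window(t)                     # RDLONG waits for the hub window
        times.append(w)
        t = w + HUB_OP_CLOCKS                  # RDLONG completes
        t += regular_between * REG_OP_CLOCKS   # e.g. ADD, ADD, DJNZ
    return times


def clocks_per_long(regular_between: int, longs: int = 64) -> int:
    """Steady-state clocks between consecutive hub reads."""
    times = hub_access_times(regular_between, longs)
    return (times[-1] - times[0]) // (longs - 1)


if __name__ == "__main__":
    # Original loop: RDLONG, ADD, ADD, DJNZ -> 3 ordinary instructions per long
    print("RDLONG + 3 instructions:", clocks_per_long(3), "clocks per long")
    # Exactly 2 ordinary instructions between hub reads
    print("RDLONG + 2 instructions:", clocks_per_long(2), "clocks per long")
```

Under these assumptions, the original loop gets one long every 32 clocks. A loop that keeps no more than two ordinary instructions between hub reads gets one long every 16 clocks, which is the pattern the replies aim for.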

Comments
Do that rdlong, add, add sequence 4 times in each iteration and reduce the start count to 16. For example.
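        MOVD    :l1, #buffer
        MOVS    :l1, #srcaddr
        MOV     count, #64
:l1     RDLONG  buffer, #srcptr
        ADD     :l1, increment
        DJNZ    count, #:l1

increment LONG  %000000000000_000000001_000000100
count     RES   1

Note that your code had the wrong increment value: you add 1 to a cog address to get to the next long, not 4. Also, binary constants start with %. Because the hub address here is a 9-bit immediate in the source field, this version only works if the source buffer lies in the first 512 bytes of hub RAM.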
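To see what the increment constant does to the self-modified RDLONG, treat the instruction as a word whose destination field is bits 17..9 and whose source field is bits 8..0. The Python sketch below is a hypothetical model with made-up helper names and addresses. It replays `ADD :l1, increment` and compares the corrected constant with the one the original code presumably intended. It also checks the 512-byte limit on an immediate hub address.

```python
# Propeller 1 instruction word, low 18 bits:
#   bits 17..9 = destination field (cog register address)
#   bits  8..0 = source field (here: a 9-bit immediate hub address)
FIELD_MASK = 0x1FF
DEST_SHIFT = 9

# Corrected constant: +1 cog long in dest, +4 hub bytes in src
CORRECT_INCREMENT = 0b000000000000_000000001_000000100
# What the original constant was presumably meant to encode (had it used %)
ORIGINAL_INCREMENT = 0b000000000000_000000100_000000000


def fields(word: int):
    """Split an instruction word into its (dest, src) fields."""
    return (word >> DEST_SHIFT) & FIELD_MASK, word & FIELD_MASK


def walk(dest: int, src: int, count: int, increment: int):
    """Hypothetical helper: replay `ADD :l1, increment` `count` times and
    return the (cog dest, hub src) pair each RDLONG would use.
    Opcode bits are omitted; the ADD leaves them alone as long as
    neither field overflows."""
    word = (dest << DEST_SHIFT) | src
    pairs = []
    for _ in range(count):
        pairs.append(fields(word))
        word += increment
    return pairs


def fits_immediate(hub_start: int, longs: int) -> bool:
    """True if every hub address the loop uses fits the 9-bit source field."""
    return hub_start + 4 * (longs - 1) <= FIELD_MASK


if __name__ == "__main__":
    # Hypothetical addresses: cog buffer at $100, hub buffer at $40
    print(walk(dest=0x100, src=0x40, count=4, increment=CORRECT_INCREMENT))
    print(walk(dest=0x100, src=0x40, count=2, increment=ORIGINAL_INCREMENT))
    print(fits_immediate(0x40, 64), fits_immediate(0x100, 128))
```

With the intended original constant, the destination jumps four cog registers per pass and the hub address never moves. A 64-long buffer at hub $40 fits the immediate range, but a 128-long buffer at $100 does not.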
How do I place the buffer in the first 512 bytes ?
The whole program also uses Spin code and a (yet unknown) number of cogs.
Spin will place code and declared variables wherever it likes and gives you no control over that.
It might be that if you declare an array at the start of the first object, which itself contains little code, that array ends up within 512 bytes of the start of RAM. But that would only be an accident of the compiler implementation in use.
I believe you can reserve any amount of Low Hub space after address $10 by having your first object contain only a DAT section of the size you choose, plus of course the declaration of one next object.
Cheers,
Peter (pjv)
Still, this is not nice, as it depends on undocumented compiler behaviour, i.e. that the main object will always be first in memory and that no other junk is put in low memory.
But I guess if it works it works...
Thanks for your help.
Sorry, I meant to use the word "PUB" instead of "next object", as Heater pointed out. Although it is not documented, Chip Gracey has verbally confirmed that this technique works consistently.
Then, to get the fastest transfer rate you asked for, use the technique outlined by Mark_T.
I believe there are no other approaches that will be as fast.
Cheers,
Peter (pjv)
CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  BUF_SIZE = 128                                'Number of longs in buffer.

PUB start

  cognew(@pasm, @bufend - 4)                    'Start cog, pointing to last long in hub buffer.

DAT

              org       0
pasm          mov       hubbufend,par
              '...
              call      #loadbuf
              '...

loadbuf       movd      long0,#bufend-1         'Point to last long in cog buffer.
              movd      long1,#bufend-2         'Point to next-to-last long in cog buffer.
              mov       loop_back,djnz_instr    'Replace jmp loadbuf_ret instruction with a djnz.
              mov       ptr,hubbufend           'Point to last long in hub buffer.
              jmp       #long0                  'Start the transfer.

loop          sub       long0,dec_dest          'Decrement long0 dest addr by 2.
long1         rdlong    0-0,ptr                 'Read long at pointer address. (Pointer is address + 3.)
              sub       ptr,#7                  'Pointer + 3 - 7 == pointer - 4.
              sub       long1,dec_dest          'Decrement long1 dest addr by 2.
long0         rdlong    0-0,ptr                 'Read long at pointer address. (Pointer is actual address.)
loop_back     jmp       loadbuf_ret             'Is djnz ptr,#loop until overwritten by transfer.
buffer        long      0[BUF_SIZE]             'Serves as both hub and cog buffers.
bufend

djnz_instr    djnz      ptr,#loop               'Placed at loop_back to perform loop until overwritten by transfer.
dec_dest      long      2 << 9                  'Amount by which to decrement destination address in transfer instructions.
hubbufend     long      0-0                     'Hub address of the last long in buffer.
loadbuf_ret   ret                               'Return address gets placed here.
ptr           res       0                       'Pointer into hub.

The transfer loop loads the data in reverse order and hits the hub sweet spot every time (there are always exactly two ordinary instructions between rdlongs) without using a counter. The hub and cog buffers occupy the same hub memory and must be placed immediately after loop_back. To start, the jmp loadbuf_ret instruction is replaced by a djnz, which creates the loop. The loop terminates when the djnz is overwritten again by the jmp loadbuf_ret read back from the hub. (You read that right, no "#": the jump goes to the return address held in loadbuf_ret.) Actually, due to pipelining, the loop terminates one or two transfers later than that, depending on whether BUF_SIZE is even or odd. But the additional rdlongs are harmless, since they overwrite the code with the same instructions that are already there.
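The pointer arithmetic in this loop is easy to check off-chip. The Python sketch below is a hypothetical model, not Phil's code. It replays only the `ptr` and destination-field updates: `djnz` subtracts 1, `sub ptr,#7` subtracts 7, and each `sub longN,dec_dest` lowers that destination by 2. It assumes that RDLONG ignores the two low address bits, which the "(Pointer is address + 3.)" comment relies on. The self-overwriting exit is not modelled, and the addresses in the usage line are made up.

```python
from typing import List, Tuple


def reverse_walk(last_long: int, bufend: int, reads: int) -> List[Tuple[int, int]]:
    """Hypothetical model of the ptr/destination updates in the loop above.
    Returns (hub long address, cog destination) for each RDLONG, assuming
    RDLONG ignores the two low address bits. The self-overwriting exit
    is not modelled; `reads` just limits the output."""
    ptr = last_long          # mov ptr,hubbufend
    dest0 = bufend - 1       # movd long0,#bufend-1
    dest1 = bufend - 2       # movd long1,#bufend-2
    out = []
    at_long0 = True          # jmp #long0 enters the loop at long0
    while len(out) < reads:
        if at_long0:
            out.append((ptr & ~3, dest0))  # long0: rdlong 0-0,ptr
            ptr -= 1                       # loop_back: djnz ptr,#loop
            dest0 -= 2                     # loop: sub long0,dec_dest
        else:
            out.append((ptr & ~3, dest1))  # long1: rdlong 0-0,ptr
            ptr -= 7                       # sub ptr,#7
            dest1 -= 2                     # sub long1,dec_dest
        at_long0 = not at_long0
    return out


if __name__ == "__main__":
    # Hypothetical addresses: last hub long at $1000, bufend at cog address 200
    for hub_addr, cog_dest in reverse_walk(0x1000, 200, 6):
        print(f"hub ${hub_addr:04X} -> cog {cog_dest}")
```

The model shows the hub address dropping by 4 and the cog destination by 1 on every read. It also shows that the two different decrements (1 and 7) add up to one long per read, so no separate counter is needed.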
Unfortunately, the same technique cannot be employed for transferring data out of the cog to the hub.
-Phil
I have not studied your example in great detail, but I suspect you have hit another home run!
I see you also use that reverse DJNZ trick as a pointer... It gives me great pleasure every time I can use that; it's a dual function in a single instruction. It really helps keep the ripple-sorter I use down to a very compact routine.
Nice going!
Cheers,
Peter (pjv)
What worries me is that the hub access window cycles on its own and the code expects to stay synchronized with it. One day someone may forget that, or make a change that alters this synchronization; the program will then no longer work, and the problem will be very hard to find.
Jonathan
CON
  BUF_SIZE = 128                                ' number of longs in buffer (2n)

VAR
  long  hub[BUF_SIZE]

PUB start

  cognew(@pasm, @hub{0})

DAT             org     0

pasm            '...
                mov     ptr, par                ' hub location
                call    #loadbuf                ' transfer fixed size buffer
                '...
                waitpeq $, #0

loadbuf         movd    long0, #ptr -1          ' last long in cog buffer
                movd    long1, #ptr -2          ' second-to-last long in cog buffer
                add     ptr, #BUF_SIZE * 4 -1   ' last byte in hub buffer (8n + 7)
                movi    ptr, #BUF_SIZE - 2      ' add magic marker

long0           rdlong  0-0, ptr                ' |
                sub     long0, dst2             ' |
                sub     ptr, i2s7 wc            ' |
long1           rdlong  0-0, ptr                ' |
                sub     long1, dst2             ' |
        if_nc   djnz    ptr, #long0             ' sub #7/djnz (Thanks Phil!)

loadbuf_ret     ret

' initialised data and/or presets

dst2            long    2 << 9                  ' dst +/-= 2
i2s7            long    2 << 23 | 7

' uninitialised data and/or temporaries

buffer          res     BUF_SIZE
ptr             res     1                       ' buffer + BUF_SIZE

tail            fit

DATA ... not really needed but why waste an idea ...
-Phil
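This variant keeps a loop counter in the instruction field of `ptr`, which holds bits 31..23 and is the field MOVI writes. The same SUB therefore steps the hub address down by 7 and the counter down by 2, and its carry (borrow) ends the loop. The Python sketch below is a hypothetical model of only the `ptr` register under that reading. It assumes 32-bit unsigned arithmetic, a carry flag set on borrow, and an RDLONG that uses only the low 16 address bits, long-aligned. The function name and addresses are made up. It checks that an even BUF_SIZE reads each long exactly once, from last to first.

```python
from typing import List

MASK32 = 0xFFFFFFFF
INSTR_FIELD = 0x1FF << 23      # bits 31..23, the field written by MOVI
I2S7 = (2 << 23) | 7           # i2s7: counter -= 2 and address -= 7 in one SUB


def loadbuf_model(hub: int, buf_size: int) -> List[int]:
    """Hypothetical model of ptr in the loadbuf loop above. Returns the hub
    long addresses read by long0/long1 in order. Assumes RDLONG uses only the
    low 16 address bits, long-aligned."""
    if buf_size % 2 or not 2 <= buf_size <= 128:
        raise ValueError("BUF_SIZE must be even and BUF_SIZE*4-1 must fit in 9 bits")
    ptr = hub                                              # mov ptr, par
    ptr += buf_size * 4 - 1                                # add ptr, #BUF_SIZE*4-1
    ptr = (ptr & ~INSTR_FIELD) | ((buf_size - 2) << 23)    # movi ptr, #BUF_SIZE-2
    reads = []
    while True:
        reads.append(ptr & 0xFFFC)                 # long0: rdlong 0-0, ptr
        carry = ptr < I2S7                         # sub ptr, i2s7 wc: C = borrow
        ptr = (ptr - I2S7) & MASK32
        reads.append(ptr & 0xFFFC)                 # long1: rdlong 0-0, ptr
        if carry:                                  # if_nc djnz skipped, fall into ret
            break
        ptr = (ptr - 1) & MASK32                   # djnz ptr, #long0
        if ptr == 0:                               # djnz would fall through (never happens here)
            break
    return reads


if __name__ == "__main__":
    # Hypothetical hub buffer at $1000 holding 4 longs
    print([hex(a) for a in loadbuf_model(0x1000, 4)])
```

The borrow occurs only on the pass where the counter field is already 0. In the model, the second RDLONG of that pass still executes before `if_nc djnz` falls through, so the first long is read as well.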