Fastest possible memory transfer
macca
Posts: 826
Hello,
I need to transfer a block of bytes from HUB to COG memory, with this code:
        MOVD    :l1, #buffer
        MOV     count, 64
:l1     RDLONG  buffer, srcptr
        ADD     :l1, increment
        ADD     srcptr, #4
        DJNZ    count, #:l1

srcptr      LONG    $0
count       LONG    $0
increment   LONG    000000000000_000000100_000000000
buffer      RES     64
According to the documentation, if I'm not wrong, it loses a hub slot on every loop iteration.
Is there a better, faster way to transfer a block of memory without losing slots?
Regards.
Comments
Do that rdlong, add, add sequence 4 times in each iteration and reduce the start count to 16. For example.
Note that your code has the wrong increment value: you add 1 to a cog address to get to the next long, not 4. Also, binary constants start with %.
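A minimal sketch of that unrolled loop, written out here for illustration rather than quoted from the poster. It assumes the same srcptr, count and buffer names as the original code; the labels :l2 to :l4 and the constant d_inc4 are hypothetical. Each RDLONG owns one of four interleaved cog destinations, so its destination field advances by 4 cog longs per pass.

```
loadbuf       movd    :l1, #buffer            ' reset the four destination fields
              movd    :l2, #buffer+1
              movd    :l3, #buffer+2
              movd    :l4, #buffer+3
              mov     count, #16              ' 16 passes x 4 longs = 64 longs

:l1           rdlong  buffer, srcptr          ' hub long -> cog long
              add     :l1, d_inc4             ' this RDLONG's destination += 4 cog longs
              add     srcptr, #4              ' hub address += 4 bytes
:l2           rdlong  buffer+1, srcptr
              add     :l2, d_inc4
              add     srcptr, #4
:l3           rdlong  buffer+2, srcptr
              add     :l3, d_inc4
              add     srcptr, #4
:l4           rdlong  buffer+3, srcptr
              add     :l4, d_inc4
              add     srcptr, #4
              djnz    count, #:l1             ' third instruction after :l4, so :l1 waits one extra window

d_inc4        long    4 << 9                  ' hypothetical name: 4 in the destination field
srcptr        long    0                       ' set to the hub address before calling
count         long    0
buffer        res     64
```

By my count, three of the four RDLONGs are followed by exactly two 4-clock instructions and so should catch the very next hub window (16 clocks each), while the one after the DJNZ waits for the following window (32 clocks). That is roughly 80 clocks per 4 longs instead of 128 for the original loop.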
How do I place the buffer in the first 512 bytes ?
The whole program uses SPIN code also and a (yet unknown) number of COGs.
Spin will place code and declared variables wherever it likes and gives you no control over that.
It might be that if you declare an array at the start of the first object, which itself contains little code, then that array may end up within 512 bytes of the start of RAM. But that would only be by accident of the compiler implementation in use.
I believe you can reserve any amount of Low Hub space after address $10 by having your first object contain only a DAT section of the size you choose, plus of course the declaration of one next object.
Cheers,
Peter (pjv)
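A minimal sketch of such a top object, assuming the compiler places the DAT block right after the object's small header as described above. The names BUF_LONGS, lowbuf and start are hypothetical, and the resulting address is not guaranteed by any documentation, so it is worth checking the value of @lowbuf at run time.

```
' Top-level object: nothing but a DAT block and a single method.
CON
  BUF_LONGS = 64                              ' hypothetical size: 64 longs = 256 bytes

DAT
lowbuf        long    0[BUF_LONGS]            ' intended to land in low hub RAM, just above $10

PUB start
  ' @lowbuf yields the absolute hub address of the buffer at run time.
  ' Verify that @lowbuf + 4 * BUF_LONGS does not exceed 512 before relying on it.
  repeat
```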
Still, this is not nice, as it depends on undocumented compiler operation, i.e. that the main object will always be first in memory and that no other junk is put in low memory.
But I guess if it works it works...
Thanks for your help.
Sorry, I meant to have used the word "PUB" instead of "next object" as Heater pointed out. Although it is not documented, this technique works consistently according to verbal confirmation by Chip Gracey.
Then, to get the fastest transfer rate you asked for, use the technique outlined by Mark_T.
I believe there are no other approaches that will be as fast.
Cheers,
Peter (pjv)
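One reading of why the buffer has to sit in the first 512 bytes, offered as an assumption and not as a quote of Mark_T's code: RDLONG accepts a 9-bit immediate hub address, so with a low buffer both the cog destination and the hub address live inside the instruction and a single ADD can advance both. The names fastcopy, HUB_BUF and d1_s4 are hypothetical, and HUB_BUF must match wherever the buffer really ends up.

```
CON
  HUB_BUF = $18                               ' hypothetical hub byte address, long-aligned,
                                              ' with HUB_BUF + 256 <= 512

DAT
fastcopy      movs    :l1, #HUB_BUF           ' source field = hub byte address (immediate)
              movd    :l1, #buffer            ' destination field = first cog long
              mov     count, #64
:l1           rdlong  buffer, #0-0            ' both fields are patched above and below
              add     :l1, d1_s4              ' destination += 1 cog long, source += 4 bytes
              djnz    count, #:l1
fastcopy_ret  ret

d1_s4         long    (1 << 9) | 4            ' 1 in the destination field, 4 in the source field
count         long    0
buffer        res     64
```

With only two 4-clock instructions between consecutive RDLONGs, each read should catch the next hub window, which works out to about 16 clocks per long.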
The transfer loop loads the data in reverse order and hits the hub sweet spot every time without using a counter. The hub and cog buffers occupy the same hub memory and must be placed immediately after loop_back. To start, the jmp loadbuf_ret instruction is replaced by a djnz, which creates the loop. The loop terminates when the djnz gets overwritten again by the jmp loadbuf_ret from the hub. (You read that right, no "#".) Actually, due to pipelining, the loop will terminate one or two transfers later than that, depending upon whether BUF_SIZE is even or odd. But the additional rdlongs are harmless, since they overwrite the code with the same instructions that are already there.
Unfortunately, the same technique cannot be employed for transferring data out of the cog to the hub.
-Phil
I have not studied your example in great detail, but I suspect you have hit another home run!
I see you also use that reverse DJNZ trick as a pointer.... it gives me great pleasure every time I can use that; kind of a dual function with a single instruction. It really helps keep the ripple-sorter I use down to a very compact routine.
Nice going!
Cheers,
Peter (pjv)
What I'm worried about is that there is a register running on its own, and the code expects to stay synchronized with it. One day someone may forget that, or make a change that alters this synchronization. The program will then stop working, and the problem will be very hard to find.
Jonathan
-Phil