[POC] reverse overlay loader aka cog-to-hub transfer

OK, I didn't have a rainy day but at least an odd idea. Turns out it actually works and can be used to transfer cog memory to hub RAM without penalty. The following example transfers 64 longs.
Enjoy!
''
'' cog to hub transfer
''
'' Author: Marko Lukat
'' Last modified: 2011/02/18
'' Version: 0.1
''
'' acknowledgements
'' - code based on work done by Phil Pilgrim (PhiPi) and Ray Rodrick (Cluso99)
''
CON
_clkmode = XTAL1|PLL16X
_xinfreq = 5_000_000
VAR
long guard_before, storage[64], guard_after
PUB null | n
  cognew(@entry, @storage{0})
  waitcnt(clkfreq + cnt)
  dira[16..23] := outa[16..23] := -1 ' success (preset)
  if guard_before or guard_after
    outa[16..23] := $81 ' transfer failed
  else
    repeat n from 0 to 63
      if storage[n] <> $DEADBEEF
        outa[16..23] := $81 ' transfer failed
        quit
  waitpne(0, 0, 0)
DAT org 0
entry movd xfer_copy7, #data + 63 ' -4 move cog address into wrlong instruction
movd xfer_copy1, #data + 62 ' +0 move cog address into wrlong instruction
movi ctra, #%0_11111_000 ' +4 LOGIC always
'{optional} long 0[3] {3 x nop} ' +8 adjustment
mov frqa, par ' +4 frqa gets added twice between modifying
shr frqa, #1 ' +8 shadow[phsa] and wrlong
mov phsa, #256 - 1 ' -4 hub byte count (8n + 7)
xfer_copy7 wrlong 0-0, phsa ' +0 = copy long from cog to hub
sub $-1, dst2 ' +8 decrement cog address by 2
sub phsa, #7 wz ' -4 decrement hub length by 1 long (prev by 1, now by 7)
xfer_copy1 wrlong 0-0, phsa ' +0 = copy long from cog to hub
sub $-1, dst2 ' +8 decrement cog address by 2
if_nz djnz phsa, #xfer_copy7 ' -4 decrement hub length by 1 long (prev by 7, now by 1)
cogid cnt '
cogstop cnt ' sayonara ...
' initialised data and/or presets
dst2 long 2 << 9
data long $DEADBEEF[64]
' uninitialised data and/or temporaries
fit
DAT
Limitations
- transfers have to be an even number of longs (2n)
- consumes a counter
References
- [thread=141015]Fastes possible memory transfer[/thread] [sic]
- [thread=129701]PHSA: A question for kuroneko & an idea for storing data in unused cog ram (from OBC)[/thread]
- [thread=104167]Assembly Oververlay Loader for Cog FAST (renamed & released)[/thread] [sic]
- [thread=118012]Quick Cog-to-Hub transfer[/thread]
Comments
FRQA = hubaddress << 1 (for every 2nd access) << 2 (implied because par points to longs)
PHSA = 255 originally = (4 * 64 longs) -1
we write the 2nd last long
PHSA = 255-7 wz (because we access the shadow cog RAM, it holds the value we wrote last, not the actual current PHSA value)
we write the last long
PHSA = 255-7-1 and provided z not set, repeat loop
wrlong cog[data+3], base + %0000_[COLOR="red"]11[/COLOR]11
wrlong cog[data+2], base + %0000_[COLOR="red"]10[/COLOR]00
wrlong cog[data+1], base + %0000_[COLOR="red"]01[/COLOR]11
wrlong cog[data+0], base + %0000_[COLOR="red"]00[/COLOR]00
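For anyone who wants to check the address pattern off-chip, here is a rough Python model of the loop. It is not Propeller code, and the function name and tuple layout are my own. It only tracks shadow[phsa] and the long each wrlong lands on, with the base address left out, and it assumes wrlong ignores the two low address bits.

```python
def transfer_sequence(longs):
    """Model shadow[phsa] and the long hit by each wrlong.

    Returns (cog_index, shadow_phsa, hub_byte_offset) tuples in write order.
    The base address (par) is left out, so offsets are relative to the buffer.
    """
    if longs <= 0 or longs % 2:
        raise ValueError("transfers have to be in 2n long quantities")
    shadow = 4 * longs - 1      # mov phsa, #256 - 1 for 64 longs
    cog = longs - 1             # movd xfer_copy7, #data + 63
    steps = []
    while True:
        # xfer_copy7: low two address bits are ignored by wrlong
        steps.append((cog, shadow, shadow & ~3))
        shadow -= 7             # sub phsa, #7 wz
        # xfer_copy1: writes the next lower long
        steps.append((cog - 1, shadow, shadow & ~3))
        cog -= 2                # each sub $-1, dst2 steps its wrlong by 2
        if shadow == 0:         # Z set, so the if_nz djnz is skipped
            break
        shadow -= 1             # djnz phsa, #xfer_copy7
    return steps


if __name__ == "__main__":
    for cog, shadow, offset in transfer_sequence(4):
        print(f"wrlong cog[data+{cog}], base + %{shadow:08b} -> byte offset {offset}")
```

Run with 4 longs, the shadow values come out as 15, 8, 7, 0, which is the same pattern as the four wrlong lines above.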
The loop itself is controlled by the block length (shadow[phsa]); the base address is added just before the wrlong. As frqa is added twice before phsa is sampled, we simply divide it by 2 first (>> 1: par is 4n, which we change to 2n, so phsa += par/2 * 2). The zero flag is necessary because I needed the lower 2 bits high (to get the sub #7/djnz going). Simply adding/or'ing them would have left me with 3 at the end of the transfer, which can't be used as an exit condition. So -1 worked just as well (%--11 and a usable exit condition).
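The frqa halving can be sketched the same way. This is a minimal Python model, not Propeller code. The function name and the adds parameter are mine, and it takes the "added twice" timing from the listing's comments at face value instead of deriving it.

```python
def hub_address(par, shadow, adds=2):
    """Hub address seen by a wrlong, 'adds' counter ticks after phsa was written.

    Assumes par is long aligned (4n), so par >> 1 loses no bits.
    """
    if par % 4:
        raise ValueError("par is expected to point at a long")
    frqa = par >> 1                  # mov frqa, par / shr frqa, #1
    counter = shadow + adds * frqa   # LOGIC always: phsa += frqa per tick
    return counter & ~3              # wrlong ignores the two low bits


if __name__ == "__main__":
    base = 0x1234                    # hypothetical hub address of storage
    print(hex(hub_address(base, 255)))   # first write, last long in the block
    print(hex(hub_address(base, 0)))     # final write, first long in the block
```

With two additions the halved base adds up to par again, so the first write lands on base + 252 and the final one on base itself.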
Just noticed a possible misunderstanding regarding last long. You probably meant the last one written whereas I refer to the last one in the block.