
[POC] reverse overlay loader aka cog-to-hub transfer

kuroneko Posts: 3,623
edited 2011-02-18 17:29 in Propeller 1
OK, I didn't have a rainy day, but I did have an odd idea. It turns out that it actually works and can be used to transfer cog memory to hub RAM without penalty. The following example transfers 64 longs.

Enjoy!
''
'' cog to hub transfer
''
''        Author: Marko Lukat
'' Last modified: 2011/02/18
''       Version: 0.1
''
'' acknowledgements
'' - code based on work done by Phil Pilgrim (PhiPi) and Ray Rodrick (Cluso99)
''
CON
  _clkmode = XTAL1|PLL16X
  _xinfreq = 5_000_000
  
VAR
  long  guard_before, storage[64], guard_after
  
PUB null | n

  cognew(@entry, @storage{0})
  waitcnt(clkfreq + cnt)

  dira[16..23] := outa[16..23] := -1            ' success (preset)

  if guard_before or guard_after
    outa[16..23] := $81                         ' transfer failed
  else
    repeat n from 0 to 63
      if storage[n] <> $DEADBEEF
        outa[16..23] := $81                     ' transfer failed
        quit
     
  waitpne(0, 0, 0)
  
DAT             org     0

entry           movd    xfer_copy7, #data + 63  '  -4   move cog address into wrlong instruction
                movd    xfer_copy1, #data + 62  '  +0   move cog address into wrlong instruction

                movi    ctra, #%0_11111_000     '  +4   LOGIC always

'{optional}     long    0[3] {3 x nop}          '  +8   adjustment

                mov     frqa, par               '  +4   frqa gets added twice between modifying
                shr     frqa, #1                '  +8   shadow[phsa] and wrlong
                mov     phsa, #256 - 1          '  -4   hub byte count (8n + 7)
                
xfer_copy7      wrlong  0-0, phsa               '  +0 = copy long from cog to hub
                sub     $-1, dst2               '  +8   decrement cog address by 2
                sub     phsa, #7 wz             '  -4   decrement hub length by 1 long (prev by 1, now by 7)

xfer_copy1      wrlong  0-0, phsa               '  +0 = copy long from cog to hub
                sub     $-1, dst2               '  +8   decrement cog address by 2                
        if_nz   djnz    phsa, #xfer_copy7       '  -4   decrement hub length by 1 long (prev by 7, now by 1)

                cogid   cnt                     '
                cogstop cnt                     ' sayonara ...

' initialised data and/or presets

dst2            long    2 << 9
data            long    $DEADBEEF[64]

' uninitialised data and/or temporaries

                fit

DAT
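As a rough check on the "without penalty" claim, here is a small Python estimate of the loop time. It is a sketch under stated assumptions, not a measurement: it reads "without penalty" as every wrlong catching the very next hub window, uses the Propeller 1 hub rotation of 16 clocks per cog, and takes the clock from the CON block (5 MHz crystal with PLL16X). The names `HUB_WINDOW` and `transfer_time_us` are made up for this illustration.

```python
# Assumption: one hub window per cog every 16 system clocks (Propeller 1).
# An aligned wrlong (8 clocks) plus two 4-clock instructions fills that window.
CLOCK_HZ = 5_000_000 * 16      # _xinfreq * PLL16X from the CON block
HUB_WINDOW = 16                # clocks between hub slots for one cog


def transfer_time_us(longs, clock_hz=CLOCK_HZ):
    """Loop time in microseconds if every wrlong hits its hub window.

    Setup instructions before the loop and the final cogstop are excluded.
    """
    return longs * HUB_WINDOW * 1_000_000 / clock_hz


print(transfer_time_us(64))    # 64 longs at 80 MHz
```

Under these assumptions the 64-long example spends 1024 clocks in the loop, which is the lower bound for 64 hub writes from a single cog.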
Limitations
  • transfers have to be in 2n long quantities (an even number of longs)
  • consumes a counter
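The 2n restriction can be illustrated with a simplified Python model of the loop counter. This is only a sketch: it tracks the shadow byte count and leaves out the base address, and `written_offsets` and its `limit` safety cap are hypothetical names introduced here. With an even count the value after `sub phsa, #7` reaches zero exactly at the first long; with an odd count it steps over zero, so the exit condition is never met.

```python
def written_offsets(longs, limit=1000):
    """Long offsets (relative to the block start) written for a given count.

    Simplified model: only the shadow byte count is tracked. `limit` is a
    safety cap for this sketch so that a runaway loop still returns.
    """
    shadow = 4 * longs - 1               # mov phsa, #4n - 1
    offsets = []
    while len(offsets) < limit:
        offsets.append(shadow >> 2)      # long hit by the wrlong at xfer_copy7
        shadow -= 7                      # sub phsa, #7 wz
        offsets.append(shadow >> 2)      # long hit by the wrlong at xfer_copy1
        if shadow == 0:                  # Z set: djnz is skipped, loop ends
            break
        shadow -= 1                      # djnz phsa, #xfer_copy7
    return offsets


print(written_offsets(4))                # even count: stops at offset 0
print(written_offsets(3, limit=8))       # odd count: runs past offset 0
```

In this model a negative offset means a write below the start of the block; offset -1 would correspond to guard_before in the demo's VAR layout.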
References
  1. Fastes possible memory transfer [sic] (thread 141015)
  2. PHSA: A question for kuroneko & an idea for storing data in unused cog ram (from OBC) (thread 129701)
  3. Assembly Oververlay Loader for Cog FAST (renamed & released) [sic] (thread 104167)
  4. Quick Cog-to-Hub transfer (thread 118012)
The counter implementation has now been superseded by the generic transfer loop (post 1108886).

Comments

  • Cluso99 Posts: 18,069
    edited 2011-02-18 17:05
    Nice work. Let me understand this...
    FRQA = hubaddress << 1 (for every 2nd access) << 2 (implied because par points to longs)
    PHSA = 255 originally = (4 * 64 longs) -1
    we write the 2nd last long
    PHSA = 255-7 wz (because we access the shadow cog RAM, it holds the value we wrote last, not the actual current PHSA value)
    we write the last long
    PHSA = 255-7-1 and provided z not set, repeat loop
  • kuroneko Posts: 3,623
    edited 2011-02-18 17:29
    The last long is written first followed by the 2nd to last. Example, 4 longs (not actual code):
    wrlong  cog[data+3], base + %0000_1111      ' bits 3..2 = %11
    wrlong  cog[data+2], base + %0000_1000      ' bits 3..2 = %10
    wrlong  cog[data+1], base + %0000_0111      ' bits 3..2 = %01
    wrlong  cog[data+0], base + %0000_0000      ' bits 3..2 = %00
    
    The loop itself is controlled by the block length (shadow[phsa]), the base address is added just before the wrlong. As frqa is added twice before phsa is sampled we simply divide it by 2 first (>> 1, par is 4n which we change to 2n, phsa += par/2 * 2).
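The address arithmetic described above can be traced with a short Python model. It is a sketch based only on the explanation in this thread, not a cycle-accurate simulation: it assumes a long-aligned base address (par is 4n), an even number of longs, and that the value wrlong samples is shadow[phsa] plus frqa added twice. The function name `trace_transfer` and the base address `0x1000` are hypothetical.

```python
def trace_transfer(longs, base):
    """Return (cog long index, hub byte address) pairs in write order.

    Assumptions: base is long-aligned, longs is even, and wrlong sees
    shadow[phsa] + 2 * frqa with frqa = base >> 1.
    """
    frqa = base >> 1                     # mov frqa, par / shr frqa, #1
    shadow = 4 * longs - 1               # mov phsa, #4n - 1
    cog = longs - 1                      # movd xfer_copy7, #data + n - 1
    writes = []
    while True:
        for step in (7, 1):              # sub phsa, #7 wz / djnz phsa, #...
            seen = shadow + 2 * frqa     # frqa is added twice before sampling
            writes.append((cog, seen & ~3))   # wrlong ignores address bits 1..0
            cog -= 1                     # the two wrlongs alternate, 2 apart each
            if step == 1 and shadow == 0:
                return writes            # Z from sub #7: djnz is skipped
            shadow -= step


for cog, hub in trace_transfer(4, 0x1000):
    print(f"cog[data+{cog}] -> hub ${hub:04X}")
```

For four longs the shadow value runs 15, 8, 7, 0, which matches the four wrlong lines in the example, and the loop exits on the zero produced by the `sub #7`.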

    The zero flag is necessary as I needed the lower 2 bits high (to get the sub #7/djnz going). Simply adding/or'ing them would have left me with 3 at the end of the transfer which can't be used as an exit condition. So -1 worked just as well (%--11 and usable exit condition).

    Just noticed a possible misunderstanding regarding last long. You probably meant the last one written whereas I refer to the last one in the block.