Fastest possible memory transfer (Parallax Forums)


macca Posts: 740
edited 2012-07-04 21:02 in Propeller 1
Hello,

I need to transfer a block of bytes from HUB to COG memory, with this code:
                        MOVD    :l1, #buffer
                        MOV     count, #64
:l1                     RDLONG  buffer, srcptr
                        ADD     :l1, increment
                        ADD     srcptr, #4
                        DJNZ    count, #:l1

srcptr                  LONG    $0
count                   LONG    $0
increment               LONG    000000000000_000000100_000000000 

buffer                  RES     64

According to the documentation, if I'm not mistaken, it loses a hub slot on every iteration.

Is there a better, faster way to transfer a block of memory without losing slots?


Regards.
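A rough clock count shows where the slot goes. The following is a simplified Python model, not Propeller code. It assumes the commonly quoted Propeller 1 figures:

- a cog's hub window recurs every 16 clocks;
- an RDLONG that hits its window takes 8 clocks;
- an ordinary instruction takes 4 clocks.

The helper names are hypothetical.

```python
import math

HUB_PERIOD = 16  # assumed: each cog's hub window recurs every 16 clocks
HUB_SYNC = 8     # assumed: an RDLONG that hits its window takes 8 clocks
INSTR = 4        # assumed: an ordinary PASM instruction takes 4 clocks


def clocks_to_next_hub_op(k):
    """Clocks from one in-sync RDLONG to the next, with k ordinary
    instructions between them; the second RDLONG stalls until a window."""
    busy = HUB_SYNC + INSTR * k
    return HUB_PERIOD * math.ceil(busy / HUB_PERIOD)


def clocks_per_long(gaps):
    """Average clocks per long for one loop pass; gaps[i] is the number of
    ordinary instructions that follow the i-th RDLONG in the pass."""
    return sum(clocks_to_next_hub_op(k) for k in gaps) / len(gaps)


# Original loop: RDLONG, then ADD, ADD, DJNZ (3 instructions) before the next RDLONG.
original = clocks_per_long([3])
# Two instructions (e.g. ADD, DJNZ) still fit before the next window.
two_instr = clocks_per_long([2])

print(original, two_instr)            # 32.0 16.0
print(64 * original, 64 * two_instr)  # 2048.0 1024.0
```

Under these assumptions the original loop costs 32 clocks per long. Its three ordinary instructions push the next RDLONG just past its window. A loop with at most two ordinary instructions between RDLONGs runs at one long per 16-clock window. For 64 longs that is about 2048 versus 1024 clocks, ignoring setup.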

Comments

  • kuroneko Posts: 3,623
    edited 2012-07-01 06:25
    macca wrote: »
    Is there a better, faster way to transfer a block of memory without losing slots?
    Not sure about better, but this example (post 978929) will get you maximum transfer speed in both directions.
  • Heater. Posts: 21,230
    edited 2012-07-01 06:27
    Not sure offhand how the hub slots work out, but you can always unroll the loop a bit.
    For example, do that rdlong, add, add sequence 4 times in each iteration and reduce the starting count to 16.
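A Python sketch can estimate what this unrolling buys. It uses a simplified timing model: a hub window every 16 clocks, 8 clocks for an in-sync RDLONG, and 4 clocks per ordinary instruction, all assumed. It also assumes each unrolled copy keeps its two ADDs, so only the copy followed by the DJNZ misses a window. `unrolled_clocks` is a hypothetical helper.

```python
import math

HUB_PERIOD = 16  # assumed hub window period in clocks
HUB_SYNC = 8     # assumed RDLONG time when it hits its window
INSTR = 4        # assumed time of an ordinary instruction


def clocks_to_next_hub_op(k):
    """Clocks between in-sync RDLONGs separated by k ordinary instructions."""
    return HUB_PERIOD * math.ceil((HUB_SYNC + INSTR * k) / HUB_PERIOD)


def unrolled_clocks(total_longs, unroll):
    """Estimated clocks to move total_longs with the RDLONG/ADD/ADD sequence
    repeated `unroll` times per pass, followed by one DJNZ."""
    if total_longs % unroll:
        raise ValueError("total_longs must be a multiple of unroll")
    gaps = [2] * (unroll - 1) + [3]  # the last copy also has the DJNZ after it
    per_pass = sum(clocks_to_next_hub_op(k) for k in gaps)
    return (total_longs // unroll) * per_pass


for unroll in (1, 4, 8):
    print(unroll, unrolled_clocks(64, unroll))
# 1 2048
# 4 1280
# 8 1152
```

In this model, unrolling 4 times (count of 16) cuts the estimate for 64 longs from 2048 to about 1280 clocks. Wider unrolling approaches, but never reaches, the 1024-clock floor of one long per hub window.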
  • Mark_T Posts: 1,981
    edited 2012-07-01 13:57
    If you can arrange the hub memory buffer to be in the first 512 bytes of hub RAM then you can combine the increments:
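                            MOVD    :l1, #buffer        ' first cog destination
                            MOVS    :l1, #srcaddr       ' hub source address (must be below 512)
                            MOV     count, #64          ' number of longs to copy
    :l1                     RDLONG  buffer, #srcaddr    ' dest and src fields patched above
                            ADD     :l1, increment      ' dest += 1 and src += 4 in one ADD
                            DJNZ    count, #:l1
    
    increment               LONG    %000000000000_000000001_000000100
    
    count                   RES     1
    buffer                  RES     64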
    

    Note that your code had the wrong increment value: you add 1 to a cog address to get to the next long, not 4. Also, binary constants start with %.
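To see why one constant can bump both fields, here is a small Python model of a 32-bit PASM instruction word. The destination field occupies bits 17..9 and the source field bits 8..0. Only the field layout and Mark_T's constant come from the thread; the helper names and the upper-bit placeholder are hypothetical.

The second case shows why the hub buffer must sit below address 512. Once the 9-bit source field passes 511, the carry spills into the destination field.

```python
FIELD_BITS = 9
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0x1FF, one 9-bit field
WORD_MASK = 0xFFFFFFFF


def make_word(upper, dest, src):
    """Pack a 32-bit word from the upper 14 bits, a dest and a src field."""
    return (upper << (2 * FIELD_BITS)) | (dest << FIELD_BITS) | src


def fields(word):
    """Return the (dest, src) fields of a 32-bit instruction word."""
    return (word >> FIELD_BITS) & FIELD_MASK, word & FIELD_MASK


increment = 0b000000000000_000000001_000000100  # Mark_T's constant
UPPER = 0b10_1010_1010_1010  # hypothetical placeholder, not a real opcode

# Start with dest = cog $100, src = hub byte address 0, then step three times.
word = make_word(UPPER, 0x100, 0)
for _ in range(3):
    word = (word + increment) & WORD_MASK
print(hex(increment), fields(word), (word >> (2 * FIELD_BITS)) == UPPER)
# 0x204 (259, 12) True

# Source near the 512-byte limit: 508 + 4 overflows into the dest field.
wrapped = (make_word(UPPER, 0x100, 508) + increment) & WORD_MASK
print(fields(wrapped))  # (258, 0): dest jumped by 2, src wrapped to 0
```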
  • macca Posts: 740
    edited 2012-07-01 23:26
    Mark_T wrote: »
    If you can arrange the hub memory buffer to be in the first 512 bytes of hub RAM then you can combine the increments:

    How do I place the buffer in the first 512 bytes?
    The program also uses Spin code and a (yet unknown) number of cogs.
  • Heater. Posts: 21,230
    edited 2012-07-01 23:56
    You can't. Or at least not with any "normal" Spin programming.
    Spin will place code and declared variables wherever it likes and gives you no control over that.
    If you declare an array at the start of the first object, and that object contains little code, the array might end up within 512 bytes of the start of RAM. But that would only be an accident of the compiler implementation in use.
  • pjv Posts: 1,903
    edited 2012-07-02 07:46
    Hi Macca;

    I believe you can reserve any amount of Low Hub space after address $10 by having your first object contain only a DAT section of the size you choose, plus of course the declaration of one next object.

    Cheers,

    Peter (pjv)
  • Heater. Posts: 21,230
    edited 2012-07-02 08:09
    You will need at least 1 PUB in your first object which will then be calling your "real" main object with the address of the reserved space.

    Still, this is not nice, as it depends on undocumented compiler behavior, i.e. that the main object will always be first in memory and that no other junk is put in low memory.

    But I guess if it works it works...
  • macca Posts: 740
    edited 2012-07-02 08:35
    Well, the hub memory location is not reliable, and the solution using a counter seems too error-prone (I don't like having something running on its own in that case), so I think I'll unroll the loop a bit to lose fewer slots.
    Thanks for your help.
  • pjv Posts: 1,903
    edited 2012-07-02 09:53
    Hi Macca;

    Sorry, I meant to use the word "PUB" instead of "next object", as Heater pointed out. Although it is not documented, this technique works consistently, according to verbal confirmation by Chip Gracey.

    Then, to get the fastest transfer rate you asked for, use the technique outlined by Mark_T.

    I believe there are no other approaches that will be as fast.

    Cheers,

    Peter (pjv)
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-07-02 11:03
    I think the following (untested) will do what you want for transferring data into the cog from the hub:
    CON
    
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 5_000_000
    
      BUF_SIZE      = 128           'Number of longs in buffer.
    
    PUB  start
    
      cognew(@pasm,@bufend - 4)     'Start cog, pointing to last long in hub buffer. 
    
    DAT
    
                  org       0
    pasm          mov       hubbufend,par
    
                  '...
                  call      #loadbuf
                  '...
    
    loadbuf       movd      long0,#bufend-1         'Point to last long in cog buffer.
                  movd      long1,#bufend-2         'Point to next-to-last long in cog buffer.
                  mov       loop_back,djnz_instr    'Replace jmp loadbuf_ret instruction with a djnz.
                  mov       ptr,hubbufend           'Point to last long in hub buffer.
                  jmp       #long0                  'Start the transfer.
                  
    loop          sub       long0,dec_dest          'Decrement long0 dest addr by 2.
    long1         rdlong    0-0,ptr                 'Read long at pointer address. (Pointer is address + 3.)
                  sub       ptr,#7                  'Pointer + 3 - 7 == pointer - 4.
                  sub       long1,dec_dest          'Decrement long1 dest addr by 2.
    long0         rdlong    0-0,ptr                 'Read long at pointer address. (Pointer is actual address.)
    loop_back     jmp       loadbuf_ret             'Is djnz ptr,#loop until overwritten by transfer.
    
    buffer        long      0[BUF_SIZE]             'Serves as both hub and cog buffers.
    bufend 
    
    djnz_instr    djnz      ptr,#loop               'Placed at loadbuf_ret to perform loop until overwritten by transfer.
    dec_dest      long      2 << 9                  'Amount by which to decrement destination address in transfer instructions.
    hubbufend     long      0-0                     'Hub address of the last long in buffer.
    loadbuf_ret   ret                               'Return address gets placed here.
    
    ptr           res       0                       'Pointer into hub.    
    

    The transfer loop loads the data in reverse order and hits the hub sweet spot every time without using a counter. The hub and cog buffers occupy the same hub memory and must be placed immediately after loop_back. To start, the jmp loadbuf_ret instruction is replaced by a djnz, which creates the loop. The loop terminates when the djnz gets overwritten again by the jmp loadbuf_ret from the hub. (You read that right, no "#".) Actually, due to pipelining, the loop will terminate one or two transfers later than that, depending upon whether BUF_SIZE is even or odd. But the additional rdlongs are harmless, since they overwrite the code with the same instructions that are already there.

    Unfortunately, the same technique cannot be employed for transferring data out of the cog to the hub.

    -Phil
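The pointer arithmetic in Phil's loop can be checked with a short Python model. It relies on RDLONG forcing the hub address to a long boundary (the low two bits are ignored). That is why "pointer + 3" after the DJNZ reads the same long as the pointer itself.

The generator only tracks which cog register and which hub long each RDLONG touches. It does not model the self-terminating overwrite or pipelining. The starting addresses are hypothetical.

```python
from itertools import islice


def phil_reads(hubbufend, bufend):
    """Yield (cog_dest, hub_long_address) for each RDLONG in Phil's loop.

    hubbufend: hub address of the last long (the value passed in PAR).
    bufend:    cog address of the bufend label.
    """
    ptr = hubbufend
    dest0, dest1 = bufend - 1, bufend - 2  # movd long0 / movd long1
    yield dest0, ptr & ~3                  # jmp #long0: first read
    while True:
        ptr -= 1                           # loop_back: djnz ptr,#loop
        dest0 -= 2                         # loop: sub long0,dec_dest
        yield dest1, ptr & ~3              # long1: ptr is address + 3
        ptr -= 7                           # sub ptr,#7
        dest1 -= 2                         # sub long1,dec_dest
        yield dest0, ptr & ~3              # long0: ptr is the actual address


# Hypothetical: last hub long at $1000, bufend at cog $100.
for dest, addr in islice(phil_reads(0x1000, 0x100), 6):
    print(hex(dest), hex(addr))
# 0xff 0x1000
# 0xfe 0xffc
# 0xfd 0xff8
# 0xfc 0xff4
# 0xfb 0xff0
# 0xfa 0xfec
```

Each read lands one cog register lower and one hub long (4 bytes) lower, which is the reverse-order copy described above. The DJNZ decrements a hub address that in practice never reaches zero, so it acts as a jump and a pointer step rather than a counter.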
  • pjv Posts: 1,903
    edited 2012-07-02 14:39
    Hello Phil;

    I have not studied your example in great detail, but I suspect you have hit another home run!

    I see you also use that reverse DJNZ trick on the pointer... It gives me great pleasure every time I can use it: a dual function from a single instruction. It really helps keep the ripple sorter I use down to a very compact routine.

    Nice going!

    Cheers,

    Peter (pjv)
  • macca Posts: 740
    edited 2012-07-03 01:49
    That's really interesting, thanks Phil!
  • kuroneko Posts: 3,623
    edited 2012-07-03 07:04
    macca wrote: »
    ... and the solution using a counter seems too error-prone (I don't like having something running on its own in that case)
    Can you elaborate? What error conditions do you expect to encounter with a LOGIC.always setup? Just curious ...
  • macca Posts: 740
    edited 2012-07-03 07:26
    kuroneko wrote: »
    Can you elaborate? What error conditions do you expect to encounter with a LOGIC.always setup? Just curious ...

    What worries me is that there is a register running on its own, and the code execution expects to stay synchronized with it. One day someone may forget that and/or make a change that alters this synchronization; then the program stops working, and the problem will be very hard to track down.
  • lonesock Posts: 917
    edited 2012-07-03 10:36
    kuroneko's solution works great, and it is also nice to have an alternative for those times when your counters are already in use.

    Jonathan
  • kuroneko Posts: 3,623
    edited 2012-07-04 20:27
    Phil Pilgrim (PhiPi) wrote: »
    I think the following (untested) will do what you want for transferring data into the cog from the hub:
    Thanks for posting this again. I always found the override approach slightly irritating, but since I had a counter solution I didn't see the need to do anything about it. Until today (note A). This example doesn't use counters, doesn't override instructions, and can serve multiple hub buffers in both directions, with the limitation that the buffer length must be an even number (2n) of longs.
    CON
      BUF_SIZE = 128                                ' number of longs in buffer (2n)
    
    VAR
      long  hub[BUF_SIZE]
      
    PUB start
    
      cognew(@pasm, @hub{0})                                                         
    
    DAT             org     0
    
    pasm            '...
                    mov     ptr, par                ' hub location
                    call    #loadbuf                ' transfer fixed size buffer
                    '...
                    waitpeq $, #0
    
    loadbuf         movd    long0, #ptr -1          ' last long in cog buffer
                    movd    long1, #ptr -2          ' second-to-last long in cog buffer
                    add     ptr, #BUF_SIZE * 4 -1   ' last byte in hub buffer (8n + 7)
                    movi    ptr, #BUF_SIZE - 2      ' add magic marker
                    
    long0           rdlong  0-0, ptr                ' |
                    sub     long0, dst2             ' |
                    sub     ptr, i2s7 wc            ' |
    long1           rdlong  0-0, ptr                ' |
                    sub     long1, dst2             ' |
            if_nc   djnz    ptr, #long0             ' sub #7/djnz (Thanks Phil!)
    
    loadbuf_ret     ret
    
    ' initialised data and/or presets
    
    dst2            long    2 << 9                  ' dst +/-= 2
    i2s7            long    2 << 23 | 7
    
    ' uninitialised data and/or temporaries
    
    buffer          res     BUF_SIZE
    ptr             res     1                       ' buffer + BUF_SIZE
    
    tail            fit
                    
    DAT
    
    A: ... not really needed, but why waste an idea ...
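How the magic marker ends the loop can be modelled in Python with 32-bit arithmetic:

- MOVI puts BUF_SIZE - 2 into bits 31..23 of ptr.
- Each `sub ptr, i2s7 wc` removes 2 from that marker and 7 from the address.
- The unsigned borrow (C) appears on the pass where the marker is already 0, so `if_nc djnz` falls through after that pass's second read.

The model assumes the cog buffer starts at register 0 and that RDLONG uses only the low 16 address bits, long-aligned. `kuroneko_load` is a hypothetical name. Running it with an odd size shows why the length must be 2n: one long is never loaded.

```python
MASK32 = 0xFFFFFFFF


def kuroneko_load(hub, base):
    """Model kuroneko's loadbuf: copy the longs in `hub` (stored at hub byte
    address `base`) into a cog buffer starting at register 0.
    Returns (cog_buffer, passes)."""
    size = len(hub)
    assert base % 4 == 0 and size >= 2

    def rdlong(ptr):
        addr = ptr & 0xFFFC                 # low 16 bits, long-aligned
        return hub[(addr - base) // 4]

    cog = [None] * size
    dst0, dst1 = size - 1, size - 2         # movd long0,#ptr-1 / long1,#ptr-2
    ptr = base + size * 4 - 1               # add ptr,#BUF_SIZE*4-1 (last byte)
    ptr |= ((size - 2) & 0x1FF) << 23       # movi ptr,#BUF_SIZE-2 (marker)
    i2s7 = (2 << 23) | 7
    passes = 0
    while True:
        passes += 1
        cog[dst0] = rdlong(ptr)             # long0
        dst0 -= 2                           # sub long0,dst2
        carry = ptr < i2s7                  # sub ptr,i2s7 wc (unsigned borrow)
        ptr = (ptr - i2s7) & MASK32
        cog[dst1] = rdlong(ptr)             # long1
        dst1 -= 2                           # sub long1,dst2
        if carry:                           # if_nc djnz not taken -> ret
            break
        ptr = (ptr - 1) & MASK32            # djnz ptr,#long0 (ptr is never 0)
    return cog, passes


data = [0x1000 + i for i in range(128)]
cog, passes = kuroneko_load(data, 0x2000)
print(cog == data, passes)                  # True 64

odd_cog, _ = kuroneko_load(list(range(5)), 0x2000)
print(odd_cog)                              # [None, 1, 2, 3, 4]
```

Only the low address bits reach the hub, so the wrap of the upper bits on the final subtraction is harmless. That is what lets one register serve as both pointer and loop counter.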
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2012-07-04 21:02
    Nicely done, Kuroneko, and a diabolically clever use of those upper ptr bits! :)

    -Phil