Optimizing SPIN code for performance (array copies) — Parallax Forums

ags Posts: 386
edited 2013-06-16 22:29 in Propeller 1
I'm using a circular buffer similar to the rx_buffer in the Propeller Serial Terminal object, and I'm wondering which is faster when copying a byte sequence from the circular buffer to another (linear) buffer. (Recall that in a circular source buffer, sequential bytes may wrap around from the end of the buffer back to the beginning.) The circular buffer will be 64, 128 or 256 bytes long. I'll be copying anywhere from 6 bytes to nearly the entire circular buffer at a time. I'll need to check for the wrap-around condition in either case.

I think it boils down to the question: how much faster (if at all) is bytemove compared to a repeat loop that copies one byte per iteration? Is bytemove slower for small copies (say one byte) but faster beyond a "break-even point" where the cost of setting up the move is outweighed by the speed of the SPIN interpreter's optimized copy?
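
For concreteness, the two approaches being compared might look like the sketch below. The names ringBuf, BUF_SIZE, BUF_MASK, CopyLoop and CopyBlock are made up for illustration, and the buffer size is assumed to be a power of two so that a mask can handle the wrap.

```spin
CON
  BUF_SIZE = 128                  ' assumed: a power of two (64, 128 or 256)
  BUF_MASK = BUF_SIZE - 1

VAR
  byte  ringBuf[BUF_SIZE]         ' circular source buffer (hypothetical name)

PUB CopyLoop(dstPtr, idx, n)
  ' Byte-at-a-time copy: the mask wraps the source index on every iteration
  repeat n
    byte[dstPtr++] := ringBuf[idx++ & BUF_MASK]

PUB CopyBlock(dstPtr, idx, n) | first
  ' Block copy: at most two bytemove calls, split where the ring wraps.
  ' Assumes idx < BUF_SIZE and n =< BUF_SIZE.
  first := (BUF_SIZE - idx) <# n  ' bytes available before the end of the ring
  bytemove(dstPtr, @ringBuf + idx, first)
  bytemove(dstPtr + first, @ringBuf, n - first)
```

CopyLoop pays the interpreter overhead once per byte, while CopyBlock pays a fixed setup cost and then lets bytemove do the copying, which is the trade-off the question is about.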

Comments

  • Cluso99 Posts: 18,069
    edited 2013-06-16 14:09
    I would be fairly certain that bytemove would be faster. But this is easy for you to check yourself: just set up both types of routines and time them using the cnt register.
  • Mike Green Posts: 23,101
    edited 2013-06-16 14:25
    All of the multi-byte/word/long operations do their work using optimized native (Propeller) instructions, not Spin, so they're fast. They should process one storage unit per two hub cycles (32 system clocks: 16 to fetch the data and 16 to store it). If both buffers are aligned on a long boundary (address a multiple of 4) and the data is an integral number of longs, use LONGMOVE for maximum speed. At 80 MHz, 32 clocks is 32 × 12.5 ns = 400 ns, so BYTEMOVE copies a byte in 400 ns. That's 2.5 MB per second.
  • Dave Hein Posts: 6,347
    edited 2013-06-16 17:48
    ags, here's a routine that I use in spinix to read blocks of data from the serial port. It reads up to "num" bytes into a linear buffer starting at "ptr". It returns the number of bytes that were read.

    A tight loop calling the rx_check method can process slightly more than 57,600 baud. With rxblock1 you can process over 1 mega-baud. This routine uses a handle that points to a struct containing the rxhead, rxtail, rxmask and rxbuffer values. rx_head, rx_tail, rx_mask and rx_buffer are byte offsets into the struct. In the normal serial driver rx_head, rx_tail, rx_mask and rx_buffer are the actual variables, and you would use them directly.
    PUB rxblock1(handle, ptr, num) | num1, num2, rxhead, rxtail, rxbuffer, rxmask
      ' Get local copies of the read/write indices, and return if equal
      rxhead := word[handle + rx_head]
      rxtail := word[handle + rx_tail]
      if rxhead == rxtail
        return
        
      ' Determine the number of bytes at the end and beginning of the buffer
      rxmask := word[handle + rx_mask]  
      if rxhead => rxtail
        num1 := rxhead - rxtail
        num2 := 0
      else
        num1 := rxmask - rxtail + 1
        num2 := rxhead
    
      ' Limit the total number of bytes to num    
      if num1 > num
        num1 := num
        num2 := 0
      elseif num1 + num2 > num
        num2 := num - num1
    
      ' Copy the data    
      rxbuffer := word[handle + rx_buffer]
      bytemove(ptr, rxbuffer + rxtail, num1)
      bytemove(ptr + num1, rxbuffer, num2)
    
      ' Update the read index and return the number of bytes received  
      result := num1 + num2
      word[handle + rx_tail] := (rxtail + result) & rxmask
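
    As a worked example of the wrap-around case, the sketch below builds a handle by hand and calls rxblock1 once. The offset values, the handleData/ringBuf/lineBuf names and the Demo method are assumptions for illustration only; the real spinix struct layout may differ, and rxblock1 is assumed to be pasted into the same object.

    ```spin
    CON
      ' Hypothetical byte offsets into the handle struct (not taken from spinix)
      rx_head   = 0
      rx_tail   = 2
      rx_mask   = 4
      rx_buffer = 6

    VAR
      word  handleData[4]             ' head, tail, mask, buffer address
      byte  ringBuf[128]              ' circular receive buffer
      byte  lineBuf[64]               ' linear destination buffer

    PUB Demo | got
      ' Pretend a driver has written 5 bytes that wrap around the end of the ring
      word[@handleData + rx_head]   := 3
      word[@handleData + rx_tail]   := 126
      word[@handleData + rx_mask]   := 127
      word[@handleData + rx_buffer] := @ringBuf

      got := rxblock1(@handleData, @lineBuf, 64)
      ' First bytemove copies 2 bytes (ring indices 126..127), the second
      ' copies 3 bytes (indices 0..2); got is 5 and rx_tail becomes 3
    ```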
    
  • ags Posts: 386
    edited 2013-06-16 22:29
    Dave, I think I ended up with something very similar to what you've done. I realized that RxCheck was wasting quite a bit of time, so I implemented a more block-oriented copy. I'll have to study your sample a bit more before I declare them functionally equivalent. Thanks for the reply.
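
Cluso99's suggestion to time both routines with the cnt register could be sketched roughly as follows. The clock settings assume a 5 MHz crystal (80 MHz system clock), and the names Main, srcBuf, dstBuf, tLoop and tMove, as well as the test size N, are arbitrary choices for illustration.

```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000            ' assumed 5 MHz crystal, 80 MHz system clock

  N = 64                          ' bytes per test copy (arbitrary choice)

VAR
  long  tLoop, tMove              ' elapsed system clocks for each method
  byte  srcBuf[N]
  byte  dstBuf[N]

PUB Main | t, i
  ' Time a byte-at-a-time Spin loop
  t := cnt
  repeat i from 0 to N - 1
    dstBuf[i] := srcBuf[i]
  tLoop := cnt - t

  ' Time the same copy done with bytemove
  t := cnt
  bytemove(@dstBuf, @srcBuf, N)
  tMove := cnt - t

  ' tLoop and tMove now hold clock counts, each including a small fixed
  ' overhead for reading cnt; report them with any serial object
  repeat
```

Repeating the measurement for several values of N (including 1) would show whether a break-even point exists. If Mike Green's figure holds, tMove should grow by about 32 clocks for each additional byte.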