Optimizing SPIN code for performance (array copies)

ags · 2013-06-16 12:32

I'm using a circular buffer similar to the rx_buffer in the Propeller Serial Terminal object, and I'm wondering which method is faster for copying a byte sequence from the circular buffer to another (linear) buffer. [Recall that in a circular buffer (the source), sequential bytes may wrap around from the end of the buffer back to the beginning.] The circular buffer will be 64, 128 or 256 bytes long, and I'll be copying anywhere from 6 bytes to nearly the entire buffer length at a time. Either way, I'll need to check for the wrap-around condition.

I think it boils down to this question: how much faster (if at all) is bytemove than a repeat loop that copies one byte at a time? Is bytemove slower for small copies (say, one byte) and faster only beyond some break-even point, where its setup cost is outweighed by copying each byte in native code instead of interpreted Spin?
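
To make the two strategies concrete, here is a sketch in C (not Spin, so it can be compiled and checked on a PC) of both ways of pulling n bytes out of a power-of-two circular buffer: a per-byte loop that masks the index on every step, as a Spin repeat loop would, and a block version that does at most two moves, with memcpy standing in for bytemove. The function names copy_bytewise and copy_blockwise are hypothetical. The sketch only shows that both produce the same bytes; which one is faster on the Propeller still has to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes starting at index tail out of a
   circular buffer of size mask + 1 (a power of two), one byte at a time,
   masking the index on every step like a Spin repeat loop would. */
void copy_bytewise(uint8_t *dst, const uint8_t *ring, uint32_t mask,
                   uint32_t tail, uint32_t n)
{
    assert(tail <= mask && n <= mask + 1);
    for (uint32_t i = 0; i < n; i++)
        dst[i] = ring[(tail + i) & mask];    /* & mask wraps to the start */
}

/* Hypothetical helper: the same copy done as at most two block moves
   (memcpy standing in for bytemove), checking for the wrap only once. */
void copy_blockwise(uint8_t *dst, const uint8_t *ring, uint32_t mask,
                    uint32_t tail, uint32_t n)
{
    assert(tail <= mask && n <= mask + 1);
    uint32_t to_end = mask + 1 - tail;         /* bytes before the wrap */
    uint32_t first = n < to_end ? n : to_end;  /* first segment length */
    memcpy(dst, ring + tail, first);
    memcpy(dst + first, ring, n - first);      /* 0 bytes if no wrap */
}
```

The block version does the wrap check once per copy rather than once per byte, which is where the savings come from when every step of the per-byte loop is an interpreted Spin bytecode.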

Cluso99 · 2013-06-16 14:09

I'm fairly certain that bytemove would be faster, but this is easy to check yourself: just set up both versions of the routine and time them using the cnt register.
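
In Spin the timing pattern is simply t := cnt, then the code under test, then t := cnt - t. The arithmetic behind it is sketched below in C as a model, not Propeller code: cnt is a free-running 32-bit counter, so subtracting two readings with wrapping (unsigned) arithmetic gives the right interval even if the counter rolls over between them. Subtracting the cost of an empty measurement (two cnt reads with nothing in between) is a common refinement, not something from this thread. The names elapsed_ticks and net_ns are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 32-bit counter
   such as the Propeller's cnt. Unsigned subtraction wraps modulo 2^32,
   so the result is correct even if the counter rolled over once. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Remove the measured cost of the timing code itself (an empty
   start/stop measurement) and convert the remaining ticks to ns. */
uint32_t net_ns(uint32_t ticks, uint32_t overhead_ticks, uint32_t clk_hz)
{
    assert(ticks >= overhead_ticks && clk_hz > 0);
    uint64_t net = (uint64_t)(ticks - overhead_ticks);
    return (uint32_t)(net * 1000000000ULL / clk_hz);
}
```

For example, 32 net ticks at 80 MHz (clkfreq = 80_000_000) works out to 400 ns.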

Mike Green · 2013-06-16 14:25

All of the multi-byte/word/long operations do their work using optimized native (Propeller) instructions, not Spin, so they're fast. They should process one storage unit every two hub cycles (32 system clocks: 16 to fetch the data and 16 to store it). If your buffer is aligned on a long boundary (an address that is a multiple of 4) and contains an integral number of longs, use LONGMOVE for maximum speed. At 80 MHz, BYTEMOVE copies a byte in 400 ns, which is 2.5 MB per second.
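
Mike's figures can be checked with a little arithmetic. The C sketch below (helper names are hypothetical) computes throughput from the clock rate and clocks per unit: 2.5 MB/s for BYTEMOVE at 32 clocks per byte and, if LONGMOVE also moves one long per 32 clocks as described, 10 MB/s. It also encodes the LONGMOVE precondition. As I understand it, hub long accesses ignore the low two address bits, so both the source and the destination should be long-aligned, not just one buffer; treat that as an assumption to verify. For a circular buffer, that would also require the tail index and the byte count to be multiples of 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes per second if one unit of unit_bytes is moved every
   clocks_per_unit system clocks (Mike's figure: 32 clocks per unit). */
uint32_t move_bytes_per_sec(uint32_t clk_hz, uint32_t clocks_per_unit,
                            uint32_t unit_bytes)
{
    assert(clocks_per_unit > 0);
    return clk_hz / clocks_per_unit * unit_bytes;
}

/* Assumed precondition for LONGMOVE: source address, destination address
   and byte count are all multiples of 4; otherwise fall back to BYTEMOVE. */
bool can_use_longmove(uint32_t dst_addr, uint32_t src_addr, uint32_t nbytes)
{
    return ((dst_addr | src_addr | nbytes) & 3u) == 0;
}
```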

Dave Hein · 2013-06-16 17:48

ags, here's a routine that I use in spinix to read blocks of data from the serial port. It reads up to "num" bytes into a linear buffer starting at "ptr". It returns the number of bytes that were read.

A tight loop calling the rx_check method can keep up with slightly more than 57,600 baud; with rxblock1 you can handle over 1 Mbaud. This routine uses a handle that points to a struct containing the rxhead, rxtail, rxmask and rxbuffer values; rx_head, rx_tail, rx_mask and rx_buffer are byte offsets into that struct. In the standard serial driver, rx_head, rx_tail, rx_mask and rx_buffer are the actual variables, and you would use them directly.
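
For comparison with rxblock1, this is roughly what the per-byte rx_check path does, modelled in C. The struct layout and names are hypothetical stand-ins for the handle's fields, with the buffer held as a pointer rather than a hub address. Every received byte costs a call, an empty check, a read and a masked increment; in Spin each of those steps is interpreted, which is consistent with the per-byte loop topping out far below rxblock1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the receive state behind the handle:
   head is advanced by the receiver, tail by the reader. */
struct rx_state {
    uint16_t head;     /* next slot the receiver will write */
    uint16_t tail;     /* next slot the reader will take */
    uint16_t mask;     /* buffer size - 1 (size is a power of two) */
    uint8_t *buffer;
};

/* One byte per call, like rx_check: returns -1 when the buffer is empty. */
int rx_check(struct rx_state *rx)
{
    assert(rx->tail <= rx->mask);
    if (rx->tail == rx->head)
        return -1;
    int b = rx->buffer[rx->tail];
    rx->tail = (uint16_t)((rx->tail + 1) & rx->mask);  /* wrap at the end */
    return b;
}
```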

PUB rxblock1(handle, ptr, num) | num1, num2, rxhead, rxtail, rxbuffer, rxmask
  ' Get local copies of the read/write indices, and return if equal
  rxhead := word[handle + rx_head]
  rxtail := word[handle + rx_tail]
  if rxhead == rxtail
    return
    
  ' Determine the number of bytes at the end and beginning of the buffer
  rxmask := word[handle + rx_mask]  
  if rxhead => rxtail
    num1 := rxhead - rxtail
    num2 := 0
  else
    num1 := rxmask - rxtail + 1
    num2 := rxhead

  ' Limit the total number of bytes to num    
  if num1 > num
    num1 := num
    num2 := 0
  elseif num1 + num2 > num
    num2 := num - num1

  ' Copy the data    
  rxbuffer := word[handle + rx_buffer]
  bytemove(ptr, rxbuffer + rxtail, num1)
  bytemove(ptr + num1, rxbuffer, num2)

  ' Update the read index and return the number of bytes received  
  result := num1 + num2
  word[handle + rx_tail] := (rxtail + result) & rxmask
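
A C port of rxblock1 makes the two-segment logic easy to test off-target. It is a sketch: the hypothetical rx_state struct replaces the handle and the rx_head/rx_tail/rx_mask/rx_buffer offsets, and memcpy stands in for bytemove. As in the original, the buffer size must be a power of two, and the reader owns tail while the receiver advances head.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct rx_state {
    uint16_t head, tail, mask;  /* hypothetical stand-ins for the handle fields */
    uint8_t *buffer;
};

/* Copy up to num bytes to ptr in at most two block moves and return the
   number of bytes copied (0 if the buffer is empty). */
uint32_t rxblock1(struct rx_state *rx, uint8_t *ptr, uint32_t num)
{
    uint32_t head = rx->head, tail = rx->tail, mask = rx->mask;
    uint32_t num1, num2;

    assert(head <= mask && tail <= mask);
    if (head == tail)
        return 0;                       /* nothing buffered */

    if (head >= tail) {                 /* data is contiguous */
        num1 = head - tail;
        num2 = 0;
    } else {                            /* data wraps past the end */
        num1 = mask - tail + 1;         /* bytes up to the end */
        num2 = head;                    /* bytes from the start */
    }

    if (num1 > num) {                   /* limit the total to num */
        num1 = num;
        num2 = 0;
    } else if (num1 + num2 > num) {
        num2 = num - num1;
    }

    memcpy(ptr, rx->buffer + tail, num1);
    memcpy(ptr + num1, rx->buffer, num2);

    rx->tail = (uint16_t)((tail + num1 + num2) & mask);
    return num1 + num2;
}
```

With a 16-byte buffer, head = 3 and tail = 13, a request for 16 bytes copies 3 bytes from the end (indices 13 to 15) and 3 from the start (indices 0 to 2), returns 6, and leaves tail equal to head.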

ags · 2013-06-16 22:29

Dave, I think I ended up with something very similar to what you've done. I realized that RxCheck was wasting quite a bit of time, so I implemented a more block-oriented copy. I'll have to study your sample a bit more before I declare them functionally equivalent. Thanks for the reply.
