Optimizing SPIN code for performance (array copies)
I'm using a circular buffer similar to the rx_buffer in the Propeller Serial Terminal object, and am wondering which is the faster way to copy a byte sequence from the circular buffer to another (linear) buffer. [Recall that in a circular buffer, sequential bytes may wrap around from the end of the buffer back to the beginning.] The circular buffer will be 64, 128 or 256 bytes long, and I'll be copying anywhere from 6 bytes up to nearly the entire buffer at a time. Either way, I'll need to check for the wrap-around condition.
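For comparison, the repeat-loop approach amounts to a byte-at-a-time copy. This is a minimal sketch in C rather than SPIN. It assumes the buffer size is a power of two (true for 64, 128 and 256), so the wrap-around check reduces to masking the index with size - 1. The function name ring_copy_bytewise is hypothetical. The caller is assumed to have already checked that n bytes are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n bytes out of a power-of-two circular buffer,
   one byte per iteration (the "repeat loop" approach). The caller must
   already know that n bytes are available. */
static size_t ring_copy_bytewise(uint8_t *dst, const uint8_t *ring,
                                 unsigned mask, unsigned *tail, size_t n)
{
    /* mask must be size - 1 for a power-of-two size (e.g. 63, 127, 255) */
    assert(((mask + 1u) & mask) == 0u);

    unsigned t = *tail;
    for (size_t i = 0; i < n; i++) {
        dst[i] = ring[t];
        t = (t + 1u) & mask;   /* wraps from the last slot back to slot 0 */
    }
    *tail = t;                 /* new read index */
    return n;
}
```

The mask removes the explicit overflow test from the loop body. However, every byte still pays the full per-iteration cost of the loop.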
I think it boils down to this question: how much faster (if at all) is bytemove than a repeat loop that copies one byte at a time? Is bytemove slower for small copies (say, one byte) but faster beyond some break-even point, where the fixed cost of setting up the move is outweighed by the speed of the copy loop built into the SPIN interpreter?
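The break-even question can be stated with a simple linear cost model. The symbols are hypothetical and no measured values are implied. Let c_loop be the per-byte cost of the SPIN repeat loop and s the fixed setup cost of one bytemove call. Let c_move be bytemove's per-byte cost and k the number of bytemove calls needed (1 if the data doesn't wrap, 2 if it does). Per-call bookkeeping shared by both approaches is ignored.

```latex
T_{\text{loop}}(n) = n\,c_{\text{loop}}, \qquad
T_{\text{move}}(n) = k\,s + n\,c_{\text{move}}, \quad k \in \{1, 2\}

T_{\text{move}}(n) < T_{\text{loop}}(n)
\iff n > n^{*} = \frac{k\,s}{c_{\text{loop}} - c_{\text{move}}}
\qquad (\text{assuming } c_{\text{loop}} > c_{\text{move}})
```

The actual constants would have to be measured on the chip. One way is to read cnt before and after each variant. A wrapped copy doubles the setup term, so its break-even length is twice that of a contiguous one.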

Comments
A tight loop calling the rx_check method once per byte can keep up with slightly more than 57,600 baud. With rxblock1 you can process over 1 megabaud. This routine uses a handle that points to a struct containing the rxhead, rxtail, rxmask and rxbuffer values; rx_head, rx_tail, rx_mask and rx_buffer are byte offsets into that struct. In the normal serial driver, rx_head, rx_tail, rx_mask and rx_buffer are the actual variables, and you would use them directly.
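To put those two rates in perspective, here is the per-byte time budget at each one. This assumes 8N1 framing (10 bits on the wire per byte) and the common 80 MHz system clock; neither assumption is stated above.

```latex
\frac{57\,600\ \text{bit/s}}{10\ \text{bit/byte}} = 5\,760\ \text{byte/s}
\;\Rightarrow\; \frac{80 \times 10^{6}\ \text{clk/s}}{5\,760\ \text{byte/s}} \approx 13\,889\ \text{clk/byte}

\frac{1\,000\,000\ \text{bit/s}}{10\ \text{bit/byte}} = 100\,000\ \text{byte/s}
\;\Rightarrow\; \frac{80 \times 10^{6}\ \text{clk/s}}{100\,000\ \text{byte/s}} = 800\ \text{clk/byte}
```

Under these assumptions, the byte-at-a-time path runs out of time at roughly 13,900 clocks per byte. Sustaining 1 megabaud leaves only about 800, which suggests why spreading the SPIN overhead across a whole block matters.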
PUB rxblock1(handle, ptr, num) | num1, num2, rxhead, rxtail, rxbuffer, rxmask
  ' Get local copies of the read/write indices, and return if equal
  rxhead := word[handle + rx_head]
  rxtail := word[handle + rx_tail]
  if rxhead == rxtail
    return

  ' Determine the number of bytes at the end and beginning of the buffer
  rxmask := word[handle + rx_mask]
  if rxhead => rxtail
    num1 := rxhead - rxtail
    num2 := 0
  else
    num1 := rxmask - rxtail + 1
    num2 := rxhead

  ' Limit the total number of bytes to num
  if num1 > num
    num1 := num
    num2 := 0
  elseif num1 + num2 > num
    num2 := num - num1

  ' Copy the data
  rxbuffer := word[handle + rx_buffer]
  bytemove(ptr, rxbuffer + rxtail, num1)
  bytemove(ptr + num1, rxbuffer, num2)

  ' Update the read index and return the number of bytes received
  result := num1 + num2
  word[handle + rx_tail] := (rxtail + result) & rxmask
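A C transliteration of rxblock1 may make the split easier to test outside SPIN, with memcpy standing in for bytemove. The struct rx_state and its field names are hypothetical stand-ins for the handle layout. Otherwise the logic follows the SPIN routine step by step. It makes at most two block copies, one from rxtail to the end of the buffer and one from the start. As a result, the wrap-around is handled once per call rather than once per byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the handle struct; in SPIN these are 16-bit words
   read at the byte offsets rx_head, rx_tail, rx_mask and rx_buffer. */
struct rx_state {
    uint16_t head;    /* write index, advanced by the receiver      */
    uint16_t tail;    /* read index, advanced by the reader         */
    uint16_t mask;    /* buffer size - 1 (size is a power of two)   */
    uint8_t *buffer;  /* start of the circular buffer               */
};

/* Copy up to num bytes into ptr; returns the number of bytes copied. */
static unsigned rxblock1(struct rx_state *rx, uint8_t *ptr, unsigned num)
{
    unsigned head = rx->head, tail = rx->tail, mask = rx->mask;
    unsigned num1, num2;

    assert(((mask + 1u) & mask) == 0u);  /* power-of-two buffer size */

    if (head == tail)
        return 0;                        /* buffer is empty */

    if (head >= tail) {                  /* data is contiguous */
        num1 = head - tail;
        num2 = 0;
    } else {                             /* data wraps past the end */
        num1 = mask - tail + 1;
        num2 = head;
    }

    if (num1 > num) {                    /* clamp to the caller's limit */
        num1 = num;
        num2 = 0;
    } else if (num1 + num2 > num) {
        num2 = num - num1;
    }

    memcpy(ptr, rx->buffer + tail, num1);   /* first bytemove */
    memcpy(ptr + num1, rx->buffer, num2);   /* second bytemove (may be 0) */

    unsigned result = num1 + num2;
    rx->tail = (uint16_t)((tail + result) & mask);
    return result;
}
```

For example, take an 8-byte ring holding "abcdefgh" with head = 2 and tail = 6. A call with num = 8 returns 4, copies "ghab" and leaves tail at 2.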