Jon,
Here's an optimisation for the latter part - reduces 13 longs down to 5 longs. It does expect "i" changed from an index with ALTGB into a direct pointer to the specific register:
' test flags
testb i, #7 wc ' ch17 is digital
bitc tmp, #0+(10<<5) ' if NC then 0, if C then 2047
testb i, #6 wc ' ch18 is digital
bitc tmp, #16+(10<<5) ' if NC then 0, if C then 2047
wrlong tmp, p_ch
To keep it as an index would need one more instruction:
' test flags
getbyte i, #.sbus_raw + (23 >> 2), #23 & 3 ' packet byte 23
testb i, #7 wc ' ch17 is digital
bitc tmp, #0+(10<<5) ' if NC then 0, if C then 2047
testb i, #6 wc ' ch18 is digital
bitc tmp, #16+(10<<5) ' if NC then 0, if C then 2047
wrlong tmp, p_ch
PS: Neither solution is tested.
Regarding the bit-order reversal, that seems overly complicated. The byte receive routine should be able format each received byte correctly before placing in the array. Most likely this is a case of using SHR instead of SHL when receiving the serial bits.
Heh, the first one fails for sure. It needs the ch17/ch18 TESTB bit constants corrected for longword addressing. The bit offsets will change when the bit order is changed anyway.
Thanks for the tips, Evan. I used the first suggestion and found it worked, even shaving of another 0.3 microseconds. I posted the latest version above.
Regarding the bit-order reversal, that seems overly complicated.
I'm using a smart pin UART so there is no choice in the matter. If I was bit-banging the UART, it could be done in that process (as Jason does in his P1 code).
@evanh said:
Heh, the first one fails for sure. It needs the ch17/ch18 TESTB bit constants corrected for longword addressing. The bit offsets will change when the bit order is changed anyway.
This is what I put into the code and I have verified that it works. I'm testing with a fixed set of data and have manually changed the flag bits and seen the corresponding change in the channel values array. It works.
mov tmp, #0 ' clear ch17/ch18 workspace
testb v, #7 wc ' ch17 is digital
bitc tmp, #00 + (10 << 5) ' 0 if nc, 2047 if c
testb v, #6 wc ' ch18 is digital
bitc tmp, #16 + (10 << 5) ' 0 if nc, 2047 if c
wrlong tmp, ptrb
TBH, I don't know why bitc works the way it does in this situation, but I'm very tired. I will figure it out in the morning.
Figured it out; (10 << 5) is creating a group of 11 bits that gets filled with C. Okay, I can sleep now.
Ah, yes, I see the MOV is needed. I keep forgetting the unspecified bits are not cleared.
The reason why I'm saying the first example won't work was because it wasn't meant to be preceded by a GETBYTE at all. It was supposed to operate on longwords, and could but needs more tuning.
You can shave off one more program longword by ditching the final ALTGB and hard coding the final GETBYTE as per my second example.
@JonnyMac
Happy to help. The code you have now is as I thought with one extra long saved at the end by Evan's bitc that sets a contiguous group of bits to c. Only four bits are available with a 9-bit immediate for group size - 1, enough in this case but not always so.
Another long could be saved if the position of flags in the array is fixed and if I understood the algorithm (I just had a quick look at about 1 am) I could say whether any further improvement could be made. Are you sure that bits in each byte cannot be reversed before storing in the cog array?
25 bytes of data are sent then a 10 to 20 millisecond pause before the next 25 bytes are sent. So there is this amount of time to process the data which is easy enough to do even on a P1 which is where this code was first used.
I use a lock while the data is being updated because I zero the channel value just prior to it being updated so I don't want the data to be used until I know it's valid. I could use a different variable to collect the value and then just assign it to the channel but it's nice to have all the values updated at the same time.
@TonyB_ said:
... Are you sure that bits in each byte cannot be reversed before storing in the cog array?
I have my doubts there needs to be any reversing. RS232 is lsb first and the smartpins only understand lsb-first. So unless the data is explicitly mangled by the sender, there shouldn't be any reordering needed. Just needs the SHR rxdata, #32-8 to move the byte to the least 8-bits is all.
@TonyB_ said:
... Are you sure that bits in each byte cannot be reversed before storing in the cog array?
I have my doubts there needs to be any reversing. RS232 is lsb first and the smartpins only understand lsb-first. So unless the data is explicitly mangled by the sender, there shouldn't be any reordering needed. Just needs the SHR rxdata, #32-8 to move the byte to the least 8-bits is all.
If needed, reversing the bits of each byte in a long is trivial:
@TonyB_ said:
If needed, reversing the bits of each byte in a long is trivial:
movbyts D,#%%0123
rev D
That works with single bytes. I tend to use this strategy which always works, even when the value is greater than 8 bits.
rev chval ' flip back
rol chval, #11 ' restore LSB position
Some of this may not make sense because you're not seeing the whole picture. In case you want to play and make more suggestions (which I always appreciate), here is a data set that I captured from an RC receiver.
Byte 0 is the header ($0F)
Bytes 1-22 contain packed channel values (16, each is 11 bits)
Byte 23 is the flags
-- bit0 is ch17
-- bit1 is ch18
-- bit2 is loss-of-signal
-- bit3 indicates failsafe mode (tends to be throttle channel to 0)
Byte 24 is the footer ($00)
Again, these values are from the receiver, hence are in "normal" order. In Jason Dorie's P1 S.BUS receiver he deliberately receives the bits in reverse order, I believe in support of his unpacking algorithm. If the bytes in the packet above can be quickly unpacked without having them reverse first, this is good.
These are the channel values I get from the code I posted, as well as from a Spin translation of the algorithm used in many C versions of S.BUS receiver/decoders.
@evanh said:
I have my doubts there needs to be any reversing. RS232 is lsb first and the smartpins only understand lsb-first. So unless the data is explicitly mangled by the sender, there shouldn't be any reordering needed. Just needs the SHR rxdata, #32-8 to move the byte to the least 8-bits is all.
You may have a different thought after seeing what the C programmers are doing to unpack the channel data, but I reserve the right to be wrong , and would love to see your approach.
Here's how I am receiving a packet. I'm using a 250us (> 2 bytes @ 100Kbaud, 8E2, inverted) period to locate the idle time between packets.
sbus_rx fltl rxd ' reset uart, release smart pin
.wquiet mov x, #250 ' wait for low between packets
.wq1 waitx us1
testp rxd wc
if_c jmp #.wquiet ' if activity, restart wait
djnz x, #.wq1
drvl rxd ' re-enable smart pin
mov i, #0 ' reset packet index
.get_sbus testp rxd wc ' anything waiting?
if_nc jmp #.get_sbus ' not yet
rdpin b, rxd ' read new byte
shr b, #23 ' align lsb
and b, #$FF ' strip parity bit
.flip rev b ' reverse bits
rol b, #8
altsb i, #.sbus_raw ' write to cog array
setbyte b
incmod i, #24 wc ' update array index
tjnz i, #.get_sbus ' go again if not done
I used this code to capture the packet from my receiver (see previous post), but I wrote it to a hub array instead of a cog array. The smart pin is configured by the Spin2 interface code.
Well, you were right, reversing the bytes is not necessary for speedy decompression. This is what I came up with. It shaves off a couple hundredths microseconds and may be easier to understand (eye of the beholder, I guess).
Thanks, again, for the feedback and suggestions. I'm not yet a strong PASM programmer -- but am working my way there.
Edit: Restored original code -- uses shl to move new byte into correct position and then or to incorporate into working value.
' unpack s.bus bytes to channel values
unpack mov ptrb, p_ch ' point to hub array
mov ch, #0 ' set hub index
mov i, #1 ' s.bus data index (cog)
mov tmp, #0 ' workspace
mov bc, #0 ' bits in tmp
.get_byte altgb i, #sbusraw ' get s.bus byte from cog array
getbyte v ' get next byte
add i, #1 ' bump byte array index
shl v, bc ' move into position
or tmp, v ' add to work byte
add bc, #8 ' update bit count
cmpsub bc, #11 wc ' enough bits for channel value?
if_nc jmp #.get_byte
mov out, tmp ' make a copy
and out, ##$07FF ' clean-up
wrword out, ptrb++ ' write to hub
shr tmp, #11 ' remove last value
incmod ch, #15 wc ' update hub index
tjnz ch, #.get_byte ' keep going if more channels
' get digital channels from flags byte
altgb i, #sbusraw ' get flags byte
getbyte v
mov tmp, #0 ' clear ch17/ch18 workspace
testb v, #0 wc ' ch17 is digital
bitc tmp, #00 + (10 << 5) ' 0 if nc, 2047 if c
testb v, #1 wc ' ch18 is digital
bitc tmp, #16 + (10 << 5) ' 0 if nc, 2047 if c
wrlong tmp, ptrb
I tested this and it works. Now, I'm not much of a C programmer, but I made this attempt at a translation. In your S.BUS library you are doing a bit-by-bit unpacking which is probably consuming more time than necessary. As we say here in Hollywood... for your consideration:
void unpack_sbus()
{
uint8_t i = 1; \\ s.bus index
uint8_t j = 0; \\ channel value index
uint8_t bitcount = 0;
uint32_t workspace = 0;
while (j < 16) {
while (bitcount < 11) { \\ build channel value
workspace |= (ss[i++] << bitcount); \\ get byte, add to workspace
bitcount += 8; \\ update bitcount
}
channel[j++] = uint16_t(workspace & 0x07FF); \\ write to channel
workspace >>= 11; \\ remove old value
bitcount -= 11; \\ update bitcount
}
channel[16] = (ss[23] & 0x01) ? 2047 : 0; \\ extract digital channels
channel[17] = (ss[23] & 0x02) ? 2047 : 0;
}
Well, you were right, reversing the bytes is not necessary for speedy decompression. This is what I came up with. It shaves off a couple hundredths microseconds and may be easier to understand (eye of the beholder, I guess).
Thanks, again, for the feedback and suggestions. I'm not yet a strong PASM programmer -- but am working my way there.
@JonnyMac
Thanks for the full explanation about S-BUS channel data. The rolbyte optimization is missing in your latest code.
I've written an unpacker which is about twice the size but very fast, less than a microsecond at 200MHz. I could post it tomorrow after checking (but not testing).
The rolbyte optimization is missing in your latest code.
The use of rolbyte only applies when the bits are flipped; in standard order we have to shift the new byte into position and then add it.
As the routine stands, it takes about 3.7us -- less than half a bit period. According to information I've read, S.BUS packets come every 14ms (standard speed) or 7ms (high speed). Even in high speed mode there is 4ms to unpack and update the channel values. That is to say that I'm going to live with 3-4 microseconds and not have a worry.
I like what you've done with bit-bashing around the smartpin's enable/disable. I'd never thought that would even work without first dropping back to smartpin off (M == 0). EDIT: Which means I've got an error in the I/O block diagram, because I explicitly label the IN switch as controlled by M == 0.
Real-time packet validation could be added to the receive routine. Invalid parity, header and footer could all trigger a restart of the wait-for-next-packet.
Also, the deglitcher could be employed to help too. The normal method in UARTs is 16 samples over a full bit period and majority vote. But since the deglitcher is unanimous vote we will have to restrict the window. I'd aim for 5 samples over quarter bit period. eg:
hubset( 1<<30 | 3<<7 | 2<<5 | 4 ) 'set filt3 to 5 samples at interval of 16 sysclocks/sample (window = 80 sysclocks)
pinstart( rxd, P_ASYNC_RX | P_FILT3_AB | P_SYNC_IO | P_SCHMITT_A, bitperiod, 0 ) ' use filt3 for serial receive
@evanh said:
I like what you've done with bit-bashing around the smartpin's enable/disable. I'd never thought that would even work without first dropping back to smartpin off (M == 0). EDIT: Which means I've got an error in the I/O block diagram, because I explicitly label the IN switch as controlled by M == 0.
Ha, I was right already. I've only just now tested it, having assumed before. M == 0 is the switch after all. It did make the most sense.
So, Jon, the FLTL/DRVL pair aren't working the way you wanted in the "sbus_rx" routine. The TESTP rxd will never set C.
EDIT: It is flushing any excess from the smartpin though. That's likely a good thing.
I feel like the first cut of my P2 S.BUS receiver and demo is ready to make public. Before I send to Parallax for inclusion in their library, I'd appreciate any additional feedback on the finished object.
I've tested with a SkyFly FS-IA10B set to S.BUS mode which puts out 130 packets/second (fast mode). Everything seems to be working smoothly. I've tested in Propeller Tool and with FlexGUI (set T_TYPE to T_ANSI) so it should be okay with any OS.
Thanks, again, for comments and feedback. My friend Rick does effects for the new Star Wars TV shows (Mandalorian, Book of Boba Fett, Kenobi) and this is a tool he needs for puppeteering live animatronics using the Propeller (their shop uses Arduinos for this now, but -- like me -- he doesn't care for the Arduino).
Oh, that's so easy in spin how the DAT section pasm variables can be used in spin too. Well, I'm guessing not as a variable though, more like a initial object create parameter I presume. My spin knowledge is near non-existent.
Bugger, you're right. And I've led you up the garden path too. Certainly can't go embedding maths in an assembly line for variables/registers. That only works for constants/immediates.
No worries, guys, I appreciate the feedback. I tend to focus on writing obvious code that new programmers can follow. I re-implemented the parity check. If there is a parity error, or header/footer error the lost frame flag is set.
Another reason for simplicity is back-porting it to the P1 while trying to make things as similar as is possible.
The exercise has been really good, and I appreciate the suggestions.
@evanh said:
Bugger, you're right. And I've led you up the garden path too. Certainly can't go embedding maths in an assembly line for variables/registers. That only works for constants/immediates.
@evanh said:
Bugger, you're right. And I've led you up the garden path too. Certainly can't go embedding maths in an assembly line for variables/registers. That only works for constants/immediates.
Both our code shrinks are broken.
I think my code will work. The only problem with your code was the # before sbusraw. FlexSpin can handle register arithmetic and assembles the following code correctly.
verify
getbyte t1, sbusraw+(24>>2), #(24&3) ' footer @ index 24, expecting $00
rolbyte t1, sbusraw, #0 ' header @ index 0, expecting $0f
cmp t1, #$0f wz
if_ne jmp #sbus_rx ' if no match, wait for next packet
Comments
Jon,
Here's an optimisation for the latter part - reduces 13 longs down to 5 longs. It does expect "i" changed from an index with ALTGB into a direct pointer to the specific register:
To keep it as an index would need one more instruction:
PS: Neither solution is tested.
Regarding the bit-order reversal, that seems overly complicated. The byte receive routine should be able format each received byte correctly before placing in the array. Most likely this is a case of using SHR instead of SHL when receiving the serial bits.
Heh, the first one fails for sure. It needs the ch17/ch18 TESTB bit constants corrected for longword addressing. The bit offsets will change when the bit order is changed anyway.
Thanks for the tips, Evan. I used the first suggestion and found it worked, even shaving of another 0.3 microseconds. I posted the latest version above.
I'm using a smart pin UART so there is no choice in the matter. If I was bit-banging the UART, it could be done in that process (as Jason does in his P1 code).
This is what I put into the code and I have verified that it works. I'm testing with a fixed set of data and have manually changed the flag bits and seen the corresponding change in the channel values array. It works.
TBH, I don't know why bitc works the way it does in this situation, but I'm very tired. I will figure it out in the morning.
Figured it out; (10 << 5) is creating a group of 11 bits that gets filled with C. Okay, I can sleep now.
Ah, yes, I see the MOV is needed. I keep forgetting the unspecified bits are not cleared.
The reason why I'm saying the first example won't work was because it wasn't meant to be preceded by a GETBYTE at all. It was supposed to operate on longwords, and could but needs more tuning.
You can shave off one more program longword by ditching the final ALTGB and hard coding the final GETBYTE as per my second example.
@JonnyMac
Happy to help. The code you have now is as I thought with one extra long saved at the end by Evan's bitc that sets a contiguous group of bits to c. Only four bits are available with a 9-bit immediate for group size - 1, enough in this case but not always so.
Another long could be saved if the position of flags in the array is fixed and if I understood the algorithm (I just had a quick look at about 1 am) I could say whether any further improvement could be made. Are you sure that bits in each byte cannot be reversed before storing in the cog array?
25 bytes of data are sent then a 10 to 20 millisecond pause before the next 25 bytes are sent. So there is this amount of time to process the data which is easy enough to do even on a P1 which is where this code was first used.
I use a lock while the data is being updated because I zero the channel value just prior to it being updated so I don't want the data to be used until I know it's valid. I could use a different variable to collect the value and then just assign it to the channel but it's nice to have all the values updated at the same time.
Mike
I have my doubts there needs to be any reversing. RS232 is lsb first and the smartpins only understand lsb-first. So unless the data is explicitly mangled by the sender, there shouldn't be any reordering needed. Just needs the
SHR rxdata, #32-8
to move the byte to the least 8-bits is all.If needed, reversing the bits of each byte in a long is trivial:
That works with single bytes. I tend to use this strategy which always works, even when the value is greater than 8 bits.
Some of this may not make sense because you're not seeing the whole picture. In case you want to play and make more suggestions (which I always appreciate), here is a data set that I captured from an RC receiver.
Byte 0 is the header ($0F)
Bytes 1-22 contain packed channel values (16, each is 11 bits)
Byte 23 is the flags
-- bit0 is ch17
-- bit1 is ch18
-- bit2 is loss-of-signal
-- bit3 indicates failsafe mode (tends to be throttle channel to 0)
Byte 24 is the footer ($00)
Again, these values are from the receiver, hence are in "normal" order. In Jason Dorie's P1 S.BUS receiver he deliberately receives the bits in reverse order, I believe in support of his unpacking algorithm. If the bytes in the packet above can be quickly unpacked without having them reverse first, this is good.
These are the channel values I get from the code I posted, as well as from a Spin translation of the algorithm used in many C versions of S.BUS receiver/decoders.
I am displaying the device channel numbers, not the array index.
This is the unpacking code found in most C demos for S.BUS.
My gut tells me that Jason's idea to reverse the bytes and unpack them the way he does is going to be more efficient that a PASM conversion of this.
You may have a different thought after seeing what the C programmers are doing to unpack the channel data, but I reserve the right to be wrong , and would love to see your approach.
Here's how I am receiving a packet. I'm using a 250us (> 2 bytes @ 100Kbaud, 8E2, inverted) period to locate the idle time between packets.
I used this code to capture the packet from my receiver (see previous post), but I wrote it to a hub array instead of a cog array. The smart pin is configured by the Spin2 interface code.
@TonyB_ and @evanh
Well, you were right, reversing the bytes is not necessary for speedy decompression. This is what I came up with. It shaves off a couple hundredths microseconds and may be easier to understand (eye of the beholder, I guess).
Thanks, again, for the feedback and suggestions. I'm not yet a strong PASM programmer -- but am working my way there.
Edit: Restored original code -- uses shl to move new byte into correct position and then or to incorporate into working value.
@iseries
After re-writing the S.BUS unpacking code to work with standard (non-reversed) bytes, I translated it to Spin2.
I tested this and it works. Now, I'm not much of a C programmer, but I made this attempt at a translation. In your S.BUS library you are doing a bit-by-bit unpacking which is probably consuming more time than necessary. As we say here in Hollywood... for your consideration:
The C version has not been tested.
@JonnyMac
Thanks for the full explanation about S-BUS channel data. The rolbyte optimization is missing in your latest code.
I've written an unpacker which is about twice the size but very fast, less than a microsecond at 200MHz. I could post it tomorrow after checking (but not testing).
The use of rolbyte only applies when the bits are flipped; in standard order we have to shift the new byte into position and then add it.
As the routine stands, it takes about 3.7us -- less than half a bit period. According to information I've read, S.BUS packets come every 14ms (standard speed) or 7ms (high speed). Even in high speed mode there is 4ms to unpack and update the channel values. That is to say that I'm going to live with 3-4 microseconds and not have a worry.
Looking good Jon.
I like what you've done with bit-bashing around the smartpin's enable/disable. I'd never thought that would even work without first dropping back to smartpin off (M == 0). EDIT: Which means I've got an error in the I/O block diagram, because I explicitly label the IN switch as controlled by M == 0.
Real-time packet validation could be added to the receive routine. Invalid parity, header and footer could all trigger a restart of the wait-for-next-packet.
Also, the deglitcher could be employed to help too. The normal method in UARTs is 16 samples over a full bit period and majority vote. But since the deglitcher is unanimous vote we will have to restrict the window. I'd aim for 5 samples over quarter bit period. eg:
Ha, I was right already. I've only just now tested it, having assumed before. M == 0 is the switch after all. It did make the most sense.
So, Jon, the FLTL/DRVL pair aren't working the way you wanted in the "sbus_rx" routine. The TESTP rxd will never set C.
EDIT: It is flushing any excess from the smartpin though. That's likely a good thing.
I just spoke with Chip; you're right. Not a problem for my S.BUS object as I can define the S.BUS mode as a constant.
@iseries, @TonyB_ , @evanh
I feel like the first cut of my P2 S.BUS receiver and demo is ready to make public. Before I send to Parallax for inclusion in their library, I'd appreciate any additional feedback on the finished object.
I've tested with a SkyFly FS-IA10B set to S.BUS mode which puts out 130 packets/second (fast mode). Everything seems to be working smoothly. I've tested in Propeller Tool and with FlexGUI (set T_TYPE to T_ANSI) so it should be okay with any OS.
Thanks, again, for comments and feedback. My friend Rick does effects for the new Star Wars TV shows (Mandalorian, Book of Boba Fett, Kenobi) and this is a tool he needs for puppeteering live animatronics using the Propeller (their shop uses Arduinos for this now, but -- like me -- he doesn't care for the Arduino).
Oh, that's so easy in spin how the DAT section pasm variables can be used in spin too. Well, I'm guessing not as a variable though, more like a initial object create parameter I presume. My spin knowledge is near non-existent.
Here's the deglitcher additions (untested of course): Two lines inserted between pinstart() and coginit(), and the pin mode config word.
No need to explicitly strip the parity bit, the SETBYTE does that naturally.
And here's the parity check that goes straight after the SETBYTE:
Actually, to do the deglitch fully, the same should be applied to the activity waiting too:
And code size shrink for the verify routine:
No # before sbusraw. Also bad practice I think to use wcz if c not needed.
More code shrinking:
Bugger, you're right. And I've led you up the garden path too. Certainly can't go embedding maths in an assembly line for variables/registers. That only works for constants/immediates.
Both our code shrinks are broken.
No worries, guys, I appreciate the feedback. I tend to focus on writing obvious code that new programmers can follow. I re-implemented the parity check. If there is a parity error, or header/footer error the lost frame flag is set.
Another reason for simplicity is back-porting it to the P1 while trying to make things as similar as is possible.
The exercise has been really good, and I appreciate the suggestions.
PM sent.
I think my code will work. The only problem with your code was the # before sbusraw. FlexSpin can handle register arithmetic and assembles the following code correctly.