As for 2 stop bits, that might be worth a try. I just don't know how to do that with smart pins, so I must read up on that.
IIRC, I think you just define TX as 9 bits, and align so the final sent bit is 1 - with the smart pins, you can thus define any number of stop bits this way, up to the 32b field width,
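A minimal sketch of that idea, in Python rather than PASM2 and modelling only the bit arithmetic. `tx_word_with_stop_bits` is a hypothetical helper, and it assumes the transmitter shifts LSB first and appends one stop bit of its own, so each extra stop bit is a 1 placed above the 8 data bits.

```python
def tx_word_with_stop_bits(byte, stop_bits=2):
    """Return (word, word_bits) for a TX word that yields `stop_bits` stop bits.

    Assumption: the transmitter adds one start bit and one stop bit itself
    and sends the word LSB first, so extra stop bits are 1s above the data.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be 0..255")
    extra = stop_bits - 1              # stop bits we must supply in the word
    word_bits = 8 + extra
    if extra < 0 or word_bits > 32:    # 32-bit field width limit
        raise ValueError("unsupported stop bit count")
    padding = ((1 << extra) - 1) << 8  # `extra` ones just above the data byte
    return byte | padding, word_bits


word, bits = tx_word_with_stop_bits(0x41, stop_bits=2)
print(bin(word), bits)
```

With two stop bits the word becomes 9 bits wide, which is the "define TX as 9 bits" case from the post.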
The first number is the sysclocks taken for the test, so it is negative on errors.
The number after PASS is the effective baud rate including code overhead, and the third number is the deviation in sysclocks per byte caused by that overhead.
Interesting effect, - that seems quite a few Sysclks overhead, for a modest baud rate for P2 ?
Your times :
45061683/180M = 0.25034268333333333333
45061619/180M = 0.25034232777777777778
45061723/180M = 0.25034290555555555556
Possible TX times (following usual UARTS granularity )
16k*10/691200 = 0.23148148148148148148
16k*11/691200 = 0.25462962962962962963 - hmm, you get somewhere in the middle
Equivalent Stop Bit time
16k*10.814813/691200 = 0.25034289351851851852
Expressing that as SysCLKs
(180M/691200)*0.814813 = 212.190 (not quite your 146?)
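The gap between 212 and 146 may simply come down to what "16k" means. The sketch below redoes the arithmetic for both 16000 and 16384 bytes, using the first measured time quoted above. The 10-bit frame and the two byte counts are assumptions, not something the test output states.

```python
SYSCLK = 180_000_000
BAUD = 691_200
BITS_PER_CHAR = 10  # assumed frame: start + 8 data + 1 stop


def overhead_clocks_per_byte(total_clocks, n_bytes):
    """Sysclocks per byte spent beyond an ideal back-to-back 10-bit frame."""
    ideal_clocks = n_bytes * BITS_PER_CHAR * SYSCLK / BAUD
    return (total_clocks - ideal_clocks) / n_bytes


measured = 45_061_683  # first of the three measured times, in sysclocks
for n_bytes in (16_000, 16_384):
    print(n_bytes, round(overhead_clocks_per_byte(measured, n_bytes), 2))
```

With 16000 bytes this gives about 212 sysclocks per byte, and with 16384 bytes about 146, so the reported figure is consistent with a 16384-byte test.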
Notice that the elapsed time is not a whole number of bit times. Most UARTs derive a baud clock, and every TX start aligns to that.
That means sending "UUUUUUU" gives an exact baud/2 square wave (5 pulses in 10 bit times) on most UARTs I've tested.
At 691200, you should have about a whole bit time (~130 opcodes) from char-done to load-next-char for the interrupt, and more if the P2 interrupts on TX buffer emptied.
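To see why "U" is the usual test character, here is a small sketch that builds the line levels of one frame. It assumes a plain 8N1 frame sent LSB first; `uart_frame_bits` is a hypothetical helper name.

```python
def uart_frame_bits(byte):
    """Line levels for one 8N1 frame: start (0), 8 data bits LSB first, stop (1)."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


frame = uart_frame_bits(ord("U"))  # "U" is 0x55
print(frame)

# Three frames sent back to back: the level toggles on every bit boundary.
stream = frame * 3
transitions = sum(a != b for a, b in zip(stream, stream[1:]))
print(transitions, "transitions in", len(stream), "bit times")
```

Any gap between characters would show up as a stretched high period in that otherwise even square wave, which makes it easy to spot on a scope.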
What is the exact timing of the TX interrupt ?
Does P2 reset the TX timing on every byte ? (or is it jittering between 10 & 11 bits/char)
I would have expected P2 to be able to pack bytes continually in Tx and Rx. (certainly at 691200)
It certainly needs to be able to receive bytes continually (no gaps)
Any same-COG test is going to somewhat naturally self-pace. but a 2 COG test might have skews in paths in echo ?
The smart pin serial modes can absolutely send and receive gapless data.
I thought they could/should.
Is the UART TX baud-aligned between bytes ? (ie are fractional bit times between bytes impossible ?)
How much time margin is there on the TX side, and RX side, for interrupts ?
That would be useful in the DOCs, to see exactly when the TX and RX interrupts fire, and the best way to manage normal data, and RS485 data (which needs to wait for end of stop bit, before change of direction)
Yeah, I think my measured times are not correct: they cover the time needed to send 16K and receive 16K asynchronously, plus the time to read from and write to the HUB with rdbyte/wrbyte.
I think the pins are transmitting gapless. The 1-COG driver talking to itself runs at an astonishing 90 Mbaud with a 180 MHz clock. It only breaks down when I put my echo server on another COG in between.
That might be a problem with my echo server. I just threw it together in Spin; maybe inline PASM can do better than fastspin, but I think the problem is still a foot away from the screen I am looking at.
EDIT: I should be more precise here: the pins are transmitting at 90 Mbaud with a 180 MHz clock, but my driver is not fast enough to feed them constantly, so the driver maxes out at around 70 Mbaud or so.
I was not able to even envision how to use an interrupt for sending, because when it fires at the time it can send and I have nothing to send, everything stops.
So I used INT 3 in mode #1, just firing every x clocks (currently 100) and checking whether it has something to output in its buffer and can actually output on the smart pin, for both TX1 and TX2.
The rest of the COG just takes care of the mailbox and transferring data from/to buffers and HUB.
Sadly I am running out of space and have to rethink, because I currently use the LUT as a buffer for bytes but store each byte as a long, thus wasting a lot of buffer space. Currently I have four 128-byte buffers for RX1, TX1, RX2 and TX2, but if I could address the LUT byte-wise I could have 4 1K buffers.
I just need to figure out some small way to replace wrlut x,y/rdlut x,y with a call to something that addresses bytes in the LUT. And I am at 480 longs right now …
I do have an index from 0 to buffer size for each buffer (currently in longs) and would like to access the LUT byte-wise. I have very little code space left, though I have already reused the init code space for variables.
' I want to replace all wrlut's and rdlut's used right now
'current code something like this
.rx1block cmp rx1cmd, #0 wz 'need more bytes?
if_z jmp #.done 'no - done
'
cmp rx1_head, rx1_tail wz 'byte received?
if_z ret 'no - try again don't block the rest
'
mov rx_address, rx1_tail 'adjust to buffer start
add rx_address, rx1_lut_buff 'by adding rx1_lut_buff
rdlut rx_char, rx_address 'get byte from circular buffer in lut
incmod rx1_tail, rx1_lut_btop 'increment buffer tail
wrbyte rx_char, rx1param 'write byte to Block
add rx1param, #1 'adjust Block address
_ret_ sub rx1cmd, #1 'adjust count - try again don't block the rest
'
' now I want to use rx1_head rx1_tail, rx1_lut_btop as bytes not longs as they are now
'
.rx1block cmp rx1cmd, #0 wz 'need more bytes?
if_z jmp #.done 'no - done
'
cmp rx1_head, rx1_tail wz 'byte received?
if_z ret 'no - try again don't block the rest
'
'new
* mov byte_index, rx1_tail
* and byte_index, #%11
* shl byte_index, #3 'byte position * 8 = bit offset (0, 8, 16, 24)
mov rx_address, rx1_tail 'adjust to buffer start
* shr rx_address, #2
add rx_address, rx1_lut_buff 'and adding rx1_lut_buff
rdlut rx_char, rx_address 'get long from circular buffer in lut
* shr rx_char, byte_index
* and rx_char, #$FF
'new
incmod rx1_tail, rx1_lut_btop 'increment buffer tail
wrbyte rx_char, rx1param 'write byte to Block
add rx1param, #1 'adjust Block address
_ret_ sub rx1cmd, #1 'adjust count - try again don't block the rest
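The address arithmetic in that byte-wise variant can be checked with a small Python model. This is only a sketch of the index math, not P2 code: `read_packed_byte` and the buffer base `0x80` are made up for illustration. The long address is the byte index divided by 4, and the bit offset is the byte position times 8, that is, a left shift by 3.

```python
def read_packed_byte(lut, buff_base, byte_index):
    """Fetch one byte from a buffer that packs 4 bytes into each 32-bit long."""
    long_addr = buff_base + (byte_index >> 2)  # which long holds the byte
    shift = (byte_index & 0b11) << 3           # bit offset: 0, 8, 16 or 24
    return (lut[long_addr] >> shift) & 0xFF


lut = [0] * 512               # stand-in for the 512 LUT longs
lut[0x80] = 0x44332211        # hypothetical buffer start holding 4 bytes
print([hex(read_packed_byte(lut, 0x80, i)) for i in range(4)])
```

Byte index 0 comes from the low byte of the long, so the bytes read back in the order 0x11, 0x22, 0x33, 0x44.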
Code execution from LUT is slower than code execution from RAM, and if I use alts/d + getbyte I can just as well use rdlut + getbyte, so no code space savings but slower execution?
Code execution in lutRAM is full speed with no penalties. Same as cogRAM. Only limitation is self-modifying doesn't have the flexibility of cogRAM.
RDLUT is the one that's slower. Although the biggest factor is GETBYTE can only be used upon cogRAM so any such use on data from lutRAM needs load and store operations around it.
The ALTxx prefixing instructions provide cogRAM table/buffer indexing in a very convenient package. The extra two clocks are easily made up for by their abilities.
I do know that rdlut needs 3 clocks instead of two, but I currently use all 512 LUT longs as (guess what) a look-up table, and I am reworking my code; I am down to 438 longs with long buffer addressing.
I am reworking the code to find any differences between the one-pair RX/TX and the two-pair RX/TX versions. Found some typos, but the main issue of 1 and 2 ports failing with different errors has not lifted its head to greet me.
I am slowly coming to think that the serial driver is OK but the echo server is too slow. With 4 times the buffer size in the driver it should get further, and that would prove that the issue is in the echo server.
But when I save 4 bytes as longs in my buffer not 1 byte per long I will be able to move data faster between HUB and LUT, that will make a huge difference.
Chip has said the Smart Pins can manage gapless send and receive, (at least up to some high baud speeds).
It may be that echo needs asm coding, to copy incoming Rx byte to echo-Tx ?
Gapless UART transmission requires monitoring the smart pin IN status, which is intended for event/IRQ generation. Using RDPIN can only tell when transmission has ceased.
Most of the P2 smart pin docs are rather cryptic, but they do say this:
"X[5] selects the update mode:
X[5] = 0 sets continuous mode, where a first word is written via WYPIN during reset (DIR=0) to prime the shifter. Then, after reset (DIR=1), the second word is buffered via WYPIN and continuous clocking is started. Upon shifting each word, the buffered data written via WYPIN is advanced into the shifter and IN is raised, indicating that a new output word can be buffered via WYPIN. This mode allows steady data transmission with a continuous clock, as long as the WYPIN’s after each IN-rise occur before the current word transmission is complete.
X[5] = 1 sets start-stop mode, where the current output word can always be updated via WYPIN before the first clock, flowing right through the buffer into the shifter. Any WYPIN issued after the first clock will be buffered and loaded into the shifter after the last clock of the current output word, at which time it could be changed again via WYPIN. This mode is useful for setting up the output word before a stream of clocks are issued to shift it out.
X[4:0] sets the number of bits, minus 1. For example, a value of 7 will set the word size to 8 bits.
WYPIN is used to load the output words. The words first go into a single-stage buffer before being advanced to the shifter for output. Each time the buffer is advanced into the shifter, IN is raised, indicating that a new output word can be written via WYPIN. During reset, the buffer flows straight into the shifter.
"
That does mention a separate buffer and shifter, so they should have a queue of about 1 char time, so update jitter within that window, should still give gapless transmit.
Hence this statement "This mode allows steady data transmission with a continuous clock, as long as the WYPIN’s after each IN-rise occur before the current word transmission is complete."
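Read that way, the margin for refilling the buffer is roughly one word time. The sketch below turns that into sysclocks. It is a simplified model with assumed names (`word_time_clocks`, `is_gapless`) and an assumed 10-bit frame, and it treats the service latency as a single fixed number.

```python
def word_time_clocks(sysclk_hz, baud, frame_bits=10):
    """Sysclocks needed to shift out one word (assumed start + 8 data + stop)."""
    return frame_bits * sysclk_hz / baud


def is_gapless(service_latency_clocks, sysclk_hz, baud, frame_bits=10):
    """True if the refill lands before the current word has finished shifting."""
    return service_latency_clocks < word_time_clocks(sysclk_hz, baud, frame_bits)


for baud in (691_200, 90_000_000):
    print(baud, round(word_time_clocks(180_000_000, baud), 1))
```

At 691200 baud and 180 MHz the window is about 2604 sysclocks, which leaves plenty of room. At 90 Mbaud it shrinks to 20 sysclocks, so a refill that takes longer than that cannot keep the line busy.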
Since fastspin produces PASM I did not think the echo needed hand-written asm, but I can use the serial driver directly from PASM, so that is one of the next options.
Yes, I do use events/interrupts for reading the serial RX pins, int1 for RX1 and int2 for RX2, and that seems to work flawlessly and gaplessly (as long as I can keep up reading my buffer).
but a big setback is this
error: Third operand to setbyte must be an immediate
same with getbyte. That is bad.
because now I need 4 cmp and 4 getbytes/setbytes
so
rx1_isr rdpin rx1_char, rx1_pin 'get received chr
shr rx1_char, #32-8 'shift to lsb justify
mov rx1_byte_index, rx1_head
and rx1_byte_index, #%11
mov rx1_address, rx1_head 'adjust to buffer start
shr rx1_address, #2
add rx1_address, rx1_lut_buff 'by adding rx1_lut_buff
rdlut rx1_lut_value, rx1_address
setbyte rx1_lut_value, rx1_char, rx1_byte_index
wrlut rx1_lut_value, rx1_address 'write byte to circular buffer in lut
incmod rx1_head, rx1_lut_btop 'increment buffer head
cmp rx1_head, rx1_tail wz 'hitting tail is bad
if_z incmod rx1_tail, rx1_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti1 'exit
does not compile. Will need to do
rx1_isr rdpin rx1_char, rx1_pin 'get received chr
shr rx1_char, #32-8 'shift to lsb justify
mov rx1_byte_index, rx1_head
and rx1_byte_index, #%11
mov rx1_address, rx1_head 'adjust to buffer start
shr rx1_address, #2
add rx1_address, rx1_lut_buff 'by adding rx1_lut_buff
rdlut rx1_lut_value, rx1_address
cmp rx1_byte_index, #0 wz
if_z setbyte rx1_lut_value, rx1_char, #0
cmp rx1_byte_index, #1 wz
if_z setbyte rx1_lut_value, rx1_char, #1
cmp rx1_byte_index, #2 wz
if_z setbyte rx1_lut_value, rx1_char, #2
cmp rx1_byte_index, #3 wz
if_z setbyte rx1_lut_value, rx1_char, #3
wrlut rx1_lut_value, rx1_address 'write byte to circular buffer in lut
incmod rx1_head, rx1_lut_btop 'increment buffer head
cmp rx1_head, rx1_tail wz 'hitting tail is bad
if_z incmod rx1_tail, rx1_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti1 'exit
instead?
well I am getting there, just running out of longs...
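For checking the buffer logic away from the hardware, here is a small Python model of what the ISR is meant to do: store the byte in the right lane of a packed long, advance the head with wrap-around, and push the tail forward when the buffer is full. `PackedRing` is a hypothetical class, and it assumes the wrap limit is the highest valid index, as the use of incmod with a "btop" value suggests.

```python
class PackedRing:
    """Byte ring buffer packed 4 bytes per 32-bit long; overwrites the oldest byte when full."""

    def __init__(self, size_bytes):
        self.top = size_bytes - 1              # highest valid index (wrap limit)
        self.longs = [0] * ((size_bytes + 3) // 4)
        self.head = 0
        self.tail = 0

    def _incmod(self, value):
        """Increment with wrap to 0 once the top index has been reached."""
        return 0 if value == self.top else value + 1

    def put(self, byte):
        addr, shift = self.head >> 2, (self.head & 0b11) * 8
        cleared = self.longs[addr] & ~(0xFF << shift) & 0xFFFFFFFF
        self.longs[addr] = cleared | ((byte & 0xFF) << shift)
        self.head = self._incmod(self.head)
        if self.head == self.tail:             # full: drop the oldest byte
            self.tail = self._incmod(self.tail)

    def get(self):
        if self.head == self.tail:             # empty
            return None
        addr, shift = self.tail >> 2, (self.tail & 0b11) * 8
        byte = (self.longs[addr] >> shift) & 0xFF
        self.tail = self._incmod(self.tail)
        return byte


ring = PackedRing(8)
for value in range(1, 11):                     # 10 bytes into an 8-byte ring
    ring.put(value)
```

One consequence of the head-equals-tail convention is that a buffer of N bytes holds at most N-1 unread bytes.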
The ALTGB and ALTSB instructions allow you to override the fixed third argument of GETBYTE and SETBYTE instructions. There are other similar instructions to override fixed fields of other instructions too.
EDIT: Those instructions override both the D and N fields, allowing you to access all of cogram as a word, byte, or nibble array with only two instructions per access.
rx2_isr rdpin rx2_char, rx2_pin 'get received chr
shr rx2_char, #32-8 'shift to lsb justify
mov rx2_byte_index, rx2_head
and rx2_byte_index, #%11
mov rx2_address, rx2_head 'adjust to buffer start
shr rx2_address, #2
add rx2_address, rx2_lut_buff 'by adding rx2_lut_buff
rdlut rx2_lut_value, rx2_address
' neg rx2_byte_index
' add rx2_byte_index, #4
' add rx2_byte_index, #rx2_lut_value<<2
' altsb rx2_byte_index
' setbyte 0-0, rx2_char, #0-0
cmp rx2_byte_index, #0 wz
if_z setbyte rx2_lut_value, rx2_char, #3
cmp rx2_byte_index, #1 wz
if_z setbyte rx2_lut_value, rx2_char, #2
cmp rx2_byte_index, #2 wz
if_z setbyte rx2_lut_value, rx2_char, #1
cmp rx2_byte_index, #3 wz
if_z setbyte rx2_lut_value, rx2_char, #0
'
wrlut rx2_lut_value, rx2_address 'write byte to circular buffer in lut
incmod rx2_head, rx2_lut_btop 'increment buffer head
cmp rx2_head, rx2_tail wz 'hitting tail is bad
if_z incmod rx2_tail, rx2_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti2 'exit
I want to replace the 8 lines following the commented-out altsb block to save 3 longs, but it does not work. What am I doing wrong with altsb and setbyte?
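One thing that can be checked on paper is the index arithmetic. The cmp chain maps byte index 0..3 to byte lanes 3..0, which is 3 minus the index, while neg followed by add #4 computes 4 minus the index. The Python sketch below compares the two. It assumes that ALTSB takes the byte lane from the low two bits of its index and the register from the bits above them. `altsb_target` and the register address `0x100` are made up for illustration.

```python
def altsb_target(index_value):
    """Split an ALTSB-style index into (register, byte lane).

    Assumption: the low two bits select the byte lane and the remaining
    bits select the register.
    """
    return index_value >> 2, index_value & 0b11


def lane_from_cmp_chain(byte_index):
    """Lane chosen by the cmp / if_z setbyte chain: 0->3, 1->2, 2->1, 3->0."""
    return 3 - byte_index


def index_from_commented_block(byte_index, reg):
    """neg, add #4, add #reg<<2 as in the commented-out lines."""
    return (4 - byte_index) + (reg << 2)


REG = 0x100  # hypothetical address of rx2_lut_value
for i in range(4):
    print(i, lane_from_cmp_chain(i), altsb_target(index_from_commented_block(i, REG)))
```

Under that assumption every lane is off by one compared with the cmp chain, and byte index 0 spills into the next register. Adding 3 instead of 4 would make the two agree.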
There is code in the ROM_Booter source that shuffles bytes into long, for the checksum, so you could check that ?
You could also look at
RCZR D {WC/WZ/WCZ} Rotate C,Z right through D. D = {C, Z, D[31:2]}. C = D[1], Z = D[0].
Not sure if there is any non-destructive version of that ?
Which gets 2 bits into CZ, you can test for 4 packed statements.
Or, maybe this pair can be even faster ?
DECOD D,{#}S Decode S[4:0] into D. D = 1 << S[4:0].
and
SKIPF {#}D Skip cog/LUT instructions fast per D. Like SKIP, but instead of cancelling instructions, the PC leaps over them.
Mike,
I'm not sure why the RDLUT code is there but here's all I think you need in there:
rx2_isr
rdpin rx2_char, rx2_pin 'get received chr
shr rx2_char, #32-8 'shift to lsb justify
altsb rx2_head, #rx2_buffer
setbyte rx2_char
incmod rx2_head, rx2_lut_btop 'increment buffer head
cmp rx2_head, rx2_tail wz 'hitting tail is bad
if_z incmod rx2_tail, rx2_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti2 'exit
Hi ozpropdev. Do we know which particular ALTxx instructions are broken? I think we might have been using some of them for HDMI bitbang, though there are a few variants.
Comments
I am still working on it,
Enjoy!
Mike
You don't seem to be using your board. How about I take it off your hands.
this adds 6 instructions can I do this shorter?
Enjoy
Mike
EDIT: Added the absolute addressing to the jump. Avoids a bug in Pnut.
My question was more whether there is a faster way to access bytes within a long in the LUT, something like wrlut_byte(lutaddress, byte0-3).
But GETBYTE and SETBYTE only work on cog RAM. Still, that may be faster than my current attempt; I will test.
Thanks,
Mike
confused
Mike
At least this is my current working plan.
Enjoy!
Mike
maybe I can use altd/s/I to shorten that up
Mike
I am not following you, wtf is ALTGB/ALTSB?
Mike
Thank you @Electrodude,
Mike
.
I read the docs but I am doing something wrong.
unsure,
Mike
Are those 8 lines working and you need to find a 5 instruction replacement for them?
IIRC sign extension (negative deltas)?