I found the problem. I was using the SCL instruction instead of SUB+ABS+SHR+CMP, and the values were overflowing $4000 (1.0) due to dead time between characters. It was causing frequent failures that I couldn't see from the LEDs. When I slowed things down, I could see some unexpected flickering. I got that fixed, but it cost two MAX instructions, which puts it back to the same size as before; this method is much safer, though. These overflow problems might have been giving me subtle trouble all along.
It's running very well now at 2M baud. It even runs at 3M baud if you pace the characters.
'
'
' Autobaud ISR - detects "> "
'
' falls |--7---|
' $3E -> ..10011111001..10000001001..
' highs |-5-|
'
autobaud_isr rdpin a0,#rx_tne '2 get fall-to-fall time (7x if $3E)
rdpin a1,#rx_ths '2 get high time (5x if $3E)
cmpr a0,limit wc '2 make sure both measurements are within limit
if_nc cmpr a1,limit wc '2
scl a0,norm0 '2 if they are within 1/64th of each other, $3E
if_nc cmpr a1,0 wc '2
scl a1,norm1 '2
if_nc cmpr a0,0 wc '2
if_c reti1 '2/4 if not $3E, exit
resi1 '4 got $3E, resume on next interrupt
akpin #rx_tne '2 acknowledge pin
dirl #rx_rcv '2 reset receiver
mul a0,baud0 '2 compute baud rate
setbyte a0,#7,#0 '2 set 8 bit word size
wxpin a0,#rx_rcv '2 set receiver baud rate and word size
resi1 '4 resume on next interrupt
akpin #rx_tne '2 acknowledge pin
dirh #rx_rcv '2 enable receiver before next start bit
mov t0,a0 '2 save baud rate for transmitter
mov ijmp1,#autobaud_isr '2 point back to initial ISR
reti1 '4 exit
limit long $5A04 'count limit ($5A04 = 1.4065, keeps SCL within $7FFF)
norm0 long $4100*5/7 'fall-to-fall normalization factor ($4100 = 1.015625)
norm1 long $4100*7/5 'high-time normalization factor
baud0 long $1_0000/7 'fall-to-fall baud computation factor
'
'
' Receiver ISR
'
receiver_ISR rdpin a2,#rx_rcv '2 get received chr
shr a2,#32-8 '2 shift to lsb justify
wrlut a2,head '2 enter chr into receiver buffer
incmod head,#lut_btop '2 increment buffer head
reti2 '4 exit
The SCL instruction does a signed multiply on the low words of the operands and returns a signed result whose LSB is bit 14 of the product, unlike bit 16 for SCLU (scale, unsigned). No result is written; instead, the result is substituted into the next instruction's S operand. This allows accumulation, comparison, etc., without requiring an extra register. Ariba came up with this scheme, which is pretty ingenious, I think. It works well here. We just need to limit the values so they remain positive through the multiplication.
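To make the overflow concern concrete, here is a minimal Python model of the ratio test in the ISR above. It assumes SCL behaves exactly as just described: a signed 16x16 multiply of the low words, with the result equal to the product shifted right by 14, so $4000 = 1.0. CMPR is modelled as "fail if S < D". The helper names `scl` and `ratio_ok` are illustrative only and are not P2 mnemonics.

```python
def to_s16(x):
    """Interpret the low word of x as a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def scl(d, s):
    """Model of SCL as described: signed multiply of the low words, result LSB = product bit 14."""
    return (to_s16(d) * to_s16(s)) >> 14


def ratio_ok(t_ff, t_hh, limit, norm0, norm1):
    """Mirror of the CMPR/SCL sequence: True if t_hh:t_ff is within tolerance of 5:7."""
    if t_ff > limit or t_hh > limit:   # cmpr a0,limit wc / if_nc cmpr a1,limit wc
        return False
    if scl(t_ff, norm0) < t_hh:        # scl a0,norm0 / if_nc cmpr a1,0 wc
        return False
    if scl(t_hh, norm1) < t_ff:        # scl a1,norm1 / if_nc cmpr a0,0 wc
        return False
    return True


# (limit, norm0, norm1) from the first listing (1/64) and the later listing (1/35).
SET_64 = (0x5A04, 0x4100 * 5 // 7, 0x4100 * 7 // 5)
SET_35 = (0x58E4, 0x41D4 * 5 // 7, 0x41D4 * 7 // 5)

if __name__ == "__main__":
    for limit, norm0, norm1 in (SET_64, SET_35):
        print(hex(scl(limit, norm1)))    # worst-case product allowed by the limit
    print(scl(0x9000, 0x4000))           # a count above $7FFF turns negative
    print(ratio_ok(700, 500, *SET_64), ratio_ok(700, 600, *SET_64))
```

Both limit constants keep the largest product at or below $7FFF. A raw count above $7FFF would enter the signed multiply as a negative number, which is the failure mode the limits guard against.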
It's running very well now at 2M baud. It even runs at 3M baud if you pace the characters.
Good speed numbers, but does this still require a double autobaud before RX starts? (That has sync issues, as you cannot guarantee the P2 always sees both chars of a pair.)
The ">" gives you 3 bit times at 1 stop bit and 4 bit times at 2 stop bits, which seems enough? What baud rate is possible if you drop the double-char requirement?
What UART do you use to test?
The 2MBd may be on the lucky side: I calculate that a measurement better than one part in 64 needs <= 1.5625MBd at a 20MHz SysCLK.
Does it need to be 1/64 for 5:7?
I think it can be <= 1/32 (which could reach 3MBd for a better-than-one-part-in-32 measurement?)
Also, 20 SysCLKs is 1us, which consumes 100% of the CPU when receiving 0x55 at 2MBd, and proportionally less on chars with fewer =_ (falling) edges
(or 50% of the CPU at 1MBd), which shows how much potential SHA time this super-jump-capable AutoBaud is consuming.
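The arithmetic behind these numbers can be checked quickly. This sketch makes three assumptions: the limiting measurement is the 5-bit high time of ">", the error is one clock of quantisation, and each autobaud interrupt costs about 20 clocks (the estimate above). The function names are illustrative.

```python
SYSCLK_HZ = 20_000_000


def max_baud(parts, bits_measured=5, sysclk_hz=SYSCLK_HZ):
    """Highest baud at which `bits_measured` bit times still span at least `parts` clocks,
    so that a one-clock error is no worse than 1/parts of the measurement."""
    return sysclk_hz * bits_measured / parts


def isr_load(baud, isr_clocks=20, bits_per_fall=2, sysclk_hz=SYSCLK_HZ):
    """Fraction of cog time spent in a per-falling-edge ISR; back-to-back 0x55
    characters give one falling edge every 2 bit times."""
    falls_per_second = baud / bits_per_fall
    return falls_per_second * isr_clocks / sysclk_hz


if __name__ == "__main__":
    print(max_baud(64), max_baud(32))
    print(isr_load(2_000_000), isr_load(1_000_000))
```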
A test string should have at least one 0x55, and ideally more, maybe 3 in a row ?
Another means to avoid this saturate effect, would be to skip using 0x55 in the Base64.
I already did that, expecting to use 0x55 as the tracking character.
Jmg, you are correct. Autobauding continuously is a huge bandwidth eater. I'm going back to the initial-then-maintenance technique. It is most practical that a host asserts reset, then communicates. I think I will add a 60-second terminal timeout, in case nothing happens, so that the chip will power down.
Jmg, you are correct. Autobauding continuously is a huge bandwidth eater. I'm going back to the initial-then-maintenance technique. It is most practical that a host asserts reset, then communicates.
Sounds good. I could not think of any low-overhead way to do full-range AutoBaud continually, but tracking-trim via 0x55 should be low overhead, and able to track drift rather larger than the standard baud error bands (allowing longer pauses in any download).
I think I will add a 60-second terminal timeout, in case nothing happens, so that the chip will power down.
OK. You could nudge that timeout up a little, as some 'modern networks & systems' can take quite some time to get their ducks in a row.
(E.g. someone may power-cycle something like a RasPi, which then needs to boot before it starts the link.)
Maybe 10 minutes? Or someone may have other numbers they can offer.
Jmg, you are correct. Autobauding continuously is a huge bandwidth eater. I'm going back to the initial-then-maintenance technique. It is most practical that a host asserts reset, then communicates....
What are the current command codes for half-duplex (one-pin) and ENQ for OK or ERR signaling (is the reply now one char)?
My code has assumed two AutoBaud chars, one for AutoBaud_Set_Duplex and another for AutoBaud_Set_HalfDuplex, as that simplifies the polling.
a)
repeat
Tx(AutoBaudChar) // Assumes AutoBaud echoes once the command is checked and decoded.
until (Rx=Ack)
However, there are other solutions too:
You can reserve one char for the first (raw) AutoBaud, like '@', and test carefully for just that one value. Here tLL:7, tFF:8 = valid.
Then you need some echo to confirm, and I think you mentioned some single-char ENQ commands are now there?
The Simplex command can stand alone, or it can alias with an ENQ, so a Simplex command always echoes something as confirmation.
A complementary SetDuplex char could make testing faster, and gives two similar commands.
b)
repeat
Tx(AutoBaudChar) // No echo
Tx(SetSimplexChar) // included Echo is simpler
until (Rx=Ack_Simplex)
or
c)
repeat
Tx(AutoBaudChar) // No echo
Tx(SetSimplexChar) // No echo
Tx(EnqChar) // if SetSimplex does not include echo
until (Rx=Ack)
b) is simpler than c)
This 2- and 3-char pattern dictates that those command chars are carefully chosen to avoid any possible false trigger of the AutoBaud char, at any phase of a repeated reset exit.
This leads to possible Simplex and Duplex commands, which MUST avoid the tLL:7, tFF:8 result ratios:
; 02D "-" 0b 0010 1101 :
; =======_s_/=0=.=1=_2_/=3=_4_/=5=_6_._7_/=P==T=_s_._0_._1_._2_._3_._4_._5_/=6=_7_/=P==T=_s_
; tFF 3|r 2|r 2|r 3+T|r 8|r 2+T |
; tLL 1| 1| 1| 2 | 7| 1|
; ^x ^x ^x ^x ^V ^x
; 03D "=" 0b 0011 1101 :
; =======_s_/=0=.=1=_2_/=3=.=4=.=5=_6_._7_/=P==T=_s_._0_._1_._2_._3_._4_._5_/=6=_7_/=P==T=_s_
; tFF 3|r 4|r 3+T|r 8|r 2+T |
; tLL 1| 1| 2 | 7| 1|
; ^x ^x ^x ^V ^x
; Both look OK: neither gives the 7/8 ratio in any phase.
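The "any phase" check can also be done by brute force. This sketch assumes that a measurement always begins at a falling edge, so every wake-up phase corresponds to some falling edge in the stream. It also assumes that the raw detector passes only when that edge starts a 7-bit low run and the next falling edge is 8 bits later, as described for '@'. All helper names are hypothetical.

```python
def frame(byte, stop_bits=1):
    """UART frame, LSB first: start bit (0), 8 data bits, stop bit(s) (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1] * stop_bits


def line(text, stop_bits=1):
    """One sample per bit for back-to-back characters, with idle before and after."""
    bits = [1, 1]
    for ch in text:
        bits += frame(ord(ch), stop_bits)
    return bits + [1, 1]


def intervals(bits):
    """(position, tLL, tFF) for every falling edge that has a following falling edge."""
    falls = [i for i in range(1, len(bits)) if bits[i - 1] == 1 and bits[i] == 0]
    result = []
    for a, b in zip(falls, falls[1:]):
        low = 0
        while bits[a + low] == 0:
            low += 1
        result.append((a, low, b - a))
    return result


def raw_autobaud_hits(text, stop_bits=1, t_ll=7, t_ff=8):
    """Falling edges (possible wake-up phases) that would pass a tLL:7, tFF:8 test."""
    return [pos for pos, ll, ff in intervals(line(text, stop_bits)) if (ll, ff) == (t_ll, t_ff)]


if __name__ == "__main__":
    for text in ("@-" * 4, "@=" * 4, "-=" * 4):
        for stop_bits in (1, 2):
            print(repr(text), stop_bits, len(raw_autobaud_hits(text, stop_bits)))
```

Each "@" produces exactly one hit, and "-"/"=" on their own produce none, with either 1 or 2 stop bits.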
These are easy to remember symbolically, for one and two wires...
"-" is then Set-Simplex & echo -> Set link as Simplex & report ok.err since last inquiry.
"=" is then Set-Duplex & echo -> Set link as Duplex, & report ok.err since last inquiry.
"@" is AutoBaud raw char, and a simple NOP when AutoBaud is in tracking/maintenance mode, to tolerate link latencies.
"U" (0x55) is tracking/maintenance Baud char
This is compatible with my smaller Base64 set of "0"-"9", then "A"-"w", skipping "U"
A test for this is to send a long repeating string of "@-" or "@=" to a P2 coming out of reset.
The echo should be some lesser number of OK acks.
I've got everything running at 1.75Mbaud now, at 20MHz. I hope to get it a little faster, but derating for real-world RC variance means we are still solid at 1Mbaud.
I'm using ">" as the autobaud character. It needs an initial ">>" and periodic ">" characters to maintain baud. It times out after 60 seconds (fastest case) and then powers down as much as possible. So now a reset is needed to get its attention, which would always have been a practical requirement anyway.
Realizing a need for fast bit tables, I added a new instruction: ALTB. It's like ALTD, but instead of adding D[8:0] and S/#[8:0] to get a D field for the next instruction, it adds D[13:5] and S/#[8:0], so that D can be a bit number. This works nicely with the bit instructions to make randomly-accessible bit fields which can span the entire cog register memory:
ALTB bitnum,#bitbase 'uses bitnum[13:5] as long index
TESTB 0,bitnum WC 'uses bitnum[4:0] as bit index
(use C)
bitbase res 8 'field of 256 bits
In the booter ROM, this is useful for quickly checking if we have 1 of 7 whitespace characters.
Note: To make room for ALTB, I got rid of SETBYTS which set all D bytes to S/#[7:0].
I've got everything running at 1.75Mbaud now, at 20MHz. I hope to get it a little faster, but derating for real-world RC variance means we are still solid at 1Mbaud.
I'm using ">" as the autobaud character. It needs an initial ">>" and periodic ">" characters to maintain baud. It times out after 60 seconds (fastest case) and then powers down as much as possible. So now a reset is needed to get its attention, which would always have been a practical requirement anyway.
Some issues with that:
* ">" has more high time than "@" and less sampling time, so lower precision.
* Double chars are a problem to manage: e.g. how does the host know when the P2 has come out of reset?
* The "@" has tLL = 7, tFF = 8, and gives a result at the end of the 6th bit. Is that not enough time to use a single-char AutoBaud?
Did you look at 0x55 as the periodic char to maintain baud? The checking overhead on that is far lower than on ">" or "@",
i.e. it gives you the most cycles for SHA work.
Jmg, here is the current code. Do you see any improvements possible here? It works just fine, but could maybe be faster.
'
'
' Autobaud ISR - detects initial "> "
'
' falls |--7---|
' $3E -> ..10011111001..10000001001..
' highs |-5-|
'
autobaud_isr rdpin a0,#rx_tne '2 get fall-to-fall time (7x if $3E)
rdpin a1,#rx_ths '2 get high time (5x if $3E)
cmpr a0,limit wc '2 make sure both measurements are within limit
if_nc cmpr a1,limit wc '2
scl a0,norm0 '2 if they are within 1/35th of each other, $3E
if_nc cmpr a1,0 wc '2
scl a1,norm1 '2
if_nc cmpr a0,0 wc '2
if_c reti1 '2/4 if not $3E, exit
resi1 '4 got $3E, resume on next interrupt
akpin #rx_tne '2 acknowledge pin
mul a0,baud0 '2 compute baud rate
setbyte a0,#7,#0 '2 set word size to 8 bits
wxpin a0,#rx_rcv '2 set receiver baud rate and word size
resi1 '4 resume on next interrupt
dirh #rx_rcv '2 enable receiver before next start bit
wrpin mtpe,#rx_tne '2 change rx_tne to measure positive edges
setse1 #%110<<6+rx_rcv '2 set se1 to trigger on rx_rcv high
mov t0,a0 '2 save baud rate for transmitter
resi1 '4 resume on next interrupt
'
'
' Receiver ISR - detects maintenance ">" chrs
'
' rises |--7---|
' $3E -> ..10011111001..
'
rdpin a0,#rx_tne '2 get rise-to-rise time (7x if $3E)
rdpin a1,#rx_rcv wc '2 get received chr
if_c reti1 '2/4 ignore if msb set
shr a1,#32-8 '2 shift to lsb justify
cmp a1,#">" wz '2 autobaud chr?
if_nz wrlut a1,head '2 enter chr into receiver buffer
if_nz incmod head,#lut_btop '2 increment buffer head
if_nz reti1 '4 exit
.baud mul a0,baud0 '2 autobaud chr, compute baud rate
setbyte a0,#7,#0 '2 set word size to 8 bits
wxpin a0,#rx_rcv '2 set receiver baud rate and word size
mov t0,a0 '2 save baud rate for transmitter
reti1 '4 exit
limit long $58E4 'count limit ($58E4 = 1.3889, keeps SCL within $7FFF)
norm0 long $41D4*5/7 'fall-to-fall normalization factor ($41D4 = 1.0 + 1/(7*5))
norm1 long $41D4*7/5 'high-time normalization factor
baud0 long $1_0000/7 '7x baud computation factor
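This is our reading of the baud computation in the listing above. MUL multiplies the low 16 bits of the 7-bit-time count by $1_0000/7, which leaves the bit period in clocks as a 16.16 fixed-point value. SETBYTE then overwrites the low byte with the word size minus one. `rx_x_value` is a hypothetical helper, and it does not model how the smart pin interprets the fractional bits. The example count of 80 clocks is what 7 bit times take at 20 MHz and 1.75 Mbaud.

```python
def rx_x_value(clocks, bit_times=7, data_bits=8):
    """Model of 'mul a0,baud0' followed by 'setbyte a0,#7,#0'."""
    baud0 = 0x1_0000 // bit_times            # baud0 long $1_0000/7
    x = (clocks & 0xFFFF) * baud0            # MUL: 16 x 16 -> 32 bits
    return (x & 0xFFFF_FF00) | (data_bits - 1)


def clocks_per_bit(x):
    """Whole clocks per bit, read back from the upper word."""
    return x >> 16


if __name__ == "__main__":
    # 20 MHz / 1.75 Mbaud = 11.43 clocks per bit, so 7 bit times are 80 clocks.
    x = rx_x_value(80)
    print(hex(x), clocks_per_bit(x))
```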
Jmg, here is the current code. Do you see any improvements possible here? It works just fine, but could maybe be faster.
1) I think there is just enough time to AutoBaud from a single char inside the first interrupt, on ">" and maybe on "@",
i.e. by some code-order shuffling like:
' Autobaud Raw ISR - detects initial single AutoBaud Char "> "
' Another valid command may be right behind this, so timing matters to enable RX, do RX side asap
'
' falls |--7---|
' $3E -> ..10011111001..10000001001..
' highs |-5-|
'
autobaud_raw_isr rdpin a0,#rx_tne '2 get fall-to-fall time (7x if $3E)
rdpin a1,#rx_ths '2 get high time (5x if $3E)
cmpr a0,limit wc '2 make sure both measurements are within limit
if_nc cmpr a1,limit wc '2
scl a0,norm0 '2 if they are within 1/35th of each other, $3E
if_nc cmpr a1,0 wc '2
scl a1,norm1 '2
if_nc cmpr a0,0 wc '2
if_c reti1 '2/4 if not $3E, exit
' Passes Baud Char ratio tests, so extract time information, and apply to RX
mul a0,baud0 '2 compute baud rate
setbyte a0,#7,#0 '2 set word size to 8 bits
wxpin a0,#rx_rcv '2 set receiver baud rate and word size
dirh #rx_rcv '2 enable receiver before next start bit
' Rx done first, now can complete other housekeeping ~ 28 SysCLKs, to start RX engine
' budgets: 2.5 bits, at 1.75Mbd is 28.57 SysCLKs
' ">" edges have just under 3 or 4 bits of timing margin, for 1 or 2 Stop Bits.
wrpin mtpe,#rx_f5e '2 change rx_tne mode to measure 0x55 falling edges
' etc
mov t0,a0 '2 save baud rate for transmitter
reti1 '4 exit
RX setup gets pushed up as early as possible, and any capture-mode changes can run after the RX pin is enabled.
I.e., this looks to have enough margin at your 1.75Mbd with ">" to do this in a single char.
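The margin claim can be tallied from the cycle comments in the listing above. This sketch sums the clocks from ISR entry up to and including DIRH, assuming the IF_C RETI1 is not taken and so costs 2 clocks. It compares that total with 2.5 bit times at 1.75 Mbaud and 20 MHz. Interrupt entry latency is not modelled, and the remaining margin is only about 2.6 clocks, so it may matter.

```python
SYSCLK_HZ = 20_000_000
BAUD = 1_750_000

# Clock counts from the '2 comments of autobaud_raw_isr, entry through DIRH.
CYCLES_TO_DIRH = {
    "rdpin a0": 2, "rdpin a1": 2, "cmpr a0": 2, "if_nc cmpr a1": 2,
    "scl a0": 2, "if_nc cmpr a1,0": 2, "scl a1": 2, "if_nc cmpr a0,0": 2,
    "if_c reti1": 2, "mul": 2, "setbyte": 2, "wxpin": 2, "dirh": 2,
}


def budget_clocks(bit_times, baud=BAUD, sysclk_hz=SYSCLK_HZ):
    """SysCLKs available in `bit_times` bit periods."""
    return bit_times * sysclk_hz / baud


if __name__ == "__main__":
    used = sum(CYCLES_TO_DIRH.values())
    budget = budget_clocks(2.5)
    print(used, round(budget, 2), round(budget - used, 2))
```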
2) If you test for a valid ">" before applying the AutoBaud trim, that gives a much narrower catch range, as you must be inside valid-baud limits.
In contrast, a 0x55 test need only pass the highest-frequency test, which can be something over 10%, so you can tolerate much longer pauses / drift.
I get ~15% tolerance for baud decrease, and no imposed limit on baud increase.
The code is very similar to what you have above, just with a smart pin cell mode change to X=5 edges time-capture.
Total flight time is less in the re-trim, which matters as the RX interrupt may have been late.
' Receiver ISR - detects maintenance "U" chrs, does not RxBuffer a "U"
'
' Falls 1 2 3 4 5
' $55 -> ..10101010101 Capture for 5 edges,
'
rdpin a0,#rx_tne '2 get X=5 Cycles time
wrpin ??,#?? '2 opcode to re-arm the next PinCell X=5 capture, < 5 will read 0, >= 5 is dT
cmp a0,MaintTrim wz '2 MaintTrim is time for 5 edges * ~90%, next slowest is > 20% longer
' branch here, nz is RX, z is update Baud
' Need prompt Baud update, as 0x55 can be followed by valid RX, and RI may be late due to drift. Use 2 Stop bits for highest Baud rates.
Rx:
rdpin a1,#rx_rcv wc '2 get received chr
shr a1,#32-8 '2 shift to lsb justify
...
wrlut a1,head '2 enter chr into receiver buffer
incmod head,#lut_btop '2 increment buffer head
reti1 '4 exit
.baud mul a0,baud0 '2 autobaud chr, compute baud rate; 5 =_ edges span 8 bit times, so baud0 would be $1_0000/8 here
setbyte a0,#7,#0 '2 set word size to 8 bits
wxpin a0,#rx_rcv '2 set receiver baud rate and word size
mov t0,a0 '2 save baud rate for transmitter
reti1 '4 exit
The tightest timing looks to be on slower drift, so maybe a smart pin X=5 mode can interrupt, which only occurs on 0x55, as the RX interrupt re-arms the counter (all other possible chars have 4 or fewer falling edges, so no interrupt).
This buys 1.5 more bit times for the maintenance-detect case, as the baud interrupt comes from the last data-bit fall, not mid-stop-bit.
I.e., it then codes something like this, which looks smaller and even faster on RX chars:
' Maint ISR - detects maintenance "U" chrs, does not RxBuffer a "U"
'
' Falls 1 2 3 4 5
' $55 -> ..10101010101 Capture for 5 edges,
'
Maint_ISR: ' At start of last Data bit, 1.5 bit times before Mid-Stop
rdpin a0,#rx_t5f '2 get X=5 Cycles time
' no value test needed: 5 edges between RIs is enough, as every RI re-arms the X=5 capture.
.baud mul a0,baud0 '2 autobaud chr, compute baud rate; 5 edges span 8 bit times, so baud0 would be $1_0000/8 here
setbyte a0,#7,#0 '2 set word size to 8 bits
wxpin a0,#rx_rcv '2 set receiver baud rate and word size << Re-Prime RX needs to be ASAP, for late RI cases.
mov t0,a0 '2 save baud rate for transmitter
reti1 '4 exit
' Receiver ISR - Mid Stop Bit , any 0x55 is trapped above, this code can purely receive.
Rx_ISR:
wrpin ??,#?? '2 opcode to re-arm the next PinCell X=5 capture, < 5 will read 0, >= 5 is dT
rdpin a1,#rx_rcv wc '2 get received chr
shr a1,#32-8 '2 shift to lsb justify
...
wrlut a1,head '2 enter chr into receiver buffer
incmod head,#lut_btop '2 increment buffer head
reti1 '4 exit
This also shifts the catch range: now any step up in baud (down in SysCLK) is fine, as the 0x55 interrupt occurs before the RI interrupt.
Drifts down in baud (up in SysCLK) need to react and reset before the start bit, and may get a bit tangled if Maint_ISR and RI drift into the same time space.
If Maint_ISR completes before RI (and, I guess, removes it by resetting the RX state machine via the new baud), then all it needs is for Maint_ISR to enter first.
The margin for that is ~ +15.79% in SysCLK
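The claim that only 0x55 can produce five falling edges inside one frame is easy to check exhaustively. This sketch counts falls per byte, assuming a high stop or idle bit precedes each start bit. It also shows that the five falls of 0x55 span 8 bit times. From that, we infer that the baud factor for this path would be $1_0000/8 rather than the /7 used for ">". The helper names are illustrative.

```python
def frame(byte):
    """Idle/stop bit, start bit, 8 data bits LSB first, stop bit."""
    return [1, 0] + [(byte >> i) & 1 for i in range(8)] + [1]


def falling_edges(byte):
    bits = frame(byte)
    return [i for i in range(1, len(bits)) if bits[i - 1] == 1 and bits[i] == 0]


if __name__ == "__main__":
    counts = {b: len(falling_edges(b)) for b in range(256)}
    five = [b for b, n in counts.items() if n == 5]
    edges = falling_edges(0x55)
    print([hex(b) for b in five], max(n for b, n in counts.items() if b != 0x55))
    print("span in bit times:", edges[-1] - edges[0])
```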
Re the Raspberry Pi, bootup time using the 'Ultibo' baremetal kernel is less than one second.
There's also a healthy overlap between the Ultibo and P2 communities.
Realizing a need for fast bit tables, I added a new instruction: ALTB. ...
That's cheating!
Look at how it got rid of a string of seven 'IF_NZ CMP X,#whitechr WZ' instructions that was slowing things down:
'
' Get chr after any whitespace
'
get_chr call #get_rx 'get byte into x
altb x,#whitespace 'whitespace?
testb 0,x wz
if_nz jmp #get_chr 'if whitespace, get another byte
ret
whitespace long %00000000_00000000_00100110_00000000 'cr, lf, tab
long %00100000_00000000_01000000_00000011 '"=", ".", "!", space
long %00000000_00000000_00000000_00000000
long %00000000_00000000_00000000_00000000
Now, all seven whitespace characters are detected in just two instructions. The code is even one long smaller.
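The whitespace table can be regenerated from the character list to confirm the constants. This sketch packs each character code into long code>>5 at bit code&31, which is the selection that ALTB x,#whitespace plus TESTB 0,x make for 7-bit codes. Four longs are enough because the ISR already discards characters with bit 7 set. The helper names are illustrative.

```python
WHITESPACE = "\t\n\r !.="   # tab, lf, cr, space, "!", ".", "=" as listed in the comments


def build_bit_table(chars, longs=4):
    """Pack a set of 7-bit characters into `longs` 32-bit words, one bit per character."""
    table = [0] * longs
    for ch in chars:
        code = ord(ch)
        table[code >> 5] |= 1 << (code & 31)   # long index = code[6:5], bit = code[4:0]
    return table


def is_member(table, code):
    """True if bit code[4:0] of long code>>5 is set."""
    return ((table[code >> 5] >> (code & 31)) & 1) == 1


if __name__ == "__main__":
    for word in build_bit_table(WHITESPACE):
        print(f"%{word:032b}")
```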
Jmg, I'll need to get some sleep before I'm going to understand fully what you posted. Thanks a lot for thinking about this.
Checking the ranges, I get this for 0x55 "Baud maintenance", two interrupts, 't5 Monostable' design.
* a -15.79% catch range in apparent Baud slow down (User lowers Baud, or SysCLK drifts up)
* an almost unlimited +% catch in apparent Baud increase. (you would likely have some upper sanity check ceiling ~ 2MBd? )
Inside the -15.79% drift, the earlier 0x55 INT resets the UART, so strips any pending RI (double INTs are avoided.)
Outside the -15.79% drift, in this extreme unworkable case, RI hits before 0x55 (and is certainly corrupt data), and it will reset the 0x55 path before it is read. (I think this also avoids double INTs.) This is an outside-spec zone, but it should avoid lockouts.
Note this is a much wider catch range than your original check of RX for ">", and it removes the check code from the default RX INT path.
I doubt the Osc drift, from a recent Reset/RawAutoBaud to next BaudTrim maintenance, will be anywhere near +/-15.79%, but it does mean longer pauses, or higher slews, are tolerated.
(E.g. I can imagine one use of many P2s in a T&M cycling chamber, capturing things like oscillator module frequencies. Ideally, a tray measurement does not include a re-boot, but a new tray load could need a new code load, and that would be done during a high temperature slew.)
One practical use of a nice wide up-speed ability is that you can work to the PVT 20MHz corner, sync the P2, then read the actual RC osc frequency (via a command to read your t0 above).
Anyone on a system with good baud granularity can then adjust the baud to remove the fat safety margin, and send a 0x55 then data. The P2 will change gears on the fly.
It also allows a system-wide bootup and 'who is connected' check at some lower clock speed on all parts; then, when the 'board test' part is done, you raise the UART speed in the download section by 10x+ (e.g. 115200 -> 1.5MBd) without needing a P2 reset.
Has OnSemi given you the spec margins for an uncalibrated oscillator? Was that 30%, meaning +/- 30%?
(I guess some of this will depend on how tightly you spec Vcc: +/- 10% is common, but would give wider spreads than +/- 5%.)
You may need to spec two spreads, if you want to offer a similar T,V range to the P1?
Addendum: the 0x55 capture is based on this smart pin mode:
%10011 = for X periods, count time
which I have taken as capable of capturing time over X whole periods (here X=5).
As a period measure, it needs to arm, wait for the next (=_) edge, then start timing and counting periods (+1 on each =_).
After 5 whole periods, it stores the 5P time, clears the time, and signals done, interrupting if enabled.
It does not re-arm, at least until read.
I assume some means exists to reset the monostable nature of this, e.g. a reset of the mode clears all totals and re-arms.
Comment: It is unclear whether RDPIN alone is enough to re-arm a smart pin capture, but there could be a use for an option to read with/without re-arm?
It is certainly useful to configure many pin-cells, then arm all of them in the same SysCLK.
Slight diversion :
From what I recall, the related mode %10100 = "For X periods, count states" actually means collect tHH over X whole periods.
A precision duty-cycle measurement would then use two pin cells, with the common arm mentioned above.
E.g.:
Enable PinCell to X-tFF mode (%10011)
Enable PinCell to X-tHH mode (%10100)
Issue atomic Arm command to both, they then both wait for the same-next-edge to start*
Wait for interrupt.
Precision Duty = tHH/tFF
* to avoid partial-gate & phase errors, both tHH & tFF need to wait for the same specified (next) arming-edge
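The two-cell duty measurement can be simulated on a sampled waveform. This sketch assumes one sample per clock. Both hypothetical "cells" wait for the same first falling edge after a common arm, then accumulate total clocks (tFF) and high clocks (tHH) over the same X whole periods. Because both windows start on that shared edge, there is no partial gate.

```python
def square_wave(period, high, cycles, phase=0):
    """One sample per clock: 1 for the first `high` clocks of each period."""
    return [1 if (t + phase) % period < high else 0 for t in range(period * cycles)]


def measure_from_common_arm(samples, arm_at, x_periods):
    """Both cells wait for the same first falling edge after arming, then accumulate
    total clocks (tFF) and high clocks (tHH) over x_periods whole periods."""
    falls = [t for t in range(max(arm_at, 1), len(samples))
             if samples[t - 1] == 1 and samples[t] == 0]
    start, end = falls[0], falls[x_periods]
    t_ff = end - start
    t_hh = sum(samples[start:end])
    return t_ff, t_hh


if __name__ == "__main__":
    t_ff, t_hh = measure_from_common_arm(square_wave(10, 3, 20), arm_at=7, x_periods=5)
    print(t_ff, t_hh, t_hh / t_ff)
```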
Re the Raspberry Pi, bootup time using the 'Ultibo' baremetal kernel is less than one second.
There's also a healthy overlap between the Ultibo and P2 communities.
That's another good spec-point.
I was mostly thinking about 'any old/slow/non-optimised Pi-like systems', such as may be lying about in classrooms and labs.
There, a 60s timeout inside the P2 could be a little short, based on the forum discussions.
During this time, the P2 is running no code, just waiting on interrupts: the raw AutoBaud one, or the timeout.
Icc should be modest, with the RC osc clocking one cog?
I see a 32-bit count at 20MHz is just over 3 1/2 minutes; that seems a simple solution?
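The timeout arithmetic, assuming the RC oscillator sits at its nominal 20 MHz (a slower RC would stretch these times). The sketch suggests that a 60 s timeout fits in one 32-bit count, while a 10-minute one would presumably need several wraps or a prescaler. Function names are illustrative.

```python
SYSCLK_HZ = 20_000_000


def counter_span_s(bits=32, sysclk_hz=SYSCLK_HZ):
    """Seconds until a free-running counter of `bits` bits wraps."""
    return (1 << bits) / sysclk_hz


def timeout_clocks(seconds, sysclk_hz=SYSCLK_HZ):
    """Clock count needed for a given timeout."""
    return seconds * sysclk_hz


if __name__ == "__main__":
    span = counter_span_s()
    print(f"{span:.1f} s = {span / 60:.2f} min")
    for seconds in (60, 600):
        print(seconds, timeout_clocks(seconds) < (1 << 32))
```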
Why does the timeout have to be more than a few seconds at the most? Shouldn't whatever's programming it reset it first anyway, in case it's already running something and not waiting for a program?
Why does the timeout have to be more than a few seconds at the most? Shouldn't whatever's programming it reset it first anyway, in case it's already running something and not waiting for a program?
Sure, but it is also nice for a system power outage and recovery to be able to boot normally, and that can work with no additional reset lines if you take simple care with the timing.
Given nothing is happening anyway, there is little down side to a longer timeout. The power impact looks modest, and only for a short time after a power cycle (which should be rare).
If someone does manage the reset, they are not affected.
Minor detail I notice, but a possible failure window, is that ">" has an alias case with stop-bits = 5.
That is rare, but not impossible, especially at higher baud rates.
There is a 'mirror' version of ">" in "0", which has the same tFF=7 but uses tLL=5 instead of tHH=5.
By measuring tLL, you avoid the stop-bit alias case.
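The stop-bit alias can be reproduced with a small simulation. Our chosen example is '#' (0x23) followed by five stop bits. Its bit 6 falls two bits before the stop, so the fall-to-fall window up to the next start bit is 7 bits long with 5 high bits, which matches ">". A tLL test does not see it, because stop bits can never extend a low run. This only checks the specific case named here and does not show that the tLL form is alias-free for every possible stream. The helper names are hypothetical.

```python
def frame(byte, stop_bits=1):
    """UART frame, LSB first: start bit (0), 8 data bits, stop bit(s) (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1] * stop_bits


def line(text, stop_bits=1):
    bits = [1, 1]
    for ch in text:
        bits += frame(ord(ch), stop_bits)
    return bits + [1, 1]


def fall_windows(bits):
    """(tFF, tLL, tHH) for each fall-to-fall window: one low run, then one high run."""
    falls = [i for i in range(1, len(bits)) if bits[i - 1] == 1 and bits[i] == 0]
    out = []
    for a, b in zip(falls, falls[1:]):
        t_ll = 0
        while bits[a + t_ll] == 0:
            t_ll += 1
        out.append((b - a, t_ll, b - a - t_ll))
    return out


def hits_hh(text, stop_bits=1):
    """Windows passing a '>'-style test: tFF = 7 with tHH = 5."""
    return [w for w in fall_windows(line(text, stop_bits)) if (w[0], w[2]) == (7, 5)]


def hits_ll(text, stop_bits=1):
    """Windows passing a '0'-style test: tFF = 7 with tLL = 5."""
    return [w for w in fall_windows(line(text, stop_bits)) if (w[0], w[1]) == (7, 5)]


if __name__ == "__main__":
    print(hits_hh(">"), hits_ll("0"))
    print(hits_hh("##", stop_bits=5), hits_ll("##", stop_bits=5))
```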
Look at how it got rid of a string of seven 'IF_NZ CMP X,#whitechr WZ' instructions that was slowing things down:
'
' Get chr after any whitespace
'
get_chr call #get_rx 'get byte into x
altb x,#whitespace 'whitespace?
testb 0,x wz
if_nz jmp #get_chr 'if whitespace, get another byte
ret
whitespace long %00000000_00000000_00100110_00000000 'cr, lf, tab
long %00100000_00000000_01000000_00000011 '"=", ".", "!", space
long %00000000_00000000_00000000_00000000
long %00000000_00000000_00000000_00000000
Now, all seven whitespace characters are detected in just two instructions. The code is even one long smaller.
Clever!
Does this mean you settled on 7-bit serial? I lost track of the conversation...
It's 8-bit serial, but any characters with bit 7 set are ignored at the ISR level, so they don't get through.
Minor detail I notice, but a possible failure window, is that ">" has an alias case with stop-bits = 5.
That is rare, but not impossible, especially at higher baud rates.
There is a 'mirror' version of ">" in "0", which has same tFF=7, and uses tLL=5 instead of the tHH=5
By measuring tLL, you avoid the stop-bit alias case.
Good eye. This could only blow up during initial autobaud, right? Once we sync, there's no possibility of alignment problems, assuming the baud rate doesn't change too fast.
Good eye. This could only blow up during initial autobaud, right?
Correct; it is a rare but possible combination, only on the raw baud step.
One of my pencil tests is to consider the P2 coming out of reset in any bit-slot, and with any number of stop bits.
See my post above about repeating autobaud loops, where you would send a pair of chars.
The AutoBaud filter rejects the second char, and waits until the correct phase Raw Baud char.
Then the second char passes into RX, sets one/two-pin mode, and echoes, allowing the host to sense exactly when the P2 has come out of reset.
Does it look like an opcode, timing, dual-interrupt, smart pin interface, or smart pin cell issue?
The SCL instruction does a signed multiply on the low words of the operands and returns a signed result whose LSB is bit 14 of the product, unlike bit 16 for SCLU (scale, unsigned). No result is written, but the result is substituted into the next instruction's S operand. This allows accumulation, comparison, etc, without requiring an extra register. Ariba came up with this scheme, which is pretty ingenious, I think. It works well here. We just need to limit the values so they remain positive through multiplication.
It was me, fortunately. The hardware is fine.
Good to know.
The chatter in this thread from 2012 gives values from ~10s to 2m 27s:
https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=6212
It does look like 60s is a bit light for all possible P2 + RasPi combinations, from power-on/cycle.
I'm using ">" as the autobaud character. It needs an initial ">>" and periodic ">" characters to maintain baud. It times out at 60 seconds, fastest case, and then powers down as much as possible. So, now, a reset is needed to get its attention, which would have always a practical requirement, anyway.
Realizing a need for fast bit tables, I added a new instruction: ALTB. It's like ALTD, but instead of adding D[8:0] and S/#[8:0] to get a D field for the next instruction, it adds D[13:5] and S/#[8:0], so that D can be a bit number. This works nicely with the bit instructions to make randomly-accessible bit fields which can span the entire cog register memory:
In the booter ROM, this is useful for quickly checking if we have 1 of 7 whitespace characters.
Note: To make room for ALTB, I got rid of SETBYTS which set all D bytes to S/#[7:0].
Some issues with that :
* ">" has more hi-time than "@" and has less sampling time, so lower precision.
* Double chars are a problem to manage - eg how does the host know when the P2 has come out of reset ?
* The "@" has tLL = 7, tFF =8, and gives a result on end of 6th bit - is that not enough time to use a single Char AutoBaud ?
Did you look at 0x55 as the periodic char to maintain baud, as the checking overhead on that is way less, than on ">" or "@" ?
ie it gives you the most cycles for SHA work.
1) I think there is just enough time, to AutoBaud from a single char, inside the first interrupt ? On ">" and maybe on "@"
ie by some code order shuffling like
RX setup gets pushed up earlier, as far as possible, and any Capture Mode changes can run after the RxPin is enabled.
ie This looks to have margin at your 1.75Mbd & ">" to do this in a single Char.
2) If you test for a valid ">" before applying the autobaud trim, that gives a much narrower catch range, as you must be inside valid-baud limits.
In contrast, a 0x55 test only needs to pass the highest-frequency test, whose margin can be something over 10%, so you can tolerate much longer pauses or drift.
I get ~15% tolerance on baud decrease, and no imposed limit on baud increase.
The code is very similar to what you have above, just with a Smart Pin cell mode change to X=5 edge time capture.
Total flight time is less in re-trim, which matters as the RX interrupt may have been late.
The tightest timing looks to be on drift slower, so maybe a Smart Pin X=5 mode can interrupt; that only occurs on 0x55, since the RX interrupt re-arms the counter (all other possible chars have 4 or fewer falling edges, so no interrupt).
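A brute-force check over all 256 byte values supports the "4 or less" claim. This Python model counts falling edges in one idealized 8N1 frame, assuming the line is high (idle or a previous stop bit) before the start bit; only 0x55 produces five falls (the start bit plus data bits 1, 3, 5 and 7). It models the waveform only, not the Smart Pin itself.

```python
def frame_bits(ch, stop_bits=1):
    """Line level per bit slot: start (0), data LSB first, stop (1)."""
    return [0] + [(ch >> i) & 1 for i in range(8)] + [1] * stop_bits


def falling_edges(ch):
    """Count 1->0 transitions in one frame, with the line high beforehand."""
    levels = [1] + frame_bits(ch)
    return sum(1 for a, b in zip(levels, levels[1:]) if a == 1 and b == 0)


counts = {ch: falling_edges(ch) for ch in range(256)}
print([hex(ch) for ch, n in counts.items() if n == 5])   # ['0x55']
print(max(n for ch, n in counts.items() if ch != 0x55))  # 4
```

Five falls need data falls at bits 1, 3, 5 and 7, which forces the alternating pattern, so 0x55 is the only byte that can complete a 5-edge capture.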
This buys 1.5 more bit times for the maintenance-detect case, as the baud interrupt comes from the last data-bit fall, not mid-stop-bit.
i.e. it then codes up smaller, and looks even faster on RX chars.
This also shifts the catch range: now any step up in baud (down in SysCLK) is fine, as the 0x55 interrupt occurs before the RI interrupt.
Drifts down in baud (up in SysCLK) need to react and reset before the start bit, and may get a bit tangled if Maint_ISR and RI drift into the same time slot.
If Maint_ISR completes before RI (and, I guess, removes it by resetting the RX state machine with the new baud), then it only needs Maint_ISR to enter first.
The margin for that is ~ +15.79% in SysCLK
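The 15.79% figure can be reproduced from the timing described above, assuming times in bit periods from the start-bit fall: the 0x55 capture completes at the last data-bit fall (8 bit times, which stretch with the actual, slower bit rate), while RI lands mid-stop-bit (9.5 bit times, timed by the receiver at the old baud). The constants come from that description; the function names are hypothetical.

```python
MAINT_EDGE = 8.0   # 0x55 capture done: fall of data bit 7, in actual bit times
RI_POINT = 9.5     # receiver interrupt: mid-stop-bit, in bit times at the old baud


def max_baud_drop(maint=MAINT_EDGE, ri=RI_POINT):
    """Largest fractional baud decrease for which the 0x55 interrupt still beats RI.

    If the bit period stretches by a factor k, the 0x55 edge lands at maint * k
    old bit times, so we need maint * k < ri, i.e. k < ri / maint.
    """
    return 1.0 - maint / ri


def max_period_stretch(maint=MAINT_EDGE, ri=RI_POINT):
    """The same limit expressed as an increase in bit period (e.g. SysCLK counts per bit)."""
    return ri / maint - 1.0


print(f"margin: {RI_POINT - MAINT_EDGE} bit times")                # margin: 1.5 bit times
print(f"baud may drop by up to {max_baud_drop():.2%}")              # baud may drop by up to 15.79%
print(f"bit period may grow by up to {max_period_stretch():.2%}")   # bit period may grow by up to 18.75%
```

Under this model, a 15.79% drop in apparent baud is the same limit as an 18.75% stretch of the bit period, which is how it shows up when the bit time is counted in SysCLKs.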
There's also a healthy overlap between the Ultibo and P2 communities.
That's cheating!
Then he really is trying hard enough.
Look at how it got rid of a string of seven 'IF_NZ CMP X,#whitechr WZ' instructions that were slowing things down.
Now, all seven whitespace characters are detected in just two instructions. The code is even one long smaller.
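A Python sketch of why the table wins: instead of a chain of compares, the character indexes a bit table (one long per 32 characters) and a single bit test gives the answer. The seven-character set below is a hypothetical stand-in, since the post does not list which seven whitespace characters the booter accepts, and the table covers only $00-$7F.

```python
# Hypothetical whitespace set: the six C isspace() characters plus NUL.
WHITESPACE = (0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20)


def build_bit_table(chars, longs=4):
    """Pack a character set into 32-bit longs: bit (c & 31) of long (c >> 5)."""
    table = [0] * longs
    for c in chars:
        table[c >> 5] |= 1 << (c & 31)
    return table


TABLE = build_bit_table(WHITESPACE)


def is_white_table(c):
    """One index plus one bit test, analogous to ALTB followed by a bit instruction."""
    return (c >> 5) < len(TABLE) and (TABLE[c >> 5] >> (c & 31)) & 1 == 1


def is_white_chain(c):
    """The old approach: compare against each character in turn."""
    for w in WHITESPACE:
        if c == w:
            return True
    return False


assert all(is_white_table(c) == is_white_chain(c) for c in range(256))
print(sum(is_white_table(c) for c in range(256)))  # 7
```

The lookup cost stays constant no matter how many characters are in the set, while the compare chain grows by one instruction per character.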
SETPEQ/SETPNE were replaced by SETPAT which uses the Z flag to select equal/not-equal and C to select INA/INB.
Checking the ranges, I get this for the 0x55 "baud maintenance", two-interrupt, 't5 monostable' design:
* a -15.79% catch range in apparent baud slow-down (user lowers baud, or SysCLK drifts up)
* an almost unlimited +% catch in apparent baud increase (you would likely have some upper sanity-check ceiling, ~2 MBd?)
Inside the -15.79% drift, the earlier 0x55 INT resets the UART, so it strips any pending RI (double INTs are avoided).
Outside the -15.79% drift, in this extreme, unworkable case, RI hits before 0x55 (and is certainly corrupt data), and it will reset the 0x55 path before it is read. (I think this also avoids double INTs.) This is an outside-spec zone, but it should avoid lockouts.
Note this is a much wider catch range than your original RX check for ">", and it removes the check code from the default RX INT path.
I doubt the oscillator drift from a recent reset/raw autobaud to the next baud-trim maintenance will be anywhere near +/-15.79%, but it does mean longer pauses or higher slews are tolerated.
(E.g. I can imagine one use of many P2s in a T&M cycling chamber, capturing things like oscillator module frequencies. Ideally, a tray measurement does not include a reboot, but it is possible a new-tray load could need a new-code load, and that would be done on a high-temperature slew.)
One practical use of a nice wide speed-up ability is that you can work to the PVT 20 MHz corner, sync the P2, then read the actual RCOsc (via a command to read your t0 above).
Anyone on a system with good baud granularity can then adjust the baud to remove the fat safety margin, and send a 0x55 then data. The P2 will change gears on the fly.
It also allows a system-wide bootup and 'who is connected' check at some lower clock speed on all parts; then, when the 'board test' part is done, you change the UART speed in the download section by 10x+ (e.g. 115200 -> 1.5 MBd) without needing a P2 reset.
Have OnSemi given you the spec margins for an uncalibrated oscillator? Was that 30% meaning +/- 30%?
(I guess some of this will depend on how tightly you spec Vcc: +/- 10% is common, but would give worse spans than +/- 5%.)
You may need to spec two spreads, if you want to offer a similar T,V range to P1 ?
Addendum: the 0x55 capture is based on this Smart Pin mode:
%10011 = for X periods, count time
which I have taken to mean it can capture the time over X whole periods (here X=5).
As a period measure, it needs to arm, wait for the next falling (=\_) edge, then start timing and counting periods (+1 on each =\_).
After 5 whole periods, it stores the 5P time, clears the time, signals Done, and interrupts if enabled.
It does not re-arm, at least until read.
This assumes some means exists to reset the monostable, e.g. a mode reset that clears all totals and re-arms.
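To make the assumed behavior concrete, here is a Python model of a monostable edge-time capture. It is not the documented hardware: it counts every falling edge, with the arming edge as the first, so X=5 spans four fall-to-fall periods, which on 0x55 ends at the last data-bit fall as assumed in the timing discussion. Whether the real %10011 mode counts that way, and how it re-arms, are exactly the open questions above; the class, method and helper names are hypothetical.

```python
class EdgeTimeCapture:
    """Hypothetical monostable model: wait for a fall, time until the X-th fall
    (the arming fall counts as the first), latch the result, then ignore the pin."""

    def __init__(self, x):
        self.x = x
        self.rearm()

    def rearm(self):
        """Assumed reset path (e.g. a mode reset): clear totals and wait for a new fall."""
        self.prev = 1            # assume the line idles high
        self.edges = 0
        self.time = 0
        self.done = False
        self.result = None

    def clock(self, level):
        """Feed one SysCLK sample of the pin level."""
        fall = self.prev == 1 and level == 0
        self.prev = level
        if self.done:
            return               # monostable: nothing changes until rearm()
        if self.edges:
            self.time += 1
        if fall:
            self.edges += 1
            if self.edges == self.x:
                self.result = self.time
                self.done = True


def uart_samples(ch, clocks_per_bit, idle_bits=2):
    """Idealized 8N1 waveform sampled at SysCLK: idle high, start, data LSB first, stop."""
    bits = [1] * idle_bits + [0] + [(ch >> i) & 1 for i in range(8)] + [1]
    return [b for b in bits for _ in range(clocks_per_bit)]


cap = EdgeTimeCapture(5)
for level in uart_samples(0x55, clocks_per_bit=10):
    cap.clock(level)
print(cap.done, cap.result, cap.result / 8)   # True 80 10.0

for level in uart_samples(0x55, clocks_per_bit=12):
    cap.clock(level)
print(cap.result)                             # 80: still latched, no re-arm yet
```

Dividing the latched time by 8 recovers the bit period in clocks, and the second character is ignored until `rearm()` is called.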
Comment: It is unclear if RDPIN alone is enough to re-arm a Smart Pin capture, but there could be a use for an option to read with or without re-arm?
It is certainly useful to configure many pin-cells, then arm all of them in the same SysCLK.
Slight diversion:
FWIR, the related mode %10100 (for X periods, count states) actually means collect tHH over X whole periods.
A precision duty cycle would then use two pin cells, with the common arm mentioned above.
e.g.
Enable PinCell to X-tFF mode (%10011)
Enable PinCell to X-tHH mode (%10100)
Issue atomic Arm command to both, they then both wait for the same-next-edge to start*
Wait for interrupt.
Precision Duty = tHH/tFF
* to avoid partial-gate & phase errors, both tHH & tFF need to wait for the same specified (next) arming-edge
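The point of the common arm is that both counts cover the same whole periods. A small Python model of that idea, on an idealized waveform sampled once per SysCLK, runs both accumulators from the same falling edge to the X-th following fall; the helper names and the test waveform are made up for illustration.

```python
from fractions import Fraction


def fall_indices(samples):
    """Sample indices where the level goes 1 -> 0."""
    return [i for i in range(1, len(samples)) if samples[i - 1] == 1 and samples[i] == 0]


def duty_over_periods(samples, x):
    """Return (tHH, tFF) accumulated over x whole periods that both start at the same fall."""
    falls = fall_indices(samples)
    if len(falls) < x + 1:
        raise ValueError("need x whole periods after the arming edge")
    start, stop = falls[0], falls[x]
    t_ff = stop - start               # 'X-tFF' cell: total clocks
    t_hh = sum(samples[start:stop])   # 'X-tHH' cell: clocks spent high in the same window
    return t_hh, t_ff


# Test waveform: idle high, then 10 cycles of 4 clocks low + 3 clocks high.
wave = [1, 1] + ([0] * 4 + [1] * 3) * 10
t_hh, t_ff = duty_over_periods(wave, 5)
print(t_hh, t_ff, Fraction(t_hh, t_ff))   # 15 35 3/7
```

Because both windows open and close on the same edges, no partial period leaks into either count, and the ratio is exact.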
I was mostly thinking about the 'any old/slow/non-optimised Pi-like systems', such as may be lying about in classrooms and labs.
There, a 60s timeout inside P2 could be a little short, based on the forum discussions.
During this time, P2 is running no code, just waiting on interrupts: the raw autobaud one, or the timeout.
Icc should be modest, at the RC osc into one COG ?
I see a 32-bit count at 20 MHz is just over 3 1/2 minutes; that seems a simple solution?
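The arithmetic behind that figure, in Python; the 20 MHz clock is the figure from the post, and a real RC oscillator will vary.

```python
def rollover_seconds(counter_bits, clock_hz):
    """Time for a free-running counter of the given width to wrap at the given clock."""
    return (1 << counter_bits) / clock_hz


t = rollover_seconds(32, 20_000_000)
print(f"{t:.1f} s = {t / 60:.2f} min")   # 214.7 s = 3.58 min
```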
Sure, but it is also nice for a system power outage and recovery to be able to boot normally, and that can occur with no additional reset lines, if you take simple care on the timing.
Given nothing is happening anyway, there is little down side to a longer timeout. The power impact looks modest, and only for a short time after a power cycle (which should be rare).
If someone does manage the reset, they are not affected.
A minor detail I noticed, but a possible failure window: ">" has an alias case with stop bits = 5.
That is rare, but not impossible, especially at higher baud rates.
There is a 'mirror' version of ">" in "0", which has the same tFF=7, and uses tLL=5 instead of tHH=5.
By measuring tLL, you avoid the stop-bit alias case.
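The alias can be checked with a small Python model of a repeated character stream, idealized to whole-bit timing. At every falling edge it computes tFF and either the high run that follows the low run (tHH, as for ">") or the low run itself (tLL, as for "0"), then records where in the frame the detector would accept. Phase 0 is the true start bit; anything else is a false lock. The function names are hypothetical.

```python
def frame_bits(ch, stop_bits=1):
    """Line level per bit slot: start (0), data LSB first, stop bit(s) (1)."""
    return [0] + [(ch >> i) & 1 for i in range(8)] + [1] * stop_bits


def run_length(bits, i):
    """Length of the run of equal levels starting at index i."""
    j = i
    while j < len(bits) and bits[j] == bits[i]:
        j += 1
    return j - i


def accepted_phases(ch, stop_bits, measure_low, frames=4, lead_idle=3):
    """Frame positions (0 = start bit) where tFF == 7 and tLL (or tHH) == 5."""
    frame = frame_bits(ch, stop_bits)
    bits = [1] * lead_idle + frame * frames
    falls = [i for i in range(1, len(bits)) if bits[i - 1] == 1 and bits[i] == 0]
    hits = set()
    for i, j in zip(falls, falls[1:]):        # only falls that have a following fall
        t_ff = j - i
        t_ll = run_length(bits, i)
        t_hh = run_length(bits, i + t_ll)
        second = t_ll if measure_low else t_hh
        if t_ff == 7 and second == 5:
            hits.add((i - lead_idle) % len(frame))
    return sorted(hits)


print(accepted_phases(ord(">"), 1, measure_low=False))  # [0]
print(accepted_phases(ord(">"), 5, measure_low=False))  # [0, 7]
print(accepted_phases(ord("0"), 5, measure_low=True))   # [0]
```

With five stop bits, the fall at ">" data bit 6 sees two low bits, five stop-bit highs and the next start bit, which is indistinguishable from a real start. Stop bits are always high, so they can never lengthen a low run, which is why the tLL measurement on "0" has no such alias.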
Clever!
Does this mean you settled on 7-bit serial? I lost track of the conversation...
It's 8-bit serial, but any characters with bit 7 set are ignored at the ISR level, so they don't get through.
Does this mean that the JUMP variants can wait too? Or would you still use a WAITxxx, followed by a JMP?
Good eye. This could only blow up during initial autobaud, right? Once we sync, there's no possibility of alignment problems, assuming the baud rate doesn't change too fast.
Correct, it is a rare, but possible combination, only on the Raw Baud step.
One of my pencil tests is to consider the P2 coming out of reset in any bit-slot, and with any number of stop bits.
See my post above about repeating autobaud loops, where you would send a pair of chars.
The AutoBaud filter rejects the second char, and waits until the correct phase Raw Baud char.
Then the second char passes into RX, sets one/two pin modes, and echoes, allowing the host to sense exactly when the P2 has come out of reset.
Agreed. With 0x55 the baud rate can do a quite large step-up (>10x), but step downs need to be < -15.79% decrements to track.