P2 One-Pin boot from small MCU

jmg · 2016-10-04 01:05

Test results from some code written to check out the One-Pin mode. The test MCU is an EFM8BB1, with the clock dropped to 12.25MHz to reduce power.

This is a virtual terminal serial capture of the code's output, running with the test example:

 Prop_Txt - 0 0 0 0 +/cj9v37I/YlJoD/KIBm/fD/n/3/////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////6

Measured time for dump : Sending 20 Char header and 863 code bytes as B64
101.412ms From MCU reset, at 12.25MHz

PredictTime = 101.298ms, Char-times alone, at 115566 Baud.

Predicted time for a 2048 byte image is 238.016ms, which is starting to climb.
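
The predicted figures above can be reproduced with a few lines of Python. This is only a sketch, assuming 10 bit-times per character (8N1) and 4 Base64 characters per 3 image bytes, counted fractionally; the function name and its parameters are mine, not taken from the test code.

```python
def predict_ms(image_bytes, baud, header_chars=20, bits_per_char=10):
    """Char-time-only estimate of a Prop_Txt download, in milliseconds."""
    chars = header_chars + image_bytes * 4 / 3   # Base64: 4 chars per 3 bytes
    return 1000.0 * chars * bits_per_char / baud

# The two cases quoted above, at the measured 115566 Baud
print(f"863 bytes : {predict_ms(863, 115566):.3f} ms")
print(f"2048 bytes: {predict_ms(2048, 115566):.3f} ms")
```

The results agree with the quoted 101.298ms and 238.016ms to within a microsecond or so; the last digit depends on rounding versus truncation.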

Code sizes:
Size_Total_P2_OnePin 0x8c = 140 bytes total

Modules :
Size_MCU_Init EQU 0x26 = 38 bytes
Size_Repack EQU 0x15 = 21 Bytes
Size_EncP2_64 0x1e = 30 bytes

P2 uses a slightly strange Enc64 scheme, more complex than a simple 64-character slice of ASCII, which is what bumps up Size_EncP2_64.

Speeds at possible higher Baud (340278) : PredictTime = 34.403ms, Measure 34.500ms from Reset 20+863
Reset delay to Tx is 87.1us

Assumptions: B64 will have some fractional left-overs, but it is assumed the BOOT loader only loads whole 32b pieces.
Code can be slightly simpler, if Send is 3-byte quantized, & assumes trailing fractions are simply discarded.

Outstanding questions and possible issues, from the other thread.

cgracey wrote: »

The booter waits 10ms, in any case, before responding serially.

~10ms from what ? From the end of a valid IP string, start bit, or any edge ?
What about any error messages - are they also 10ms from last-RX activity ?

With a 100ms upper limit, and a 10ms+ lower limit, plus add in reset-cap effects, and this all starts to narrow down....

How long from reset pin (rapid) rise, until the P2 can sense RX ?

What happens if a 20h char is partway through, when the P2 exits Reset ?
With two low regions, which one is used for AutoBaud ?

If the P2 sends its verbose CR+LF+"FAIL"+CR+LF in One-Pin mode, that's going to scramble the Rx and cause more fails... ?

"If an external pull-up resistor is sensed on P60 (SPI_CK):"

How is that external pull-up resistor sensed - does it drive P60 low, briefly, then read after release to see Floating ?

Typical One-Pin use would be to have the MCU pulse the P2 reset, but it then needs to wait for any RST cap to ramp, so it is unsure when to start sending.
If it starts wrong, the load will fail, and speed is quite important here.

Improvements :

If P2 sent 0xFF, or 0xFE, (at the uncal 115200?) on RXD(P63) when it was ready, the MCU could sense that, and start with best precision. It no longer has to hope.
This approach would also cover a Software Reset reboot.
With no Ready signal from P2, the MCU has no way to know it has just done a SW reset ?

What could the P2 Auto-baud up to ? Most small MCUs these days have CalOsc and can go well above 115200.
eg 345600 ? (22.1184M/64)

That could reduce the 2k load time from 238.016ms to around 80ms.

Addit: A variant of this, which would keep the P2 in passive mode more, would be to have the MCU use an ENQ/ACK scheme.

eg add $FF or $FE to the valid Char list, and echo a status byte like 0xFE for OK/Ready and 0xFF for error since last ACK

A 500us repeat rate ENQ polling loop can be done for +29 bytes

cgracey · 2016-10-04 04:41

It's not the baud rate that can't go higher, it's the auto-baud detection that is limiting. It has to review every state duration and apply measurements. It's the single-bit-period 1's and 0's that push the limit.

About these possibly-unscheduled error messages, like from a bad character... You are right that they are a source of trouble. The setup for headaches is that the Prop_Hex and Prop_Txt commands have unknown number of data in them. Like you said, we should log the error. Then, we can report it back as part of Prop_Chk's response. This gets responses isolated down to one command that is fixed in its data length. What do you think about that?

jmg · 2016-10-04 04:54

cgracey wrote: »

It's not the baud rate that can't go higher, it's the auto-baud detection that is limiting. It has to review every state duration and apply measurements. It's the single-bit-period 1's and 0's that push the limit.

This uses WAITs ? Which can resolve to 1 SysCLK ?

cgracey wrote: »

About these possibly-unscheduled error messages, like from a bad character... You are right that they are a source of trouble. The setup for headaches is that the Prop_Hex and Prop_Txt commands have unknown number of data in them. Like you said, we should log the error. Then, we can report it back as part of Prop_Chk's response. This gets responses isolated down to one command that is fixed in its data length. What do you think about that?

Yes, I think a single char ENQ char, and a single char ACK/RDY echo, is an easy way to check progress at any time.
This also allows the One-Pin MCU to sit there, pinging the P2, as it waits for Reset Caps etc to charge.
Skewed Delays in either P2, or MCU, Power-Up are also tolerated in this design.
As soon (500us) as both parties are on-line, the boot process will launch.

If the MCU idles doing this, it can also catch a P2 Soft-reset and reboot it ok.
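
A rough sketch of the MCU-side ping loop, written in Python for readability rather than as 8051 code. The status bytes 0xFE (OK/Ready) and 0xFF (error) are the ones suggested earlier in the thread; the choice of ENQ byte, the function names, and the FakeP2 stand-in are all assumptions for illustration, and the One-Pin echo handling is left out.

```python
ENQ = 0xFE          # assumed enquiry byte (the thread suggests $FF or $FE)
ACK_READY = 0xFE    # status echo suggested in the thread for OK/Ready
ACK_ERROR = 0xFF    # status echo suggested in the thread for error since last ACK

def wait_for_p2(send_byte, read_byte, max_polls=1000):
    """Ping until a status byte comes back; return (status, polls_used)."""
    for poll in range(1, max_polls + 1):
        send_byte(ENQ)
        reply = read_byte()      # None: nothing arrived in this 500us slot
        if reply in (ACK_READY, ACK_ERROR):
            return reply, poll
    return None, max_polls

class FakeP2:
    """Hypothetical stand-in for a P2 that stays silent while in reset."""
    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after
    def send(self, byte):
        self.polls += 1
    def read(self):
        return ACK_READY if self.polls > self.ready_after else None

p2 = FakeP2(ready_after=7)           # silent for the first 7 pings
status, polls = wait_for_p2(p2.send, p2.read)
print(hex(status), polls)
```

The same loop, left running after the download, is what would let the MCU notice a P2 soft-reset and reload it.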

Cluso99 · 2016-10-04 05:59

Personally, I'd rather have an I2C boot option than 1pin serial.

Also, the P2 should be passive (ie not output on P62/SO) until it receives a sequence on P63(SI).
This also means the boot code should wait and verify a pulldown/start/break condition is longer (ie not like the P1, which just does a simple bit test). This would mean there would be a limit to the minimum baud detected/used.

jmg · 2016-10-04 06:14

Cluso99 wrote: »

Personally, I'd rather have an I2C boot option than 1pin serial.

One Pin comes almost for free. It adds some simple ignore-my-own-echo handling.
Costs about 9 bytes at the MCU end.

Of course, I'm fine with One-Pin and i2c

Cluso99 wrote: »

Also, the P2 should be passive (ie not output on P62/SO) until it receives a sequence on P63(SI).

Err, yes - this is exactly what ENQ / ACK does ?

Cluso99 wrote: »

This also means the boot code should wait and verify a pulldown/start/break condition is longer (ie not like the P1, which just does a simple bit test). This would mean there would be a limit to the minimum baud detected/used.

Yes, Chip already has Autobaud implemented ?

jmg · 2016-10-04 22:36

Baud management maths, and possible choices if the host can read P2_Osc via an ENQ variant.

Examples of ENQ variant, that reports the AutoBaud capture value. (maybe a simple 2 byte binary reply is fine ?)
This will report numbers like

Baud     Capture.4bT   P2_Osc Precision (LSB)
2400     33333         30ppm
9600      8333         120ppm
115200     694         0.144%
460800     174         0.576%
691200     116         0.864%

Baud     Capture.8bT   P2_Osc Precision (LSB)
9600     16667         60ppm
115200    1389         0.072%
460800     347         0.288%
691200     231         0.432%
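
The capture numbers in both tables follow from counting P2 oscillator cycles across 4 or 8 bit-times. A short sketch, assuming the 20MHz nominal P2 osc that the baud grouping maths further down uses (function names are mine):

```python
P2_OSC_HZ = 20_000_000   # assumed nominal P2 RC oscillator

def capture(baud, bit_times, osc_hz=P2_OSC_HZ):
    """Oscillator counts seen across `bit_times` bit periods at `baud`."""
    return round(bit_times * osc_hz / baud)

def lsb_ppm(count):
    """One count of resolution, as parts per million of the capture."""
    return 1e6 / count

for baud in (2400, 9600, 115200, 460800, 691200):
    c4, c8 = capture(baud, 4), capture(baud, 8)
    print(f"{baud:>7}  4bT={c4:>6} ({lsb_ppm(c4):7.0f}ppm)  8bT={c8:>6} ({lsb_ppm(c8):7.0f}ppm)")
```

Doubling the capture window from 4 to 8 bit-times halves the LSB step, which is the whole difference between the two tables.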

; 'Baud grouping' is also possible to avoid worst-case offsets/bad pairings at higher baud rates.
; N=25;;  N=N+1;24M/(20M/N)  N=25;;  N=N+1;24M/(1.10*20M/N)  N=25; N=N+1;24.5M/2/(1.10*20M/N) N=25; N=N+1;24.5M/2/(0.92*20M/N)
; 26 = 31.2                    26 = 28.363                      26 = 14.477                   26 = 17.309
; 27 = 32.4                    27 = 29.454                      27 = 15.034                   27 = 17.975
; 28 = 33.6                    28 = 30.545                      28 = 15.590                   28 = 18.641
; 29 = 34.8                    29 = 31.636                      29 = 16.147                   29 = 19.307
; 30 = 36                      30 = 32.727                      30 = 16.704                   30 = 19.972
; 31 = 37.2                    31 = 33.818                      31 = 17.261                   31 = 20.638
; 32 = 38.4                    32 = 34.909                      32 = 17.818                   32 = 21.304
; 33 = 39.6                    33 = 36                          33 = 18.375                   33 = 21.970
; 34 = 40.8                    34 = 37.090                      34 = 18.931                   34 = 22.635
; 35 = 42                      35 = 38.181                      35 = 19.488                   35 = 23.301
; 36 = 43.2                    36 = 39.272                      36 = 20.045                   36 = 23.967
; 37 = 44.4                    37 = 40.363                      37 = 20.602                   37 = 24.633
; 38 = 45.6                    38 = 41.454                      38 = 21.159                   38 = 25.298
; 39 = 46.8                    39 = 42.545                      39 = 21.715                   39 = 25.964
; 40 = 48                      40 = 43.636                      40 = 22.272                   40 = 26.630
; 41 = 49.2                    41 = 44.727                      41 = 22.829                   41 = 27.296
; 42 = 50.4                    42 = 45.818                      42 = 23.386                   42 = 27.961
; 43 = 51.6                    43 = 46.909                      43 = 23.943                   43 = 28.627
; 44 = 52.8                    44 = 48                          44 = 24.5                     44 = 29.293
; 45 = 54                      45 = 49.090                      45 = 25.056                   45 = 29.959
;                              46 = 50.181                      46 = 25.613                   46 = 30.625
;                                                               47 = 26.170                   47 = 31.290
; Example of Selective Baud choice - EFM8BB1, and 10% variant P2 osc                          
;  N=36;24.5M/2/(1.10*20M/N) = 20.045                                                         
;  24.5M/20/2 = 612500                                                                        
;  (1.10*20M/ans) = 35.918                                                                    
;  (1.10*20M/36)  = 611111.11                                                                 
;   1-ans/612500  = 0.226 % Baud                                                              
;
; 27 = 15.034 (P2 + 10%)
; MCUb=24.5M/2/15   = 816666.6
; P2b=(1.10*20M/27) = 814814.8
; 100*(1-MCUb/P2b)  = -0.227%

; 30 = 19.972 (P2 -8%)
; MCUb=24.5M/2/20 = 612500
; P2b=(0.92*20M/30) P2b = 613333.33
; 100*(1-MCUb/P2b)  Err = 0.135%
;
; and a 'bad choice/best avoided' example
; 28 = 15.590
; MCUb=24.5M/2/16   MCUb = 765625
; P2b=(1.10*20M/28)  P2b = 785714.2857142857142857143
; 100*(1-MCUb/P2b)   ans = 2.55%
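
The pairing checks above are easy to automate. This sketch assumes the MCU picks the nearest integer divisor of 24.5M/2 for whatever baud the P2 lands on; the function name and the `p2_scale` parameter (1.10 for +10%, 0.92 for -8%) are my own labels.

```python
MCU_CLK = 24.5e6 / 2      # EFM8BB1 at SysCLK/2; baud = MCU_CLK / divisor
P2_NOMINAL = 20e6         # nominal P2 RC osc, as in the table above

def pairing(n, p2_scale):
    """For P2 divider n, return (mcu_divisor, mcu_baud, p2_baud, error_percent)."""
    p2_baud = p2_scale * P2_NOMINAL / n
    mcu_div = round(MCU_CLK / p2_baud)        # nearest integer MCU divisor
    mcu_baud = MCU_CLK / mcu_div
    return mcu_div, mcu_baud, p2_baud, 100 * (1 - mcu_baud / p2_baud)

for n, scale in ((27, 1.10), (30, 0.92), (28, 1.10)):
    div, mcu_b, p2_b, err = pairing(n, scale)
    print(f"N={n} x{scale}: MCU /{div} = {mcu_b:.1f}  P2 = {p2_b:.1f}  err = {err:+.3f}%")
```

Sweeping `n` over the table range and keeping only pairings under some error limit (say 0.5%) would give the 'Baud grouping' list directly.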

jmg · 2016-10-05 08:47

Test results, pushing up BAUD.... device EFM8BB1

For these tests the MCU is run at SysCLK/2 to save power, as even this is expected to be faster than the P2 can manage.

                    ;              Predict     Measure(incl Reset.Init)
;BAUDr  EQU 115200    ; Target  /2   107494      107755
;BAUDr  EQU 460800    ; Target  /2    26366       26515    d149
;BAUDr  EQU 691200    ; Target  /2    18253       18390    d137
;BAUDr  EQU 1020000   ; Target  /2    12169       12297    d128

BAUDr  EQU 1225000   ; Target  /2     10140       10600    d460  Original 5-step 64B encode  SW showing @ 1.225MBd, 24.5/2
                     ;                10140       10265    d125, with in-line MACRO 64B, 2 Steps
;BAUDr  EQU 1531250    ; Target /2     8112        9091    d979    SW showing @ 1.53MBd, 24.5/2

At /1, this would be 2.45MBd as a sustained update loader.

Original P2 64B rules are complex :

; ~~~~~~~~~~~~~~~~~~ P2 Base64 Encode rules ~~~~~~~~~~~~~~~~
; More complex than a simple 64 byte ASCII subsection, so needs more code than a single AND.ADD
; "A".."Z" = $00..$19
; "a".."z" = $1A..$33
; "0".."9" = $34..$3D
; "+" = $3E              ASCII $2B
; "/" = $3F              ASCII $2F
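
To show why these rules cost more than a single AND + ADD, here is a Python sketch of the encode. The four-range mapping follows the rules listed above; the repack order (first byte in the top bits of a 24-bit group) is the usual Base64 convention and is assumed here rather than confirmed for the P2 booter.

```python
def enc6(v):
    """Map one 6-bit value to its character, following the rules listed above."""
    if v < 0x1A:
        return chr(ord("A") + v)            # $00..$19
    if v < 0x34:
        return chr(ord("a") + v - 0x1A)     # $1A..$33
    if v < 0x3E:
        return chr(ord("0") + v - 0x34)     # $34..$3D
    return "+" if v == 0x3E else "/"        # $3E, $3F

def enc_triplet(b0, b1, b2):
    """Repack 3 bytes into 4 six-bit values, most significant bits first."""
    word = (b0 << 16) | (b1 << 8) | b2
    return "".join(enc6((word >> shift) & 0x3F) for shift in (18, 12, 6, 0))

print(enc_triplet(0xFF, 0xFF, 0xFF))
```

Three 0xFF bytes encode to four "/" characters, which is consistent with the long run of "/" in the capture at the top of the thread if the image is padded with 0xFF.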

So I made a simpler, faster 64B that uses

;  Simpler Base64 Encode 
0x00     -> "A"
0x39     -> "z"
0x3a     -> "0"
0x3f     -> "5"

This still leaves 0x20..0x2F as control chars, and also '6'..'@', so '?' is available for AutoBAUD purposes.
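
As a sketch of that simpler mapping (Python for clarity, the real thing being a compare and an ADD on the MCU): values 0x00..0x39 are a straight offset from "A", so the range runs through the punctuation between "Z" and "a", and the last six values drop down to "0".."5". The decode function is a hypothetical P2-side inverse, not something from the booter.

```python
def enc6_simple(v):
    """6-bit value -> char: 0x00..0x39 map to 'A'..'z', 0x3A..0x3F to '0'..'5'."""
    if v < 0x3A:
        return chr(0x41 + v)          # one ADD covers 'A' through 'z'
    return chr(0x30 + v - 0x3A)       # remaining six values land on '0'..'5'

def dec6_simple(ch):
    """Hypothetical inverse mapping for the receiving side."""
    c = ord(ch)
    return c - 0x41 if c >= 0x41 else c - 0x30 + 0x3A

alphabet = "".join(enc6_simple(v) for v in range(64))
print(alphabet)
```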

Using this drops the creepage of d460 at 1.225MBd back to d125, in line with the reset-region delays seen at the lower Baud rates.

jmg · 2016-10-17 20:17

Speed updates : Targeting cheapest small MCUs ...

* SiLabs cores can reach close to 3MBd, doing the read and (modified) b64; module size is 66 bytes.
SiLabs Baud is 24.5M/2N

* Nuvoton 4T part can reach its fastest UART setting of 1.3824MBd, with an unrolled loop, in 131 bytes.
Nuvoton Baud is 22.1184M/16N

b64 now "0"-"9" then "A" to "w" with "U" skipped. Still much smaller and faster than original b64.
This mapping also leaves a number of useful AutoBAUD characters available.
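
A quick Python check that this character set really comes to 64 entries. The order of value assignment (0 -> "0", then upward through the set) is my assumption; only the set of characters is stated above.

```python
# "0".."9", then "A".."w" with "U" left out
ALPHABET = [chr(c) for c in range(ord("0"), ord("9") + 1)]
ALPHABET += [chr(c) for c in range(ord("A"), ord("w") + 1) if chr(c) != "U"]

def enc6_skip_u(v):
    """6-bit value -> character, under the assumed ordering."""
    return ALPHABET[v]

print(len(ALPHABET), "".join(ALPHABET))
```

Ten digits plus the 55 characters from "A" to "w", less "U", gives exactly 64, and neither "@" nor "x" is used for data.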

( eg "@" and "x" both have tFF of 8, which likely fits the new fractional Baud hardware better.)

Skip 0x55/"U" allows that special case char to be used for Autobaud-Track via the Smartpins.

In X-edge timing mode, the smart pins can simply catch the unique Highest Freq signature of 0x55, and can re-trim the Baud to allow for RC Osc drift during the download.
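
The edge properties of these characters can be checked from the 8N1 frame shape. In this sketch I read tFF as the falling-edge to falling-edge time in bit periods, which is my interpretation of the shorthand; function names are mine.

```python
def line_levels(byte):
    """Idle-high 8N1 frame, LSB first: idle, start bit, 8 data bits, stop bit."""
    return [1, 0] + [(byte >> i) & 1 for i in range(8)] + [1]

def falling_edges(byte):
    """Bit-time positions of high-to-low transitions, counted from the start bit."""
    lv = line_levels(byte)
    return [i - 1 for i in range(1, len(lv)) if lv[i - 1] == 1 and lv[i] == 0]

def edge_count(byte):
    """Total transitions in the frame, rising and falling."""
    lv = line_levels(byte)
    return sum(1 for i in range(1, len(lv)) if lv[i] != lv[i - 1])

for ch in "@x":
    print(ch, falling_edges(ord(ch)))
print("U", falling_edges(0x55), edge_count(0x55))
```

Both "@" and "x" show exactly two falling edges, 8 bit-times apart, while 0x55 toggles on every bit, so it is the only byte that produces the maximum edge count.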
