Prop2 ROM code

cgracey · 2016-10-27 10:36

jmg wrote: »

cgracey wrote: »

I've got to go to bed now, since I need to be up in the morning - just so you don't wonder why my messaging stopped. I'll be here for another five minutes, though, in case you want to write anything quickly.

No, we are good now. Sleep is a priority

I really like the idea of reset and then immediate serial blasting, looking for responses. That simplifies the host software in some ways (reliability) and minimizes the overall download time.

jmg · 2016-10-27 10:39

cgracey wrote: »

I really like the idea of reset and then immediate serial blasting, looking for responses. That simplifies the host software in some ways (reliability) and minimizes the overall download time.

Yup. That's what my MCU design essentially does.
Simpler for the user, but the designer gets more grey hairs in the need to check for mid-char reset exit effects etc.....
Still, that is done only once, and there are a great many users out there...

cgracey · 2016-10-27 10:40

jmg wrote: »

cgracey wrote: »

Another cog, yes, but we've got people using DE0-Nano's and BeMicro boards that only fit one cog, so I've made do with one cog for the loader. Plus, it leaves the door open to single-cog chip implementations. (I know, no point)

Could perhaps the loader be smart enough to use 2, if >= 2 exist and 1 if only 1 exists ?

Could be made to work that way. We still couldn't get concurrent SHA-256/HMAC and 2Mbaud download, though. Maybe better to just keep it simple.

cgracey · 2016-10-27 10:44

jmg wrote: »

cgracey wrote: »

I really like the idea of reset and then immediate serial blasting, looking for responses. That simplifies the host software in some ways (reliability) and minimizes the overall download time.

Yup. That's what my MCU design essentially does.
Simpler for the user, but the designer gets more grey hairs in the need to check for mid-char reset exit effects etc.....
Still, that is done only once, and there are a great many users out there...

Yes, that's a puzzle of a strange sort. Akin to finding your way out of a house of mirrors.

jmg · 2016-10-27 10:44

cgracey wrote: »

Could be made to work that way. We still couldn't get concurrent SHA-256/HMAC and 2Mbaud download, though. Maybe better to just keep it simple.

Agreed.
I'll ponder some more on ways to bump the baud speed. The focus before was on smallest-cycle counts RX, assuming SHA was sharing.
If download is done first, faster baud does still buy some speed, and nudging the nominal 20MHz RC Osc up to a 20MHz min spec will buy more

cgracey · 2016-10-27 10:50

jmg wrote: »

cgracey wrote: »

Could be made to work that way. We still couldn't get concurrent SHA-256/HMAC and 2Mbaud download, though. Maybe better to just keep it simple.

Agreed.
I'll ponder some more on ways to bump the baud speed. The focus before was on smallest-cycle counts RX, assuming SHA was sharing.
If download is done first, faster baud does still buy some speed, and nudging the nominal 20MHz RC Osc up to a 20MHz min spec will buy more

I thought about making the whole thing event-driven, where the main code does a WAITINT and the received byte is ready on the next instruction. That would eliminate the circular buffer reading and writing, and the polling. There are a few junctures, though, where it's nice to have that shock-absorbing buffer. As it is, 2Mbaud is good. That's as fast as many FTDI chips can practically go in my experience, anyway. I think we are there. Maybe just some sanity checks and tweaks on the initial serial connection.

jmg · 2016-10-27 19:23

cgracey wrote: »

I thought about making the whole thing event-driven, where the main code does a WAITINT and the received byte is ready on the next instruction. That would eliminate the circular buffer reading and writing, and the polling. There are a few junctures, though, where it's nice to have that shock-absorbing buffer.

I'd agree to keep the interrupts, as there is significant re-grouping code going on in the background, and all bytes do not require the same time.
P2 has relatively low interrupt overheads, compared with other MCUs

cgracey wrote: »

As it is, 2Mbaud is good. That's as fast as many FTDI chips can practically go in my experience, anyway. I think we are there. Maybe just some sanity checks and tweaks on the initial serial connection.

Yup, the Baud Raw code looks pretty close, and the buffer handling is good.
The resi1 opcode is nifty

I would still suggest trialling the 0x55 Baud-trim case, as that has significant advantages in drift range tolerance, which is why this ReTrim is there.
It also has better capture precision, and every bit helps when you push speeds.

Basically, 0x55 support just adds a small Trim_INT: code stub, and shifts the trim-code to there, with a minor edit for 8B vs 7B,
& you skip 0x55 in the Base64, which I see is Table-based, so that is no change in code-size.

Most of the work is done in the SmartPin hardware, the Trim_INT only occurs when a 0x55 arrives.
One line of code in the RxINT, resets the monostable effect of the 0x55 capture. All quite minor tweaks.

This is what I assumed about the SmartPin abilities, in analyzing this :

the 0x55 capture, is based on this Smart Pin mode
%10011 = for X periods, count time
which I have taken as capable of capturing time, over X-Whole-Periods. (here X=4 or 5 if you include the start edge ?)

Operation assumed:
As a period measure, it needs to Arm, wait for the next (=\_) edge then start timing, and counting periods (+1 on each =\_).
After 4 whole periods, it stores the 8B time, Clears time, and signals Done, & interrupts if enabled.
It does not re-arm, at least until read.
Assumes some means exists to re-set the monostable nature of this, eg Reset of Mode, clears all totals, and rearms.

0x55 "U" :        S       1       2       3       4 
xxxxxxxxxxxx=======\_s_/=0=\_1_/=2=\_3_/=4=\_5_/=6=\_7_/=P==T=
            ^ReArm                                 ^- Capture Interrupt, updates Baud Divider, resets Rx
                                                         ^- RxINT - Trim Smart Pin ReArms here
Timer Gate x\______/===============================\__________
TimerValue 0000000012345..
                                                  ^TimerCapture & Clear, X Count reached
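
As a rough check of the arithmetic behind this capture, here is a small Python sketch. It assumes the Smart Pin counts sysclks across 4 whole periods (ie 8 bit times of 0x55) and a nominal 20MHz sysclk. The function names and the 16.16 fixed-point form are illustrative assumptions, not ROM code.

```python
SYSCLK_HZ = 20_000_000  # nominal RC-oscillator clock discussed in this thread


def clocks_per_bit(capture_count, bit_times=8):
    """Sysclks per bit, given a count taken over `bit_times` bit periods."""
    return capture_count / bit_times


def baud_from_capture(capture_count, sysclk_hz=SYSCLK_HZ):
    """Baud rate implied by a 4-period (8 bit-time) capture of 0x55."""
    return sysclk_hz / clocks_per_bit(capture_count)


def fixed_point_period(capture_count, bit_times=8):
    """Bit period as 16.16 fixed point: count * (65536 // bit_times).

    Assumption: the receiver wants clocks-per-bit in the upper 16 bits.
    """
    return capture_count * (0x1_0000 // bit_times)


if __name__ == "__main__":
    for count in (79, 80, 81):
        print(count, round(baud_from_capture(count)), hex(fixed_point_period(count)))
```

Under these assumptions a capture of 80 sysclks corresponds to 10 sysclks per bit, ie 2MBd, and each count either side of 80 moves the result by about 1.25%.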

cgracey · 2016-10-27 22:30

jmg wrote: »
cgracey wrote: »

I thought about making the whole thing event-driven, where the main code does a WAITINT and the received byte is ready on the next instruction. That would eliminate the circular buffer reading and writing, and the polling. There are a few junctures, though, where it's nice to have that shock-absorbing buffer.

I'd agree to keep the interrupts, as there is significant re-grouping code going on in the background, and all bytes do not require the same time.
P2 has relatively low interrupt overheads, compared with other MCUs

cgracey wrote: »

As it is, 2Mbaud is good. That's as fast as many FTDI chips can practically go in my experience, anyway. I think we are there. Maybe just some sanity checks and tweaks on the initial serial connection.

Yup, the Baud Raw code looks pretty close, and the buffer handling is good.
The resi1 opcode is nifty

I would still trial the 0x55 Baud-trim case, as that has significant advantages in drift range tolerance, which is why this ReTrim is there.
It also has better precision.

Basically, 0x55 support just adds a small Trim_INT: code stub, and shifts the trim-code to there, & you skip 0x55 in the Base64.
Most of the work is done in the SmartPin hardware, the Trim_INT only occurs when a 0x55 arrives.
One line of code in the RxINT,

This is what I assumed about the SmartPin abilities, in analyzing this :

the 0x55 capture, is based on this Smart Pin mode
%10011 = for X periods, count time
which I have taken as capable of capturing time, over X-Whole-Periods. (here X=4 or 5 if you include the start edge ?)

Operation assumed:
As a period measure, it needs to Arm, wait for the next (=\_) edge then start timing, and counting periods (+1 on each =\_).
After 4 whole periods, it stores the 8B time, Clears time, and signals Done, & interrupts if enabled.
It does not re-arm, at least until read.
Assumes some means exists to re-set the monostable nature of this, eg Reset of Mode, clears all totals, and rearms.

Okay. The %10011 mode rearms as soon as a measurement is finished, even dovetailing into the next measurement if the initial edge of interest is present on the same clock as the final edge of interest. Smart pins can be reset, though, by lowering and raising DIR. That would rearm the %10011 mode. I've really been itching to make an RSPIN instruction that would raise and lower DIR over back-to-back clocks, making RSPIN a two-clock instruction.

jmg · 2016-10-27 23:30

cgracey wrote: »

Okay. The %10011 mode rearms as soon as a measurement is finished, even dovetailing into the next measurement if the initial edge of interest is present on the same clock as the final edge of interest.

That's clever.

cgracey wrote: »

Smart pins can be reset, though, by lowering and raising DIR. That would rearm the %10011 mode.

Cool, I figured this would be there, just not sure which bits to wiggle....

cgracey wrote: »

I've really been itching to make an RSPIN instruction that would raise and lower DIR over back-to-back clocks, making RSPIN a two-clock instruction.

Can this even re-arm multiple pins on the same sysclk ?

I'm doing a rough draft of this 0x55 Trim (mainly moving lines about) - can you give an example line(s) to place in the RxINT to ReArm 4-period capture ?

jmg · 2016-10-28 02:30

cgracey wrote: »

I really like the idea of reset and then immediate serial blasting, looking for responses. That simplifies the host software in some ways (reliability) and minimizes the overall download time.

and

cgracey wrote: »

.. It needs an initial ">>" and periodic ">" characters to maintain baud.

As mentioned before, ">>" is less than ideal, but the code does not actually require exactly ">>"; what it does need is a ">" followed by a char with two falling edges (=\_).
There are quite a few of those, and it is better to avoid a second ">", as there is a small chance reset-exit will land on the second one, and the loader would then expect the next char to be a two-falling-edge char.
This can be fixed by pairings like "> " (0x3e,0x20) or "><", or, if "0" is used as the AutoBaud char, "0 ", "0>" or "0<"

With this, only one char in the "immediate serial blasting" strings is active, and exit from any-phase is ok.
No change to ROM code is needed, only the host-side changes one char in string, slightly.
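
To check which characters qualify, the edge count of an 8N1 frame can be worked out on the host. This is a minimal Python sketch, assuming one start bit, 8 data bits sent LSB first, one stop bit, and an idle-high line; the function names are illustrative only.

```python
def frame_levels(byte):
    """Line levels for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def falling_edges(byte):
    """Count high-to-low transitions, with the line idle (high) before the start bit."""
    levels = [1] + frame_levels(byte)
    return sum(1 for prev, cur in zip(levels, levels[1:]) if prev == 1 and cur == 0)


def fall_to_fall_bits(byte):
    """Bit times from the start-bit edge to the next falling edge in the frame, or None."""
    levels = frame_levels(byte)
    for i in range(1, len(levels)):
        if levels[i - 1] == 1 and levels[i] == 0:
            return i
    return None


if __name__ == "__main__":
    for ch in "> <0?U":
        print(repr(ch), hex(ord(ch)), falling_edges(ord(ch)), fall_to_fall_bits(ord(ch)))
```

With this model ">", " ", "<", "0" and "?" each show two falling edges spaced 7 bit times apart, while "U" (0x55) shows five.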

jmg · 2016-10-28 04:51

To help test this, and push up the baud-rate, it can help to separate the Capture errors, from baud errors.
This can be done by using two sets of Baud values, ones that align exactly with 20M/N, and ones that do not (20M/(N+0.5)).
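
The two sets are easy to generate. A minimal Python sketch, assuming a 20.00MHz reference and integer divisors starting at 5 (the helper name `baud_set` is hypothetical):

```python
SYSCLK_HZ = 20_000_000  # FPGA crystal-derived clock assumed for the test


def baud_set(first_divisor, count, clock_hz=SYSCLK_HZ):
    """Return `count` baud values clock_hz / (first_divisor + i), i = 0, 1, 2, ..."""
    return [clock_hz / (first_divisor + i) for i in range(count)]


aligned = baud_set(5, 18)       # 20M/5, 20M/6, ... exact divisions of the clock
misaligned = baud_set(5.5, 10)  # 20M/5.5, 20M/6.5, ... worst-case half-step offset

if __name__ == "__main__":
    for a, m in zip(aligned, misaligned):
        print(f"{a:14.3f} {m:14.3f}")
```

Any error that shows up only on the misaligned set points at capture resolution rather than at the baud source.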

The best UART I've tested here, for Baud frequency granularity is the EXAR one
Mouser/Digikey have Eval Boards : XR21B1420IL28-0A-EVB - well worth having in the test lineup, not expensive at just over $30

Test: Send a large file of "U" repeating, one stop bit, and check the frequency counter.
This confirms both the Baud-value, and will catch any added Stop Bits. (None seen up to 4MBd)
Edge placement on these parts, I believe is to 1/48MHz or 20.83ns

First, run the exact matches to a FPGA Crystal 20.00MHz derived Baud.

N=4;;N=N+1;20M/N  ->  EXAR USB-UART, first do 20.00MHz match values
Calc/Set Terminal (Bd)   Counter (MHz) *2 = Result (MBd)   ppm deviation
4000000             -> 2.00023  *2 = 4.00046
3333333.333         -> 1.66703  *2 = 3.33406
2857142.857         -> 1.42882  *2 = 2.85764    +174ppm
2500000             -> 1.25026  *2 = 2.50052    +208ppm
2222222.222         -> 1.11134  *2 = 2.22268    +206ppm
2000000             -> 1.00018  *2 = 2.00036
1818181.818         -> 0.909920 *2 = 1.819840
1666666.667         -> 0.833444 *2 = 1.666888
1538461.538         -> 0.769393 *2 = 1.538786
1428571.429         -> 0.714446 *2 = 1.428892
1333333.333         -> 0.666799 *2 = 1.333598
1250000             -> 0.625077 *2 = 1.250154
1176470.588         -> 0.588357 *2 = 1.176714
1111111.111         -> 0.555650 *2 = 1.111300
1052631.579         -> 0.526435 *2 = 1.052870
1000000             -> 0.500710 *2 = 1.001420
952380.9524         ->
909090.9091         ->

Next, misalign, and run some more Baud checks

N=4.5;; N=N+1;20M/N ->  Misalign to worst 0.5 offset 
3636363.636         ->  1.81852M*2 = 3637040 1-ans/3636363 = 186ppm
3076923.077         ->  1.53883M*2 = 3077660 1-ans/3076923 = 239ppm
2666666.667         ->  1.33357M*2 = 2667140 1-ans/2666666 = 177ppm
2352941.176         ->  1.176777M*2= 2353554 1-ans/2352941 = 260ppm
2105263.158         ->  1.05279M*2 = 2105580 1-ans/2105263 = 150ppm
1904761.905         ->  0.952535M*2= 1905070 1-ans/1904761 = 162ppm
1739130.435         ->  0.869736M*2= 1739472 1-ans/1739130 = 196ppm
1600000             ->  0.800145M*2= 1600290 1-ans/1600000 = 181ppm
1481481.481         ->  0.740908M*2= 1481816 1-ans/1481481 = 226ppm
1379310.345         ->
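
The ppm figures above follow from the counter reading. A continuous "U" stream with one stop bit toggles the line every bit time, so the counter sees half the baud rate. A minimal Python sketch of that calculation (function names are illustrative):

```python
def measured_baud(counter_hz):
    """Continuous 0x55, 8N1: the line toggles every bit, so baud = 2 x counter frequency."""
    return 2.0 * counter_hz


def ppm_deviation(set_baud, counter_hz):
    """Deviation of the measured baud from the requested baud, in parts per million."""
    return (measured_baud(counter_hz) / set_baud - 1.0) * 1e6


if __name__ == "__main__":
    # Two rows taken from the tables above: (set baud, counter reading in Hz)
    for set_baud, counter_hz in ((2857142.857, 1.42882e6), (3636363.636, 1.81852e6)):
        print(f"{set_baud:14.3f} {ppm_deviation(set_baud, counter_hz):+8.0f} ppm")
```

The same function can be used to fill in the rows that have no ppm value yet.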

and let's add some calculated numbers from a CP2102N

 CP2102N & 24MHz virtual baud clock UARTS 
( N=7;;N=N+1;48M/(2*N) )
3000000
2666666.66
2400000
2181818.18
2000000
1846153.84
1714285.71
1600000
1500000
1411764.70
1333333.33
1263157.89
1200000
1142857.14
1090909.09
1043478.26
1000000
 960000
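
It can be useful to know which of these values coincide exactly with the 20M/N set. A small Python sketch, assuming the divisor ranges shown in the two lists (24M/8 to 24M/25 here, 20M/5 to 20M/22 earlier); exact fractions are used to avoid rounding mismatches:

```python
from fractions import Fraction


def divider_bauds(clock_hz, divisors):
    """Exact baud values clock_hz / n for each divisor n."""
    return {Fraction(clock_hz, n) for n in divisors}


cp2102n = divider_bauds(24_000_000, range(8, 26))  # 3.0M down to 960k
fpga = divider_bauds(20_000_000, range(5, 23))     # 4.0M down to ~909k

# Values reachable by both dividers, highest first
common = sorted(cp2102n & fpga, reverse=True)

if __name__ == "__main__":
    for baud in common:
        print(float(baud))
```

In these ranges only 2M, 1.3333M and 1M are shared, so most CP2102N settings land between the 20M/N steps.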

tonyp12 · 2016-10-28 15:44

UART is so 1980s; most modern MCUs are now doing i2c bootstrap loading instead.
There are plenty of USB to i2c breakout-boards for $5.

David Betz · 2016-10-28 15:45

tonyp12 wrote: »

UART is so 1980s; most modern MCUs are now doing i2c bootstrap loading instead.
There are plenty of USB to i2c breakout-boards for $5.

You mean most modern MCU's like the Propeller 1? :-)

tonyp12 · 2016-10-28 16:27

Do you think it's modern? I don't use an MCU that was not created in the last 3 years.
I want leading-edge technology, and not to see NRND just after I've designed the product.

David Betz · 2016-10-28 17:18

tonyp12 wrote: »

Do you think it's modern? I don't use an MCU that was not created in the last 3 years.
I want leading-edge technology, and not to see NRND just after I've designed the product.

Talking about P1 is off topic for this thread but, no, I don't think it is modern. I do, however, think it is still useful. Too bad we can't get a really cheap P1.

Rayman · 2016-10-28 17:25

I wonder if low speed USB would work with RC oscillator...

If it would, then seems could direct boot over USB...

Maybe there's a way P2 could pretend to be HID and send data in HID reports.

Seairth · 2016-10-28 19:37

tonyp12 wrote: »

UART is so 1980s; most modern MCUs are now doing i2c bootstrap loading instead.
There are plenty of USB to i2c breakout-boards for $5.

I'm a bit confused by this comment. Are you suggesting that, when bootstrapping from a computer, the P2 should use a USB-to-i2c bridge instead of a USB-to-TTL-serial bridge?

jmg · 2016-10-28 19:42

I believe there is a strong case to support i2c boot in P2, as well as UART, and that may yet happen.

tonyp12 wrote: »

UART is so 1980s; most modern MCUs are now doing i2c bootstrap loading instead.

Nope - Not the modern MCU's I'm looking at, they are all either UART or USB Boot loading.... ?

tonyp12 wrote: »

There are plenty of USB to i2c breakout-boards for $5.

Maybe, but scratch the surface, and I've yet to find any that can do all of this
* i2c Slave to >> 1MBd (FTDI have one at 3.4MHz, most are slower)
* Control reset line via VCP (same as UARTs do - FTDI fails this )

issue #2 : At the P2 side, there is no native Smart-Pin HW support for i2c, so download will be slower than UART.

That said, there are use cases for i2c, such as ...

Rayman wrote: »

I wonder if low speed USB would work with RC oscillator...
If it would, then seems could direct boot over USB...
Maybe there's a way P2 could pretend to be HID and send data in HID reports.

USB-P2-ROM boot is unlikely, as it is rather complex and unstable code, and large. RC osc spec is too poor, even for LS USB.

In the ROM feature queue, I'd place i2c well ahead of USB.

However, a small i2c part could load a USB booter, which would likely need to run a PLL Osc for USB.

Or, a USB-Boot design could use a low-cost USB micro - SiLabs have cheap USB parts - the EFM8UB10F8G-C-QFN20R is 59c/50+
Choices there, are to modify their USB Bootloader, so it programs the P2 instead of the own-flash, or to modify their USB UART libraries...

The included One-Pin-UART P2 Boot mode trumps even i2c for compact boot, and I'm working on code examples that can use low cost MCUs as fast loader hosts, for modest code sizes.

tonyp12 · 2016-10-28 20:02

With I2C you got a clock line up to 5MHz, this removes the autobaud and 7bit solution that comes with clock-less uart.
No way P2 will have a autobaud uart RX at 5MHz with RC osc.

Tubular · 2016-10-28 20:14

Which modern mcus boot via i2c, Tony? I'm interested from a sensewire/i3c point of view.

I think the uart boot is great where you have a radio modem. Might even be able to boot over orbcomm satellite

Seairth · 2016-10-28 20:28

tonyp12 wrote: »

With I2C you got a clock line up to 5MHz, this removes the autobaud and 7bit solution that comes with clock-less uart.
No way P2 will have a autobaud uart RX at 5MHz with RC osc.

I wonder if the higher speed makes much difference. You still need time to process the incoming bytes. At some point, you shift from I/O bound to CPU bound, and with a 20MHz startup clock, I suspect you end up being CPU bound well before you get anywhere near 3.4Mbps (i2c high speed).
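
A rough way to see where that crossover sits for the UART case is below. This is a Python sketch using the per-byte cycle estimates quoted elsewhere in this thread (about 14 sysclks for the receive interrupt and about 40 for the Base64 unpack loop); treat those numbers, and the 10-bit frame, as assumptions.

```python
SYSCLK_HZ = 20_000_000
RX_ISR_CLOCKS = 14   # receive-interrupt estimate quoted in this thread
UNPACK_CLOCKS = 40   # Base64 unpack loop estimate quoted in this thread


def clocks_per_byte(baud, bits_per_frame=10, sysclk_hz=SYSCLK_HZ):
    """Sysclks available per received byte (start + 8 data + 1 stop by default)."""
    return sysclk_hz * bits_per_frame / baud


def cpu_load(baud, work_clocks=RX_ISR_CLOCKS + UNPACK_CLOCKS):
    """Fraction of each byte time spent on per-byte work."""
    return work_clocks / clocks_per_byte(baud)


def max_baud(work_clocks=RX_ISR_CLOCKS + UNPACK_CLOCKS, bits_per_frame=10,
             sysclk_hz=SYSCLK_HZ):
    """Baud rate at which per-byte work uses every available sysclk."""
    return sysclk_hz * bits_per_frame / work_clocks


if __name__ == "__main__":
    for baud in (1_000_000, 2_000_000, 3_000_000):
        print(baud, f"{cpu_load(baud):.0%}")
    print("CPU bound near", round(max_baud()), "Bd")
```

This ignores interrupt latency and any other per-byte work, so the real ceiling would be lower.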

I can see, however, where i2c bootstrap might make it easier for another SoC/CPU/MCU/etc to bootstrap the P2 instead of using SPI flash.

edit: what do the USB-to-i2c adapters show up as on a computer? A serial port? Or will it require a special driver and/or software to access it?

jmg · 2016-10-28 20:40

tonyp12 wrote: »

With I2C you got a clock line up to 5MHz, this removes the autobaud and 7bit solution that comes with clock-less uart.
No way P2 will have a autobaud uart RX at 5MHz with RC osc.

This overlooks many real world issues...
* What will that 5MHz clock actually talk to ? I cannot find any i2c R/W spec over 3.4MHz ?
* Even assuming you could find a '5MHz $5 USB i2c board', the P2 lacks SmartPin hardware for i2c, so data pumping of i2c will always be slower than a UART, as so many CPU cycles are eaten bit-banging.
* Most i2c memories are limited to small, and ~1MHz, so i2c can never displace UART, merely augment it.

tonyp12 · 2016-10-28 20:45

By the time P2 is out, UFm 5 MHz I2C-bus will be common, a few devices are on the market already.
P2 should have hardware i2c with slave address wakeup from sleep, like any modern MCU.

PSoC® 3 and PSoC 5LP I2C Bootloader
http://www.cypress.com/documentation/application-notes/an60317-psoc-3-and-psoc-5lp-i2c-bootloader.

All new MSP430 have it (some with UART also), click on the BSL to sort them for the latest released models that are i2c only,
http://www.ti.com/lsds/ti/microcontrollers_16-bit_32-bit/msp/products.page#

That the P2 is created without external view of the world may be good in some way, but not going where the industry is headed will make it look like it was designed in 1990.

jmg · 2016-10-28 20:46

Seairth wrote: »

I can see, however, where i2c bootstrap might make it easier for another SoC/CPU/MCU/etc to bootstrap the P2 instead of using SPI flash.

Agreed. Small i2c memories are easy to understand, and cheap, and have a place.

If you have a MCU, then the UART in One-Pin mode can work well. The cheapest MCUs can get > 1MBd UART, but many have 400kHz i2c.
SoC/CPU connects are more likely to use Two-Pin-UART. Less common is finding i2c slave support, most Big Part designers, expected them to be masters.....

Seairth wrote: »

edit: what do the USB-to-i2c adapters show up as on a computer? A serial port? Or will it require a special driver and/or software to access it?

The FTDI one I checked does have a VCP driver, but it fails to control any pins via the handshake lines (no idea what they were thinking?).
That means it fails to be much use, as you cannot control the P2 reset pin.

jmg · 2016-10-28 20:59

tonyp12 wrote: »

PSoC® 3 and PSoC 5LP I2C Bootloader
http://www.cypress.com/documentation/application-notes/an60317-psoc-3-and-psoc-5lp-i2c-bootloader.

That seems to be an appnote ?

SiLabs have app-note examples for i2c too, but they have chosen to ship their newest parts with UART loaders in ROM.

tonyp12 wrote: »

All new MSP430 have it (some with UART also), click on the BSL to sort them.
http://www.ti.com/lsds/ti/microcontrollers_16-bit_32-bit/msp/products.page#

hmmm a lot of those show 'none' ?
Also, I looked up MSP430G2230-EP, which shows i2c in BSL, but the data sheet has no mention at all of any loader system ?
It does say this
"The flash memory can be programmed by the Spy-Bi-Wire or JTAG port, or in-system by the CPU"

Does that column mean 'one can be user installed', rather than 'Chip comes with this in ROM' ? - very different things

tonyp12 wrote: »

That the P2 is created without external view of the world may be good in some way, but not going where the industry is headed will make it look like it was designed in 1990.

The UART is never going away, so I'm not following your point ?

Certainly, I was keen to avoid a SLOW UART solution, as that is 1990-ish, but Chip now has this working ~ 2MBd, which is plenty able to keep up with most companion parts.
3MBd would be nicer still, but 2MBd is ok

Chip is looking at bumping the RC Osc to a 20MHz min spec, and that may be enough to hit 3MBd

With a PLL, UARTS can hit 12-15MBd, and I don't see i2c offering anything close to that. Makes i2c look 'old' ?

Tubular · 2016-10-28 21:11

I3C goes up to something like 26 33Mbps.

But I think the uart as proposed is still the most flexible and most widely supported.

jmg · 2016-10-28 21:31

To bring this back on topic, here is my (rough) code shuffling, to use 0x55 Baud-Re-Trim.

This patch reduces the peak RxINT time, and has a (much) wider dynamic range, for similar overall code size.
It uses the nifty time for X-Periods Smart pin mode.

' Initial Coarse AutoBaud, has related Trim variant once initialized.
' Whole byte times are around 100~110 sysclks at 2MBd / 20MHz, but there is also background unpack to do, on eg Base64, or Hex
' .chr Base64 loop is 20 lines ~40 sysclks, RxINT is now 6 lines or ~14 sysclks (was 26c worst case), so ~ 50% at 2MBd ?
'
'
'
' Timing - 3 Chars, from Coarse AutoBaud ..      Dummy 2f.2r packer..                       RxCommand
'0x3e ">" -> OK tFF:7b,tHH:5b  2f,2r             0x3c "<" -> tFF:7b tHH = 4b   2f,2r        0x3f "?" : tFF 7 tHH 6 tLL 1 2f,2r 
'=====\_s_._0_/=1=.=2=.=3=.=4=.=5=\_6_._7_/=P==T=\_s_._0_._1_/=2=.=3=.=4=.=5=\_6_._7_/=P==T=\_s_/=0=.=1=.=2=.=3=.=4=.=5=\_6_._7_/=P==T=\_s_
'     ^A..Ax                      ^A..Av         ^B..Bx                      ^C..Cx  RxON                                         ^RxINT    

autobaud_isr    ' Every =\_ , then RI based runs after reset, until detects Coarse AutoBaud Char
'A
        rdpin   a0,#rx_tne      '2  get fall-to-fall time   (7x if $3E)
        rdpin   a1,#rx_ths      '2  get high time ">" or low time "0", (5x if $3E,$30)

        cmpr    a0,limitC   wc  '2  make sure both measurements are within limit
    if_nc   cmpr    a1,limitC   wc  '2

        scl     a0,norm0        '2  if they are within 1/35th of each other, $3E (SCL result feeds S of next instruction)
    if_nc   cmpr    a1,0    wc  '2
        scl     a1,norm1        '2
    if_nc   cmpr    a0,0    wc  '2

    if_c    reti1               '2/4 Ax    if not $3E, exit

        resi1                   '4  Av     got $3E, resume on next interrupt 22c
' Coarse AutoBaud done, now wait for packing char, as not enough time to fully update UART from end-of-Coarse char.
' use one more to pack. Avoid a repeat of Coarse-Char, but can be ANY char with 2 falling edges
'B
        akpin   #rx_tne         '2  acknowledge pin
        mul 	a0,baud_t7      '2  compute baud rate
        setbyte a0,#7,#0        '2  set word size to 8 bits
        wxpin   a0,#rx_rcv      '2  set receiver baud rate and word size
        resi1   	            '4  Bx resume on next interrupt +12c
'last edge, at highest bauds use 2 Stop bits, and can also use lower Char values to move this edge left.
' very low values are not easy on some terminals.
'C
        dirh    #rx_rcv         '2  enable receiver before next start bit (OK to enable when RxD is LOW )
'Rx is enabled here, so can start to sense Start Bit fall		
        setse1  #%110<<6+rx_rcv '2  set se1 to trigger on rx_rcv high ie now RxINT
        wrpin   mt4p,#rx_t4p    '<< (re)configure Smart pin rx_t4p for timing 4 Periods & resets partial counts.
' to do: Enable Trim_ISR interrupt 
        mov 	t0,a0           '2  save baud rate for transmitter
        resi1   	            '4  Cx resume on next interrupt +12c
'
'
' Receiver ISR - Trim detect moved to Trim_ISR, RxINT now simply buffers chars.
' Now just 6 lines (no practical limit to baud ceiling) ~ 14+ SysClks (was 26+ worst case)
'D
RxINT
        dirh    #rx_t4p         '<< (re)Arm rx_t4p - resets any partial counts, aka monostable.

        rdpin   a1,#rx_rcv  wc  '2  get received chr
        shr 	a1,#32-8        '2  shift to lsb justify
        wrlut   a1,head         '2  write byte to circular buffer in lut
        incmod  head,#lut_btop  '2  ..increment buffer head
        reti1                   '2/4    ..exit

' Trim_ISR is ONLY called if 4 Periods fit inside RI resets, that can ONLY occur on 0x55
' ie for most chars this interrupt is idle.
' This vectors typically 1.5b times before RI, and will reset any pending RI
' Gives capture margins of
' a)baud slower relative to SysCLK   appx ~15% drift 
' b)baud faster relative to SysCLK  No ratio limit, only limited by highest Baud Speed

Trim_ISR  ' << new code, derived from Baud-Coarse above, active during RX, but rarely called
' ~ 20+ SysClks
        rdpin   a0,#rx_t4p      '2  get t4p time (8x if 0x55)
        cmpr    a0,limitT   wc  '2  optional sanity check on capture 
    if_nc   reti1               '2/4    optional, if not sensible, exit, noise filter. < or > test ?
' A too-fast limit may make more sense here, as other filtering is already in the RI-t4p-Reset 
' ie can ignore way-too-fast baud capture values.
        mul 	a0,baud_t8      '2  compute baud rate, on 8x sample Time
        setbyte a0,#7,#0        '2  set word size to 8 bits
' now need to retrim receiver baud rate and word size - & clear any pending RI 		
        wxpin   a0,#rx_rcv      '2  set receiver baud rate and word size
'       dirh    #rx_rcv         '2  may not be needed ? unless baud-reset or clear RI needs post enable ?
        mov 	t0,a0           '2  save baud rate for transmitter, and user readback.
        reti1               	'4  done


limitT      long    30          	'Trim count limit, tbf - maybe above this ? set some max baud
limitC      long    $58E4           'Coarse count limit ($58E4 = 1.3889, keeps SCL within $7FFF w/norm1)
norm0       long    $41D4*5/7       'fall-to-fall normalization factor
norm1       long    $41D4*7/5       'high-time normalization factor ($41D4 = 1.0 + 1/(7*5))
baud_t7     long    $1_0000/7       '7x baud computation factor
baud_t8     long    $1_0000/8       '8x baud computation factor, 4 Periods, 0x55
baud_t6     long    $1_0000/6       '6x baud computation factor, 3 Periods, 0x55 - test candidate
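
For anyone wanting to test the coarse check off-chip, here is a small Python model of the 'A' section above. It is a sketch under assumptions: that SCL treats $4000 as 1.0 (which the norm1 comment implies) and feeds its result to the following CMPR, and that CMPR sets C when S is below D. The function names are illustrative only.

```python
LIMIT_C = 0x58E4            # coarse count limit from the listing
NORM0 = 0x41D4 * 5 // 7     # fall-to-fall normalization factor
NORM1 = 0x41D4 * 7 // 5     # high-time normalization factor


def scl(d, s):
    """Assumed SCL behaviour: 16x16 multiply with $4000 = 1.0 (result >> 14)."""
    return (d * s) >> 14


def is_coarse_autobaud(fall_to_fall, high_time):
    """True if the two captures look like ">" ($3E): 7 bit times vs 5 bit times."""
    if fall_to_fall > LIMIT_C or high_time > LIMIT_C:
        return False                     # measurement out of range
    if high_time > scl(fall_to_fall, NORM0):
        return False                     # high time too long for 5/7 ratio
    if fall_to_fall > scl(high_time, NORM1):
        return False                     # high time too short for 5/7 ratio
    return True


if __name__ == "__main__":
    # 10 sysclks per bit (2MBd at 20MHz): ">", "<" and "?" captures
    for name, tff, thh in ((">", 70, 50), ("<", 70, 40), ("?", 70, 60)):
        print(name, is_coarse_autobaud(tff, thh))
```

In this model only the 7:5 capture passes, with roughly 1/35th of slack either way; at 10 sysclks per bit that is about one count of tolerance.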

cgracey · 2016-10-29 09:19

jmg wrote: »

To bring this back on topic, here is my (rough) code shuffling, to use 0x55 Baud-Re-Trim.

This patch reduces the peak RxINT time, and has a (much) wider dynamic range, for similar overall code size.
It uses the nifty time for X-Periods Smart pin mode.


Super! I'll look at this tomorrow in detail. I've designed a PCB for the test chip and now I'm worn out. Thanks for thinking through all this. I will do some RC osc simulations and see what the error is and what we can do to get the speed up. In my past experience, 3Mbaud on the standard FTDI chips doesn't really work. That will probably be changing, if it hasn't, already. 2Mbaud has been very solid, though.

potatohead · 2016-10-29 09:36

but not going where the industry is headed will make it look like it was designed in 1990.

I'm actually very pleased with the effort to polish the boot code done by jmg and Chip so far. We've got a nice package here. And darn near anything ever made can boot this device.

Not going where the industry is headed actually defines most of the P2. It's not like we want to completely ignore it all, but we've very clearly made choices to emphasize things not always a focus elsewhere.

That's a good thing.

jmg · 2016-10-29 10:01

cgracey wrote: »

Super! I'll look at this tomorrow in detail. I've designed a PCB for the test chip and now I'm worn out. Thanks for thinking through all this. I will do some RC osc simulations and see what the error is and what we can do to get the speed up. In my past experience, 3Mbaud on the standard FTDI chips doesn't really work. That will probably be changing, if it hasn't, already. 2Mbaud has been very solid, though.

Sounds good

Remind me again of which FTDI part(s) you use for testing ?

I'd suggest you order a XR21B1420IL28-0A-EVB & a CP2102N-EK
I have a CP2102N-EK due next week, & measured numbers for EXAR part are above.(1 stop bit)
I'll add actual CP2102N values when that arrives.

With a 20MHz FPGA, you should be able to test at the better-aligned higher baud rates, then check the less-ideal align ones to separate measurement-resolution effects, from other effects.

Another easy-variable to tune, is the 2nd-edge position in the dummy padding character. - just in case the Rx enable needs more time.
