I was thinking only that the receive line activity would tell the bootloader to try serial boot first as it normally would but if there wasn't any activity then it could safely skip the whole serial boot.
Do we really need 20ms of hardware reset? I don't mind if that was a power-up or hardware reset delay but on any soft reboot it shouldn't need that delay.
As for the 16ms from Flash I can't imagine loading all 512k anyway, maybe not even 128k as I have so much built in now in the 32k of P1. I can really see the need for a fast restart from some crash or glitch etc.
Also, these pins are going to float on reset. It's too late to change that around.
Pity.
Still, you can sense whether a Fast-Boot pull-down is present by driving the pin high, releasing it, and checking a short time later whether it is still high. Still high = floating, low = pull-down present.
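The drive-high-then-check idea can be sketched as a small simulation. This is only an idealized model: the pin capacitance, resistor values, wait time and logic threshold are illustrative assumptions, not P2 figures, and a truly floating pin is treated as holding its charge with no leakage.

```python
import math

def pin_voltage_after_release(vdd, t_s, c_pin_f, r_pulldown_ohm=None):
    """Voltage on a pin that was driven to vdd and then released for t_s seconds.

    With no resistor the pin is modelled as ideally floating (it holds its charge).
    With a pull-down it discharges as a simple RC circuit.
    """
    if r_pulldown_ohm is None:
        return vdd
    return vdd * math.exp(-t_s / (r_pulldown_ohm * c_pin_f))

def pulldown_present(vdd=3.3, wait_s=10e-6, c_pin_f=10e-12,
                     r_pulldown_ohm=None, v_threshold=None):
    """Return True if the pin reads low after the wait, i.e. a pull-down is fitted."""
    if v_threshold is None:
        v_threshold = vdd / 2  # assumed logic threshold
    v = pin_voltage_after_release(vdd, wait_s, c_pin_f, r_pulldown_ohm)
    return v < v_threshold

print(pulldown_present())                     # floating pin
print(pulldown_present(r_pulldown_ohm=10e3))  # 10k pull-down fitted
print(pulldown_present(r_pulldown_ohm=10e6))  # very weak pull-down, wait too short
```

In this model the wait has to be several RC time constants long, so a very weak pull-down would need a longer wait before it can be told apart from a floating pin.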
The 20ms delay is built into the RESn pin and engages on any source of reset.
That 16ms is to load 512 longs from the SPI flash for the 2nd stage booter. There's more time taken to compute the SHA-256/HMAC signature.
How about just start with the SPI boot, then fall back to the serial. In other words, given the boot sequence above, take out the initial serial boot part. Instead, have the SPI routine first set /CS (pin 61) to input and see if it's pulled low (assuming a weak pull-up is also on the pin). If so, skip SPI boot. Under normal boots, the 20ms reset delay would be long enough for /CS to be pulled high through the resistor. During serial boot, /CS would be pulled low through a slightly less weak resistor enabled with a PGM switch (or similar appropriate technique).
I think anyone using it for development would want serial first, in production you most likely want flash boot first.
Maybe use a fuse to indicate skipping the first serial 100ms wait?
Why not do both at the same time? Isn't that what this thing is supposed to be good at?
Yes, but I'd like to make it use a single cog, in case we ever make single-cog versions.
Instead of using multiple cogs for parallel flash and PC boot, you could use interrupts instead.
That's how it's working now. It WAS in two cogs, but with interrupts, I got it down to one cog. I made 4 generic events/interrupt-sources that can respond to pins, locks, and lut r/w's. We needed more pin events to handle smart pins.
I think anyone using it for development would want serial first, in production you most likely want flash boot first.
Maybe use a fuse to indicate skipping the first serial 100ms wait?
The risk with a fuse is that it is one-time.
Common usage will be to do the first load using UART boot with delays; normal operation is then fast SPI boot, and updates may need a jumper to get UART boot again.
The important detail is to avoid delays in what may be a watchdog reset.
This is the common flow on most MCUs with a boot ROM: they use some pin to enter the UART loader after reset.
How about just start with the SPI boot, then fall back to the serial. In other words, given the boot sequence above, take out the initial serial boot part. Instead, have the SPI routine first set /CS (pin 61) to input and see if it's pulled low (assuming a weak pull-up is also on the pin). If so, skip SPI boot. Under normal boots, the 20ms reset delay would be long enough for /CS to be pulled high through the resistor. During serial boot, /CS would be pulled low through a slightly less weak resistor enabled with a PGM switch (or similar appropriate technique).
Yes, that's exactly what I was suggesting.
Even if the P2 has no weak pullup, you can still determine 'Not pulled down' with a 'drive high then quickly check'
I think I've heard that 100 ms is typical human response time.
So, maybe that's an acceptable delay before boot. Wouldn't want it any longer than that though...
If the P2 was only ever talking to humans, perhaps 100 ms is tolerable.
However, P2 will be controlling Drones, and machinery, and there will be system Watchdogs present in many P2 designs.
Those are the ones where least possible disturbance during Watchdog event is required (see Peter's post)
Other MCUs have times measured in sub 1us, or 10's of us for Reset release.
Some, have a POR delay that is inserted only on Power-Cycle, of 10~20ms - this longer delay is not present on Software reset, or Watchdog resets.
Devices are made with these numbers for a reason - the end designs demand them.
The 20ms delay is built into the RESn pin and engages on any source of reset.
Ouch
I guess that is another detail now locked-in, and unable to be fixed ?
Other MCUs have a POR delay of some-milliseconds only on Power-Up, but much faster delays on Pin-Reset or Software Reset, or WatchDog resets.
That 16ms is to load 512 longs from the SPI flash for the 2nd stage booter.
Even that seems slow ? - suggests ~1MHz clock ? - what is the P2 boot clock, and what is the SPI clock ?
Also, can the 2nd stage booter load less than 512 longs ?
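The "~1MHz" guess can be checked from the quoted figures. A quick sketch, assuming one bit per SPI clock and no command or address overhead (both are assumptions, neither is stated in the thread):

```python
def implied_bit_rate_hz(n_bytes, load_time_s):
    """Average serial bit rate needed to move n_bytes in load_time_s."""
    return n_bytes * 8 / load_time_s

STAGE2_BYTES = 512 * 4  # 512 longs of 4 bytes each = 2048 bytes

rate = implied_bit_rate_hz(STAGE2_BYTES, 16e-3)
print(f"{rate / 1e6:.2f} Mbit/s")
```

Under those assumptions, 512 longs in 16ms works out to roughly 1 Mbit/s, which matches the guess.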
It's taking 11ms just to read the 2nd stage booter from Flash. That's 2K bytes. Then, it takes 17ms to run the SHA-256/HMAC verification on it - this is software and cannot be sped up, unlike using smart pins to load flash. Anyway, that's 28ms, so far. Then, it takes 13ms to clear the remainder of RAM. That's now 41ms. The RAM clearing could be skipped for flash booting, but it's needed for serial, as people may be downloading entire apps that will rely on memory being cleared. At least, that's how I've been thinking about it. It's just a housekeeping matter that is easily taken care of.
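The running totals in that post can be reproduced with a few lines. The stage names below are just labels for the figures quoted above; nothing here is measured.

```python
# Stage timings as quoted in the post above, in milliseconds.
STAGES_MS = [
    ("read 2nd-stage booter from flash", 11),
    ("SHA-256/HMAC verification", 17),
    ("clear remainder of RAM", 13),
]

def running_totals(stages):
    """Return the cumulative time after each stage."""
    totals, elapsed = [], 0
    for _name, ms in stages:
        elapsed += ms
        totals.append(elapsed)
    return totals

def total_ms(stages, skip=()):
    """Total time with the named stages left out."""
    return sum(ms for name, ms in stages if name not in skip)

print(running_totals(STAGES_MS))
print(total_ms(STAGES_MS, skip=("clear remainder of RAM",)))
```

Skipping the RAM clear for flash boots, as the post suggests is possible, leaves the 28ms of read plus verify.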
Remember that on reset, code must be reloaded from SPI flash, it's just going to take some time. And variables will not be preserved.
I don't think this can be a sub-ms reset procedure, in any case. There's just too much to do.
Is the interest in fast reset response a power-saving issue, as in the part is kept off until the moment it is needed, then shut back down?
Note that using a pull-down on /CS is ill advised as that will enable the flash chip if present.
SCK, MISO and MOSI could be used with weak pull-up or pull-down resistors to allow for up to eight different boot states without interfering with SPI operation, to be checked before loading the default loader from SPI.
for example:
111 = regular SPI/serial boot
110 = USB bootloader
101 = uSD boot
100 = Chip's serial monitor
etc
Note this way if the pull-up/downs are omitted it will likely fall into the regular SPI/serial boot
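A minimal sketch of how such a strap table might be decoded. The bit order (SCK as the most significant bit, then MISO, then MOSI) is an assumption, and the four codes the post leaves as "etc" are reported as unassigned rather than guessed.

```python
# Boot modes from the example list above; the remaining codes are left unassigned.
BOOT_MODES = {
    0b111: "regular SPI/serial boot",
    0b110: "USB bootloader",
    0b101: "uSD boot",
    0b100: "serial monitor",
}

def strap_code(sck, miso, mosi):
    """Pack three sampled pin levels (0 or 1) into a 3-bit code, SCK first (assumed order)."""
    return (sck << 2) | (miso << 1) | mosi

def boot_mode(sck, miso, mosi):
    """Look up the boot mode for the sampled strap levels."""
    return BOOT_MODES.get(strap_code(sck, miso, mosi), "unassigned")

print(boot_mode(1, 1, 1))
print(boot_mode(1, 0, 1))
print(boot_mode(0, 0, 0))
```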
And after whatever time the ROM takes, the 2nd-stage booter is going to be taking time to load in, possibly, 512KB of code. Even clocking nibble-wide data in at 40MHz would take 25ms. This is just not going to be booting fast enough for the kind of apps you guys are talking about.
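The 25ms figure is easy to check. This sketch assumes 4 data bits per clock and ignores command, address and dummy cycles, which a real flash read would add:

```python
def load_time_ms(n_bytes, clock_hz, bits_per_clock):
    """Time in ms to clock n_bytes in at clock_hz, moving bits_per_clock bits each cycle."""
    return n_bytes * 8 / (clock_hz * bits_per_clock) * 1e3

print(round(load_time_ms(512 * 1024, 40e6, 4), 1))  # nibble-wide
print(round(load_time_ms(512 * 1024, 40e6, 1), 1))  # single-bit SPI for comparison
```

With 512KB taken as 524,288 bytes the nibble-wide load comes to about 26ms, so the 25ms quoted is a fair round number. Single-bit SPI at the same clock would take four times as long.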
Note that using a pull-down on /CS is ill advised as that will enable the flash chip if present.
Note the present P2 design floats all pins, so /CS is very likely to have any level at all.
The pull down is pulled high, as part of the test, and the flash chip is not clocked before then.
(again, with P2 pins all floating, the CS and CLK lines might want to have added resistors, otherwise you are hoping the floating pins never activate erase/write operations)
But a watchdog reset is still a reset, right? It begins a whole new boot.
Yes, but many MCU's make a distinction between a Power-Up (Vcc Ramping) reset, and all other resets.
The Vcc-Ramp case, where things like Crystals need to start, is often made longer (some milliseconds), whilst a device with a Calibrated RC osc, can be stable in microseconds. (some parts have Fuse-choices for these Reset Delays)
Other reset paths, like SW reset and Watchdog, skip the Vcc-Ramp delay as the Crystal is already running.
In the P2, boot is from RC Fast, and Crystal or PLL is not yet enabled.
Typically Crystal and PLL enable, both have user-added delays, but those are outside the Boot-ROM.
SPI CS, being active low, should have a pull-up resistor on it, in any case. The only way a smart pin can pull is to configure it as a high output with either a resistor or current source. That means some cog has to always keep the DIR bits high, which is not practical.
Perhaps fast boot (skip serial) could be achieved by putting a pull-up/down on the SPI CLK pin. That would be safe.
Also, the 2nd-stage SPI booter could be made $100 longs, instead of $200 longs. That would cut the current 28ms down to 14ms. Without changing the 20ms reset timer, we'd have 20ms + 14ms + 25ms (512KB of nibbles at 40MHz). That would bring the boot time down to 59ms.
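Adding up the three figures quoted above gives the total for each booter size. The sketch assumes the read-plus-verify time scales linearly with the booter length, which is what halving 28ms to 14ms implies; the 25ms application load is taken as given.

```python
RESET_MS = 20             # silicon reset timer
STAGE2_MS_AT_0X200 = 28   # read + verify time for a $200-long booter
APP_LOAD_MS = 25          # 512KB of nibbles at 40MHz

def stage2_ms(longs):
    """Read-plus-verify time, assumed to scale linearly with booter size."""
    return STAGE2_MS_AT_0X200 * longs / 0x200

def boot_ms(longs):
    """Total time from reset to application loaded."""
    return RESET_MS + stage2_ms(longs) + APP_LOAD_MS

for longs in (0x200, 0x100):
    print(hex(longs), boot_ms(longs))
```

On these numbers a $200-long booter totals 73ms and a $100-long booter totals 59ms.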
Is the boot clock 40MHz ?
I thought the Boot process first read a Count value, to decide how many Longs to load ?
That would make those numbers worst case.
What sets the (fixed?) 20ms, and why was 20ms chosen, given this uses RC fast and not a crystal ?
Perhaps another number can be spec'd, which is fastest practical time to user-define-ports ?
The Pin config lines would be inserted at the very front of the stage 2 loader, which is also user-designed to be compact, to lower the load time.
The boot clock is ~20MHz. I was thinking that you'd switch to crystal and PLL in the 2nd stage booter, so you were running 160MHz, and then get the 40MHz clock rate for SPI flash.
The 20ms silicon reset timer's period was chosen arbitrarily. It could be 10ms, or even 1ms. I just figured 20ms would be a safe value. For this chip, an external brownout detector will be needed for the 1.8V core supply. I didn't want to mess with a bandgap reference this time around.
after reset:
  wait for serial indefinitely
  wait for serial for some amount of time, then sleep
  attempt to boot from flash immediately
    if fail, wait for serial indefinitely
    if fail, wait for serial for some amount of time, then sleep
  wait for serial briefly, then attempt to boot from flash
    if fail, wait for serial indefinitely
    if fail, wait for serial for some amount of time, then sleep
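The options above can be compared with a small decision model. This is a hypothetical sketch, not the boot ROM's logic: the policy names, the 100ms brief window (taken from the figure mentioned earlier in the thread) and the outcome strings are all illustrative, and the time spent attempting the flash boot is ignored.

```python
def boot_outcome(policy, serial_at_ms, flash_ok, brief_ms=100, fallback_ms=None):
    """Outcome of one reset under a given boot policy.

    policy       : "serial_only", "flash_first" or "serial_then_flash"
    serial_at_ms : time at which a host starts talking, or None if it never does
    flash_ok     : True if a valid image is present in flash
    fallback_ms  : length of the final serial wait; None means wait indefinitely
    """
    def final_serial_wait(start_ms):
        if serial_at_ms is None:
            return "wait forever" if fallback_ms is None else "sleep"
        if fallback_ms is None or serial_at_ms <= start_ms + fallback_ms:
            return "serial boot"
        return "sleep"

    if policy == "serial_only":
        return final_serial_wait(0)
    if policy == "flash_first":
        return "flash boot" if flash_ok else final_serial_wait(0)
    if policy == "serial_then_flash":
        if serial_at_ms is not None and serial_at_ms <= brief_ms:
            return "serial boot"
        return "flash boot" if flash_ok else final_serial_wait(brief_ms)
    raise ValueError(f"unknown policy: {policy}")

print(boot_outcome("serial_then_flash", serial_at_ms=50, flash_ok=True))
print(boot_outcome("serial_then_flash", serial_at_ms=None, flash_ok=True))
print(boot_outcome("flash_first", serial_at_ms=None, flash_ok=False, fallback_ms=1000))
```

The model makes the trade-off visible: "flash_first" never delays a good flash boot, while "serial_then_flash" always costs the brief window when no host is attached.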
Why not do both at the same time? Isn't that what this thing is supposed to be good at?
Yes, but I'd like to make it use a single cog, in case we ever make single-cog versions.
I've been thinking about this comment. While it is certainly possible that you could make a one-cog version of the chip, I just don't see it ever happening. Here's why:
The architecture is predicated on having multiple cogs. Existing single-core processors that were designed that way from the beginning will almost always be a better (and likely cheaper) choice.
Dwelling on the architecture aspect, the recent addition of paired cog communication through the LUT almost mandates that all Propeller variants must have 2n cogs. It would not be possible to have a one-cog variant without breaking part of the architecture.
It's just not exciting in the way that Propellers are exciting.
So use 2 cogs to boot, if it will help! For instance, you could use the shared LUT such that one cog is reading from the SPI flash (and writing to LUT) and the other cog is (reading from LUT and) validating the code before copying it to the hub.
...Also, the 2nd-stage SPI booter could be made $100 longs, instead of $200 longs....
The boot clock is ~20MHz.
I'd suggest the software is tuned for fastest SPI loading, and it use a Length option, to allow faster load of user defined smaller sizes. ie do not lock the size in Rom, read it from the Flash
I was thinking that you'd switch to crystal and PLL in the 2nd stage booter, so you were running 160MHz, and then get the 40MHz clock rate for SPI flash.
Yes, but that clock change would be done after the critical Pin-Init section, and would likely include delays for any Crystal-start and PLL lock.
Which raises the question - When is the Crystal running ? What is the PLL lock time ?
-ie how much time budget is needed for a change to Crystal+PLL ?
The 20ms silicon reset timer's period was chosen arbitrarily. It could be 10ms, or even 1ms. I just figured 20ms would be a safe value. For this chip, an external brownout detector will be needed for the 1.8V core supply. I didn't want to mess with a bandgap reference this time around.
Given an external reset is needed anyway, I would suggest lowering the internal hardwired Reset delay.
The RC FAST starts very quickly, I assume ?
Users can define reset times via PGOOD signals and Caps, using the hysteresis in the RESET pin.
I agree with this. A small, well-defined thing can be debugged and be robust. Easy to document and use. Everyone makes tradeoffs to suit their requirements from there.
Said with no smiley ?
The P2 is not going to compete with a Raspberry Pi loading Unix.
It competes with Microcontrollers, running things like Drones, and with external WatchDog timers.
That means rapid boot will be important, (see Peter's comment).
One MCU I have here says 39us from reset to run, another says 16us.
Those are the ball-park numbers you need to think about, not booting unix !
Unless you are running Ultibo, which is bare metal programming in FreePascal, then the boot time is a couple of seconds at most. :-D
Are both those times code-size proportional, and can you load less than 2K bytes ?
Interesting idea.
How many loaders can fit into the 1 COG ? With floating pins, any simple level test is a lottery.
You can check for pulled-one-way, with an impulse and read-back.
One watch dog COG could fast reset and reduce the number of time dependent cases being discussed here.
Again, IMHO, that specific case can be solved with a fuse(s). Otherwise, a slower more general boot case can do.
I'm not happy that smart pin 0 is being used as I'd rather see 61 or similar was used.
Also the pinouts do not account for SPI Quad mode to be used subsequently.
And of course, I would like to see an SPI SD Card boot as well. But we need to start somewhere for now.
I need a little more time to actually think about all this.
I think the booter should be small and fixed size, so that it can turn on the crystal and PLL and then do the big loading 8x faster.