P2 SD Boot Code (v32+)

Cluso99 · 2018-04-07 23:33

Need a new thread

P2 has the following preferred pins...

P63 = RXD serial
P62 = TXD serial
P61 = SPI_CS
P60 = SPI_CLK
P59 = SPI_DQ = SPI_DO
P58 = (optional SPI_DI)

Pullup definitions...
P61 = Pullup = load from attached SPI FLASH or SPI SD
P60 = Pullup = and program loaded successfully, run SPI FLASH program

Now, because there are now SPI commands for the FLASH (and possibly more to come) that may interfere with the SD setup, I propose we define an alternative...

P60 = Pulldown = used to determine SPI SD attached and if loaded successfully run SPI SD program

By testing the pullups and pulldowns before selecting the FLASH or SD, the boot code can determine whether FLASH or SD is attached. This is an either/or case. Having both FLASH and SD attached is not being considered at this point. Both may work, but we are not necessarily catering for it now because there is insufficient time to ensure it works correctly.
So we have...

load = |          | FLASH    | FLASH   | SD       |  
boot = | Serial   | FLASH    | Serial  | SD       |
P61  = | none     | pullup   | pullup  | pullup   |
P60  = | xxxx     | pullup   | none    | pulldown |
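
As a rough illustration of the table above, here is a minimal Python sketch of the strap decoding. The names `Strap`, `BOOT_TABLE` and `select_boot` are hypothetical, and the fallback to serial for combinations the table does not list is an assumption, not something the proposal specifies.

```python
from enum import Enum


class Strap(Enum):
    """What the boot code senses on a pin (hypothetical names)."""
    NONE = "none"
    PULLUP = "pullup"
    PULLDOWN = "pulldown"


# (P61, P60) -> (load source, boot target), copied from the table above.
BOOT_TABLE = {
    (Strap.PULLUP, Strap.PULLUP): ("FLASH", "FLASH"),
    (Strap.PULLUP, Strap.NONE): ("FLASH", "Serial"),
    (Strap.PULLUP, Strap.PULLDOWN): ("SD", "SD"),
}


def select_boot(p61, p60):
    """Return (load, boot) for the resistors sensed on P61 and P60."""
    if p61 is Strap.NONE:
        # P60 is a don't-care ("xxxx") when P61 has no resistor.
        return (None, "Serial")
    # Assumption: any combination not in the table falls back to serial.
    return BOOT_TABLE.get((p61, p60), (None, "Serial"))
```

For example, `select_boot(Strap.PULLUP, Strap.PULLDOWN)` selects the SD column, while a missing resistor on P61 always selects serial.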

If P61=pullup and P60=pullup/none, we could try FLASH and if not found, then try SD.
IMHO there is no need to have a load SD and then goto serial because we can remove the SD card if the code is faulty.

I am presuming here that serial will be used to start Tachyon. Chip/Peter will need to confirm.

Chip,
Why couldn't COG #0 be started to look for serial while COG #1 is started to try and load FLASH/EEPROM ?

Cluso99 · 2018-04-08 01:28

SD BOOT PROCESS

After a hardware reset, cog 0 loads and executes a booter program from an internal ROM. The booter program (ROM_Booter.spin2) performs the following steps:

If an external pull-up resistor is sensed on P61 (SPI_CS), and an external pull-down resistor is sensed on P60 (SPI_CLK), then attempt to boot from SPI SD:
    a)  Initialise the SD card. If successful, continue, else goto serial.
    b)  Load MBR Sector #0000_0000 (512 bytes / 128 longs) into the hub starting at $00000.
        i)  Compute the 32-bit sum of the 256 bytes / 64 longs from $00080-$0017F.
        ii)  If the sum is "Prop" ($706F7250):
            (1)  Copy the first 128 longs from hub into cog registers $000..$0FF.
            (2)  Execute ‘JMP #$080’ to run the program. Done.
    c)  Locate 1st sector of Ptn0 (VOL Sector)
    d)  Load VOL Sector (512 bytes / 128 longs) into the hub starting at $00000.
        i)  Compute the 32-bit sum of the 256 bytes / 64 longs from $00080-$0017F.
        ii)  If the sum is "Prop" ($706F7250):
            (1)  Copy the first 128 longs from hub into cog registers $000..$0FF.
            (2)  Execute ‘JMP #$080’ to run the program. Done.
    e)  Locate the FAT Table 1st cluster of Ptn0 (this is the start of the Directory Table)
    f)  Search the first cluster directory for a filename of “????????.???”, and if not found, then search the first cluster directory for a filename of “????????.???”. Note filenames TBD. If neither filename is found, then goto serial.
        i)  Filename found, so locate file’s first data cluster, and
        ii)  Load first 32KB of file data into hub starting at $00000
            (1)  Copy the first 128 longs from hub into cog registers $000..$0FF.
            (2)  Execute ‘JMP #$000’ to run the program. Done.
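
The sector validation in steps b) and d) can be sketched in Python as below. This is only an illustration under stated assumptions: longs are taken as little-endian (which matches "Prop" reading as $706F7250), the helper names are hypothetical, and using the long at $17C as the fix-up value in `seal_sector` is an assumption for the example rather than part of the proposal. `partition0_start_lba` reads the first partition entry of a standard MBR (entry at offset $1BE, start LBA 8 bytes into the entry).

```python
import struct

PROP_SIGNATURE = 0x706F7250   # "Prop" read as a little-endian long
CHECK_START = 0x080           # first checked byte; range is $080..$17F
FIXUP_OFFSET = 0x17C          # assumption: last checked long holds the fix-up


def sector_sum(sector):
    """32-bit wrapping sum of the 64 longs at $080-$17F."""
    longs = struct.unpack_from("<64L", sector, CHECK_START)
    return sum(longs) & 0xFFFFFFFF


def is_bootable(sector):
    """True if a 512-byte sector carries the "Prop" checksum."""
    return len(sector) == 512 and sector_sum(sector) == PROP_SIGNATURE


def seal_sector(sector):
    """Return a copy whose fix-up long makes the checksum equal "Prop"."""
    buf = bytearray(sector)
    struct.pack_into("<L", buf, FIXUP_OFFSET, 0)
    fixup = (PROP_SIGNATURE - sector_sum(buf)) & 0xFFFFFFFF
    struct.pack_into("<L", buf, FIXUP_OFFSET, fixup)
    return bytes(buf)


def partition0_start_lba(mbr):
    """Start LBA of the first partition entry in a standard MBR."""
    return struct.unpack_from("<L", mbr, 0x1BE + 8)[0]
```

A boot flow following the list would call `is_bootable` on the MBR, then on the sector at `partition0_start_lba(mbr)`, and only fall through to the directory search if both checks fail.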

The SD Card would connect as follows, for booting:

P61 = CSn	(needs an external 10k pull-up)
P60 = CLK	(needs an external 10k pull-down)
P59 = DO		
P58 = DI

Notes:
1.  The whole MBR/VOL 512 byte sector will be loaded, although only $080-$17F will be validated for the checksum. The program will JMP $080 if a valid checksum is determined. 
2.  However, the full 512 bytes will be available in COG. This BOOT program within $080-$17F will be able to determine (knows) if any of the remainder of those 512 bytes are valid/available. 
3.  This allows for future variations of the SD Boot Code, depending on what is available to use.
4.  By having the boot options in both the MBR and the VOL locations, it future-proofs SD Booting should FAT32 or the Partitioning subsequently change in the future.

jmg · 2018-04-08 01:38

Cluso99 wrote: »

...

P2 has the following preferred pins...

P63 = RXD serial
P62 = TXD serial
P61 = SPI_CS
P60 = SPI_CLK
P59 = SPI_DQ = SPI_DO
P58 = (optional SPI_DI)

Is that pinout streamer compatible with QuadSPI and OctaSPI ?
ISTR a pinout map went past that was QuadSPI and OctaSPI streamer aligned ?

Cluso99 wrote: »

Why couldn't COG #0 be started to look for serial while COG #1 is started to try and load FLASH/EEPROM ?

Anything is possible, but that's starting to move away from KISS, and assumes 2 working COGS, and could get 'interesting' on a system with Flash and Serial, as flash will be busy or done, as serial just gets started...

Cluso99 · 2018-04-08 08:11

QSPI doesn't map in this space. Neither does full SD mode.
That request was lost a long time ago in the interest of minimal pins. Any use of Quad or Octal mode will require additional separate 4 or 8 DQ pins.

evanh · 2018-04-08 08:32

I presume that's
P59 = SPI_MISO
P58 = SPI_MOSI

jmg · 2018-04-08 08:32

Cluso99 wrote: »

QSPI doesn't map in this space. Neither does full SD mode.
That request was lost a long time ago in the interest of minimal pins. Any use of Quad or Octal mode will require additional separate 4 or 8 DQ pins.

Really ? So that means a whole NEW part needs to be connected ? - which totally blows the 'P2 designed for minimal pins' claim.
It's only minimal pins, in minimal performance, and using a subset of the part's abilities. Can't really see the marketing sense of that ?

evanh · 2018-04-08 08:44

JMG,
The clock pin can still be shared. The CS is not shareable unless daisy chaining. That only leaves pins 58 and 59. I suspect those can be reused in a 4-bit mode, maybe with bit reordering in the Prop2 pin selectors.

evanh · 2018-04-08 08:50

Ouch! SD cards repurpose the CS pin as a data pin in 4-bit mode! Only way to make that ever fit is to move SPI_CLK to pin P61.

jmg · 2018-04-08 08:56

evanh wrote: »

The clock pin can still be shared. The CS is not shareable unless daisy chaining. That only leaves pins 58 and 59. I suspect those can be reused in a 4-bit mode, maybe with bit reordering in the Prop2 pin selectors.

That 'maybe' is the key question - the boot pin map does need to be compatible with a user running a QuadSPI/OctSPI mode part.
These parts are very common, and XIP is now de facto on larger SPI-connected MCUs.

The Streamer I think is the most constraining issue here, as I think the Streamer cannot map nibble-sets to ANY pin offset.
That must then impose a certain order on the boot pins ?

evanh · 2018-04-08 09:48

Oops, just remembered the mapping trick is only for inputs. Okay, throw that idea away completely. I'll butt out.

cgracey · 2018-04-08 14:24

To support all possible memory hookups, we make a huge mess of the pins on startup and, by doing so, decide for the customer that memory flexibility is more important than what he might want to do with his pins.

We only need three pins to achieve a 50ms 512KB boot load using an 8-pin SPI flash. We can add one more pin and make SD work, as well, on a mutually-exclusive basis. That's where we are headed for boot-up.
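
For a sense of scale, the bit rate implied by a 50ms load of 512KB over one data pin can be worked out as below. This is a back-of-envelope sketch: it assumes 512KB means 512 * 1024 bytes and ignores command, address and any other overhead, and the function name is hypothetical.

```python
def required_bit_rate(image_bytes, load_seconds, data_pins=1):
    """Bits per second each data pin must carry to load the image in time."""
    return image_bytes * 8 / load_seconds / data_pins


# 512KB in 50ms over a single data pin: roughly 84 Mbit/s.
rate_1_spi = required_bit_rate(512 * 1024, 0.050)
```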

jmg · 2018-04-08 19:30

cgracey wrote: »

To support all possible memory hookups, we make a huge mess of the pins on startup and, by doing so, decide for the customer that memory flexibility is more important than what he might want to do with his pins.

We only need three pins to achieve a 50ms 512KB boot load using an 8-pin SPI flash. We can add one more pin and make SD work, as well, on a mutually-exclusive basis. That's where we are headed for boot-up.

I'm not talking about booting in Dual/Quad/Oct, boot is always going to be 1-bit SPI or 2 pin serial (not sure where 1 pin serial is ?)

Where care is needed, is in aligning those 1-bit SPI pins that are used with the streamer, so a user PCB design can run their code at full P2 speed.

You do not need any more boot pins to allow a user to run QuadSPI/OctSPI, as that has reset commands for single spi, and some parts also have RESET pin.

What needs to be avoided, is having to tell a user they need to connect TWO flash parts, if they want a fully operational P2.

Besides the PCB and BOM impact, just managing two lots of code, and getting that code into the two memories will be a user and support nightmare.

Cluso99 · 2018-04-08 22:04

Personally I don't see the use/requirement of QSPI. I know Peter J shares my view. If it was used for booting, then that's different.

For selfish reasons I'd like to try SD with Quad mode, but I can add extra pins to try it out. Not going to happen before silicon tho.

Anyway, Chip has decided the pinout. Done.

jmg · 2018-04-09 19:45

Cluso99 wrote: »

Personally I don't see the use/requirement of QSPI.

Hmm, when I look at all the chip vendors making QSPI & now OSPI Flash, and the MCU vendors busy adding support for QSPI and OSPI, I'm inclined to think they have many customers who CAN see uses, and have real requirements for those interfaces. It certainly gives measurably better performance - maybe that actually matters ?

If 1-SPI really was 'good enough', that's all that would be available.

Cluso99 wrote: »

Anyway, Chip has decided the pinout. Done.

Pinout is simple mapping. Someone needs to do a connection mapping for Boot-Flash and QuadSPI or OctSPI, and count the pins required & show that to someone like Ken, who has to sell this.

Then imagine Parallax taking that to any potential large customer, who says "Err, What..? you expect me to have to connect TWO memories to this thing, to use it properly ?!"

cgracey · 2018-04-09 21:13

jmg wrote: »

Cluso99 wrote: »

Personally I don't see the use/requirement of QSPI.

Hmm, when I look at all the chip vendors making QSPI & now OSPI Flash, and the MCU vendors busy adding support for QSPI and OSPI, I'm inclined to think they have many customers who CAN see uses, and have real requirements for those interfaces. It certainly gives measurably better performance - maybe that actually matters ?

If 1-SPI really was 'good enough', that's all that would be available.

Cluso99 wrote: »

Anyway, Chip has decided the pinout. Done.

Pinout is simple mapping. Someone needs to do a connection mapping for Boot-Flash and QuadSPI or OctSPI, and count the pins required & show that to someone like Ken, who has to sell this.

Then imagine Parallax taking that to any potential large customer, who says "Err, What..? you expect me to have to connect TWO memories to this thing, to use it properly ?!"

Jmg, I register your concern about this, but something like XIP is pointless on this chip. QSPI will never compete with internal memory bandwidth and who's going to want to run a (Q)SPI flash chip on a continuous basis, anyway? I see it necessary to hold the user program and some data that gets read periodically, but what's the point of a fast 4-data-pin ROM? QSPI does nothing to speed up writing, only reading. If I thought this was important, I'd accommodate it in the full sense. As it is, you CAN have QSPI, but with reversed data pins.

jmg · 2018-04-09 21:26

cgracey wrote: »

As it is, you CAN have QSPI, but with reversed data pins.

Can you include a pin diagram showing how that connects, and how it uses the streamer ?
Can you also connect an OctSPI, and boot from that in 1-SPI, but use the same memory in x8 later ?

cgracey wrote: »

QSPI will never compete with internal memory bandwidth and who's going to want to run a (Q)SPI flash chip on a continuous basis, anyway? I see it necessary to hold the user program and some data that gets read periodically, but what's the point of a fast 4-data-pin ROM? QSPI does nothing to speed up writing, only reading.

Some obvious use cases :
Font storage for LCD, where precious CODE memory is reserved in RAM, and Font info is stored in Flash. For that to work, users need to be able to quickly read the font info.
Likewise for graphics and background images for MMI - those update from flash.

Stage 2 boot and code overlays : Initial boot is always 1-SPI, but some users will want stage-2 fast boot, or faster load of code overlays during their system operation.

Ariba · 2018-04-09 21:33

jmg wrote: »

...
Pinout is simple mapping. Someone needs to do a connection mapping for Boot-Flash and QuadSPI or OctSPI, and count the pins required & show that to someone like Ken, who has to sell this.

Then imagine Parallax taking that to any potential large customer, who says "Err, What..? you expect me to have to connect TWO memories to this thing, to use it properly ?!"

https://forums.parallax.com/discussion/comment/1424951/#Comment_1424951

cgracey · 2018-04-09 21:34

jmg wrote: »

cgracey wrote: »

As it is, you CAN have QSPI, but with reversed data pins.

Can you include a pin diagram showing how that connects, and how it uses the streamer ?
Can you also connect an OctSPI, and boot from that in 1-SPI, but use the same memory in x8 later ?

cgracey wrote: »

QSPI will never compete with internal memory bandwidth and who's going to want to run a (Q)SPI flash chip on a continuous basis, anyway? I see it necessary to hold the user program and some data that gets read periodically, but what's the point of a fast 4-data-pin ROM? QSPI does nothing to speed up writing, only reading.

Some obvious use cases :
Font storage for LCD, where precious CODE memory is reserved in RAM, and Font info is stored in Flash. For that to work, users need to be able to quickly read the font info.
Likewise for graphics and background images for MMI - those update from flash.

Stage 2 boot and code overlays : Initial boot is always 1-SPI, but some users will want stage-2 fast boot, or faster load of code overlays during their system operation.

Can all that not be accommodated with reversed data pins? I hate to interrupt the pin map for a use case that is going to be rare.

jmg · 2018-04-09 22:49

cgracey wrote: »

Can all that not be accommodated with reversed data pins? I hate to interrupt the pin map for a use case that is going to be rare.

I'm not sure what you mean by 'reversed data pins' - do you mean bridged pins ?

The SCH link Ariba gives above, shows use of bridged pins, where 2 prop pins connect together - but the data pins do look to be in the right order, not reversed ?

That's done in order to align with the streamer, as the current 1-SPI pin map does not streamer align.

A quad-connect then consumes 8 pins, when it might have been 6, and an Oct connect consumes 12 when it might have been 10.
It does save having to connect two memories, (which was a real 'ugh'), but the downside is it does consume 2 more Prop pins, and orphans 2 others...
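
The pin counts quoted above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the "+2" for the bridged arrangement simply mirrors the figures in the post (8 versus 6, 12 versus 10); it does not model which pins are orphaned.

```python
def pins_direct(data_width):
    """Data pins plus clock and chip select."""
    return data_width + 2


def pins_bridged(data_width):
    """Bridged hookup: two extra Prop pins on top of the direct count."""
    return pins_direct(data_width) + 2
```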

Maybe that's a tolerable compromise ? Still seems quite high on the 'kludge meter' ?

If someone wanted to connect an Oct/Quad part, with no lost or orphaned pins, they could use a small sub-30c MCU as the Stage 1 boot, and connect like this :


     Serial        Octal         Quad           Dual/Single        Boot MCU           Possible  Possible 
     Stage1 Boot   Stage2 Boot   Stage2 Boot    Stage2 Boot        Stage 1            PC BBU ?  PC LdMCU
63   rx_pin        DQ7           DQ3            nu                 [MCU.TX   --[2k2]-- USB.TX    USB-RX 
62   tx_pin        DQ6           DQ2            nu                 [MCU.RX   --[2k2]-- USB-RX    USB.TX
61                 DQ5           DQ1.DO         DQ1.DO            
60                 DQ4           DQ0.DI         DQ0.DI            
59                 DQ3           spi_ck         spi_ck            
58                 DQ2           spi_cs Pullup  spi_cs Pullup     
57                 DQ1.DO                                         
56                 DQ0.DI                            
55                 spi_ck                            
54                 spi_cs  Pullup                    
                   spi_rst ------spi_rst--------[MCU pin

During and out of P2 reset, the SPI memory is held CS=H, and the boot MCU loads ~8-12-16k of code, which then loads from SPI. (The MCU then tristates.)
It's 2 chips, but small MCUs are getting ever cheaper and smaller, and this connection is for someone who really needs all those prop pins, in neat streamer groups.
Such an MCU can manage the SPI memory HW RESET and even the P2 reset & watchdog function too.

Addit: added Dual/Single connect, which looks to be able to co-exist with P63.62 UART

If someone expands to Quad, the rule is that the P63/P62 UART cannot be used at the same time, but they could BBU on that UART. Note 1-SPI Boot could be possible from that pin map (minor pin change).
If someone expands to Octal, the rule is that the P63/P62 UART cannot be used at the same time, but they could BBU on that UART, and there is no 1-SPI Boot (needs MCU stage 1 boot).
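
The pin map and its UART rules can be captured as data, which makes the conflicts easy to check. The sketch below transcribes the Octal, Quad and Dual/Single columns of the table; the dictionary layout and function names are hypothetical.

```python
UART_PINS = {63, 62}  # rx_pin on P63, tx_pin on P62

# Pin number -> signal, one entry per connection column in the table.
PIN_MAPS = {
    "octal": {
        63: "DQ7", 62: "DQ6", 61: "DQ5", 60: "DQ4", 59: "DQ3",
        58: "DQ2", 57: "DQ1.DO", 56: "DQ0.DI", 55: "spi_ck", 54: "spi_cs",
    },
    "quad": {
        63: "DQ3", 62: "DQ2", 61: "DQ1.DO", 60: "DQ0.DI",
        59: "spi_ck", 58: "spi_cs",
    },
    "dual_single": {
        61: "DQ1.DO", 60: "DQ0.DI", 59: "spi_ck", 58: "spi_cs",
    },
}


def shares_uart_pins(mode):
    """True if the memory hookup reuses the P63/P62 UART pins."""
    return bool(UART_PINS & set(PIN_MAPS[mode]))


def pins_used(mode):
    """Total Prop pins the hookup occupies."""
    return len(PIN_MAPS[mode])
```

Only the Dual/Single hookup leaves P63/P62 free, which matches the note that it can co-exist with the UART.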

Peter Jakacki · 2018-04-10 01:07

This issue of QSPI/OSPI can be handled very easily. The one making a use case for it shows where they would use it and why it is better.
@jmg - We appreciate the information and input but none of us can see where we would use it although I can see plenty of uses for P2 and how I would use it.
Do you have an application in mind?
How will not having wide SPI affect the application's outcome?

jmg · 2018-04-10 01:17

Peter Jakacki wrote: »

This issue of QSPI/OSPI can be handled very easily. The one making a use case for it shows where they would use it and why it is better.
@jmg - We appreciate the information and input but none of us can see where we would use it although I can see plenty of uses for P2 and how I would use it.
Do you have an application in mind?

Some use cases are 4 posts up ^, and some pin-outs to suit 8/4/2/1 SPI use are 2 posts up ^. (included caveats/rules)

Peter Jakacki wrote: »

How will not having wide SPI affect the application's outcome?

Speed always matters, and limiting users to only 1-SPI guarantees things are slower than they could have been.
In some cases wider-spi will allow lower clock speeds and thus lower power.

evanh · 2018-04-10 01:32

So, JMG, the suggestion looks to really be using only async Rx/Tx for booting. Any SPI, or others, is completely optional for secondary/extended support.

I quite like that as a minimal pin use scenario.

Peter Jakacki · 2018-04-10 01:48

@jmg - P1 has font tables in a part of its 32k ROM, and I would expect that in P2 those font tables, even if they are larger than that, are in the 512k RAM. As for graphic HMI screens, I expect to load those from "slow" SD, as I can load a raw 128k image in the blink of an eye. I say 128k as I don't expect to use the whole 512k just for graphics; we already have chips that do that better, so we are not trying to compete in that area.

But what I am asking is do you have a real application in mind because unless it is a real application we tend to over-spec our whiz-bang project just in case and as experience has shown us, we end up not using those fancy features and get by just fine. You can still hook-up fast & wide Flash as one extra part for that esoteric application.

evanh · 2018-04-10 01:54

Peter,
If I've read JMG's writings correctly, only two pins, P62 for Tx and P63 for Rx, are required for minimal booting. Admittedly, that involves programming another "boot" MCU that has onboard Flash.

jmg · 2018-04-10 02:41

evanh wrote: »

So, JMG, the suggestion looks to really be using only async Rx/Tx for booting. Any SPI, or others, is completely optional for secondary/extended support.
I quite like that as a minimal pin use scenario.

Yes, boot does that now, it starts checking UART, but can skip if a pin is pulled low.
I think it is still useful to be able to SPI-BOOT, the question is really over the pin map to use.
Focusing on that minimal pin use scenario for Quad spi users, is the pin map I suggested above.
That has some caveats in simultaneous use of UART and SPI, but for those using UART to bring up the board, that's likely fine.

evanh wrote: »

If I've read JMG's writings correctly, only two pins, P62 for Tx and P63 for Rx, are required for minimal booting. Admittedly, that involves programming another "boot" MCU that has onboard Flash.

Close, however that MCU is only mandatory for the 10 pin OctaSPI connection, when the Boot-via-1-SPI is not possible.
In the other connections, Boot-via-1-SPI is still possible. 1-2 SPI connect leaves the UART completely free.
4-SPI means you cannot use UART and Quad at the same time. If the user needs a system uart, choose 2 other pins.

evanh · 2018-04-10 03:02

None of this matters then. If one wants to use a partnering boot MCU to manage the whole thing from Tx/Rx then it can already be done irrespective of a 3-pin SPI option being in the mask ROM.

jmg · 2018-04-10 03:38

evanh wrote: »

None of this matters then. If one wants to use a partnering boot MCU to manage the whole thing from Tx/Rx then it can already be done irrespective of a 3-pin SPI option being in the mask ROM.

I think you missed the point.

Yes, a partner MCU allows you to do anything, but you do not want to be bumped to a partner MCU 'band aid' too early.

ie P2 boot should be able to allow your minimal pin use scenario, for as many SPI modes as practical.

Looks to me like it can support UART (2 pins), 1-SPI or 2-SPI (in 4 more pins), and even 4-SPI (same pin cost) with no MCU, with some simultaneous-use exclusions.
The MCU use is only really mandated for OctaSPI.

The current ROM mapping, is poorly compatible with a 4-SPI connection, and is even incompatible with a 2-SPI connection.
I've not seen how SD mixes into this yet either...

evanh · 2018-04-10 04:02

Your pinout table for 4-SPI says "Stage2". What is Stage1 in that config?

BTW: The existing basic 3-pin booter (1-SPI) will be fine for extending to 2-SPI. The pin order isn't going to matter.

jmg · 2018-04-10 04:13

Peter Jakacki wrote: »

But what I am asking is do you have a real application in mind because unless it is a real application we tend to over-spec our whiz-bang project just in case and as experience has shown us, we end up not using those fancy features and get by just fine. You can still hook-up fast & wide Flash as one extra part for that esoteric application.

OK, let's take something quite mainstream, like a 800x480 LCD display, HMI ( - neither over-spec'd nor esoteric. )
Memory map that inside the P2, and we have 2^19 - 800*480 - 2^14 = 123904 bytes of free code space.

Not much room for your fonts in there, barely enough room for code !
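
The free-space figure can be reproduced with a few lines of Python. This sketch assumes one byte per pixel for the frame buffer and treats the 2^14 term as a reserved region, as the expression in the post does; the function name and parameters are hypothetical.

```python
HUB_RAM_BYTES = 2 ** 19  # 512KB of hub RAM


def free_after_framebuffer(width, height, bytes_per_pixel=1, reserved=2 ** 14):
    """Hub bytes left once the frame buffer and reserved region are taken."""
    return HUB_RAM_BYTES - width * height * bytes_per_pixel - reserved
```

With these assumptions, `free_after_framebuffer(800, 480)` gives 123904 bytes, matching the figure above.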

Now imagine you want to support 16 bit font index (multi language, global product) and 64x32 fonts, because you want this to look, well, not amateurish & to sell global.
Such a font can need 16MBytes, and each char is 256 bytes. (+ more for smaller fonts...)
You claim 2048 clocks is fine to load a char, but the designer might beg to differ, and prefer 512 or 256 clocks, or 128 if brave....
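
The font numbers above can be checked with a short Python sketch. It assumes 1 bit per pixel and one data bit per pin per clock; reading the 128-clock case as an octal part with double data rate is an assumption, since the post does not say how that figure would be reached. All names are hypothetical.

```python
def glyph_bytes(width, height, bits_per_pixel=1):
    """Storage for one character cell."""
    return width * height * bits_per_pixel // 8


def font_bytes(glyph_count, width, height):
    """Storage for a full font of glyph_count characters."""
    return glyph_count * glyph_bytes(width, height)


def load_clocks(nbytes, data_pins, ddr=False):
    """SPI clocks needed to read nbytes, ignoring command overhead."""
    bits_per_clock = data_pins * (2 if ddr else 1)
    return nbytes * 8 // bits_per_clock


glyph = glyph_bytes(64, 32)            # 256 bytes per character
font = font_bytes(2 ** 16, 64, 32)     # 16 MBytes for a 16-bit index
```

`load_clocks(glyph, 1)` is 2048, quad gives 512, octal gives 256, and octal with `ddr=True` gives 128.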

My point is P2 should not fight that choice.

evanh · 2018-04-10 05:06

Ah! Pin reordering possibly can be done for 4-SPI modes when burst reading with the Streamer. Limited to the +-3 pin range and won't work for writes but then Flash writes are slow anyway so switching back to 1-SPI and bit-bashing or Smartpin would be fine for that.

jmg · 2018-04-10 05:25

evanh wrote: »

Ah! Pin reordering possibly can be done for 4-SPI modes when burst reading with the Streamer. Limited to the +-3 pin range ...

Interesting idea, but I fear those +/- 3 muxes do not apply to the streamer mapping ?
Streamer mapping is rather block-allocated.
