P2 SD Boot Code (v32+)

Peter Jakacki · 2018-04-10 05:42

jmg wrote: »

Peter Jakacki wrote: »

But what I am asking is do you have a real application in mind because unless it is a real application we tend to over-spec our whiz-bang project just in case and as experience has shown us, we end up not using those fancy features and get by just fine. You can still hook-up fast & wide Flash as one extra part for that esoteric application.

OK, let's take something quite mainstream, like an 800x480 LCD display HMI (neither over-spec'd nor esoteric).
Memory-map that inside the P2, and we have 2^19 - 800*480 - 2^14 = 123904 bytes of free code space.

Not much room for your fonts in there, barely enough room for code !

Now imagine you want to support 16 bit font index (multi language, global product) and 64x32 fonts, because you want this to look, well, not amateurish & to sell global.
Such a font can need 16MBytes, and each char is 256 bytes. (+ more for smaller fonts...)
You claim 2048 clocks is fine to load a char, but the designer might beg to differ, and prefer 512 or 256 clocks, or 128 if brave....
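
The figures in this example can be checked with a few lines of Python. This is only a sketch of the arithmetic, and it relies on assumptions the post does not spell out: one byte per pixel for the frame buffer, one bit per pixel for the font, and one clock per transfer on each data line.

```python
# Sanity check of the numbers quoted above (assumptions noted inline).
HUB_RAM = 2**19            # 512 KiB, the 2^19 in the post
FRAME = 800 * 480          # assumes 1 byte per pixel
RESERVED = 2**14           # the 16 KiB subtracted in the post
free_bytes = HUB_RAM - FRAME - RESERVED

GLYPH_BYTES = 64 * 32 // 8         # assumes 1 bit per pixel
FONT_BYTES = 2**16 * GLYPH_BYTES   # 16-bit font index

def clocks_per_glyph(bus_width_bits):
    """Clocks to read one glyph, assuming bus_width_bits move per clock."""
    return GLYPH_BYTES * 8 // bus_width_bits

print(free_bytes)                                 # 123904
print(GLYPH_BYTES, FONT_BYTES)                    # 256 16777216
print([clocks_per_glyph(w) for w in (1, 4, 8)])   # [2048, 512, 256]
```

Under those assumptions, 2048 clocks corresponds to 1-bit SPI, 512 to a 4-bit bus and 256 to an 8-bit bus.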

My point is P2 should not fight that choice.

But you are picking a "just fits" use case and then trying to say that it must cater for multi-language etc on top of that. Sure, I'd like it to handle the graphics and have plenty of room left over but what if P2 had 1M of RAM? Would you find another use case to show us that it needs wide Flash too?

cgracey · 2018-04-10 05:46

jmg wrote: »

evanh wrote: »

Ah! Pin reordering possibly can be done for 4-SPI modes when burst reading with the Streamer. Limited to the +-3 pin range ...

Interesting idea, but I fear those +/- 3 muxes do not apply to the streamer mapping ?
Streamer mapping is rather block-allocated.

It would work for reading the pins via the streamer, but not for setting them.

evanh · 2018-04-10 05:49

Yep, I think all of the pin control bits work whether it's operating as a Smartpin or not. Including features like the 3-sample filtering.

cgracey · 2018-04-10 05:57

evanh wrote: »

Yep, I think all of the pin control bits work whether it's operating as a Smartpin or not. Including features like the 3-sample filtering.

That's right.

jmg · 2018-04-10 06:42

Peter Jakacki wrote: »

But you are picking a "just fits" use case and then trying to say that it must cater for multi-language etc on top of that. Sure, I'd like it to handle the graphics and have plenty of room left over but what if P2 had 1M of RAM? Would you find another use case to show us that it needs wide Flash too?

No need to find another use case for 1M, this example fails just as badly.
Perhaps you missed the 16MB of Flash needed ?
Multi-language is nothing unusual in modern design, and 800x480 is average on today's screens.

Cluso99 · 2018-04-10 07:40

I don't see a requirement for QSPI either.

But, WHOA !!! Suggesting to use P62 & P63 is a definite NO-GO IMHO !!!!

evanh · 2018-04-10 08:33

I think we're all good with just using a basic SPI arrangement now. 4-bit modes can be optionally tacked on just by adding lower pins on the Prop2 and doing pin remapping for burst reads.

One thing likely to help is swapping the Prop2's SPI_CS for SPI_CLK. This is because SD cards use CS for bit3 in 4-bit mode.

evanh · 2018-04-10 10:26

P63 = RXD serial
P62 = TXD serial
P61 = SPI_CLK
P60 = SPI_CS      (SD bit3)  (SQI CE)
P59 = SPI_MISO    (SD bit0)  (SQI bit1)
P58 = (SPI_MOSI)  (SD CMD)   (SQI bit0)
P57 =             (SD bit2)  (SQI bit3)
P56 =             (SD bit1)  (SQI bit2)

() = Optional

evanh · 2018-04-10 10:43

The input mappings for SD would be: P59->56, P60->59, P57->58, P56->57.
And for SQI: P59->57, P57->59, P58->56, P56->58.
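
These mappings follow mechanically from the pinout table in the previous post. A minimal Python sketch, assuming the 4-bit group is read on P56..P59 with data bit 0 landing on P56 (the `BASE_PIN` name and helper functions are illustrative, not anything from the Prop2 tools):

```python
# Pinout from the table in the earlier post: physical pin -> data-bit index.
SD_BITS = {59: 0, 56: 1, 57: 2, 60: 3}
SQI_BITS = {58: 0, 59: 1, 56: 2, 57: 3}
BASE_PIN = 56   # assumed lowest pin of the 4-pin group being read

def input_remap(bits):
    """Map each physical pin to the pin slot matching its data bit."""
    return {pin: BASE_PIN + bit for pin, bit in bits.items()}

def within_range(mapping, limit=3):
    """True if every remap stays within `limit` pins of its source."""
    return all(abs(dst - src) <= limit for src, dst in mapping.items())

sd_map = input_remap(SD_BITS)
sqi_map = input_remap(SQI_BITS)
print(sd_map)    # {59: 56, 56: 57, 57: 58, 60: 59}
print(sqi_map)   # {58: 56, 59: 57, 56: 58, 57: 59}
print(within_range(sd_map), within_range(sqi_map))   # True True
```

Both sets of remaps stay inside the +-3 pin range mentioned earlier in the thread.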

SQI demands that commands and writes use all 4 bits. This will mean some massaging but not a major for Flash parts. SQI RAMs would suffer but I don't know if they're even a thing. I'd probably go 8-bit instead, Eg: HyperRAM.

EDIT: SD 4-bit mode may also be a one-way trip, and that would also demand data writes to use the 4 data pins. Again, doable, albeit the messiest and slowest 4-bit config.

evanh · 2018-04-10 11:40

Ooooh, I hadn't read what Chip has written in the Prop2 Doc. He's tying both SO and SI together. On a single Flash part, that would be impossible to switch to 4-bit mode.

So everything I've written doesn't work!

In hindsight, that should have been obvious. I had sort of assumed there was a default sequential read mode that didn't need any commands.

EDIT: All is good again.

cgracey · 2018-04-10 12:13

evanh wrote: »

Ooooh, I hadn't read what Chip has written in the Prop2 Doc. He's tying both SO and SI together. On a single Flash part, that would be impossible to switch to 4-bit mode.

So everything I've written doesn't work!

In hindsight, that should have been obvious. I had sort of assumed there was a default sequential read mode that didn't need any commands.

I'm going to separate DI and DO, in order to support the SD card pinout on the same pins as SPI flash, though they'd be mutually exclusive.

This also facilitates easier smart pin usage, as you don't have to micro-manage pin direction.

Cluso99 · 2018-04-10 12:14

Evan,
I suggested to Chip, and he agreed, that P2 read both P59 & P58 to see which pin is connected to DI. That way 3 or 4 pins will both work.
I don't think that helps arrange the pins in any better order, but it will allow the pins to be separate so QSPI can be connected without DI & DO joined by a resistor.

cgracey · 2018-04-10 12:19

This is the pinout for boot memory:

cgracey · 2018-04-10 12:22

Cluso99,

It sounds like your code runs in hub exec?

If I see a pull-up on P61, but it doesn't test out as Flash, could I call your code to do its test to determine if an SD card is present?

It could go ahead and load however many bytes we were talking about before, if it sees an SD card.

There will be some finesse to making this work with respect to potential serial timing. We'll get it worked out.

evanh · 2018-04-10 12:32

cgracey wrote: »

This also facilitates easier smart pin usage, as you don't have to micro-manage pin direction.

That one could be got around by using a, hopefully otherwise unused, neighbouring Smartpin with pin remap for the input.

cgracey · 2018-04-10 12:36

evanh wrote: »

cgracey wrote: »

This also facilitates easier smart pin usage, as you don't have to micro-manage pin direction.

That one could be got around by using a, hopefully otherwise unused, neighbouring Smartpin with pin remap for the input.

Switching between input and output and related activity is what causes the headache. Having those two pins physically separate simplifies software.

If we weren't going to support SD card on the same pins, I would keep it at three pins.

evanh · 2018-04-10 12:56

It'll be a challenge to keep sane with any input remapping, methinks.

rjo__ · 2018-04-10 13:56

This is way above my pay grade... but after you are done loading, the assets are still going to be there.
To make best use of them, I don't think you would want to share any pin resources.

evanh · 2018-04-10 14:09

Mostly this is the opposite discussion - about having a single Flash part with multiple resources that might be loaded again and again as the operator uses the equipment in varying ways.

So the detail we are nutting out is how to have a single basic boot method that works with all flavours of SPI, while also allowing for faster modes of operation if that one part can do it and the system builder wants it that way.

evanh · 2018-04-10 14:16

Eg: The SD option is not likely to be sharing with another Flash SPI part, it would be just the SD card only. If that SD card is removed then the equipment will not boot until the SD card is reinserted.

Cluso99 · 2018-04-10 15:08

evanh wrote: »

Eg: The SD option is not likely to be sharing with another Flash SPI part, it would be just the SD card only. If that SD card is removed then the equipment will not boot until the SD card is reinserted.

Yes.

Cluso99 · 2018-04-10 15:34

cgracey wrote: »

Cluso99,

It sounds like your code runs in hub exec?

If I see a pull-up on P61, but it doesn't test out as Flash, could I call your code to do its test to determine if an SD card is present?

It could go ahead and load however many bytes we were talking about before, if it sees an SD card.

There will be some finesse to making this work with respect to potential serial timing. We'll get it worked out.

Actually it's a mix of cog and hubexec, although there is really no reason it won't run in hubexec excepting the working registers.

IMHO, the easiest way is in HUBEXEC to test for a pull-up on P61 (CSn) which means SPI FLASH or SD present, and then test for a pull-up/none/pull-down on P60 (CLK). A pull-up or none means FLASH, and pull-down means SD. Then load the Cog to boot FLASH or SD accordingly. This way we get the SPI load speed in the Cog.
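
The decision described above can be sketched in Python as follows. This only illustrates the logic: the function name and the "up" / "down" / "none" encoding are made up here, and the real test would be done in P2 code by sensing the pins.

```python
def detect_boot_device(p61_pull, p60_pull):
    """Classify boot memory from resistor sensing on P61 (CSn) and P60 (CLK).

    Each argument is "up", "down" or "none" (hypothetical encoding).
    """
    if p61_pull != "up":
        return "none"     # no pull-up on P61: no SPI Flash or SD fitted
    if p60_pull == "down":
        return "sd"       # pull-down on P60 selects SD
    return "flash"        # pull-up or nothing on P60 selects Flash

print(detect_boot_device("up", "none"))   # flash
print(detect_boot_device("up", "down"))   # sd
print(detect_boot_device("none", "up"))   # none
```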

My SD Boot code...

If there is a "Prop" signature in bytes $080-$17F in the MBR or PTN0 VOL sector, then I will load the first 4 or 8 (undecided yet) sectors into Hub $0_0000... and then copy Hub $0_0000-$0_01EF (496 longs) into cog and JMP $080. It will be up to those 64 instructions to determine how the rest of the sectors loaded can/will be used. Often, some of those sectors have been used for further boot code by Microsoft etc, so it's a valid use.

If neither is present, and it's FAT32 format, then I search the directory (only the first directory cluster) for <filename1>, then if not found, re-search for <filename2>. If either is found, I currently load the first 32KB of the file's data into hub $0_0000... and then copy the first 496 longs to cog and JMP $000. I am thinking that perhaps it may be better to check the file size, and copy these bytes to Hub $0... If I do this, it will be a requirement that the file is contiguous as I won't be checking for non-sequential clusters.
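
As a rough illustration of the signature test, here is a Python sketch. The post does not say where in the $080-$17F window the signature sits or how it is encoded, so this assumes the ASCII bytes "Prop" anywhere in that window; the names are hypothetical.

```python
SIG = b"Prop"                        # assumed ASCII encoding of the signature
SIG_START, SIG_END = 0x080, 0x17F    # byte range named in the post
COG_LONGS = 496                      # longs copied into the cog

def has_prop_signature(sector):
    """True if SIG occurs within bytes $080-$17F of a 512-byte sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    return SIG in sector[SIG_START:SIG_END + 1]

# The window is 256 bytes, i.e. the 64 longs (instructions) mentioned above.
window_longs = (SIG_END - SIG_START + 1) // 4
print(window_longs)      # 64
print(COG_LONGS * 4)     # 1984 bytes copied to cog

blank = bytes(512)
marked = bytearray(512)
marked[0x080:0x084] = SIG
print(has_prop_signature(blank), has_prop_signature(bytes(marked)))   # False True
```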

Today I ran the SD Boot from the RC osc. Worked flawlessly as expected. Previously I have used the 80MHz stall option.

jmg · 2018-04-10 19:29

rjo__ wrote: »

This is way above my pay grade... but after you are done loading, the assets are still going to be there.
To make best use of them, I don't think you would want to share any pin resources.

If by 'share any pin resources' you mean pin-bridging, which was the earlier means to support Quad/OctSPI parts, then I agree.

cgracey · 2018-04-10 19:36

Cluso99 wrote: »

Actually it's a mix of cog and hubexec, although there is really no reason it won't run in hubexec excepting the working registers.

IMHO, the easiest way is in HUBEXEC to test for a pull-up on P61 (CSn) which means SPI FLASH or SD present, and then test for a pull-up/none/pull-down on P60 (CLK). A pull-up or none means FLASH, and pull-down means SD. Then load the Cog to boot FLASH or SD accordingly. This way we get the SPI load speed in the Cog.

My SD Boot code...

If there is a "Prop" signature in bytes $080-$17F in the MBR or PTN0 VOL sector, then I will load the first 4 or 8 (undecided yet) sectors into Hub $0_0000... and then copy Hub $0_0000-$0_01EF (496 longs) into cog and JMP $080. It will be up to those 64 instructions to determine how the rest of the sectors loaded can/will be used. Often, some of those sectors have been used for further boot code by Microsoft etc, so it's a valid use.

If neither is present, and it's FAT32 format, then I search the directory (only the first directory cluster) for <filename1>, then if not found, re-search for <filename2>. If either is found, I currently load the first 32KB of the file's data into hub $0_0000... and then copy the first 496 longs to cog and JMP $000. I am thinking that perhaps it may be better to check the file size, and copy these bytes to Hub $0... If I do this, it will be a requirement that the file is contiguous as I won't be checking for non-sequential clusters.

Today I ran the SD Boot from the RC osc. Worked flawlessly as expected. Previously I have used the 80MHz stall option.

Cluso99, this is very exciting! I agree about the pull-up and pull-down usage. Better to act deterministically than experimentally.

How long does it take for your code to run, worst-case?

jmg · 2018-04-10 19:40

evanh wrote: »

The input mappings for SD would be: P59->56, P60->59, P57->58, P56->57.
And for SQI: P59->57, P57->59, P58->56, P56->58.

It would be interesting to see this actually tested, to see the R/W/Streamer differences.

evanh wrote: »

SQI demands that commands and writes use all 4 bits. This will mean some massaging but not a major for Flash parts. SQI RAMs would suffer but I don't know if they're even a thing. I'd probably go 8-bit instead, Eg: HyperRAM.

EDIT: SD 4-bit mode may also be a one-way trip, and that would also demand data writes to use the 4 data pins. Again, doable, albeit the messiest and slowest 4-bit config.

To clarify, the newest Quad parts seem to have TWO QuadEnable bits and a software reset, and most have a HW reset pin on appropriate packages. OctaSPI parts always have a HW reset, as they have more than 8 pins.
One QE bit is non-volatile: set that, and the part restarts in Quad mode.
Conversely, use the volatile QE bit, and a reset will revert to 1-SPI.

This second mode is what P2 will boot from.

Chip issues various reset commands in the BOOT code, to exit Quad modes and go to 1-SPI, but newer parts have varying RESET exit delays, so I've suggested a smarter connected part loop.
ie rather than a fixed-guess-delay (as now), instead P2 polls until a valid SPI is seen, or some time limit is hit (120ms?)

That exits faster when it can, giving a quicker reboot, but is also safe should a reset occur during a pgm/erase operation, which is slower to exit.
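
The poll-until-valid idea might look like this in outline. It is a Python sketch only: `probe` stands in for whatever 1-SPI transaction the boot code would use to recognise a live part, and the 120 ms default is just the tentative figure floated above.

```python
import time

def wait_for_spi(probe, timeout_s=0.120, poll_s=0.001,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or timeout_s elapses.

    probe is a hypothetical callable that attempts one 1-SPI
    transaction and reports whether the reply looked valid.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True       # part answered: exit early
        if clock() >= deadline:
            return False      # gave up at the time limit
        sleep(poll_s)

# Simulated part that becomes ready on the third poll.
replies = iter([False, False, True])
print(wait_for_spi(lambda: next(replies), sleep=lambda s: None))   # True
```

A part that is ready immediately costs one probe, while a part still finishing an erase is retried until the limit is reached.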

jmg · 2018-04-10 19:42

cgracey wrote: »

This is the pinout for boot memory:

Not all pins are showing here - does this mean P2 disturbs other pins during boot, even if only Serial is being used ? That's certainly less desirable.

cgracey · 2018-04-10 21:00

jmg wrote: »

cgracey wrote: »

This is the pinout for boot memory:

Not all pins are showing here - does this mean P2 disturbs other pins during boot, even if only Serial is being used ? That's certainly less desirable.

It checks for a pull-up on P61 before trying to talk to the SPI flash chip.

It looks like it will start checking for a pull-down on P61, as well, to detect SD card.

So, only P61 gets disturbed if there is no memory attached.

Cluso99 · 2018-04-10 21:44

cgracey wrote: »

Cluso99, this is very exciting! I agree about the pull-up and pull-down usage. Better to act deterministically than experimentally.

How long does it take for your code to run, worst-case?

Unsure Chip. However, the SD card itself has quite a time delay after the software reset command is issued. This will have quite an impact, as will the enforced 75+ clocks to initialise. I have also deliberately inserted tiny delays in the clocking.

IMHO, with SD, it's not so important that P2 boots as fast as it does with FLASH.

Once it's all working in final form I can time it, but as I said, the card itself has quite an impact.

cgracey · 2018-04-10 23:17

Cluso99 wrote: »

Unsure Chip. However, the SD card itself has quite a time delay after the software reset command is issued. This will have quite an impact, as will the enforced 75+ clocks to initialise. I have also deliberately inserted tiny delays in the clocking.

IMHO, with SD, it's not so important that P2 boot as fast as it is with FLASH.

Once it's all working in final form I can time it, but as I said, the card itself has quite an impact.

I need to think about how the timing will work. The nice thing about SPI flash is that it can be read in a known amount of time, which works well with the serial window.

jmg · 2018-04-10 23:56

evanh wrote: »

Eg: The SD option is not likely to be sharing with another Flash SPI part, it would be just the SD card only. If that SD card is removed then the equipment will not boot until the SD card is reinserted.

I'm not certain about that 'not likely' part...
Hobbyists might tolerate SD card insert, and having a dead-duck system with removed/faulty SD, but more serious industrial users are likely to want SD and SPI Flash.
eg SD can be used for logging and updates, but mission critical code comes from SPI.
Key difference : The system is still operational with removed/faulty SD.
