P2 SD BOOT ROM v2 (for P2 respin Feb 2019) +exFAT trial

Cluso99 · 2019-02-06 04:59

I have just been doing some experiments with SD boot times.

These times are a little higher than the real code will be, since I have a small section of debugging code enabled.

These times are with a Mizo 8GB SDHC U1 (Class 10) card formatted as FAT32, loading the small boot file "_BOOT_P2.BIX", which is located in the first ~60 directory entries. I have estimated the RCFAST clock at 23MHz.

All cards do vary considerably.

I have included warm boot results. A cold boot means the SD card has not been previously initialised since power-up. The SD ROM Booter will only be run from a cold boot. This code has been downloaded first before running. I cannot tell if the cold code will run any slower from the physical ROM due to the SD having power applied for a shorter time before initialisation.

                COG         HUB         %faster
Cold boot       78.3ms      85.3ms       8.2%            
Warm boot       23.4ms      30.7ms      23.7%

From this, you will see there isn't much benefit in having the SD Boot ROM run its code from COG versus HUB. I haven't included the copy from hub to cog, but it will be minimal.
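A quick check of the %faster column, assuming it is the time saved by COG execution divided by the HUB time (the formula is not stated in the post, so treat it as a guess that happens to fit the table):

```python
def percent_faster(cog_ms, hub_ms):
    """Time saved by running from COG, as a percentage of the HUB time (assumed formula)."""
    return (hub_ms - cog_ms) / hub_ms * 100.0

# Timings copied from the table above (Mizo 8GB card).
for label, cog_ms, hub_ms in [("Cold boot", 78.3, 85.3), ("Warm boot", 23.4, 30.7)]:
    print(f"{label}: {percent_faster(cog_ms, hub_ms):.1f}% faster from COG")
```

Under that assumption both rows land within about 0.1 of the percentages quoted in the table.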

Note: The new SD Boot ROM is not expected to support calling these routines from user programs. This also means the Monitor will no longer support loading or running files from SD.

Cluso99 · 2019-02-06 05:16

These are from a recently purchased card, the smallest size available from reputable retailers: a SanDisk Ultra 16GB SDHC U1 C10.
Note the much slower cold boot times!!!

                COG         HUB         %faster
Cold boot      151.3ms     158.5ms       4.5%            
Warm boot       33.2ms      40.4ms      17.8%

msrobots · 2019-02-06 05:16

I strongly disagree with that, please do NOT do that.

It is perfectly OK like in the current ROM

Mike

cgracey · 2019-02-06 05:28

I asked Cluso99 to focus on minimizing the boot time over having the routines be callable at any speed. All we need is a fast boot time. It looks to me like running the boot code from the hub is quite sufficient, speed-wise. Doing so also simplifies the interface back to my booter program, to which control may return.

Cluso99 · 2019-02-06 05:28

This is what has been requested.

I disagree (with moving to COG/LUT to speed up the code), which is why I am posting the timing: moving the routine(s) to COG/LUT does not translate into the benefits expected.

For reference, here is what I sent to Chip today for confirmation. I am sure he won't mind me posting this.

You can discuss this here if you wish.

1. Will no longer be re-entrant (ie callable from a user's program). This means the Monitor will NOT be able to load or run a file from SD.

2. Part or all of the SD Boot code will load into COG/LUT to make booting faster. I am working on timing the difference, as much of the delay is caused by the SD card itself.

3. Old standard SD cards will not be supported.

    a. pre SDHC cards

    b. cards using byte addressing for sectors

4. Cards not formatted in FAT32 (eg FAT16, exFAT, etc) can only use the MBR boot method

5. Cards that cannot validly boot will return to your serial loader

There has been discussion about clock frequency, clock mode, crystal frequency and serial baud being at certain hub positions and used for faster booting.

IMHO this can be done using dual boot code and therefore not impact the ROM code. For instance, for SD it can either boot a program from MBR or VOL sector(s) or FAT32 files _BOOT_P2.BIX or _BOOT_P2.BIY.

Any of these can be a short program that knows, and can switch to faster loading, and can preset any desired locations with parameters, and load another file/location determined by this program.

So, the SD Booter does not need to know any of these things in advance, and they can be changed by software because they are not fixed in ROM.

Is this OK, and have I missed anything?

As far as I can tell, most/all are in agreement to remove support for older SD cards (prior to SDHC and those SD & SDHC using the old byte mode sector addressing). SDXC cards will be supported. Only cards formatted with FAT32 will support file load/run plus MBR & VOL load/run, while those that are not FAT32 formatted will only support MBR load/run (eg exFAT which is patented).
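The byte versus block addressing distinction mentioned above shows up in the address argument passed to a single-sector read. A minimal sketch, assuming 512-byte sectors; the function name and the `high_capacity` flag are illustrative and not taken from the boot ROM:

```python
SECTOR_SIZE = 512  # bytes per sector

def read_block_argument(sector, high_capacity):
    """Return the 32-bit address argument for reading one sector.

    Block-addressed cards (SDHC/SDXC) take the sector number directly;
    byte-addressed older cards take sector * 512.
    """
    arg = sector if high_capacity else sector * SECTOR_SIZE
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits")
    return arg

print(read_block_argument(8, high_capacity=True))
print(read_block_argument(8, high_capacity=False))
```

If byte-addressed cards are dropped, a booter would only ever need the first branch.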

Cluso99 · 2019-02-06 05:33

For reference, here is the SD Boot sequence

1. MBR sector 0: Offset $17C = "Prop" then load and JMP $080
2.                             "ProP" then $174 = sector to load, $178 = bytes to load, JMP $000
If FAT32 continue (exFAT = stop)
3. VOL sector x: Offset $17C = "Prop" then load and JMP $080
4.                             "ProP" then $174 = sector to load, $178 = bytes to load, JMP $000
5. Search DIR = "_BOOT_P2.BIX" and if found, load file data sector(s) for file size bytes, JMP $000
6.              "_BOOT_P2.BIY" and if found, load file data sector(s) for file size bytes, JMP $000
else stop.
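The signature test used for the MBR and VOL sectors can be sketched roughly as follows. The offsets ($17C, $174, $178) and the two signatures come from the sequence above; the function name, the returned tuples, and the little-endian encoding of the longs are assumptions made for illustration, and this is not the ROM code.

```python
import struct

SIG_OFFSET = 0x17C      # 4-byte signature
START_OFFSET = 0x174    # long: sector to load ("ProP" case)
LENGTH_OFFSET = 0x178   # long: bytes to load ("ProP" case)

def classify_boot_sector(sector):
    """Decide what a 512-byte MBR or VOL sector asks the booter to do.

    Returns ("jump", 0x080) for "Prop", ("load", start_sector, byte_count)
    for "ProP", or None when neither signature is present.
    """
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    sig = sector[SIG_OFFSET:SIG_OFFSET + 4]
    if sig == b"Prop":
        return ("jump", 0x080)
    if sig == b"ProP":
        # Little-endian longs are an assumption of this sketch.
        start, count = struct.unpack_from("<II", sector, START_OFFSET)
        return ("load", start, count)
    return None
```

A result of None would correspond to moving on to the next step in the sequence (VOL sector, then the directory search on FAT32).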

cgracey · 2019-02-06 05:34

Looks good, Cluso99, even ideal, to me.

Msrobots, what were you not liking? Was it not being able to run the booter from the monitor? Or not being able to call the boot routine?

Cluso99 · 2019-02-06 05:42

Chip,

Moving the SD Booter to COG/LUT doesn't give much gain, and removes being able to call the SD Booter from a user program (not very likely as would be preferable to load full-blown support anyway). But this also means that the Monitor cannot load and/or run files from a FAT32 formatted card either - currently it can.

Your thoughts???

Tubular · 2019-02-06 05:46

7msec faster vs hub / program callability and promote code re-use? Definitely go the hub option.

and... thanks for your efforts again Cluso

cgracey · 2019-02-06 05:58

Cluso99 wrote: »

Chip,

Moving the SD Booter to COG/LUT doesn't give much gain, and removes being able to call the SD Booter from a user program (not very likely as would be preferable to load full-blown support anyway). But this also means that the Monitor cannot load and/or run files from a FAT32 formatted card either - currently it can.

Your thoughts???

I think the question is whether it should run in the cog or in the hub, right? I think running in the hub is much better, seeing that there is only a little speed penalty. There is a chance that you will return to my booter, correct? And you only use a few cog registers for your code. That leaves my code intact in the cog, right, so that it can be returned to and my booter can attempt a serial connection?

Peter Jakacki · 2019-02-06 06:23

Ok, running RCFAST is fine, but why can't we have the option of specifying the clock config word in the MBR? Say I go into production with a particular board that uses a 12MHz oscillator, like the P2D2 does now: I want to be able to have that boot from the SD card "without" "having" to go through a 2nd stage loader.

For instance, on the P2D2 $0100_0EFB is the value I can HUBSET with to change the clock to exactly what I need, which in this case is to run at 180MHz.
At RCFAST speeds this is the best I can get with multiblock reads (it's an old SanDisk card, not the best).

TAQOZ# .SPEEDS --- 
    SECTOR READ SPEEDS.............. 2602us,2752us,2746us,2750us,2746us,2732us,2750us,2644us,
    BLOCK READ RATE................. 235kB/second @22,400,000Hz  ok

But at 180MHz:

TAQOZ# .SPEEDS --- 
    SECTOR READ SPEEDS.............. 503us,645us,629us,647us,628us,624us,641us,533us,
    BLOCK READ RATE................. 1,889kB/second @180MHz ok

So this is already 8 times faster and after card initialization a 64kB file will load in 35ms rather than 280ms. All it needs is for the SD loader to switch from RCFAST to the specified clock mode. It doesn't even have to think, all it has to do is read the value and HUBSET. If the value is zero it will stay in RCFAST (hubset #0 == RCFAST).
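The load-time figures follow from the two block read rates reported by .SPEEDS. A small check, assuming the 64kB file is read at the sustained rate with no other overhead:

```python
def load_time_ms(size_kb, rate_kb_per_s):
    """Transfer time in milliseconds for size_kb at a sustained read rate."""
    return size_kb / rate_kb_per_s * 1000.0

rcfast_ms = load_time_ms(64, 235)    # block read rate reported at RCFAST
fast_ms = load_time_ms(64, 1889)     # block read rate reported at 180MHz
print(f"RCFAST: {rcfast_ms:.0f}ms, 180MHz: {fast_ms:.0f}ms, "
      f"ratio {rcfast_ms / fast_ms:.1f}x")
```

The results come out close to the rounded 280ms, 35ms and 8x figures quoted above.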

BTW, I haven't updated my V2 images yet but I am adding the final touches to the built-in SD formatter that can format an SD perfectly and compliantly, excepting of course that you can format 64GB cards and up with FAT32. Also there are some extra options for BACKUP:
BACKUP <filename> --- backup to the specified file if found (creates a file if needed)
BACKUP BIX --- backup to _BOOT_P2.BIX
BACKUP MBR --- backup to sector 1 and set MBR signature (FAT32 not required)
BACKUP FLASH --- backup to serial Flash (^R restores but does not boot yet, needs 2nd stage loader or new ROM)

Cluso99 · 2019-02-06 06:25

cgracey wrote: »

Cluso99 wrote: »

Chip,

Moving the SD Booter to COG/LUT doesn't give much gain, and removes being able to call the SD Booter from a user program (not very likely as would be preferable to load full-blown support anyway). But this also means that the Monitor cannot load and/or run files from a FAT32 formatted card either - currently it can.

Your thoughts???

I think the question is whether it should run in the cog or in the hub, right? I think running in the hub is much better, seeing that there is only a little speed penalty. There is a chance that you will return to my booter, correct? And you only use a few cog registers for your code. That leaves my code intact in the cog, right, so that it can be returned to and my booter can attempt a serial connection?

Yes, it currently runs in hub and does precisely this.

I currently jump to "try_serial" if the SD boot fails, and your code is intact as you freed up the cog space required for the variables $1C0-$1EF.
The Monitor overwrites your boot code at $FC000-$FC0FF for the serial read buffer (which you said I could use).

So it's only a minor tweak, as I found a few instructions to shave, and removing support for the older cards (which haven't been sold for years anyway) shaves a few more. It's amazing what you can shave when you look at code again.

Peter Jakacki · 2019-02-06 06:31

Alternatively, just pass control to TAQOZ and I can use my fast boot method.

Cluso99 · 2019-02-06 07:06

Peter Jakacki wrote: »
Ok, running RCFAST is fine, but why can't we have the option of specifying the clock config word in the MBR? Say I go into production with a particular board that uses a 12MHz oscillator, like the P2D2 does now: I want to be able to have that boot from the SD card "without" "having" to go through a 2nd stage loader.

For instance, on the P2D2 $0100_0EFB is the value I can HUBSET with to change the clock to exactly what I need, which in this case is to run at 180MHz.
At RCFAST speeds this is the best I can get with multiblock reads (it's an old SanDisk card, not the best).
TAQOZ# .SPEEDS --- 
    SECTOR READ SPEEDS.............. 2602us,2752us,2746us,2750us,2746us,2732us,2750us,2644us,
    BLOCK READ RATE................. 235kB/second @22,400,000Hz  ok
But at 180MHz:
TAQOZ# .SPEEDS --- 
    SECTOR READ SPEEDS.............. 503us,645us,629us,647us,628us,624us,641us,533us,
    BLOCK READ RATE................. 1,889kB/second @180MHz ok
So this is already 8 times faster and after card initialization a 64kB file will load in 35ms rather than 280ms. All it needs is for the SD loader to switch from RCFAST to the specified clock mode. It doesn't even have to think, all it has to do is read the value and HUBSET. If the value is zero it will stay in RCFAST (hubset #0 == RCFAST).

For the SanDisk Ultra 16GB SDHC U1 C10 (the smallest and cheapest at Officeworks), it will take ~158ms to boot a single sector file, so maybe it's ~150ms to boot the MBR record. Now we switch speeds to 180MHz and load your 64KB file in a further 35ms. So the total is ~185ms.

You still need to write to the MBR, so why don't you just write a quick loader into the MBR that will switch clocks and load your 64KB file? You will still need to search the directory and load a file, so maybe that will not take 8ms, but say 2ms.
I now have to ensure that the MBR clockmode is also valid.

But if you are willing to sacrifice that 8ms less 2ms (ie 6ms) in a total of say 185ms, then you can have _BOOT_P2.BIX be a short file to switch up the frequency and load/run a new file. No special writes to the MBR either.

All this to save ~6ms in a total 185ms.
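The tally behind that comparison, using only the estimates given in this post (none of these are measurements of a complete two-stage boot):

```python
mbr_boot_ms = 150                 # estimated time to boot the MBR record on the 16GB card
file_load_ms = 35                 # Peter's estimate for a 64KB file at 180MHz
two_stage_penalty_ms = 8 - 2      # the "8ms less 2ms" estimate from the post

total_ms = mbr_boot_ms + file_load_ms
share = two_stage_penalty_ms / total_ms * 100
print(f"total ~{total_ms}ms, two-stage penalty ~{two_stage_penalty_ms}ms "
      f"({share:.1f}% of the total)")
```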

BTW the 2 stage loader can use a generic _BOOT_P2.BIX first stage loader that can be published (with variants) already in compiled binary form, just waiting to be copied to an SD card.

Just my 2c

BTW, I haven't updated my V2 images yet but I am adding the final touches to the built-in SD formatter that can format an SD perfectly and compliantly, excepting of course that you can format 64GB cards and up with FAT32. Also there are some extra options for BACKUP:
BACKUP <filename> --- backup to the specified file if found (creates a file if needed)
BACKUP BIX --- backup to _BOOT_P2.BIX
BACKUP MBR --- backup to sector 1 and set MBR signature (FAT32 not required)
BACKUP FLASH --- backup to serial Flash (^R restores but does not boot yet, needs 2nd stage loader or new ROM)

rogloh · 2019-02-06 07:08

Is there going to be a P2 feature that TAQOZ can be entered if all other boot methods timeout without requiring some serial interaction to trigger it? For example using some pin high/low setting to enable this case, or is that going to risk/upset the existing boot sequence logic too much?

I think Peter had talked about this idea in the past. Is it far too dangerous to add at this point? Maybe it doesn't add so much benefit if there are no default input/output device pins pre-allocated, but the general thought was it could allow a standalone system to boot without even needing a flash/sd device fitted or even a serial console present to trigger entering it from reset.

Perhaps the lack of a serial console to enter TAQOZ is a step too far... as what would the P2 then do without some default IO channel to interact with and control it?

Cluso99 · 2019-02-06 07:16

Ouch!!!

I just ran the SD Boot test from hub up to reading the MBR on a SanDisk Ultra 64GB SDXC U1 C10 which comes formatted with exFAT.

                COG         HUB         %faster
Cold boot                  236.5ms

Peter Jakacki · 2019-02-06 07:21

Cluso99 wrote: »
Ouch!!!

I just ran the SD Boot test from hub up to reading the MBR on a SanDisk Ultra 64GB SDXC U1 C10 which comes formatted with exFAT.
                COG         HUB         %faster
Cold boot                  236.5ms                   

And this is before you actually load the program which we could assume for the moment would typically be in the 64k to 128k range. So we need figures not just for being ready to load the boot code, but fully loaded ready to run. TAQOZ may be optional for some, but all of us will depend upon this bootloader, so it needs to be flexible.

Cluso99 · 2019-02-06 07:28

Peter Jakacki wrote: »
Cluso99 wrote: »
Ouch!!!

I just ran the SD Boot test from hub up to reading the MBR on a SanDisk Ultra 64GB SDXC U1 C10 which comes formatted with exFAT.
                COG         HUB         %faster
Cold boot                  236.5ms                   
And this is before you actually load the program which we could assume for the moment would typically be in the 64k to 128k range. So we need figures not just for being ready to load the boot code, but fully loaded ready to run. TAQOZ may be optional for some, but all of us will depend upon this bootloader, so it needs to be flexible.

Agreed. And it is flexible already.

The load time is mostly dependent upon the SD card used.

Peter Jakacki · 2019-02-06 07:44

Cluso99 wrote: »

Agreed. And it is flexible already.

The load time is mostly dependent upon the SD card used.

Well, call this a user request if you like, but for the sake of reading one location and performing a hubset, that is all you need to add. That is being flexible.
I don't understand why you wouldn't run the P2 faster because once it eventually boots, it will be switched to a much higher frequency in almost all cases.

The other user request is simply this, if serial boot fails, if Flash boot fails, if SD boot fails, then pass control to TAQOZ and it can do its own boot tests etc. If there is nothing to boot and no serial terminal etc then it can do a shutdown.

VonSzarvas · 2019-02-06 09:47

Cluso99 wrote: »

... which comes formatted with exFAT.

The load time is mostly dependent upon the SD card used.

Or dependent upon the "format" used?

If you reformat that SD card from exFAT to FAT32, does that change the boot test timings significantly?

msrobots · 2019-02-06 10:20

It would be very nice if the first 3(2) longs of a binary would contain clockmode and clockspeed.

And still asking if there will be a way to start TAQOZ with a mailbox instead of pins 63/62, or at least 2 other pins with the ability to set the RX mode to listen to another pin?

Enjoy!

Mike

Cluso99 · 2019-02-06 11:18

Peter Jakacki wrote: »

Cluso99 wrote: »

Agreed. And it is flexible already.

The load time is mostly dependent upon the SD card used.

Well, call this a user request if you like, but for the sake of reading one location and performing a hubset, that is all you need to add. That is being flexible.
I don't understand why you wouldn't run the P2 faster because once it eventually boots, it will be switched to a much higher frequency in almost all cases.

The new clockmode long in the MBR has to be validated. We already have two possible validations. It also will not be backward compatible with the current P2-ES chips although this may not matter.

But I say again, the maximum saving over the current supported methods is ~6ms out of a total 85ms/158ms/236ms/etc depending on which SD card you use. A two stage boot is simple and foolproof and is currently a supported option. I am not being difficult, I am just unconvinced.

Chip, what do you think?

The other user request is simply this, if serial boot fails, if Flash boot fails, if SD boot fails, then pass control to TAQOZ and it can do its own boot tests etc. If there is nothing to boot and no serial terminal etc then it can do a shutdown.

I pass control back to try_serial as requested by Chip.

You will need to get Chip to change his serial code to pass to TAQOZ if it times out.

I am happy with that provided there is no pulldown on P59 which is a lockout of serial as you requested. Though, equally, it is quite simple to enter the 5 character serial sequence to go to TAQOZ.

Cluso99 · 2019-02-06 11:24

VonSzarvas wrote: »

Cluso99 wrote: »

... which comes formatted with exFAT.

The load time is mostly dependent upon the SD card used.

Or dependent upon the "format" used?

If you reformat that SD card from exFAT to FAT32, does that change the boot test timings significantly?

There is little difference in load time between MBR, VOL and FAT32 options. exFAT requires the MBR option. The FAT32 is marginally slower because the directory tree has to be searched to find the files location, so a number of extra sectors need to be read over and above the MBR (and VOL).

By little, maybe 1-3ms which is nothing compared to card choice.

Cluso99 · 2019-02-06 11:30

msrobots wrote: »

It would be very nice if the first 3(2) longs of a binary would contain clockmode and clockspeed.

You can do this now by placing a jump over the clock longs as the very first instruction. Then your code just has to use it. No changes necessary.
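As a rough illustration of that layout from the tool side: if a binary starts with one long holding the jump, followed by two longs for clock mode and clock frequency, they could be read back as below. The three-long layout, the field order, and the little-endian encoding are assumptions for this sketch, not a defined format.

```python
import struct

def read_clock_header(image):
    """Return (clock_mode, clock_hz) from a binary whose first long is a jump.

    Assumed layout (hypothetical): long 0 = jump instruction,
    long 1 = clock mode, long 2 = clock frequency in Hz.
    """
    if len(image) < 12:
        raise ValueError("image too short to hold the three header longs")
    _jump, clock_mode, clock_hz = struct.unpack_from("<III", image, 0)
    return clock_mode, clock_hz
```

The program itself would read the same two longs after the jump lands and apply them however it sees fit.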

And still asking if there will be a way to start TAQOZ with a mailbox instead of pins 63/62, or at least 2 other pins with the ability to set the RX mode to listen to another pin?

Enjoy!

Mike

Either you patch the code or TAQOZ patches its code. We are way short on ROM space so my guess is that it's not going to happen in the TAQOZ ROM.

samuell · 2019-02-06 12:59

rogloh wrote: »

Is there going to be a P2 feature that TAQOZ can be entered if all other boot methods timeout without requiring some serial interaction to trigger it? For example using some pin high/low setting to enable this case, or is that going to risk/upset the existing boot sequence logic too much?

I think Peter had talked about this idea in the past. Is it far too dangerous to add at this point? Maybe it doesn't add so much benefit if there are no default input/output device pins pre-allocated, but the general thought was it could allow a standalone system to boot without even needing a flash/sd device fitted or even a serial console present to trigger entering it from reset.

Perhaps the lack of a serial console to enter TAQOZ is a step too far... as what would the P2 then do without some default IO channel to interact with and control it?

The current method is fine. Nobody wants TAQOZ showing up in a terminal because the firmware failed to boot. If that happened, it would be disastrous, IMHO. I would much rather have a blank terminal, and press "> [Enter]" to call it if I choose to.

Plus, TAQOZ will send signals and use IOs that potentially could be used for other functions. That would trigger devices connected to those pins. You don't want that either. Anyway, it is just a matter of pressing three keys to call TAQOZ. Over-simplification ruins things, like M$ does.

Kind regards, Samuel Lourenço

Peter Jakacki · 2019-02-06 13:08

I like to be flexible so TAQOZ will have a mailbox feature. I normally use hub memory for this since TAQOZ will be sitting in the first 64kB anyway plus it won't use up any smartpins.

@Cluso99 - what size boot file are you talking about when you quote boot times? It seems as if you are only quoting SD initialization because a 64kB "load" (after init) still takes 280ms at RCFAST even with TAQOZ doing a fast multiblock read. This is where having a simple clock long in the MBR (that can also easily be validated next to a copy of itself) is all that is required to make the difference between slow or fast boot. Initially TAQOZ was going to handle the SD boot, and this would have been one of the options.

@Chip - since this is an option it doesn't affect normal RCFAST boot, but if the user sets the boot clock long in the MBR, then that is up to the user to validate, just as the actual boot image is up to the user to validate. If the user has a bad boot image there is nothing the booter can do anyway whereas the clock config word can easily be validated as mentioned.
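One way the "next to a copy of itself" validation could look, as a sketch only: the sector offset is left as a parameter because no location has been proposed, the duplicate is assumed to be an identical long stored immediately after the value, and little-endian encoding is assumed.

```python
import struct

def read_validated_clock_long(sector, offset):
    """Return the clock config long at `offset` if the long after it is an
    identical copy; otherwise return 0, which means stay in RCFAST.
    """
    value, copy = struct.unpack_from("<II", sector, offset)
    return value if value == copy else 0
```

A result of 0 leaves the clock untouched, matching the earlier note that hubset #0 is RCFAST.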

Now the other thing is, is there any reason why the final boot stage can't fall through to TAQOZ? At the very least when you power up your new P2 it can come to life and the user can even format or check the SD card and hardware etc.

msrobots · 2019-02-06 14:05

Peter Jakacki wrote: »

I like to be flexible so TAQOZ will have a mailbox feature. I normally use hub memory for this since TAQOZ will be sitting in the first 64kB anyway plus it won't use up any smartpins.

@Cluso99 - what size boot file are you talking about when you quote boot times? It seems as if you are only quoting SD initialization because a 64kB "load" (after init) still takes 280ms at RCFAST even with TAQOZ doing a fast multiblock read. This is where having a simple clock long in the MBR (that can also easily be validated next to a copy of itself) is all that is required to make the difference between slow or fast boot. Initially TAQOZ was going to handle the SD boot, and this would have been one of the options.

@Chip - since this is an option it doesn't affect normal RCFAST boot, but if the user sets the boot clock long in the MBR, then that is up to the user to validate, just as the actual boot image is up to the user to validate. If the user has a bad boot image there is nothing the booter can do anyway whereas the clock config word can easily be validated as mentioned.

Now the other thing is, is there any reason why the final boot stage can't fall through to TAQOZ? At the very least when you power up your new P2 it can come to life and the user can even format or check the SD card and hardware etc.

Yeah, finally.

It is perfectly OK if the mailbox is in HUB not ROM.

All that is needed is to jmp COG0 (or even any COG?) to some ROM address, so it can load TAQOZ into the lower 64K, with TAQOZ using a mailbox instead of Smartpins. This would help a lot in using TAQOZ as a smart IO cog from other languages.

Where you place the mail box does not matter as long as it is a fixed address after loading TAQOZ from ROM. Then any program can find it.

Thanks for considering this.

Mike

Cluso99 · 2019-02-06 14:21

Peter,

My SD times are for loading a file less than 2KB. So as you can see, those significant times are all to do with initialising the SD card. Nothing can be done to shorten these times. There is no miraculous time benefit in running at 180MHz or even 3GHz for that matter. It is what the SD card takes.

That <2KB file can contain a complete SD Boot code loader running at whatever crystal/clock you desire, and load/run another file (say _BOOT_P2.BIZ) at full speed. This is known as a 2 stage loader. Very little time is lost using this method compared with using a value in the MBR to set a higher clock frequency: maybe 6ms out of anything from 80-250+ms, plus the actual file load time, which Peter suggested was 35ms for 64KB (I haven't timed this part).
And this works now with the current ROM.

BTW I hadn’t seen a valid reason for having a 2 stage loader for the Flash. But this is a reason to have it.

Electrodude · 2019-02-06 15:22

But why bother with a two-stage loader when adding one simple feature would allow you to efficiently get away with only a single stage? How hard can it be? You've already loaded the MBR to look for the ProP or Prop signature - just add a simple test to determine if it should do the hubset, and then call hubset if the test succeeds. A backwards-compatible way to do the test could be to make the capitalization of the "o" in the signature denote that it should do hubset: if the signature is PrOp or PrOP, then hubset a certain long of the MBR.
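The signature test proposed here could be sketched like this. It is purely illustrative of the suggestion; the function name and return shape are made up, and nothing like it exists in the ROM.

```python
def parse_signature(sig):
    """Split a 4-byte boot signature into (kind, wants_hubset).

    kind is "Prop" or "ProP", judged by the final letter only;
    wants_hubset is True when the 'o' is capitalised, as proposed above.
    Returns None for anything else.
    """
    if len(sig) != 4 or sig[:2] != b"Pr":
        return None
    if sig[2:3] not in (b"o", b"O") or sig[3:4] not in (b"p", b"P"):
        return None
    kind = "Prop" if sig[3:4] == b"p" else "ProP"
    return kind, sig[2:3] == b"O"
```

Existing "Prop" and "ProP" cards would parse with wants_hubset False, which is the backwards-compatible part of the suggestion.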

cgracey · 2019-02-06 16:10

Cluso99, I like the way you have it, already. I don't think it is a good idea to switch the clock in your code. Sure, you could make the chip run faster, based on some custom clock setting, but now your code has to accommodate a huge variance of clock rates, which is completely impractical. It's better to assume ~24MHz and allow second-stage code to do something unique at higher clock speeds.

I also don't want to fall back into TAQOZ if it means twiddling a bunch of pins, and causing surprise behavior on people's hardware. That would be a disaster, as Samuell pointed out.

cgracey · 2019-02-06 18:49

Peter, can you make a case for falling back to TAQOZ when all else fails?

I kind of like the idea of just shutting down if nothing boots.
