Bootloader

pedward · 2013-04-22 11:15

I'm starting to work on the second stage bootloader that runs from flash.

I'm planning on having the bootloader and supporting data use addresses $0-$FFF in flash, to mimic the memory layout of the ROM and mailboxes of the P2. This way $1000-$1FFFF in Flash will be the hub memory image.

Addresses $0-$7FF hold the bootloader, which ROM_Booter reads and authenticates. $800-$80F will hold the AES-128 CBC IV.

The bootloader will save the fuse bits and read $1000-$1FFFF into the corresponding HUB memory locations.

Next it will initialize the IV and decrypt addresses $1000-$1FFFF in place.
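A minimal sketch of in-place CBC decryption, to show why the previous ciphertext block has to be saved before it is overwritten. The block cipher here is a toy stand-in (`toy_encrypt_block` / `toy_decrypt_block` are hypothetical and NOT AES); only the chaining logic is the point, and a real loader would call AES-128 in its place.

```python
BLOCK = 16  # AES block size in bytes

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES-128 encryption of one block (NOT secure).
    return bytes(((b + 1) & 0xFF) ^ k for b, k in zip(block, key))

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Exact inverse of toy_encrypt_block.
    return bytes(((c ^ k) - 1) & 0xFF for c, k in zip(block, key))

def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytearray:
    # Helper used only to produce test input for the decryptor.
    out = bytearray()
    prev = iv
    for off in range(0, len(data), BLOCK):
        mixed = bytes(d ^ p for d, p in zip(data[off:off + BLOCK], prev))
        prev = toy_encrypt_block(key, mixed)
        out += prev
    return out

def cbc_decrypt_in_place(buf: bytearray, key: bytes, iv: bytes) -> None:
    prev = bytes(iv)                                   # chaining value, starts as the IV
    for off in range(0, len(buf), BLOCK):
        cipher_block = bytes(buf[off:off + BLOCK])     # save before overwriting
        plain = toy_decrypt_block(key, cipher_block)
        buf[off:off + BLOCK] = bytes(p ^ c for p, c in zip(plain, prev))
        prev = cipher_block                            # becomes the next chaining value
```

The two 16-byte values kept per iteration (`prev` and `cipher_block`) are the only working storage the chaining itself needs.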

It seems to me that it would be worthwhile to include a CRC-32 (ANSI X3.66) checksum of the encrypted contents at location $810. The bootloader will compare this to the checksum it calculates; if they match, it will decrypt HUB memory and do a COGINIT at location $1000.

The checksum helps to ensure the data is valid and prevent weird runaway conditions in the event the FLASH is corrupted.
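As a rough illustration of the check, here is a bit-at-a-time CRC-32. It assumes the common reflected form (polynomial $EDB88320, initial value and final XOR of $FFFFFFFF), which is the variant `zlib.crc32` computes; `image_is_intact` is a hypothetical helper name.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time reflected CRC-32 (polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def image_is_intact(ciphertext: bytes, stored_crc: int) -> bool:
    # Compare the checksum stored in flash with the one computed at boot.
    return crc32_bitwise(ciphertext) == stored_crc

if __name__ == "__main__":
    sample = b"123456789"
    print(hex(crc32_bitwise(sample)), hex(zlib.crc32(sample)))
```

The bitwise loop needs no lookup table, which keeps the footprint small at the cost of eight iterations per byte.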

The other option is to do yet another HMAC authentication of the ciphertext to ensure it's not corrupted or tainted. The presence of AES-128 alone should be sufficient to ensure authenticity, because you will have a GIGO condition otherwise.

Here's why I think a CRC-32 will be sufficient:

If you don't know the secret key, you will need to produce a ciphertext that, when decrypted with the secret key, produces some malicious yet useful result.

Furthermore, to gauge success you need to be able to debug the memory contents to determine what input bytes result in a selected output byte.

This is effectively the same as "breaking" the algorithm and getting the key. I just don't see any possible way to craft a malicious payload by the blind trial method.

Presumably the target is the ciphertext, not the hardware which the ciphertext is running on. You would need physical ownership of the device to compromise it, and most systems are typically not hardened against that; they are only hardened against remote attack or limited physical attack.

David Betz · 2013-04-22 11:28

I assume that the AES-128 code will be part of the image authenticated and loaded by the ROM loader? Do you have any idea how big that code is? I'm wondering if it will be small enough to merge with the rest of my bootloader code. If not, I guess I could run it in another COG but that would mean that the bootloader would only be able to load up to 6 COG drivers before loading the main program.

pedward · 2013-04-22 11:34

David Betz wrote: »

I assume that the AES-128 code will be part of the image authenticated and loaded by the ROM loader? Do you have any idea how big that code is? I'm wondering if it will be small enough to merge with the rest of my bootloader code. If not, I guess I could run it in another COG but that would mean that the bootloader would only be able to load up to 6 COG drivers before loading the main program.

At the least, AES-128 requires 256 bytes for the inverse S-boxes, plus 2 working buffers of 16 bytes. The AES-128 code will be the bulk of the 2nd stage loader footprint, but the rest will be trivial in size I think.

The P2 instructions are around 30% denser, sometimes more, sometimes less. I'm guessing that AES-128 decrypt will take half a COG and the loading and CRC-32 will take maybe 50 longs, so that leaves around 200 longs.

What's your loader doing?

Aside from the onboard SPI flash, I could see SD taking a bunch of code space.

If space becomes tight, I can include the s-boxes as data at the end of the COG and load that into the CLUT after the initial coginit.

David Betz · 2013-04-22 11:40

pedward wrote: »

At the least, AES-128 requires 256 bytes for the inverse S-boxes, plus 2 working buffers of 16 bytes. The AES-128 code will be the bulk of the 2nd stage loader footprint, but the rest will be trivial in size I think.

The P2 instructions are around 30% denser, sometimes more, sometimes less. I'm guessing that AES-128 decrypt will take half a COG and the loading and CRC-32 will take maybe 50 longs, so that leaves around 200 longs.

What's your loader doing?

Aside from the onboard SPI flash, I could see SD taking a bunch of code space.

If space becomes tight, I can include the s-boxes as data at the end of the COG and load that into the CLUT after the initial coginit.

The only thing my loader is trying to do that is a bit tricky is that it allows you to load hub memory from flash and then call coginit to load drivers into COGs. You can then load a full hub memory image for the main program. This has the slight advantage that up to 7 COG images can be loaded without reducing the hub memory available to the main program by 7*2K. I guess this is more of an issue on P1 than P2 since P2 has nearly 128K of hub memory, but I still thought being able to save 14K of memory might be useful.

pedward · 2013-04-22 19:40

(continued from the mailbox thread)

We also need a version indicator of some sort. If there are 2 images in flash, and they both validate CRC, then you still need to know which one to boot.

The basic FLASH is 8Mbits, so it could be possible to spare a page for this information?

Perhaps one solution is to have the first sector be the bootloader code.

Then the next 2 128KB chunks would be program1 and program2. The first sector of program1 would contain a version number, AES CBC IV, CRC-32, optionally a SHA-256 hash (for later), then the following $1000-$1FFFF pages.

program2 would start immediately after program1, same format.

The bootloader would do random reads of these 2 header pages and look for the highest version number, along with a flag set to 0 which indicates a successful write. The flag is only cleared after the full write and read verification.

In practice you would flip-flop between copies, picking the one with the highest version number. If the CRC-32 fails, or the clear-to-zero flag isn't zero, then you load the other image.
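The selection rule could be sketched like this. `ImageHeader` and `pick_image` are hypothetical names, and the CRC result is assumed to have been computed already for each image.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class ImageHeader:
    slot: int         # 1 for program1, 2 for program2
    version: int      # higher means newer
    write_flag: int   # 0 means the write completed and was verified
    crc_ok: bool      # result of the CRC-32 comparison

def pick_image(headers: Sequence[ImageHeader]) -> Optional[int]:
    # Only images that finished writing and pass the CRC are bootable.
    candidates = [h for h in headers if h.write_flag == 0 and h.crc_ok]
    if not candidates:
        return None   # nothing valid to boot
    # Among the valid images, boot the one with the highest version.
    return max(candidates, key=lambda h: h.version).slot
```

If the newer image fails either test it simply drops out of the candidate list, so the older copy is booted without any special-case code.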

The bootloader will need to store a pointer to the image that is currently running, so a user space programming routine will know which image shouldn't be overwritten.

The programming routine first erases all of the sectors associated with the memory being programmed, next the program data is written to flash and verified.

The header sector would contain these items to start:

$000-$003 CRC-32 checksum
$004-$007 Version number
$008-$017 AES-128 CBC IV
$018-$037 SHA-256 HMAC hash (optional for future use, calculated the same as the bootloader)
$038-$039 clear-to-zero success flag to indicate successful write to flash
$040-$E7F reserved for future use
$E80-$FFF HUB memory contents of $E80-$FFF encrypted with AES-128 using the same IV at $008
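For illustration, the fixed fields at the start of the header sector could be unpacked as below. Little-endian byte order is an assumption (the layout above does not state it), and `parse_header` is a hypothetical name.

```python
import struct

# CRC-32, version, IV (16 bytes), HMAC (32 bytes), success flag (2 bytes).
# "<" selects little-endian with no padding; byte order is an assumption.
HEADER = struct.Struct("<II16s32sH")

def parse_header(sector: bytes) -> dict:
    crc, version, iv, hmac_tag, flag = HEADER.unpack_from(sector, 0)
    return {
        "crc32": crc,            # $000-$003
        "version": version,      # $004-$007
        "iv": iv,                # $008-$017
        "hmac": hmac_tag,        # $018-$037
        "write_ok": flag == 0,   # $038-$039, cleared to zero on success
    }
```

An erased sector reads back as all $FF bytes, so a header that was never completed reports `write_ok` as false.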

In practice, the HUB contents from $E80-$FFF would be loaded last, then a pointer to the current flash image would be written to the reserved $E8C location, then a COGINIT at $1000 would be executed, with a pointer of $E90 to indicate COG0 mailbox location.

The full map would be:

$00000-$00FFF second stage bootloader
$01000-$01FFF program1 header
$02000-$20FFF program1
$21000-$21FFF program2 header
$22000-$40FFF program2
$41000-$FFFFF free space
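A quick way to sanity-check a map like this is to build it from the region sizes. The sketch assumes 4K sectors, a $1F000-byte hub image (hub $1000-$1FFFF) and an 8Mbit part; `build_map` is a hypothetical helper. With the regions packed back to back, the first unused address works out to $41000.

```python
SECTOR = 0x1000        # 4K erase sector
HUB_IMAGE = 0x1F000    # hub $1000-$1FFFF
FLASH_SIZE = 0x100000  # 8 Mbit = 1 MB

def build_map(n_images: int = 2):
    """Return (name, start, end) tuples, packed with no gaps."""
    regions = [("second stage bootloader", 0, SECTOR - 1)]
    addr = SECTOR
    for n in range(1, n_images + 1):
        regions.append((f"program{n} header", addr, addr + SECTOR - 1))
        addr += SECTOR
        regions.append((f"program{n}", addr, addr + HUB_IMAGE - 1))
        addr += HUB_IMAGE
    regions.append(("free space", addr, FLASH_SIZE - 1))
    return regions

if __name__ == "__main__":
    for name, start, end in build_map():
        print(f"${start:05X}-${end:05X} {name}")
```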

David Betz · 2013-04-22 20:12

Looks good to me.

Bill Henning · 2013-04-23 07:50

Looks good, but if it would fit in less, I'd try partitioning the flash as:

$00000-$1FFFF - slot 0
$20000-$3FFFF - slot 1
...
$E0000-$FFFFF - slot 7

Within each slot:

$00000-$007FF bootloader
$00800-$00FFF program header
$01000-$01FFF first 4 k of program
$02000-$1FFFF rest of program / data area for program, potential rest of hub image

Reason for this map: fits nicely in 4Mbit and 8Mbit (or larger) flash chips, can erase 128KB slot with two 64KB erases; it is very useful to line up on "natural" boundaries.
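A small sketch of the slot arithmetic, assuming 128KB slots and 64KB erase blocks as described; the function names are hypothetical.

```python
SLOT_SIZE = 0x20000    # 128KB per slot
ERASE_BLOCK = 0x10000  # 64KB block erase

def slot_base(n: int) -> int:
    # Slots are laid out back to back from address 0.
    return n * SLOT_SIZE

def slot_erase_addresses(n: int) -> list:
    # A slot is exactly two 64KB blocks, both naturally aligned.
    base = slot_base(n)
    return [base, base + ERASE_BLOCK]

def slots_in_flash(mbits: int) -> int:
    # Convert megabits to bytes, then count whole slots.
    return (mbits * 1024 * 1024 // 8) // SLOT_SIZE
```

Because every slot base is a multiple of the erase block size, no erase ever touches a neighbouring slot.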

Also, the bootloader has to support unsigned binaries if the fuses are not blown.

pedward · 2013-04-23 11:30

The primary limitation is the 4K block erase. The bootloader only resides in one location on the chip, $000-$7FF; this is fixed in the ROM. I don't want to change the bootloader when a program is written to flash, because that exposes the device to a situation where it is unrecoverable if a write fails.

Since there is only 1 bootloader needed, and you can't erase less than 1 sector, I put the bootloader in a sector by itself. The security bits of the FLASH can be used to protect that bootloader too, to avoid accidental overwrite.

Each program image needs metadata that is unique to that image, so I put it in a sector by itself.

It is possible to write a less robust bootloader that doesn't use as much memory in FLASH.

If the chunk erase has limitations on addresses (modulus), then I will revisit the locations to put them on boundaries, at the expense of wasting more flash. I was planning for the programmer program, embedded or remote, to erase and program 4K sectors at a time (erase, program, verify), because it requires the smallest memory footprint and allows for aborting a failed write cycle before committing too far.
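The per-sector cycle could look roughly like the following. `SimFlash` is a hypothetical in-memory stand-in for the SPI flash (erase sets bytes to $FF, programming can only clear bits), not a real driver.

```python
SECTOR = 0x1000  # 4K erase sector

class SimFlash:
    """In-memory model of NOR flash, for illustration only."""
    def __init__(self, size: int):
        self.mem = bytearray(b"\xFF" * size)

    def erase_sector(self, addr: int) -> None:
        base = addr - addr % SECTOR
        self.mem[base:base + SECTOR] = b"\xFF" * SECTOR

    def program(self, addr: int, data: bytes) -> None:
        # Programming can only change 1 bits to 0.
        for i, b in enumerate(data):
            self.mem[addr + i] &= b

    def read(self, addr: int, n: int) -> bytes:
        return bytes(self.mem[addr:addr + n])

def write_image(flash: SimFlash, base: int, image: bytes) -> bool:
    # Erase, program and verify one sector at a time; stop at the first failure.
    for off in range(0, len(image), SECTOR):
        chunk = image[off:off + SECTOR]
        flash.erase_sector(base + off)
        flash.program(base + off, chunk)
        if flash.read(base + off, len(chunk)) != chunk:
            return False   # abort before touching later sectors
    return True
```

Only one sector's worth of data has to be buffered at a time, and a failed verify leaves every later sector untouched.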
