I agree that the blocks wouldn't need to be contiguous. They just need to be easy to identify.
Chip,
I thought that with contiguous data located anywhere (reserved space, reserved partition, ...) you just have to read a starting pointer in the MBR (a well-known absolute location) and start streaming in the firmware image by reading raw contiguous data (better if you can choose the length dynamically), thus filling the hub. This way you don't have to know anything about the file system and you don't have to track the next block, so the code is simpler and shorter; it becomes like SPI flash boot (after the SD init).
It will also let the system designer put, in the firmware loaded this way, support for any other kind of file system. Who says that the SD card should be FAT formatted? Maybe someone wants to explore other possibilities, e.g. ext3.
If the firmware resides in the file system, that does not mean it is simpler for the end user: if you want a cut-and-paste (drag) operation from Windows/Linux (...), the file will not be contiguous, and the boot code in the ROM will need to track the file system internals deeply, increasing its size and complexity (so it also has a duty to comply with the standards).
BTW: I love the monitor option and I am not advocating SD boot against the monitor. I'd like to have both: a simple (minimal) monitor and the simplest raw-data SD boot (even if released as a reserved, initially undocumented, feature).
EDIT: PS. I agree with the idea of the ROM at the bottom.
... This way you don't have to know anything about the file system and you don't have to track the next block ...
The non-contiguous option doesn't know anything about the filesystem during its loading. The full list of blocks to load is stored in the initial block(s) loaded. And, vice versa, the filesystem is not aware of the importance of this list either. Hence why a tool is needed to generate it.
The advantage this method has over a contiguous bootloader file is the bootloader code can be written to the volume with normal filesystem accesses, as a normal file. After it is written the tool is run to generate the list of blocks the bootloader file is occupying. So only this list has to be contiguous. It will easily fit in a single block so no problem there.
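The block-list idea above can be sketched roughly as follows. This is Python for readability, not ROM or bootloader code; the 512-byte block size, the list layout (a count followed by 32-bit little-endian block numbers), and every name here are assumptions made for illustration, not something specified in this thread.

```python
import struct

BLOCK_SIZE = 512  # assumed SD block size


def build_block_list(block_numbers):
    """Host-side tool step: pack the block list into a single block.

    Assumed layout: a 32-bit count, then one 32-bit block number per entry.
    """
    payload = struct.pack("<I", len(block_numbers))
    payload += b"".join(struct.pack("<I", n) for n in block_numbers)
    if len(payload) > BLOCK_SIZE:
        raise ValueError("block list does not fit in a single block")
    return payload.ljust(BLOCK_SIZE, b"\x00")


def load_from_block_list(read_block, list_block_number):
    """Loader step: fetch a scattered file using only the list.

    No filesystem structures are consulted; read_block(n) returns block n.
    """
    list_block = read_block(list_block_number)
    (count,) = struct.unpack_from("<I", list_block, 0)
    numbers = struct.unpack_from("<%dI" % count, list_block, 4)
    return b"".join(read_block(n) for n in numbers)


# Hypothetical card: a dict mapping block number to block contents.
# The bootloader file happens to occupy blocks 40, 17 and 93, in that order.
card = {
    40: b"A" * BLOCK_SIZE,
    17: b"B" * BLOCK_SIZE,
    93: b"C" * BLOCK_SIZE,
}
card[2] = build_block_list([40, 17, 93])  # list stored in block 2 (assumed)
image = load_from_block_list(card.__getitem__, 2)
```

With 4 bytes per entry, a single 512-byte block in this assumed layout holds up to 127 block numbers, so the list would have to be regenerated by the tool whenever the file is rewritten and lands on different blocks.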
The non-contiguous option doesn't know anything about the filesystem during its loading. The full list of blocks to load is stored in the initial block(s) loaded. And, vice versa, the filesystem is not aware of the importance of this list either. Hence why a tool is needed to generate it.
This is just to complicate the rom built-in loader routine
The advantage this method has over a contiguous bootloader file is the bootloader code can be written to the volume with normal filesystem accesses, as a normal file. After it is written the tool is run to generate the list of blocks the bootloader file is occupying. So only this list has to be contiguous. It will easily fit in a single block so no problem there.
Where is the difference? You have to use a tool to write the list of blocks (in other words, to track the file fragmentation on the media) in the common contiguous area (even if it is just one block). When you overwrite the firmware using Windows/Linux/... you will have to use the utility again to rewrite the new fragmentation list, because the new file will be allocated differently on the media.
So if you have to use a utility anyway, then it's the same if you use it to write the contiguous image somewhere (better in an unformatted, reserved, hidden partition, declared in the MBR so other OSes will stay away). This will decrease the P2 ROM code complexity and size: the KISS principle.
EDIT: This post is in response to previous page of postings.
Does no one read what is posted?
These methods are not in the ROM. The ROM code knows nothing of any of these methods. The ROM code knows nothing of any filesystem, FAT or otherwise.
The only thing the ROM knows about is a pointer, or two, in the first block. From that it loads maybe a few consecutive blocks and runs them. That's it.
The rest is up to those few blocks ... which might just simply continue loading more consecutive blocks.
A tool is required no matter what, given the small size of the ROM.
EDIT2: Here's a repost of some example methods that are supported (You can replace my naming of FAT with a filesystem of your choice):
Off the top of my head for SD card booting there can be support for:
- Prebuilt image written to raw blocks. No partition, no volume. Everything contained in the image. This is a direct alternative to a SPI flash.
- Prebuilt image written to raw blocks within custom partition. Ditto for SPI flash.
- Bootloader written to raw blocks within custom partition. This contains filesystem handler code for booting main program from a second partition containing a filesystem.
- Bootloader written to raw blocks within reserved space of FAT volume. Ditto for filesystem handler code.
- Bootloader written to contiguous file within FAT volume. Ditto for filesystem handler code.
- Bootloader written to non-contiguous file within FAT volume. It has a split loader for handling the distribution of its blocks or clusters. Ditto for filesystem handler code.
All of those are supportable with the tiny ROM code as planned - simply loading two blocks via a pointer.
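The ROM stage described above (follow a pointer in the first block, load a couple of consecutive blocks, run them) might look something like this sketch. It is Python for illustration only; the pointer offset inside block 0, the 32-bit little-endian pointer format, the "zero means no boot image" convention, and all names are assumptions rather than the planned ROM behaviour.

```python
import struct

BLOCK_SIZE = 512        # assumed SD block size
POINTER_OFFSET = 0x1A0  # hypothetical location of the pointer inside block 0


def rom_boot(read_block, blocks_to_load=2):
    """Return the first-stage code, or None if no boot pointer is present.

    The routine knows nothing about partitions or filesystems: it reads one
    pointer from block 0 and then loads consecutive raw blocks.
    """
    block0 = read_block(0)
    (start,) = struct.unpack_from("<I", block0, POINTER_OFFSET)
    if start == 0:
        return None  # assumed convention: zero pointer means "try something else"
    return b"".join(read_block(start + i) for i in range(blocks_to_load))


def make_card(pointer, blocks):
    """Build a hypothetical card as a read_block function."""
    block0 = bytearray(BLOCK_SIZE)
    struct.pack_into("<I", block0, POINTER_OFFSET, pointer)
    contents = {0: bytes(block0)}
    contents.update(blocks)
    return lambda n: contents.get(n, bytes(BLOCK_SIZE))


# First stage placed in raw blocks 2048 and 2049 by a host-side tool.
card = make_card(2048, {2048: b"\x11" * BLOCK_SIZE, 2049: b"\x22" * BLOCK_SIZE})
stage1 = rom_boot(card)
blank = rom_boot(make_card(0, {}))
```

Whatever those two blocks contain decides the rest: they could keep reading consecutive blocks, walk a block list, or bring up a full filesystem handler.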
With ROM mapped low, does that mean cog addresses for the range fill are implied below $1ff? Actually, just how are COG addresses specified? Thinking about the range fill dumped into OUTA, for example.
Will the monitor be a security hole? Or does it only work when the fuse bits are equal to 0?
As a guess, I presume it'll only run the monitor if there is no boot code found anywhere. In such a case there is no decrypted code loaded to pry into.
Yeah Kye, Chip said it's an option when fuse bits are zero. Doesn't make sense otherwise as somebody could then just inject unauthorized code and run it on a device where that's not intended.
I agree with ROM at bottom too, BTW. There are a few pointers possible, so when it's really needed, "zero page" is still on the table; otherwise, all the new P2 ways of doing things are there to be explored and optimized. I like the idea of being able to declare big chunks of RAM with no mask and offset and think that's worth a more limited Z-page capability personally.
Looks to me like not obtaining a license might be defensible now. Enter that exFAT standard... Probably no breaking that one, but in micro-land it's not going to be that big of an issue for a while, just because there are a ton of existing devices, and standard FAT32 could likely be used with them in various non-infringing ways anyway.
There are a few pointers possible, so when it's really needed, "zero page" is still on the table; otherwise, all the new P2 ways of doing things are there to be explored and optimized.
I don't think there is a (register+PTR) address mode. I'm pretty certain the zero page is still useful as a result.
I like the idea of being able to declare big chunks of RAM with no mask and offset and think that's worth a more limited Z-page capability personally.
The big chunks idea can be allowed for with ROM after RAM by defining the ROM at the very end of the 32-bit address space. The existing Prop2 design allows for this view due to its high address bits not being decoded.
There is no more RDxxxx reg,#imm, so there's no more direct hub addressing. There is only RDxxxx reg,PTRx[-32..+31]. Since we aren't going to have lots of 'zero-page' memory, I re-purposed the bit that toggled #/PTRx mode and gave PTRx one more bit of offset range (was -16..+15).
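The offset ranges quoted above are what a signed two's-complement field gives: 5 bits cover -16..+15 and 6 bits cover -32..+31. The sketch below only illustrates that arithmetic; where the field sits in the instruction word is not stated in this thread, and the function names are hypothetical.

```python
def offset_range(bits):
    """Smallest and largest value of a signed two's-complement field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def encode_offset(offset, bits=6):
    """Pack a signed offset into an unsigned field of the given width."""
    lo, hi = offset_range(bits)
    if not lo <= offset <= hi:
        raise ValueError("offset %d outside %d..%d" % (offset, lo, hi))
    return offset & ((1 << bits) - 1)


def decode_offset(field, bits=6):
    """Sign-extend an unsigned field back to a signed offset."""
    sign = 1 << (bits - 1)
    return (field ^ sign) - sign


old_range = offset_range(5)   # before the extra bit
new_range = offset_range(6)   # after re-purposing the #/PTRx bit
```

By the same arithmetic, one further bit would give a 7-bit field covering -64..+63.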
What is this Monitor? (Please don't ask me to read this whole thread!).
Is the idea that you'll boot from flash and then start up this monitor in cog to supervise the running program?
Or, is this just the mode you're in when SPI boot fails?
Rayman, you can click on the ">>" icon in the quoted text to go to the original post. This way you don't have to search through all 57 pages of this thread to find it.
If all boots fail, the chip goes into monitor mode (assuming the security key is still 0). In monitor mode, you can have a conversation with the chip where you can look at memory, write memory, start and stop cogs, and so on. The monitor will always be available for general run-time use, too, via COGNEW(16, rx_pin<<8 | tx_pin). This is actually the booter routine, which on reset gets 0 for the parameter, so it knows to boot, not be a monitor. Someone pointed out last night that this facilitates wireless loading, which is serendipitous.
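The parameter convention in that COGNEW call can be sketched as below. Only the `rx_pin<<8 | tx_pin` packing and "0 means boot" come from the post; the 8-bit field widths used when unpacking and the function names are assumptions for illustration.

```python
def monitor_param(rx_pin, tx_pin):
    """Pack the pins the way the COGNEW example does: rx_pin<<8 | tx_pin."""
    return (rx_pin << 8) | tx_pin


def decode_param(param):
    """Decide what the shared booter/monitor routine should do.

    A parameter of 0 (what the routine gets on reset) selects booting;
    anything else is treated as packed monitor pins (8-bit fields assumed).
    """
    if param == 0:
        return ("boot", None, None)
    return ("monitor", (param >> 8) & 0xFF, param & 0xFF)


param = monitor_param(31, 30)       # hypothetical pin choice
mode, rx_pin, tx_pin = decode_param(param)
```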
Monitor, HEX file loading. You mean like Intel Hex with checksums and all?
I was just thinking streaming hex data. You'd do a command '>LH 800 1FFFF', and then stream out enough hex characters to fill that range. No spaces or returns required. No checksums either, but this is a simple mode, unlike the real boot sequence. If someone cares about checksums, I could add a checksum command '>CS 800 1FFFF' that would report back a 32-bit sum of bytes in a range. Does this sound okay?
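A host-side model of the proposed 'LH' and 'CS' commands might look like this. It is a sketch under assumptions: the range is treated as inclusive of both addresses, each byte takes two hex characters, the sum wraps at 32 bits, and the function names are hypothetical; none of that is confirmed beyond what the post says.

```python
def load_hex(memory, start, end, hex_stream):
    """Model of '>LH start end': fill memory[start..end] from hex characters.

    Two hex characters per byte, no spaces or returns, range assumed inclusive.
    """
    needed = 2 * (end - start + 1)
    digits = hex_stream[:needed]
    if len(digits) < needed:
        raise ValueError("not enough hex characters to fill the range")
    memory[start:end + 1] = bytes.fromhex(digits)


def checksum(memory, start, end):
    """Model of '>CS start end': 32-bit sum of the bytes in the range."""
    return sum(memory[start:end + 1]) & 0xFFFFFFFF


hub = bytearray(0x20000)                 # stand-in for hub memory
load_hex(hub, 0x800, 0x803, "DEADBEEF")  # four bytes, eight hex characters
total = checksum(hub, 0x800, 0x803)      # 0xDE + 0xAD + 0xBE + 0xEF
```

A sender could compute the same sum locally before streaming and compare it with the value reported back, which gives some confidence the load was correct before the code is run.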
If you add a per word delimiter like $D someone could make a switchboard and program P2 like a PDP-8.
That would also make the protocol more reliable and allow for short words.
Chip,
Not sure I was going in any direction with my Intel Hex comment. Your checksum idea sounds OK to me. As a minimum we should have some confidence that the code was correctly loaded before it is run.
As you now have some serious security hashing in the ROM the checksum could use those routines perhaps?
It is probably too late; however, if the 'U' update bit is not set (i.e. indexed only), the offset could be -64..+63 by using the 'P' bit as an extra index bit, which would be very useful for accessing more local variables for C.
With ROM down there below $100, things will be less easy to port from P1 to P2, like jmg said.
I guess I have to state that it is impossible to do so, to get it done. And then cluso or somebody else will write a P1 emulator for the P2 ...
Surely porting P1 code to P2 should be something easy to do, not something that needs manual work-overs & 'impossible' tags ?
To me, the ideal is a sufficiently smart Assembler, that it can create P1 or P2 binaries, from a common (subset) source.
ie any existing library, should need the smallest work, to target both P1 and P2.
From what I can see, many opcodes have direct clones, and others have clear alternatives.
That would allow all the valuable library work, to be quickly available on Prop2.
Of course it can be further optimised for Prop 2, but even there a good Assembler should allow a single source file, and use conditionals to do more optimised work.
If so, could you then run secured code, disable SPI flash, reset, burn all fuses, reset, run the monitor, and then see the secret program code?
I doubt it. Once the fuses are set, you can't burn more. Keep in mind though, ROM code to clear all RAM would be extremely fast/simple to write; probably two longs.
There may be some presumption that most people will be using FAT though.
BUT that is totally incompatible with other types of file systems.
And whoever advocates for that doesn't know what they are talking about.
This thread reminds me of the butterfly effect.
That monitor command set looks great to me.
Coupla guesses at the commands:
TB, TW, TL = "talk bytes", "talk words", "talk longs?"
The rest of it makes great sense.
http://www.wired.com/wiredenterprise/2012/03/ms-patent/
Great ideas! I will implement them.
The monitor loads only if security is NOT enabled.
It may be useful to add commands for reading/writing PINn and DIRn (if there is room in the rom)