Yes, and that makes great sense. jmg's post implied 16k for vectors!!! That's a freaking ton of vectors. Kind of funny to think about.
Not really, that's 16k simply because that's the next increment in memory size. The vectors actually use a tiny portion of that.
The rest is valid variable or code space.
Some have indicated that locking the vectors is less than ideal for debugging, and moving them out of the locked ROM block is one way to solve that.
I think that too, but I don't think it's worth extending to 32k, which is a pretty big chunk of RAM. 16k seems like an amount that won't cost us too much. Global variables, data structures, other things can be packed in there with no real impact, when the region isn't being used as a kernel, OS, or support software.
I also think the possible solutions are pretty reasonable and workable.
They can be redirected, at one instruction cost, or a routine can update them at a few instruction cost.
Or, don't write inhibit.
Edit: It's also nice to have those vectors be write inhibited. The code in the region owns them, and that can make recovery from ugly bugs possible. Bonus for debug tools written to run there.
The reason it's 16KB is because that's the ROM size. We had 8 longs up there to cover the debug interrupt instructions, but I expanded the area to accommodate the whole ROM, then added a write-protect mechanism. I think it's just peachy now. We've got the ROM loading into it in just 1.5ms on boot.
I used to have memory wrap, but that caused some real gotchas. Now, there's a big gap that reads $00's and ignores writes. That is safe and reasonable, I think.
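To make the map Chip describes concrete, here is a toy Python model: main RAM at the bottom, a gap that reads $00 and ignores writes, and a write-protectable 16KB block at $FC000-$FFFFF. It assumes a 512KB part, and the `high_protected` flag is a hypothetical stand-in for the real write-protect mechanism; it is a sketch of the behavior, not the hardware.

```python
HUB_SPAN = 0x100000     # 20-bit hub address space, $00000-$FFFFF
RAM_SIZE = 0x80000      # assumed 512KB of main RAM at $00000-$7FFFF
HIGH_BASE = 0xFC000     # top 16KB block, $FC000-$FFFFF


class HubModel:
    """Toy model of the hub map; an illustration, not the real hardware."""

    def __init__(self):
        self.low = bytearray(RAM_SIZE)
        self.high = bytearray(HUB_SPAN - HIGH_BASE)
        self.high_protected = False  # hypothetical stand-in for the write-protect mechanism

    def _check(self, addr):
        if not 0 <= addr < HUB_SPAN:
            raise ValueError(f"${addr:X} is outside the 20-bit hub space")

    def read(self, addr):
        self._check(addr)
        if addr < RAM_SIZE:
            return self.low[addr]
        if addr >= HIGH_BASE:
            return self.high[addr - HIGH_BASE]
        return 0x00  # the gap reads $00

    def write(self, addr, value):
        self._check(addr)
        if addr < RAM_SIZE:
            self.low[addr] = value & 0xFF
        elif addr >= HIGH_BASE and not self.high_protected:
            self.high[addr - HIGH_BASE] = value & 0xFF
        # writes into the gap, or into the protected top block, are silently ignored
```

For example, `hub.write(0x80000, 0x55)` followed by `hub.read(0x80000)` returns 0, because the gap drops the write instead of wrapping it into low RAM.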
Chip
Is it possible to tweak Pnut to allow loads into the top 16k ($FC000-$FFFFF)?
This would help Nano users in particular, who have lost half their hub memory.
For example, the following code compiles OK from Pnut but never loads the top memory.
Ctrl-M shows the object file is correct, although the "OBJ bytes" value is weird.
Does P2load work ?
Seems pnut should allow the full 1M download, as some FPGAs do have that ?
It could easily report the memory slice sizes, on a simple bottom-up and top-down data inspection.
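A rough sketch of the "bottom-up and top-down" inspection jmg mentions, written in Python against a flat 1MB image: scan down from the top of the low region to find where the low slice ends, and scan up from $FC000 to find where the high slice starts. The function name and the $FC000 split are assumptions for illustration; note that trailing zero data is indistinguishable from unused space with this method.

```python
def slice_sizes(image, high_base=0xFC000):
    """Return (low_bytes, high_bytes) actually used in a flat hub image."""
    low_region = image[:high_base]
    high_region = image[high_base:]
    # top-down: strip trailing $00 bytes from the low region to find its end
    low_used = len(low_region.rstrip(b"\x00"))
    # bottom-up: strip leading $00 bytes from the high region to find its start
    high_used = len(high_region.lstrip(b"\x00"))
    return low_used, high_used
```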
The big A9 FPGAs have been brought back to 512k to represent the final silicon.
What is wrong with putting this 16KB in the bottom 16KB of HUB RAM?
The JMP vectors could be placed from $0_1000 upwards (they need to be >= $0_1000 to run hubexec).
We would like to be able to unlock the lower 128-byte block because of rdxxxx/wrxxxx immediate access.
Maybe the write protection could be done in 4 x 4KB blocks?
This way, hub is still contiguous. The ROM is copied into the bottom 16KB of hub. If necessary, the user program can move any/all of this to wherever. It permits a much better use of hub for buffers (particularly screen buffers). It also allows subsequent P2 extended versions to have >1MB of hub (with some caveats due to instruction bits).
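One way to picture the 4 x 4KB idea: a 4-bit mask over the bottom 16KB, one bit per block, so block 0 (which holds the first 128 bytes reachable by immediate rdxxxx/wrxxxx addressing) can stay writable while the rest is locked. The mask encoding and function name are hypothetical; nothing like this exists in the current design.

```python
BLOCK_SIZE = 4 * 1024              # proposed 4KB protection granularity
PROTECTED_REGION = 4 * BLOCK_SIZE  # bottom 16KB of hub


def is_write_protected(addr, mask):
    """Bit n of mask set => block n ($n000-$nFFF) is read-only (hypothetical encoding)."""
    if addr >= PROTECTED_REGION:
        return False
    block = addr // BLOCK_SIZE
    return bool((mask >> block) & 1)
```

With `mask = 0b1110`, address $0040 stays writable (block 0 unlocked) while $1000-$3FFF are protected.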
Whichever the case, the debug interrupt instructions (1 per cog) ought to be placed at one end of memory or the other, as they must be in fixed locations. It seems to me that there is a need for 16KB of write-protectable RAM, as well, and that might as well overlap the debug interrupt instructions, as they need write-protecting, too.
Locating all that at $00000 would be cleaner and would not bifurcate hub memory, but then programs could not start at $00000, like they do now. I think it is nice that beginners get to orient their programs at the start of memory. They can be oblivious, for a time, about things like protected memory at the end of the map which contains the debug interrupt instructions.
Anyway, I think much deference should be given to locating applications at $00000. The end of memory can almost be forgotten about, while frontloading the protected area makes it stick out like a sore thumb. It's kind of like giving your guest the best seat in the room.
My first assumption was that the ROM would be loaded from 0 upwards, containing booter and SHA-stuff.
I am delighted about the current solution. But even with a gap in between, it needs to be possible to load something there while booting.
The simplest way would be to allow Pnut and P2load to load the complete address space even if no RAM is present. Can do no harm?
@ozpropdev's example (as usual) is clean to read, and doing an ORG $FC000 for higher RAM or ORG $FFFFC to write a debug vector makes perfect sense to any assembly programmer.
But if one wants to use the ROM content and set a debug vector while loading a binary he will need to include the ROM content of the upper area, thus loading a 1MB image.
The other solution would be a change in the binary format: instead of saving a copy of the RAM image, save every ORG-based block along with its load address and size.
Then the P2 booter would need to walk down the list and load each block at its address.
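A minimal Python sketch of that block-list idea. The record layout (little-endian 32-bit address, 32-bit size, then the data, with a zero-size record ending the list) is an assumption made up for illustration, not an existing PNut or P2load format.

```python
import struct


def pack_blocks(blocks):
    """blocks: list of (address, data). Emits <address:u32><size:u32><data> records, then a zero-size terminator."""
    out = bytearray()
    for address, data in blocks:
        out += struct.pack("<II", address, len(data)) + bytes(data)
    out += struct.pack("<II", 0, 0)
    return bytes(out)


def load_blocks(stream, hub):
    """Booter side: walk the record list and copy each block to its address in hub (a bytearray)."""
    offset = 0
    while True:
        address, size = struct.unpack_from("<II", stream, offset)
        offset += 8
        if size == 0:
            return
        hub[address:address + size] = stream[offset:offset + size]
        offset += size
```

A program at $00000 plus one debug vector at $FFFFC then costs a few header bytes instead of a full 1MB image.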
Personally, I would prefer it load at addresses contained in the format, just like the P2 monitor would do on data cut n paste. That kind of thing rocks.
We should do it.
Programmers set ORG where needed, go. Simple, lean, fast, robust.
I dislike having to push a whole megabyte when it's just not gonna get used.
And, if we support ORG blocks, developers can still push a megabyte and zero / data fill the gaps in the image if they want or somehow need to.
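Going the other way is easy too: a Python sketch that flattens ORG blocks into one full image, filling the gaps with a chosen byte, for anyone who still wants to push the whole megabyte. The function name and defaults are illustrative assumptions.

```python
def flatten(blocks, size=1 << 20, fill=0x00):
    """Place each (address, data) block into a size-byte image, gaps filled with `fill`."""
    image = bytearray([fill]) * size
    for address, data in blocks:
        if address < 0 or address + len(data) > size:
            raise ValueError(f"block at ${address:05X} does not fit in the image")
        image[address:address + len(data)] = data
    return bytes(image)
```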
....
Locating all that at $00000 would be cleaner and would not bifurcate hub memory, but then programs could not start at $00000, like they do now. I think it is nice that beginners get to orient their programs at the start of memory. They can be oblivious, for a time, about things like protected memory at the end of the map which contains the debug interrupt instructions.
...
Other MCUs have reset/interrupts at 0000, which means you always know where they are, no matter what future memory size you may have.
If you cannot add 16k of memory above 512k, you are forcing a split on what was a clean binary block, & then I'd say placing the ROM at 00H becomes more important. ( That split has already bitten written code.. )
With other MCUs the offsets are largely managed by the tools (so invisible to any beginner), and you can use segments in assembler, so that CSEG ORG 00 is still the first byte of code...
( in P2, first byte of HUB code would be something like HSEG 00 ?)
You would probably want a ROM segment in the Assembler, no matter where the base of that is.
But if one wants to use the ROM content and set a debug vector while loading a binary he will need to include the ROM content of the upper area, thus loading a 1MB image.
The other solution would be a change in the binary format: instead of saving a copy of the RAM image, save every ORG-based block along with its load address and size.
Then the P2 booter would need to walk down the list and load each block at its address.
That's called Intel HEX.
Certainly, you do not want to be sending a large 1MB blob & even many files of 1MB are less than ideal...
Sure a second stage loader could do that also, but then you would NEED a second stage loader to access the upper ROM/RAM.
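For reference, Intel HEX handles the high addresses with a type 04 (extended linear address) record that sets the upper 16 address bits, so data at $FC000 needs no padding. A minimal Python writer follows; the function names are hypothetical, and this does not claim that PNut or P2load accept HEX files today.

```python
def hex_record(rec_type, offset, data):
    """One ':LLAAAATT[DD...]CC' record; checksum is the two's complement of the byte sum."""
    body = bytes([len(data), (offset >> 8) & 0xFF, offset & 0xFF, rec_type]) + bytes(data)
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(address, data, chunk=16):
    lines, upper, i = [], None, 0
    while i < len(data):
        addr = address + i
        if addr >> 16 != upper:
            upper = addr >> 16
            # type 04: upper 16 bits of the 32-bit linear address
            lines.append(hex_record(0x04, 0, upper.to_bytes(2, "big")))
        # never let one data record straddle a 64KB boundary
        n = min(chunk, len(data) - i, 0x10000 - (addr & 0xFFFF))
        lines.append(hex_record(0x00, addr & 0xFFFF, data[i:i + n]))
        i += n
    lines.append(hex_record(0x01, 0, b""))  # end-of-file record
    return lines
```

`to_intel_hex(0xFC000, b"\x01\x02\x03\x04")` yields three lines: `:02000004000FEB`, `:04C000000102030432`, and `:00000001FF`.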
Would it be possible for either the address counter to wrap at $FFFFF, or for the 16K to be placed at $FFFFC000 so that it wraps at the long boundary?
Then a loader could load a continuous image, say ORG $FFFFFFFC to set a debug vector and then the program image follows at address 0?
That would allow loading continuously starting at the debug vectors (leaving BIOS/ROM/RAM unchanged), starting at 0 without changing the upper RAM, or starting at (FFF)FC000 to load a continuous image in one block?
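The wrap proposal boils down to computing load addresses modulo the address width. A Python sketch, assuming a 20-bit counter by default (the $FFFFC000 variant would use 32 bits); the function name is illustrative only.

```python
def place_continuous(image, start, addr_bits=20):
    """Yield (hub_address, byte) for a contiguous image whose address counter wraps at 2**addr_bits."""
    mask = (1 << addr_bits) - 1
    for i, b in enumerate(image):
        yield (start + i) & mask, b
```

With `start=0xFFFFC`, the first long lands on the cog 0 debug vector at $FFFFC-$FFFFF and the next byte goes to $00000, where the program image continues.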
I'll make PNut.exe, for now, just load up to $FFFBF, if there's data ORGH'd up that high. That will protect the last 16 longs, which are the debug interrupt instructions.
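The $FFFBF limit is just the top of hub minus 16 longs (64 bytes). A Python sketch of that clipping rule as a download tool might apply it; the names are illustrative, not PNut internals.

```python
HUB_END = 0x100000        # one past $FFFFF
DEBUG_VECTOR_LONGS = 16   # debug interrupt instructions reserved at the top of hub
LOAD_LIMIT = HUB_END - DEBUG_VECTOR_LONGS * 4 - 1  # $FFFBF, last byte the loader writes


def clip_for_download(image):
    """Drop anything ORGH'd above LOAD_LIMIT so the debug vectors are never overwritten."""
    return image[:LOAD_LIMIT + 1]
```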
I still need to get this BeMicro-A9 problem solved, somehow. I could just make two different images, but that seems ridiculous.
While you can place code currently starting at $0_0000, users cannot run code from there (hubexec) due to mapping of the cog and lut addresses for the program counter.
So that has to be explained.
Why is that any different from explaining that their hubexec code starts at $0_1000, with the first $xxx bytes reserved for the interrupt vectors?
And the ROM is initially copied to $0_0000-$0_3FFF (bottom 16KB of HUB RAM).
The pnut2 (or whatever) compiler could default to compile at ORGH $0_4000.
These days, memory maps on micros are often quite complex, with maps including ram, bootloaders, flash, and eeprom, registers, etc.
The P2 would still be extremely simple, and wouldn't require the hub to be broken into two blocks, just one contiguous block. This is far superior, especially for some of the proposed later versions with fewer cogs, which will most likely have smaller hub RAM.
Contiguous memory is IMHO always better. Think VGA where you want a large frame buffer. In this P2, you have a max frame buffer size of 512KB-16KB= 496KB.
A 256KB P2 would have a max buffer of 240KB, and a 128KB would give 112KB.
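The arithmetic above, as a quick Python check, with a frame buffer size helper for comparison. The 16KB reservation follows the figures in the post; the 640x480 8bpp mode is just an illustrative example.

```python
RESERVED_KB = 16  # bottom 16KB holding the ROM copy, per the figures above


def max_buffer_kb(hub_kb, reserved_kb=RESERVED_KB):
    return hub_kb - reserved_kb


def frame_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8


for hub_kb in (512, 256, 128):
    print(f"{hub_kb}KB hub -> {max_buffer_kb(hub_kb)}KB max buffer")
```

A 640x480 8bpp frame is 307,200 bytes (300KB): it fits in the 496KB case but not in 240KB.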
Remember all the old discussions about having a place for mailboxes, etc. These could all fit naturally in the bottom 4KB of Hub below the JMP vectors.
BTW I haven't checked lately. I have assumed the Interrupt Vectors to be physical JUMP instructions. If they are in fact just addresses, they could be placed much lower in Hub, just above the 128 bytes that can be directly accessed using immediate addressing in RDxxxx/WRxxxx instructions.
I still need to get this BeMicro-A9 problem solved, somehow. I could just make two different images, but that seems ridiculous.
Did you check to confirm the DIP sw is actually wired as expected ? - can you activate some other pin, based on the DIP setting to confirm - even using a similar equation syntax, in case Altera gets confused there ?
I don't know. Kind of like it where it is. Flowing around 0 is just asking for trouble, if you ask me.
Didn't refresh!
BeMicro_A9_Prop2_v27y.jic is still a flat liner.
Thanks for testing it. This just doesn't make any sense.
The debug interrupt vectors start at the last long ($FFFFC) for cog0 and go down. There's one for each COG.
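Put as a formula, cog n's vector sits n longs below the top: $FFFFC - 4*n. A Python sketch; the range check assumes the 16 reserved longs Chip mentions for PNut.

```python
TOP_LONG = 0xFFFFC  # last long in hub: cog 0's debug interrupt vector


def debug_vector_address(cog, max_cogs=16):
    """Cog n's vector is n longs below the top of hub."""
    if not 0 <= cog < max_cogs:
        raise ValueError("cog out of range")
    return TOP_LONG - 4 * cog
```

Cog 7 lands at $FFFE0, and with 16 slots the lowest one is $FFFC0, just above the $FFFBF load limit.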
Is it possible to tweak Pnut to allow loads into the top 16k ($FC000-$FFFFF)?
This would help Nano users in particular, who have lost half their hub memory.
For example, the following code compiles OK from Pnut but never loads the top memory.
Ctrl-M shows the object file is correct, although the "OBJ bytes" value is weird.
and here the source program
Seems pnut should allow the full 1M download, as some FPGAs do have that ?
It could easily report the memory slice sizes, on a simple bottom-up and top-down data inspection.
Once the chip is released, some may want to use 1M FPGAs for development platforms.
Locating all that at $00000 would be cleaner and would not bifurcate hub memory, but then programs could not start at $00000, like they do now. I think it is nice that beginners get to orient their programs at the start of memory. They can be oblivious, for a time, about things like protected memory at the end of the map which contains the debug interrupt instructions.
Anyway, I think much deference should be given to locating applications at $00000. The end of memory can almost be forgotten about, while frontloading the protected area makes it stick out like a sore thumb. It's kind of like giving your guest the best seat in the room.
Agreed with the sore thumb perception. Doing it this way keeps the number of things one must know to get started down lower.
Write protecting the vectors makes a ton of sense. It's an opportunity for those to be managed by system code, should it be in play.
I am delighted about the current solution. But even with a gap in between, it needs to be possible to load something there while booting.
The simplest way would be to allow Pnut and P2load to load the complete address space even if no RAM is present. Can do no harm?
@ozpropdev's example (as usual) is clean to read, and doing an ORG $FC000 for higher RAM or ORG $FFFFC to write a debug vector makes perfect sense to any assembly programmer.
But if one wants to use the ROM content and set a debug vector while loading a binary he will need to include the ROM content of the upper area, thus loading a 1MB image.
The other solution would be a change in the binary format: instead of saving a copy of the RAM image, save every ORG-based block along with its load address and size.
Then the P2 booter would need to walk down the list and load each block at its address.
Mike
Not that I mind an upgrade.
If you cannot add 16k of memory above 512k, you are forcing a split on what was a clean binary block, & then I'd say placing the ROM at 00H becomes more important. ( That split has already bitten written code.. )
With other MCUs the offsets are largely managed by the tools (so invisible to any beginner), and you can use segments in assembler, so that CSEG ORG 00 is still the first byte of code...
( in P2, first byte of HUB code would be something like HSEG 00 ?)
You would probably want a ROM segment in the Assembler, no matter where the base of that is.
I'm guessing P2load already does that, and pnut certainly should be fixed.
Would it be possible for either the address counter to wrap at $FFFFF, or for the 16K to be placed at $FFFFC000 so that it wraps at the long boundary?
Then a loader could load a continuous image, say ORG $FFFFFFFC to set a debug vector and then the program image follows at address 0?
That would allow loading continuously starting at the debug vectors (leaving BIOS/ROM/RAM unchanged), starting at 0 without changing the upper RAM, or starting at (FFF)FC000 to load a continuous image in one block?
just asking,
Mike
I still need to get this BeMicro-A9 problem solved, somehow. I could just make two different images, but that seems ridiculous.
Debug is mostly used in development, so uploading an image with an activated debug vector might come in handy?
Mike
So that has to be explained.
Why is that any different from explaining that their hubexec code starts at $0_1000, with the first $xxx bytes reserved for the interrupt vectors?
And the ROM is initially copied to $0_0000-$0_3FFF (bottom 16KB of HUB RAM).
The pnut2 (or whatever) compiler could default to compile at ORGH $0_4000.
These days, memory maps on micros are often quite complex, with maps including ram, bootloaders, flash, and eeprom, registers, etc.
The P2 would still be extremely simple, and wouldn't require the hub to be broken into two blocks, just one contiguous block. This is far superior, especially for some of the proposed later versions with fewer cogs, which will most likely have smaller hub RAM.
Contiguous memory is IMHO always better. Think VGA where you want a large frame buffer. In this P2, you have a max frame buffer size of 512KB-16KB= 496KB.
A 256KB P2 would have a max buffer of 240KB, and a 128KB would give 112KB.
Remember all the old discussions about having a place for mailboxes, etc. These could all fit naturally in the bottom 4KB of Hub below the JMP vectors.
BTW I haven't checked lately. I have assumed the Interrupt Vectors to be physical JUMP instructions. If they are in fact just addresses, they could be placed much lower in Hub, just above the 128 bytes that can be directly accessed using immediate addressing in RDxxxx/WRxxxx instructions.
or is that gone?
Mike