Hub Memory RAM, ROM, and ROM code discussions
Cluso99
While I don't have the full ROM code layout on hand, here is the top of the ROM layout:
interpreter.src begins at $F004
booter.src begins at $F800
runner.src butts up against $FFFF
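Those start addresses limit how much room each module has. Here is a minimal Python sketch of the arithmetic. It assumes the interpreter may run right up to $F800. The runner's start address isn't given, so the booter and runner are treated as one region.

```python
# Upper-ROM start addresses quoted in the post (runner.src's start is not given).
INTERPRETER_START = 0xF004
BOOTER_START = 0xF800
HUB_TOP = 0xFFFF  # runner.src butts up against this address


def span(start, end_exclusive):
    """Return the (bytes, longs) available between two hub addresses."""
    size = end_exclusive - start
    return size, size // 4


# Upper bound: assumes the interpreter may extend right up to the booter.
interp_bytes, interp_longs = span(INTERPRETER_START, BOOTER_START)
# booter.src and runner.src together share $F800-$FFFF.
boot_run_bytes, boot_run_longs = span(BOOTER_START, HUB_TOP + 1)

print(f"interpreter: at most {interp_bytes} bytes ({interp_longs} longs)")
print(f"booter + runner: {boot_run_bytes} bytes ({boot_run_longs} longs)")
```

This gives at most 2044 bytes (511 longs) for the interpreter, and 2048 bytes (512 longs) shared by the booter and runner.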
There has been some discussion about changes to the hub memory layout. Let's keep that discussion here in its own thread.
So here are my initial thoughts...
- Let's remove the boot encryption (no point as it's fully disclosed)
- For this we will need to have the ROM code unscrambled
- Let's keep the booter and spin interpreter entry points the same as the P1
- Let's implement all of the hub RAM & ROM as RAM, and preload the upper RAM with the old ROM data
- For the DE0, we will have less than 64KB available
- For now, I suggest we double-map the missing section to the end of hub
- Later, we should just make it an unused hole
- For the DE2, we can make the whole 64KB RAM, and preload the upper RAM with the old ROM data
- Keep the existing lower hub layout the same (crystal frequency, clock mode, etc.) so that all existing programs just work
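For reference, existing programs depend on the 16-byte header at $0000 in the lower hub layout. Here is a minimal Python sketch that decodes it. It assumes the usual P1 layout: a long clock frequency, a byte clock mode and a byte checksum, followed by the Spin pointer words pbase, vbase, dbase, pcurr and dcurr. `parse_header` is a hypothetical helper, and the example pointer values are made up.

```python
import struct

# Assumed P1 image header at hub $0000-$000F, little-endian:
# long clkfreq, byte clkmode, byte checksum, then five Spin pointer words.
HEADER_FMT = "<IBBHHHHH"
HEADER_FIELDS = ("clkfreq", "clkmode", "checksum",
                 "pbase", "vbase", "dbase", "pcurr", "dcurr")


def parse_header(image: bytes) -> dict:
    """Decode the lower-hub header that existing programs expect to find."""
    if len(image) < struct.calcsize(HEADER_FMT):
        raise ValueError("image is shorter than the 16-byte header")
    values = struct.unpack_from(HEADER_FMT, image, 0)
    return dict(zip(HEADER_FIELDS, values))


# Example: 80 MHz, clock mode $6F (XTAL1 + PLL16X); pointer values are made up.
example = struct.pack(HEADER_FMT, 80_000_000, 0x6F, 0,
                      0x0010, 0x0020, 0x0028, 0x0018, 0x0030)
print(parse_header(example))
```

Any new RAM-based ROM layout would have to leave these 16 bytes where they are.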
So here are some ideas...
- A faster Spin interpreter: my interpreter runs ~25% faster, so I am going to try that
- Change the booter so it can perhaps load more than 32KB into hub
- Add boot debug code (like the P2)
- I did an extension to this for the P2, and wrote it to use almost exclusively P1-compatible instructions
- Perhaps add the ability to boot from SPI FLASH
- Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)
- Need to work out how to put the various ROM modules (logs, sin/cos, font, etc) into loadable objects.
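Booting from an SD card that also holds a FAT16/32 file system means the booter must find the data region before it can locate a boot file. The sketch below shows that arithmetic in Python for readability; a real booter would do the same work in PASM. It uses the BPB offsets from Microsoft's FAT specification. The function names are hypothetical, and partition tables and directory searching are not handled.

```python
import struct


def fat_layout(boot_sector: bytes) -> dict:
    """Locate the FATs, root directory and data region of a FAT16/FAT32 volume.

    Sector numbers are relative to the start of the volume (the partition
    table and hidden sectors are not handled here).
    """
    (bytes_per_sec, sec_per_clus, reserved, num_fats,
     root_entries, _tot_sec16, _media, fat_size16) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 11)

    # Simplification: treat a zero 16-bit FAT size as FAT32. The FAT spec
    # actually decides the FAT type from the cluster count.
    is_fat32 = fat_size16 == 0
    fat_size = struct.unpack_from("<I", boot_sector, 36)[0] if is_fat32 else fat_size16

    fat_end = reserved + num_fats * fat_size
    root_dir_sectors = (root_entries * 32 + bytes_per_sec - 1) // bytes_per_sec
    layout = {
        "bytes_per_sector": bytes_per_sec,
        "sectors_per_cluster": sec_per_clus,
        "first_fat_sector": reserved,
        "first_data_sector": fat_end + root_dir_sectors,
    }
    if is_fat32:
        # FAT32 keeps the root directory in an ordinary cluster chain.
        root_cluster = struct.unpack_from("<I", boot_sector, 44)[0]
        layout["root_dir_sector"] = cluster_to_sector(layout, root_cluster)
    else:
        # FAT16 has a fixed-size root directory right after the FATs.
        layout["root_dir_sector"] = fat_end
    return layout


def cluster_to_sector(layout: dict, cluster: int) -> int:
    """First sector of a data cluster (cluster numbering starts at 2)."""
    return (cluster - 2) * layout["sectors_per_cluster"] + layout["first_data_sector"]
```

After reading the root directory, the booter would follow the boot file's cluster chain through the FAT. Using `cluster_to_sector`, it can then read the image sector by sector.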
Thoughts???
Comments
That's quite an improvement.
Re: "Perhaps add the ability to boot from SPI FLASH"
"Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)"
"Need to work out how to put the various ROM modules (logs, sin/cos, font, etc) into loadable objects."
Did you have a chip in mind? The BeagleBone Black I use has 4GB of 8-bit eMMC on-board flash storage (not sure if it's SPI, though).
- On a DE2, make the hub as large as possible
- Only a tiny bootloader up top (serial/EEPROM boot for now) which, like you say, loads the standard P1 ROM into RAM (minus the fonts on the Nano)
Reason: RDxxx/WRxxx can see all available hub memory, and therefore so could LMM.
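Here is a concrete version of that reasoning. Assume, as on the P1, that only the low 16 address bits reach the hub, and that word and long accesses ignore the low address bit(s). Then any hub RAM placed in $0000-$FFFF is addressable. An LMM kernel fetches each instruction with a `rdlong`, so LMM code can reach the same memory. A small Python model (the helper name is hypothetical):

```python
HUB_ADDR_BITS = 16                      # P1 hub addresses are 16 bits wide
HUB_ADDR_MASK = (1 << HUB_ADDR_BITS) - 1


def effective_hub_address(addr: int, size: int) -> int:
    """Hub address a RDBYTE/RDWORD/RDLONG (size 1, 2 or 4) actually touches.

    Upper bits are dropped and the access is aligned to its own size, so
    every location in $0000-$FFFF is reachable by some RDxxx/WRxxx.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1 (byte), 2 (word) or 4 (long)")
    return addr & HUB_ADDR_MASK & ~(size - 1)
```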
So do I understand correctly that we can have 64KB of RAM if we use a DE2?
And we can put in a different interpreter? It wouldn't necessarily need to be a Spin interpreter, would it?
50KB fails! I was hoping to get more than this, but the memory is in 9-bit blocks, so 1 bit gets wasted per byte.
Here are the mods I did to hub_mem.v
Note that this only implements hub $0000-BFFF. Currently there is nothing at $C000-FFFF.
Now I need to work out how to set up a file with the required ROM code, and how to remap some of it to $xxxx-FFFF.
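One way to produce the ROM preload file is to convert a ROM byte image into text for Verilog's `$readmemh`. This sketch makes two assumptions. First, the modified hub_mem.v preloads a 32-bit-wide array with `$readmemh`. Second, the image is little-endian, like the Propeller hub. If the RAM is split into byte lanes, you would need one file per lane instead. `to_readmemh` is a hypothetical helper.

```python
def to_readmemh(image: bytes, start_long: int = 0) -> str:
    """Format a little-endian byte image as $readmemh text, one 32-bit long per line.

    start_long is the array index of the first long; when non-zero it is
    emitted as an @address line (e.g. hub $F004 is long index $3C01).
    """
    if len(image) % 4:
        image = image + bytes(4 - len(image) % 4)  # pad the last long with zeros
    lines = [f"@{start_long:X}"] if start_long else []
    for offset in range(0, len(image), 4):
        long_value = int.from_bytes(image[offset:offset + 4], "little")
        lines.append(f"{long_value:08X}")
    return "\n".join(lines) + "\n"
```

For example, `to_readmemh(bytes([1, 2, 3, 4, 0xAA]))` yields the two lines `04030201` and `000000AA`.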
I think double-mapping the last 16KB so it takes up 32KB of the map is the least painful way to go.
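Here is that double-mapping as an address decode, written in Python for clarity. It assumes 48KB of physical RAM ($0000-$BFFF, as in the build above). In Verilog it amounts to ignoring address bit 14 whenever bit 15 is set. `hub_to_phys` is a hypothetical name.

```python
PHYS_BYTES = 0xC000  # 48 KB actually implemented: hub $0000-$BFFF


def hub_to_phys(addr: int) -> int:
    """Map a 16-bit hub address onto 48 KB of physical RAM.

    $0000-$7FFF map one-to-one; $8000-$BFFF and $C000-$FFFF both land on the
    same last 16 KB, so ROM code preloaded for $F004 sits at physical $B004.
    """
    addr &= 0xFFFF
    if addr < 0x8000:
        return addr
    return 0x8000 | (addr & 0x3FFF)  # ignore address bit 14 in the upper half
```

Keep one side effect in mind: $B004 and $F004 are the same physical long. A program that writes near the top of the 48KB will therefore overwrite the preloaded ROM code.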
Or a UNIVAC 1100 series mainframe... we had 9-bit bytes in a 36-bit word, unless you used Fieldata instead of ASCII, which gave you 6-bit characters, 6 per word.
For a Prop, you could have 4 more precious bits per long... which could be helpful for adding features to instructions. Four precious bits: double cogs (1 bit), double COG RAM (2 bits), indirection (1 bit)...
Investigating doubling the COG RAM is what I intend to do once I get up and running. I have a vague idea of changing the bootloader to accept existing binary files, loading them as normal with the extra D and S bits forced to zero. Then we could have "extended" binary files containing the extra bits, perhaps packed onto the end of the standard code. There'll be lots of gotchas, I'm sure, but no harm in trying...
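Here is a sketch of the proposed loader behaviour. The "extended" file format is hypothetical. It consists of the usual little-endian longs, optionally followed by a 4-bit extension for each long, packed two per byte (low nibble first). A plain P1 binary has no table, so its extra bits come out as zero. `load_extended` is a hypothetical name.

```python
def load_extended(image: bytes, num_longs: int) -> list:
    """Build 36-bit words from a P1 binary, optionally followed by a nibble table."""
    code_bytes = num_longs * 4
    if len(image) < code_bytes:
        raise ValueError("image is shorter than num_longs longs")
    table = image[code_bytes:]
    if table and len(table) < (num_longs + 1) // 2:
        raise ValueError("extension table is too short")
    words = []
    for i in range(num_longs):
        low = int.from_bytes(image[4 * i:4 * i + 4], "little")
        if table:
            packed = table[i // 2]
            ext = (packed >> 4) if i % 2 else (packed & 0x0F)  # low nibble first
        else:
            ext = 0  # plain P1 binary: extra bits forced to zero
        words.append((ext << 32) | low)
    return words
```

Appending the table after the standard code means an unmodified tool can still produce the first part of the file unchanged.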
Unfortunately my FPGA board is the original DE0, which is slightly too small to take the DE0-Nano image. Removing the video circuits and reducing the hub RAM has squeezed it in, but I think I'll have to halve the number of COGs in order to double their RAM...
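You can estimate the block RAM budget behind the earlier "50KB fails" result with a little arithmetic. This Python sketch assumes Cyclone III/IV M9K blocks of 9,216 bits each, fully usable as 1024x9, 512x18 or 256x36. Only those widths are modelled. `m9k_usage` is a hypothetical helper.

```python
import math

M9K_BITS = 9216  # assumed Cyclone III/IV M9K block size in bits
# Block widths that use all 9,216 bits, and the depth each one gives.
M9K_DEPTH = {9: 1024, 18: 512, 36: 256}


def m9k_usage(depth: int, width: int) -> tuple:
    """Return (blocks, wasted_bits) for a depth x width RAM built from M9Ks.

    The narrowest full-use block width (9, 18 or 36) that fits the data
    is chosen; other M9K configurations are not modelled.
    """
    if not 1 <= width <= 36:
        raise ValueError("width must be between 1 and 36 bits")
    block_width = min(w for w in M9K_DEPTH if w >= width)
    blocks = math.ceil(depth / M9K_DEPTH[block_width])
    wasted = blocks * M9K_BITS - depth * width
    return blocks, wasted
```

Byte-wide and long-wide hub RAM both waste one bit in nine. For example, 64KB costs 64 blocks and leaves 65,536 bits unused, whereas a 36-bit-wide memory uses every bit.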
That's quite a good idea: the memory is x9, so there are actually 4 extra bits 'for free' sitting in the RAM map.
Tools would need to create a 36-bit image (data tables and default opcodes would have the upper 4 bits cleared).
That then gives you 2 more bits to find some use for, beyond the 10-bit address extensions...
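Here is one way such a 36-bit word might be decoded. Bits 31..0 follow the standard P1 instruction layout: 6-bit instruction, ZCRI flags, 4-bit condition, and 9-bit D and S. The use of bits 35..32 is purely hypothetical: one extension bit each for D and S, plus two spare bits. With the upper 4 bits clear, the result is identical to a normal P1 decode.

```python
def decode36(word: int) -> dict:
    """Split a 36-bit word into P1 instruction fields with 10-bit D and S.

    Hypothetical extension bits: bit 33 extends D, bit 32 extends S,
    bits 35..34 are the 2 spare bits.
    """
    low = word & 0xFFFF_FFFF   # standard P1 instruction long
    ext = (word >> 32) & 0xF   # the 4 'free' bits from the x9 memory
    return {
        "instr": (low >> 26) & 0x3F,
        "zcri": (low >> 22) & 0xF,
        "cond": (low >> 18) & 0xF,
        "dest": (((ext >> 1) & 1) << 9) | ((low >> 9) & 0x1FF),
        "src": ((ext & 1) << 9) | (low & 0x1FF),
        "spare": (ext >> 2) & 0x3,
    }
```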
Assuming random memory accesses are issued by the Prop, it appears you should be able to read or write a single 32-bit memory location in the SDRAM within 10 SDRAM clocks. The SDRAM would need to run at 2x the Prop clock rate for best performance. That still leaves 22 of the 32 SDRAM clocks per hub cycle for the other accesses, including refresh. I think this remaining time would allow at least 16 bytes of data reads from a graphics controller, using either two burst transfers of 4 words each or a single burst of 8 words. I think each 64-bit transfer should take 11 SDRAM cycles. Refresh can always be held off until the blanking intervals. This yields an effective graphics transfer rate of up to 128 bits per hub cycle, which is one byte per Prop clock. So this should readily allow 1024x768x8bpp (VESA XGA) if you clocked the Prop at 65MHz, for example, which still keeps the SDRAM (at 130MHz) within its 143MHz rating.

Higher-bpp modes could be achieved at lower video resolutions. They would also be possible if the graphics controller can use full-page burst transfers with termination while still giving the Prop hub its accesses on demand (TBD). You could also palettize using a dual-ported block RAM and get higher color depths that way, at the expense of the number of colors displayed at once. And you could try using a FIFO and separate clock domains to decouple video generation timing from memory read timing. That might buy more flexibility in the number of video modes possible at a fixed Prop/SDRAM clock rate.
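The bandwidth estimate above can be reproduced in a few lines. This Python sketch uses the post's own figures as inputs:

- 10 SDRAM clocks for one random access
- 11 clocks per 4-word (64-bit, 16-bit words) burst
- SDRAM at twice the Prop clock
- refresh deferred to blanking

These are estimates, not datasheet timings. The check compares against the pixel rate during active video, with no FIFO averaging.

```python
HUB_CYCLE_PROP_CLOCKS = 16  # P1 hub window comes round every 16 system clocks
SDRAM_PER_PROP_CLOCK = 2    # SDRAM clocked at 2x the Prop clock, as proposed


def gfx_budget(prop_mhz, access_clocks=10, burst_clocks=11, burst_bits=64):
    """SDRAM time left for graphics after one random long access per hub cycle."""
    sdram_clocks = HUB_CYCLE_PROP_CLOCKS * SDRAM_PER_PROP_CLOCK  # 32 by default
    spare = sdram_clocks - access_clocks                         # 22 by default
    bursts = spare // burst_clocks                               # 2 bursts of 4 words
    bits_per_hub_cycle = bursts * burst_bits                     # 128 bits
    bytes_per_prop_clock = bits_per_hub_cycle / 8 / HUB_CYCLE_PROP_CLOCKS
    return {
        "sdram_mhz": prop_mhz * SDRAM_PER_PROP_CLOCK,
        "spare_clocks": spare,
        "bits_per_hub_cycle": bits_per_hub_cycle,
        "gfx_mb_per_s": bytes_per_prop_clock * prop_mhz,
    }


def mode_fits(budget, pixel_clock_mhz, bits_per_pixel):
    """True if the graphics share keeps up with the pixel stream during active video."""
    return budget["gfx_mb_per_s"] >= pixel_clock_mhz * bits_per_pixel / 8


xga = gfx_budget(65)  # 1024x768 uses a 65 MHz pixel clock
print(xga, mode_fits(xga, 65, 8))
```

At 65MHz this gives exactly 65MB/s for graphics, just enough for 8bpp XGA but not 16bpp. That shortfall is why the burst scheduling matters.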
Once I've sorted myself out with a board and come up to speed on the tools, I hope to experiment with some of these ideas, if things go well. The bottom interface on the DE0-Nano looks like it could be used to drive a VGA signal directly from the FPGA without using up any of the top GPIO header pins. Or maybe even drive LCDs via some TMDS stuff??? All sorts of possibilities start to open up...
UPDATE: Looking at this some more, I now think the best you can do in the 32 SDRAM clocks per hub cycle is to access up to 2 sequential LONGs from a (master) COG, plus 16 words (32 bytes) from another controller (e.g. gfx). This works if you use precharge to terminate an SDRAM page burst in time for the next COG's access. It should allow 1024x768x16bpp and truecolor 800x600x24bpp resolutions, which would be nice if you can spare all the pins needed to output the color data (or use an external RAMDAC).

The "master" COG reading two sequential longs per hub cycle from SDRAM may also open up options for some sort of prefetching scheme with hub exec/LMM. Since you can now read 2 sequential instructions per hub cycle instead of one, you can read ahead when not branching and fill a FIFO with upcoming instructions. This could lead to some form of caching and improved performance. Dreaming of more possibilities...
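You can explore the prefetch idea with a simplified model. It counts how many hub windows a stream of LMM instruction addresses needs when each window fetches one or two sequential longs into a FIFO. Any non-sequential address (a branch) flushes the FIFO. The model ignores execution timing entirely. `hub_windows_needed` is a hypothetical name.

```python
from collections import deque


def hub_windows_needed(trace, longs_per_window):
    """Count hub windows needed to fetch an LMM instruction trace.

    trace: hub byte addresses of executed instructions, in order.
    Each window fetches longs_per_window sequential longs into a FIFO;
    an address that is not at the FIFO head (a branch) flushes it.
    """
    fifo = deque()
    windows = 0
    for addr in trace:
        if fifo and fifo[0] == addr:
            fifo.popleft()  # already prefetched: no extra hub window
            continue
        fifo.clear()        # branch or empty FIFO: discard stale longs
        windows += 1
        fifo.extend(addr + 4 * i for i in range(longs_per_window))
        fifo.popleft()      # the long needed right now
    return windows
```

For straight-line code, fetching two longs per window halves the number of windows needed, but branches eat into that gain.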