Hub Memory RAM, ROM, and ROM code discussions

Cluso99 · 2014-08-10 04:20

While I don't have all the ROM code layout on-hand, here is the top of the ROM layout...

interpreter.src begins at $F004
booter.src begins at $F800
runner.src butts up against $FFFF

There has been some discussion about changes to the hub memory layout. Let's keep that discussion here in its own thread.

So here are my initial thoughts...

Let's remove the boot encryption (no point as it's fully disclosed)
- For this we will need to have the ROM code unscrambled
Let's keep the booter and spin interpreter entry points the same as the P1
Let's create all the HUB RAM & ROM as all RAM, and preload the upper RAM with the old ROM data
For the DE0, we will have less than 64KB available
- Currently I suggest we double map the missing section to the end of hub
- Later, we should just make it an unused hole
For the DE2, we can make the whole 64KB RAM, and preload the upper RAM with the old ROM data
Keep the existing lower hub layout the same (crystal freq, clock mode, etc so that all existing programs just work)

Now, once we get to this point, we have a basis to do various things. Because the upper hub is now RAM, we can now replace any sections of that with new code.
So here are some ideas...

A faster spin interpreter - my interpreter runs ~25% faster, so I am going to try that
Change the booter to perhaps load >32KB to hub
Add boot debug code (like the P2)
- I did an extension to this for the P2 and I wrote it to almost exclusively use P1 compatible instructions
Perhaps add the ability to boot from SPI FLASH
Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)
Need to work out how to put the various ROM modules (logs, sin/cos, font, etc) into loadable objects.

While I have touched on some of the things we can do with the interpreter, etc, I think they should each have their own thread when discussing possibilities, and keep this just for the actual hub memory details.

Thoughts???

Bob Lawrence (VE1RLL) · 2014-08-10 05:33

re: A faster spin interpreter - my interpreter runs ~25% faster, so I am going to try that
That's quite an improvement.

re: Perhaps add the ability to boot from SPI FLASH
Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)
Need to work out how to put the various ROM modules (logs, sin/cos, font, etc) into loadable objects.

Did you have a chip in mind? The Beagle Bone Black I use has a 4GB 8-bit eMMC on-board flash storage (not sure if it's SPI though)

Cluso99 · 2014-08-10 06:47

The same flash as the P2 will use. I cannot recall its part number.

Bill Henning · 2014-08-10 07:54

I suggest:

- on a DE2, make the hub as large as possible
- only a tiny bootloader up top (serial/EEPROM boot for now) which, as you say, loads the standard P1 ROM into RAM (minus the fonts on the Nano)

Reason: RDxxx/WRxxx can see all available hub memory, therefore so could LMM.

prof_braino · 2014-08-10 07:59

Cluso99 wrote: »

Let's remove the boot encryption (no point as it's fully disclosed)...For this we will need to have the ROM code unscrambled
For the DE2, we can make the whole 64KB RAM, and preload the upper RAM with the old ROM data

Now, once we get to this point, we have a basis to do various things. Because the upper hub is now RAM, we can now replace any sections of that with new code.
So here are some ideas...
A faster spin interpreter - my interpreter runs ~25% faster, so I am going to try that

Change the booter to perhaps load >32KB to hub

Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)

Need to work out how to put the various ROM modules (logs, sin/cos, font, etc) into loadable objects.

Thoughts???

So do I understand this to say we can have 64k of ram if we use DE2?

And we can put in a different interpreter? This wouldn't necessarily need to be a spin interpreter, would it?

Coley · 2014-08-10 08:09

If you are going to change or modify the loader, why not make it compatible with this?

Cluso99 · 2014-08-10 18:50

The maximum HUB RAM/ROM for the DE0-Nano is 48KB.
50KB fails! I was hoping to get more than this, but the memory is organised in 9-bit blocks, so 1 bit gets wasted per byte.

Here are the mods I did to hub_mem.v.
Note that this only makes hub $0000-$BFFF. Currently there is nothing at $C000-$FFFF.
Now I need to work out how to set up a file with the ROM code required, and to remap some of this to $xxxx-$FFFF.

// 16384 x 32 ram with byte-write enables ($0000..$FFFF) = 64KB for DE2
// 14336 x 32 ram with byte-write enables ($0000..$DFFF) = 56KB for DE0 (fails)
// 13312 x 32 ram with byte-write enables ($0000..$CFFF) = 52KB for DE0 (fails)
// 12800 x 32 ram with byte-write enables ($0000..$C7FF) = 50KB for DE0 (fails)
// 12288 x 32 ram with byte-write enables ($0000..$BFFF) = 48KB for DE0 (compiles)

reg [7:0] ram3 [12287:0];
reg [7:0] ram2 [12287:0];
reg [7:0] ram1 [12287:0];
reg [7:0] ram0 [12287:0];

reg [7:0] ram_q3;
reg [7:0] ram_q2;
reg [7:0] ram_q1;
reg [7:0] ram_q0;

always @(posedge clk_cog)
begin
    if (ena_bus && w && wb[3])
        ram3[a[13:0]] <= d[31:24];
    if (ena_bus)
        ram_q3 <= ram3[a[13:0]];
end

always @(posedge clk_cog)
begin
    if (ena_bus && w && wb[2])
        ram2[a[13:0]] <= d[23:16];
    if (ena_bus)
        ram_q2 <= ram2[a[13:0]];
end

always @(posedge clk_cog)
begin
    if (ena_bus && w && wb[1])
        ram1[a[13:0]] <= d[15:8];
    if (ena_bus)
        ram_q1 <= ram1[a[13:0]];
end

always @(posedge clk_cog)
begin
    if (ena_bus && w && wb[0])
        ram0[a[13:0]] <= d[7:0];
    if (ena_bus)
        ram_q0 <= ram0[a[13:0]];
end


assign q            = {ram_q3, ram_q2, ram_q1, ram_q0};

/*
// memory output mux

reg [1:0] mem;

always @(posedge clk_cog)
if (ena_bus)
    mem <= a[13:12];

assign q            = !mem[1]    ? {ram_q3, ram_q2, ram_q1, ram_q0};
//        //            : !mem[0]    ? rom_low_q        // comment out this line for DE0-Nano (sacrifices character rom to fit device)
//                                : rom_high_q;
*/
endmodule
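
The size table in the comments at the top of that code can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: `hub_size` is a hypothetical helper, not part of hub_mem.v, and the 9-bits-per-byte figure is simply taken from the remark above about how the block RAM is organised.

```python
def hub_size(depth_longs):
    """Return (size in KB, top hub byte address, block-RAM bits consumed)
    for a hub RAM that is depth_longs 32-bit longs deep.

    Assumption: every stored byte occupies a 9-bit block-RAM slot,
    so one bit in nine goes unused.
    """
    size_bytes = depth_longs * 4
    top_addr = size_bytes - 1
    raw_bits = size_bytes * 9
    return size_bytes / 1024, top_addr, raw_bits


for depth in (16384, 14336, 13312, 12800, 12288):
    kb, top, raw = hub_size(depth)
    print(f"{depth:5d} x 32 = {kb:g}KB, $0000..${top:04X}, {raw} block-RAM bits")
```

The depths are the ones tried above; the helper says nothing about which of them will actually fit, only what each one costs.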

Bill Henning · 2014-08-10 19:15

Nice work!

I think double-mapping the last 16k so it takes 32k of the map is the least painful way to go.
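
One possible reading of that double-mapping, as a Python sketch: with 48KB of physical RAM, the last 16KB ($8000-$BFFF) would also answer at $C000-$FFFF. `physical_addr`, `PHYS_SIZE` and `WINDOW` are hypothetical names for illustration; nothing here is taken from hub_mem.v itself.

```python
PHYS_SIZE = 0xC000  # 48KB of physical hub RAM (the DE0-Nano figure from this thread)
WINDOW = 0x4000     # the last 16KB, which appears twice in the 64KB map


def physical_addr(hub_addr):
    """Map a 16-bit hub byte address onto the 48KB physical RAM.

    Assumption: $C000-$FFFF is an alias of $8000-$BFFF.
    """
    hub_addr &= 0xFFFF
    if hub_addr >= PHYS_SIZE:
        return hub_addr - WINDOW
    return hub_addr


for addr in (0x0000, 0x7FFF, 0x8000, 0xBFFF, 0xC000, 0xF800, 0xFFFF):
    print(f"${addr:04X} -> ${physical_addr(addr):04X}")
```

In terms of the 14-bit long address used in the code above, this would amount to treating a[13:12] = 11 as 10, so a write to either window is visible in both.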

Tubular · 2014-08-10 20:03

48kB is still a useful result. Well done

potatohead · 2014-08-10 20:07

The 9th bit is parity, for error correction capable RAM.

David Betz · 2014-08-10 20:30

potatohead wrote: »

The 9th bit is parity, for error correction capable RAM.

Or maybe you could use it to implement a PDP-10 that has a 36 bit word. :-)

mindrobots · 2014-08-10 21:02

David Betz wrote: »

Or maybe you could use it to implement a PDP-10 that has a 36 bit word. :-)

Or a UNIVAC 1100 series mainframe....we had 9-bit bytes in a 36-bit word....unless you used Fieldata instead of ASCII, which gave you 6-bit characters and 6 per word.

For a Prop, you could have 4 more precious bits per long...could be helpful in adding features to instructions. Four precious bits........double cogs (1 bit), double COGRAM (2 bits), indirection (1 bit)......

Willy Ekerslyke · 2014-08-10 23:20

mindrobots wrote: »

For a Prop, you could have 4 more precious bits per long...could be helpful in adding features to instructions. Four precious bits........double cogs (1 bit), double COGRAM (2 bits), indirection (1 bit)......

Investigating doubling the COG RAM is what I intend to do once I get up and running. I have this vague idea of changing the bootloader to accept existing binary files, loading them as normal with the extra D & S bits being forced to zero. Then we could have "extended" binary files that would contain the extra bits, perhaps packed on to the end of the standard code. There'll be lots of gotchas, I'm sure, but no harm in trying.

Unfortunately my FPGA board is the original DE0 and is slightly too small to take the DE0-Nano image.

Removing the video circuits and reducing the HUB RAM has squeezed it in but I think I'll have to halve the number of COGS in order to double their RAM...

jmg · 2014-08-10 23:31

Willy Ekerslyke wrote: »

Then we could have "extended" binary files that would contain the extra bits, perhaps packed on to the end of the standard code. There'll be lots of gotchas, I'm sure, but no harm in trying.
...

That's quite a good idea. The memory is x9, so there are actually 4 extra bits 'for free', sitting in the RAM map.
Tools would need to create a 36b image (data tables & default opcodes would clear the upper 4 bits)

That then gives you 2 more bits to think of some use for, above the 10b address extensions ...
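
To make the "extended" image idea concrete, here is a rough Python sketch. The layout is purely hypothetical (nobody in this thread has defined one): the standard 32-bit longs come first, and the 4 extra bits per long are packed two to a byte in a section that would be appended after the standard code. A loader given only the standard part treats every extra bit as zero.

```python
def split_extended(longs36):
    """Split 36-bit values into a standard 32-bit image plus packed extra nibbles."""
    std = [v & 0xFFFFFFFF for v in longs36]
    nibbles = [(v >> 32) & 0xF for v in longs36]
    if len(nibbles) % 2:
        nibbles.append(0)  # pad so the nibbles fill whole bytes
    extra = bytes(nibbles[i] | (nibbles[i + 1] << 4)
                  for i in range(0, len(nibbles), 2))
    return std, extra


def join_extended(std, extra=b""):
    """Rebuild 36-bit values; a missing extra section means all extra bits are zero."""
    out = []
    for i, low in enumerate(std):
        byte = extra[i // 2] if i // 2 < len(extra) else 0
        nibble = (byte >> (4 * (i % 2))) & 0xF
        out.append((nibble << 32) | low)
    return out


image = [0x3_A0BC0001, 0x0_5C7C0000, 0xF_FFFFFFFF]
std, extra = split_extended(image)
print([f"{v:08X}" for v in std], extra.hex())
```

Keeping the standard longs untouched at the front is what would let an existing binary load unchanged; how the 4 bits are actually assigned (D, S, or anything else) is left open here.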

rogloh · 2014-08-11 01:09

Today I was looking at the SDRAM fitted to the DE0-Nano board, as I've been wondering how to put it to good use for both graphics and hub memory expansion. I read online that the memory fitted on the DE0-Nano is the slower -7 variant of the 32MB SDRAM device (so the maximum clock rate is 143MHz @ CL=3). This will be one of the key limiting factors when using it. Some related information here is useful: http://hamsterworks.co.nz/mediawiki/index.php/SDRAM_Memory_Controller. Theoretically I'd expect you could create a memory controller that allows (just) one master COG to read/write bytes/words/longs randomly at will, all within its existing hub timing (effectively one byte/word/long transfer issued per 16 Prop clock cycles), and still leaves enough bandwidth for an internal FPGA graphics controller to pull data out of the SDRAM in parallel without adding extra wait states to the Prop.

Assuming the Prop issues random memory accesses, it should be possible to read or write a single 32-bit location within the SDRAM cycle time of 10 SDRAM clocks. For best performance the SDRAM would need to run at 2x the Prop clock rate. That leaves 22 of the 32 SDRAM clocks in each hub cycle for other accesses, including refresh. I think this remaining time would allow at least 16 bytes of reads by a graphics controller, using either two burst transfers of 4 words each or a single burst of 8 words; I estimate 11 SDRAM cycles for each 64-bit transfer. Refresh can always be held off until the blanking intervals. This yields an effective graphics transfer rate of up to 128 bits per hub cycle, which is one byte per Prop clock. So this should readily allow 1024x768x8bpp (VESA XGA) if you clocked the Prop at 65MHz, for example, which still keeps the SDRAM (at 130MHz) within its 143MHz rating.

Higher bpp modes could be achieved at lower video resolutions, or if the graphics controller can use full-page burst transfers with termination so that the Prop hub still gets its accesses on demand (TBD). You could also palettize using a dual-ported block RAM and get higher color depths that way, at the expense of the total number of colors displayed. And you could try a FIFO and separate clock domains to decouple video generation timing from memory read timing, which might buy more flexibility in the number of video modes possible at a fixed Prop/SDRAM clock rate.
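
The cycle budget above can be checked with a short Python sketch. All the constants are the estimates from this post (10 clocks for a random 32-bit access, 11 clocks per 64-bit burst, SDRAM at 2x the Prop clock); none of them is measured, and `graphics_budget` is just a hypothetical helper name.

```python
PROP_CLOCKS_PER_HUB_CYCLE = 16  # one hub access window per COG
SDRAM_CLOCK_RATIO = 2           # SDRAM assumed to run at 2x the Prop clock
COG_ACCESS_CLOCKS = 10          # estimate: one random 32-bit access
BURST_CLOCKS = 11               # estimate: one 64-bit (4 x 16-bit word) burst
BURST_BYTES = 8


def graphics_budget(prop_mhz):
    """Work out what is left for a graphics controller in each hub cycle."""
    sdram_clocks = PROP_CLOCKS_PER_HUB_CYCLE * SDRAM_CLOCK_RATIO
    spare = sdram_clocks - COG_ACCESS_CLOCKS
    bursts = spare // BURST_CLOCKS
    bytes_per_hub_cycle = bursts * BURST_BYTES
    bytes_per_prop_clock = bytes_per_hub_cycle / PROP_CLOCKS_PER_HUB_CYCLE
    return {
        "sdram_mhz": prop_mhz * SDRAM_CLOCK_RATIO,
        "spare_sdram_clocks": spare,
        "bytes_per_hub_cycle": bytes_per_hub_cycle,
        "mbytes_per_s": bytes_per_prop_clock * prop_mhz,
    }


print(graphics_budget(65))
```

Note that two 11-clock bursts use up all 22 spare clocks, so this budget only works if refresh really can wait for the blanking intervals.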

Once I get myself sorted with a board and up to speed on the tools etc, I would hope to experiment with some of these ideas assuming things go well. That bottom interface on the DE0 nano seems it could potentially be used to drive a VGA signal directly out from the FPGA without burning any of the top GPIO port pins. Or maybe even via some TMDS stuff for LCDs perhaps??? All sorts of possibilities start to open up....

UPDATE: Looking at this some more, I now think the best you can do in the 32 SDRAM clocks per hub cycle is up to 2 sequential LONGs from a (master) COG, plus 16 words (32 bytes) from another controller (e.g. gfx), if you use precharge to terminate an SDRAM page burst in time for the next COG access. This should allow 1024x768x16bpp and truecolor 800x600x24bpp, which would be nice if you can spare all the pins needed to output the color data (or use an external RAMDAC). Having the "master" COG read two sequential longs per hub cycle from SDRAM may also open up options for some sort of prefetching scheme with hub exec/LMM: because you can now read 2 sequential instructions per hub cycle instead of one, you can read ahead when you are not branching and fill a FIFO with future instructions. This could lead to some form of caching and improved performance. Dreaming of more possibilities....
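
As a rough sanity check on those two modes, here is a Python sketch comparing the bandwidth each needs with what 32 bytes per hub cycle would deliver. The pixel clocks are the standard 60Hz VESA values (40MHz for 800x600, 65MHz for 1024x768); the 65MHz Prop clock and the function names are assumptions for illustration only.

```python
# Pixel clocks in MHz for the standard 60Hz VESA timings.
PIXEL_CLOCK_MHZ = {"800x600": 40.0, "1024x768": 65.0}


def required_mbytes_per_s(mode, bits_per_pixel):
    """Bandwidth needed to feed one pixel per pixel clock."""
    return PIXEL_CLOCK_MHZ[mode] * bits_per_pixel / 8


def available_mbytes_per_s(prop_mhz, bytes_per_hub_cycle=32):
    """Bandwidth on offer when the hub cycle is 16 Prop clocks long."""
    return prop_mhz * bytes_per_hub_cycle / 16


available = available_mbytes_per_s(65.0)
for mode, bpp in (("1024x768", 16), ("800x600", 24)):
    needed = required_mbytes_per_s(mode, bpp)
    print(f"{mode}x{bpp}bpp needs {needed:g} MB/s of {available:g} MB/s available")
```

On these numbers 1024x768x16bpp would use every available byte, so it leaves no slack for blanking-time overheads; 800x600x24bpp has a little headroom.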
