Hub Memory RAM, ROM, and ROM code discussions — Parallax Forums

Cluso99 Posts: 18,069
edited 2014-08-11 01:09 in Propeller 1
While I don't have all the ROM code layout on hand, here is the top of the ROM layout...

interpreter.src begins at $F004
booter.src begins at $F800
runner.src butts up against $FFFF

There has been some discussion about changes to the hub memory layout. Let's keep that discussion here in its own thread.

So here are my initial thoughts...
  • Let's remove the boot encryption (no point as it's fully disclosed)
    • For this we will need to have the ROM code unscrambled
  • Let's keep the booter and spin interpreter entry points the same as the P1
  • Let's implement all of the hub RAM & ROM as RAM, and preload the upper RAM with the old ROM data
  • For the DE0, we will have less than 64KB available
    • Currently I suggest we double map the missing section to the end of hub
    • Later, we should just make it an unused hole
  • For the DE2, we can make the whole 64KB RAM, and preload the upper RAM with the old ROM data
  • Keep the existing lower hub layout the same (crystal frequency, clock mode, etc.) so that all existing programs just work
Now, once we get to this point, we have a basis for doing various things. Because the upper hub is now RAM, we can replace any section of it with new code.
So here are some ideas...
  • A faster spin interpreter - my interpreter runs ~25% faster, so I am going to try that
  • Change the booter to perhaps load >32KB to hub
  • Add boot debug code (like the P2)
    • I did an extension to this for the P2 and I wrote it to almost exclusively use P1 compatible instructions
  • Perhaps add the ability to boot from SPI FLASH
  • Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)
  • We need to work out how to turn the various ROM modules (logs, sin/cos, font, etc.) into loadable objects.
While I have touched on some of the things we can do with the interpreter, etc., I think each of them deserves its own thread for discussing possibilities, so let's keep this one just for the actual hub memory details.

Thoughts???
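A minimal Verilog sketch of one way to read the DE0 double-mapping suggestion, assuming the 48KB build reported further down (byte addresses $0000-$BFFF backed by RAM, i.e. long addresses 0..12287). Logical $C000-$FFFF is folded onto physical $8000-$BFFF, where the old upper ROM image would be preloaded in place of the character font, so the interpreter ($F004) and booter ($F800) entry points stay where P1 code expects them. The module name hub_remap_48k is hypothetical; `a` is the 14-bit hub long address as in hub_mem.v.

```verilog
// Hypothetical address remap for a 48KB DE0-Nano hub (a sketch, not released P1 code).
// Physical RAM: longs 0..12287 (bytes $0000-$BFFF).
// Logical $C000-$FFFF (a[13:12] == 2'b11) aliases physical $8000-$BFFF.
module hub_remap_48k
(
    input  [13:0] a,       // logical hub LONG address (byte address >> 2)
    output [13:0] a_phys   // physical long address into the byte-lane RAMs
);

    // Clear a[12] only when a[13] is also set; every other address passes through.
    assign a_phys = {a[13], a[12] & ~a[13], a[11:0]};

endmodule
```

Inside hub_mem, the byte-lane arrays would then be indexed with a_phys instead of a. Writes to $C000-$FFFF alias $8000-$BFFF, which is the cost of double mapping; the later "unused hole" option would instead ignore those addresses.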

Comments

  • Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2014-08-10 05:33
    Re: A faster spin interpreter - my interpreter runs ~25% faster, so I am going to try that
    That's quite an improvement.

    Re: Perhaps add the ability to boot from SPI FLASH
    Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)
    Need to work out how to put the various ROM modules (logs, sin/cos, font, etc) into loadable objects.

    Did you have a chip in mind? The BeagleBone Black I use has 4GB of 8-bit eMMC on-board flash storage (not sure if it's SPI, though).
  • Cluso99 Posts: 18,069
    edited 2014-08-10 06:47
    The same flash as the P2 will use. I can't recall its part number.
  • Bill Henning Posts: 6,445
    edited 2014-08-10 07:54
    I suggest:

    - on a DE2, make the hub as large as possible
    - only a tiny bootloader up top (serial/eeprom boot for now) which like you say, loads standard P1 rom to ram (minus fonts on Nano)

    Reason: RDxxx/WRxxx can see all available hub memory, therefore so could LMM.
  • prof_braino Posts: 4,313
    edited 2014-08-10 07:59
    Cluso99 wrote: »

    Let's remove the boot encryption (no point as it's fully disclosed)...For this we will need to have the ROM code unscrambled
    • For the DE2, we can make the whole 64KB RAM, and preload the upper RAM with the old ROM data
    Now, once we get to this point, we have a basis to do various things. Because the upper hub is now RAM, we can now replace any sections of that with new code.
    So here are some ideas...
    • A faster spin interpreter - my interpreter runs ~25% faster, so I am going to try that
    • Change the booter to perhaps load >32KB to hub
    • Perhaps add the ability to boot from SD card (co-exist under FAT16/32 formatted cards)
    • Need to work out how to put the various ROM modules (logs, sin/cos, font, etc) into loadable objects.

    Thoughts???

    So do I understand correctly that we can have 64KB of RAM if we use a DE2?

    And we can put in a different interpreter? This wouldn't necessarily need to be a spin interpreter, would it?
  • Coley Posts: 1,110
    edited 2014-08-10 08:09
    If you are going to modify the loader, why not make it compatible with this?
  • Cluso99 Posts: 18,069
    edited 2014-08-10 18:50
    The maximum HUB RAM/ROM for the DE0-Nano is 48KB.
    50KB fails! I was hoping to get more than this, but the block memory is 9 bits wide, so 1 bit is wasted per byte :(
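The arithmetic behind the depth comments in the hub_mem.v listing and the wasted ninth bit, assuming Cyclone IV M9K-style block RAM (9,216 bits per block, usable as 1024 x 8 or 1024 x 9). A depth of N 32-bit longs covers 4N bytes, and each byte-wide lane leaves one bit in nine unused:

```latex
\begin{aligned}
\text{hub bytes} &= 4N, \qquad \text{last byte address} = 4N - 1 \quad (N = \text{depth in longs}) \\
N = 12288:\quad 4N &= 49152\ \text{B} = 48\ \text{KB}, \qquad 4N - 1 = 49151 = \texttt{\$BFFF} \\
N = 12800:\quad 4N &= 51200\ \text{B} = 50\ \text{KB}, \qquad 4N - 1 = 51199 = \texttt{\$C7FF} \\
\text{usable fraction of a byte-wide } 1024 \times 9 \text{ block} &= 8/9 \approx 88.9\% \\
\text{48 KB in byte-wide blocks} &= 48 \times 8192 = 393216\ \text{data bits, occupying } 48 \times 9216 = 442368\ \text{block bits}
\end{aligned}
```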

    Here are the mods I did to hub_mem.v
    Note that this only makes hub $0000-$BFFF. Currently there is nothing at $C000-$FFFF.
    Now I need to work out how to set up a file with the required ROM code, and how to remap some of this to $xxxx-FFFF.
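    // Port list as in the original P1 hub_mem.v
    module hub_mem
    (
    input               clk_cog,
    input               ena_bus,
    
    input               w,
    input        [3:0]  wb,
    input       [13:0]  a,      // hub LONG address (byte address >> 2)
    input       [31:0]  d,
    
    output      [31:0]  q
    );
    
    // 16384 x 32 ram with byte-write enables ($0000..$FFFF) = 64KB for DE2
    // 14336 x 32 ram with byte-write enables ($0000..$DFFF) = 56KB for DE0 (fails)
    // 13312 x 32 ram with byte-write enables ($0000..$CFFF) = 52KB for DE0 (fails)
    // 12800 x 32 ram with byte-write enables ($0000..$C7FF) = 50KB for DE0 (fails)
    // 12288 x 32 ram with byte-write enables ($0000..$BFFF) = 48KB for DE0 (compiles)
    
    // Only byte addresses $0000..$BFFF (longs 0..12287) exist, i.e. a[13:12] != 2'b11.
    // Gating every access with ram_sel stops $C000..$FFFF indexing past the arrays.
    wire ram_sel = (a[13:12] != 2'b11);
    
    reg [7:0] ram3 [12287:0];
    reg [7:0] ram2 [12287:0];
    reg [7:0] ram1 [12287:0];
    reg [7:0] ram0 [12287:0];
    
    reg [7:0] ram_q3;
    reg [7:0] ram_q2;
    reg [7:0] ram_q1;
    reg [7:0] ram_q0;
    
    // One 8-bit lane per byte of the long; wb[n] enables the write of byte n.
    always @(posedge clk_cog)
    begin
        if (ena_bus && ram_sel && w && wb[3])
            ram3[a[13:0]] <= d[31:24];
        if (ena_bus && ram_sel)
            ram_q3 <= ram3[a[13:0]];
    end
    
    always @(posedge clk_cog)
    begin
        if (ena_bus && ram_sel && w && wb[2])
            ram2[a[13:0]] <= d[23:16];
        if (ena_bus && ram_sel)
            ram_q2 <= ram2[a[13:0]];
    end
    
    always @(posedge clk_cog)
    begin
        if (ena_bus && ram_sel && w && wb[1])
            ram1[a[13:0]] <= d[15:8];
        if (ena_bus && ram_sel)
            ram_q1 <= ram1[a[13:0]];
    end
    
    always @(posedge clk_cog)
    begin
        if (ena_bus && ram_sel && w && wb[0])
            ram0[a[13:0]] <= d[7:0];
        if (ena_bus && ram_sel)
            ram_q0 <= ram0[a[13:0]];
    end
    
    // Register the select alongside the read data so reads of $C000..$FFFF return 0.
    reg sel_q;
    
    always @(posedge clk_cog)
    if (ena_bus)
        sel_q <= ram_sel;
    
    assign q            = sel_q ? {ram_q3, ram_q2, ram_q1, ram_q0} : 32'h0;
    
    /*
    // original P1 memory output mux, kept for reference (unused in this build)
    
    reg [1:0] mem;
    
    always @(posedge clk_cog)
    if (ena_bus)
        mem <= a[13:12];
    
    assign q            = !mem[1]    ? {ram_q3, ram_q2, ram_q1, ram_q0}
                        : !mem[0]    ? rom_low_q    // comment out this line for DE0-Nano (sacrifices character rom to fit device)
                                     : rom_high_q;
    */
    endmodule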
    
  • Bill Henning Posts: 6,445
    edited 2014-08-10 19:15
    Nice work!

    I think double-mapping the last 16KB so it takes up 32KB of the map is the least painful way to go.
  • Tubular Posts: 4,706
    edited 2014-08-10 20:03
    48kB is still a useful result. Well done
  • potatohead Posts: 10,261
    edited 2014-08-10 20:07
    The 9th bit is for parity, for error-checking RAM.
  • David Betz Posts: 14,516
    edited 2014-08-10 20:30
    potatohead wrote: »
    The 9th bit is for parity, for error-checking RAM.
    Or maybe you could use it to implement a PDP-10 that has a 36 bit word. :-)
  • mindrobots Posts: 6,506
    edited 2014-08-10 21:02
    David Betz wrote: »
    Or maybe you could use it to implement a PDP-10 that has a 36 bit word. :-)

    Or a UNIVAC 1100 series mainframe... we had 9-bit bytes in a 36-bit word, unless you used Fieldata instead of ASCII, which gave you 6-bit characters, 6 per word.

    For a Prop, you could have 4 more precious bits per long...could be helpful in adding features to instructions. Four precious bits........double cogs (1 bit), double COGRAM (2 bits), indirection (1 bit)......
  • Willy Ekerslyke Posts: 29
    edited 2014-08-10 23:20
    mindrobots wrote: »
    For a Prop, you could have 4 more precious bits per long...could be helpful in adding features to instructions. Four precious bits........double cogs (1 bit), double COGRAM (2 bits), indirection (1 bit)......

    Once I'm up and running, I intend to investigate doubling the COG RAM. I have a vague idea of changing the bootloader to accept existing binary files, loading them as normal with the extra D & S bits forced to zero. Then we could have "extended" binary files that contain the extra bits, perhaps packed onto the end of the standard code. There'll be lots of gotchas, I'm sure, but there's no harm in trying..

    Unfortunately my FPGA board is the original DE0, which is slightly too small to take the DE0-Nano image :( Removing the video circuits and reducing the hub RAM has squeezed it in, but I think I'll have to halve the number of COGs in order to double their RAM...
  • jmg Posts: 15,183
    edited 2014-08-10 23:31
    Then we could have "extended" binary files that contain the extra bits, perhaps packed onto the end of the standard code. There'll be lots of gotchas, I'm sure, but there's no harm in trying..
    ...

    That's quite a good idea. The memory is x9, so there are actually 4 extra bits per long 'for free', sitting in the RAM map.
    Tools would need to create a 36b image (data tables & default opcodes would clear the upper 4 bits)

    That then gives you 2 more bits to think of some use for, above the 10b address extensions ...
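To make the 36-bit idea concrete, here is a minimal Verilog sketch of how a decoder might widen the standard P1 D field (ins[17:9]) and S field (ins[8:0]) to 10 bits using two of the four extra bits. The module name ext_decode and the mapping of x[1] to D[9] and x[0] to S[9] are hypothetical; the other two bits are left as the spare pair. Legacy images, with the upper 4 bits cleared by the tools, decode exactly as on a P1. Whether S[9] should also extend 9-bit immediates (I=1) is left open.

```verilog
// Hypothetical decode of a 36-bit hub long: {x[3:0], ins[31:0]}.
// Assumed bit assignment (illustration only):
//   x[1] -> D[9], x[0] -> S[9]  (10-bit addresses for a 1024-long cog RAM)
//   x[3:2]                      spare
module ext_decode
(
    input  [35:0] long36,   // {x[3:0], ins[31:0]} as read from x9 hub RAM
    output [9:0]  dst,      // extended destination field
    output [9:0]  src,      // extended source field
    output [1:0]  spare     // unassigned extra bits
);
    wire [3:0]  x   = long36[35:32];
    wire [31:0] ins = long36[31:0];

    assign dst   = {x[1], ins[17:9]};  // P1 D field
    assign src   = {x[0], ins[8:0]};   // P1 S field
    assign spare = x[3:2];
endmodule
```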
  • rogloh Posts: 5,852
    edited 2014-08-11 01:09
    Today I was looking at the SDRAM fitted to the DE0-Nano board, as I've been wondering how to put it to good use for both graphics and hub memory expansion. I read online that the memory fitted on the DE0-Nano is the slower -7 variant of the 32MB SDRAM device (its maximum clock rate is then 143MHz at CL=3). This will be one of the key limiting factors when using it. Some related information here is useful: http://hamsterworks.co.nz/mediawiki/index.php/SDRAM_Memory_Controller. In theory, you could create a memory controller that lets (just) one master COG read/write bytes/words/longs randomly, all within its existing hub timing (effectively one byte/word/long transfer per hub window, i.e. every 16 Prop clock cycles), and still leave enough bandwidth for an internal FPGA graphics controller to pull data from the SDRAM in parallel without adding extra wait states to the Prop.

    Assuming the Prop issues random memory accesses, it appears you should be able to read or write a single 32-bit memory location within an SDRAM cycle time of 10 SDRAM clocks. The SDRAM would need to run at 2x the Prop clock rate for best performance. That still leaves 22 of the 32 SDRAM clock cycles per hub cycle for the other accesses, including refresh. I think this remaining time would allow at least 16 bytes of reads by a graphics controller, using either two burst transfers of 4 words each or a single 8-word burst. I think each 64-bit transfer should take 11 SDRAM cycles. Refresh can always be held off until the blanking intervals. This yields an effective graphics transfer rate of up to 128 bits per hub cycle, which is one byte per Prop clock. So this should readily allow 1024x768x8bpp (VESA XGA) if you clocked the Prop at 65MHz, for example, which still keeps the SDRAM within its 143MHz rating.

    Higher bpp modes could be achieved at lower video resolutions, or if the graphics controller can use full-page burst transfers with termination while still giving the Prop hub its accesses on demand (TBD). You could also palettize using a dual-ported block RAM and get higher color depths that way, at the expense of the total number of colors displayed. And you could try using a FIFO and separate clock domains to decouple video generation timing from memory read timing, which might allow more video modes at a fixed Prop/SDRAM clock rate.
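The first estimate above can be checked as follows. This assumes the 16-bit-wide SDRAM implied by "4 words = 64 bits", the post's own figures of 10 SDRAM clocks per random COG access and 11 per 64-bit burst, and the standard 65 MHz pixel clock for 1024x768 at 60 Hz (that clock includes blanking time, so it is the peak rate needed during active lines):

```latex
\begin{aligned}
\text{hub cycle} &= 16\ \text{Prop clocks} = 32\ \text{SDRAM clocks} \quad (f_{\text{SDRAM}} = 2 f_{\text{Prop}}) \\
\text{left after one COG access} &= 32 - 10 = 22 = 2 \times 11\ \text{SDRAM clocks} \\
\text{graphics data per hub cycle} &= 2 \times (4 \times 16\ \text{bit}) = 128\ \text{bit} = 16\ \text{B} \\
\text{graphics bandwidth} &= 16\ \text{B} / 16\ \text{Prop clocks} = 1\ \text{B per Prop clock} \\
1024 \times 768 \times 8\ \text{bpp} &: 65\ \text{MHz} \times 1\ \text{B} = 65\ \text{MB/s} \Rightarrow f_{\text{Prop}} = 65\ \text{MHz},\ f_{\text{SDRAM}} = 130\ \text{MHz} \le 143\ \text{MHz}
\end{aligned}
```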

    Once I get myself sorted with a board and up to speed on the tools etc., I hope to experiment with some of these ideas, assuming things go well. The bottom interface on the DE0-Nano could potentially be used to drive a VGA signal directly from the FPGA without using any of the top GPIO port pins. Or maybe even drive LCDs via TMDS??? All sorts of possibilities start to open up....

    UPDATE: Looking at this some more, I now think the best you can do in the 32 SDRAM clocks per hub cycle is to access up to 2 sequential LONGs for a (master) COG, plus 16 words (32 bytes) for another controller (e.g. graphics), if you use precharge to terminate an SDRAM page burst in time for the next COG access. This should allow 1024x768x16bpp and truecolor 800x600x24bpp resolutions, which would be nice if you can spare all the pins needed to output the color data (or use an external RAMDAC). The "master" COG reading two sequential longs per hub cycle from SDRAM may also open up options for some sort of prefetching scheme with hub exec/LMM: since you can now read 2 sequential instructions per hub cycle instead of one, you can read ahead when not branching and fill a FIFO with future instructions. This could lead to some form of caching and improved performance. Dreaming of more possibilities....
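The updated budget can be checked the same way. This assumes the Prop still runs at 65 MHz (SDRAM at 130 MHz), 16-bit SDRAM words, and the standard 60 Hz pixel clocks of 65 MHz for 1024x768 and 40 MHz for 800x600. Note that 1024x768x16bpp uses the budget exactly, with no headroom:

```latex
\begin{aligned}
\text{graphics data per hub cycle} &= 16 \times 16\ \text{bit} = 32\ \text{B} \\
\text{graphics bandwidth} &= 32\ \text{B} / 16\ \text{Prop clocks} = 2\ \text{B per Prop clock} = 130\ \text{MB/s} \quad (f_{\text{Prop}} = 65\ \text{MHz}) \\
1024 \times 768 \times 16\ \text{bpp} &: 65\ \text{MHz} \times 2\ \text{B} = 130\ \text{MB/s} \le 130\ \text{MB/s} \\
800 \times 600 \times 24\ \text{bpp} &: 40\ \text{MHz} \times 3\ \text{B} = 120\ \text{MB/s} \le 130\ \text{MB/s}
\end{aligned}
```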