First attempt at building a P1V image for the BeMicro Max10

ozpropdev Posts: 2,473
edited 2015-08-27 - 11:06:56 in Propeller 1
Hi All

Update: 11th Feb 2015 : See working BeMicro-Max10 files here.

I started on a P1V image today for the BeMicro MAX10 board and I've hit a wall.
A rough estimate suggested 4 cogs could fit into the 10M08DAF484C8GES Max10 device (8,000 LEs).
This included removing the lower ROM contents (tables, fonts), as in the DE0-Nano build.
Quartus II 14.1 reported problems fitting the image, with the following details:
+---------------------------------------------------------------------------------+
; Flow Summary                                                                    ;
+------------------------------------+--------------------------------------------+
; Flow Status                        ; Flow Failed - Sat Feb 07 20:00:50 2015     ;
; Quartus II 64-Bit Version          ; 14.1.0 Build 186 12/03/2014 SJ Web Edition ;
; Revision Name                      ; top                                        ;
; Top-level Entity Name              ; top                                        ;
; Family                             ; MAX 10                                     ;
; Device                             ; 10M08DAF484C8GES                           ;
; Timing Models                      ; Preliminary                                ;
; Total logic elements               ; 17,306 / 8,064 ( 215 % )                   ;
;     Total combinational functions  ; 15,830 / 8,064 ( 196 % )                   ;
;     Dedicated logic registers      ; 2,930 / 8,064 ( 36 % )                     ;
; Total registers                    ; 2930                                       ;
; Total pins                         ; 42 / 250 ( 17 % )                          ;
; Total virtual pins                 ; 0                                          ;
; Total memory bits                  ; 327,680 / 387,072 ( 85 % )                 ;
; Embedded Multiplier 9-bit elements ; 0 / 48 ( 0 % )                             ;
; Total PLLs                         ; 1 / 2 ( 50 % )                             ;
; UFM blocks                         ; 0 / 1 ( 0 % )                              ;
; ADC blocks                         ; 0 / 1 ( 0 % )                              ;
+------------------------------------+--------------------------------------------+

Removing the RAM initialization attribute in the hub_mem.v file has a dramatic effect on LE usage.
(* ram_init_file = "hub_rom_high.hex" *)	reg	[31:0] rom_high [4095:0];
changed to
reg	[31:0] rom_high [4095:0];
This resulted in a successful build:
+---------------------------------------------------------------------------------+
; Flow Summary                                                                    ;
+------------------------------------+--------------------------------------------+
; Flow Status                        ; Successful - Sat Feb 07 20:29:15 2015      ;
; Quartus II 64-Bit Version          ; 14.1.0 Build 186 12/03/2014 SJ Web Edition ;
; Revision Name                      ; top                                        ;
; Top-level Entity Name              ; top                                        ;
; Family                             ; MAX 10                                     ;
; Device                             ; 10M08DAF484C8GES                           ;
; Timing Models                      ; Preliminary                                ;
; Total logic elements               ; 7,415 / 8,064 ( 92 % )                     ;
;     Total combinational functions  ; 6,788 / 8,064 ( 84 % )                     ;
;     Dedicated logic registers      ; 2,898 / 8,064 ( 36 % )                     ;
; Total registers                    ; 2898                                       ;
; Total pins                         ; 42 / 250 ( 17 % )                          ;
; Total virtual pins                 ; 0                                          ;
; Total memory bits                  ; 327,680 / 387,072 ( 85 % )                 ;
; Embedded Multiplier 9-bit elements ; 0 / 48 ( 0 % )                             ;
; Total PLLs                         ; 1 / 2 ( 50 % )                             ;
; UFM blocks                         ; 0 / 1 ( 0 % )                              ;
; ADC blocks                         ; 0 / 1 ( 0 % )                              ;
+------------------------------------+--------------------------------------------+

Is there another way to initialize the RAM with an image, or does this result indicate that the MAX10 logic cells are being consumed by ROM emulation?
The obvious answer is probably a defective "ME", but any other suggestions would be a great help. :)
Cheers
Brian
Melbourne, Australia

Comments

  • pik33 Posts: 792
    edited 2015-02-07 - 05:02:04
    Now I need to check if this is the case with Cyclone 4 too

    Edit: I didn't notice any difference in LE usage when I commented out (* ram_init_file = "hub_rom_high.hex" *)

    Something specific to MAX10?
  • Tubular Posts: 3,515
    edited 2015-02-07 - 15:51:45
    Is it something to do with trying to stuff the code into the ALMs rather than the flash memory of the device?

    I would be mighty impressed if we could get 3 cogs running, let alone 4

    Want me to post your Max10?
  • jmg Posts: 13,346
    edited 2015-02-07 - 16:32:07
    The memory reports the same, so it seems strange

    These may help, some suggest it is a little tricky
    http://jimselectronicsblog.blogspot.co.nz/2014/12/storing-nios-ii-application-code-in-non.html
    http://www.altera.com/support/kdb/solutions/rd10302014_959.html

    Still, the rest looks promising, if tight: 92% full, with 649 spare LEs.
    ; Device                             ; 10M08DAF484C8GES                           ;
    ; Total logic elements               ; 7,415 / 8,064 ( 92 % )                     ;
    ;     Total combinational functions  ; 6,788 / 8,064 ( 84 % )                     ;
    ;     Dedicated logic registers      ; 2,898 / 8,064 ( 36 % )                     ;
    ; Total registers                    ; 2898                                       ;
    ; Total memory bits                  ; 327,680 / 387,072 ( 85 % )                 ;
    
  • ozpropdev Posts: 2,473
    edited 2015-02-07 - 17:35:47
    Thanks jmg
    It looks like I might have to take a step back to 14.0.2 to fix the problem. Yikes! WTG Altera :(
    Cheers
    Brian
    Melbourne, Australia
  • Tubular Posts: 3,515
    edited 2015-02-07 - 20:41:09
    If anyone else needs the older Quartus 14.0.2 for any reason, such as that post jmg linked to, email or PM me and I'll send the Dropbox link
  • jmg Posts: 13,346
    edited 2015-02-07 - 21:04:02
    The ug_m10_memory.pdf from Sept 2014 says
    "You can turn on the Enable ERAM Preload option in the More Analysis & Synthesis Settings dialog box."

    but an Altera answer Jan 2015 says
    http://www.altera.com/support/kdb/solutions/rd01072015_668.html

    [" Title : Where can I find the "Enable ERAM Preload" option for MAX 10 devices in the Quartus II software version 14.1 and later?

    Description

    The "Enable ERAM Preload" option for MAX® 10 devices can be located in the Device options in the Quartus® II software version 14.1 and later. "]

    I think that Altera answer is newer than the blog, so it may help.
  • ozpropdev Posts: 2,473
    edited 2015-02-07 - 21:43:49
    On Quartus II 14.1
    Just tried one of Altera's suggested settings....failed!
    Trying their second suggestion now......
    Melbourne, Australia
  • ozpropdev Posts: 2,473
    edited 2015-02-07 - 22:55:05
    The good news is both suggested settings removed the problem of LE usage.
    The bad news is I now blow out the memory bits budget.
    I had to reduce to 3 cogs for some reason, and even a 1 cog build still exceeds the memory bit limit.
    Here's the result for a 3 cog build on Quartus II 14.1:
    ; Total logic elements               ; 6,374 / 8,064 ( 79 % )                     ;
    ; Total memory bits                  ; 442,368 / 387,072 ( 114 % )                ;
    
    It looks like a larger Max10 is needed to accommodate a P1V. :(
    Melbourne, Australia
  • jmg Posts: 13,346
    edited 2015-02-08 - 01:02:51
    Something is askew...
    The LE per cog changes from 7,415/4 = 1853.75 to 6,374/3 = 2124.67 - a 14.6% rise ?
    RAM usage has also elevated by 14336 Bytes, but that should be mapped to HUB + N * COG memory.

    I've not tried Quartus 14.1, but sometimes tools give more lucid info, on a Build that worked, than one that failed.


    With 9 bit memory maths,
    (387,072-512*36*4)/9 = 34816 Bytes of avail memory, or 2048 Bytes above 32768 for ROM. ( a Full 32K ROM will overflow the part )
    Some trade off of ROM features and RAM will be needed, initially.
    2048 Bytes should be enough for a Loader ?
    The Max10 has 176128 Bytes of FLASH, with parallel interface options, so rather than Initialised-RAM, Flash could be mapped.
    The Flash is slower than RAM, and burst-oriented, so that will need some work.
    A reduced RAM with minimal ROM Loader should give a testing base ?
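
The 9-bit budget in the post above can be checked with a few lines of Python. This is only a sketch of that arithmetic, assuming every stored byte occupies 9 bits of an M9K block and each cog needs 512 x 36 bits; the function and constant names are made up for illustration and do not come from the P1V sources.

```python
# Sketch of the 9-bit memory arithmetic; names are illustrative, not from the P1V source.
M9K_BITS_10M08 = 387_072      # total memory bits Quartus reports for the 10M08
COG_BITS = 512 * 36           # one cog: 512 longs, each stored as 4 x 9 bits


def spare_bytes(total_bits, cogs, rom_longs=0):
    """Bytes left for hub RAM after cog RAM and an optional ROM of rom_longs x 36 bits."""
    bits_left = total_bits - cogs * COG_BITS - rom_longs * 36
    return bits_left // 9     # each byte occupies 9 bits of an M9K block


if __name__ == "__main__":
    avail = spare_bytes(M9K_BITS_10M08, cogs=4)
    print(avail)              # 34816
    print(avail - 32768)      # 2048 bytes left over after a 32K hub
```

Passing `rom_longs=4096` models a 4096 x 32 ROM held in block RAM, which leaves 18432 bytes for the hub on the same part.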
  • Tubular Posts: 3,515
    edited 2015-02-08 - 01:12:08
    Interesting. The increase in memory usage is 8192 + 3 cogs * 2048 bytes. Is that 8192 the rom size?
  • ozpropdev Posts: 2,473
    edited 2015-02-08 - 01:35:55
    I've taken a step back to 14.0.2 and the problem now is I need a Max10 qdz file to match.
    The Altera site only seems to have a 14.1 version available. The upgrade from 14.0 to 14.0.2 does not include this file.
    The device installer only recognizes 14.0 files.
    Has anyone successfully added Max10 devices into 14.0.2 ?

    Edit: Never mind.....It is there....It's been a long day......:)
    Melbourne, Australia
  • jmg Posts: 13,346
    edited 2015-02-08 - 01:44:05
    Sounds like work - did you try lowering RAM ?
    eg if ROM is as above (4096x32) , (387,072-512*36*4-4096*36)/9 = 18432 Bytes of HUB RAM
    That's a large ROM, what bumps the size ?
  • ozpropdev Posts: 2,473
    edited 2015-02-08 - 01:52:05
    Reducing the HUB ram size is next on my to do list.
    Each test takes approx. 15~20 minutes. Time is dragging now....more coffee needed....:coffee:
    Melbourne, Australia
  • ozpropdev Posts: 2,473
    edited 2015-02-08 - 02:03:11
    Arrrrrgh!
    This is what I got on the first run of 14.0.2

    [Attached screenshot: ugh.jpg]

    Do you think my answer should be "How the hell do I know!"?
    Melbourne, Australia
  • ozpropdev Posts: 2,473
    edited 2015-02-08 - 04:11:06
    Ok things are looking better now.
    I gave up on Quartus 14.0.2, nothing but heartache there.
    Back to 14.1 now and just built a 4 Cog 16K HUB P1V...phew :)
    ; Total logic elements               ; 7,481 / 8,064 ( 93 % )                     ;
    ;     Total combinational functions  ; 6,835 / 8,064 ( 85 % )                     ;
    ;     Dedicated logic registers      ; 2,890 / 8,064 ( 36 % )                     ;
    ; Total registers                    ; 2890                                       ;
    ; Total memory bits                  ; 327,680 / 387,072 ( 85 % )                 ;
    
    Next I'll bump HUB RAM up as far as it will fit. Now I can get some sleep... :)
    Cheers
    Brian
    Melbourne, Australia
  • jmg Posts: 13,346
    edited 2015-02-08 - 10:43:08
    Nice progress. The extra RAM avail is not much, so could be better to do some 'signs of life tests' instead ?
    ie Does it run, and how fast ?
    There may be a way to drop the ROM size to just the [interpreter and loader] value of 0xffff-0xf002 = 4093, or even further, to Loader alone.
  • Tubular Posts: 3,515
    edited 2015-02-08 - 12:53:33
    ozpropdev wrote: »
    Ok things are looking better now.
    I gave up on Quartus 14.0.2, nothing but heartache there.
    Back to 14.1 now and just built a 4 Cog 16K HUB P1V...phew :)
    ; Total logic elements               ; 7,481 / 8,064 ( 93 % )                     ;
    ;     Total combinational functions  ; 6,835 / 8,064 ( 85 % )                     ;
    ;     Dedicated logic registers      ; 2,890 / 8,064 ( 36 % )                     ;
    ; Total registers                    ; 2890                                       ;
    ; Total memory bits                  ; 327,680 / 387,072 ( 85 % )                 ;
    
    Next I'll bump HUB RAM up as far as it will fit. Now I can get some sleep... :)
    Cheers
    Brian

    Well done Brian. That's quite a tight fit on the logic elements; it will be interesting to see what speed it runs at

    If you get a moment I'd be curious whether 8 cogs + 32k hub ram fits in a 10M16
  • jmg Posts: 13,346
    edited 2015-02-08 - 14:48:30
    Tubular wrote: »

    If you get a moment I'd be curious whether 8 cogs + 32k hub ram fits in a 10M16

    On paper, that depends on the ROM handling chosen .
    If you wanted to simply drop-in ROM the same as P1, then it comes up short (even on 10M25)

    10M16 :
    549*1024-(32768*9 + 32768*9 +512*36*8) = -175104b (19456 Bytes short)

    Even 10M25
    675*1024-(32768*9 + 32768*9 +512*36*8) = -46080 (5120 Bytes Short)

    Or solving for ROM size (or Bonus RAM, if ROM is handled differently, see below )
    10M16 : (549*1024-(32768*9 +512*36*8))/9 = 13312 ROM (RAM)
    10M25 : (675*1024-(32768*9 +512*36*8))/9 = 27648 ROM (RAM)
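
The same sums can be run per device. The sketch below just restates the arithmetic above, assuming the M9K sizes quoted here (549 Kbit and 675 Kbit), 9 bits per stored byte and 512 x 36 bits per cog; the helper names are hypothetical.

```python
# Illustrative only: M9K sizes are the Kbit figures quoted in the post above.
M9K_KBITS = {"10M16": 549, "10M25": 675}
COG_BITS = 512 * 36


def shortfall_bytes(device, cogs=8, hub_bytes=32768, rom_bytes=32768):
    """Positive result = bytes short of fitting; negative = bytes to spare."""
    needed = (hub_bytes + rom_bytes) * 9 + cogs * COG_BITS
    return (needed - M9K_KBITS[device] * 1024) // 9


def rom_room_bytes(device, cogs=8, hub_bytes=32768):
    """Bytes left for ROM (or extra RAM) once hub and cog RAM are placed."""
    return (M9K_KBITS[device] * 1024 - hub_bytes * 9 - cogs * COG_BITS) // 9


if __name__ == "__main__":
    for dev in M9K_KBITS:
        print(dev, shortfall_bytes(dev), rom_room_bytes(dev))
    # 10M16 19456 13312
    # 10M25 5120 27648
```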

    However, there is a LOT of User Flash with a Parallel burst read ability, that may be able to be used for most ROM tasks.

    10M08 : 1376*1024/8 = 176128 Bytes Flash
    10M16 : 2,368*1024/8 = 303104 Bytes Flash
    10M25 : 3,200*1024/8 = 409600 Bytes Flash

    Certainly, it can hold many COG images, and some logic, or even an opcode, to load a page from Flash would open this up.

    The timing diagrams show an Address and a Burst Count of up to 128 x 32 reads, but I cannot see if there are any boundary caveats on that address.
    If the Burst read takes any starting address and feeds up to 128 words, then that is easy to use.
    Even if those reads have to be on 128 word boundaries, it is still useful.
    With the Burst Count as shown, that could be provided by a ReadFlash opcode,
    e.g.
     ReadFlash   @Ra, @Rb
    

    Ra has 32b Flash Start address, and Rb contains Burst Count of 7(?9) bits, 1 bit for COG or HUB destination and
    lower 9 bits as Dest Address if COG, or lower 15 ?? BITs if HUB destination.

    The same opcode could work nicely with External QuadSPI memory.. Same params, but Count can be larger (9b? is easy to set)

    Data says Burstcount range is 1, 2, 4, 7, ... 128
    Looks like 1 can be used for single word reads, but 3,5,6 seem invalid ?, and the timing suggests pair-gets ?
    An example gives 6 Count, so maybe that's a typo ?

    The exact handling of Flash Reads varies with Chip part code.

    I think a 10M08 can read 128 x 32 bit words in (5+128*2)/116M = 2.25 us, which is a pretty nimble task switch, or Function call.
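
That estimate is easy to replay for other burst lengths. A minimal sketch, assuming the figures used in the line above (5 setup cycles, 2 clocks per 32-bit word, a 116 MHz clock); these are the post's numbers, not datasheet values checked here, and the function name is made up.

```python
def burst_read_time_us(words, clock_mhz=116.0, setup_cycles=5, cycles_per_word=2):
    """Estimated time in microseconds to burst-read `words` 32-bit words from flash."""
    return (setup_cycles + words * cycles_per_word) / clock_mhz


if __name__ == "__main__":
    print(burst_read_time_us(128))   # 2.25 us for a 128-word burst
    print(burst_read_time_us(512))   # a full 512-long cog image, if one burst could cover it
```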
  • Seairth Posts: 2,371
    edited 2015-02-08 - 15:02:22
    It looks like you may be able to use the Avalon-MM IP Core to expose the User Flash Memory as read-only memory. If that's the case, the BeMicroMAX10 appears to have 8K of flash, which should then be usable as 8K of ROM. That should at least be enough to add the boot ROM and Spin interpreter, right?

    See MAX 10 User Flash Memory User Guide for details.

    And if this is correct, this would be a rather neat way to implement a P1V. You would program the MAX10 with the core, keeping the entire ROM in UFM. Then the "ROM" could potentially be modified without having to touch the verilog.
  • Tubular Posts: 3,515
    edited 2015-02-08 - 15:40:54
    From the usage figures, it seems to allow 8-bit access to the 9-bit memories: 32768 * 8 hub + 4 * 2048 * 8 cogs = 327,680, like OzPropDev got above. It could of course be that all the free bits are the 9th bits in each block, and it's misreporting what's truly free

    Anyway yes a reduced ROM or access via flash would be fine
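
The 8-bits-per-byte reading can be tested against every "Total memory bits" figure posted in this thread. The sketch below is one interpretation, not something confirmed by Quartus: it assumes 2048 bytes per cog, a 16 KB ROM (rom_high is declared as 4096 x 32), and that the first successful build, with the init attribute removed, ended up with no ROM in block RAM at all. The function and constant names are hypothetical.

```python
COG_RAM_BYTES = 512 * 4       # 512 longs per cog
ROM_BYTES = 4096 * 4          # rom_high: 4096 x 32 bits


def reported_memory_bits(cogs, hub_bytes, rom_bytes=ROM_BYTES):
    """Memory bits as the reports seem to count them: 8 bits per byte in use."""
    return 8 * (hub_bytes + rom_bytes + cogs * COG_RAM_BYTES)


if __name__ == "__main__":
    # 4 cogs, 32K hub, init attribute removed (assumed: ROM not implemented)
    print(reported_memory_bits(4, 32768, rom_bytes=0))   # 327680
    # 3 cogs, 32K hub, ROM preloaded
    print(reported_memory_bits(3, 32768))                # 442368
    # 4 cogs, 16K hub
    print(reported_memory_bits(4, 16384))                # 327680
    # 6 cogs, 32K hub on the 10M16
    print(reported_memory_bits(6, 32768))                # 491520
```

If this reading is right, the two 327,680 results are a coincidence: the 16K-hub build trades 16 KB of hub for the 16 KB ROM.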
  • jmg Posts: 13,346
    edited 2015-02-08 - 15:47:21
    Seairth wrote: »
    If that's the case, the BeMicroMAX10 appears to have 8K of flash, which should then be usable as 8K of ROM. That should at least be enough to add the boot ROM and Spin interpreter, right?
    Where do you extract 8K ? Data says 10M08 has 1376kb or 176,128 Bytes of User FLASH

    The Flash has a readdatavalid signal, so it should need a fairly simple state machine.

    Seairth wrote: »
    And if this is correct, this would be a rather neat way to implement a P1V. You would program the MAX10 with the core, keeping the entire ROM in UFM. Then the "ROM" could potentially be modified without having to touch the verilog.

    Correct, but Flash PGM from User code is a little more work.
    There is also a speed penalty for Flash reads over RAM reads, but for many (most?) apps that would not matter.
    Using the BurstCount should reduce the impact of the Flash latency.
  • jmg Posts: 13,346
    edited 2015-02-08 - 15:51:18
    Tubular wrote: »
    It could of course be all the free bits are the 9th bits in each block, and its misreporting whats truly free
    Yup, which is why my calculation used 9 bits.
    The '9 bit effect' means the tools will still report some Memory Free, even when all available bytes are taken.
  • ozpropdev Posts: 2,473
    edited 2015-02-08 - 16:37:52
    @Tubular
    FYI. I was able to build a 6 Cog 32K HUB image for a 10M16.
    ; Family                             ; MAX 10                                     ;
    ; Device                             ; 10M16DAF256C7G                             ;
    ; Timing Models                      ; Advance                                    ;
    ; Total logic elements               ; 11,102 / 15,840 ( 70 % )                   ;
    ;     Total combinational functions  ; 10,077 / 15,840 ( 64 % )                   ;
    ;     Dedicated logic registers      ; 4,164 / 15,840 ( 26 % )                    ;
    ; Total registers                    ; 4164                                       ;
    ; Total memory bits                  ; 491,520 / 562,176 ( 87 % )                 ;
    
    Cheers
    Brian
    Melbourne, Australia
  • Tubular Posts: 3,515
    edited 2015-02-08 - 16:47:54
    Ok, that'd suit me nicely. Thanks for checking that.

    Haven't seen pricing for those 10M16s yet. Altera seem to release new models at the start of each quarter - the 10M50s appeared in the last release. Hopefully the 10M16s aren't far away
  • Seairth Posts: 2,371
    edited 2015-02-08 - 17:22:08
    jmg wrote: »
    Where do you extract 8K ? Data says 10M08 has 1376kb or 176,128 Bytes of User FLASH

    The BeMicroMAX10 uses the 10M08DAF484C8GES. According to Mouser and Arrow's BeMicroMAX10 details, that particular version has 256Kb. I believe the 1376Kb is the maximum available in a 10M08 package. Based on the MAX10 overview document, the 1376Kb also includes the configuration flash (though the User Flash Memory User Guide states differently).

    Of course, the "ES" at the end of the part number also means "Engineering Sample", so it might very well be that there's a bit less UFM than will be available in full production chips.

    Edit: to make it even more complicated, if you select the device in Quartus, it states that the maximum UFM is 2555904 bits (2496Kb, 312KB). Interestingly, though, the User Flash Memory User Guide states that the Configuration Flash Memory is 2240Kb. If you add 256Kb to that, you get 2496Kb. Here's my guess: the overview document is wrong, as is the UFM column in the UFM User Guide. The actual total amount of flash is 312KB, split into 32KB of UFM (256Kb) and 280KB of CFM.
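
Since Kb (kilobits) and KB (kilobytes) are easy to mix up here, this small sketch redoes the unit conversions using only the figures quoted in this post (2555904 bits from Quartus, 256Kb of UFM, 2240Kb of CFM). The variable and function names are illustrative.

```python
QUARTUS_UFM_BITS = 2_555_904   # "maximum UFM" shown by Quartus for the device
UFM_KBIT = 256                 # figure from the distributor listings
CFM_KBIT = 2240                # CFM size from the UFM User Guide


def kbit_to_kbyte(kbit):
    """Convert kilobits to kilobytes (8 bits per byte)."""
    return kbit // 8


if __name__ == "__main__":
    total_kbit = QUARTUS_UFM_BITS // 1024
    print(total_kbit, kbit_to_kbyte(total_kbit))          # 2496 312
    print(UFM_KBIT + CFM_KBIT)                            # 2496
    print(kbit_to_kbyte(UFM_KBIT), kbit_to_kbyte(CFM_KBIT))  # 32 280
```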
  • Tubular Posts: 3,515
    edited 2015-02-08 - 17:42:48
    Check out table 4 in that overview doc Seairth, where it indicates 378 Kb of M9K memory.

    You're right about them being new to market, and it might also be that some memory gets repurposed in a way that stops the maximums from being reached, but the fits from OzPropDev seem consistent with 378 Kbits
  • jmg Posts: 13,346
    edited 2015-02-08 - 17:57:49
    Seairth wrote: »
    According to Mouser and Arrow's BeMicroMAX10 details, that particular version has 256Kb.

    256kb is 32K Bytes ?
    Altera DOCs do look very conflicted and confused. You wonder if anyone bothers to read them before release ?

    I also find this

    ["Table 2: UFM and CFM Array Size for MAX 10 Devices
    This table lists the dimensions of the UFM and CFM arrays for MAX 10 devices. The Altera On-Chip Flash IP core also gives you access to configuration flash memory (CFM) when you turn on the dual image configuration mode option."] somewhat hidden over in this pdf
    http://www.altera.com/literature/an/an631.pdf

    I think they mean you get access to that flash when it is NOT used for the Dual Image

    That makes sense of (8+8+41+29)*(16) = 1376kb, achieved by using (CFM2) & (CFM1), with CFM0 as Single Config, so I think their docs mean to say:

    Available FLASH, in Single Config image mode:

    (UFM0) = 16k Bytes (or 8 COG images)
    (UFM1) = 16k Bytes (or 8 COG images)
    (CFM2) = 82k Bytes (or 41 COG images)
    (CFM1) = 58k Bytes (or 29 COG images)
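
The page arithmetic behind that list can be sketched as follows, assuming 16 Kbit per flash page (the multiplier in the sum above) and 2 KB per COG image (512 longs). Only the page counts 8, 8, 41 and 29 come from the post; the names are made up.

```python
PAGE_KBIT = 16                 # Kbits per flash page, as used in the sum above
COG_IMAGE_BYTES = 512 * 4      # one cog image = 512 longs


def sector_kbytes(pages):
    """Sector size in KB for a given page count."""
    return pages * PAGE_KBIT // 8


def cog_images(pages):
    """Number of whole 2 KB cog images that fit in a sector."""
    return sector_kbytes(pages) * 1024 // COG_IMAGE_BYTES


if __name__ == "__main__":
    page_counts = [8, 8, 41, 29]
    print(sum(page_counts) * PAGE_KBIT)             # 1376 Kbit in total
    print([sector_kbytes(p) for p in page_counts])  # [16, 16, 82, 58]
    print([cog_images(p) for p in page_counts])     # [8, 8, 41, 29]
```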
  • jmg Posts: 13,346
    edited 2015-02-08 - 18:16:51
    ozpropdev wrote: »
    FYI. I was able to build a 6 Cog 32K HUB image for a 10M16.

    Those builds map to
    11,102/6 = 1850.33 LEs per COG for 6
    7,481/4 = 1870.25 LEs per COG for 4

    At those rates, a 10M16 has a ceiling of about 8.56 COGs' worth

    The 10M50 has a 5+ price of $76.95 in EQFP144, and it maps to roughly 27 COGs
    The 10M04 is $9.69 (BGA) @ 119 for ~ 2 COGs
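
As a rough sketch of the scaling estimate above: divide each build's total logic elements by its cog count, then divide a device's LE capacity by that rate. This lumps the hub and other shared logic into the per-cog figure, so it is only a ballpark. The LE totals are the ones reported in this thread; the function names are illustrative.

```python
def le_per_cog(total_le, cogs):
    """Average logic elements per cog, with shared logic spread across the cogs."""
    return total_le / cogs


def cog_ceiling(device_le, per_cog):
    """Upper bound on cogs if the whole device scaled at the same rate."""
    return device_le / per_cog


if __name__ == "__main__":
    rate_10m16 = le_per_cog(11_102, 6)   # 6-cog build on the 10M16
    rate_10m08 = le_per_cog(7_481, 4)    # 4-cog build on the 10M08
    print(round(rate_10m16, 2), round(rate_10m08, 2))   # 1850.33 1870.25
    print(round(cog_ceiling(15_840, rate_10m16), 2))    # 8.56
```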
  • Dr_Acula Posts: 5,482
    edited 2015-02-08 - 20:52:22
    Fantastic work!
    There are some interesting things you can do with a P1 with fewer cogs but more pins - e.g. a touchscreen with the ability to rapidly dump a megabyte of data out to the screen for fast text refresh in any font. The P1 never seems to have quite enough pins for that. And maybe 8 cogs are not needed - I suspect a VHDL/Verilog UART may take fewer elements than a cog running UART code.
    There seems to be a thread every few weeks about what the P2 should look like - much more fun to actually be building things and testing them out. I am watching this thread with interest (and some degree of guilt, as I have one of these BeMicro boards sitting in the Man Cave on the "to-do" projects list).
    Answers: 1) A quadcopter. 2) Very high. 3) The internet. 4) A lot. 5) No.
  • ozpropdev Posts: 2,473
    edited 2015-02-09 - 03:25:20
    With 16K hub RAM in the Max10 P1V, a modification is needed to stop writes to the upper 16K of hub from overwriting the lower 16K.
    This modification is required to allow the loader to work.
    // Each block handles one byte lane of the 32-bit hub long.
    // Writes are accepted only when a[13] and a[12] are both 0 (lower 16K);
    // the read path is left as in the original code.

    // Byte lane 3: d[31:24]
    always @(posedge clk_cog)
    begin
    	if (ena_bus && !a[13] && !a[12] && w && wb[3])
    		ram3[a[11:0]] <= d[31:24];
    	if (ena_bus && !a[13])
    		ram_q3 <= ram3[a[12:0]];
    end
    
    // Byte lane 2: d[23:16]
    always @(posedge clk_cog)
    begin
    	if (ena_bus && !a[13] && !a[12] && w && wb[2])
    		ram2[a[11:0]] <= d[23:16];
    	if (ena_bus && !a[13])
    		ram_q2 <= ram2[a[12:0]];
    end
    
    // Byte lane 1: d[15:8]
    always @(posedge clk_cog)
    begin
    	if (ena_bus && !a[13] && !a[12] && w && wb[1])
    		ram1[a[11:0]] <= d[15:8];
    	if (ena_bus && !a[13])
    		ram_q1 <= ram1[a[12:0]];
    end
    
    // Byte lane 0: d[7:0]
    always @(posedge clk_cog)
    begin
    	if (ena_bus && !a[13] && !a[12] && w && wb[0])
    		ram0[a[11:0]] <= d[7:0];
    	if (ena_bus && !a[13])
    		ram_q0 <= ram0[a[12:0]];
    end
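
To see why the extra !a[12] term matters, here is a small behavioural model in Python. It is not the P1V code: it is a sketch assuming a 14-bit long address where a[13] selects ROM, a 4096-long (16K) RAM indexed by a[11:0], and reads that wrap onto the same 4096 longs. The class and parameter names are made up.

```python
HUB_LONGS = 4096   # 16 KB of hub RAM = 4096 longs


class HubRam16K:
    """Behavioural model of a 16K hub RAM sitting in a 32K address window."""

    def __init__(self, guard=True):
        self.mem = [0] * HUB_LONGS
        self.guard = guard            # True models the added !a[12] write inhibit

    def write(self, a, value):
        """Write one long at 14-bit long address a; returns True if it was stored."""
        if a & 0x2000:                # a[13] set: ROM region, never written
            return False
        if self.guard and (a & 0x1000):   # a[12] set: upper 16K, write inhibited
            return False
        self.mem[a & 0x0FFF] = value  # index with a[11:0]
        return True

    def read(self, a):
        """Assumed read behaviour: the upper 16K mirrors the lower 16K."""
        return self.mem[a & 0x0FFF]


if __name__ == "__main__":
    unguarded = HubRam16K(guard=False)
    unguarded.write(0x0000, 0x11)
    unguarded.write(0x1000, 0x22)     # aliases onto long 0 and clobbers it
    print(hex(unguarded.read(0)))     # 0x22

    guarded = HubRam16K()
    guarded.write(0x0000, 0x11)
    guarded.write(0x1000, 0x22)       # ignored
    print(hex(guarded.read(0)))       # 0x11
```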
    
    
    
    When I have a verified running Max10 I'll post the archived project :)
    Melbourne, Australia