P1V - with Lattice ECP5 FPGAs?


Comments

  • Yes, guess you could, though you'd only likely need RESn, because there would likely be an oscillator onboard
  • Ariba Posts: 2,092
    edited October 13
    Thanks Roger for the numbers

    Looks like my cog numbers in the first post are not far from reality ;)

    Have you used Synplify or LSE for synthesis? (You can switch between them in Project Properties.)
    LSE is normally faster and gives better results, but does not always work. With LSE you also get an easy-to-read timing report in the 'Report' tab (ProcessReports -> LatticeLSE -> TimingReport Summary).
    With a Lattice-XP2 device I got an fmax of 95 MHz, and I expect the ECP5 parts will be faster. (The XP2-5 can fit 2 cogs BTW, but it just doesn't have enough memory to be useful.)

    Andy
  • Tubular wrote: »
    Yes, guess you could, though you'd only likely need RESn, because there would likely be an oscillator onboard
    True, tho for the NC pins, it is often easier to trim to length than to remove them entirely.
    I'm sure some useful user feature could be mapped onto those 3 spare pins... :)
    First 3 pins of Port B, or boot jumpers, or....
  • Nice work Roger ;)
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • rogloh Posts: 557
    edited October 13
    @jmg,
    default P1V project settings were used. I checked feature set and I know for sure it included NUMCOGS=8, counters and video generator in the Verilog. Hoping nothing major was optimized out.

    @Ariba,
    I selected Synplify for synthesis. Somewhere I thought I read that this tool was to be used for ECP5 FPGAs but I guess I could try out the other one too at some point to see what happens.

    @Cluso99
    Cheers - haven't got it all done yet though. Will ultimately need one of Valentin's board to play with I guess. I hope to get my SDRAM code going in there too ultimately.
  • rogloh wrote: »
    @jmg,
    default P1V project settings were used. I checked feature set and I know for sure it included NUMCOGS=8, counters and video generator in the Verilog. Hoping nothing major was optimized out.
    With no specific local PLLs for video, many P1V builds have chosen to optionally disable that.
    Would be interested to see how much that changes the numbers.



    rogloh wrote: »
    ... I hope to get my SDRAM code going in there too ultimately.
    .. be interesting to see HyperRAM, and OctaRAM and OctaFLASH options there too... :)


  • rogloh Posts: 557
    edited October 13
    jmg wrote: »
    With no specific local PLLs for video, many P1V builds have chosen to optionally disable that.
    Would be interested to see how much that changes the numbers.
    Hopefully not too much, I did connect the clock50 through to the core logic timing circuit (bypassing the PLL), so with any luck it would just need to add in the PLL resource and not a lot of other LUTs.

    EDIT: Sorry, I misinterpreted your reply- yeah you could disable the video generators to save some LUTs, especially on Valentin's board with a dedicated HDMI port that could be driven directly by a video gfx engine.
  • rogloh wrote: »
    EDIT: Sorry, I misinterpreted your reply- yeah you could disable the video generators to save some LUTs, especially on Valentin's board with a dedicated HDMI port that could be driven directly by a video gfx engine.
    With 2 PLLs showing in the ECP5 series, one could be used for SysCLK and the other as a video clock generator for one instance of video output.
    I don't think there is much practical use for 8 video generator instances.... ?
    Initial designs could use SysCLK PLL for the video clock, to simplify clock domains.

  • Good info there Roger.
    Keen to hear your experiences with the Lattice tool.
    Melbourne, Australia
  • Cluso99 Posts: 12,847
    edited October 13
    Here are some Cyclone IV sizes (DE0-Nano)
                    LEs     REGs
    HUB             371     324
    COGS   x1      1782     638 
    VIDEO  x1       240     152 
    CTRB   x1       282     121 
    ROTATE x1       284     0 
    
  • jmg wrote: »
    With 2 PLLs showing in the ECP5 series, one could be used for SysCLK and the other as a video clock generator for one instance of video output.
    I don't think there is much practical use for 8 video generator instances.... ?
    Initial designs could use SysCLK PLL for the video clock, to simplify clock domains.
    Actually the video generator is still useful to keep in there in some situations (not always for video). E.g. I used it for I2S audio, and it is used in SaucySoliton's 80MHz USB host. The issue is that the P1V video PLL is not HW based (unless you tweak it ;-) ), so there is a lot of jitter with the fake PLL that Chip's code emulates. In some situations that is okay, but not for a 12Mbps USB clock sourced from a poorly synthesized 96MHz clock.

    Already communicated with Valentin about PLL usage recently and yes in the ECP5 there are the two HW PLLs, one can be used for video clocking, and the other for P1V with USB at usual P1V clock speeds. This is a particularly good use of the available resources for a P1V implementation with video and USB hosts using ECP5. It's ideal really.
  • Yep the 1 pll of max10 in eqfp144 is certainly a pain. I can see 2 being a really useful baseline
  • rogloh Posts: 557
    edited October 13
    Some more info captured below for ECP5 LUT usage... not sure why CTRB reports fewer LUTs than CTRA though!
    [Attachment: Screenshot.png, ECP5 LUT usage report]
  • Cluso99 wrote: »
    Here are some Cyclone IV sizes (DE0-Nano)

    Merge of those 2 posts gives this comparison :
                    Cyclone IV       ECP5
                    LEs     REGs     LUTs                   Share of cog LUTs
    HUB             371     324      306
    COGS   x1      1782     638      1554~1582              (note: 188+386+131+738 = 1443, not 1575)
    VIDEO  x1       240     152      188                    ~12%
    CTRB   x1       282     121      CTRA: 386, CTRB: 131   ~33%
    ROTATE x1       284     0        ALU: 738 (?)           ~47%
    
    

    Similar, but not quite identical mapping.
    Looks like CTRA has some common base logic, so reports larger than CTRB
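
The arithmetic behind the note and the percentages in that table can be re-checked with a few lines of Python. This is only a sketch: it assumes that 1575 is the per-cog LUT total the percentages are taken against (the figure in the note), and that the counter share combines CTRA and CTRB.

```python
# Per-cog ECP5 LUT counts as quoted in the comparison table.
cog_blocks = {"VIDEO": 188, "CTRA": 386, "CTRB": 131, "ALU": 738}

# Assumption: 1575 is the per-cog total that the percentages refer to.
cog_total = 1575

accounted = sum(cog_blocks.values())
unattributed = cog_total - accounted  # LUTs not covered by the four blocks


def share(luts, total=cog_total):
    """Return a block's share of the cog total as a rounded percentage."""
    return round(100 * luts / total)


shares = {
    "VIDEO": share(cog_blocks["VIDEO"]),
    "CTRA+CTRB": share(cog_blocks["CTRA"] + cog_blocks["CTRB"]),
    "ALU": share(cog_blocks["ALU"]),
}

print(f"accounted for: {accounted} LUTs, unattributed: {unattributed} LUTs")
for name, pct in shares.items():
    print(f"{name:10s} ~{pct}%")
```

The LUTs left over after the four listed blocks would presumably be the rest of the cog logic (instruction sequencing, muxes and so on), which the table does not break out.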

  • rogloh Posts: 557
    edited October 13
    From this early data the ECP5 may seem a bit more efficient on usage compared to the Cyclone IV (if you equate a Lattice LUT to an Altera LE), but I am still concerned it might be optimizing some things out in my build. I will only truly know once a final P1V build is proven to be loadable/working on the Lattice ECP5 and I get the numbers again. I was seeing some strange messages in the reports like this..
    @N: CL134 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_ram.v":41:0:41:5|Found RAM r, depth=512, width=32
    @N: CG364 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_ctr.v":24:20:24:26|Synthesizing module cog_ctr in library work.
    @W: CL265 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_ctr.v":52:0:52:5|Removing unused bit 31 of ctr[31:0]. Either assign all bits or reduce the width of the signal.
    @W: CL271 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_ctr.v":52:0:52:5|Pruning unused bits 22 to 14 of ctr[31:0]. If this is not the intended behavior, drive the inputs with valid values, or inputs from the top level.
    @W: CL271 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_ctr.v":52:0:52:5|Pruning unused bits 8 to 5 of ctr[31:0]. If this is not the intended behavior, drive the inputs with valid values, or inputs from the top level.
    @N: CG364 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_vid.v":22:20:22:26|Synthesizing module cog_vid in library work.
    @W: CL265 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_vid.v":50:0:50:5|Removing unused bit 31 of vid[31:0]. Either assign all bits or reduce the width of the signal.
    @W: CL271 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_vid.v":50:0:50:5|Pruning unused bits 22 to 11 of vid[31:0]. If this is not the intended behavior, drive the inputs with valid values, or inputs from the top level.
    @W: CL265 :"/home/roger/Documents/LatticeFPGA/P1V/source/cog_vid.v":50:0:50:5|Removing unused bit 8 of vid[31:0]. Either assign all bits or reduce the width of the signal.
    

    Scratch that : those bits really are unused in those registers! It's all good.
  • If you look at the datasheet, you will see that these bits in the VIDR and CTRx registers really are unused. So it's a good thing that the synthesis finds this.
    The LUT numbers look okay to me.
    Have you used the Verilog source adapted for Xilinx or the one for Altera?

    Andy

  • rogloh Posts: 557
    edited October 13
    I took the Verilog source from Jac's P1V github code here:

    https://github.com/jacgoudsmit/P1V

    I just had to trim it down to the file list below when importing:
    cog_alu.v
    cog_ctr.v
    cog_ram.v
    cog_vid.v
    cog.v
    dig.v
    features.v
    hub_mem.v
    tim.v
    top.v

    I also changed the i++ in the generate block to i=i+1 to get rid of some parsing errors in dig.v.
    I commented out the Altera pll code in top.v and need to make a Lattice equivalent.
    The tool that analyzes the Verilog syntax during the Design Refresh step complains (as a warning only) that it doesn't like the multidimensional wire arrays. Changing to either SystemVerilog or Verilog 2001 in the Synplify Pro settings doesn't seem to make a difference. Maybe that type of syntax is non-standard; I'm not a Verilog guru. But it still appears to compile anyway.

    eg. I see

    WARNING - /home/roger/Documents/LatticeFPGA/P1V/source/cog_ctr.v(75,1-90,60) (VERI-1680) multiple packed dimensions are not allowed in this mode of verilog

    from things like this:

    wire [15:0][2:0] tp = { dly == 2'b10, !dly[0], 1'b0, // neg edge w/feedback

  • I have a new definition for
    FPGA = "Frequent Pain Gluteus Area" :lol:
  • Am feeling that right now! Though at least progress has been made.
  • Yes, these multidimensional arrays are exactly what was changed in the Xilinx version (for Pipistrello), so I use those files for Lattice as well.
    For sure PLL and Pins must also be customized. Diamond has the IPexpress tools for easy configuration of the PLLs or the ROMs.

    Andy
  • rogloh wrote: »
    Am feeling that right now! Though at least progress has been made.
    No pain, No gain
    My tolerance to pain must be improving.
    I'm playing with Quartus and Cyclone 10 today and have escaped relatively injury free!
  • rogloh Posts: 557
    edited October 13
    Got a little bit further tonight on this.
    - Added a Lattice PLL to the build using Clarity Designer to set it up and imported into top.v for driving the P1V clocks.
    - Figured out correct path for sorting out the preinitialized ROM blocks and it reports 80kB RAM is in use now which is correct.
    - Tried to import the SDC file (Synopsys Design Constraints) that Seairth originally created for us into this project. I am not sure all of its syntax is supported by the Lattice tools, but it did read it in and continued with some complaints. It took much longer to route with this added and I am not sure about the final timing. The achievable frequency is not looking very high yet.
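
As a cross-check of the 80kB figure, the memory can be tallied by hand. The sketch below assumes the standard Propeller 1 hub layout of 32 KB RAM plus 32 KB ROM (not stated in this thread), and takes the cog RAM geometry (depth=512, width=32) and NUMCOGS=8 from the synthesis log and build settings quoted earlier.

```python
KIB = 1024

# Assumption: standard Propeller 1 hub memory map (32 KB RAM + 32 KB ROM).
hub_ram_bytes = 32 * KIB
hub_rom_bytes = 32 * KIB

# From the thread: NUMCOGS=8, and "Found RAM r, depth=512, width=32" per cog.
num_cogs = 8
cog_ram_bytes = 512 * 32 // 8  # 512 longs of 32 bits, expressed in bytes

total_bytes = hub_ram_bytes + hub_rom_bytes + num_cogs * cog_ram_bytes
total_kib = total_bytes // KIB

print(f"cog RAM per cog: {cog_ram_bytes} bytes")
print(f"total block RAM in use: {total_kib} KiB")
```

Under those assumptions the total matches the figure the tool reports, which suggests the preinitialized ROM blocks are no longer being dropped.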

    I am wondering if it is somehow thinking I want to clock everything at 160MHz (i.e clkpll), while I only want 80MHz. I need to read up more on this. The timing stuff is unfortunately what I understand the least.
    Preference Summary
    
    FREQUENCY NET "clock_160_0" 160.000000 MHz (4096 errors)
    
                4096 items scored, 4096 timing errors detected.
    Warning:  63.666MHz is the maximum frequency for this preference.
    
    FREQUENCY NET "pll_/pll1_inst/buf_CLKI" 50.000000 MHz (0 errors)
                0 items scored, 0 timing errors detected.
    
    FREQUENCY PORT "clock_50" 50.000000 MHz (4096 errors)
    
                4096 items scored, 4096 timing errors detected.
    Warning:   8.683MHz is the maximum frequency for this preference.
    
    



    Also in the Timing Analysis view I get strange numbers in the clk_cog domain; it seems to want to use the 160MHz clock for the data transfers instead. Unsure what gives. It mustn't be set up correctly yet.
    Report Summary
    --------------
    ----------------------------------------------------------------------------
    Preference                              |   Constraint|       Actual|Levels
    ----------------------------------------------------------------------------
                                            |             |             |
    FREQUENCY NET "clock_160_0" 160.000000  |             |             |
    MHz ;                                   |  160.000 MHz|   36.219 MHz|   4 *
                                            |             |             |
    FREQUENCY NET "pll_/pll1_inst/buf_CLKI" |             |             |
    50.000000 MHz ;                         |            -|            -|   0  
                                            |             |             |
    FREQUENCY PORT "clock_50" 50.000000 MHz |             |             |
    ;                                       |   50.000 MHz|    6.048 MHz|   8 *
                                            |             |             |
    ----------------------------------------------------------------------------
    Clock Domains Analysis
    ------------------------
    
    Found 4 clocks:
    
    Clock Domain: clk_cog   Source: SLICE_5575.Q0   Loads: 125
       No transfer within this clock domain is found
    
       Data transfers from:
       Clock Domain: clock_160_0   Source: pll_/pll1_inst/PLLInst_0.CLKOP
          Covered under: FREQUENCY PORT "clock_50" 50.000000 MHz ;   Transfers: 1426
    
    Clock Domain: clock_160_0   Source: pll_/pll1_inst/PLLInst_0.CLKOP   Loads: 2334
       Covered under: FREQUENCY NET "clock_160_0" 160.000000 MHz ;
    
       Data transfers from:
       Clock Domain: clk_cog   Source: SLICE_5575.Q0
          Covered under: FREQUENCY NET "clock_160_0" 160.000000 MHz ;   Transfers: 288
    
       Clock Domain: clk_pll_0   Source: clkgen/SLICE_9352.F0
          Covered under: FREQUENCY NET "clock_160_0" 160.000000 MHz ;   Transfers: 8
    
    Clock Domain: clk_pll_0   Source: clkgen/SLICE_9352.F0   Loads: 832
       Covered under: FREQUENCY NET "clock_160_0" 160.000000 MHz ;
    
       Data transfers from:
       Clock Domain: clock_160_0   Source: pll_/pll1_inst/PLLInst_0.CLKOP
          Covered under: FREQUENCY NET "clock_160_0" 160.000000 MHz ;   Transfers: 1360
    
    Clock Domain: pll_/pll1_inst/buf_CLKI   Source: clock_50.PAD   Loads: 1
       No transfer within this clock domain is found
    
    
    
    
    Still need to add the IO pin mapping too but would need a real board first for testing anything out.
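
To put the Preference Summary figures above into more familiar terms, the frequencies can be converted to clock periods. This is a rough sketch only: it treats the reciprocal of the reported maximum frequency as the delay of the slowest path, which is an approximation, and the function names are made up for illustration.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def shortfall_ns(constraint_mhz, achieved_mhz):
    """Extra time the slowest path needs beyond the constrained period."""
    return period_ns(achieved_mhz) - period_ns(constraint_mhz)


# (net, constraint in MHz, reported maximum in MHz) from the Preference Summary.
report = [
    ("clock_160_0", 160.0, 63.666),
    ("clock_50", 50.0, 8.683),
]

for net, constraint, achieved in report:
    print(
        f"{net}: needs {period_ns(constraint):.3f} ns, "
        f"slowest path about {period_ns(achieved):.3f} ns, "
        f"short by {shortfall_ns(constraint, achieved):.3f} ns"
    )

# Would the clock_160_0 result pass if the paths were constrained at 80 MHz?
meets_80mhz = 63.666 >= 80.0
print("meets 80 MHz:", meets_80mhz)
```

On these numbers the reported paths would miss an 80 MHz constraint as well, so which clock each path is constrained against is only part of the picture.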
  • rogloh wrote: »
    I am wondering if it is somehow thinking I want to clock everything at 160MHz (i.e clkpll), while I only want 80MHz. I need to read up more on this. The timing stuff is unfortunately what I understand the least.

    Sounds like good progress, for a new target :)

    Once you have the Clock settings sussed, I like the idea of using more of the PLL features in these parts.
    Actual run-time PLL access is poor in FPGAs I've looked at, you need to compile-in the divisors, but you can start with some higher fOUT and have a user-set divider, to give some clock speed
    change ability.

    eg using the data range for VCO and fOUT
    Table 3.22. sysCLOCK PLL Timing Parameter Descriptions
    Parameter  Description                              Min    Max  Units
    fIN        Input Clock Frequency (CLKI, CLKFB)      8      400  MHz
    fOUT       Output Clock Frequency (CLKOP, CLKOS)    3.125  400  MHz
    fVCO       PLL VCO Frequency                        400    800  MHz
    fPFD       Phase Detector Input Frequency           10     400  MHz
    

    With 80MHz as one SysCLK target, a 5 bit divider can reach down to about RCFAST nominal initial value, if you wanted.
    320 MHz => /4 = 80, /5 = 64, /6 = 53.333, /7 = 45.71428, /8 = 40 MHz, etc.
    400 MHz => /4 = 100, /5 = 80, /6 = 66.66, /7 = 57.14285, /8 = 50, /9 = 44.444, /10 = 40 MHz, etc.
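
The divider tables above are easy to regenerate for other PLL output frequencies. A minimal Python sketch follows; the helper name is hypothetical, and the assumption that a 5-bit field encodes divisors 1 to 32 is an illustration rather than anything from the ECP5 datasheet.

```python
def divided_clocks(f_out_mhz, divisors):
    """Map each integer divisor to the resulting clock frequency in MHz."""
    return {d: f_out_mhz / d for d in divisors}


# The two PLL output frequencies discussed above.
for f_out in (320.0, 400.0):
    table = divided_clocks(f_out, range(4, 11))
    row = ", ".join(f"/{d} = {f:.3f}" for d, f in table.items())
    print(f"{f_out:.0f} MHz => {row}")

# Assumption: a 5-bit user divider field encodes divisors 1..32.
MAX_DIVISOR = 32
slowest = {f_out: f_out / MAX_DIVISOR for f_out in (320.0, 400.0)}
for f_out, f_min in slowest.items():
    print(f"{f_out:.0f} MHz / {MAX_DIVISOR} = {f_min} MHz")
```

With that divisor range the slowest selectable clocks are 10 MHz and 12.5 MHz, which is in the region of the RCFAST nominal value mentioned above.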
  • rogloh wrote: »
    WARNING - /home/roger/Documents/LatticeFPGA/P1V/source/cog_ctr.v(75,1-90,60) (VERI-1680) multiple packed dimensions are not allowed in this mode of verilog

    Make sure your tool interprets all files as SystemVerilog (not Verilog). You may have to override that somewhere. It probably infers Verilog because of the .v extension on most files. I'm planning on renaming the files to use .sv as extension which is more common for SystemVerilog but there's some other reorganizing to do that's more important.

    Once you guys get P1V to work on a Lattice target, I'd be very interested in adding it to the repo!

    ===Jac


    Rancho Cucamonga, CA
  • rogloh Posts: 557
    edited October 14
    Yeah I'll have to have another go at forcing it. Maybe the file rename could help too.
    By the way, Valentin's Indiegogo is almost there now so it's likely going to make it. I think only one more pledge is needed to hit the initial target of $10600 raised for it to get funded. Good news.
  • rogloh wrote: »
    WARNING - /home/roger/Documents/LatticeFPGA/P1V/source/cog_ctr.v(75,1-90,60) (VERI-1680) multiple packed dimensions are not allowed in this mode of verilog

    Make sure your tool interprets all files as SystemVerilog (not Verilog). You may have to override that somewhere. It probably infers Verilog because of the .v extension on most files. I'm planning on renaming the files to use .sv as extension which is more common for SystemVerilog but there's some other reorganizing to do that's more important.

    Once you guys get P1V to work on a Lattice target, I'd be very interested in adding it to the repo!

    ===Jac

    Are you saying that the P1v sources are System Verilog? I thought you had to use Verilog if you planned to synthesize for an FPGA.
  • Chip used one SystemVerilog feature which can pretty easily be recoded to straight Verilog. He may not have even realized it. The synthesizable portion of SystemVerilog is fine for FPGAs, provided your tools support it. If you can avoid it, then more tools are available for both simulation and synthesis.
  • Well by naming the file with a .sv extension yes I found I was able to prevent the warnings, but only because Diamond appears to skip its parsing step completely with this output instead...

    "No Hierarchy could be parsed out because there is a System Verilog in project."

    I kind of like seeing the Hierarchy view showing resource usage etc so I don't really like that method for getting rid of the warnings and I'd probably choose to live with them instead.

    In general I notice there seem to be far fewer warnings for P1V with Lattice Diamond than with Altera Quartus, though some of this may be because I'm using Jac's tidied-up codebase instead of the original.
  • David Betz wrote: »
    Are you saying that the P1v sources are System Verilog? I thought you had to use Verilog if you planned to synthesize for an FPGA.

    As far as I understand, SystemVerilog is to Verilog what C++ is to C, in a way. I don't know exactly what the differences are but apparently Chip programmed some parts of the Propeller 1 code in SystemVerilog even though the files have .v extensions which the tools assume to be Verilog. As far as I understand, it's mostly small stuff like using "i++" instead of "i=i+1" in a for-loop.

    Problems with two-dimensional arrays of wires were previously fixed by making subtle changes in formatting. When Andy changed
    wire [height][width] name;
    to
    wire [width] name[height];
    it made the Vivado compiler understand that it could use block memory, so after that small change, the Arty and Nexys4 could be supported. See this change.

    The current P1V code is usable on Quartus II (Altera/Intel) and Vivado (Xilinx), if all files are interpreted as SystemVerilog. In Quartus you can set SystemVerilog as the default language in the project settings, or you can set it as override for each file when you add files to a project. In Vivado you can apparently only set the type of each file (not the entire project) but in Vivado this is much easier to do because you don't have to open a properties dialog and scroll through a list of file types. I expect that the Lattice tools will let you override file types; if not, I'll have to prioritize the renaming of the source files to have .sv extensions (which means I have to open all projects and change all file names; not a big deal but I have some more urgent things to fix). I haven't had time to work with Lattice (yet) so I don't know exactly how to change file types and do other magic at this time.

    ===Jac
  • rogloh Posts: 557
    edited October 18
    So I played around a bit more with the Lattice tools, trying to get P1V sorted on an ECP5 FPGA. I have to say I do like the relative simplicity of use compared to Quartus. They are fairly similar tools, but somehow Diamond seems to make a little more sense in the way it's structured (to me anyway). It has been reasonably stable on Linux so far too, and I'm thankful for that.

    I think I have set up the clock constraints now. I've been trying to meet timing and can get most clocks to work out, except for the 2x "clk_pll" used in the P1V design. All my setup timing at 160/80MHz P1V passes, but hold timing fails just on the 160MHz P1V "clk_pll" clock (and it reports 4096 nets!). The 80MHz clk_cog is fine. I think 4096 might be where the tool stops counting, as it seems a coincidental number otherwise.

    A hold violation means data is arriving too soon relative to the clock edge: after an input change propagates through to the next FF in the sequence, the second data value from the upstream FF arrives too close to the first clock edge at the downstream FF. Slowing down the clock doesn't fix it. You need to either reduce the clock skew between FFs or slow down the data path via extra routing/buffering. The worst case hold time slack I have is about -0.646ns.

    I wonder if the tools are not making proper use of the clock distribution network in the chip, and I am still trying to see exactly how the internal FPGA clock resources are being allocated. One of the issues is that the 80MHz and 160MHz clocks the P1V uses don't come directly from PLL outputs. They come from the tim.v module, which uses a counter to divide down an original 160MHz clock according to the CLK register values in the P1V, and I'm not yet sure how the Lattice part distributes the clocks coming out of this Verilog divider module. That counter may well be introducing skew. There are apparently separate HW clock divider blocks inside the Lattice part, so maybe I should try to map something over to use those once I figure out how.

    Anyway from my take on things so far I still expect it should be possible to get the P1V to run at its normal 80MHz core speed inside the ECP5 LFE5U-25F even with a slower -8 timing rated part which is what I've been configuring the tools with.

    Roger.
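
The point that a hold violation is independent of the clock period can be illustrated with the usual hold-slack expression. Everything in this sketch apart from the -0.646 ns result is hypothetical: the individual delay values are invented so that they reproduce that slack, and are not taken from the Diamond report.

```python
def hold_slack_ns(clk_to_q, data_delay, hold_req, clock_skew):
    """Hold slack at the capturing flip-flop; negative means a violation.

    clock_skew is the capture clock arrival minus the launch clock arrival.
    The clock period does not appear anywhere in this expression.
    """
    return (clk_to_q + data_delay) - (hold_req + clock_skew)


# Hypothetical delays (ns), chosen only to reproduce the -0.646 ns worst case.
base = dict(clk_to_q=0.3, data_delay=0.2, hold_req=0.1, clock_skew=1.046)

failing = hold_slack_ns(**base)

# Fix 1: slow the data path with extra routing or buffering.
with_extra_delay = hold_slack_ns(**{**base, "data_delay": 0.2 + 0.7})

# Fix 2: cut the skew, e.g. by using a dedicated clock network.
with_less_skew = hold_slack_ns(**{**base, "clock_skew": 0.2})

print(f"original slack:       {failing:+.3f} ns")
print(f"with 0.7 ns more data delay: {with_extra_delay:+.3f} ns")
print(f"with skew cut to 0.2 ns:     {with_less_skew:+.3f} ns")
```

Because the period is absent from the formula, running the same paths at 80 MHz instead of 160 MHz leaves the slack unchanged, which is consistent with looking at clock distribution rather than frequency.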