P1V - with Lattice ECP5 FPGAs?

Comments

  • This is the block of concern... the green is the input clock at 160MHz and the two output clocks are at the right. clk_pll goes through an extra mux and it is delayed relative to clk_cog (by about 0.9ns according to the timing analysis tools). That can't be helping.
    [Attached image: Screenshot.png]
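
    To show why a 0.9ns skew between the two clocks can hurt hold timing, here is a minimal sketch of the usual hold check for a path launched on clk_cog and captured on the later-arriving clk_pll. Only the 0.9ns figure comes from the post above; the clock-to-Q, short-path and hold numbers are made-up illustrative values, not ECP5 data, and the function name is hypothetical.

```python
def hold_slack_ns(clk_to_q_ns, data_path_ns, hold_ns, capture_skew_ns):
    """Hold slack for a launch-flop -> capture-flop path.

    capture_skew_ns is how much LATER the capture clock arrives than the
    launch clock. Positive skew eats directly into the hold margin.
    """
    return clk_to_q_ns + data_path_ns - hold_ns - capture_skew_ns


# Illustrative values only (assumptions, NOT taken from any ECP5 datasheet).
CLK_TO_Q_NS = 0.4
SHORT_PATH_NS = 0.3
HOLD_NS = 0.1

# 0.9 ns is the clk_pll delay relative to clk_cog reported by the timing tools.
slack_no_skew = hold_slack_ns(CLK_TO_Q_NS, SHORT_PATH_NS, HOLD_NS, 0.0)
slack_with_skew = hold_slack_ns(CLK_TO_Q_NS, SHORT_PATH_NS, HOLD_NS, 0.9)

print(f"hold slack without skew: {slack_no_skew:+.3f} ns")
print(f"hold slack with 0.9 ns skew: {slack_with_skew:+.3f} ns")
```

    With these assumed numbers a short path that passes comfortably with aligned clocks goes negative once the capture clock arrives 0.9ns late, which matches the kind of hold violations discussed in this thread.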
  • With Altera, we put this stuff to one side and just input a straight 160MHz clock (no PLL), which happens to be the highest frequency in the CTS CB3 range that Parallax and several FPGA board vendors seem to install. That's worked pretty well so far.

    The other trick to try is an external jumper (like Chip has done with P2 Hot video), where the tool is unaware of the delay around the loop.

    There are external programmable oscillators common in 5x7 footprints too, if you want the programmability.

    Just a thought
  • Tubular wrote: »
    With Altera, we put this stuff to one side and just input a straight 160MHz clock (no PLL), which happens to be the highest frequency in the CTS CB3 range that Parallax and several FPGA board vendors seem to install. That's worked pretty well so far.

    That's a good idea for earliest testing, if you know the MHz you can reach.
    Adafruit have a Si5351A breakout board, sub $10, that can generate anything up to 200MHz.

    Once the Verilog looks ok, the more vendor-specific settings around the FPGA can be worked on.
    My preference would be to ultimately use the FPGA VCO output, and divide from there, with enough divider length to reach the lower boot clock speed.


  • I am trying to do a build right now that uses the Lattice "CLKDIVF" HW element. I am now deriving the P1V clk_cog by dividing clk_pll by 2 within this element. Hopefully the edges of both clocks will stay closer then and skew will be reduced. That's the theory anyway. Will be interesting to see if this fixes/helps/hinders the issue. Will know in a few more minutes...
  • rogloh wrote: »
    I am trying to do a build right now that uses the Lattice "CLKDIVF" HW element. I am now deriving the P1V clk_cog by dividing clk_pll by 2 within this element....

    That seems a good idea to reduce skews. An extension of that would be two dividers, one for CLK_PLL and one for CLK_COG. Edge timing would then be CLK-Q based (set by each flop's clock-to-output delay), but the relative phase of the COG clock to the PLL clock would vary at higher divide ratios. Hopefully that does not matter if the fastest /2 case is OK.

  • rogloh Posts: 563
    edited October 18
    Yeah @jmg, I might have to try two dividers. I just found I am getting even more skew now with my single CLKDIVF attempt, although I think that is because these special HW divider blocks are on the outer edge of the FPGA: you need to go out to them from the center to do the divide, then come back into the center of the chip for clock distribution. That adds latency. I think that is the case because I see that clk_cog is now very much delayed from clk_pll, mostly due to routing delay. Doing this did reduce the total number of hold time errors a lot, but at the expense of 4096 new setup time violations! LOL.

    However, using two CLKDIVFs probably means using a faster originating 320MHz source instead of 160MHz. Ideally these divided clocks could come directly from a PLL VCO, which probably already has better skew performance to begin with. The problem is the whole P1V dynamic core clock selection thing with the CLK register. The Lattice ECP5 PLL dividers appear to be static only, and that's not a good fit there. Some type of register-controlled division logic is required if you want anything other than the stock 80MHz. Will keep on playing...
  • rogloh wrote: »
    ...
    However, using two CLKDIVFs probably means using a faster originating 320MHz source instead of 160MHz. Ideally these divided clocks could come directly from a PLL VCO, which probably already has better skew performance to begin with. The problem is the whole P1V dynamic core clock selection thing with the CLK register. The Lattice ECP5 PLL dividers appear to be static only, and that's not a good fit there. Some type of register-controlled division logic is required if you want anything other than the stock 80MHz. Will keep on playing...

    IIRC, the FPGA VCO is 400~800 MHz, so you could start from 640MHz (or 800, 720, 560, 480MHz).
    It's a pity most FPGA PLLs seem to be configured at compile time with not much user access, as the hardware is just sitting there with registers...
  • Tubular Posts: 2,837
    edited October 18
    The PLLs can be reconfigured at run time, at least with Altera; it's just not easy (yet).

    I was initially confused about whether it was paid IP, but it looks like the basics (at least) are free.
  • rogloh Posts: 563
    edited October 18
    @jmg, on the ECP5 I really like a 480MHz VCO because you can then get a true 96MHz for USB clocks as well as your 160MHz and 80MHz for the P1V core. Yes, a 640MHz VCO is another possible option for non-USB systems, though I don't know how well 320MHz clocks are distributed in this part for toggling logic FFs. That is starting to get fast, and I would have to try it to see if it meets timing. However, such a 320MHz clock is pretty localized and at least doesn't have especially high fanout needs.

    As a quick hack just to try to meet timing constraints, I did a compile where clk_pll always comes from divide[11] and clk_cog comes from divide[12] (i.e. no additional MUX in the way to slow things down). Thankfully all setup/hold timing was then met for P1V at 160MHz. Having seen that, I believe it should be possible to get P1V fully working on Lattice ECP5 one way or another in the end; we just need to minimize skew between these two clocks.

    Now I am thinking that if that extra mux path were used on BOTH output clocks to support the special case of the PLL16X setting, where you need to bypass the divider, it would delay both clocks by about the same amount. What I mean here is that the muxes select both the 80MHz and 160MHz clocks directly from the corresponding PLL output taps when PLL16X speed is selected by the Propeller's CLK register, or select them from the counter divider otherwise. That way the edges should, with any luck, still remain close in all cases. It's another idea to try anyway.

    I read there are a couple of Dynamic Clock Select HW blocks in the ECP5 FPGA too, and they might be of use there instead of simple logic based muxes. They should better handle any glitches during dynamic clock selection which might otherwise cause problems.
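
    The VCO choice discussed at the top of this post comes down to simple divisibility. Here is a minimal sketch that checks which of the VCO frequencies mentioned in the thread can produce 160MHz, 80MHz and 96MHz. It assumes the PLL output dividers are integer-only; the function and variable names are just for illustration.

```python
def integer_divider(vco_mhz, target_mhz):
    """Return the integer divider giving target_mhz from vco_mhz, or None."""
    if vco_mhz % target_mhz == 0:
        return vco_mhz // target_mhz
    return None


# VCO frequencies suggested earlier in the thread.
VCO_CANDIDATES_MHZ = [480, 560, 640, 720, 800]
# P1V clk_pll, P1V clk_cog, and the USB clock.
TARGETS_MHZ = [160, 80, 96]

table = {
    vco: {target: integer_divider(vco, target) for target in TARGETS_MHZ}
    for vco in VCO_CANDIDATES_MHZ
}

for vco, row in table.items():
    print(vco, row)

# VCO settings that reach every target with an integer divider.
usable_for_all = [
    vco for vco, row in table.items()
    if all(div is not None for div in row.values())
]
print("VCOs reaching 160, 80 and 96 MHz:", usable_for_all)
```

    Under that integer-divider assumption, only the 480MHz setting reaches all three frequencies from one PLL, while 640MHz and 800MHz cover the two P1V clocks but not 96MHz.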
  • rogloh Posts: 563
    edited October 18
    Just tried an ECP5 build with this dynamic clocking structure and it met P1V setup/hold timing at 160MHz. The green clk net is the 160MHz clock from the PLL, and the clk80 net is also from the PLL but at 80MHz. The two clock outputs are now driven similarly.

    I also found that earlier only one of my P1V clocks may have been explicitly set up to use the primary clock network, which might have explained my hold timing violations. When I get a chance I'll revert to the original timing design and retry with both clk_pll and clk_cog set to use the primary clock network and see if that works properly again. It would obviously be nicer not to have to make logic changes to the P1V design for Lattice implementations.

    UPDATE: retrying the original design didn't fix hold time, so the Lattice tools had possibly already chosen to route the clock over the primary clock network automatically anyway. At least I still have a way to meet timing.

    [Attached image: Screenshot.png]
  • Nice work Roger.
  • cgracey Posts: 8,306
    edited October 18
    Yes! The outputs of flops registered by the same super clock give you coincident signals that can be used as clocks themselves. In the case of Quartus, and likely other tools, these clocks will automatically get assigned to low-skew clock routing resources in the FPGA fabric. Then you can assign them frequencies by their symbolic names, and the fitter will attempt to meet your stated timing requirements.

    My experience with live-reprogrammable PLL resources is that they are about 100x more complicated to use than they ought to be, requiring long streams of bits to be shifted into them with not just divider and multiplier constants, but loop filter settings, too, which are completely unrealistic to whip up on the fly. They should have hidden those parts and computed them within the PLL circuitry from the divider/multiplier values. In my mind, the PLL should only have async parallel inputs for the divider and multiplier settings. Then, it becomes practical to use.

    The parallel/synchronous flop-to-clock scheme at the top of this post is the only practical approach that I've found for generating low-skew synchronized clocks.
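
    The flop-to-clock idea can be sketched as a small behavioural model: every divided clock is a bit of one counter clocked by the same fast source, so its edges can only fall on source-clock edges. This is only an illustration, not the P1V divider itself. It assumes a 320MHz source (as floated elsewhere in the thread) and ignores the clock-to-Q delay, since that is common to all the flops.

```python
def divider_edges(n_source_cycles, source_period_ns, bit):
    """Times (ns) at which one bit of a free-running counter toggles.

    The counter increments on every rising edge of the source clock, so a
    bit can only change at a source edge. The common clock-to-Q delay is
    ignored because it is the same for every flop.
    """
    edges = []
    count = 0
    for cycle in range(1, n_source_cycles + 1):
        new_count = count + 1
        if ((count >> bit) & 1) != ((new_count >> bit) & 1):
            edges.append(cycle * source_period_ns)
        count = new_count
    return edges


SOURCE_PERIOD_NS = 3.125  # assumed 320 MHz source clock

# Bit 0 toggles every source cycle (160 MHz); bit 1 every second cycle (80 MHz).
clk_pll_edges = divider_edges(16, SOURCE_PERIOD_NS, bit=0)
clk_cog_edges = divider_edges(16, SOURCE_PERIOD_NS, bit=1)

# Every clk_cog edge lands on a clk_pll edge in this idealised model.
all_coincident = set(clk_cog_edges) <= set(clk_pll_edges)
print("clk_cog edges (ns):", clk_cog_edges)
print("all clk_cog edges coincide with clk_pll edges:", all_coincident)
```

    In real hardware the residual skew then comes only from differences in clock-to-Q and routing between the two flop outputs, which is why both nets want to end up on low-skew clock routing.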
  • rogloh wrote: »
    Now I am thinking that if that extra mux path were used on BOTH output clocks to support the special case of the PLL16X setting, where you need to bypass the divider, it would delay both clocks by about the same amount. What I mean here is that the muxes select both the 80MHz and 160MHz clocks directly from the corresponding PLL output taps when PLL16X speed is selected by the Propeller's CLK register, or select them from the counter divider otherwise. That way the edges should, with any luck, still remain close in all cases. It's another idea to try anyway.
    That's a quick-to-try patch, but it still has more variation than CLK-Q pathways on both clocks, so CLK-Q is the ideal long-term solution.
    The CLK-Q just means you need to start higher, which the VCO runs at anyway, and have a smallest setting of /2.


  • cgracey wrote: »
    My experience with live-reprogrammable PLL resources is that they are about 100x more complicated to use than they ought to be, requiring long streams of bits to be shifted into them with not just divider and multiplier constants, but loop filter settings, too, which are completely unrealistic to whip up on the fly. They should have hidden those parts and computed them within the PLL circuitry from the divider/multiplier values. In my mind, the PLL should only have async parallel inputs for the divider and multiplier settings. Then, it becomes practical to use.

    Recently I was wrestling with Altera's reconfigurable PLLs and after a stressful time eventually got it to work.
    The target was a Max10 and in the end I dropped a friendly P1V in there to handle the PLL config.
    Even used SPIN code to do the work. :)
  • Great work Roger. :)
  • Thanks. Be even better to see something working on real ECP5 HW soon, otherwise this effort may purely be an academic exercise in new FPGA tool learning. :smile:

    I really want to see it going on that Lattice FPGA. Still need to do the IO assignment mapping too and then check that the ROM loader actually boots inside the P1V and if it can finally be accessed serially. Only then will I be satisfied.
  • So I discovered that the -8 speed grade ECP5 FPGA I was targeting in my Diamond P1V build setup is actually the fastest speed grade, not the slowest, which is -6. When I changed the target to the -6 speed grade, it failed timing again. Doh! Not sure which grade Valentin's final FleaFPGA Ohm board is going to use for production. His Indiegogo photos of prototypes show the slower grade. It would be nice if we could still meet timing even on the slower part. More work to do.
  • That speed grade difference caught me out too
  • rogloh wrote: »
    So I discovered that the -8 speed grade ECP5 FPGA I was targeting in my Diamond P1V build setup is actually the fastest speed grade, not the slowest, which is -6. When I changed the target to the -6 speed grade, it failed timing again. Doh! Not sure which grade Valentin's final FleaFPGA Ohm board is going to use for production. His Indiegogo photos of prototypes show the slower grade. It would be nice if we could still meet timing even on the slower part. More work to do.

    Hi Roger,

    Nice work you've done there!

    Wish I had more time to have a direct crack at this myself as I am heavily bogged down with production related matters (among other things) right now..

    You are right. I plan on using the -6 (lowest) speed grade purely for reasons of cost. I do however attempt to counteract that by tweaking the ECP5 Vcore by a few tens of millivolts. One is able to override the default Vcore and temp parameters used by the Place & Route tool to obtain a different timing outcome.

    Regards,
    Valentin
  • @Basman74, thx.

    So I tried another thing, getting the PLL to run at 320MHz to see if the system can handle it, but it fails setup timing on the -6 part (haven't tried -8). I also increased the voltage to 1.12V and dropped the temperature to 50C in the timing analysis, which helps a bit, but it still fails. Having the clk_pll and clk_cog nets come from the divider register outputs did fix the hold timing at least, so the skew between them is the real culprit for failing hold time.

    I have also tried to split up the 320MHz path into a two-FF-stage pipeline using an additional register array before the ALU addition part of the original circuit above. This 320MHz clock division operation is really tight to meet because it only leaves 3.125ns from clk to clk in this entire clock divider block, and I found I had -1.7ns setup slack in the critical path to resolve.

    The pipelining has helped, but not enough. Slack improved to -0.454ns; it is still negative, but at least it is heading in the right direction.

    The problem with such an approach is that I don't really want to use a 320MHz clock sourced from a 640MHz VCO, as it makes it impossible to synthesize the 96MHz clock needed for USB from the same PLL in the ECP5 FPGA. I really want to use a 480MHz VCO frequency. The second PLL in the FPGA is also handy to keep dedicated to generating independent video frequencies (which won't necessarily be multiples of 96MHz either).

    Still hoping to resolve this and find some way to get closer phases of the 160MHz and 80MHz clocks for P1V core use. Maybe I need to put in some special constraints for achieving that, but I don't know how to yet. One other possibility I was thinking is that the fake PLL could be driven by the P1V's 80MHz clock, which would of course increase jitter of any synthesized clocks. Perhaps that could be a tradeoff or initial starting point for getting some P1V on ECP5 implementation started if I can't find a better timing solution. I'm less concerned about some minor setup time failures, but the hold timing constraints probably have to be met to have some chance of a stable system.
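
    For a feel of how far off the 320MHz divider path is, the slack figures above can be turned into an approximate achievable clock rate. This is a rough sketch that assumes setup slack is simply the clock period minus the time the critical path needs, so the zero-slack period is the target period minus the slack. Real timing reports also fold in clock skew and uncertainty, and the function names here are only illustrative.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def max_freq_mhz(target_mhz, setup_slack_ns):
    """Approximate fastest clock at which the path would reach zero slack.

    Simplifying assumption: slack = period - required path time.
    """
    required_path_ns = period_ns(target_mhz) - setup_slack_ns
    return 1000.0 / required_path_ns


TARGET_MHZ = 320  # clk to clk budget of 3.125 ns

fmax_before = max_freq_mhz(TARGET_MHZ, -1.7)    # before pipelining
fmax_after = max_freq_mhz(TARGET_MHZ, -0.454)   # after the two-stage pipeline

print(f"period at {TARGET_MHZ} MHz: {period_ns(TARGET_MHZ):.3f} ns")
print(f"approx. Fmax before pipelining: {fmax_before:.1f} MHz")
print(f"approx. Fmax after pipelining:  {fmax_after:.1f} MHz")
```

    On that simplified reading the divider went from roughly 207MHz to roughly 279MHz on the -6 part, so it still needs about another 0.45ns shaved off the critical path to reach 320MHz.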
  • ozpropdev wrote: »
    Recently I was wrestling with Altera's reconfigurable PLLs and after a stressful time eventually got it to work.
    The target was a Max10 and in the end I dropped a friendly P1V in there to handle the PLL config.
    Even used SPIN code to do the work. :)

    How much code did that need, and what range of config did you go for?
    Does anyone know if Lattice has similar run-time side-door access to PLL params?

  • jmg wrote: »
    Does anyone know if Lattice have similar run-time side-door-access to PLL params ?

    ECP5 series allow for phase adjustment and master enable/disable of any clock output at run-time, but that's it AFAIK.

    I know their MachXO2 series do have more comprehensive run-time access via wishbone (when enabled - I never used it however).

    - Valentin

  • KeithE wrote: »
    Heater. wrote: »
    KeithE,
    Where are you reading news of project IceStorm having stalled?

    I was only referring to "Icestorm support of the iCE40 Ultra parts (by Adafruit employee)" which was being worked on by tannewt from Adafruit. You made me search, and I couldn't find the reference where I saw this being referred to as stalled, but this is in the main ultra thread:

    https://github.com/cliffordwolf/icestorm/issues/68

    "tannewt commented on Aug 1
    Thanks for the heads up @cliffordwolf. I'm not exactly sure when I'll pick it up again. I'll be a lot more interested when chips are more readily available. Thanks!"

    Then people point out that the parts are available, but no response.

    Looks like work has picked back up:

    https://github.com/cliffordwolf/icestorm/issues/68#issuecomment-338422654