P1V - with Lattice ECP5 FPGAs?

Comments

  • This is the block of concern... the green is the input clock at 160MHz and the two output clocks are at the right. clk_pll goes through an extra mux and it is delayed relative to clk_cog (by about 0.9ns according to the timing analysis tools). That can't be helping.
    [Attached image: Screenshot.png]
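
As a rough illustration of why roughly 0.9ns of skew between clk_cog and clk_pll can hurt, here is a toy hold-time check for a flop-to-flop path whose capturing clock arrives later than its launching clock. Only the 0.9ns skew comes from the post above; the clock-to-Q, minimum data delay, and hold figures are assumptions chosen for illustration, not ECP5 datasheet values.

```python
def hold_slack_ns(clk_to_q, data_min, hold, skew):
    """Hold slack for a path whose capture clock lags the launch clock by `skew`.

    New data arrives clk_to_q + data_min after the launch edge; it must not
    arrive before the (late) capture edge plus the hold requirement.
    """
    return (clk_to_q + data_min) - (skew + hold)


SKEW_NS = 0.9        # clk_pll delay relative to clk_cog, as reported in the thread
CLK_TO_Q_NS = 0.4    # assumed, illustrative only
DATA_MIN_NS = 0.3    # assumed shortest data path, illustrative only
HOLD_NS = 0.1        # assumed hold requirement, illustrative only

for skew in (0.0, SKEW_NS):
    slack = hold_slack_ns(CLK_TO_Q_NS, DATA_MIN_NS, HOLD_NS, skew)
    verdict = "met" if slack >= 0 else "VIOLATED"
    print(f"skew {skew:.1f} ns -> hold slack {slack:+.2f} ns ({verdict})")
```

With these assumed numbers the path is fine with no skew and fails hold once the 0.9ns skew is added, which is the general shape of the problem, whatever the real delays turn out to be.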
  • With Altera, we put this stuff to one side and just input a straight 160MHz clock (no PLL), which happens to be the highest frequency in the CTS CB3 range that Parallax and several FPGA board vendors seem to install. That's worked pretty well so far.

    The other trick to try is an external jumper (like Chip has done with P2 Hot video), where the tool is unaware of the delay around the loop.

    There are external programmable oscillators common in 5x7 footprints too, if you want the programmability.

    Just a thought
  • Tubular wrote: »
    With Altera, we put this stuff to one side and just input a straight 160MHz clock (no PLL), which happens to be the highest frequency in the CTS CB3 range that Parallax and several FPGA board vendors seem to install. That's worked pretty well so far.

    That's a good idea for earliest testing, if you know the MHz you can reach.
    Adafruit have a Si5351A breakout board, sub $10, that can generate anything up to 200MHz.

    Once the Verilog looks ok, the more vendor-specific settings around the FPGA can be worked on.
    My preference would be to ultimately use the FPGA VCO output, and divide from there, with enough divider length to reach the lower boot clock speed.


  • I am trying to do a build right now that uses the Lattice "CLKDIVF" HW element. I am now deriving the P1V clk_cog by dividing clk_pll by 2 within this element. Hopefully the edges of both clocks will stay closer then and skew will be reduced. That's the theory anyway. Will be interesting to see if this fixes/helps/hinders the issue. Will know in a few more minutes...
  • rogloh wrote: »
    I am trying to do a build right now that uses the Lattice "CLKDIVF" HW element. I am now deriving the P1V clk_cog by dividing clk_pll by 2 within this element....

    That seems a good idea to reduce skews. An extension of that would be two dividers, one for CLK_PLL and one for CLK_COG, edge timing would be CLK-Q based, but the relative phase of COG clk to PLL clk would vary at higher divide ratios. Hopefully, that does not matter, if the fastest /2 case is ok.

  • rogloh (edited October 2017):
    Yeah @jmg, I might have to try two dividers. Just found I am getting even more skew now with my single CLKDIVF attempt, although I think that might be because these special HW divider blocks are on the outer edge of the FPGA: you need to go out to them from the center to do the divide, then come back into the center of the chip for clock distribution. That adds latency. I think that is the case because I see that clk_cog is now very much delayed from clk_pll, mostly due to routing delay. But doing this did reduce the total number of hold time errors a lot, at the expense of 4096 new setup time violation errors! LOL.

    However, using two CLKDIVFs probably means using a faster originating 320MHz source instead of 160MHz. Ideally these divided clocks could come directly from a PLL VCO, which probably already has better skew performance to begin with. The problem is the whole P1V dynamic core clock selection thing with the CLK register. The Lattice ECP5 FPGA's PLL dividers appear to be static only, and that's not a good fit there. Some type of register-controlled division logic is required if you want anything other than the stock 80MHz. Will keep on playing...
  • rogloh wrote: »
    ...
    However, using two CLKDIVFs probably means using a faster originating 320MHz source instead of 160MHz. Ideally these divided clocks could come directly from a PLL VCO, which probably already has better skew performance to begin with. The problem is the whole P1V dynamic core clock selection thing with the CLK register. The Lattice ECP5 FPGA's PLL dividers appear to be static only, and that's not a good fit there. Some type of register-controlled division logic is required if you want anything other than the stock 80MHz. Will keep on playing...

    IIRC, the FPGA VCO range is 400~800 MHz, so you could start from 640MHz (or 800, 720, 560, 480MHz).
    It's a pity most FPGAs seem to offer only compile-time PLL configuration with little user access, as the hardware is just sitting there with registers....
  • Tubular (edited October 2017):
    The PLLs can be reconfigured at run time, at least with Altera; it's just not easy (yet).

    I was initially confused about whether it was paid IP, but looks like the basics (at least) are free
  • rogloh (edited October 2017):
    @jmg, on the ECP5 I really like a 480MHz VCO because you can then get a true 96MHz for USB clocks as well as your 160MHz and 80MHz for the P1V core. Yes, a 640MHz VCO is another possible option for non-USB systems, though I don't know how well 320MHz clocks are distributed in this part for toggling logic FFs. That's starting to get fast, and I would have to try it to see if it meets timing. However, such a 320MHz clock is pretty localized and at least doesn't have especially high fanout needs.

    As a quick hack just to try to meet timing constraints I just did a compile where the clk_pll clock always comes from the divide[11] and clk_cog comes from divide[12] (ie. no additional MUX in the way to slow things down). Thankfully all setup/hold timing was then met for P1V at 160MHz, and now I've seen that I would believe it should be possible to get P1V on Lattice ECP5 fully working one way or another in the end, we just need to minimize skew between these two clocks.

    Now I am thinking perhaps if that extra mux path was used on BOTH output clocks to support the special case of the PLL16X setting, where you need to bypass the divider, that would delay both clocks about the same. What I mean here is the muxes select both the 80MHz and 160MHz clocks directly from the corresponding PLL output taps when PLL16X speed is selected by the Propeller's CLK register, or they select them via the counter divider otherwise. In that way the edges should, with any luck, still remain close in all cases. It's another idea to try anyway.

    I read there are a couple of Dynamic Clock Select HW blocks in the ECP5 FPGA too, and they might be of use there instead of simple logic based muxes. They should better handle any glitches during dynamic clock selection which might otherwise cause problems.
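
The VCO trade-off at the start of the post above (480MHz versus 640MHz) can be checked with a little arithmetic. This is a minimal sketch assuming plain integer output dividers and the 400~800 MHz VCO range quoted from memory earlier in the thread; the function names are made up for illustration, and real ECP5 PLL constraints may be stricter.

```python
VCO_MIN_MHZ, VCO_MAX_MHZ = 400, 800  # range as recalled in the thread, not verified


def integer_divider(vco_mhz, target_mhz):
    """Return the integer divider that yields target_mhz from vco_mhz, or None."""
    quotient, remainder = divmod(vco_mhz, target_mhz)
    return quotient if remainder == 0 else None


def plan(vco_mhz, targets_mhz):
    """Map each target frequency to its divider (None if not an integer ratio)."""
    return {target: integer_divider(vco_mhz, target) for target in targets_mhz}


TARGETS_MHZ = (160, 80, 96)  # clk_pll, clk_cog, and the USB clock from the post

for vco in (480, 640):
    assert VCO_MIN_MHZ <= vco <= VCO_MAX_MHZ
    print(f"VCO {vco} MHz -> dividers {plan(vco, TARGETS_MHZ)}")
```

A 480MHz VCO divides cleanly to all three targets, while 640MHz has no integer divider for 96MHz, which matches the reasoning given for preferring 480MHz on USB-capable designs.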
  • rogloh (edited October 2017):
    Just tried an ECP5 build with this dynamic clocking structure and it met P1V setup/hold timing at 160MHz. The green clk net is the 160MHz clock from the PLL, and the clk80 net is also from the PLL but at 80MHz. The two clock outputs are now driven similarly.

    I also found that it might have been the case earlier that only one of my P1V clocks was explicitly set up to use the primary clock network, so that might have explained my hold timing violations. When I get a chance I'll revert to the original timing design and retry with both clk_pll and clk_cog set to use the primary clock network and see if that works properly again. It would obviously be nicer not to have to make logic changes to the P1V design for Lattice implementations.

    UPDATE: retrying the original design didn't fix hold time, so Lattice tools possibly automatically selected to route the clock over primary clock network anyway. At least I still have a way to meet timing.

    [Attached image: Screenshot.png]
  • Nice work Roger.
  • cgracey (edited October 2017):
    Yes! The outputs of flops registered by the same super clock gives you coincident signals that can be used as clocks, themselves. In the case of Quartus, and likely other tools, these clocks will automatically get assigned to low-skew clock routing resources in the FPGA fabric. Then, you can assign them frequencies by their symbolic names and the fitter will attempt to meet your stated timing requirements.

    My experience with live-reprogrammable PLL resources is that they are about 100x more complicated to use than they ought to be, requiring long streams of bits to be shifted into them with not just divider and multiplier constants, but loop filter settings, too, which are completely unrealistic to whip up on the fly. They should have hidden those parts and computed them within the PLL circuitry from the divider/multiplier values. In my mind, the PLL should only have async parallel inputs for the divider and multiplier settings. Then, it becomes practical to use.

    The parallel/synchronous flop-to-clock scheme at the top of this post is the only practical approach that I've found for generating low-skew synchronized clocks.
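
The flop-to-clock idea in the post above can be modelled in a few lines. This is only a behavioural sketch, assuming a 2-bit counter clocked by one fast source (for example 320MHz) with bit 0 used as clk_pll and bit 1 as clk_cog; the names are hypothetical and it ignores real clock-to-Q and routing delays.

```python
def divided_clocks(n_source_edges):
    """Model a 2-bit counter clocked by a single fast source clock.

    Returns one (clk_pll, clk_cog) level pair per source rising edge, where
    clk_pll is counter bit 0 (source/2) and clk_cog is counter bit 1 (source/4).
    """
    counter = 0
    samples = []
    for _ in range(n_source_edges):
        counter = (counter + 1) & 0b11
        samples.append((counter & 1, (counter >> 1) & 1))
    return samples


def edge_indices(levels):
    """Indices of source edges at which the given signal changes level."""
    return {i for i in range(1, len(levels)) if levels[i] != levels[i - 1]}


samples = divided_clocks(16)
clk_pll = [pll for pll, _ in samples]
clk_cog = [cog for _, cog in samples]

# Every clk_cog transition happens on a source edge where clk_pll also changes.
print(edge_indices(clk_cog) <= edge_indices(clk_pll))
```

Because both outputs are registered by the same source edge, their transitions line up in this model; in hardware the remaining difference would be the per-flop clock-to-Q and routing delay, which is the point made about CLK-Q pathways.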
  • rogloh wrote: »
    Now I am thinking perhaps if that extra mux path was used on BOTH output clocks to support the special case of the PLL16X setting, where you need to bypass the divider, that would delay both clocks about the same. What I mean here is the muxes select both the 80MHz and 160MHz clocks directly from the corresponding PLL output taps when PLL16X speed is selected by the Propeller's CLK register, or they select them via the counter divider otherwise. In that way the edges should, with any luck, still remain close in all cases. It's another idea to try anyway.
    That's a quick-to-try patch, but it still has more variation than CLK-Q pathways on both, so CLK-Q is the ideal long-term solution.
    The CLK-Q just means you need to start higher, which the VCO runs at anyway, and have a smallest setting of /2.


  • cgracey wrote: »
    My experience with live-reprogrammable PLL resources is that they are about 100x more complicated to use than they ought to be, requiring long streams of bits to be shifted into them with not just divider and multiplier constants, but loop filter settings, too, which are completely unrealistic to whip up on the fly. They should have hidden those parts and computed them within the PLL circuitry from the divider/multiplier values. In my mind, the PLL should only have async parallel inputs for the divider and multiplier settings. Then, it becomes practical to use.

    Recently I was wrestling with Altera's reconfigurable PLLs and after a stressful time eventually got it to work.
    The target was a Max10 and in the end I dropped a friendly P1V in there to handle the PLL config.
    Even used SPIN code to do the work. :)
    Melbourne, Australia
  • Great work Roger. :)
  • Thanks. Be even better to see something working on real ECP5 HW soon, otherwise this effort may purely be an academic exercise in new FPGA tool learning. :smile:

    I really want to see it going on that Lattice FPGA. Still need to do the IO assignment mapping too and then check that the ROM loader actually boots inside the P1V and if it can finally be accessed serially. Only then will I be satisfied.
  • So I discovered that the -8 speed grade ECP5 FPGA I was targeting in my Diamond P1V build setup is actually the fastest speed grade, not the slowest, which is -6. When I changed the target to use the -6 speed grade it failed timing again. Doh! Not sure which grade Valentin's final FleaFPGA Ohm board is going to use for production. His Indiegogo photos of prototypes show the slower grade. It would be nice if we can make it still meet timing even on the slower part. More work to do.
  • That speed grade difference caught me out too
  • rogloh wrote: »
    So I discovered that the -8 speed grade ECP5 FPGA I was targeting in my Diamond P1V build setup is actually the fastest speed grade, not the slowest, which is -6. When I changed the target to use the -6 speed grade it failed timing again. Doh! Not sure which grade Valentin's final FleaFPGA Ohm board is going to use for production. His Indiegogo photos of prototypes show the slower grade. It would be nice if we can make it still meet timing even on the slower part. More work to do.

    Hi Roger,

    Nice work you've done there!

    Wish I had more time to have a direct crack at this myself as I am heavily bogged down with production related matters (among other things) right now..

    You are right. I plan on using the -6 (lowest) speed grade purely for reasons of cost. I do however attempt to counteract that by tweaking the ECP5 Vcore by a few tens of millivolts. One is able to override the default Vcore and temp parameters used by the Place & Route tool to obtain a different timing outcome.

    Regards,
    Valentin
  • @Basman74, thx.

    So I tried another thing to get the PLL to run at 320MHz to see if the system can handle it, but it fails setup timing on the -6 part (haven't tried -8). I also raised the voltage to 1.12V and lowered the temperature to 50C in the timing analysis; that helps a bit, but it still fails. Having the clk_pll and clk_cog nets coming from the divider register output did fix the hold timing at least, so the skew there otherwise is the real culprit for failing hold time.

    I also have tried to split up the 320MHz path into a two FF stage pipeline using an additional register array before the ALU addition part of the original circuit above. This 320MHz clock division operation is really tight to meet because it only leaves 3.125ns from clk to clk in this entire clock divider block and I found I had -1.7ns setup slack in the critical path to resolve.

    The pipelining has helped, but not enough. Slack improved to -0.454ns, so it is still negative, but at least it is heading in the right direction.

    The problem with such an approach is I don't really want to use a 320MHz clock sourced from a 640MHz VCO, as it makes it impossible to synthesize the 96MHz clock needed for USB from the same PLL in the ECP5 FPGA. I really want to use a 480MHz VCO frequency. And the second PLL in the FPGA is handy to keep dedicated for generating independent video frequencies too (which won't necessarily be multiples of 96MHz either).

    Still hoping to resolve this and find some way to get closer phases of the 160MHz and 80MHz clocks for P1V core use. Maybe I need to put in some special constraints for achieving that, but I don't know how to yet. One other possibility I was thinking is that the fake PLL could be driven by the P1V's 80MHz clock, which would of course increase jitter of any synthesized clocks. Perhaps that could be a tradeoff or initial starting point for getting some P1V on ECP5 implementation started if I can't find a better timing solution. I'm less concerned about some minor setup time failures, but the hold timing constraints probably have to be met to have some chance of a stable system.
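
The 320MHz figures in the post above work out as follows. This is a back-of-envelope sketch that treats the reported setup slack as the only margin (critical path = clock period minus slack) and ignores clock uncertainty and any other terms the timing tool includes; the helper names are made up for illustration.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def fmax_mhz(target_period_ns, setup_slack_ns):
    """Estimated maximum clock rate implied by a setup slack at a target period."""
    critical_path_ns = target_period_ns - setup_slack_ns
    return 1000.0 / critical_path_ns


TARGET_MHZ = 320
target_period = period_ns(TARGET_MHZ)
print(f"{TARGET_MHZ} MHz period: {target_period:.3f} ns")

# Slack values as reported in the post (negative means the path is too slow).
for label, slack_ns in (("before pipelining", -1.7), ("after pipelining", -0.454)):
    print(f"{label}: slack {slack_ns:+.3f} ns -> about {fmax_mhz(target_period, slack_ns):.0f} MHz")
```

Under these simplifications the divider block would manage roughly 207MHz before pipelining and roughly 279MHz after, still short of the 320MHz needed.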
  • ozpropdev wrote: »
    Recently I was wrestling with Altera's reconfigurable PLLs and after a stressful time eventually got it to work.
    The target was a Max10 and in the end I dropped a friendly P1V in there to handle the PLL config.
    Even used SPIN code to do the work. :)

    How much code did that need & what range of config did you go for ?
    Does anyone know if Lattice have similar run-time side-door-access to PLL params ?

  • jmg wrote: »
    Does anyone know if Lattice have similar run-time side-door-access to PLL params ?

    ECP5 series allow for phase adjustment and master enable/disable of any clock output at run-time, but that's it AFAIK.

    I know their MachXO2 series do have more comprehensive run-time access via wishbone (when enabled - I never used it however).

    - Valentin

  • KeithE wrote: »
    Heater. wrote: »
    KeithE,
    Where are you reading news of project IceStorm having stalled?

    I was only referring to "Icestorm support of the iCE40 Ultra parts (by Adafruit employee)" which was being worked on by tannewt from Adafruit. You made me search, and I couldn't find the reference where I saw this being referred to as stalled, but this is in the main ultra thread:

    https://github.com/cliffordwolf/icestorm/issues/68

    "tannewt commented on Aug 1
    Thanks for the heads up @cliffordwolf. I'm not exactly sure when I'll pick it up again. I'll be a lot more interested when chips are more readily available. Thanks!"

    Then people point out that the parts are available, but no response.

    Looks like work has picked back up:

    https://github.com/cliffordwolf/icestorm/issues/68#issuecomment-338422654
  • rogloh (edited February 1):
    So some good news for Lattice fans. After thinking more about Ariba's idea to remove/simplify the P1V clocking block down and starting with an apparently working Pipistrello (Xilinx) version for P1V, with a little bit of work just now I've been able to get a P1V COG talking serially from a Lattice ECP5 FPGA (LFE5U-25F) responding to the serial probe in Brad's Spin Tool under Linux. :smile:

    I just hacked the top level Verilog (top.v) to bring in the on board 25MHz clock and a reset pin and used a PLL to get 12.5 and 25MHz clocks for clk_cog and clk_pll respectively so that the initialization booter PASM will see a clock in the RC timebase range it expects during boot. Also mapped some IO pins for P30 and P31 to get a serial port to communicate with it over FTDI.

    Even though this is just running at 12.5MHz and I've not run any real code on it yet I think this result so far is a good omen for the FleaFPGA Ohm boards being sent out for those hoping to eventually support a full P1V. I have an earlier prototype board from Valentin if you are wondering how I've got to this point and I'm also hoping to get the final board soon myself.

    Still hacking...need to play more with the clocking stuff to see if I can resurrect a proper solution, and do some more functional testing at higher speeds. But I'm happy to have even got to this point and with any luck can now build upon it.

    Roger.


    Update1 : Just got it running at 75MHz! :cool:
  • Excellent work Roger!
    Hopefully I can clear some time soon to tinker with Valentins new Lattice board. :cool:
  • Ariba (edited February 6):
    I received my Flea Ohm board yesterday :smile:

    Just right to play with it over the weekend. It took a while until all the necessary tools were installed, but then it worked well.
    (Tip: Clone the FleaOhm GitHub repository or download it as a ZIP. Do not download single files; they don't work this way!)

    I had some problems synthesizing a P1V, but in the end I got it working at 80 MHz. You must use the Lattice LSE synthesis engine. Synplify has problems with the Pipistrello P1V source and gives a lot of strange warnings.

    Like Roger, I also have only replaced the top.v sourcefile with my own, and created a pin mapping file (see attached picture).
    I've set up a PLL to output 3 frequencies: 12.5, 80 and 160 MHz and I just switch between a clock frequency of 12.5 and 80 MHz depending on the configuration. The fake-PLL for the counters always works with 160 MHz.
    Further, I connected the RTS pin of the USB-UART on the FleaOhm board to the reset pin. I used a 100 Ohm resistor, but a diode may be better. A little Verilog code in the top file emulates the PropPlug reset circuit. With that, the board can be detected by the PropTool and SimpleIDE if you change the reset mode from DTR to RTS. I could also download some example Spin files, and a C file worked too. All that is missing for a full P1V is an EEPROM connected to P28/29, but there is about 384 kByte of unused space in the onboard Flash.

    Currently it's a quite basic P1V, the only advantage over a real Propeller 1 is the bigger RAM of 60 kByte. But 1/3 of the resources are still free for extensions like Multipliers, HDMI video, SDRAM, USB host and more.

    Andy

    EDIT: You don't need the diode in the attached picture. There is already such a connection to a FPGA pin on the board. I was just not aware of it. So absolutely no hardware mods needed for a P1V on the Ohm board!
    [Attached image: pin mapping diagram]
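
The post above mentions a little Verilog that emulates the PropPlug reset circuit but does not show it. One plausible behaviour, sketched here as a Python model rather than the actual Verilog, is an edge detector that stretches an RTS transition into a fixed-length reset pulse. The rising-edge polarity, the pulse length, and the function name are all assumptions for illustration.

```python
def reset_pulse(rts_samples, pulse_cycles):
    """Return one reset flag per clock cycle (True means reset asserted).

    A rising edge on the sampled RTS line (assumed polarity) starts a reset
    pulse lasting `pulse_cycles` clock cycles; a steady level does nothing.
    """
    reset = []
    remaining = 0
    previous = rts_samples[0] if rts_samples else 0
    for level in rts_samples:
        if previous == 0 and level == 1:
            remaining = pulse_cycles  # edge seen: (re)start the pulse
        reset.append(remaining > 0)
        if remaining > 0:
            remaining -= 1
        previous = level
    return reset


rts = [0, 0, 1, 1, 1, 1, 1, 0, 0]
print(reset_pulse(rts, pulse_cycles=3))
```

The point of pulsing on an edge rather than following the level is that the core is released from reset even if the host leaves RTS asserted, which is what lets a loader reset the chip and then talk to its boot loader.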
  • That's really great, Andy. Nice clear diagram too BTW. And yes, 1/3 free is a nice comfortable result.

    Unfortunately the fleaohm modules destined for AU are going via Germany, might be up to another week before we see them here. Normally HK post is relatively quick, but not when it goes via DE

    Thanks to Rogloh for hassling Valentin about the way to get the DTR signal in, that makes it all the more useful to us P1V fans.

  • Tubular wrote: »
    That's really great, Andy. Nice clear diagram too BTW. And yes, 1/3 free is a nice comfortable result.

    Unfortunately the fleaohm modules destined for AU are going via Germany, might be up to another week before we see them here. Normally HK post is relatively quick, but not when it goes via DE

    Thanks to Rogloh for hassling Valentin about the way to get the DTR signal in, that makes it all the more useful to us P1V fans.
    You have plenty of time to draw nice diagrams while waiting until a compilation is done ;) It takes about 14 minutes on my PC for a full P1V with 8 cogs.

    Are you saying there is also a DTR signal at the UART chip? Maybe the CBUS pin can be configured for that. I only found RTS, which is a bit cumbersome if you change boards often.

    Andy
  • Actually Andy I think you're right and it was RTS we need to use. I had to dig through some old emails, here's what Rogloh was asking for...

    "I did mention the P1V serial reset thing. He says his FT230x only has RTS and not DTR. I know that some Propeller tools do support RTS so I asked if he could at least bring it into any spare FPGA pin,..."

    That was back in October. I'm not sure whether that was taken on board or not. What you have with the RTS link is fine...
  • Ariba (edited February 5):
    Unfortunately RTS does not go to a spare FPGA IO pin, only to a JTAG pin which cannot be sensed by the logic. Hence the link from RTS/TDI to RESET. I see now that there is an 820 Ohm resistor in between anyway, so no diode is needed, just a wire.

    But still much simpler than the additional PropPlug you need for most Altera boards.