P1V - with Lattice ECP5 FPGAs?

Comments

  • There will be another ECP5 board soon, made by TinyFPGA. It has USB, HyperRAM and SDcard in a breadboard friendly form factor. Price is not known yet.

    The interesting part is the USB interface. There is no FTDI or similar chip, just a USB bootloader written in Verilog that lives in the first configuration image in Flash. This bootloader programs the user configuration image into another part of the Flash and then switches to it (the ECP5 has a multiboot feature). I think you cannot program the SRAM cells directly, so there is no fast write for testing.

    Here is a picture:
    [Image: DVpa5d2UQAAjxHX.jpg]

    Andy
  • Cool. Interesting they have chosen hyperRAM over SDRAM. Perhaps it was just easier to route on the PCB.

    With the small number of IO pins required I wonder how many PCB layers they needed for connection to that Lattice BGA? Almost looks like it's designed for two layers, but surely not with all the various voltage/gnd pins needed. It would be interesting to see the back of the board too if you had a picture of that as well.
  • Interesting... Perhaps nice to be free of FTDI chip, but why use HyperRam when you have a gazillion I/O pins?
    Wonder if it has same footprint as Parallax Flip module..
    Prop Info and Apps: http://www.rayslogic.com/
  • rogloh wrote: »
    Cool. Interesting they have chosen hyperRAM over SDRAM. Perhaps it was just easier to route on the PCB.

    With the small number of IO pins required I wonder how many PCB layers they needed for connection to that Lattice BGA? Almost looks like it's designed for two layers, but surely not with all the various voltage/gnd pins needed. It would be interesting to see the back of the board too if you had a picture of that as well.

    You can find a lot of info and pictures on TinyFPGA's Twitter page: https://twitter.com/TinyFPGA
    The whole development process is really open and Luke from TinyFPGA is happy to answer questions ;)
    I have made two 2-layer designs for the ECP5; if you only use the I/O balls of the outer 2 rows, it's doable for the 0.8mm pitch parts.


    Rayman
    It has 48 pins, so no Flip compatibility. I think HyperRAM is small and cheap, and you only have to route 12 wires.
  • Rayman Posts: 8,417
    edited February 11
    Mouser has two ECP5 boards: Versa and Machine Vision. Both are $199.

    Andy: Think either of those would be good for P1V?

    BTW: How does TinyFPGA handle the USB drivers on the PC side? I guess that's the tricky part, if you don't use FTDI...
  • Rayman wrote: »
    BTW: How does TinyFPGA handle the USB drivers on the PC side? I guess that's the tricky part, if you don't use FTDI...

    That's the hilarious part - embedding a huge USB stack just to get a virtual comport.

    The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
  • Ariba Posts: 2,158
    edited February 11
    Rayman wrote: »
    Mouser has two ECP5 boards: Versa and Machine Vision. Both are $199.

    Andy: Think either of those would be good for P1V?

    BTW: How does TinyFPGA handle the USB drivers on the PC side? I guess that's the tricky part, if you don't use FTDI...
    The Lattice ECP5 boards are mostly made to demonstrate the high-end features and use the UM or UG types of the FPGAs. These devices are not supported by the free version of the Diamond IDE, so you also have to buy the full version of Diamond, and that's expensive.
    I don't understand why Lattice doesn't make cheap breakout boards for the low-cost ECP5 like they do for MachXO2/3 and other lines.

    The USB bootloader is open source. I have tried to implement it on the UPDuino board, but had no luck with that. There is no CPU; the whole USB protocol and Flash programming are handled by state machines. The PC sees it as a CDC device with a virtual COM port, so no special driver is required. The LUT count of the bootloader is smaller than that of a single P1V cog.
    Don't know why people think USB devices need huge stacks; it was done on a Prop1 and can be done on small AVRs in software. It's all a matter of knowing how ...

    Andy

  • rogloh Posts: 597
    edited February 12
    I think an ideal cheap / small board for people starting out could be something like that MAX1000 board but with a Lattice ECP5 FPGA (25k LUT) fitted instead. Not available of course, so the closest thing is the Indiegogo funded FleaFPGA Ohm board, which is great for HDMI experiments with P1Vs. However, whether it will become available again in volume and over the long term is TBD. Maybe we just need to build an ECP5-1000 board ourselves.

    While I have no objection in principle to getting rid of FTDI parts at some point if possible, I think Valentin's approach of using a relatively inexpensive FT230 part instead of the FT2232 saves quite a bit of money and board complexity while still retaining a well supported COM port. However, a problem I've recently encountered with all this FTDI stuff (in Linux anyway) is that they have two sets of drivers (VCP and D2XX) and they are incompatible; it's one or the other. You apparently need to use D2XX for JTAG programming and must unload their COM port driver to allow it to work at all. This can play havoc with both Lattice and Quartus tools and FPGA programming on Linux if you don't get it set up right. I even found that Lattice Diamond will segfault if another FTDI-to-serial adapter is plugged in and its driver is not unloaded before I start it. PITA.

    Roger

    Ps. LOL the actual FleaFPGA Ohm board just arrived via postman as I was typing in this post!
  • rogloh wrote: »
    Ps. LOL the actual FleaFPGA Ohm board just arrived via postman as I was typing in this post!

    That's really good news, I felt so alone :smile:
    rogloh wrote: »
    I think an ideal cheap / small board for people starting out could be something like that MAX1000 board but with a Lattice ECP5 FPGA (25k LUT) fitted instead. ...
    Maybe we just need to build an ECP5-1000 board ourselves.
    ...

    Funny you say that. That's exactly one of the 2 designs I already made. I need a bit more experience with P1V on the FleaOhm, to make the final version. One problem with that Arduino-MKR format is the low pin count (about 20 I/Os on the sides and another 8 for a PMOD).
  • rogloh Posts: 597
    edited February 12
    Ariba wrote: »
    Funny you say that. That's exactly one of the 2 designs I already made. I need a bit more experience with P1V on the FleaOhm, to make the final version. One problem with that Arduino-MKR format is the low pin count (about 20 I/Os on the sides and another 8 for a PMOD).
    Sounds great Andy, the more board choices the better.

    Agree the total pin count on the MKR form factor is somewhat smallish, though having that double row PMOD is rather nice. I still think something like it would be a great choice on a breadboard for prototyping small HW designs, and it would also be good to have a microSD connector present on the board. One other issue there is that, unlike MAX10 FPGAs, the Lattice parts don't do ADCs internally, though you can probably try to do it like Valentin does. I am yet to see how well that method works, but it's undoubtedly a lot better than nothing.

    For best P1V support I think it might make sense to use a 50MHz xtal (or even 30MHz). The more prime factors the starting crystal frequency has the easier it is to derive lots of different clocks independently instead of PLL cascading. I am having some trouble with the 25MHz limitations and cascading PLLs on the Flea board, but I'm still testing. I can easily get the P1V going at 12.5MHz or 75MHz or 80MHz via a single PLL path but when I try to go through two PLLs to get a final VCO set at 480MHz and then derive the 80/160/96MHz outputs to also support USB rates, it doesn't boot at all. Not sure why but I need to measure clock outputs to see if it is jitter related. What's weird is that this even happens when I stick with the 12.5MHz derived from the first PLL as the slow clock during boot. Doesn't make a lot of sense.

  • +1 for your MKR board Andy. The sooner the better.

    It's Christmas here today, FleaFPGA, Max1000's, Laserpings and more.

    Anyone else notice the MAX1000's have blind vias on both sides of the board?
  • Just scoped some PLL outputs on the Flea board. The first PLL seems to introduce a bit of jitter, maybe ~1ns or so peak to peak on the 12.5MHz and 75MHz output taps, though my home scope's bandwidth is not good enough for a precise measurement in this range. Judging by its lock output signal, which I brought out for probing, the second cascaded PLL won't achieve lock with this. Not sure why the first PLL output is that bad; perhaps the XTAL oscillator on the Flea Ohm is a bit jittery to begin with, or this is normal for PLLs in FPGAs? I'll have to try to probe the oscillator if I can get my scope onto its output.
  • Ariba Posts: 2,158
    edited February 12
    rogloh wrote: »
    ...
    Agree the total pin count on the MKR form factor is somewhat smallish, though having that double row PMOD is rather nice. I still think something like it would be a great choice on a breadboard for prototyping small HW designs, and it would also be good to have a microSD connector present on the board. One other issue there is that, unlike MAX10 FPGAs, the Lattice parts don't do ADCs internally, though you can probably try to do it like Valentin does. I am yet to see how well that method works, but it's undoubtedly a lot better than nothing.
    ...

    The problem is that I cannot fit HyperRAM and/or an SD card beside the big FPGA (17x17mm). I made the PMOD 2x8 pin so I have 12 IOs, which allows a HyperRAM module to be added there. I think Tubular already has such a module. Alternatively a module with SPI RAM + SD card + LCD is possible.
    For the ADCs I brought A0..A2 to comparator inputs, and wired the other comparator input to a common ramp generator at ARef. This should give some analog measuring; it's one of the things I need to try out. Valentin's ADCs work according to the sigma-delta principle, which may be better.
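
The ramp-compare idea described here can be modelled in a few lines. This is only a behavioural sketch in Python, not Andy's design: the 3.3 V ramp range, the 256 steps and the function names are assumptions made for illustration. Each channel's result is simply the counter value at the moment the shared ramp reaches that channel's input voltage.

```python
def ramp_adc(v_in_mv, v_ref_mv=3300, steps=256):
    """Return the counter value at which a linear ramp first reaches v_in_mv.

    Models one comparator: the shared ramp rises from 0 to v_ref_mv over
    `steps` clock ticks, and the count is latched when the comparator flips.
    The 3300 mV range and 256 steps are assumed values.
    """
    for count in range(steps):
        # Ramp voltage at this tick is v_ref * count / (steps - 1);
        # compare in integer arithmetic to avoid rounding surprises.
        if v_ref_mv * count >= v_in_mv * (steps - 1):
            return count
    return steps - 1  # input above the ramp range: saturate


def convert_channels(voltages_mv, v_ref_mv=3300, steps=256):
    """Convert several inputs (e.g. A0..A2) against the same ramp."""
    return [ramp_adc(v, v_ref_mv, steps) for v in voltages_mv]
```

One ramp generator serves every channel, so the cost per extra input is one comparator and one latch; the conversion time is a full ramp period regardless of the input level.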

    I wonder if a Quickstart compatible board would not be better for P1V, with a second header instead of the touchbuttons.

    Andy

  • Ariba wrote: »
    The problem is that I can not fit HyperRAM and/or SDcard beside the big FPGA (17x17mm).

    I am kind of hopeful that Lattice may one day release the 256-pin caBGA package variant mentioned on their web page for the LFE5U-25 ECP5. That would be down to 14x14 mm instead and save a bit of room for you. They seem to have been updating some related files for that only a few months back, so perhaps it is soon. Or if you can route/solder 0.5mm BGAs (!) there's already the 285-pin package available, I think.
  • rogloh Posts: 597
    edited February 12
    Ariba wrote: »
    I wonder if a Quickstart compatible board would not be better for P1V, with a second header instead of the touchbuttons.

    For a good space challenge how about a Propeller flip module size board with P1V FPGA and integrated SDRAM, SD card with a single micro-USB host and mini-HDMI? Be very hard to fit all that in the space available I suspect. You'd probably need to populate both layers. In any case Valentin's board already has all that stuff in the Raspi zero form factor so I guess it makes sense to try to use a breadboard adapter or whatever Raspi breakout stuff exists instead. Still, a flip module P1V + extras would be rather nice, especially if you can just plug in a keyboard and screen and develop directly on the part itself (self-hosting possibility).
  • Good news for Lattice fans! I was just able to get the second ECP5 PLL to lock onto the output of the first PLL, run it at 480MHz, and for the P1V to be recognised by the Propeller tools serially! This result means we should be able to have a USB clock too as well as decent HDMI video resolutions. It must have been one or some combination of the following changes just made...

    1) Enabled high bandwidth PLL settings in both PLLs.
    2) Used 15MHz instead of 75MHz as source clock into second PLL, jitter as a percentage of the overall period seems lower now.
    3) Use of proper FleaFPGA Ohm board not the original early prototype I had been using which I know has some board issues like some pins and SDRAM not working.

    This is how I have the PLL settings configured and working...

    PLL1:
    input: 25MHz from external oscillator,VCO=750MHz, high bandwidth enabled, feedback path from CLKOP
    outputs:
    CLKOP - 25MHz (for VGA over HDMI, locked in phase with CLKOS)
    CLKOS - 125MHz (for VGA over HDMI, 5x25MHz, I expect it could potentially be set to 250MHz if DDR IO is not used)
    CLKOS2 - 15MHz (for feeding PLL2)
    CLKOS3 - 12.5MHz (for fast RC clock emulation)

    PLL2:
    input: 15MHz from PLL1, VCO=480MHz, high bandwidth enabled, feedback path from CLKOS3
    outputs:
    CLKOP - 160MHz (for P1V's clk_pll)
    CLKOS - 96MHz (for future USB host implementation)
    CLKOS2 - 80MHz (for P1V's clk_cog)
    CLKOS3 - 60MHz (for any other future use in P1V counter or other peripherals for example. I guess this could also potentially be 30MHz or 120MHz instead but would have to try those to see if it causes issues because this is the PLL's feedback path too. 120MHz is probably nicer to be able to divide down from in counters.)
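
A quick sanity check of the clock plan above: every output tap should be the VCO frequency divided by an integer. The sketch below uses only the frequencies listed in the post; the divider limit of 128 is an assumption about the ECP5 PLL that should be checked against the Lattice documentation, and the dictionary layout and function name are made up for illustration.

```python
from fractions import Fraction

# Clock plan as posted (all frequencies in MHz).
PLL1 = {"vco": 750, "outputs": {"CLKOP": 25, "CLKOS": 125,
                                "CLKOS2": 15, "CLKOS3": Fraction(25, 2)}}
PLL2 = {"vco": 480, "outputs": {"CLKOP": 160, "CLKOS": 96,
                                "CLKOS2": 80, "CLKOS3": 60}}

MAX_DIV = 128  # assumed upper limit of an output divider


def output_dividers(pll):
    """Return {tap: divider}; raise ValueError if a tap is not VCO / integer."""
    result = {}
    for tap, freq in pll["outputs"].items():
        ratio = Fraction(pll["vco"]) / Fraction(freq)
        if ratio.denominator != 1 or not 1 <= ratio <= MAX_DIV:
            raise ValueError(f"{tap}: {freq} MHz is not reachable from "
                             f"a {pll['vco']} MHz VCO")
        result[tap] = int(ratio)
    return result
```

With the posted numbers this gives dividers of 30, 6, 50 and 60 for PLL1 and 3, 5, 6 and 8 for PLL2, so all eight taps are exact integer divisions of their VCO.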

    This cascade approach should still allow SVGA and 720p resolutions when the PLL1 VCO is set to 600MHz and 750MHz respectively. XGA needs a PLL VCO frequency of 650MHz and the PLL2 input would need to be 10MHz instead of 15MHz. Hopefully that still syncs. 480/576p HDMI should also be doable with the PLL1 VCO set to 675MHz. Again it all needs to be tested.
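
The VCO choices in that paragraph can be checked with some integer arithmetic. The sketch below assumes the usual nominal pixel clocks (27, 40 and 65 MHz, with 75 MHz standing in for the 74.25 MHz that 720p normally uses) and a DDR bit clock of 5x the pixel clock, as in the 25/125MHz pair above. The table and function names are hypothetical; only the VCO values, the 25MHz reference and the 480MHz PLL2 target come from the posts.

```python
REF_MHZ = 25          # board oscillator, from the post
PLL2_VCO_MHZ = 480    # second PLL target, from the post

# name: (PLL1 VCO in MHz from the post, assumed nominal pixel clock in MHz)
MODES = {
    "480p/576p": (675, 27),
    "SVGA 800x600": (600, 40),
    "XGA 1024x768": (650, 65),
    "720p": (750, 75),  # 75 MHz approximates the standard 74.25 MHz
}


def plan(vco_mhz, pixel_mhz):
    """Return the integer dividers for one mode, or raise ValueError."""
    ddr_mhz = pixel_mhz * 5  # 10 TMDS bits per pixel, 2 bits per DDR clock
    for label, freq in (("reference", REF_MHZ),
                        ("pixel clock", pixel_mhz),
                        ("DDR bit clock", ddr_mhz)):
        if vco_mhz % freq:
            raise ValueError(f"{label} {freq} MHz does not divide "
                             f"a {vco_mhz} MHz VCO")
    # Prefer a 15 MHz feed into PLL2 and fall back to 10 MHz.
    feed = next((f for f in (15, 10)
                 if vco_mhz % f == 0 and PLL2_VCO_MHZ % f == 0), None)
    if feed is None:
        raise ValueError("no 15 or 10 MHz feed available for PLL2")
    return {"multiplier": vco_mhz // REF_MHZ,
            "pixel_div": vco_mhz // pixel_mhz,
            "ddr_div": vco_mhz // ddr_mhz,
            "pll2_feed_mhz": feed}
```

Running `plan` over `MODES` reproduces the point made in the post: 650MHz is not a multiple of 15MHz, so XGA is the one mode that has to feed PLL2 with 10MHz.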

    We would need a different P1V bit file for each target HDMI resolution because the PLLs are static. However I think that should still be workable.
  • That's awesome, Roger!

    Yes, we can cope with different bit config files for a few different resolutions; that's a small price to pay.

    Need to find some time in coming days to flash my fleafpga and try this out
  • rogloh wrote: »
    PLL1:
    input: 25MHz from external oscillator,VCO=750MHz, high bandwidth enabled, feedback path from CLKOP
    outputs:
    CLKOP - 25MHz (for VGA over HDMI, locked in phase with CLKOS)
    CLKOS - 125MHz (for VGA over HDMI, 5x25MHz, I expect it could potentially be set to 250MHz if DDR IO is not used)
    CLKOS2 - 15MHz (for feeding PLL2)
    CLKOS3 - 12.5MHz (for fast RC clock emulation)

    PLL2:
    input: 15MHz from PLL1, VCO=480MHz, high bandwidth enabled, feedback path from CLKOS3
    outputs:
    CLKOP - 160MHz (for P1V's clk_pll)
    CLKOS - 96MHz (for future USB host implementation)
    CLKOS2 - 80MHz (for P1V's clk_cog)
    CLKOS3 - 60MHz

    Impressive list of frequencies generated from a single crystal oscillator. Theoretically you can also generate the input into a PLL from a NCO, clocked by the pll_clk, so we get a runtime programmable video clock. Just like it's done on the P1. If the phase of the NCO can be reset at every new line the jitter is not much visible as the P2 shows.
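
For readers unfamiliar with NCOs, here is a small behavioural model of the idea: a phase accumulator adds a tuning word on every clock and its top bit is used as the output clock. The 32-bit width mirrors the P1 counters; the function names and the example frequencies are assumptions chosen for illustration, not part of any P1V source.

```python
def nco_tuning_word(f_out_hz, f_clk_hz, bits=32):
    """Phase increment that makes a `bits`-wide accumulator wrap at f_out_hz."""
    return round(f_out_hz * (1 << bits) / f_clk_hz)


def nco_edges(tuning_word, n_clocks, bits=32):
    """Return the clock ticks at which the accumulator's MSB rises."""
    mask = (1 << bits) - 1
    phase = 0
    prev_msb = 0
    edges = []
    for tick in range(n_clocks):
        phase = (phase + tuning_word) & mask
        msb = phase >> (bits - 1)
        if msb and not prev_msb:
            edges.append(tick)
        prev_msb = msb
    return edges
```

When the ratio is not an integer, for example 25.175MHz from a 160MHz clock, the output period alternates between 6 and 7 input clocks. That is the jitter being discussed: up to one input clock period per edge, which is why resetting the phase on every line, or cleaning the clock up with a PLL, helps.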

    I think now a ECP5 board should have a HDMI output, that's maybe the biggest advantage over P1 and also P2. If there is HDMI it also needs a big RAM, and to fill that with data an SD card would be good. A MKR type board has only little advantage over the P1 boards we already have.

  • rogloh Posts: 597
    edited February 12
    Ariba wrote: »
    Impressive list of frequencies generated from a single crystal oscillator. Theoretically you can also generate the input into a PLL from a NCO, clocked by the pll_clk, so we get a runtime programmable video clock. Just like it's done on the P1. If the phase of the NCO can be reset at every new line the jitter is not much visible as the P2 shows.

    If a second PLL is dedicated entirely to video yes that could be tried and it could then yield more dynamic resolution flexibility. A 50MHz crystal oscillator input would certainly have supported such an approach on the FleaFPGA Ohm board and still provided a 96MHz clock as well, but having only a 25MHz source I needed to get creative with PLL cascading to achieve a purer 96MHz clock needed for USB host code. I've tried feeding an NCO generated clock to the USB host software before and it didn't seem to work (even for low speed USB devices), though maybe I need to start out with a NCO clock that is already >192MHz to have any hope to synthesize it without severe jitter problems. It's probably worth another try someday, perhaps using a 200MHz clock output by the first PLL whose VCO is operating at 800MHz.
    Ariba wrote: »
    I think now a ECP5 board should have a HDMI output, that's maybe the biggest advantage over P1 and also P2. If there is HDMI it also needs a big RAM, and to fill that with data an SD card would be good. A MKR type board has only little advantage over the P1 boards we already have.

    I also agree Andy, having some cool P1V "extras" makes all the difference over just a simple FPGA copy of the P1, especially when they don't tie up any IO port pins or COGs inside the P1V. For example, an inbuilt HDMI video engine that reads text mode screens directly from dual ported hub mapped SRAM, or displays some hires graphics from a larger frame buffer in external SDRAM. Also, having that SDRAM mapped directly into a COG's hub address range lets you enable GCC and LMM code to achieve very large programs without incurring all the IO pin limitations you usually deal with once you try to add external memory to the P1. In some prior P1V projects and experiments I've already had various bits and pieces of this stuff working, so I already feel it's a very useful addition around a P1V core.
  • Would be really nice to have HDMI output... I guess the SERDES units can do that, right?
    I was thinking you'd want to use SDRAM with a FIFO output for a video buffer.
    You'd have the FIFO connected to a video driver.
    You'd need the SDRAM running as fast as possible.
    Then, you'd just monitor the FIFO half-full signal. As long as it's half full, you can edit the video buffer.
    When it gets half full, you stop edits and fill it back up again.
    Can that work?
  • Rayman wrote: »
    Would be really nice to have HDMI output... I guess the SERDES units can do that, right?
    I was thinking you'd want to use SDRAM with a FIFO output for a video buffer.
    You'd have the FIFO connected to a video driver.
    You'd need the SDRAM running as fast as possible.
    Then, you'd just monitor the FIFO half-full signal. As long as it's half full, you can edit the video buffer.
    When it gets half full, you stop edits and fill it back up again.
    Can that work?
    SERDES is not available on the cheap ECP5 version supported by the free toolchain. But there are differential output drivers for LVDS and I think they work also with DDR. This should be fast enough.

    I would do it with 2 line buffers in dual ported RAM (BRAM) which get constantly read out by an HDMI video output IP. A cog can decide when the 2 buffers get switched, and how the buffers are filled. The pixel data can be calculated, or read line by line from the SDRAM, or can come from a camera - it's up to the cog code. Something like WAITVID on P1, only that the pixel and color longs are expanded to full lines.
    The SDRAM interface is a separate peripheral that should be shareable by several cogs. I remember Roger has already done some work for hub access of SDRAM. Random access will need to write the address for every access; I don't know how fast this is possible. The RAM can then be used for more than just video.

    Andy

  • rogloh Posts: 597
    edited February 13
    Hi Rayman

    You could use some SERDES channels if they are available on the FPGA, but DDR I/O can also be used for doing HDMI or even single data rate I/O toggling for VGA resolutions is achievable on some FPGAs. The cheaper Lattice ECP5's do not have the SERDES so you would use DDR I/O to rapidly toggle output pins. From memory I think it is rated up to 800Mbps per pin, which is 80M pixels per second for HDMI. And 720p is ~750Mbps per channel.
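
The numbers in that paragraph follow from TMDS sending each 8-bit colour value as a 10-bit symbol on its own channel, so the per-pin bit rate is 10x the pixel clock. A small Python sketch of the arithmetic follows; the 800Mbps pin limit is the figure quoted "from memory" in the post and should be treated as an assumption, and the function names are made up.

```python
TMDS_BITS_PER_PIXEL = 10  # each 8-bit colour value becomes a 10-bit symbol


def tmds_mbps(pixel_clock_mhz):
    """Bit rate on one TMDS data channel for a given pixel clock."""
    return pixel_clock_mhz * TMDS_BITS_PER_PIXEL


def ddr_clock_mhz(pixel_clock_mhz):
    """Clock needed when a DDR output register sends two bits per cycle."""
    return tmds_mbps(pixel_clock_mhz) / 2


def max_pixel_clock_mhz(pin_limit_mbps):
    """Highest pixel clock a pin with the given bit-rate limit can carry."""
    return pin_limit_mbps / TMDS_BITS_PER_PIXEL
```

For example, `max_pixel_clock_mhz(800)` gives 80, `tmds_mbps(74.25)` gives 742.5 for standard 720p, and `ddr_clock_mhz(25)` gives the 125MHz used alongside the 25MHz VGA pixel clock in the PLL plan.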

    That FIFO approach you mention is one possibility though it will also periodically block your COG access which may not always be desirable.

    Another approach is if you just lock the memory clock to the P1V clock. By doing this you are able to resolve the usual R/W contention problem at memory controller design time and make a COG's SDRAM accesses transparent to it (SDRAM just looks like high hub RAM). For example with 16 bit wide SDRAM operating at 80MHz and mapped directly into the P1V address space you can get two memory accesses per hub cycle in any bank using a CL=2 setting in the SDRAM. This allows a SINGLE fixed COG to get full R/W SDRAM data access up to 32 bits wide in any hub cycle deterministically as usual while at the same time also achieve a 320Mbps video read rate from your video engine, with some background SDRAM refreshing done in the blanking intervals to not impact the COG or active video portion. That simple model could sustain 800x600x256 colour output from a frame buffer held in SDRAM. If you wanted you could also implement a 256 entry palette in SRAM so you could still drive out 24 bit colour over HDMI as well.

    While that basic method can work, it is still bandwidth restricted, and the SDRAMs found on FPGA boards are inherently capable of significantly higher performance than 320Mbps, so if you need higher resolutions or colour depths it is possible to operate the SDRAM using CL=3 at 160MHz. Then it becomes possible to read a lot more data per hub cycle. For even higher memory bus utilization you can also overlap the internal SDRAM bank accesses and do double buffering / page flipping on frame boundaries, so TWO fixed COGs get to access 3 of the 4 banks on any hub cycle while the video engine reads from the remaining bank and can itself sustain 1280Mbps output. Refreshing is still done and hidden in the background, having no impact on the COGs. That bandwidth would enable 720p60 at 15/16bpp and still give two COGs some access for reads and writes.
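
To see where the 320Mbps and 1280Mbps figures land, the required read rate during active video is roughly the pixel clock times the colour depth. The sketch below is a back-of-envelope check only: it assumes the nominal pixel clocks of 40MHz for 800x600 and 74.25MHz for 720p60, ignores blanking (so it is a worst case), and uses hypothetical function names.

```python
def video_mbps(pixel_clock_mhz, bits_per_pixel):
    """Frame buffer read rate needed while pixels are being displayed."""
    return pixel_clock_mhz * bits_per_pixel


def fits(pixel_clock_mhz, bits_per_pixel, budget_mbps):
    """True if the video engine's budget covers the mode."""
    return video_mbps(pixel_clock_mhz, bits_per_pixel) <= budget_mbps
```

With these assumptions, 800x600 at 8bpp needs 40 x 8 = 320Mbps, exactly the first budget, and 720p60 at 16bpp needs 74.25 x 16 = 1188Mbps, which fits inside 1280Mbps, while 24bpp at 720p (1782Mbps) would not.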

    In the past I have had SDRAM mapped into a single COG's hub address range @80MHz with CL=2 and used the second SDRAM access opportunity per hub cycle for doing some hidden refresh, but at that point the design had no video engine connected to make use of the actual SDRAM data being read. I tested it out some and found it worked on my DE-Nano and BeMicroMAX10 boards. I haven't tried the last idea out yet, as all my boards to date have had SDRAM rated to 143MHz and I'd have to overclock to even attempt it, so I wouldn't easily know how reliable the design was if there were failures. But I am still interested in trying it at some point on the new FleaFPGA Ohm board, which has SDRAM fitted that is rated up to 166MHz. It would be rather nice to see a 720p bitmap image on an LCD display panel / HDTV from a Prop without using up lots of COGs.



  • Rayman Posts: 8,417
    edited February 13
    Thanks Ariba, rogloh. Lots of food for thought...

    I wish there were an FPGA section in this forum...
  • rogloh Posts: 597
    edited February 13
    Ariba wrote: »
    The SDRAM interface is a separate peripheral that should be shareable by several cogs. I remember Roger has already done some work for hub access of SDRAM. Random access will need to write the address for every access; I don't know how fast this is possible. The RAM can then be used for more than just video.


    Sharing the SDRAM amongst lots of COGs gets tricky. Possibly with SDRAM at 80MHz one option is to give a single COG, which could be running some LMM "main" program in GCC for example, full prioritized access, and then if it doesn't use its time slot the other 7 COGs share it round robin. It might be possible, however, for some COGs to be blocked for an extended period, which is not good.

    For the 160MHz clocked SDRAM you could go one better and have one dedicated COG getting full SDRAM access whenever it needs it and the remaining access slot shared by other 7 COGs so there is never any starvation. I suppose this is probably a reasonably good compromise and lets the main application see full bandwidth and the remaining 7 COGs just share the rest.
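
The two sharing schemes can be compared with a small model. This is a behavioural sketch in Python, not Roger's memory controller: the class name, the choice of COG 0 as the priority COG and the request-set interface are all assumptions for illustration.

```python
class SdramArbiter:
    """Toy model of the two SDRAM sharing schemes for an 8-COG P1V."""

    def __init__(self, priority_cog=0, n_cogs=8):
        self.priority_cog = priority_cog
        self.others = [c for c in range(n_cogs) if c != priority_cog]
        self.next_index = 0  # round-robin pointer into self.others

    def _round_robin(self, requests):
        """Grant the next requesting non-priority COG, or None."""
        n = len(self.others)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            cog = self.others[idx]
            if cog in requests:
                self.next_index = (idx + 1) % n
                return cog
        return None

    def grant_single_slot(self, requests):
        """80MHz case: one slot; the priority COG wins, else round robin."""
        if self.priority_cog in requests:
            return self.priority_cog
        return self._round_robin(requests)

    def grant_two_slots(self, requests):
        """160MHz case: slot 0 is reserved, slot 1 is shared round robin."""
        first = self.priority_cog if self.priority_cog in requests else None
        return first, self._round_robin(requests)
```

In the single-slot model a priority COG that requests on every hub cycle starves the other seven, which is the blocking risk mentioned above. In the two-slot model each of the other COGs is served within at most seven hub cycles, whatever the priority COG does.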


  • Rayman wrote: »
    Thanks Ariba, rogloh. Lots of food for thought...

    I wish there were an FPGA section in this forum...

    Same here. P1V stuff is now all over the place and hard to find. The dedicated P1V forum category got lost during that last forum migration. Not sure why it had to.
  • Out of interest, any idea what current was being drawn by the fleafpga when it was running?
  • Rayman wrote: »
    Thanks Ariba, rogloh. Lots of food for thought...

    I wish there were an FPGA section in this forum...

    Moderators, can you make this happen?
  • I think that is an excellent idea.

    Even better would be if there was an authoritative P1V github repository that was maintained. I imagine it would keep the basic P1V and all that is required to get it working on the many different FPGAs and boards that people are using, with the various configurations of number of COGs and amount of RAM available.

    Perhaps not all the extensions and peripherals that seem to be in progress.


  • Rayman Posts: 8,417
    edited February 19
    I noticed that Verilog looks a bit like Spin...
  • Rayman,

    You must be squinting really hard :)