BeMicro CV FPGA Board for P2 ? - Page 3 — Parallax Forums


Comments

  • GuyLemieuxGuyLemieux Posts: 4
    edited 2013-10-23 22:15
    David, that is very interesting. My company VectorBlox (www.vectorblox.com) designs a massively parallel accelerator for FPGA-based processors -- we work with Nios, MicroBlaze and ARM as host CPUs. However, we are looking for a new processor that we can use as a host that would be higher performance, and portable to any FPGA device.

    The NUON you mention might be a good candidate for us to work on. The Verilog source that you have is probably intended for an ASIC implementation, and a lot of rework is necessary to get it to run successfully at full speed on an FPGA. This is not a trivial task, but if it shows promise, it's something my company might be able to take on.

    Please get in touch with me so we can discuss details.

    Thanks -- Guy (guy.lemieux@gmail.com)
    David Betz wrote: »
    I have the Verilog code for a processor we designed at VM Labs back when Eric Smith and I worked there. The IP is now owned by STMicroelectronics but they have given permission to release the code under the MIT license. I haven't even tried to compile it yet but it would be fun to get it working on an FPGA. Also, this was used in an early multi-core chip that had four of these processors on it. Even better, there is a port of GCC to it done by Eric Smith and Ken Rose! If anyone here would like to help bring this up on an FPGA please let me know.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-10-24 09:13
    GuyLemieux wrote: »
    David, that is very interesting. My company VectorBlox (www.vectorblox.com) designs a massively parallel accelerator for FPGA-based processors -- we work with Nios, MicroBlaze and ARM as host CPUs. However, we are looking for a new processor that we can use as a host that would be higher performance, and portable to any FPGA device.

    The NUON you mention might be a good candidate for us to work on. The Verilog source that you have is probably intended for an ASIC implementation, and a lot of rework is necessary to get it to run successfully at full speed on an FPGA. This is not a trivial task, but if it shows promise, it's something my company might be able to take on.

    Please get in touch with me so we can discuss details.

    Thanks -- Guy (guy.lemieux@gmail.com)
    Sounds great! I sent you email with the NUON architecture document attached. Let me know if you want to help with this project. Thanks!
  • GuyLemieuxGuyLemieux Posts: 4
    edited 2013-11-26 17:11
    For everyone who bought a CV board ... I have created a Nios II/f design that uses the DDR3. I've attached it to this forum.

    It includes an SD controller from the Altera University Program -- you may have to install that package from Altera if you want to compile the qsys system exactly as given. Otherwise, delete the SD controller and compile.

    The Nios II/f design is here bemicro_cv_nios_ddr3-3-niosIIf.zip.
    (note: file updated with new software)

    Thanks,
    Guy
  • jmgjmg Posts: 15,175
    edited 2013-11-26 17:23
    How much of the FPGA does that use, and what speed does it run at (from on-chip memory, and via DDR3)?

    Also, how long does a build of Nios II/f take?
  • TubularTubular Posts: 4,705
    edited 2013-11-26 19:35
    To my disappointment it turns out the BeMicro-CV board is not quite physically pin compatible with the DE0-Nano. The twin headers on the CV are 0.1" closer than the DE0-Nano. I think someone made an error and ran with it anyway.

    Note it may be possible to fix this with some short ribbon cable and IDC connectors.
  • GuyLemieuxGuyLemieux Posts: 4
    edited 2013-11-26 21:30
    The build includes 64KB on-chip SRAM and takes about 10 minutes with Quartus set to extra effort (speed). I included a small memory-test program that is stored in this SRAM at powerup/configtime (saved as onchip_mem.hex; source code in software directory).

    Nios II/f is used with hw mult/div and 8kb+8kb I+D caches.

    The entire Nios system is 4019 ALMs (43% of 9430 on the chip) and 800kb of 1800kb on-chip memory (44%). The CPU can run at 80 - 100MHz, depending upon the build quality, so I set it to 80MHz. The clock speed of Nios is independent of SRAM/DDR3 usage.
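Editor's sketch: the utilization percentages quoted above can be re-derived from the raw counts in the post. The function name `utilization` is made up for illustration; the numbers are the ones Guy reports (4019 of 9430 ALMs, 800 of 1800 kb of on-chip memory).

```python
def utilization(used: float, total: float) -> float:
    """Return resource usage as a percentage of the total available."""
    return 100.0 * used / total

# Figures taken directly from the post above.
alm_pct = utilization(4019, 9430)   # ALMs used by the Nios II/f system
mem_pct = utilization(800, 1800)    # on-chip memory, in kb as quoted

print(f"ALMs:   {alm_pct:.1f}%")
print(f"Memory: {mem_pct:.1f}%")
```

Rounded to whole numbers, these agree with the 43% and 44% given in the post.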

    I tried using a Nios II/e instead of II/f, but for some reason my software hangs. I gave up trying to figure it out and am just releasing the II/f version.

    It took me a long time to get everything going, especially the I/O pin configurations for DRAM. Shame on Arrow for not including this qsys file + qsf settings for others to build from. Many thanks to Tommy Thorn's flashing LED example which got me started. (https://github.com/tommythorn/BeMicro-CV)
  • GuyLemieuxGuyLemieux Posts: 4
    edited 2013-11-26 21:40
    WARNING: only use J1 with Terasic devices.

    The I/O pinout of the lower header (J4) is *not* compatible with Terasic pinouts. Even power and gnd connections are different.

    You can clearly see that J1 and J4 are different if you check the BeMicro CV schematics:
    http://components-asiapac.arrow.com/file_system/intranet/MAR/ADRE/File/Hardware_Reference_Guide_for_BeMicro_CV_A2_v1.04.pdf

    Guy
  • AleAle Posts: 2,363
    edited 2013-12-03 04:36
    It seems that the board is out of stock :(

    Does someone have one that he/she doesn't use and wants to sell/exchange?

    Thanks !
  • jmgjmg Posts: 15,175
    edited 2013-12-17 13:52
    .. and another possible FPGA for development... this one from the fringes, but likely to be FAST

    http://www10.edacafe.com/nbc/articles/1/1243098/Achronix-Demonstrates-Embedded-100G-Ethernet-Speedster22i-22nm-FPGAs

    ["Speedster22i FPGAs are built on Intel's advanced 22nm, 3-D Tri-Gate transistor technology. Speedster22i HD devices consume half the power and are half the cost of competitive high-end FPGAs for targeted high bandwidth applications. The 100G Ethernet was demonstrated on the HD1000 device, which has 1 million effective programmable look-up-tables (LUTs). The HD680 is the second member of the Speedster22i family and will begin shipping in Q1 2014."]

    Has anyone seen a price point on these parts ?
  • cgraceycgracey Posts: 14,206
    edited 2013-12-17 14:55
    jmg wrote: »
    .. and another possible FPGA for development... this one from the fringes, but likely to be FAST

    http://www10.edacafe.com/nbc/articles/1/1243098/Achronix-Demonstrates-Embedded-100G-Ethernet-Speedster22i-22nm-FPGAs

    ["Speedster22i FPGAs are built on Intel's advanced 22nm, 3-D Tri-Gate transistor technology. Speedster22i HD devices consume half the power and are half the cost of competitive high-end FPGAs for targeted high bandwidth applications. The 100G Ethernet was demonstrated on the HD1000 device, which has 1 million effective programmable look-up-tables (LUTs). The HD680 is the second member of the Speedster22i family and will begin shipping in Q1 2014."]

    Has anyone seen a price point on these parts ?


    I called the number on the press release and left a message. This is the same technology that Altera is using for its future Stratix 10 devices, for which their slogan is "Delivering the Unimaginable". Sounds really compelling, doesn't it? They're talking about compiled logic designs running at 1GHz+. I would suspect that these Achronix devices, being half as expensive as the competition, might still cost $5,000 per chip. I wonder what they do for compilation. Altera has been honing its tools for decades, and these guys seem to have just started. It's hard to imagine there are not some Altera employees in there.
  • cgraceycgracey Posts: 14,206
    edited 2013-12-17 15:23
    How much the Achronix development board costs was a bit of a mystery, but I found something:

    Achronix is making a Speedster22i HD1000 development board available for purchase separately, at a price of $13,000. The development kit is a PCI-express form-factor card, which includes the ACE software, a programming pod and power supply. Designers can use the kit to develop and prototype applications with the 100G Ethernet, Interlaken, PCI Express, DDR3 and SerDes functions in the HD1000. Achronix will also supply a number of reference designs for evaluation of the hardened IP interface protocols.


    http://dsp-fpga.com/1447-achronix-begins-shipping-first-22nm-finfet-fpgas/

    I guess I won't be buying one anytime soon.
  • jmgjmg Posts: 15,175
    edited 2013-12-17 15:44
    cgracey wrote: »
    I guess I won't be buying one anytime soon.

    Hehe, no. Still, it does put an upper bound on the chip price ;)
    They have another smaller variant coming, so it could be worth tracking thru 2014 for price/performance.

    Either way, 100G Ethernet is impressive.
  • kwinnkwinn Posts: 8,697
    edited 2013-12-17 22:30
    Trying to keep up with the P2 threads is mind boggling. There have been so many improvements over such a short time that posts less than two months old seem positively ancient. Great work. Keep it up. I know some of it will not make it into P2, but the ideas are a great starting point for the P3.
  • cgraceycgracey Posts: 14,206
    edited 2013-12-19 03:16
    I've had some dialog with the Achronix guys about their 22nm FinFET FPGAs.

    I told them we'd like to make a board with their FPGA on it and have our customers be able to use it, on their own, with their ACE software tool which provides compiler and fitter functions. We'll see what they say.

    It turns out they have a 200k LE version coming out soon which would likely support 8 cogs. I'm getting the feeling that the device might be available for around $700, which is a way better deal than Altera's fastest parts are. This Achronix FPGA could run Prop2 at probably 300MHz.

    I think a hang-up for them might be giving their ACE tool away, which currently comes with a one-year license when you buy their $13,000 development board.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-19 06:15
    Sounds interesting...

    Suggest to them that they can limit their tool to 200k LE's for this purpose.

    Did you look at

    http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=830 <---- should fit 3 cogs, $179

    http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=816 <--- should fit 5-6 cogs, $299, may not even need an add-on board as it has 24-bit VGA that could be mapped to the P2 DACs in Verilog

    Both have uSD, audio etc. and while they are not SDRAM based, some Verilog code could hide the difference.

    Many other people suggested these boards before, they would lower the cost barrier of entry for multi-cog Px emulation - allowing more people to participate.
    cgracey wrote: »
    I've had some dialog with the Achronix guys about their 22nm FinFET FPGAs.

    I told them we'd like to make a board with their FPGA on it and have our customers be able to use it, on their own, with their ACE software tool which provides compiler and fitter functions. We'll see what they say.

    It turns out they have a 200k LE version coming out soon which would likely support 8 cogs. I'm getting the feeling that the device might be available for around $700, which is a way better deal than Altera's fastest parts are. This Achronix FPGA could run Prop2 at probably 300MHz.

    I think a hang-up for them might be giving their ACE tool away, which currently comes with a one-year license when you buy their $13,000 development board.
  • jmgjmg Posts: 15,175
    edited 2013-12-19 11:05
    cgracey wrote: »
    I've had some dialog with the Achronix guys about their 22nm FinFET FPGAs.

    I told them we'd like to make a board with their FPGA on it and have our customers be able to use it, on their own, with their ACE software tool which provides compiler and fitter functions. We'll see what they say.

    It turns out they have a 200k LE version coming out soon which would likely support 8 cogs. I'm getting the feeling that the device might be available for around $700, which is a way better deal than Altera's fastest parts are. This Achronix FPGA could run Prop2 at probably 300MHz.

    I think a hang-up for them might be giving their ACE tool away, which currently comes with a one-year license when you buy their $13,000 development board.

    They should be able to give their programming tool away, but the synthesis tools likely involve other parties, and so have significant external license costs.

    You should need just one license there, and send binaries to users to download the P2. (as now )

    Full speed P2 would have serious appeal for emulation, but over-speed might need a special build & release, as it would spawn code that might not run on P2 silicon. There could be Education and R&D users who like this.

    Is there a Silicon revision word in the P2 somewhere ?
  • cgraceycgracey Posts: 14,206
    edited 2013-12-19 11:18
    jmg wrote: »
    They should be able to give their programming tool away, but the synthesis tools likely involve other parties, and so have significant external license costs.

    You should need just one license there, and send binaries to users to download the P2. (as now )

    Full speed P2 would have serious appeal for emulation, but over-speed might need a special build & release, as it would spawn code that might not run on P2 silicon. There could be Education and R&D users who like this.

    Is there a Silicon revision word in the P2 somewhere ?


    Your point about external license costs makes a lot of sense. Those EDA tool elements could be costly and may impede Achronix from giving their tool away. I don't know, but would suspect this could be the case. I was thinking it would be neat for customers to be able to rework the RTL, starting with Prop1.

    We looked into getting a complete RTL-to-silicon setup a few months ago for doing chip design and it was $320k/year, even with a 50% discount!

    In the Prop2's ROM, you can see the version in the first few bytes. This could be iterated to express silicon changes.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-19 11:25
    cgracey wrote: »
    I was thinking it would be neat for customers to be able to rework the RTL, starting with Prop1.
    Nice! I didn't know there was RTL for P1.
  • cgraceycgracey Posts: 14,206
    edited 2013-12-19 11:28
    David Betz wrote: »
    Nice! I didn't know there was RTL for P1.


    It's written in AHDL, but it would only take a few days to redo it into Verilog. If you guys could see that code, you'd go, "Pffff... That's it?!?" It was designed with hand layout in mind. Prop2 is much more complex and relies on automation to be built.
  • jmgjmg Posts: 15,175
    edited 2013-12-19 11:31
    cgracey wrote: »
    Your point about external license costs makes a lot of sense. Those EDA tool elements could be costly and may impede Achronix from giving their tool away. I don't know, but would suspect this could be the case. I was thinking it would be neat for customers to be able to rework the RTL, starting with Prop1.

    Prop 3?
    Yes, would be good, but that can come later - you should be able to negotiate a loan of a License as soon as they have the 200k device included. With luck, that will be right about when P2 is in tapeout.
    cgracey wrote: »
    In the Prop2's ROM, you can see the version in the first few bytes. This could be iterated to express silicon changes.

    You could reserve a field for MHz tag, or similar which would allow Silicon, and various FPGA builds to check they have the required headroom at runtime.
    Because Silicon MHz might be hard to predict, as MHz, that could be a die-rev number.
    Values < 20 would be considered Die-Numbers, and > 20 would be MHz for targeted FPGA ?
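Editor's sketch: one way to read the tag scheme jmg proposes above, where small values name a die revision and larger values give the MHz of an FPGA build. Everything here is hypothetical (the names `decode_tag`, `has_headroom`, and `DIE_REV_LIMIT` are not from any Parallax tool), and since the post leaves the value 20 itself unassigned, the sketch rejects it rather than guessing.

```python
from typing import NamedTuple, Optional

DIE_REV_LIMIT = 20  # threshold suggested in the post; exact value is an assumption


class Tag(NamedTuple):
    kind: str   # "die_rev" for silicon, "fpga_mhz" for an FPGA build
    value: int


def decode_tag(raw: int) -> Tag:
    """Classify a raw tag value as a die revision or an FPGA clock in MHz."""
    if raw < 0:
        raise ValueError("tag must be non-negative")
    if raw < DIE_REV_LIMIT:
        return Tag("die_rev", raw)
    if raw > DIE_REV_LIMIT:
        return Tag("fpga_mhz", raw)
    # The post says "< 20" and "> 20", so 20 is left undefined here.
    raise ValueError("tag value 20 is not assigned by the proposed scheme")


def has_headroom(raw: int, required_mhz: int) -> Optional[bool]:
    """Return True/False for FPGA builds, or None when the tag is a die revision
    (silicon speed would have to be looked up per revision)."""
    tag = decode_tag(raw)
    if tag.kind == "die_rev":
        return None
    return tag.value >= required_mhz


print(decode_tag(3))          # a silicon die revision
print(has_headroom(80, 60))   # an 80MHz FPGA build asked for 60MHz
```

Code that needs a minimum clock could call `has_headroom` at startup and refuse to run, or fall back to a slower path, when the answer is False.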
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-19 11:33
    cgracey wrote: »
    It's written in AHDL, but it would only take a few days to redo it into Verilog. If you guys could see that code, you'd go, "Pffff... That's it?!?" It was designed with hand layout in mind. Prop2 is much more complex and relies on automation to be built.
    P1 sounds simple and elegant. I'd love to see the code!
  • Ken GraceyKen Gracey Posts: 7,395
    edited 2013-12-19 11:44
    David Betz wrote: »
    P1 sounds simple and elegant. I'd love to see the code!

    There's a place and time for an open P1 AHDL release, and it's not right now. While we could just plop the source on the forums and have it go unnoticed except by a few people, we could also release it in due course with an FPGA system en route to doing the same with P2 in the future. Once it's here, Chip has to answer questions about it, too. This is not the right time.

    All of these tasks take "just another couple of days, that's all. Maybe even a few hours." Anybody know how this kind of request for time adds up?

    The continual process of improvement and redesign is a costly one. In the last several weeks I have set up distribution with Terasic, started to scope out our own FPGA board around Altera Cyclone V-7A and now there's discussion about yet another FPGA with unknown tools. Inability to hold the course is a sure path to nowhere.

    Ken Gracey
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-19 11:47
    Ken Gracey wrote: »
    There's a place and time for an open P1 AHDL release, and it's not right now. While we could just plop the source on the forums and have it go unnoticed except by a few people, we could also release it in due course with an FPGA system en route to doing the same with P2 in the future. Once it's here, Chip has to answer questions about it, too. This is not the right time.

    All of these tasks take "just another couple of days, that's all. Maybe even a few hours." Anybody know how this kind of request for time adds up?

    The continual process of improvement and redesign is a costly one. In the last several weeks I have set up distribution with Terasic, started to scope out our own FPGA board around Altera Cyclone V-7A and now there's discussion about yet another FPGA with unknown tools. Inability to hold the course is a sure path to nowhere.

    Ken Gracey
    I agree completely. This is not the time to be distracted. I guess I should have said "I will look forward to seeing the code when the time is right to release it." Sorry!
  • Ken GraceyKen Gracey Posts: 7,395
    edited 2013-12-19 11:51
    David Betz wrote: »
    I agree completely. This is not the time to be distracted. I guess I should have said "I will look forward to seeing the code when the time is right to release it." Sorry!

    No, by all means please write what you want. If you want a train for Christmas, don't ask for it next year!

    I'm not trying to modify anybody's behavior around the subject, only to make a point that all of our requests contribute to an ongoing R&D process.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-19 11:53
    Ken Gracey wrote: »
    No, by all means please write what you want. If you want a train for Christmas, don't ask for it next year!

    I'm not trying to modify anybody's behavior around the subject, only to make a point that all of our requests contribute to an ongoing R&D process.

    Edit: Christmas list deleted.
  • potatoheadpotatohead Posts: 10,261
    edited 2013-12-19 13:03
    Understood Ken.

    I too am eager, however shipping silicon to kick all this off is the priority. No worries here.
  • jmgjmg Posts: 15,175
    edited 2013-12-19 13:25
    Ken Gracey wrote: »
    In the last several weeks I have set up distribution with Terasic, started to scope out our own FPGA board around Altera Cyclone V-7A and now there's discussion about yet another FPGA with unknown tools.

    The Achronix FPGA is somewhat 'over the horizon' stuff, but certainly worth tracking.
    You could take that over from Chip ?

    Given you have IP that is proven to work on Altera Parts, I'd think Achronix would be eager to get benchmarks in this area of performance.
    They can hardly quote NIOS numbers, can they ;)

    Check with Achronix; they may offer a BGA484 device, which would be a smart way to ramp design-ins!


    Meanwhile, Cyclone V sounds like the ideal 'live product' target, and there are also real Cyclone V boards out there to run-up, as you define your own Board.

    A Cyclone V board, with option for multiple SDRAM would allow maximal memory systems to run native code, with no patches.
    Can a single BGA484 PCB design cover A2 / A7 / A9 ?
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-19 13:34
    Ken,
    Agreed. And before any release of source is made, for P1 and/or P2 (even in reasonable sections), I think a full and proper internal and external discussion needs to happen. Now is not the time for this.
    Obviously, I would love to see the P2 Verilog code, as would others, but I would really hate to see Parallax expose their IP, as I am sure some unscrupulous player(s) would serve it back up against the interests of Parallax. I have been in the game long enough to see what big companies do to little companies (e.g. the compression in MS-DOS 6 comes to mind, the ergonomic keyboard, "the job ain't done 'till Lotus won't run", and many others).

    As for FPGA sw, the only two companies that could potentially give away their dev sw are Altera and Xilinx. Since the in-house experience is with Altera, the Cyclone V family makes sense to me. I have a preference for Xilinx, but it's only an old choice that remains. But you could get quotes for Xilinx Spartan 6 equivalents as a cross-reference.

    BTW For your FPGA board, IMHO you should consider using the Cyclone V 5CEBA 4/5/7/9 F23C8N F484. These are pin-compatible (with careful design) 48K/76K/149K/301K LE's (Mouser/DigiKey ~$53/88/159/209 qty 1). Possibly you would build just 2 variants, the 4 & 7 (48K & 149K LE's).
  • jmgjmg Posts: 15,175
    edited 2013-12-19 14:21
    Cluso99 wrote: »
    BTW For your FPGA board, IMHO you should consider using the Cyclone V 5CEBA 4/5/7/9 F23C8N F484. These are pin-compatible (with careful design) 48K/76K/149K/301K LE's (Mouser/DigiKey ~$53/88/159/209 qty 1). Possibly you would build just 2 variants, the 4 & 7 (48K & 149K LE's).

    Pin-compatible build options make clear sense. There is a BGA484 A2; is that not pin compatible?
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-19 17:06
    The A2 is too small at ~25K LEs. We are at that limit for 1 cog now.
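Editor's sketch: a rough cog-count and cost-per-cog estimate for the Cyclone V variants discussed above. It assumes a flat ~25K LEs per cog (the figure Cluso99 gives for one cog) and uses the LE counts and approximate qty-1 prices from his earlier post. It ignores hub and other shared logic, routing, and timing, so treat the results as an upper bound, not a fit report.

```python
LES_PER_COG = 25_000  # assumption: "~25K LEs ... that limit for 1 cog"

# Cyclone V 5CEBA variant -> (logic elements, approx. qty-1 price in USD),
# as quoted in the thread.
DEVICES = {
    "A4": (48_000, 53),
    "A5": (76_000, 88),
    "A7": (149_000, 159),
    "A9": (301_000, 209),
}


def cogs_that_fit(les: int, les_per_cog: int = LES_PER_COG) -> int:
    """Whole number of cogs that fit if each needs les_per_cog logic elements."""
    return les // les_per_cog


for name, (les, price) in DEVICES.items():
    cogs = cogs_that_fit(les)
    print(f"{name}: {les // 1000}K LEs, up to {cogs} cog(s), "
          f"about ${price / cogs:.2f} per cog")
```

The same flat assumption gives 8 cogs for a 200K LE part, which lines up with Chip's estimate for the upcoming Achronix device.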