I think this is all totally amazing. Actually managing to create all the tool steps needed to get Verilog into an FPGA. Reverse engineering the required bit streams and all. Incredible.
I agree. There's a lot that goes into an FPGA compilation flow. It has almost all the requirements of full-chip synthesis.
...Why would one need 16 cores of P2 when the functions you want to create can be done in Verilog on a super cheap FPGA?...
It's a lot easier to compose software for a microcontroller than worry about clock-by-clock logic states. I have to slow WAY down when writing Verilog code. Writing code for a processor lets you think at some macro scale, but defining hardware is quite tedious, I find.
I haven't gotten into Yosys much. But one presentation states that people have used it to get from Verilog to actual silicon. This is all over my head, but it sounds groundbreaking.
I am sure that if one is using Verilog for a non-trivial design, like a Propeller 2, and wants it to be optimal, then that involves a lot of head-scratching and concentration. Way above my skill level.
Up until a few weeks ago I would have wholeheartedly agreed with your statement. But then you got me curious about Verilog with that xoroshiro128+ PRNG code....
In no time at all I had that pumping out PRNG samples at 50MHz from a DE0-Nano. Turns out it's not so hard. Certainly easier than doing it in PASM on a P1 or P2, if that is even possible.
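When checking PRNG samples coming out of an FPGA, a bit-accurate software model makes a handy golden reference. Below is a minimal Python sketch, assuming the published 2016 xoroshiro128+ reference constants (rotate by 55, shift by 14, rotate by 36). Chip's Verilog may use different constants or a different variant, so check that before comparing, and the seed is hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotl(x: int, k: int) -> int:
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


class Xoroshiro128Plus:
    """Software model of xoroshiro128+ (assumed constants a=55, b=14, c=36)."""

    def __init__(self, s0: int, s1: int) -> None:
        if ((s0 | s1) & MASK64) == 0:
            raise ValueError("state must not be all zero")
        self.s0 = s0 & MASK64
        self.s1 = s1 & MASK64

    def next(self) -> int:
        s0, s1 = self.s0, self.s1
        result = (s0 + s1) & MASK64          # output: sum of the two state words
        s1 ^= s0
        self.s0 = rotl(s0, 55) ^ s1 ^ ((s1 << 14) & MASK64)
        self.s1 = rotl(s1, 36)
        return result


if __name__ == "__main__":
    gen = Xoroshiro128Plus(1, 2)   # hypothetical seed; use the one the hardware uses
    for _ in range(4):
        print(f"{gen.next():016x}")
```

Printing in fixed-width hex keeps the output easy to diff against samples captured from the board or printed by a testbench.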
I can see a future coming where one does not get an Arduino or Propeller or whatever MCU and then sweat blood trying to get high speed and concurrent stuff done in software. Just get a $1 FPGA, stick a 32 bit core on there for running big C code and add the fast twiddly logic bits in Verilog.
That future is already here of course. Lattice sells billions of their FPGAs into mobile phones.
The world at large is not attuned to this yet, but given the rise of open source tools, even usable from a Raspberry Pi, and super cheap FPGAs, I can see 10-year-old kids churning out logic designs from their bedrooms!
Yosys is not limited to the SiliconBlue/Lattice iCE40 (IceStorm), although that is the best known and maybe the most complete implementation (all of them are still experimental).
Yosys can map to ASIC (standard cell) and also to other FPGAs (iCE40, Xilinx 7-Series, Gowin, and GreenPAK).
From the FAQ:
2. What synthesis targets are supported by Yosys?
Yosys is retargetable and adding support for additional targets is not very hard. At the moment, Yosys ships with support for ASIC synthesis (from liberty cell library files), iCE40 FPGAs, Xilinx 7-Series FPGAs, Silego GreenPAK4 devices, and Gowinsemi GW1N/GW2A FPGAs.
Note that in all this cases Yosys only performs synthesis. For a complete open source ASIC flow using Yosys see Qflow, for a complete open source iCE40 flow see Project IceStorm. Yosys Xilinx 7-Series synthesis output can be placed and routed with Xilinx Vivado.
Yes, he must be some kind of genius. And not only from Verilog to bitstream, but also to logic gates (standard cell ASIC).
A complete open source toolkit from Verilog into GDSII.
(Unfortunately this is not completely polished yet. Or maybe it is, but it requires more skill than I currently have.) Maybe one day you will want to turn your own PRNG into a custom IC... :-)
Er, that is Chip's PRNG code. I only wrapped something around it to get it ticking.
Oddly, on the occasions I have looked at Verilog code over the years, I thought it looked pretty horrible and not something I wanted to spend time on. But Chip's PRNG jumped out at me and looked like something fun.
Should I happen to win the lottery I'm off to the FAB to buy some chips!
GreenPAK parts are interesting, as they have very low fuse counts, less than SPLDs.
A GreenPAK5, for example, gets to a lofty 2040 bits, so I'm not sure how much 'synthesis' or P&R can occur here?
Still, the fuses are documented, and the evals are not pricey, so it could be a useful proof-of-concept path.
I'm keen to play with the new iCE40UP5K-SG48, which shows one drawback of open source.
Only Lattice tools currently support the iCE40UP5K.
Err, not being able to talk to a given chip is actually a pretty fundamental drawback.
Yes, what has been done is impressive, but if you want a part not on their brief list, you are SOL.
I also have the source you linked above, but there seem to be no project or mapping files for Lattice flows. I guess they are interested only in their device subset, not in someone using the core on a different iCE40?
I had expected Lattice builds, so users could reasonably compare tool flow results.
The IceStorm toolchain supports many FPGA boards, not only "their platform". The cheapest is the iCEstick from Lattice (about $25). Here's a list: http://icoboard.org/boards-supported-by-icotc.html
If the board has an iCE40 HX/LP FPGA and an FT2232 as its programming interface, it should work.
The iCE40 Ultra family is not supported yet; its bitstream has not been reverse engineered, except maybe for the UltraLite parts.
If you want to use the iCE40UP5K-SG48 parts, you can always do it with the free Lattice iCEcube software.
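For the supported HX/LP parts, the open flow runs four tools in sequence: synthesis, place and route, bitstream packing, and programming. Here is a minimal Python sketch that prints (or optionally runs) them, assuming the yosys / arachne-pnr / icepack / iceprog flow used in the IceStorm examples of that era; newer installs use nextpnr-ice40 in place of arachne-pnr. The module name, Verilog file, and pin constraints file are hypothetical.

```python
import shlex
import subprocess


def icestorm_commands(top: str, sources: list[str], pcf: str, device: str = "1k") -> list[list[str]]:
    """Build the four tool invocations of the classic IceStorm flow."""
    blif, asc, binfile = f"{top}.blif", f"{top}.asc", f"{top}.bin"
    return [
        # 1. Synthesis: Verilog -> iCE40 cells, written as a BLIF netlist
        ["yosys", "-q", "-p", f"synth_ice40 -top {top} -blif {blif}", *sources],
        # 2. Place and route for the given device, using the pin constraints file
        ["arachne-pnr", "-d", device, "-p", pcf, blif, "-o", asc],
        # 3. Pack the ASCII bitstream description into a binary bitstream
        ["icepack", asc, binfile],
        # 4. Program the board through its FT2232 interface
        ["iceprog", binfile],
    ]


def run_flow(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)   # stop at the first failing step


if __name__ == "__main__":
    # Hypothetical names for an iCEstick (HX1K) blinky project; dry run only prints.
    run_flow(icestorm_commands("blinky", ["blinky.v"], "icestick.pcf"))
```

Keeping `dry_run=True` as the default means the script only shows the command lines until you are sure the tools are installed and the board is connected.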
Not long ago these simulation and synthesis tools would have cost you tens of thousands of dollars. Not to mention the cost of the Sun workstations or whatever was required to run them.
The TL;DR: it's about a little HDL design competition held 10 or 20 years ago at an HDL conference. The challenge was to come up with the fastest design in Verilog or VHDL for some simple logic function, in 90 minutes.
They had problems with workstation setup, licence keys, crashing workstations, half-hour-long simulation/synthesis runs, etc.
Half of the entrants, apparently knowledgeable and experienced HDL guys, failed to produce a working solution.
Looking at the problem now, I think even I, with my beginner's experience, could manage it. For free, on a tablet!
I've worked on chips where bugs were found through gate level simulation so it definitely can be worth it. I like the statement about always taking the time to understand why the bugs were missed by other methods - do some introspection, fix what was wrong and anything else that it caused you to think about, and rerun everything.
Some categories may not apply to the Prop2. For example, I don't know whether test logic is being inserted at the gate level or not.
Interesting stuff, although I don't know much about what he is discussing: the difference between emulation and simulation, etc.
My takeaway is that getting a big, fast, non-trivial chip like the P2 to work is a lot more complicated than even the hugely complex process I might already have expected!
I'm thinking of it by analogy to writing a big program. You write it, test it, and it checks out fine. Then it gets delivered via a chain of being recompiled with a different compiler, to run on a different platform, with perhaps slightly different libraries, different optimisation settings, different #defines, etc. Then people wonder why it does not work!
The one time I was close to an actual chip design was working with a team building an ASIC for a military radio modem, back in the late 1980s or so. I was amazed to see how they were testing their design. They had four or five big FPGA dev boards laid out on the bench, linked together with lots of wide ribbon cables and surrounded by logic analysers. My job was to make sure they designed in all the interfaces needed so that our software could make it do what was expected. For example, they neglected to provide a signal to wake up the MPU from sleep mode when a radio packet sync was detected. That would have required us to poll the thing, thus killing battery life and sinking the project!
Anyway, I'm guessing that a gate-level simulation of a huge chip takes forever to run. I might have imagined that chip vendors provided IP blocks for flip-flops, latches, muxes, adders, multipliers, etc. that were fully characterised and guaranteed to work, such that going all the way to gate-level sim is not required.
Some time ago, I tried to develop my own RISC-V cores, using an async model from the memory standpoint. I wanted a 32-bit version that fits in a ... Lattice MachXO2, which is what I have at hand and love.
I got side-tracked recreating some old HP calculators (in FPGAs). I wanted to buy some iCE5LP4K-SG48 from Mouser the other day, just to test their low-power abilities. They had none. I got the 1k-LUT one instead; for testing it will do... I didn't check Digi-Key though... but they don't have it. It would be great if the MachXO2 were available in such a package; the ~4k-LUT version seems to come in some kind of dual-row QFN. What do they have against a nice 64- or 48-pin TQFP? Only the smaller members are available in such a package. Yes, I know you mentioned the iCE40 UltraPlus and I'm talking about the Ultra only... the names are... not helping. Anyway, still not available.
I don't know if RISC-V is the answer for the P3. The P1 core is itself pretty small and is targeted at such tasks; RISC-V, on the other hand, would need some "extensions" and has a different memory model, similar to that x chip everyone knows about. Those also come in hard-to-solder, hobby-unfriendly packages.
Let the P2 be, and see what would be needed for a P3. From my humble position, I see the P2 as a complicated processor (full of custom instructions). At least there are not a thousand registers and peripherals to know.
Yeah, it's really annoying that FPGAs come in such hobbyist-unfriendly packages. Even if you are up for the challenge of making a board and soldering some of those non-BGA devices, you then have the hassle of surrounding it with all the supply voltages it needs, the configuration memory, and some serial/UART device.
That leaves getting the FPGA on a board of some kind. These are amazingly cheap nowadays compared to only a few years ago, but it's still an expense, and you may or may not find a board that suits what you want to do.
That is all far removed from the wonderful ease of supplying power to a DIP Propeller, hooking a Prop Plug to it, and away you go in no time.
As I said, I'm not seriously suggesting RISC-V as any kind of answer to the P2; however, I can imagine some P3/P4/P5 with a Linux-running RISC-V core surrounded by Propeller COGs as peripherals. All very fanciful, of course.
Still, I worry that even cheap little FPGAs like those Lattice devices could be competition for the P2. They are big enough to put a 32-bit RISC-V on for running big code, and have enough logic left over to make all the peripherals one may need in Verilog/VHDL. All of which stacks up well against writing C code for HUB exec on a P2 and fighting with PASM to get the peripherals made. With the low cost of FPGAs now and the entirely free and open source toolchains, I can see people getting into that.
This is not going to happen anytime soon: people are not aware of the option, the tools are all quite new, it's a chore to install them all, and there is a lot to learn. It would take an Arduino-like effort to package everything up into a quick and easy-to-use dev system (a board and software) and provide friendly documentation, etc.
...At least there are not a thousand registers and peripherals to know.
As I was groping around trying to make a UART as one of my first Verilog experiments, I thought exactly that. Making a super simple UART in Verilog is easier than wading through hundreds of pages of manual for some MCU, trying to find how to set up its thousands of registers. Not to mention getting the interrupt system working and so on. On top of that, the end result is portable to any FPGA device, and you can point your peripheral at almost any I/O pin on the chip. How easy could it be!?
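The clock-by-clock logic a simple UART transmitter needs is small: a baud-rate counter plus a shift register holding the frame. Here is a Python cycle-level model of that structure, assuming 8N1 framing (one start bit, eight data bits LSB first, one stop bit) and a system clock that is an exact multiple of the baud rate. It is meant as a reference for writing or testing the Verilog, not as the Verilog itself, and the class and method names are hypothetical.

```python
from collections import deque


def uart_frame(byte: int) -> list[int]:
    """8N1 frame: start bit (0), eight data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


class UartTx:
    """Cycle-level model of a minimal UART transmitter; clock() is one clock edge."""

    def __init__(self, clks_per_bit: int) -> None:
        self.clks_per_bit = clks_per_bit   # clock cycles per bit, assumed f_clk / baud
        self.bits = deque()                # bits of the frame still to send
        self.count = 0                     # clocks left in the current bit
        self.tx = 1                        # the line idles high

    def busy(self) -> bool:
        return bool(self.bits) or self.count > 0

    def send(self, byte: int) -> None:
        if self.busy():
            raise RuntimeError("transmitter busy")
        self.bits.extend(uart_frame(byte))

    def clock(self) -> int:
        if self.count == 0:
            if self.bits:
                self.tx = self.bits.popleft()   # shift out the next bit
                self.count = self.clks_per_bit
            else:
                self.tx = 1                     # nothing to send: idle high
        if self.count > 0:
            self.count -= 1                     # baud-rate divider
        return self.tx


if __name__ == "__main__":
    uart = UartTx(clks_per_bit=4)
    uart.send(0x55)
    line = [uart.clock() for _ in range(40)]
    print(line[::4])   # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

Sampling once per bit period shows the start bit, the data bits of 0x55 LSB first, and the stop bit; in Verilog the same thing would be a down-counter and a 10-bit shift register.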
When you want to do something a bit weird, the MCU does not help. Enter the Propeller. Or, as I hinted above, a little FPGA. So it's PASM on the P2 vs Verilog/VHDL.
Heater - totally get where you're coming from. For me making a UART, I2C master/slave, or SPI master/slave in Verilog is pretty straightforward versus making one in PASM. And they are quite small versus a COG. But traditionally the turn times have been a real pain. I knew about those Lattice tools and boards before, but now you're tempting me again with this RISC V stuff!
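SPI illustrates why these peripherals are small in logic: master and slave are each just a shift register, and one transfer rotates the two registers' contents through each other. Below is a Python model, assuming SPI mode 0 (CPOL=0, CPHA=0) and MSB-first order. It collapses "drive on the falling edge, sample on the rising edge" into one step per bit and leaves out chip select and clock generation; the function name is hypothetical.

```python
def spi_mode0_exchange(master_word: int, slave_word: int, bits: int = 8) -> tuple[int, int]:
    """Model one SPI mode-0 transfer, MSB first, as two shift registers in a ring.

    Returns (word the master received on MISO, word the slave received on MOSI).
    """
    mask = (1 << bits) - 1
    m, s = master_word & mask, slave_word & mask
    for _ in range(bits):
        # While SCK is low, each side drives its current MSB onto its output line.
        mosi = (m >> (bits - 1)) & 1
        miso = (s >> (bits - 1)) & 1
        # On the rising SCK edge, each side shifts left and samples the other's bit.
        m = ((m << 1) | miso) & mask
        s = ((s << 1) | mosi) & mask
    return m, s


if __name__ == "__main__":
    print([hex(w) for w in spi_mode0_exchange(0xA5, 0x3C)])   # ['0x3c', '0xa5']
```

After `bits` clocks the two words have swapped, which is all an SPI transfer is at the register level.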
But traditionally the turn times have been a real pain.
Yes, that is the thing. Working in Verilog with tools like Quartus is a real pain. They are huge, complicated beasts. The editor is pretty crappy. The edit/compile/load/test turnaround time is depressingly slow. I have no idea if the Xilinx or Lattice tools are any better. This would put many dabblers off immediately, including myself.
My recent discovery is that with the Icarus Verilog simulator you can hack Verilog code, compile and run it with the same kind of interactive turnaround times as hacking on a Python program or JavaScript under Node.js. Makes it fun.
There is the hassle of making test benches for every module so that you can see how it behaves. But they need not be big and complex. Creating unit tests for normal software is a good idea as well.
Then, when it looks like your logic is about right it's time to load it to your Quartus project, hit the build button and go and make some tea!
This may never be as quick and easy as hacking regular software on an MCU, but it might be easier than learning PASM for the P2. Provided, that is, a core like RISC-V is already set up and configured for your FPGA, so that all you have to worry about is the custom logic. We are not there yet...
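The test-bench-as-unit-test idea can go one step further: run the Icarus Verilog testbench from a script and compare what it prints against a software reference model. Here is a minimal Python sketch, assuming the testbench prints one hex value per line via `$display("%h", ...)`. It uses only the plain `iverilog -o` and `vvp` invocations, and the file names and expected values are hypothetical.

```python
import re
import subprocess
from typing import Optional

VALUE_LINE = re.compile(r"[0-9a-fA-FxXzZ]+")


def sim_commands(out: str, sources: list[str]) -> list[list[str]]:
    """Icarus Verilog: compile testbench and design, then run the result with vvp."""
    return [["iverilog", "-o", out, *sources], ["vvp", out]]


def run_sim(out: str, sources: list[str]) -> str:
    compile_cmd, run_cmd = sim_commands(out, sources)
    subprocess.run(compile_cmd, check=True)
    return subprocess.run(run_cmd, check=True, capture_output=True, text=True).stdout


def parse_values(sim_output: str) -> list[Optional[int]]:
    """Collect lines that look like %h output; any other lines are skipped."""
    values: list[Optional[int]] = []
    for line in sim_output.splitlines():
        text = line.strip()
        if VALUE_LINE.fullmatch(text):
            # x/z digits mean an unknown or undriven value, which never matches.
            values.append(None if re.search(r"[xXzZ]", text) else int(text, 16))
    return values


def first_mismatch(sim_output: str, expected: list[int]) -> Optional[int]:
    """Index of the first value that differs from the reference model, or None."""
    got = parse_values(sim_output)
    for i, want in enumerate(expected):
        if i >= len(got) or got[i] != want:
            return i
    return None
```

Typical use would be `first_mismatch(run_sim("sim", ["tb.v", "dut.v"]), expected)`, with `expected` coming from a software model of the same logic. Treating x/z values as mismatches keeps an uninitialised register from slipping through unnoticed.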
Oh, did I mention that the Atom editor does very good Verilog syntax highlighting with its language-verilog package? It even points out syntax errors as you type with linter-verilog. So much better than the Quartus editor.
Comments
As you can see in the documentation:
You are welcome! You were so motivated and learned/implemented the PRNG Verilog so fast that I thought you would like it too.
Wait... there is more...
http://opencircuitdesign.com/qflow/index.html
The drawback is that FPGA vendors don't make documentation public for their platforms so that tools can be developed for them.
People like Clifford Wolf have done an amazing job in reverse engineering these things.
My only quibble is that it is not the fault of "open source"; it is the fault of the FPGA vendors.
Anyway, I think I'm going to try to get hold of a Lattice board that is supported, just to try this. Sounds too amazing to miss out on.
https://github.com/cliffordwolf/picorv32/tree/master/scripts/yosys-cmp
https://mystorm.uk/we-forecast-blackice-this-winter-2/
Not clear where to order that?
Some mention of dates here:
https://hackaday.io/project/12930-mystorm-the-30-open-hardware-fpga-dev-board
"I should add that we are offering a fully specified System Design Kit - available for $49 plus shipping from early December.
Early next year we will produce a lower spec version which will sell for $30 plus shipping
In order to keep shipping costs down, we hope to bulk ship from Shenzhen, China, and use local territorial distributors.
We will be announcing discounts for bulk purchase (10+) - for schools, colleges, makerspaces - to be confirmed in the New Year. "
More chatter here:
https://www.rs-online.com/designspark/mystorm-creating-an-open-source-fpga-development-platform
http://www.mouser.com/Lattice/Semiconductors/Programmable-Logic-ICs/FPGA-Field-Programmable-Gate-Array/_/N-3oh9p?P=1yy0b3gZ1yvofwzZ1ytf7b0Z1yy0pmdZ1yx89y9Z1z0zl4qZ1z0yqx7Z1z0y0bdZ1yzng2cZ1z0ypqiZ1z0y2q6&Ns=Pricing|0
What could you do with a $1.30 LP384, with 384 LUTs and no RAM? Some real-time I/O wrangling, I guess?
Could you fit an HDMI TX in there?
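For a sense of what an HDMI transmitter has to compute, here is the per-channel TMDS 8b/10b encoding for video data, following the algorithm in the DVI 1.0 specification, which HDMI reuses for video periods. It is a Python sketch only: it leaves out control-period tokens, HDMI data islands, and the 10:1 serializer (each channel shifts out 10 bits per pixel clock), and it says nothing about whether the whole thing fits in 384 LUTs. The function names are hypothetical.

```python
def ones(x: int) -> int:
    """Number of 1 bits in the low 8 bits of x."""
    return bin(x & 0xFF).count("1")


def tmds_encode(d: int, cnt: int) -> tuple[int, int]:
    """Encode one video byte into a 10-bit TMDS symbol.

    cnt is the running disparity carried from the previous symbol; returns (symbol, new_cnt).
    """
    # Stage 1: transition minimisation with an XOR or XNOR chain.
    use_xnor = ones(d) > 4 or (ones(d) == 4 and (d & 1) == 0)
    q_m = d & 1
    prev = q_m
    for i in range(1, 8):
        prev = prev ^ ((d >> i) & 1) ^ (1 if use_xnor else 0)
        q_m |= prev << i
    q_m8 = 0 if use_xnor else 1          # bit 8 records which chain was used

    # Stage 2: DC balancing, optionally inverting bits 7:0 (flagged in bit 9).
    n1 = ones(q_m)
    n0 = 8 - n1
    if cnt == 0 or n1 == n0:
        bit9 = 1 - q_m8
        low = q_m if q_m8 else (~q_m & 0xFF)
        cnt += (n1 - n0) if q_m8 else (n0 - n1)
    elif (cnt > 0 and n1 > n0) or (cnt < 0 and n0 > n1):
        bit9 = 1
        low = ~q_m & 0xFF
        cnt += 2 * q_m8 + (n0 - n1)
    else:
        bit9 = 0
        low = q_m
        cnt += -2 * (1 - q_m8) + (n1 - n0)
    return (bit9 << 9) | (q_m8 << 8) | low, cnt


def tmds_decode(sym: int) -> int:
    """Invert tmds_encode for a video data symbol."""
    q_m = (~sym & 0xFF) if (sym >> 9) & 1 else (sym & 0xFF)
    xnor = not ((sym >> 8) & 1)
    d = q_m & 1
    for i in range(1, 8):
        bit = ((q_m >> i) ^ (q_m >> (i - 1))) & 1
        d |= (bit ^ (1 if xnor else 0)) << i
    return d
```

Each channel needs a popcount, an XOR/XNOR chain, and a small signed disparity counter per pixel, which gives a rough idea of the logic involved before counting LUTs.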
http://www.latticestore.com/products/tabid/417/categoryid/59/productid/6117/searchid/1/searchvalue/ice40hx8k-b-evn/default.aspx
That, or the icoboard: https://shop.trenz-electronic.de/de/Produkte/OnSite-Broadcast/ but that starts to get expensive at 100-odd euros.
I'll see who else is interested down here
*deep_sigh!*
At least for those Lattice devices.
I think this is all mind bendingly amazing.
Anyone interested in HDL and or the history of such things might be interested/amused by this article: http://athena.ecs.csus.edu/~changw/class_docs/VerilogManual/cooley.html
http://www.deepchip.com/items/0569-01.html
Obviously this is for ASIC designs, and you can ignore it.
"Like real life, it's impossible to create accurate engineering design schedules."
I thought it was just me stumbling over that all the time...
Enjoy!
Mike
Thanks for posting that, KeithE. I'll read that again as we get ready for synthesis.
No problem. He explains the motivations pretty clearly, so you can see what might apply in your case. It is a pain!
Thanks for the link to deepchip.