FPGA-based soft-CPU (distant relative of the COG)
nutson
For experiments with my FPGA board (DE-1, Altera 2C20) I wanted a soft-CPU that is easy to program. I have built a CPU that can execute a (very) limited set of COG instruction codes, so I can use the Propeller IDE to write assembler programs. It has a 256 x 32-bit program memory; addresses above $FF are reserved for I/O ports. All instructions take 2 clocks: clock 1 fetches the instruction and writes back the previous result, clock 2 fetches the data and calculates. The CPU is currently less than 100 lines of Verilog code, the Propstick control and test environment is another 50 lines, and the FPGA development environment is Quartus 8.0 SP1.
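For readers who have not seen the Verilog, here is a rough behavioural sketch in Python of the two-clock scheme described above. It assumes the standard Propeller instruction layout (opcode in bits 31..26, then WZ, WC, WR and immediate flags, a 4-bit condition field, a 9-bit destination and a 9-bit source), supports only MOV, ADD, SUB and a plain JMP, ignores the condition field and flags, and uses a dictionary as a stand-in for the I/O ports at $100 and up. The class and helper names are made up for illustration; this is not the actual Verilog design.

```python
from dataclasses import dataclass

MASK32 = 0xFFFF_FFFF
IO_BASE = 0x100          # $100..$1FF go to I/O, below that to the 256x32 RAM
OP_JMPRET, OP_ADD, OP_SUB, OP_MOV = 0x17, 0x20, 0x21, 0x28


@dataclass
class Fields:
    opcode: int  # bits 31..26
    z: int       # bit 25 (WZ)
    c: int       # bit 24 (WC)
    r: int       # bit 23 (WR: write the result back)
    i: int       # bit 22 (source is an immediate)
    cond: int    # bits 21..18 (ignored in this sketch)
    dest: int    # bits 17..9
    src: int     # bits 8..0


def decode(word: int) -> Fields:
    return Fields((word >> 26) & 0x3F, (word >> 25) & 1, (word >> 24) & 1,
                  (word >> 23) & 1, (word >> 22) & 1, (word >> 18) & 0xF,
                  (word >> 9) & 0x1FF, word & 0x1FF)


def encode(op: int, dest: int, src: int, imm: bool = False,
           wr: bool = True, cond: int = 0xF) -> int:
    """Build an instruction word (cond 0xF = always execute)."""
    return ((op << 26) | (int(wr) << 23) | (int(imm) << 22)
            | (cond << 18) | (dest << 9) | src)


class TwoPhaseCPU:
    def __init__(self, program):
        self.ram = (list(program) + [0] * 256)[:256]
        self.io = {}            # stand-in for the I/O ports
        self.pc = 0
        self.ir = None          # decoded instruction being executed
        self.pending = None     # (address, value) waiting for write-back
        self.phase = 1

    def read(self, addr):
        return self.ram[addr] if addr < IO_BASE else self.io.get(addr, 0)

    def write(self, addr, value):
        if addr < IO_BASE:
            self.ram[addr] = value
        else:
            self.io[addr] = value

    def clock(self):
        if self.phase == 1:
            # Clock 1: fetch the instruction, write back the previous result.
            # The fetch happens before the write, so an instruction that patches
            # the very next instruction is not seen in this model.
            self.ir = decode(self.ram[self.pc])
            if self.pending is not None:
                self.write(*self.pending)
                self.pending = None
            self.pc = (self.pc + 1) & 0xFF
            self.phase = 2
        else:
            # Clock 2: fetch the operands and calculate.
            f = self.ir
            s = f.src if f.i else self.read(f.src)
            d = self.read(f.dest)
            result = None
            if f.opcode == OP_MOV:
                result = s
            elif f.opcode == OP_ADD:
                result = (d + s) & MASK32
            elif f.opcode == OP_SUB:
                result = (d - s) & MASK32
            elif f.opcode == OP_JMPRET:   # plain JMP only, no return address saved
                self.pc = s & 0xFF
            if result is not None and f.r:
                self.pending = (f.dest, result)
            self.phase = 1


program = [
    encode(OP_MOV, 0x10, 5, imm=True),            # mov $10, #5
    encode(OP_ADD, 0x10, 3, imm=True),            # add $10, #3
    encode(OP_MOV, 0x100, 0x10),                  # mov $100, $10 (I/O write)
    encode(OP_JMPRET, 0, 3, imm=True, wr=False),  # jmp #3 (loop here)
]
cpu = TwoPhaseCPU(program)
for _ in range(8):
    cpu.clock()
print(cpu.ram[0x10], cpu.io[0x100], cpu.pc)   # 8 8 3
```

Four instructions take eight clocks, and each result only lands in memory during the fetch clock of the following instruction, which is the overlap described in the post.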
A Propstick SPIN program downloads the soft_CPU program, sets the clock mode and, in manual clock mode, displays internal CPU states using PropTerminal (very useful program, Andy). After ironing out the logic bugs this way, I ran the thing at speed and have hit 80 MHz (40 MIPS) with the simple test program shown already. That is far beyond expectation; there must be some nasty timing bugs waiting for me and the logic analyser.
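The 40 MIPS figure follows directly from the fixed 2-clock instruction time. For comparison, a real Propeller cog takes 4 clocks for most instructions, so it does 20 MIPS at the same 80 MHz. A trivial check:

```python
def mips(clock_hz: float, clocks_per_instruction: int) -> float:
    """Throughput of a CPU where every instruction takes a fixed number of clocks."""
    return clock_hz / clocks_per_instruction / 1e6


print(mips(80e6, 2))   # soft-CPU, 2 clocks per instruction: 40.0
print(mips(80e6, 4))   # Propeller cog, 4 clocks for most instructions: 20.0
```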
Help needed: Any experienced Verilog/Quartus users out there? I am a beginner with Quartus, and further steps are beyond my current knowledge. Who wants to cooperate and help me with:
- optimizing speed and reducing logic cell usage;
- extending the instruction set: I need CALL/RTN, indexed or indirect addressing, and some more;
- specifying the I/O structure: ideally this would be a Wishbone interface, opening up access to all the www.opencores.org IP.
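On the CALL/RTN point: on a real COG these are not separate opcodes. `call #sub` assembles to `jmpret sub_ret, #sub`, which stores the return address (PC+1) in the 9-bit source field of the instruction at `sub_ret` and jumps to `sub`; the `ret` at `sub_ret` is just a `jmp #0` whose source field has been patched. Indirect and indexed addressing are done the same way, by patching fields with MOVS/MOVD. So if the soft-CPU stays COG compatible, implementing JMPRET, MOVS and MOVD might cover both wishes. A Python model of these field-patching operations (the function names and the memory list are illustrative only):

```python
S_FIELD = 0x1FF          # instruction bits 8..0
D_SHIFT = 9              # destination field is bits 17..9


def movs(mem: list, dest: int, value: int) -> None:
    """MOVS: write value[8..0] into the source field of mem[dest]."""
    mem[dest] = (mem[dest] & ~S_FIELD) | (value & S_FIELD)


def movd(mem: list, dest: int, value: int) -> None:
    """MOVD: write value[8..0] into the destination field of mem[dest]."""
    mask = S_FIELD << D_SHIFT
    mem[dest] = (mem[dest] & ~mask) | ((value & S_FIELD) << D_SHIFT)


def jmpret(mem: list, pc: int, ret_reg: int, target: int) -> int:
    """JMPRET as used for CALL: save PC+1 in the source field of mem[ret_reg],
    then return the new PC."""
    movs(mem, ret_reg, pc + 1)
    return target & S_FIELD


# jmp #0, condition "always", no write-back: what 'ret' assembles to.
RET = (0x17 << 26) | (1 << 22) | (0xF << 18)

mem = [0] * 512
mem[0x20] = RET                      # sub_ret
pc = jmpret(mem, 0x05, 0x20, 0x18)   # call #sub from $05, sub starts at $18
print(hex(pc), hex(mem[0x20] & S_FIELD))   # 0x18 0x6
```

Executing the patched `ret` then jumps back to $06. The scheme cannot recurse, which is usually acceptable for small I/O controller code.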
Creating a fully specced COG is not my goal. I want a simple controller to perform I/O tasks, for example an FPGA DDS synthesizer with the soft-CPU performing amplitude and frequency modulation.
Drop me a PM, tell me what you want, and I will send you the code (and some documentation, if I can find the time in the next few days).
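For the DDS example, the usual structure is a phase accumulator plus a sine lookup table; the soft-CPU would then only need to update a frequency tuning word (for FM) and an amplitude register (for AM). A behavioural sketch in Python, assuming a 32-bit accumulator, a 1024-entry 16-bit sine table and a Q16 amplitude scale; the register names, widths and the 50 MHz sample clock are all hypothetical:

```python
import math

ACC_BITS = 32            # phase accumulator width
LUT_BITS = 10            # 1024-entry sine table
SINE = [round(32767 * math.sin(2 * math.pi * k / (1 << LUT_BITS)))
        for k in range(1 << LUT_BITS)]


def tuning_word(f_out: float, f_clk: float) -> int:
    """Frequency tuning word: f_out = ftw * f_clk / 2**ACC_BITS."""
    return round(f_out * (1 << ACC_BITS) / f_clk)


class DDS:
    def __init__(self):
        self.phase = 0
        self.ftw = 0          # written by the CPU; changing it gives FM
        self.amplitude = 0    # Q16 scale (65536 = full), written by the CPU for AM

    def step(self) -> int:
        """One sample clock: advance the phase, look up the sine, scale it."""
        self.phase = (self.phase + self.ftw) & ((1 << ACC_BITS) - 1)
        index = self.phase >> (ACC_BITS - LUT_BITS)   # top bits address the table
        return (SINE[index] * self.amplitude) >> 16


dds = DDS()
dds.ftw = tuning_word(1e6, 50e6)   # 1 MHz out of a hypothetical 50 MHz sample clock
dds.amplitude = 1 << 15            # half amplitude
samples = [dds.step() for _ in range(8)]
```

In the FPGA the accumulator and table would run every clock in plain logic, so the CPU only has to write `ftw` and `amplitude` through its I/O ports at the modulation rate.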
Nico Hattink
PS: There is one clear bug showing in the single-step debug data: who is the first to spot it?
Comments
If my memory is correct, Chip is designing the Prop II using an Altera Stratix III to verify the logic.
Post Edited (Cluso99) : 11/7/2008 12:27:50 PM GMT
nutson
Great job, looks very promising...
Is the "clear error" that the "instruct" for the first 3 steps is wrong? After step 3, the PC matches the instruction, but steps 0, 1 and 2 use a different value...
Hanno
Cluso99: I have a good 500 MHz logic analyzer, and absolutely need this speed to eventually track down timing problems or glitches, or to analyze the worst-case timing path in this CPU, which is already clocked at 80 MHz. Quartus has many facilities for analyzing timing during simulation and execution, but I have not mastered all of these.
Nico
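Some rough numbers behind that: at 80 MHz the clock period is 12.5 ns and a 500 MHz analyzer samples every 2 ns, so it gets about six samples per CPU clock. The slack of a path is, to a first approximation, the clock period minus the path delay; the 11 ns delay below is a made-up figure for illustration, not a Quartus result.

```python
def period_ns(freq_hz: float) -> float:
    return 1e9 / freq_hz


def slack_ns(clock_hz: float, path_delay_ns: float) -> float:
    """Simplified setup slack: ignores register setup time and clock skew."""
    return period_ns(clock_hz) - path_delay_ns


def fmax_mhz(path_delay_ns: float) -> float:
    """Highest clock the worst-case path allows (same simplification)."""
    return 1e3 / path_delay_ns


clock = period_ns(80e6)                       # 12.5 ns
sample = period_ns(500e6)                     # 2.0 ns
print(clock / sample)                         # 6.25 samples per CPU clock
print(slack_ns(80e6, 11.0), fmax_mhz(11.0))   # hypothetical 11 ns worst-case path
```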
I started reading up on VHDL a while ago with a view to trying to create a COG as well. So far I can't justify the outlay for an FPGA board.
The idea was that if one creates a COG (or indeed the whole Prop) in VHDL, then not only do you have a CPU for an FPGA, but that code can also be run under GHDL (ghdl.free.fr) on Windows or Linux, and so gives you a cycle-accurate simulator for free!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I have had one reaction to my call for help, with the request to go for the full Prop instruction set. The advantages would be the soft-CPU potentially being able to execute SPIN programs and drivers from OBEX. I am considering this; now that I have a basic CPU running, the goal seems less far away.
Nico Hattink
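One part of the full instruction set that is cheap in hardware is conditional execution. Every COG instruction carries a 4-bit condition field (bits 21..18), and each bit corresponds to one combination of the C and Z flags, so the "execute or not" decision is a single 16-to-1 selection. A Python sketch with a subset of the assembler's condition names:

```python
# Condition field values (instruction bits 21..18), named as in Propeller assembly.
IF_NEVER = 0b0000
IF_NC_AND_NZ = 0b0001
IF_NC = 0b0011
IF_NZ = 0b0101
IF_C_AND_Z = 0b1000
IF_Z_EQ_C = 0b1001
IF_Z = 0b1010
IF_C = 0b1100
IF_ALWAYS = 0b1111


def executes(cond: int, c: int, z: int) -> bool:
    """Bit 3: C&Z, bit 2: C&!Z, bit 1: !C&Z, bit 0: !C&!Z; select the bit with {C, Z}."""
    return bool((cond >> ((c << 1) | z)) & 1)


for name, cond in [("IF_C", IF_C), ("IF_NZ", IF_NZ), ("IF_Z_EQ_C", IF_Z_EQ_C)]:
    print(name, [executes(cond, c, z) for c in (0, 1) for z in (0, 1)])
```

In Verilog this would be a one-line bit select, something like `cond[{c, z}]`.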
Of course, if you have some instructions working already, people would like to spur you on to complete the set. And then multiple COGs, and the HUB, and the I/O, and timers, and ....
I will be watching your progress with great interest. I'm curious to find out what is the smallest cheapest FPGA a single COG will fit into. And then the HUB etc etc
Re: Running SPIN. The Spin byte code interpreter has been published in these forums, so it seems quite doable. Hope Parallax does not mind too much.
I like your idea about the Wishbone interconnect. Prop with USB, anyone?
Nico Hattink
How have you used up 283 pins?
Perhaps you could also add the multiply instruction without much ado; it's a shame to waste all those free multipliers.
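If a multiply instruction is added, each of the Cyclone II's embedded 18x18 multipliers can take a 16x16 product. The low 32 bits of a 32x32 product need only three such partial products, because the high-by-high term lands entirely above bit 31. A Python sketch of that decomposition (the function name is made up, and whether the instruction returns the low or high word is left open here):

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def mul32_lo(a: int, b: int) -> int:
    """Low 32 bits of a 32x32 multiply built from three 16x16 partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    # a_hi * b_hi only affects bits 32 and up, so it is not needed for the low word.
    cross = (a_hi * b_lo + a_lo * b_hi) & MASK16
    return (a_lo * b_lo + (cross << 16)) & MASK32


print(mul32_lo(0xFFFF_FFFF, 0xFFFF_FFFF))   # 1, i.e. (-1) * (-1) as signed values
```

In two's complement the low word is the same for signed and unsigned operands, so one instruction covers both; a high-word result would need the fourth partial product and sign handling.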
I started with FPGAs with this module http://www.elektor.com/magazines/2006/march/versatile-fpga-module.58036.lynkx mainly because it came with a 10-part course and design examples in VHDL. I lost interest when, already in the third example, an 8051 microprocessor was used as a controller with (for me) incomprehensible C and assembly programs, and no toolchain for this software. When I browsed the Terasic design examples some time later and found their Verilog code examples quite readable, I jumped onto Verilog. My feeling is that when you have done digital design at the gate and flip-flop level (7400 series), Verilog is more readable and understandable: you can sort of visualize in a timing diagram what happens with Verilog statements.
http://www.em.avnet.com/evk/home/0,1707,RID%3D0%26CID%3D46501%26CCD%3DUSA%26SID%3D32214%26DID%3DDF2%26LID%3D32232%26PRT%3D0%26PVW%3D%26BID%3DDF2%26CTP%3DEVK,00.html
I am finding Verilog much simpler than VHDL (I am only a beginner). Here is a good intro to Verilog: http://www.asic-world.com/verilog/veritut.html
Chip is using an Altera Stratix III for modelling the Prop II. Presumably also for the 64 I/O Prop I update.
I have heard it said that Verilog is more popular amongst hardware types and VHDL amongst softies. As a softie, VHDL looks good to me, but the difficulty I found was that it is a very big language and that a lot of what you may naturally want to write cannot be synthesized into an actual device. Many VHDL books and online tutorials don't emphasize this much, so one ends up somewhat overwhelmed and confused.
This was rectified by the discovery of "Circuit Design With VHDL" by Volnei Pedroni which concentrates on practical circuits, has lots of examples and is clear as a bell.
The next hurdle is VHDL's strict type checking, which is easily sorted with a quick read through www.synthworks.com/papers/vhdl_math_tricks_mapld_2003.pdf
Another concise practical intro is ece.gmu.edu/courses/ECE545/viewgraphs_F04/loCarb_VHDL_small.pdf
I remember looking over that Elektor series and thinking it was a bit unworkable. Just now I'm drawn by the boards used here www.fpga4fun.com/ and available here www.knjn.com/
I'm eagerly awaiting Leon's Prop/CPLD board.
Does anyone happen to know where to find the Altera Cyclone serial programming protocol?
Let's say some mad guy wanted to hang a Cyclone off his Propeller and have the Prop configure it from an SD card file.
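As far as I know Altera calls this "passive serial" configuration and documents it in the Cyclone device handbook: pulse nCONFIG low, wait for nSTATUS to go high, clock the .rbf file into DATA0 on rising edges of DCLK, least significant bit of each byte first, then check CONF_DONE and give some extra clocks for initialization. Please check the handbook for exact timing and the required number of initialization clocks (the default below is a placeholder). The sketch is Python rather than Spin, and `set_pin`/`get_pin` are hypothetical callbacks standing in for whatever drives the Prop's pins.

```python
from typing import Callable, Iterable, Iterator


def ps_bits(data: Iterable[int]) -> Iterator[int]:
    """Serialize configuration bytes LSB first, as passive-serial mode expects."""
    for byte in data:
        for bit in range(8):
            yield (byte >> bit) & 1


def configure(data: bytes,
              set_pin: Callable[[str, int], None],
              get_pin: Callable[[str], int],
              extra_clocks: int = 10) -> bool:
    """Hypothetical passive-serial sequence; pin I/O is delegated to callbacks."""
    set_pin("DCLK", 0)
    set_pin("nCONFIG", 0)                # start a new configuration
    set_pin("nCONFIG", 1)
    while not get_pin("nSTATUS"):        # real code should time out here
        pass
    for bit in ps_bits(data):
        set_pin("DATA0", bit)
        set_pin("DCLK", 1)               # device samples DATA0 on the rising edge
        set_pin("DCLK", 0)
    if not get_pin("CONF_DONE"):
        return False                     # device did not accept the bitstream
    for _ in range(extra_clocks):        # initialization clocks; count is a placeholder
        set_pin("DCLK", 1)
        set_pin("DCLK", 0)
    return True
```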
I ought to finish that off. I'm rather preoccupied with the XMOS chips, though.
BTW, an XMOS chip can probably emulate Propeller cogs in software faster than the real thing, and a lot faster than an FPGA.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle
100 MIPS divided by 20 MIPS is 5 instructions:
- 1st instruction: read instruction
- 2nd instruction: AND out the bits for the jump table
- 3rd instruction: jump
- 4th instruction: AND out the source field
- 5th instruction: read source
Nope, definitely can't emulate it faster than the real thing :P
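The arithmetic behind that: a 100 MIPS host has 100 / 20 = 5 host instructions for every 20 MIPS cog instruction, and the five dispatch steps listed above use all of them before any execute, flag update or write-back happens. A quick check (the step list just restates the post):

```python
# Baggers' dispatch sequence for one emulated cog instruction.
DISPATCH_STEPS = (
    "read instruction",
    "AND out the bits for the jump table",
    "jump",
    "AND out the source field",
    "read source",
)


def spare_host_instructions(host_mips: float, target_mips: float,
                            overhead: int = len(DISPATCH_STEPS)) -> float:
    """Host instructions left per emulated instruction after dispatch overhead."""
    return host_mips / target_mips - overhead


print(spare_host_instructions(100, 20))   # 0.0: nothing left to execute the instruction
```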
Baggers
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/ home of the PropGFX Lite
Leon
Post Edited (Leon) : 11/9/2008 5:13:18 PM GMT
Leon
Edit: Maybe not? www.xlinkers.org/forum/viewtopic.php?f=3&t=127&p=658&hilit=mips#p660
Leon
Leon
Sharing data back and forth will take too long, but if you still feel that strongly about it, go for it :D and prove me wrong; I'd gladly eat humble pie.
Next killer is the RAM. If you want to emulate the COG memory for multiple COGs and the HUB RAM, well, it's just going to get stuck.
I can just see it now, running my 8080 emulator in an emulated Prop on an XS1!
We had better quit this talk of that "other" company before we get our wrists slapped again :)
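For scale, the memory state that would have to be emulated: each of the 8 cogs has 512 longs of cog RAM, and the hub has 32 KB of RAM (its 32 KB of ROM is not counted here).

```python
COGS = 8
COG_LONGS = 512          # 32-bit registers per cog
HUB_RAM_BYTES = 32 * 1024

cog_ram_bytes = COGS * COG_LONGS * 4
total_bytes = cog_ram_bytes + HUB_RAM_BYTES
print(cog_ram_bytes, total_bytes)   # 16384 49152
```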
Surely the FPGA implementation could get up to Prop speed. At least for a single COG.
And why should an XMOS be faster than an FPGA? With an FPGA you can do in parallel everything that has to be decoded sequentially on a CPU.
There are a lot of 32-bit CPU designs for FPGAs, and they run at up to 100 MIPS and more.
Andy