i8080 Verilog implementations and on how to design a microprocessor in a HDL
Ale
Posts: 2,363
I keep an eye on that nice site zeptobars, and recently they published a Verilog implementation (I'd say almost gate for gate) of the KR580VM80 (which was also recently reverse-engineered). Of course I had to give it a try, using my beloved Diamond and targeting the MachXO2:
Design Summary
   Number of registers:     448 out of  1604 (28%)
      PFU registers:          433 out of  1280 (34%)
      PIO registers:           15 out of   324 (5%)
   Number of SLICEs:        406 out of   640 (63%)
      SLICEs as Logic/ROM:    406 out of   640 (63%)
      SLICEs as RAM:            0 out of   480 (0%)
      SLICEs as Carry:         14 out of   640 (2%)
   Number of LUT4s:         808 out of  1280 (63%)
      Number of logic LUTs:        780
      Number of distributed RAM:     0 (0 LUT4s)
      Number of ripple logic:       14 (28 LUT4s)
      Number of shift registers:     0
That is not that much.
For those who don't know or have forgotten, the 8080 needs two non-overlapping clocks:
   __        __
__/  \______/  \__
       ___
______/   \______/
Of course that poses some problems, because in an FPGA one shouldn't gate the clock, and two clock domains bring headaches. They used two clock enables and one clock to solve the problem. Very nice.
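To make the clock-enable idea concrete, here is a minimal sketch of how two non-overlapping phases can be replaced by two enables on a single clock. This is only my illustration of the general technique: the module and signal names (`two_phase_enables`, `f1_en`, `f2_en`, `stage1`, `stage2`) are made up and are not taken from the published VM80 source, which may well generate its enables differently.

```verilog
// Sketch only: hypothetical names, not the VM80 code.
// One real clock; two enables that are never active in the same cycle
// stand in for the two non-overlapping 8080 clock phases.
module two_phase_enables (
    input  wire       clk,     // the only clock in the design
    input  wire       reset,   // synchronous reset
    input  wire [7:0] d,       // example data input
    output reg  [7:0] stage1,  // register that used to be clocked by phase 1
    output reg  [7:0] stage2   // register that used to be clocked by phase 2
);
    // Toggle flip-flop that selects which phase is active this cycle.
    reg phase;
    always @(posedge clk)
        if (reset) phase <= 1'b0;
        else       phase <= ~phase;

    wire f1_en = ~phase;  // "phase 1" enable
    wire f2_en =  phase;  // "phase 2" enable, never high together with f1_en

    // Both registers share clk; only the enable differs.
    always @(posedge clk)
        if (f1_en) stage1 <= d;

    always @(posedge clk)
        if (f2_en) stage2 <= stage1;
endmodule
```

Everything stays in one clock domain and nothing gates the clock net itself, so the timing analysis stays simple. The price in this particular sketch is that each phase only fires every second clock edge, so the logical machine runs at half of the `clk` frequency.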
But I just wanted to point out how tiny that implementation is! It is a bit slow too, getting 22 MHz on the slowest MachXO2 device (-1).
Another implementation (M8080), taken from opencores, needs about twice as many LUTs, but can be clocked a bit higher (26 MHz) and has a simpler I/O bus.
Design Summary
   Number of registers:     270 out of  7209 (4%)
      PFU registers:          233 out of  6864 (3%)
      PIO registers:           37 out of   345 (11%)
   Number of SLICEs:        790 out of  3432 (23%)
      SLICEs as Logic/ROM:    790 out of  3432 (23%)
      SLICEs as RAM:            0 out of  2574 (0%)
      SLICEs as Carry:        129 out of  3432 (4%)
   Number of LUT4s:        1579 out of  6864 (23%)
      Number of logic LUTs:       1321
      Number of distributed RAM:     0 (0 LUT4s)
      Number of ripple logic:      129 (258 LUT4s)
      Number of shift registers:     0
But what I wanted to talk about was how different most processor implementations are from the gate model (RTL, I think).
Why do I bring this up?
The thing is, I developed a 6809 in Verilog: http://opencores.org/project,6809_6309_compatible_core. It is not complete because half of it has not been written yet (6309). It took me months, and many of them. It works, needs loads of LUTs (~2100), and it is not microcoded. Fast? Some 20 MHz too.
I want to rewrite it. I see that for such a complicated processor I made many wrong decisions. I see that a behavioral model cannot provide a nice and small design, or that I am unable (yet) to write such code in a better way. Maybe I should go back to the drawing board and design a kind of gate model. That way I could maybe see why. The overwhelming number of different instructions makes the whole thing difficult. I could also call it a day, since it works, but I don't want to. From time to time someone sends me an email saying that this or that does not work; normally, what I didn't really test didn't work.
It is clunky, full of muxes, and tricks, and several kinds of signals...
The question is... how should one proceed when designing logic ?
Let's see the hottest item in town... Chip's Cog. Does it work? Is it small? And fast? And a beauty? And a work of art? Yes, yes, yes, yes, yes. If I compare it to what I wrote (also works, is fast, not a beauty, and also a work of art), I see that I am a toddler, or didn't spend as much time as I should have (which I really don't have either).
But... is it a behavioral model? Yes, both are. It seems one can get better RISC processors than CISC ones using this technique, or am I wrong?
I am studying this VM80 and trying to understand how it works... It doesn't have a big state machine per se, but uses many single-bit signals for different functions and has many clock enables, so to speak. Very interesting, I think.
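To show what I mean by single-bit signals plus clock enables instead of one big state machine, here is a tiny sketch. It is purely hypothetical: the names (`onehot_ctrl`, `t1`..`t3`, `is_mov`, `acc_load`) are invented for illustration and do not come from the VM80 source or from my 6809.

```verilog
// Sketch only: hypothetical names, not taken from any real core.
// Control is spread over single-bit signals; each register gets its own
// load enable built from a timing bit and a decode bit.
module onehot_ctrl (
    input  wire       clk,
    input  wire       reset,    // synchronous reset
    input  wire       is_mov,   // assumed: a pre-decoded instruction-class bit
    input  wire [7:0] bus_in,   // internal data bus
    output reg  [7:0] acc       // example target register
);
    // One-hot timing ring T1 -> T2 -> T3 -> T1 instead of an encoded state.
    reg t1, t2, t3;
    always @(posedge clk) begin
        if (reset) begin
            t1 <= 1'b1;
            t2 <= 1'b0;
            t3 <= 1'b0;
        end else begin
            t1 <= t3;
            t2 <= t1;
            t3 <= t2;
        end
    end

    // A single-bit control signal: one AND gate, no large case statement.
    wire acc_load = t3 & is_mov;

    // The control bit is used directly as the register's clock enable.
    always @(posedge clk)
        if (acc_load) acc <= bus_in;
endmodule
```

In a behavioral style the same thing would typically sit inside a large `case (state)` block that assigns many registers at once, which leaves the synthesizer to work out the muxes and enables. Whether the split-up style really gives a smaller design for something like a 6809 is exactly what I would like to find out.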