i8080 Verilog implementations, and how to design a microprocessor in an HDL

Ale · 2015-02-25 23:13

I keep an eye on that nice site zeptobars, and recently they published a Verilog implementation (I'd say almost gate for gate) of the KR580VM80 (which was also recently reverse-engineered). Of course I had to give it a try. Using my beloved Diamond and targeting the MachXO2:

Design Summary
   Number of registers:    448 out of  1604 (28%)
      PFU registers:          433 out of  1280 (34%)
      PIO registers:           15 out of   324 (5%)
   Number of SLICEs:       406 out of   640 (63%)
      SLICEs as Logic/ROM:    406 out of   640 (63%)
      SLICEs as RAM:            0 out of   480 (0%)
      SLICEs as Carry:         14 out of   640 (2%)
   Number of LUT4s:        808 out of  1280 (63%)
      Number of logic LUTs:      780
      Number of distributed RAM:   0 (0 LUT4s)
      Number of ripple logic:     14 (28 LUT4s)
      Number of shift registers:   0

That is not that much.

For those who don't know or have forgotten, the 8080 needs two non-overlapping clocks:

   __        __
__/  \______/  \__
       ___    
______/   \______/

Of course that poses some problems, because in an FPGA one shouldn't gate the clock, and two clock domains bring headaches. They used two clock enables and one clock to solve the problem. Very nice.
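
To make the idea concrete, here is a small cycle-level model in Python of how two non-overlapping phases can be replaced by two enables on one clock. It is only a sketch under my own assumptions: the names `phase_enables`, `EnabledReg` and `run_counter`, and the choice of dividing the clock into 4 slots, are hypothetical and not taken from the published core.

```python
def phase_enables(n_ticks, divide=4):
    """Return a list of (phi1_en, phi2_en) pairs, one per tick of the single clock.

    Assumed scheme: a counter splits the fast clock into `divide` slots.
    phi1_en is high in slot 0 and phi2_en in slot divide // 2, so the two
    enables are never high on the same tick.
    """
    if divide < 2:
        raise ValueError("divide must be at least 2 so the phases cannot coincide")
    pairs = []
    for tick in range(n_ticks):
        slot = tick % divide
        pairs.append((slot == 0, slot == divide // 2))
    return pairs


class EnabledReg:
    """A register clocked on every tick that only updates when enabled."""

    def __init__(self, value=0):
        self.q = value

    def clock(self, enable, d):
        if enable:
            self.q = d


def run_counter(n_ticks, divide=4):
    """Two-phase counter: master loads on phi1_en, slave loads on phi2_en."""
    master, slave = EnabledReg(), EnabledReg()
    for phi1_en, phi2_en in phase_enables(n_ticks, divide):
        # Sample the inputs first, then update, as a synchronous design would.
        next_master = slave.q + 1
        next_slave = master.q
        master.clock(phi1_en, next_master)
        slave.clock(phi2_en, next_slave)
    return slave.q


if __name__ == "__main__":
    for tick, (p1, p2) in enumerate(phase_enables(8)):
        print(tick, int(p1), int(p2))
    print("slave after 8 ticks:", run_counter(8))
```

Every register sees the same clock, so there is no gated clock and only one clock domain; the two "phases" exist only as enables that are never active together.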
But I just wanted to point out how tiny that implementation is! It is a bit slow too, getting 22 MHz on the slowest MachXO2 device (-1).

Another implementation (M8080), taken from OpenCores, needs about twice as many LUTs, but can be clocked a bit higher (26 MHz) and has a simpler I/O bus.

Design Summary
   Number of registers:    270 out of  7209 (4%)
      PFU registers:          233 out of  6864 (3%)
      PIO registers:           37 out of   345 (11%)
   Number of SLICEs:       790 out of  3432 (23%)
      SLICEs as Logic/ROM:    790 out of  3432 (23%)
      SLICEs as RAM:            0 out of  2574 (0%)
      SLICEs as Carry:        129 out of  3432 (4%)
   Number of LUT4s:        1579 out of  6864 (23%)
      Number of logic LUTs:      1321
      Number of distributed RAM:   0 (0 LUT4s)
      Number of ripple logic:    129 (258 LUT4s)
      Number of shift registers:   0

But what I wanted to talk about was how different most processor implementations are from the gate model (RTL, I think).
Why do I go here?
The thing is, I developed a 6809 in Verilog (http://opencores.org/project,6809_6309_compatible_core). It is not complete because half of it has not been written yet (the 6309). It took me months, and many of them. It works, needs loads of LUTs (~2100), and it is not microcoded. Fast? Some 20 MHz too.
I want to rewrite it. I see that for such a complicated processor I made many wrong decisions. I see that a behavioral model cannot provide a nice and small design, or rather that I am unable (yet) to write such code. Maybe I should go back to the drawing board and design a kind of gate model. That way I could maybe see why. The overwhelming number of different instructions makes the whole thing difficult. I could also call it a day, since it works, but I don't want to. From time to time someone sends me an email saying this and that do not work; normally, whatever I didn't really test didn't work.
It is clunky, full of muxes, and tricks, and several kinds of signals...

The question is... how should one proceed when designing logic ?

Let's see the hottest item in town... Chip's Cog. Does it work? Is it small? And fast? And a beauty? And a work of art? Yes, yes, yes, yes, yes. If I compare it to what I wrote (also works, it is fast, not a beauty and also a work of art) I see that I am a toddler, or didn't spend as much time as I should have (time that I really do not have either).
But... is it a behavioral model? Yes, both are. It seems one can get better RISC processors than CISC ones using this technique, or am I wrong?

I am studying this VM80 and trying to understand how it works... it doesn't have a big state machine per se, but uses many single-bit signals for different functions and has many clock enables, so to speak. Very interesting, I think.
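
As a rough illustration of that style (single-bit signals plus clock enables instead of one big encoded state machine), here is a tiny Python model of a one-hot group of timing flags. This is not the VM80's actual logic; the function name `step_timing`, the flag list, and the `last` input are all hypothetical and only show the general idea.

```python
def step_timing(t_bits, clk_en, last=False):
    """Advance a one-hot group of single-bit timing flags by one clock tick.

    t_bits : list of bools with exactly one True (hypothetical T1, T2, ... flags)
    clk_en : when False, every flag holds its value (clock-enable behaviour)
    last   : when True, the active flag wraps back to the first one early
    """
    if t_bits.count(True) != 1:
        raise ValueError("t_bits must be one-hot")
    if not clk_en:
        return list(t_bits)
    active = t_bits.index(True)
    wrap = last or active == len(t_bits) - 1
    nxt = 0 if wrap else active + 1
    return [i == nxt for i in range(len(t_bits))]


if __name__ == "__main__":
    t = [True, False, False, False]
    # Enabled, enabled, held, enabled with an early wrap.
    for clk_en, last in [(True, False), (True, False), (False, False), (True, True)]:
        t = step_timing(t, clk_en, last)
        print([int(b) for b in t])
```

Each flag is its own wire, so other logic can simply AND the flag it cares about with an enable instead of decoding a state number. Whether this ends up smaller than an encoded state machine depends on the design.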