Shop OBEX P1 Docs P2 Docs Learn Events
What's wrong with my HDL FIFO? — Parallax Forums

What's wrong with my HDL FIFO?

After a long hiatus I was getting back into my processor design for FPGA and built a UART for it. Then I decided the UART could do with a FIFO on the receiver and possibly on the transmitter. Sounds simple enough but it took me all day to get the logic right and the test harness in shape.

Now, I'm hoping this is a rhetorical question because as far as I can tell my FIFO works just fine. Under simulation at least. Bytes go in, bytes come out, in the right order, it says FULL when it's full and EMPTY when it's empty. No matter what I throw at it.

But then I started looking at Verilog FIFO designs around the net. Wow, they can be very long winded, complicated and hard to follow. Mine is really short and sweet. That made me start thinking I had missed an important point somewhere.

Here some examples of what look like over complex, tortuous, solutions:
https://www.fpga4student.com/2017/01/verilog-code-for-fifo-memory.html
https://www.quora.com/What-is-the-Verilog-code-for-synchronous-FIFO

The latter comes from an Intel engineer, and as far as I can tell is taking twice as much code to do exactly the same as mine. Perhaps that's why Intel processors need so many transistors and run so hot :)

So what is wrong with my FIFO? What am I missing here?

Here is my FIFO in SpinalHDL:
class Fifo(width: Int, depth: Int) extends Component {

  val addressWidth = (scala.math.log(depth) / scala.math.log(2)).toInt

  val io = new Bundle {
    val dataIn = in UInt (width bits)
    val dataOut = out UInt (width bits)
    val read = in Bool
    val write = in Bool
    val full = out Bool
    val empty = out Bool
  }

  val mem = Mem(Bits(width bits), wordCount = depth)
  val head = Reg(UInt (addressWidth  bits )) init (0)
  val tail = Reg(UInt (addressWidth  bits)) init (0)
  val full = Reg(Bool) init False
  val empty = Reg(Bool) init True

  mem.write(head, io.dataIn.asBits, !full & io.write)
  io.dataOut := U(mem.readAsync(tail))

  when (io.write && !io.read) {
    when (!full) {
      head := head + 1
      full := ((head + 1) === tail)
      empty := False
    }
  }

  when (!io.write && io.read) {
    when (!empty) {
      tail := tail + 1
      empty := (tail + 1  === head)
      full := False
    }
  }

  when (io.write & io.read) {
    when (full) {
      tail := tail + 1
      full := False
    }
    when (empty) {
      head := head + 1
      empty := False
    }
    when (!full & !empty) {
      tail := tail + 1
      head := head + 1
    }
  }

  io.empty := empty
  io.full := full
}
How simple could that be? In fact I think it's one of the most beautiful pages of code I have ever written. Never mind the specifics of the Spinal syntax, it has a certain poetic symmetry about it. That generates the following Verilog, longer and less pretty but still has the poetic symmetry:
module Fifo (
      input  [7:0] io_dataIn,
      output [7:0] io_dataOut,
      input   io_read,
      input   io_write,
      output  io_full,
      output  io_empty,
      input   clk,
      input   reset);
  wire [7:0] _zz_1;
  wire [6:0] _zz_2;
  wire [6:0] _zz_3;
  wire [7:0] _zz_4;
  wire  _zz_5;
  reg [6:0] head;
  reg [6:0] tail;
  reg  full;
  reg  empty;
  reg [7:0] mem [0:127];
  assign _zz_2 = (head + (7'b0000001));
  assign _zz_3 = (tail + (7'b0000001));
  assign _zz_4 = io_dataIn;
  assign _zz_5 = ((! full) && io_write);
  always @ (posedge clk) begin
    if(_zz_5) begin
      mem[head] <= _zz_4;
    end
  end

  assign _zz_1 = mem[tail];
  assign io_dataOut = _zz_1;
  assign io_empty = empty;
  assign io_full = full;
  always @ (posedge clk or posedge reset) begin
    if (reset) begin
      head <= (7'b0000000);
      tail <= (7'b0000000);
      full <= 1'b0;
      empty <= 1'b1;
    end else begin
      if((io_write && (! io_read)))begin
        if((! full))begin
          head <= (head + (7'b0000001));
          full <= (_zz_2 == tail);
          empty <= 1'b0;
        end
      end
      if(((! io_write) && io_read))begin
        if((! empty))begin
          tail <= (tail + (7'b0000001));
          empty <= (_zz_3 == head);
          full <= 1'b0;
        end
      end
      if((io_write && io_read))begin
        if(full)begin
          tail <= (tail + (7'b0000001));
          full <= 1'b0;
        end
        if(empty)begin
          head <= (head + (7'b0000001));
          empty <= 1'b0;
        end
        if(((! full) && (! empty)))begin
          tail <= (tail + (7'b0000001));
          head <= (head + (7'b0000001));
        end
      end
    end
  end

endmodule
I know, way off topic for this forum. But I know there are some HDL gurus around here and I blame Chip for setting me off down this road.

«1

Comments

  • KeithEKeithE Posts: 957
    edited 2018-11-09 05:27
    I don't want to take a lot of time to review this, but this part:

    io.dataOut := U(mem.readAsync(tail))
    ...
    assign _zz_1 = mem[tail];
    assign io_dataOut = _zz_1;

    Indicates that you are doing async reads. I'm not sure what device you are targeting, but it may mean that this gets implemented with a bunch of registers. This is probably not what you want? Often times RAM reads are clocked. e.g. see sysMEM in Memory Usage Guide for iCE40 Devices

    But yeah a synchronous FIFO shouldn't be too complicated. It's the asynchronous ones that get interesting.

    Also one funny thing that I noticed recently when trying to understand things that other people wrote, is that head and tail are ambiguous terms. (I read a specification where they used inconsistently.) I would use something like wrPtr and rdPtr now. See Head_or_tail_first
  • Heater.Heater. Posts: 21,230
    KeithE,

    Thanks for taking a quick peek at it. For sure don't waste much time on it.

    I had not really looked into what Spinal does with readAsyn and readSync. Just now I changed it to Sync as an experiment. All that happens is that the memory read gets clocked out:
    Spinal:
        mem.readSync:
    
    Verilog:
        assign io_dataOut = _zz_2;
        assign _zz_1 = 1'b1;
        always @ (posedge clk) begin
            if(_zz_1) begin
            _zz_2 <= mem[tail];
            end
        end
    
    That causes all my test bench to fail as the data is now one clock late.

    Also as an experiment I dropped it into my xoro project, with a picorv32. In both cases Quartus synthesized it into SYNC_RAM. Which surprised me as I had a devil of a job getting Quartus to use memory blocks for the picorv32's RAM, ended up having to use the Quartus memory wizard thing to do it for me.

    In this case I might be happier if the memory was a bunch of LUTs. I had to reduce my picorv32 RAM from 64K bytes to 40K bytes to stop Quartus complaining that it can't find enough RAM blocks. Not sure why it want's to eat so much for my 32 stage FIFO.

    Ah, yes, I had not considered asynchronous FIFO's. Dealing with different clock domains is out of my league at the moment.

    That's interesting about the ambiguity of "head" and "tail". Having created many software FIFO's over the years and seen many others I have never had anyone argue that point in reviews. I have never even thought about it.

    I'm going to leave as is. Because that is the way I have always seen it. Because, apparently, that is how the Linux kernel does it so everyone else must be wrong.

    Perversely I could say that what we have here is queue of empty space. So "head" indicates the first free space slot to be consumed from the queue :)
  • Yes I’m in agreement with the Linux usage too. Whew!

    We used some 8051 IP where the internal data RAM had to support async reads. No way to configure it as sync. It would have impacted performance. But it was only 128 bytes, so we could live with a regfile.
  • Heater.Heater. Posts: 21,230
    I was thinking about this "head" vs "tail" ambiguity.

    Strangely enough I suspect that if one asked me to implement a "FIFO" in software, or "cyclic buffer" as they have been known, I would use the terms "head" and "tail" as I have here.

    On the other hand if the requirement was for a "queue" then I might use "head" and "tail" the other way round.

    How so?

    Because "FIFO" and "cyclic buffer" are sort of hardware guys way to look at it and the terms those guys would use. You don't by a "queue" chip you buy a "FIFO" chip. For example: https://www.digikey.com/products/en/integrated-circuits-ics/logic-fifos-memory/707

    But "queue" is from the computer science world where queue's consume data from the "head" and add it to the "tail".

    It's all the same idea of course, the confusion comes when hardware guys meet software guys in the middle, in the FPGA world.

    Anyway, my problem now is that my serial receiver works fine with my picorv32 core in an FPGA. My receiver + FIFO works fine in Verilator simulation. But when I put the receiver + FIFO in FPGA all kinds of garbage comes out.

    This all gets too big and complex to discuss here. I have to fight with it some more....
  • Heater.Heater. Posts: 21,230
    Oh shoot. This "head"/"tail" thing is worse than it seems.

    In the esoteric programming world of Haskell we have this:
    >head [5,4,3,2,1]
    5
    
    So "head" is the first element in a list.
    >tail [5,4,3,2,1]  
    [4,3,2,1]
    
    So "tail" is everything else!
    >last [5,4,3,2,1]  
    1   
    
    So "last" is the one on the end. Fair enough.
    >init [5,4,3,2,1]  
    [5,4,3,2]
    
    Which everything before "last". Hmm..OK

    So "head" is the opposite to "last" and "tail" is the opposite to "init"

    Confused yet?

    OK, given we are talking about a FIFO here, as in "first in, first out", we should not use "head" and "tail" but rather "first" and "last".

    "first" is the position things go out of the FIFO, they were first in.

    "last" is the position things go in to the FIFO, they were last in.

    That would at least remove the "head"/"tail" ambiguity. At risk of confusing everyone because they had never seen these things named like that before.

    Perhaps I worry about trivia too much....



  • ElectrodudeElectrodude Posts: 1,657
    edited 2018-11-09 22:44
    Heater. wrote: »
    Oh shoot. This "head"/"tail" thing is worse than it seems.

    In the esoteric programming world of Haskell we have this:
    >head [5,4,3,2,1]
    5
    
    So "head" is the first element in a list.
    >tail [5,4,3,2,1]  
    [4,3,2,1]
    
    So "tail" is everything else!
    >last [5,4,3,2,1]  
    1   
    
    So "last" is the one on the end. Fair enough.
    >init [5,4,3,2,1]  
    [5,4,3,2]
    
    Which everything before "last". Hmm..OK

    So "head" is the opposite to "last" and "tail" is the opposite to "init"

    Confused yet?

    OK, given we are talking about a FIFO here, as in "first in, first out", we should not use "head" and "tail" but rather "first" and "last".

    "first" is the position things go out of the FIFO, they were first in.

    "last" is the position things go in to the FIFO, they were last in.

    That would at least remove the "head"/"tail" ambiguity. At risk of confusing everyone because they had never seen these things named like that before.

    Perhaps I worry about trivia too much....



    That's not just Haskell. Most functional languages either call them car and cdr (like most Lisps) or call them head and tail.

    This section of Wikipedia's Linked List article says that tail can refer to either the rest of the list or the last element of the list. The article uses the term "tail" in the sense of the rest of the list extensively: e.g. search the page for "tail-sharing".
  • Heater.Heater. Posts: 21,230
    Oh Boy, yes, "car" and "cdr". Lisp and Scheme and others...

    At least "car" and "cdr" avoid ambiguity by being really weird.

    To any normal person this "head" and "tail" thing is obvious. Animals have a head at one end and a tail at the other.Food goes in at the head end and comes out at the tail end. The stuff in the middle is not either head nor tail.

    The CS guys get it wrong.

    In Haskell lists can be infinitely long, for example [1, 2..]

    So the head is "1".

    But what is the last? ... hangs up computer...

    Perhaps for this reason the Haskell guys define "tail" as everything that is not "head", [2..]. They cannot get to the actual tail!
  • instead of head and tail one could find better names. I for example would be tempted to name one of them FirstIn and the other FirstOut so that they say what they do, but that is to COBOL-like for other people...

    Mike
  • Cluso99Cluso99 Posts: 18,069
    msrobots wrote: »
    instead of head and tail one could find better names. I for example would be tempted to name one of them FirstIn and the other FirstOut so that they say what they do, but that is to COBOL-like for other people...

    Mike
    No, that might be obvious these days. Much better to use something ambiguous to obfuscate the code.
  • Heater.Heater. Posts: 21,230
    How about the Fortran approach, just call them I and J ?

  • kwinnkwinn Posts: 8,697
    How about using Fi and Fo. Saves keystrokes and leaves Fee and Fum free for other uses ;-)
  • Heater.Heater. Posts: 21,230
    edited 2018-11-10 19:05
    Yes.

    Of course if you call them "Fi" and "Fo" then the "F" is redundant.

    Might as well call them "i" and "o". Which brings us back to Fortran world.

    I don't like to use "ptr" as in "wrPtr" and "rdPtr". Pointers are a low level software things, as in C, they don't really describe what is going on. And again the "Ptr" part is redundant. Might as well call them "wr" and "rd".

    Oh what, "Fee-fi-fo-fum, I smell the blood of an Englishman, Be he alive, or be he dead I'll grind his bones to make my bread."

    That put the fear of God into me as a five year old. Who are these big guys that want to eat us?!
  • What have I started? ;-)

    This sort of reminds me of something that happened at a job. There was a mode called green mode. I asked - why is it called green mode? Oh, that's because it was the mode drawn with a green dry erase marker on the white board. Similarly there was a mode called mode 3, because it was the third mode of operation introduced. Somehow modes 1 and 2 got reasonable names indicating what they did, but mode 3 never did. And the key GPS software engineers seemed to enjoy this sort of thing.
  • kwinnkwinn Posts: 8,697
    KeithE wrote: »
    What have I started? ;-)
    ....

    No worries. After dealing with problem after problem this week I was am in a bit of a whimsical mood. I'm sure it will pass soon.
  • Heater.Heater. Posts: 21,230
    KeithE,

    Colour me stupid but modes named "1", "green", "3" do not indicate what any of them did.

  • I didn’t give the updated names for 1 and 2.
  • Heater.Heater. Posts: 21,230
    edited 2018-11-19 10:17
    Well of course there is something wrong with my FIFO. It won't work at 100MHz. At least not when embedded into my UART receiver. Which is a pain because the picorv32 processor core I'm using does.

    I guess it's hardly surprising as there is a huge lot of logic to get through between a byte arriving and a FIFO write and similarly on read.

    In the receiver:
    ...
    when (baudClockX64Sync2.rise) {
        ...
        switch(state) {
          ...
          is(3) {
            // Check stop bit
            when(bitTimer === 0) {
              when(io.rx === True) {
                when (!fifo.io.full) {
                  fifo.io.dataIn := shifter
                  fifo.io.write := True
                  ...
    
    Then in the fifo:
                   ...
                    when (io.write.rise && !io.read) {
                        when (!full) {
                        head := head + 1
                        full := ((head + 1) === tail)
                        empty := False
                   ...
    
    What to do? Some how slow down the FIFO read/write cycles? Add some kind of pipelining buffers on the FIFO inputs and outputs. All sounds a bit messy.

    Code is here if anyone is curious:

    SpinalHDL : https://github.com/ZiCog/xoroshiro/tree/master/src/main/scala/xoroshiro
    Verilog : https://github.com/ZiCog/xoroshiro

    Edit:

    STOP PRESS:

    Quite by chance I now have the FIFO running with a 100MHz clock in my UART receiver. BUT it only does so when I hook the UART's internal state variable to the DE0 Nano LEDS. Take that out and it all locks up!

    Make me think this is working on the edge timing wise.

  • Heater.Heater. Posts: 21,230
    edited 2018-11-22 02:08
    Nope, turns out my UART/FIFO problems are not timing related, it's worse than that. If there are any Verilog/Quartus gurus around that could advise that would be great.

    My UART receiver and it's FIFO work perfectly well with a system clock of 100MHz and a baud rate of 115200. This can pump data around for hours, error free.

    Except for one annoying detail. I found this only works if I assign the state variable of the receiver's state machine to an unused register. When I remove that redundant assignment it all fails badly. Which is very curious.

    After fighting with this for hours I noticed that when I do that redundant assignment Quartus generates a pile of logic which works. But when I remove it Quartus generates a state machine block, which shows up as a yellow box in the RTL viewer and also in the state machine viewer, which does not work.

    Closer inspection shows that the generated state machine changes depending on if I assign the two bits of state to an unused 8 bit register of to a 2 bit register. In some cases it's shows that there are no transitions out of state zero generated!

    So my question is? How is Quartus inferring it's state machine blocks? What are the tricks to coding a Verilog state machine such that Quartus gets it right!

    The UART/FIFO verilog is here : https://github.com/ZiCog/xoroshiro/blob/master/AsyncReceiver.v if anyone has the stamina to have a quick look.



  • KeithEKeithE Posts: 957
    edited 2018-11-22 01:24
    I took a quick look and don't see an obvious problem.

    Is something potentially detecting the state machine and changing the coding of the states? There would probably be a mention in the logfile about this.

    Also just so you know, you have a certain "c" word in there which is not a polite term in English.

    Maybe this link is relevant:

    https://www.intel.com/content/www/us/en/programmable/quartushelp/14.1/mergedProjects/hdl/vlog/vlog_file_dir_syn_encoding.htm

    If this is the issue, then perhaps it decides not to change the coding if the state is output because it doesn't control all of the consumers of the state?
  • Heater.Heater. Posts: 21,230
    KeithE,

    Thanks for taking the time to have a look at it.

    This is still causing me to tear my hair out. With that unused assignment in place it works very reliably, without it the inferred state machine is incorrect. There must be some subtlety about how one codes a state state machine that Quartus is picking up on. Similar to the way one has to create memory in just the right way else it does not infer RAM blocks.

    I don't mind it inferring a state machine, it did before I put the FIFO in there and was using just a single byte holding buffer, if only it would get the states right!

    Oops, sorry about the "c" word. It is long gone in commits since then. It was a typo, honest.

    Thanks for the link re: syn_encoding. I might experiment with that. I hope it's not required to resolve this though, my Verilog is generated from SpinalHDL source and I'd rather not have to hack it manually after every change to Spinal sources.

    I suspect you have a point about it not controlling all consumers of the state. Just now I've been studying this paper on "State Machine Coding Styles for Synthesis" hoping to get some clues from the examples there: http://www.sunburst-design.com/papers/CummingsSNUG1998SJ_FSM.pdf
  • No problem with the word. I just wanted to point it out. Now maybe more curious people will look at your code ;-)

    If this is a problem then maybe this feature could help?

    https://spinalhdl.github.io/SpinalDoc/spinal/core/vhdl_generation/#vhdl-and-verilog-attributes
  • Heater.Heater. Posts: 21,230
    edited 2018-11-22 04:31
    KeithE,

    I love you...in a platonic kind of way... you have saved my sanity!

    Inspired by your comment about "...it doesn't control all of the consumers of the state..." I made one teeny-tiny, seemingly inconsequential, redundant change to my state machine and boom! It works fine, even when I remove that redundant assignment. Been running error free for an hour or so.

    All I did was to introduce a new register "next" that will hold the next state value. For each state transition I set "next" rather than "state". Then I assign "next" to "state" separately on every clock.

    Like so:
      val state = Reg(UInt(3 bits)) init (0)
      val next = Reg(UInt(3 bits)) init (0)          // New: Next required state goes here.
      ...
      state := next                                  // New: State gets updated from next on every clock.
      ...
          switch(state) {                            // The state machine.
          ...
            is(1) {
                // Check valid start bit
                when(bitTimer === 0) {
                when(io.rx === False) {
                    bitTimer := 63
                    next := 2                        // New: State updates go into "next" not "state".
                } otherwise {
                    next := 0
                }
            }
          ...
          }
    ...
    
    The story is not quite over yet though.... although this now works the inferred state machine seems a bit crazy. It looks like it allows for any transition from any state to any other. Which is not what my source code says. Like so:

    AsyncReceiver.png

    Perhaps I need to adopt more of the state machine style as shown in the examples in that paper above.
    1263 x 502 - 12K
  • Heater.Heater. Posts: 21,230
    Thanks for the heads up on the Spinal attributes thing.

    Hadn't noticed it before, or perhaps it floated out of my mind as I had no idea what it was about or why I might want one.
  • >Then I assign "next" to "state" separately on every clock.

    Now that you mention it, that's typically what I do. No idea why it would have this type of effect on your design though.

    And that any state to any state diagram seems quite unexpected.

    For some potential future sanity saving. Clifford posted something a while back about how one should code logic with synchronous resets. Twitter being twitter I didn't get his point at first when viewed on a little iPhone SE screen.

  • Heater.Heater. Posts: 21,230
    edited 2018-11-22 07:10
    Interesting. I don't recall ever seeing anyone put the reset code after the non-reset code like that. Always the other way around. Does it really make any difference? I have no idea.

    Anyway, as I'm using Spinal I have no say in what happens with clock and reset, they do not appear in the source anywhere, all abstracted away.

    Spinal always generates asynchronous resets:
    always @ (posedge clk or posedge reset) begin
        if (reset) begin
            [reset code here]
        end else begin
            [non-reset code here]
        end
    end
    

    I don't think Spinal provides for any control of that.
  • KeithEKeithE Posts: 957
    edited 2018-11-22 05:51
    In the thread someone suggests what you wrote above. (But without the "or posedge reset" of course.)

    And then Clifford states

    "Only if you do assign reset values to *all* registers written to in [reset code]. If you have some registers in that block that do not have a reset value then your code will infer additional logic and they are not identical.
    That's the whole point."

    Maybe not really a sanity saver, but good to understand. I typically used async resets for most of my work in the past, so I hadn't thought about this. You can often have some registers which aren't reset and that can save you some logic. e.g. registers in a datapath.

    Edited to elaborate:

    >Spinal always generates synchronous resets

    Maybe you just made a typo, but that's an asynchronous reset.

    always @ (posedge clk or posedge reset) begin : async because the reset is in the sensitivity list
    if (reset) begin

    always @ (posedge clk) begin : sync - reset is just like any other logic signal and doesn't take effect until a posedge of clk
    if (reset) begin

    Going offline now - Thanksgiving won't wait ;-)
  • Heater.Heater. Posts: 21,230
    I'm a bit too tired to get my head around that just now. That state machine has worn me down...

    I don't really understand the pros and cons of synchronous vs asynchronous resets.

    Why would one choose one over the other? Why does Spinal do what it does?

  • Heater.Heater. Posts: 21,230
    edited 2018-11-22 07:13
    Oops again. Yeah I meant async. Fixed it now.

    Have a good Thanksgiving!
  • KeithEKeithE Posts: 957
    edited 2018-11-22 16:48
    Sunburst has a paper on async versus sync resets. We used to use async for ASICs with the deassertion of reset synced to the clock. The timing of this is critical.

    Sometimes it’s convenient to have things reset even when there’s not a clock present, so async is convenient for that. Plus sync creates additional logic. Whereas async is just an input to the flops.

    The Sunburst paper probably explains all of this. Fortunately for FPGAs you don’t have to get it right or optimal the first time. And you can look at what others have done in their open source designs and ask questions.

    Edited to add: http://www.sunburst-design.com/papers/CummingsSNUG2003Boston_Resets.pdf
  • Heater.Heater. Posts: 21,230
    Thanks for the reset paper link.

    That is all far too much for me to get my head into just now. My UART is working well but I'm still trying to figure out why the state machine transition diagram is so weird.

    Also, if I try to us any of the suggestions from here:http://www.sunburst-design.com/papers/CummingsSNUG1998SJ_FSM.pdf, for example, see figure 11, it all fails terribly.

    I can't even change to a one hot-state encoding without it locking up!

    Perfectly innocuous changes make if fail. Which is rather disturbing.
Sign In or Register to comment.