System Verilog Translation
Kye
I was wondering if anyone would want to use a version of the P1 core translated from Verilog to System Verilog.
System Verilog is to Verilog what C++ is to C. System Verilog adds a ton of new features to Verilog that make writing Verilog code almost as easy as writing C. Translating the P1 core to System Verilog will make it substantially easier for the community to modify the core.
Below is an example of a System Verilog FIFO. The logic of the FIFO more or less is C code:
// By: Kwabena W. Agyeman
module SREG_FIFO #(parameter WIDTH = 8, parameter DEPTH = 8)
(
    input  logic clk, rst_n,

    input  logic fifo_enqueue,
    input  logic fifo_dequeue,

    output logic [$clog2(DEPTH):0] fifo_free_space,
    output logic [$clog2(DEPTH):0] fifo_used_space,

    output logic fifo_full,
    output logic fifo_empty,

    input  logic [WIDTH-1:0] fifo_data_in,
    output logic [WIDTH-1:0] fifo_data_out
);
    // Occupancy counter: "counter" is the next state, "counter_q" the registered state.
    logic [$clog2(DEPTH):0] counter, counter_q;

    dff #(.WIDTH($bits(counter))) counter_register
    (
        .clk(clk), .rst_n(rst_n),
        .d(counter), .q(counter_q)
    );

    // Storage: entry 0 is the head of the FIFO.
    logic [DEPTH-1:0][WIDTH-1:0] fifo_register, fifo_register_q;

    genvar k;
    generate
        for(k = 0; k < DEPTH; k++) begin : fifo
            dff #(.WIDTH($bits(fifo_register[k]))) register
            (
                .clk(clk), .rst_n(rst_n),
                .d(fifo_register[k]), .q(fifo_register_q[k])
            );
        end : fifo
    endgenerate

    always_comb begin : fifo_logic
        enum logic [1:0] { HOLD, ENQUEUE, DEQUEUE, BOTH } action;

        // Defaults: hold the current state.
        action = HOLD;
        counter = counter_q;
        fifo_register = fifo_register_q;

        case({fifo_dequeue, fifo_enqueue})
            2'b01: begin // enqueue only
                if(counter_q < DEPTH) begin
                    action = ENQUEUE;
                    counter = counter_q + 1;
                end
            end
            2'b10: begin // dequeue only
                if(counter_q > 0) begin
                    action = DEQUEUE;
                    counter = counter_q - 1;
                end
            end
            2'b11: begin // enqueue and dequeue in the same cycle
                if(counter_q > 0) begin
                    action = BOTH; // occupancy is unchanged
                end else begin
                    action = ENQUEUE; // empty: there is nothing to dequeue
                    counter = counter_q + 1;
                end
            end
            default: ; // 2'b00: hold
        endcase

        for(int i = (DEPTH-1); i >= 0; i--) begin
            if((action == DEQUEUE) || (action == BOTH)) begin
                // Shift toward the head. Reading the registered copy keeps
                // the result independent of the loop direction.
                if(i != (DEPTH-1)) begin
                    fifo_register[i] = fifo_register_q[i+1];
                end else begin
                    fifo_register[i] = '0;
                end
            end

            // New data lands in the first free slot, which sits one entry
            // lower when a dequeue happens in the same cycle.
            if(((action == ENQUEUE) && (i == counter_q)) ||
               ((action == BOTH) && ((i + 1) == counter_q))) begin
                fifo_register[i] = fifo_data_in;
            end
        end

        fifo_free_space = DEPTH - counter_q;
        fifo_used_space = counter_q;

        fifo_full = !fifo_free_space;
        fifo_empty = !fifo_used_space;

        fifo_data_out = fifo_register_q[0];
    end : fifo_logic

endmodule : SREG_FIFO

module dff #(parameter WIDTH = 8)
(
    input  logic clk, rst_n,

    input  logic [WIDTH-1:0] d,
    output logic [WIDTH-1:0] q
);
    // Register with asynchronous active-low reset.
    always_ff @(posedge clk or negedge rst_n) begin : dff_logic
        if(!rst_n) begin
            q <= '0;
        end else begin
            q <= d;
        end
    end : dff_logic

endmodule : dff
Writing hardware code in this high level style makes it very easy to add new features at the cost of having to trust the compiler a lot more. Ten years ago you wouldn't want to write Verilog code this way because the compilers weren't that good. But, now you can do this. In fact, compilers actually prefer the code being written in high level style now as it gives them more flexibility with how to implement your logic. The more specific you are the more you tie the tool's hands.
I'm interested in doing this because I want to make architectural modifications to the P1 core. For example, two things I want to do are to slice the hub up into 8 pieces like the P2 hub and make the hub 32-bit accessible. Chip's style is super optimized which is great! But... this also means that changing the P1 architecture will be much trickier.
Anyway, so, would doing this be of value? Will anyone be interested in it even if it's not the official Parallax source? Note that the code will inherit the GPL license like the Parallax source. My modifications will also be GPL.
NOTE: The System Verilog translation will only be functionally equivalent to the official P1 code. It will not be the same down to the gate level.
Comments
I say "go for it!"
It looks like a valuable addition to the project. Certainly a leading tool and if it inspires other flights of imagination or is something that clicks better than plain Verilog for some people, it is certainly a worthy endeavor!
I for one would be interested in seeing the differences.
As for its equivalence to the P1, I don't think that should be a concern. If a future SV-based design is worth converting into an ASIC, gate-level issues (if there are any) can be dealt with then.
The ALU is now just one big mux with the result of every possible instruction going into it. Only the selected instruction gets outputted.
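For readers who haven't seen this style, here is a minimal sketch of what a "one big mux" ALU can look like. It is not the code described above (that code isn't posted in this thread); the package, module, opcode names and the handful of operations are all made up for illustration and are not the P1 instruction encoding. It also assumes WIDTH is a power of two so the low bits of s can serve as the shift amount.

```systemverilog
// Hypothetical opcodes for illustration only (not the P1 encoding).
package alu_sketch_pkg;
    typedef enum logic [2:0] {
        OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR, OP_SHL, OP_SHR, OP_ROL
    } alu_op_t;
endpackage : alu_sketch_pkg

module alu_mux_sketch
    import alu_sketch_pkg::*;
#(parameter WIDTH = 32)
(
    input  alu_op_t          op,
    input  logic [WIDTH-1:0] d, s,
    output logic [WIDTH-1:0] result
);
    // Shift/rotate amount: low bits of s (assumes WIDTH is a power of two).
    logic [$clog2(WIDTH)-1:0] amount;
    assign amount = s[$clog2(WIDTH)-1:0];

    // Every arm describes its own datapath; the case statement is the mux
    // that selects one result. Sharing hardware between arms (for example
    // between the three shifters) is left entirely to the synthesis tool.
    always_comb begin : alu_logic
        unique case (op)
            OP_ADD:  result = d + s;
            OP_SUB:  result = d - s;
            OP_AND:  result = d & s;
            OP_OR:   result = d | s;
            OP_XOR:  result = d ^ s;
            OP_SHL:  result = d << amount;
            OP_SHR:  result = d >> amount;
            OP_ROL:  result = (d << amount) | (d >> (WIDTH - amount));
            default: result = '0;
        endcase
    end : alu_logic

endmodule : alu_mux_sketch
```

Adding an instruction in this style is one new enum value and one new case arm, which is the appeal. The cost is that each arm is, on paper, a separate piece of logic unless the tool merges them.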
I'm not sure if the tool will optimize common logic. I'm used to using Design Compiler at work where this is the case.
The code sure is going to look different!!
Kye, do you have any relative LE counts between this and the original? I bet this compiles under Quartus to be several times bigger. I don't believe that it will see the commonality among all those huge shifters.
Hi Kye, looks a good idea. What are the free tools to work with systemverilog?
Quartus II works with SV just fine, as does ModelSim.
Yeah, I'm concerned about that. At work, using Design Compiler the tool will optimize the logic completely. However, I don't know if Quartus II does this. I haven't compiled the ALU yet, I need to finish coding it first. But, if the tool's optimizer is good then I should be able to continue with this approach. Otherwise I'll have to optimize the code myself.
However, the whole point of writing the code in this high level style is to make it easy to modify. If the tool requires me to heavily optimize everything like you have already done then this exercise is moot.
@Ramon
Use DVKit = http://dvkit.sourceforge.net/
It's Eclipse with System Verilog support. It can do all the fancy stuff like call tips and Ctrl+click navigation. That said, it's just an editor. You have to use Quartus to compile and ModelSim to simulate.
@Searith
Thanks for the offer. If writing the ALU in high-level System Verilog style produces good synthesis results then I'll create a GitHub repo for others to work on the code.
Perhaps Design Compiler has some intermediate output that could be fed into Quartus, so that it could be used to make these far-flung inferences. Then, Quartus could just map the distilled logic into its fabric.
One benefit of thinking about the low-level implementation is that you tend to design things that have a lot of overlap and that will always result in smaller circuits. You actually morph your problem to allow a simpler hardware solution. I don't think any compiler can be smart enough to understand the goal in such a way.
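As a small illustration of that kind of overlap, here is a sketch of one well-known trick: a single right-shifter that also serves left shifts by bit-reversing the operand on the way in and the result on the way out. The module and signal names are hypothetical, and nothing in this thread shows whether the P1 source does exactly this; it is only meant to show what "morphing the problem" can look like.

```systemverilog
// Illustration only: one right-shifter shared between SHL and SHR.
module shared_shifter_sketch #(parameter WIDTH = 32)
(
    input  logic                     left,   // 1 = shift left, 0 = shift right
    input  logic [WIDTH-1:0]         d,
    input  logic [$clog2(WIDTH)-1:0] amount,
    output logic [WIDTH-1:0]         result
);
    // Bit reversal is only wiring, so it costs no logic by itself.
    function automatic logic [WIDTH-1:0] reverse(input logic [WIDTH-1:0] v);
        logic [WIDTH-1:0] r;
        for (int i = 0; i < WIDTH; i++) begin
            r[i] = v[WIDTH-1-i];
        end
        return r;
    endfunction

    logic [WIDTH-1:0] pre, shifted;

    always_comb begin : shift_logic
        pre     = left ? reverse(d) : d;              // mux in front of the shifter
        shifted = pre >> amount;                      // the only shifter in the design
        result  = left ? reverse(shifted) : shifted;  // mux behind the shifter
    end : shift_logic

endmodule : shared_shifter_sketch
```

Written this way there is one shifter and two muxes no matter what the tool infers. Written as separate `<<` and `>>` expressions feeding a result mux, the sharing only happens if the synthesis tool finds it.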
It looks like my design uses about 3051 LEs vs 613 LEs for your ALU. Given the short compile time of about 10 seconds, I don't think Quartus II did much work. Design Compiler in ultra mode would normally spend 15 to 20 minutes compiling the ALU I wrote. It's very serious about making the smallest possible circuit.
I really like how efficient the Propeller ALU is. It's really quite impressive.
I can still add modifications to the code however!
Wait! Let's find out how smart Design Compiler really is.
Would it be too much trouble to compile both versions with Design Compiler using some standard CMOS library you've got handy? I'm really curious to see if it makes those combined shifter inferences.
(For those of you who don't know, Design Compiler is pretty much the industry-standard workhorse for ASIC logic compilation. It's made by Synopsys and costs about $120k/year, though the price is always secret. It's renowned for its thoroughness.)
I already deleted all the code I wrote. Too late. However, just for kicks I can compile the P1 source using Design Compiler.
...
I suppose I can also compare the code I posted here with the P1 shifter code too.
Per seat
For my own stuff, everything I write, even test stubs, goes into its own Git repository, which is extremely easy to do. It's also easy to back up and save, and it takes very little space. Occasionally I want to do something in Perl or C or TeX or whatever and realise that I probably have a stub somewhere that did something similar, and it's there. I didn't always do this in the past. But sometime in 1997 I was analysing a tricky problem in handling satellite data in realtime, and remembered that I had already solved that problem in 1985, and by luck I found an old CCT (reel-to-reel tape) with the code. I had to dig out an old CCT drive and whip up a tool to read the old format, but I found the source code and could incorporate it almost directly into the new system. It saved a lot of time. Now I'm more careful and store everything in a more accessible way. I have lost much more than I have kept of the old code, from back when I just made random backups now and then, with no real thought for the future.
The short version of the above is that if you put effort into creating something, don't delete it.
One thing ASIC designers on tight schedules need to consider is the ECO-ability of code. e.g. Will the tool Conformal ECO be able to handle ECOs efficiently? This isn't really a Verilog versus SystemVerilog issue - it's about coding style. I wonder if the coding style above would hinder this or not? Once you get to the ECO level you need to have a mental model of the structure of the design.
...
As for ECOs, writing in a higher level style is not a hindrance. As a designer you have to write code that meets timing (and area). So, this means you always have an idea of what you're generating. The goal of the higher level style is just to leave the optimizing work to the tool. No matter the code style you write in... once the technology mapping process takes place the output gate level netlist will look nothing like what you wrote. Technology libraries have all kinds of crazy gates that the tool will use. Even if you code up a simple addition of two values together the tool may decide that it wants to replace the adder with complex gates that take into account the logic feeding the adder. The tool will generally output a netlist that looks nothing at all like you think it should be.
EDIT: We don't use FPGAs at my job so I think I approached this whole thing with the wrong mindset. My primary goal was to write easy-to-understand code that is very fast. Modern processes more or less give you unlimited resources to play with. I've gotten used to just focusing on performance and not thinking about resource costs (within reason).
Maybe if you're trusting Formality and getting side files sent by DC things are better, but in our group there are those who want formal checks done by a non-Synopsys, "third party" tool.
I'm a new college hire, with one year of experience right now. I only know a little bit about what they do at my job for ECOs. I bet you know way more about this stuff. I'm working in the non-consumer sector. While power matters for us, our envelope is in the tens of watts. Latency and throughput are the most important factors.