Questions on the P1 Verilog code to anyone...

Cluso99 · 2014-08-12 17:39

I can see where the cogs are being generated. Its being done in dig.v file in the generate section with the for (i=0; i<8; i++)

What I cannot see is how this generate section actually causes the cogxxx.v filesto be included multiple times.
Seem that by setting cog cog_( .nres (nres), etc, the appropriate pieces are "pulled" in from the cog.v (and its children) file(s) for each loop.
Is this correct or could someone explain where/how this is done please?

Now, if I wanted to only generate the VGA section for cog 0, how would I do that?
I know when the cogs are itterated, I have a value of itteration in i.
Can I use that within cog.v to optionally include the line 'include "cog_vid.v" or should I optionally include the // vid section?
If so, what is the format, otherwise how do I do that?
Since i is also defined within the cog.v file, how do I use i from the dig.v file?

Thanks in advance.

rjo__ · 2014-08-12 18:04

About everything I do. I is special and can only be used by a generate

KeithE · 2014-08-12 18:21

Verilog modules will "float" to the top level unless they are instantiated somewhere. In an FPGA design I think you would typically only want one item at the top level, but for simulations you could have multiple things floating to the top. e.g. a logging module might not be instantiated. The generate section is just instantiating the cog module multiple times, but the source code doesn't need to be included multiple times. It's sort of like defining a complex data structure and then creating multiple variables of that type. (Except for the floating to the top part.)

You can instantiate cog_vid as many times as you want without needing to include the source multiple times. You might also want to look up verilog parameters because you could use that to customize each instance.

KeithE · 2014-08-12 18:34

This site has some good verilog papers - http://www.sunburst-design.com/papers/

See HDLCON 2002 for one that covers parameters. There might be some good examples in there. Oftentimes they are used to specify things like a bus width or FIFO depth, but you should be able to use them to specify things like how many cog_vids to instantiate.

Also if you're doing many instantiations of various things with a lot of wiring maybe take a look at verilog emacs mode. (http://www.veripool.org/wiki/verilog-mode) A lot of engineers like to use that for wiring up designs.

jmg · 2014-08-12 19:06

Cluso99 wrote: »

Now, if I wanted to only generate the VGA section for cog 0, how would I do that?

I think this example from the links given in #4 is what you want

module demuxreg (q, d, ce, clk, rst_n);
        output [15:0] q;
        input [ 7:0] d;
        input ce, clk, rst_n;
        wire [15:0] q;
        wire [ 7:0] n1;
        not u0 (ce_n, ce);
        regblk [B]#(.SIZE( 8))[/B] u1
                              (.q(n1), .d (d), .ce(ce), .clk(clk), .rst_n(rst_n));
        regblk [B]#(.SIZE(16))[/B] u2
                              (.q (q), .d({d,n1}), .ce(ce_n),.clk(clk), .rst_n(rst_n));
endmodule

module regblk (q, d, ce, clk, rst_n);
[B]        parameter SIZE = 4;     // Default[/B]
        output [SIZE-1:0] q;
        input [SIZE-1:0] d;
        input ce, clk, rst_n;
        reg [SIZE-1:0] q;
        always @(posedge clk or negedge rst_n)
                if (!rst_n) q <= 0;
                else if (ce) q <= d;
endmodule

Example 10 - Instantiation using named parameter passing

I think this optionally externally over-rides a default value given to a Param within regblk

Cluso99 · 2014-08-12 19:56

This is what I am trying to do...

In dig.v there is the code which instantiates 8x cogs

genvar i;
generate
    for (i=0; i<8; i++)
    begin : coggen
        cog cog_(    .nres        (nres),
                    .clk_cog    (clk_cog),
                    .clk_pll    (clk_pll),
                    .ena_bus    (ena_bus),
                    .ptr_w        (ptr_w[i]),
                    .ptr_d        (ptr_d),
                    .ena        (cog_ena[i]),
                    .bus_sel    (bus_sel[i]),
                    .bus_r        (bus_r[i]),
                    .bus_e        (bus_e[i]),
                    .bus_w        (bus_w[i]),
                    .bus_s        (bus_s[i]),
                    .bus_a        (bus_a[i]),
                    .bus_d        (bus_d[i]),
                    .bus_q        (bus_q),
                    .bus_c        (bus_c),
                    .bus_ack    (bus_ack[i]),
                    .cnt        (cnt),
                    .pll_in        (pll),
                    .pll_out    (pll[i]),
                    .pin_in        (pin_in),
                    .pin_out    (outx[i]),
                    .pin_dir    (dirx[i])    );
    end
endgenerate

In cog.v there is this section which instantiates a vid section for each cog instantiated.
But, I only want to instantiate a vid section for cog 0, so i want something like this...

// vid
[COLOR=#ff0000]// only create a video section for cog=0
if vid.i==0 then[/COLOR]
wire vidack;
wire [31:0] vid_pin_out;

cog_vid cog_vid_  (    .clk_cog    (clk_cog),
                    .clk_vid    (plla),
                    .ena        (ena),
                    .setvid        (setvid),
                    .setscl        (setscl),
                    .data        (alu_r),
                    .pixel        (s),
                    .color        (d),
                    .aural        (pll_in),
                    .carrier    (pllb),
                    .ack        (vidack),
                    .pin_out    (vid_pin_out) );
[COLOR=#ff0000]// is an else required to supply default values to the wires vid_ack and vid_pin_out ???
endif;[/COLOR]

potatohead · 2014-08-12 20:06

Does it really matter which cog gets VID?

KeithE · 2014-08-12 20:25

I've never tried that, but from the LRM

module multiplier(a,b,product);
parameter a_width = 8, b_width = 8;
localparam product_width = a_width+b_width; // can not be modified
// directly with the defparam statement
// or the module instance statement #
input [a_width-1:0] a;
input [b_width-1:0] b;
output [product_width-1:0] product;

generate
    if((a_width < 8) || (b_width < 8))
        CLA_multiplier #(a_width,b_width) u1(a, b, product);
        // instantiate a CLA multiplier
    else
        WALLACE_multiplier #(a_width,b_width) u1(a, b, product);
        // instantiate a Wallace-tree multiplier
endgenerate
// The generated instance name is u1

endmodule

Cluso99 · 2014-08-12 22:23

potatohead wrote: »

Does it really matter which cog gets VID?

Well, I am not sure how you would then allocate a single VGA to a cog.
Anyway, I am doing this to save gates on the FPGA. I wouldn't expect this to be in a final P1.5.

BTW I had just done some testing and each VGA uses on Cyclone IV EP4CE22..
~394 LEs
~163 Combinatorial
~152 Registers
0 Memory

BTW2 The EP4CE22 has 66Kx9bits of memory. Remember the 9th bit is unusable (unless going to 36bit which I think is way too complex for P1.5).
Gogs each use 2KB = 16KB, so that leaves 50KB for Hub Ram. IIRC 50KB didn't work.

pik33 · 2014-08-12 23:06

Make a new cog code without a video. Instantiate n cogs with video and 8-n cogs without video. Seems to be the simplest way. Also instaed of the for-loop you can copy the instantiation code and then paste it 8 times. It may be not elegant but then every cog can be different.

Cluso99 · 2014-08-13 00:06

pik33 wrote: »

Make a new cog code without a video. Instantiate n cogs with video and 8-n cogs without video. Seems to be the simplest way. Also instaed of the for-loop you can copy the instantiation code and then paste it 8 times. It may be not elegant but then every cog can be different.

I am sure there is a simple way to do this. I have worked out how to disable the VGA always, so just got to get the dig.I parameter passed down for use.

jmg · 2014-08-13 00:40

Cluso99 wrote: »

I am sure there is a simple way to do this. I have worked out how to disable the VGA always, so just got to get the dig.I parameter passed down for use.

Can you not use the method shown in post #5 ?
That has a default inner setting, over ruled by a passed value, which is what you want ?

Willy Ekerslyke · 2014-08-13 00:42

No time to test this but it compiles clean. Hope you can decipher the cut & pastes below..

// In dig.v add COGID parameter
generate
    for (i=0; i<8; i++)
    begin : coggen
        cog #( .COGID(i) )
            cog_(   .nres       (nres),
                    .clk_cog    (clk_cog),
..
..

// In cog.v add incoming COGID param with default value of zero
module              cog
 #(
 parameter integer COGID = 0    // cog number (0..8)
    )
(
input               nres,           // reset
..
..

//  In cog.v use conditional generate of video logic
generate
  if (COGID==0) begin
    cog_vid cog_vid_  ( .clk_cog    (clk_cog),
                        .clk_vid    (plla),
                        .ena        (ena),
                        .setvid     (setvid),
                        .setscl     (setscl),
                        .data       (alu_r),
                        .pixel      (s),
                        .color      (d),
                        .aural      (pll_in),
                        .carrier    (pllb),
                        .ack        (vidack),
                        .pin_out    (vid_pin_out) );
  end
endgenerate

The existing parameter statement is cog.v will generate a warning as they should be localparams change them if you wish.

Cluso99 · 2014-08-13 00:49

jmg,
Sorry, but I don't understand post #5 at all.

Willy,
Thankyou. That seems more like I am after so will give that a try.

KeithE · 2014-08-13 07:28

If you're worried about the undriven outputs (stated in #6), then you can create and instantiate a cog_vid_dummy with the post #13 method which is basically the LRM example that I posted in #8. Your cog_vid_dummy would drive the outputs to whatever values makes the most sense.

Cluso99 · 2014-08-13 12:44

Keith,
I managed to set the outputs when i disabled the vga completely and this worked.
I am working with your solution but so far no success last night when i retired, although i am making progress.
Each time it fails quartus locks up partially and i have to close it. When i reopen, some v files are lost and i have to recopy all files back again.
Where i am up to is quartus complains about dig.v line 92 which is the line following the cogid declaration. It seems quartus is not seeing the cogid statement as a separate statement to the following statement.
I will post this section and the error in the morning (its 5:40am here).
Thanks for your help.

pmrobert · 2014-08-13 13:18

Cluso, here's a dig.v code segment that compiles properly under Q2 14.

genvar i;

generate
	for (i=0; i<8; i++)
	begin : coggen
	cog #( .COGID(i) )
		 cog_(	.nres		(nres),
					.clk_cog	(clk_cog),
					.clk_pll	(clk_pll),
					.ena_bus	(ena_bus),
					.ptr_w		(ptr_w[i]),
					.ptr_d		(ptr_d),
					.ena		(cog_ena[i]),
					.bus_sel	(bus_sel[i]),
					.bus_r		(bus_r[i]),
					.bus_e		(bus_e[i]),
					.bus_w		(bus_w[i]),
					.bus_s		(bus_s[i]),
					.bus_a		(bus_a[i]),
					.bus_d		(bus_d[i]),
					.bus_q		(bus_q),
					.bus_c		(bus_c),
					.bus_ack	(bus_ack[i]),
					.cnt		(cnt),
					.pll_in		(pll),
					.pll_out	(pll[i]),
					.pin_in		(pin_in),
					.pin_out	(outx[i]),
					.pin_dir	(dirx[i])	);
	end
endgenerate

Cluso99 · 2014-08-13 13:24

Thanks pmrobert.
I believe that is what i have (on my xoom tablet in bed, haha).

But, is the indentation of the line following the cogid statement required? I dont think i have this.

BTW Does COGID need to be in caps? Another possibility.

pmrobert · 2014-08-13 13:36

The case causes a problem, the indention does not.

Info (12128): Elaborating entity "dig" for hierarchy "dig:core"
Error (10130): Verilog HDL error at dig.v(92): parameter "COGiD" is not a formal parameter of instantiated module
Error (12152): Can't elaborate user hierarchy "dig:core"
Error: Quartus II 64-Bit Analysis & Synthesis was unsuccessful. 2 errors, 15 warnings
	Error: Peak virtual memory: 569 megabytes
	Error: Processing ended: Wed Aug 13 16:31:28 2014
	Error: Elapsed time: 00:00:01
	Error: Total CPU time (on all processors): 00:00:01
Error (293001): Quartus II Full Compilation was unsuccessful. 4 errors, 15 warnings

This is the thrown error if there is a case disagreement between any of the three COGID containing statements.
BTW, this is an excellent idea for easily reducing used LEs!

Cluso99 · 2014-08-13 13:43

Thanks. I just used lower case for cogid in all places. Does it have to be all caps?

That was my reason - to save logic in the smaller fpgas. We are going to need more space for new code.

pmrobert · 2014-08-13 13:59

I didn't try all lowercase, just replacing the "I" with "i" in all 3 instances. That worked OK. I just tried "cogid" in 13.1 and it compiled fine.

KeithE · 2014-08-13 14:22

Good luck Cluso - I figured that you were trying to save gates. I'm pretty busy and don't have quartus installed, so I'm giving pretty short somewhat generic answers. I wish I had a week solid to play with it - a lot could be done. I guess you'll be playing with reduced cog counts shortly ;-)

Cluso99 · 2014-08-13 14:27

This is weird or I am still asleep...
This works but the commented section doesn't. To me they look the same???

genvar i;
generate
    for (i=0; i<8; i++)
    begin : coggen
    cog #( .COGID(i) )
         cog_(    .nres        (nres),
                    .clk_cog    (clk_cog),
                    .clk_pll    (clk_pll),
                    .ena_bus    (ena_bus),
                    .ptr_w        (ptr_w[i]),
                    .ptr_d        (ptr_d),
                    .ena        (cog_ena[i]),
                    .bus_sel    (bus_sel[i]),
                    .bus_r        (bus_r[i]),
                    .bus_e        (bus_e[i]),
                    .bus_w        (bus_w[i]),
                    .bus_s        (bus_s[i]),
                    .bus_a        (bus_a[i]),
                    .bus_d        (bus_d[i]),
                    .bus_q        (bus_q),
                    .bus_c        (bus_c),
                    .bus_ack    (bus_ack[i]),
                    .cnt        (cnt),
                    .pll_in        (pll),
                    .pll_out    (pll[i]),
                    .pin_in        (pin_in),
                    .pin_out    (outx[i]),
                    .pin_dir    (dirx[i])    );
    end
endgenerate
/*
generate
    for (i=0; i<8; i++)
    begin : coggen
        cog #( .COGID(i) )
        cog cog_(    .nres        (nres),
                    .clk_cog    (clk_cog),
                    .clk_pll    (clk_pll),
                    .ena_bus    (ena_bus),
                    .ptr_w        (ptr_w[i]),
                    .ptr_d        (ptr_d),
                    .ena        (cog_ena[i]),
                    .bus_sel    (bus_sel[i]),
                    .bus_r        (bus_r[i]),
                    .bus_e        (bus_e[i]),
                    .bus_w        (bus_w[i]),
                    .bus_s        (bus_s[i]),
                    .bus_a        (bus_a[i]),
                    .bus_d        (bus_d[i]),
                    .bus_q        (bus_q),
                    .bus_c        (bus_c),
                    .bus_ack    (bus_ack[i]),
                    .cnt        (cnt),
                    .pll_in        (pll),
                    .pll_out    (pll[i]),
                    .pin_in        (pin_in),
                    .pin_out    (outx[i]),
                    .pin_dir    (dirx[i])    );
    end
endgenerate
*/

Can you see a typo? Anyway, doesn't matter, its working thanks.

With 48B RAM, no ROM (so it will notwork at present) and VGA only available for cog 0, it saves ~1,500 LEs. There is more that could be saved as currently all cogs generate the resulting wires.
With all VGA removed, it saves ~3,150 LEs (13,029 vs 16,178)
Here are the cog.v changes (3 lines following the module... line)...

`include "cog_ram.v"
`include "cog_alu.v"
`include "cog_ctr.v"
`include "cog_vid.v"


module                cog
    #(
    parameter integer COGID = 0        // cog number(0..7) from dig.v
       )
(
input                nres,            // reset

and...

// =======================================
// VGA only available for Cog 0
generate
    if (COGID > 0)
    begin
        always @(posedge clk_cog)
            vidack         <= 1'b0;
        always @(posedge clk_cog)
            vid_pin_out    <= 32'b0;
    end
    if (COGID == 0)
    begin
        cog_vid cog_vid_  (    .clk_cog    (clk_cog),
                            .clk_vid    (plla),
                            .ena        (ena),
                            .setvid        (setvid),
                            .setscl        (setscl),
                            .data        (alu_r),
                            .pixel        (s),
                            .color        (d),
                            .aural        (pll_in),
                            .carrier    (pllb),
                            .ack        (vidack),
                            .pin_out    (vid_pin_out) );
    end
endgenerate
// =======================================

jmg · 2014-08-13 14:35

Cluso99 wrote: »

Can you see a typo? Anyway, doesn't matter, its working thanks.

They differ on this line
[ cog_( .nres (nres), ] <> [ cog cog_( .nres (nres),]

pmrobert · 2014-08-13 14:46

Shuffling off to see how to do #defines/#ifdef/etc. in Verilog. I don't want multiple versions of nearly identical code.

jmg · 2014-08-13 14:51

pmrobert wrote: »

Shuffling off to see how to do #defines/#ifdef/etc. in Verilog. I don't want multiple versions of nearly identical code.

It's just like you'd expect, but with the # replaced with verilog char ` (see `include)
`define dKeyWord
`ifdef dKeyWord
`else
`endif

KeithE · 2014-08-13 15:06

Defines? Here's a paper you might check out: http://www.veripool.org/papers/Preproc_Good_Evil_SNUGBos10_pres.pdf

See if you can handle it with parameters. defines are useful - we use them more for global things. e.g. FPGA versus ASIC. If you try to use them for heterogeneous cogs then you may run into some difficulties.

Cluso99 · 2014-08-13 15:13

jmg wrote: »

They differ on this line
[ cog_( .nres (nres), ] <> [ cog cog_( .nres (nres),]

Thanks. Just realised myself... need more coffee!!!

Cluso99 · 2014-08-13 21:01

I have reduced the code that instantiates the VGA and saved more LEs.
For reference, my code has all Hub RAM 48KB & Chips Reset fix for DIRA.

8 cogs with VGA:
14,683 LEs
13,359 Combinational
5,445 Registers

8 cogs but only 1 VGA (cog 0):
13,022 LEs
12,217 Combinational
4,381 Registers

8 cogs, and NO VGA:
13,022 LEs
12,051 Combinational
4,229 Registers

I did find that simplifying the statements around the vid generation for only cog 0 reduced the logic, depending on the arrangement.
For example, the two blocks below gave quite different results!!!

// =======================================
// VGA only available for Cog 0
generate
    if (COGID > 0)
    begin
        always @(posedge clk_cog)
            vidack         <= 1'b0;
        always @(posedge clk_cog)
            vid_pin_out    <= 32'b0;
    end
    if (COGID == 0)
    begin
        cog_vid cog_vid_  (    .clk_cog    (clk_cog),
                            .clk_vid    (plla),
                            .ena        (ena),
                            .setvid        (setvid),
                            .setscl        (setscl),
                            .data        (alu_r),
                            .pixel        (s),
                            .color        (d),
                            .aural        (pll_in),
                            .carrier    (pllb),
                            .ack        (vidack),
                            .pin_out    (vid_pin_out) );
    end
endgenerate
// =======================================

This significantly reduced the LEs used

// =======================================
// VGA only available for Cog 0
generate
    if (COGID == 0)
    begin
        cog_vid cog_vid_  (    .clk_cog    (clk_cog),
                            .clk_vid    (plla),
                            .ena        (ena),
                            .setvid        (setvid),
                            .setscl        (setscl),
                            .data        (alu_r),
                            .pixel        (s),
                            .color        (d),
                            .aural        (pll_in),
                            .carrier    (pllb),
                            .ack        (vidack),
                            .pin_out    (vid_pin_out) );
    end
    else
    begin
        always @(posedge clk_cog)
            vidack         <= 1'b0;
        always @(posedge clk_cog)
            vid_pin_out    <= 32'b0;
    end
endgenerate
// =======================================

And simpler again, but used the same as above

// =======================================
// VGA only available for Cog 0
generate
    if (COGID == 0)
    begin
        cog_vid cog_vid_  (    .clk_cog    (clk_cog),
                            .clk_vid    (plla),
                            .ena        (ena),
                            .setvid        (setvid),
                            .setscl        (setscl),
                            .data        (alu_r),
                            .pixel        (s),
                            .color        (d),
                            .aural        (pll_in),
                            .carrier    (pllb),
                            .ack        (vidack),
                            .pin_out    (vid_pin_out) );
    end
endgenerate
// =======================================

cog-dig.zip

pik33 · 2014-08-13 21:36

Maybe it will be better to make vga for cog7 instead of cog0; cog0 is busy with spin interpreter and it has to be free to run vga pasm code.

jmg · 2014-08-13 21:40

Cluso99 wrote: »

I have reduced the code that instantiates the VGA and saved more LEs.

Can you give an original for reference (if the first one is not original ?) and the target device and reported MHz and reported % spare resource, for these builds.
With so many targets, and MHz values floating about, it can be good to know which is what.
Knowing how much is spare, gives an indication of what else could be included.

Cluso99 wrote:

<paste>
BTW2 The EP4CE22 has 66Kx9bits of memory. Remember the 9th bit is unusable (unless going to 36bit which I think is way too complex for P1.5).

36b could be practically used for things like extending D,S fields, with little core code change.
It would give Binary super-set opcodes.

On the topic of FPGA RAM, I notice the Cyclone V has 10b quanta on BRAM, not 9, for 10K blocks, which means you could tile the memory 4 x 40 to have 5 columns, each 32b wide (but the height needs to be binary for simplicity?)

On a Cyclone V that means you could do 4 x 40, but read as 5x32, for 160k, freeing up 32 10K blocks, which could deliver COGs with an average of 1536 Longs.(and still have opcode bits spare in the 40b wide).
A mix of 2048 and 1024 would be one way to map that, or make them all 1536

Packs well into a Cyclone V, but is rather less FPGA portable, as I'm not sure who else has 10b wide memory ?.

Questions on the P1 Verilog code to anyone...

Comments