Interesting. The original code I gave in #13 didn't zero the unused vidack & vid_pin_out because it made no difference to device usage in my setup. Without the initialising:
; Flow Status ; Successful - Thu Aug 14 06:56:27 2014 ;
; Quartus II 32-bit Version ; 13.1.0 Build 162 10/23/2013 SJ Web Edition ;
; Revision Name ; top ;
; Top-level Entity Name ; top ;
; Family ; Cyclone III ;
; Device ; EP3C16F484C6 ;
; Timing Models ; Final ;
; Total logic elements ; 13,402 / 15,408 ( 87 % ) ;
; Total combinational functions ; 12,299 / 15,408 ( 80 % ) ;
; Dedicated logic registers ; 4,390 / 15,408 ( 28 % ) ;
; Total registers ; 4390 ;
; Total pins ; 42 / 347 ( 12 % ) ;
; Total virtual pins ; 0 ;
; Total memory bits ; 458,752 / 516,096 ( 89 % ) ;
; Embedded Multiplier 9-bit elements ; 0 / 112 ( 0 % ) ;
; Total PLLs ; 1 / 4 ( 25 % ) ;
With:
; Flow Status ; Successful - Thu Aug 14 07:20:16 2014 ;
; Quartus II 32-bit Version ; 13.1.0 Build 162 10/23/2013 SJ Web Edition ;
; Revision Name ; top ;
; Top-level Entity Name ; top ;
; Family ; Cyclone III ;
; Device ; EP3C16F484C6 ;
; Timing Models ; Final ;
; Total logic elements ; 13,402 / 15,408 ( 87 % ) ;
; Total combinational functions ; 12,300 / 15,408 ( 80 % ) ;
; Dedicated logic registers ; 4,390 / 15,408 ( 28 % ) ;
; Total registers ; 4390 ;
; Total pins ; 42 / 347 ( 12 % ) ;
; Total virtual pins ; 0 ;
; Total memory bits ; 458,752 / 516,096 ( 89 % ) ;
; Embedded Multiplier 9-bit elements ; 0 / 112 ( 0 % ) ;
; Total PLLs ; 1 / 4 ( 25 % ) ;
Then I realised that my project is set to optimise for space rather than speed. It'd be interesting to see if you can make further space savings by changing your project's optimisation too.
Here are my latest compile results...
+---------------------------------------------------------------------------------+
; Flow Summary ;
+------------------------------------+--------------------------------------------+
; Flow Status ; Successful - Thu Aug 14 17:41:01 2014 ;
; Quartus II 64-Bit Version ; 14.0.0 Build 200 06/17/2014 SJ Web Edition ;
; Revision Name ; top ;
; Top-level Entity Name ; top ;
; Family ; Cyclone IV E ;
; Device ; EP4CE22F17C6 ;
; Timing Models ; Final ;
; Total logic elements ; 12,971 / 22,320 ( 58 % ) ;
; Total combinational functions ; 12,063 / 22,320 ( 54 % ) ;
; Dedicated logic registers ; 4,195 / 22,320 ( 19 % ) ;
; Total registers ; 4195 ;
; Total pins ; 42 / 154 ( 27 % ) ;
; Total virtual pins ; 0 ;
; Total memory bits ; 524,288 / 608,256 ( 86 % ) ;
; Embedded Multiplier 9-bit elements ; 0 / 132 ( 0 % ) ;
; Total PLLs ; 1 / 4 ( 25 % ) ;
+------------------------------------+--------------------------------------------+
If you have time, I would love to see what results Quartus 13 comes up with.
Attached are my modified files (I have modified cog.v, dig.v, hub.v, hub_mem.v). deo_cluso_20140814.zip
Currently, I have no idea of the settings. I basically just downloaded and ran.
I do have timing errors and I am not sure what the max speed is (or where to look yet - still learning)
I am not sure how to do the $readmemh command.
Can anyone help?
// RR20140811 Bigger Hub RAM & No ROM
// RAM is 48KB and remaps 48-64KB to 32-48KB
// requires $readmemh("hub_rom_high.hex",memname,$8000,$BFFF);
// =======================================
module hub_mem
(
input clk_cog,
input ena_bus,
input w,
input [3:0] wb,
input [13:0] a,
input [31:0] d,
output [31:0] q
);
// 12288 x 32 ram with byte-write enables ($0000..$BFFF) = 48KB for DE0 (compiles)
reg [7:0] ram3 [12287:0];
reg [7:0] ram2 [12287:0];
reg [7:0] ram1 [12287:0];
reg [7:0] ram0 [12287:0];
// ??? $readmemh("hub_rom_high.hex",memname,$8000,$BFFF);
reg [7:0] ram_q3;
reg [7:0] ram_q2;
reg [7:0] ram_q1;
reg [7:0] ram_q0;
wire [13:0] addr = !a[13] ? a[13:0] : {2'b10, a[11:0]}; // remap RAM: 48-64KB overlays 32-48KB
always @(posedge clk_cog)
begin
if (ena_bus && w && wb[3])
ram3[addr[13:0]] <= d[31:24];
if (ena_bus)
ram_q3 <= ram3[addr[13:0]];
end
always @(posedge clk_cog)
begin
if (ena_bus && w && wb[2])
ram2[addr[13:0]] <= d[23:16];
if (ena_bus)
ram_q2 <= ram2[addr[13:0]];
end
always @(posedge clk_cog)
begin
if (ena_bus && w && wb[1])
ram1[addr[13:0]] <= d[15:8];
if (ena_bus)
ram_q1 <= ram1[addr[13:0]];
end
always @(posedge clk_cog)
begin
if (ena_bus && w && wb[0])
ram0[addr[13:0]] <= d[7:0];
if (ena_bus)
ram_q0 <= ram0[addr[13:0]];
end
assign q = {ram_q3, ram_q2, ram_q1, ram_q0};
endmodule
/*
Copyright 2014 Parallax Inc.
This file is part of the hardware description for the Propeller 1 Design.
The Propeller 1 Design is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your option)
any later version.
The Propeller 1 Design is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along with
the Propeller 1 Design. If not, see <http://www.gnu.org/licenses/>.
*/
If you have time, I would love to see what results Quartus 13 comes up with.
I'm already using 13.1. Can't use 14 because it doesn't support my board's Cyclone III.
Currently, I have no idea of the settings. I basically just downloaded and ran.
In Quartus go to the Assignments menu and select Settings. In the Settings window select Analysis & Synthesis Settings and then on the right you can select Optimisation for Speed, Area or a Balance between the two.
I'm already using 13.1. Can't use 14 because it doesn't support my board's Cyclone III.
Yes, that's what I would like to see - 13.1 results. I am using 14, so it would be a good comparison.
In Quartus go to the Assignments menu and select Settings. In the Settings window select Analysis & Synthesis Settings and then on the right you can select Optimisation for Speed, Area or a Balance between the two.
Thanks. Just checked and it's set for
Under Assignments / Settings / Physical Synthesis Optimisations / Effort Level / Extra is ticked (optimise during synthesis and fitting; major compilation time increase)
Other options were Fast & Normal. There are other sections but nothing ticked.
I was trying to avoid having 4 files, each with interleaved bytes from longs.
I think you would then need to re-organise the memory declaration so there is a single name to point $readmemh at.
It seems the memory can be any width, e.g. 32, 36, 40 or 160 bits wide; the width of each hex file entry just matches the width of the named array.
Looking at the Cyclone V, if the memory is declared as 160 bits (4x40) wide, you can extract 5x32 from that with no wastage (and a 25% boost in available memory on the Cyclone V). Likewise, 8x36 would allow a 9x32 extract.
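As a rough illustration of pointing $readmemh at a single array name, here is a minimal sketch. The module name, port names and the assumption that the hex file holds one 32-bit long per entry are mine, not from the posted files, and I have not checked that Quartus honours the optional start/finish arguments during synthesis. Note that start and finish are indexes into the array (longs here), not byte addresses.

```verilog
// Sketch only: a 32-bit-wide RAM preloaded from one hex file.
// Module and port names are hypothetical.
module hub_mem_init_sketch
(
    input             clk,
    input      [13:0] addr,   // long address, assumed already remapped to 0..12287
    output reg [31:0] q
);
    // 12288 longs = 48KB
    reg [31:0] ram32 [12287:0];

    // Bytes $8000..$BFFF correspond to longs 'h2000..'h2FFF.
    // The file is assumed to contain one 8-digit hex long per entry.
    initial
        $readmemh("hub_rom_high.hex", ram32, 14'h2000, 14'h2FFF);

    // Registered read, as in the byte-wide version
    always @(posedge clk)
        q <= ram32[addr];
endmodule
```

If the start and finish arguments are left off, the file is simply loaded from index 0 upwards, and an @address line inside the hex file can be used to move the load point instead.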
Is this possible and if so, can you help with the syntax for this line please?
ram32{[31:24][addr[13:0]]} <= d[31:24];
From this I should be able to figure out the problem 2 lines lower.
I have converted the RAM to 32bit wide to make it easier to load the default ROM file.
reg [31:0] ram32 [12287:0]; // RAM 48KB as longs
reg [31:0] ram32_q;
wire [13:0] addr = !a[13] ? a[13:0] : {2'b10, a[11:0]}; // remap RAM: 48-64KB overlays 32-48KB
always @(posedge clk_cog)
begin
if (ena_bus && w && wb[3])
ram32{[31:24][addr[13:0]]} <= d[31:24];
if (ena_bus)
ram32_q[31:24] <= ram32{[31:24][addr[13:0]]};
end
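For what it's worth, Verilog-2001 lets you follow the word index with a part-select, so the byte lane goes after the address rather than inside braces. Below is a minimal sketch of the whole 32-bit version using the same port names as hub_mem; the module name is hypothetical, and whether Quartus infers byte-enabled block RAM from this form (rather than building it from logic) is something I have not verified.

```verilog
// Sketch only: 32-bit-wide hub RAM with per-byte write enables.
module hub_mem32_sketch
(
    input             clk_cog,
    input             ena_bus,
    input             w,
    input       [3:0] wb,
    input      [13:0] a,
    input      [31:0] d,
    output reg [31:0] q
);
    reg [31:0] ram32 [12287:0];   // RAM 48KB as longs

    // remap RAM: 48-64KB overlays 32-48KB
    wire [13:0] addr = !a[13] ? a[13:0] : {2'b10, a[11:0]};

    always @(posedge clk_cog)
        if (ena_bus)
        begin
            // word index first, then the byte lane as a part-select
            if (w && wb[3]) ram32[addr][31:24] <= d[31:24];
            if (w && wb[2]) ram32[addr][23:16] <= d[23:16];
            if (w && wb[1]) ram32[addr][15:8]  <= d[15:8];
            if (w && wb[0]) ram32[addr][7:0]   <= d[7:0];
            // non-blocking read returns the old contents on a write cycle
            q <= ram32[addr];
        end
endmodule
```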
I am unsure if I need to bracket groups, and whether the else works for all???
The else part doesn't work for all - each '?' operator needs a matching ':'.
if (ix[oh:ol] == 6'b000110)
begin
    if (ix[wz:im] == 4'b0001)
        sa <= {14'b0, ix[sh:sl]};
    else if (ix[wz:wr] == 3'b010)
        sa <= ix[22:0];
    else
        sa <= 32'b0;
end
else
    sa <= 32'b0;
Is there a better way to do this???
If you are doing this within an always block (as implied by the '<=') then you could try this:
sa <= 32'b0; // set default value; a later assignment in the same block overrides it
if (ix[oh:ol] == 6'b000110)
begin
    if (ix[wz:im] == 4'b0001)
        sa <= {14'b0, ix[sh:sl]};
    else if (ix[wz:wr] == 3'b010)
        sa <= ix[22:0];
end
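For comparison, the same decision can be written with the conditional operator outside an always block, giving every '?' its own ':'. This is only a sketch: the module name is hypothetical and the bit positions given to oh, ol, wz, wr, im, sh and sl are assumed values for illustration, so check them against cog.v.

```verilog
// Sketch only: nested conditional operators, one ':' per '?'.
module sa_select_sketch
(
    input  [31:0] ix,
    output [31:0] sa
);
    // Field positions are assumptions for illustration
    localparam oh = 31, ol = 26;
    localparam wz = 25, wr = 23, im = 22;
    localparam sh = 8,  sl = 0;

    assign sa = (ix[oh:ol] == 6'b000110)
              ? ( (ix[wz:im] == 4'b0001) ? {14'b0, ix[sh:sl]}
                : (ix[wz:wr] == 3'b010)  ? ix[22:0]
                :                          32'b0 )
              : 32'b0;
endmodule
```

The narrower operands are zero-extended to the 32-bit width of the widest one, so no explicit padding is needed on ix[22:0].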
Also noticed this in cog.v
always @(posedge clk_cog or negedge ena)
if (!ena)
p <= 1'b0;
else if (m[3] && !(cond && jump_cancel))
p <= px + 1'b1;
Shouldn't p <= 1'b0 be p <= 9'b0 ???
Verilog will extend the 1'b0 with as many extra leading 'b0 as necessary to match the width of the destination. From a code-documentation point of view it may seem useful for the constant's width to match the destination but can you trust that it's been maintained correctly when the width of the destination was changed?
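A quick simulation-only sketch of that extension, assuming p is 9 bits wide as the question implies (the module name is hypothetical):

```verilog
// Sketch only: a narrow unsigned constant is zero-extended on assignment.
module width_demo;
    reg [8:0] p;

    initial
    begin
        p = 9'h1FF;              // start with all nine bits set
        p = 1'b0;                // 1-bit constant, zero-extended to 9 bits
        $display("p = %b", p);   // prints p = 000000000
        p = 1'b1;
        $display("p = %b", p);   // prints p = 000000001
    end
endmodule
```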
Comments
Although it is also possible some will run with less than 8 cogs.
The hex file looks like this:
Our RAM is built as byte blocks. Do you know how I name this?
I can build the hex file easily.
; Flow Status ; Successful - Thu Aug 14 10:54:11 2014 ;
; Quartus II 64-Bit Version ; 13.1.0 Build 162 10/23/2013 SJ Web Edition ;
; Revision Name ; top ;
; Top-level Entity Name ; top ;
; Family ; Cyclone V ;
; Device ; 5CGXFC5C6F27C7 ;
; Timing Models ; Final ;
; Logic utilization (in ALMs) ; 8,010 / 29,080 ( 28 % ) ;
; Total registers ; 5884 ;
; Total pins ; 42 / 364 ( 12 % ) ;
; Total virtual pins ; 0 ;
; Total block memory bits ; 524,288 / 4,567,040 ( 11 % ) ;
; Total DSP Blocks ; 0 / 150 ( 0 % ) ;
; Total HSSI RX PCSs ; 0 / 6 ( 0 % ) ;
; Total HSSI PMA RX Deserializers ; 0 / 6 ( 0 % ) ;
; Total HSSI TX PCSs ; 0 / 6 ( 0 % ) ;
; Total HSSI TX Channels ; 0 / 6 ( 0 % ) ;
; Total PLLs ; 1 / 12 ( 8 % ) ;
; Total DLLs ; 0 / 4 ( 0 % ) ;
I have figured it out
I found this
http://stackoverflow.com/questions/9075289/bit-half-byte-byte-memory-access-in-32-bit-memory-using-verilog
and note the comment there
BlockRAMs have an option to support individual byte enables.
I think that matters: if you need single-byte writes into 32-bit memory without a read-modify-write (read + mask), then you need to create the block RAM using byteena[3:0].
The Altera doc CV-52002 2014.06.30 has a byteena[3:0] example in
Table 2-13: byteena Controls in x40 Data Width
I think this detail scuttles my idea of packing 5 x 32 in 4 x 40, as we do not have 5 byteena lines. Oh well...
This is what I am trying to achieve... Is there a better way to do this???