I'm going to give up on getting the RISC V development tools going on a Raspberry Pi for now. My 32 GB SD card gave up the ghost (coincidence?) and an 8 GB card is just too small even after nuking Mathematica, LibreOffice, Minecraft, and SonicPi. Seems like cross compiling is the way to go - seeing what goes on in the sausage factory makes me a bit nostalgic for TurboPascal and LightSpeed C. Maybe since Peter is watching there will be a small Forth ;-)
Let me know if any warnings are still there. Cool that it runs at 100 MHz and you get to play with the PLL.
Last time I did such a big build on a Pi it was the Qt 5 libraries. Had to mount all the files and provide a big swap file on a network share!
There are still plenty of warnings. I keep making more every day.
Well don't let me stop you from trying to get the toolchain up. Maybe you'll have better luck - I was trying to build everything, but perhaps that was simply too ambitious. Are you an expert in cross compilation?
(I built OpenCV on that same SD card, and that was pretty large. I probably would have given up if I tried to do it all on my own. Fortunately Adrian Rosebrock has a set of nice instructions on his blog. Now he sells a prebuilt SD image as part of his OpenCV book bundles; probably because he realized that most people would get stuck or bored along the way.)
If any Verilog warnings stump you feel free to post.
Even a 2TB Seagate in a USB2 enclosure was >2x as fast for compiling as a uSD card.
As you noticed, cross-compiling is easy enough; we do it all the time for Propeller, AVR, STM32, etc, etc. But getting the tool chain built and installed can be a nightmare. It works well enough with a good build system, some good instructions, and perhaps a guru on hand to advise. Oftentimes I have given up in frustration.
I was curious whether it's doable for the average person with just a Pi and not much else, e.g. a high-school student. At our local schools some of the students (lucky!) get to do hardware design in high school through a program called Project Lead the Way, e.g. http://leigh.cuhsd.org/apps/pages/index.jsp?uREC_ID=442020&type=d&pREC_ID=954149 and https://www.pltw.org/our-programs/pltw-engineering-curriculum#curriculum-7
Those digital design projects for schools look interesting.
I mentioned it somewhere here recently, but now we have Free and Open Source tools for synthesizing Verilog for those cheap Lattice devices. We have Free and Open Source RISC V implementations (picorv32 and others) that fit on those Lattice parts with space to spare for other useful logic. We have the Free and Open Source RISC V GCC tool chain. We have Yosys. All of which runs on the Raspberry Pi. As I said, soon we will have 10-year-old kids turning out SoCs with their own digital designs.
So yes, this is all doable for the average person.
Perhaps it needs packaging up into an easy to use system. The Icoboard project is an attempt at this: http://icoboard.org/
Those high school projects could certainly make use of this I'm sure.
I want to try running those synthesis tools on the Pi soon.
As far as the RISC V tools - I'll probably settle for a simple RISC V assembler at first.
http://bellard.org/riscvemu/js/
iverilog does not know what the PLL is, so I commented it out for the test bench.
Hmmm... now the test bench does not work any more because I commented it out.
I don't know how to deal with the PLL in iverilog. I really don't want to add all those Quartus generated files in the db/ and such directories.
I should just `define it away somehow.
It should be easy to `ifdef it away. Just have a SIMULATION define. You can create a 100 MHz clock pretty easily if you hard code everything: either just make a new clock, or use delays and an xor. I have to run, and you seem excited to figure this sort of stuff out, so I'll leave it at that for now.
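For what it's worth, a minimal sketch of what that could look like, assuming a SIMULATION define and made-up signal/instance names (the real instantiation comes out of the Quartus IP generator):
`ifdef SIMULATION
// Behavioural stand-in for the PLL: just generate the 100 MHz clock directly.
// With `timescale 1 ns / 1 ps, a 5 ns half period gives 100 MHz.
reg clk_100 = 0;
always #5 clk_100 = !clk_100;
`else
// Hypothetical module/port names; use whatever Quartus generated.
wire clk_100;
pll pll_inst (.inclk0(clk_50), .c0(clk_100));
`endif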
I played around with it a bit. Here's some advice that you can ignore or not.
First I noticed that most files didn't define a timescale. It's good to add them - at some point this will save you some hassle. One easy way to do this is to make a header file to define it. Then if you ever want to change it you can edit this one file. And if it's not suitable for a particular file, then you can do something else there. (E.g. perhaps a PLL needs a different timescale to simulate properly)
$ cat inc/timescale.vh
`timescale 1 ns / 1 ps
// Use this with non-blocking delays if desired.
// e.g. q <= `D d;
// See http://www.sunburst-design.com/papers/CummingsSNUG2002Boston_NBAwithDelays.pdf
//
`ifndef D
`define D #1
`endif
Then put
`include "timescale.vh"
at the top of all of your files. (The cpu already has an embedded timescale.)
You can tell Icarus where to look for include files on the command line or through a command file. (Some call this a manifest file since you can also list all of your sources in it)
Here's an example, along with how to run iverilog against it. (There was also a little typo in the test bench if you want a 50 MHz clock in simulation; with that fixed you can get your 100 MHz clock, which is nice if you want the correct baud rate once there's a UART receiver BFM.)
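A minimal sketch of such a command file and how to run it (the file names are assumptions, not the actual repo layout):
$ cat xoro.cmd
+incdir+inc
src/xoro_top.v
src/uart_tx.v
tb/xoro_tb.v
$ iverilog -c xoro.cmd -o xoro_tb.vvp
$ vvp xoro_tb.vvp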
Wow, great advice there.
I had been carefully avoiding thinking about actual time scaling in simulations. It's just logic, right? Anyway, I hadn't got around to finding out what that mysterious "timescale" thing meant. But of course it's a good idea to get that right. I added your timescale.vh and pushed to the repo.
The manifest is a great idea too. I also added that.
That paper you linked to is a monster. TL;DR most of it! But these guidelines from there look good:
Guideline #1: When modeling sequential logic, use nonblocking assignments.
Guideline #2: When modeling latches, use nonblocking assignments.
Guideline #3: When modeling combinational logic with an always block, use blocking assignments.
Guideline #4: When modeling both sequential and combinational logic within the same always block, use nonblocking assignments.
Guideline #5: Do not mix blocking and nonblocking assignments in the same always block.
Guideline #6: Do not make assignments to the same variable from more than one always block.
Guideline #7: Use $strobe to display values that have been assigned using nonblocking assignments.
Guideline #8: Do not make assignments using #0 delays.
I added a guidlines.txt to the repo with that in it as a reminder.
He also has some nice suggestions about generating reset and clock in test benches towards the end.
I'll have a look at your suggestions re: clock generation in the tb after a nap.
Big thanks for all that.
Heater - there are a lot of interesting papers there. For FPGA work you can sort of take it easy ;-)
I have a question about a statement that was made earlier in this thread by Andy.
The assembler will produce 2 instructions for li that will be a LUI and an ADDI to compose a 32bit constant
Since I don't have all of the RISC V tools I've been playing with a somewhat broken assembler written in python. (I don't recommend it, but it was quick to write some really simple test. https://github.com/wueric/riscv_assembler if anyone must look.)
The question is, since ADDI and ORI both sign extend, is the above statement true? I'm probably missing something really simple but I'm getting a lot of 1's in the msbs.
They look like this in the compiled .elf file. The "li a0,0xffff0040" became two instructions:
lui a0,0xffff0
addi a0,a0,64 # ffff0040 <end+0xfffee994>
and the "li a1, 123456789" became:
lui a1,0x75bd
addi a1,a1,-747 # 75bcd15 <end+0x75bb669>
So yep, it's true.
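To spell out the sign-extension compensation with the 123456789 example (my arithmetic, not from the thread):
123456789 = 0x075BCD15
low 12 bits = 0xD15, and bit 11 is set, so ADDI will sign extend:
    0xD15 as a signed 12-bit value = 0xD15 - 0x1000 = -747
the assembler compensates by adding one to the upper 20 bits:
    0x075BC + 1 = 0x075BD
check: 0x75BD000 + (-747) = 0x75BD000 - 0x2EB = 0x075BCD15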
Older, simpler CPU designs would just add cycles and perform more ops. A 6809 could end up with 14 cycles plus, if one did something crazy like program counter relative, indexed by register, post decrement...
And there was LEA, load effective address, so an address like that could be computed once and the result captured for simpler, faster forms.
With a pipeline, breaking ops out as instructions does the same thing: a bit of speed traded for maybe program size and complexity.
Then does it become something different? In this case bit 11 is a one, so sign extension comes into play. Unless I'm really confused. I think I found that macro in tc-riscv.c but am just now studying it.
Edited to add: the relevant part seems to be load_const
And edited again: I should read your examples more closely. Got fooled by decimal versus hex. They are doing some transforming in there for 123456789.
Heater: could you please post your firmware.xx files? Pretty please? I'll try to compile the toolchain using mingw, seeing that cygwin and the Linux on windoze already failed...
Compiling with mingw seems to be a dead end too. It checks, for instance, that bcopy is available, says no, but uses it anyway and then complains that it cannot be found. free seems also not to be available.
Ubuntu on windoze doesn't even go that far:
All the code is available in the repo: https://github.com/ZiCog/xoro (there is a "Clone or download" button there which lets you download a zip file).
But just this once I attach a ZIP of the firmware directory for you.
Most of the code in there is not used; it came from the picorv32 repo. Just start.S and helloworld.c are.
I was just looking into replacing most of that with something that builds real programs (with a main()) that use newlib.
On the MAX10 (I got a MAX10-Lite board: 50 kLUTs + 64 MB SDRAM):
Flow Status: Successful - Thu Apr 13 11:04:11 2017
Quartus Prime Version: 16.1.0 Build 196 10/24/2016 SJ Lite Edition
Revision Name: xoro
Top-level Entity Name: xoro_top
Family: MAX 10
Device: 10M50DAF484C7G
Timing Models: Final
Total logic elements: 2,752 / 49,760 ( 6 % )
Total registers: 1276
Total pins: 15 / 360 ( 4 % )
Total virtual pins: 0
Total memory bits: 395,264 / 1,677,312 ( 24 % )
Embedded Multiplier 9-bit elements: 8 / 288 ( 3 % )
Total PLLs: 1 / 4 ( 25 % )
UFM blocks: 0 / 1 ( 0 % )
ADC blocks: 0 / 2 ( 0 % )
Compiling on Ubuntu for windoze halts here:
checking whether rename honors trailing slash on destination... no
checking whether rename honors trailing slash on source...
^C
^C^C^C^C
It is so dead that it doesn't even react to Ctrl-C... I don't have the words to describe that. (There are no words to describe how ugly that is. Yes there are, but you don't know them yet.)
It is even worse than I thought: bash doesn't start anymore...
The MAX 10 looks like a great board. On the other hand I was thinking of downsizing to a Lattice and checking out the open source FPGA tool chain.
Your riscv tools build on "Winbuntu" (The worst of both worlds!) failed in the same way mine did. Locked up solid. Tried it twice.
I ended up building the tools on my Debian box and copying the result into Winbuntu. Works fine. Probably building in Debian under VirtualBox is a better idea.
Ale, two common issues I have found before with github and mixed windows/unix source/compile:
1) Never use github "download as zip"; it is broken. You will not get the exact original files.
2) Most tools (iverilog, perl ...) are sensitive to CRLF/LF; run dos2unix (or unix2dos) first and check if that solves the issue (see the one-liner below).
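For example, to sweep a whole checkout (assuming GNU findutils and the dos2unix package are installed):
$ find . -type f \( -name '*.v' -o -name '*.vh' \) -exec dos2unix {} +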
That is the problem with the CR/LF (we are in 2017 for pity's sake!). I'm reloading everything in the ubuntu for windoze...
(Quite interesting links on this thread, and thanks KeithE for your tips.
Also this thread mentions two great genius programmers of the 21st century: Clifford and Fabrice.
About cross compiling: the current king of cross compiling is Rob Landley's 'Aboriginal Linux'. A set of scripts: with one single command it will automagically download all sources, cross compile Linux, and boot the kernel in a QEMU virtual machine, all flawlessly.)
Looking around, it seems that there's not much of an alternative to using riscv64-unknown-*-gcc for now. E.g. the LLVM RISC-V backend says:
"This backend currently only supports assembly generation and riscv64-unknown-*-gcc must be used to assemble and link the executable."
So I'll have to get around to building that on the Pi. But first I wanted to give those free and open synthesis tools a try.
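For reference, the build is roughly this; a sketch from memory of the riscv-gnu-toolchain README, so treat the configure flags as assumptions:
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
$ cd riscv-gnu-toolchain
$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32
$ make -j2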
It seems that XORI is easier for my dumb brain than ADDI. Part of this adventure is getting some exposure to RISC V assembly so I wanted to play around a bit. (In the code below I have the full 32-bit constants, but only a portion is used in each line) I guess it's a good idea to always sign extend in hardware and just deal with this in the tools. It should simplify the hardware.
# Loading a 32-bit constant with bit 11 set
# First load the complement
LUI x31,0x55555555
# Then XOR with the constant
XORI x31,x31,0xaaaaaaaa
#
# Loading a 32-bit constant with bit 11 clear
# Just load the constant
LUI x31,0x55555555
# And XOR with the constant
XORI x31,x31,0x55555555
BTW - you can write to x0, but the value is effectively zeroed for reads. I think that's handled by this code. You can only see the value if you look at dbg_reg_x0 in waveforms. I've seen this style before in other processors. I was looking here before I understood the constant loading since I thought some code I wrote broke things.
always @* begin
    decoded_rs = 'bx;
    if (ENABLE_REGS_DUALPORT) begin
        cpuregs_rs1 = decoded_rs1 ? cpuregs[decoded_rs1] : 0;
        cpuregs_rs2 = decoded_rs2 ? cpuregs[decoded_rs2] : 0;
    end else begin
        decoded_rs = (cpu_state == cpu_state_ld_rs2) ? decoded_rs2 : decoded_rs1;
        cpuregs_rs1 = decoded_rs ? cpuregs[decoded_rs] : 0;
        cpuregs_rs2 = cpuregs_rs1;
    end
end
For one thing it appears that he proves those parallel case statements. And you can see what he's doing in the FORMAL sections.
Edited to add: I find that using all of the Pi's cores with "make -j$(nproc)" can cause hangs when building IceStorm. This is with a fan blowing on the board as well. Not sure what's going on with that. Maybe it's due to the limited amount of RAM?
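If anyone else hits this, a workaround sketch, assuming Raspbian's dphys-swapfile manages swap (path and variable names from memory):
$ make -j2                        # fewer parallel jobs, less RAM pressure
$ sudo nano /etc/dphys-swapfile   # raise CONF_SWAPSIZE, e.g. to 1024
$ sudo /etc/init.d/dphys-swapfile restart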
So I wrote a little SPI driver in Verilog with the intention of accessing the ADC on the DE0 nano. Basically it's intended to clock out 16 bits from a register to the device over MOSI on the falling edge of the SCLK and clock in 16 bits from the device over MISO on the rising edge of SCLK. Sounds simple enough.
(The ADC only needs 4 channel select bits out and only delivers 12 data bits in, I thought I'd handle that detail in whatever is using this module).
Now, I'd like to have a test bench for this to run under icarus before going to Quartus and spending hours synthesizing and running things.
But it looks to me like I would then have to write a verilog simulation of the ADC SPI interface for my driver to talk to.
Now, I'd like to have a test bench for that verilog ADC simulation before I use it.
But that test bench would look like my ADC driver that I want to test....
Chicken and egg.
How do real verilog designers handle this?
At this point I'm inclined to write a test bench that just checks the right bits get clocked out on the right edges and forget about clocking bits in.
Just hope that works and then go to Quartus.
Unlike the UART transmitter I don't think I can get to the ADC pins with a scope to check what comes out.
Here is the SPI driver so far; do let me know if it is crappy. Actually I have never bit banged a SPI device before so the whole idea might be wrong:
`include "timescale.vh"
module spi (
input wire clk,
input wire resn,
input wire trig,
output reg [15:0] rdData,
input wire [15:0] wrData,
// SPI interface
output reg SCLK,
output reg SS,
output reg MOSI,
input wire MISO
);
reg [3:0] state;
reg [15:0] bitCount;
reg [15:0] clkCounter;
// Generate SPI clock
// ADC operates from 0.8 to 3.2 MHz
always @ (posedge clk) begin
clkCounter = clkCounter + 1;
if (clkCounter == 33) begin
SCLK <= !SCLK;
end
end
always @ (SCLK) begin
if (!resn) begin
SS <= 1;
MOSI <= 0;
state <= 0;
bitCount <= 0;
end else begin
case (state)
0: begin
// Idle
if (trig) begin
if (SCLK == 0) begin
// SS should be lowered on the first falling edge of SCLK
SS <= 0;
state <= 1;
bitCount <= 15;
end
end
end
1: begin
if (SCLK == 0) begin
// In order to avoid potential race conditions, the
// user should generate MOSI on the negative edges of SCLK.
MOSI <= wrData[bitCount];
bitCount <= bitCount - 1;
if (bitCount == 0) begin
state <= 2;
end
end else begin
// Capture data bits on the rising edge of SCLK.
rdData[bitCount] <= MISO;
end
end
2: begin
if (SCLK == 1) begin
// SS should be raised on the last rising edge of an operational frame.
SS <= 1;
MOSI <= 0;
state <= 0;
bitCount <= 0;
end
end
default: ;
endcase
end
end
endmodule
Typically you would have a design engineer and a verification engineer developing based on the same specification. It's always dangerous when the same person does both. This helps to catch errors that are caused by mistakes or by misinterpreting the specification. Also, either of these parties might buy or reuse "silicon proven" IP. And the verification IP is typically written in a more behavioral style. Of course this is no guarantee of success. So everyone uses FPGAs or other emulation platforms to interface to real hardware before tapeout if at all possible ;-)
One question you might ask is how you would do this in software with GPIOs. Looking at that "always @ (SCLK) begin" makes me wonder if the synthesizer will deal with it. Maybe there could be a single flop to reclock output data on the falling edge? Or perhaps all of the logic could run from clk and take action on the edges of SCLK treating it like a data signal? Using clk for everything means you don't need to worry about different clock domains and timing constraints. Since you probably don't care about power maybe best to start simple?
Also I guess you could do this in software with GPIOs and then write the hardware ;-)
Edited to add: also if you use clk for everything then hopefully you don't need to worry about synchronization. In this example it looks like you're sending a multi-bit value out on rdData, and software just reads that at any time to get the latest reading. If it's changing with some skew relative to clk, then software could get a bad reading. (And if you start using generated clocks in FPGAs you need to be careful, e.g. sending clocks through fabric intended for data.)
Edited again to add: this does bring up one interesting area. If you find yourself using multiple clocks, then you should make sure to understand CDC (clock domain crossing). Search for synchronizers and perhaps metastability. There are all sorts of structures for different situations. The classic is a FIFO, but it's a complex place to start; there is a paper on it below. The simplest situation would be sampling a single-bit asynchronous signal. A lot of people just put that through a couple of flip-flops. Whatever you do I recommend making a library of CDC stuff, and always instantiating from there. Often you'll see code with CDC structures written manually, and then it's harder to make changes. Or flag things - e.g. there are tools that look for CDC logic, and maybe you can give them hints.
http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO2.pdf
Another edit: some of the other papers there cover synchronization:
http://www.sunburst-design.com/papers/CummingsSNUG2001SJ_AsyncClk.pdf
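For the simple single-bit case, the library entry can be as small as this; a minimal sketch of the classic two flip-flop synchronizer (module and signal names are mine):
module sync2 (
    input  wire clk,  // destination clock domain
    input  wire d,    // asynchronous input
    output wire q     // d synchronized to clk, two cycles of latency
);
    reg [1:0] ff;
    // The first flop may go metastable; the second gives it a cycle to settle.
    always @(posedge clk)
        ff <= {ff[0], d};
    assign q = ff[1];
endmodule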
That is exactly the document I am working from. In fact the few comments I have in the code are pretty much cut and pasted from there, just to remind me what is expected.
Typically you would have a design engineer and verification engineer developing based on the same specification. It's always dangerous when the same person does both. This helps to catch errors that are caused by mistakes or by misinterpreting the specification.
Oh yeah. Had my fill of that either writing or testing software in the avionics flight control software business. I was on the team testing the code for the Boeing 777's primary flight computers. Happy days!
In this case there is only me. Well, and guys like you if you want to pitch in.
One question you might ask is how you would do this in software with GPIOs.... Also I guess you could do this in software with GPIOs and then write the hardware ;-)
Indeed, I'd do it much the same way as I presented in Verilog! Thing is, in software, on a Propeller say, you get a very quick turnaround time as you try and fail, edit, try and fail... I don't want to do that in the immensely slow and cumbersome Quartus!
Looking at that "always @ (SCLK) begin" makes me wonder if the synthesizer will deal with it. Maybe there could be a single flop to reclock output data on the falling edge? Or perhaps all of the logic could run from clk and take action on the edges of SCLK treating it like a data signal? Using clk for everything means you don't need to worry about different clock domains and timing constraints. Since you probably don't care about power maybe best to start simple?
Can we talk about "clock domains"?
My idea of different clock domains is dealing with two physically different clocks, say running from two different crystal oscillators, that have an unknown and varying phase relationship.
To my mind, everything I have done so far is in the same clock domain. Even if I divide that clock down the phase relations are constant.
Thing is, I have done a bit of digital design in the distant past. I have fixed problems with multi-processor systems running off different clocks getting into a mess when interacting with each other.
Problem is, I don't yet have a good feel for how this Verilog thing maps to actual gates and transistors.
In this example it looks like you're sending a multi-bit value out on rdData and software just reads that at any time to get the latest reading. If it's changing with some skew relative to clk, then software could get a bad reading.
Good point.
Actually my idea was not that this module gets read by software directly. I was thinking to have a higher level module that triggered this process, waited for the transfer to complete, then read all 16 bits of rdData. That higher level process would cycle through all 8 channel addresses, and it would provide the bus interface to the processor. It would provide eight 16-bit registers from which software could get the latest ADC readings.
Perhaps I have not thought this through enough... For example, my proposed higher level process has no idea when the transfer is complete!
I really appreciate your input on all this.
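One way to plug that gap, sketched under the assumption that the spi module grows a one-cycle done strobe alongside trig (all names here are made up):
module adc_poller (
    input  wire        clk,
    input  wire        resn,
    input  wire        done,     // assumed new output from spi: frame complete
    input  wire [15:0] rdData,
    output reg         trig,
    output reg  [2:0]  channel,  // cycles through the 8 ADC channel addresses
    output reg  [15:0] result    // latest completed reading
);
    always @(posedge clk) begin
        if (!resn) begin
            trig    <= 1;        // kick off the first transfer out of reset
            channel <= 0;
            result  <= 0;
        end else begin
            trig <= 0;
            if (done) begin
                result  <= rdData;       // safe to capture: transfer is over
                channel <= channel + 1;  // move on to the next channel
                trig    <= 1;            // and start the next frame
            end
        end
    end
endmodule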
Maybe I'm being paranoid, but the question is what would guarantee that:
always @ (posedge clk) begin
...
SCLK <= !SCLK;
is going to generate a zero skew clock where the edges line up with clk? It's not going to be asynchronous, but I've heard of some bad experiences. But it has worked for some devices in the past as well. I know some guys only use special clocking hardware (e.g. Xilinx MMCMs) to generate clocks. Maybe it's ok for Altera Cyclone IV parts. But somehow the data signal from the flop has to get back onto the clocking fabric. There's probably going to be delay. How well is it all characterized?
It just so happens that this caused me to think that random people reading the thread might like to know about how to solve the general problem where something is truly asynchronous. (A simple example would be a UART receiver)
Let's say that you choose to be paranoid and somewhat lazy. Then there are multiple ways you can go. One way is to detect edges on SCLK and react to those. Then you don't need to worry about timing any more than you already are - o.k. you do need to make sure that SPI to the ADC is going to meet setup/hold times versus SCLK.
reg SCLK_del;
wire SCLK_posedge = SCLK & ~SCLK_del;
...
always @(posedge clk) begin
...
SCLK_del <= SCLK;
...
if (SCLK_posedge) begin
...
end
end
Hopefully you can understand the above idea. By referencing how you would do it with GPIOs, one idea is that you would only be using positive edges of some high frequency clock. It's not the most elegant or power efficient way, but maybe not the worst. (The next step could be to use clock enables to make it more power efficient; see the sketch below.)
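Roughly what the clock-enable version looks like; a sketch reusing the divide count from the driver above:
reg [5:0] div     = 0;
reg       sclk_en = 0;  // one-cycle pulse at the SCLK toggle rate
always @(posedge clk) begin
    if (div == 33) begin
        div     <= 0;
        sclk_en <= 1;
    end else begin
        div     <= div + 1;
        sclk_en <= 0;
    end
end
// Everything else stays on posedge clk and only acts when sclk_en is high,
// so there is a single clock domain and no generated clock to constrain.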