While trying to build an audio bootloader, I decided to write a UART receiver.
Ale - it might have been lost in all of the noise, but how were you planning to get audio into the FPGA? I've seen what looks like pretty low cost I2S output boards, but a quick search didn't show anything like that for input.
Is your UART using oversampling? Maybe that's what the _34 is about? Typically UARTs use something like 8X or 16X to try to sample the center of the data bits. But perhaps that's overkill when you're not dealing with true RS232 and long cables and the resulting slow edges?
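To make the oversampling question concrete, here is a minimal Python model of a receiver that takes N samples per bit and reads each bit at its center. It is only a sketch and is not derived from anyone's Verilog in this thread: the names `line_samples` and `decode_oversampled` are made up for illustration, and the line is assumed to be an ideal, noise-free 8N1 signal.

```python
def line_samples(byte, oversample=16, idle=4):
    """Ideal 8N1 line sampled `oversample` times per bit (1 = idle/mark)."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]  # start, LSB first, stop
    samples = [1] * idle
    for b in bits:
        samples.extend([b] * oversample)
    return samples


def decode_oversampled(samples, oversample=16):
    """Return the received byte, or None if no valid frame is found."""
    # The falling edge of the start bit is the timing reference.
    start = next((i for i in range(1, len(samples))
                  if samples[i - 1] == 1 and samples[i] == 0), None)
    if start is None:
        return None
    mid = start + oversample // 2   # center of the start bit
    stop = mid + 9 * oversample     # center of the stop bit
    if stop >= len(samples) or samples[mid] != 0 or samples[stop] != 1:
        return None                 # truncated frame, glitch, or framing error
    value = 0
    for bit in range(8):
        # Each data bit is read one bit time after the previous center.
        value |= samples[mid + (bit + 1) * oversample] << bit
    return value
```

With a higher oversampling factor the start edge is located more precisely, so the sample points sit closer to the true bit centers. With clean logic-level signals a lower factor such as 8 may well be enough.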
For reference here are a couple of simple UARTs that you could compare your code to - the second has a lot of magic constants.
Note: the bit clock is only started when the starting edge is detected (bad for Signal Tap). I'll change it to a normal always-on clock. I thought it was a neat idea... It is based on Heater's transmitter.
It's not a bad idea. It's really acting like a clock enable, right? And you need to reset it on a start bit anyway, correct? I don't see why this would be bad for Signal Tap if it's using clk as its clock. (In SoCs I've seen the equivalent of clk being gated until a start bit is seen. This is for chips that require low standby power.)
I reworked the receiver and now it works. The FTDI chip sends consecutive bytes with almost no pause between the stop bit and the falling edge of the next start bit. I had problems with that, but now it seems to work:
I'm sure that it can be trimmed a bit... too many flops
/**
 * Uart receiver
 *
 *
 * Memory Map
 *
 * XXXX_XX40 : Uart Transmitter
 * XXXX_XX50 : Uart receiver
 *
 * XXXX_XX50 : received byte
 *
 * Write
 *  31                                                          16
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 * |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 *
 *
 *  15                                                           0
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 * |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 *
 *
 * Read
 *  31                                                          16
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 * |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 *
 *
 *  15                           8   7                           0
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 * |   |   |   |   |   |   |   | E | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
 * +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
 *
 * E = receive buffer empty (1 = empty, 0 = a received byte is waiting)
 */
//`include "timescale.vh"
module uart_rx #(
    // 50 MHz / 115200 baud / 8 samples per bit ~= 54
    // (868 would be the plain divider for 100 MHz / 115200 baud)
    parameter [31:0] BAUD_DIVIDER = 54
) (
    // Bus interface
    input  wire        clk,
    input  wire        resetn,
    input  wire        enable,
    input  wire        mem_valid,
    output wire        mem_ready,
    input  wire        mem_instr,
    input  wire [3:0]  mem_wstrb,
    input  wire [31:0] mem_wdata,
    input  wire [31:0] mem_addr,
    output wire [31:0] mem_rdata,
    output wire        bit_clock_o,
    // Serial interface
    input  wire        serial_in   // The serial input.
);
    // Internal Variables
    reg [7:0]  shifter;
    reg [7:0]  buffer;
    reg [1:0]  state;
    reg [3:0]  bitCount;
    reg [2:0]  clkCount;
    reg [15:0] bitTimer;
    reg        bufferEmpty;        // TRUE when ready to accept next character.
    reg        rdy;
    reg        old_serial_in;
    reg        old2_serial_in;
    reg        started;
    reg        bit_clock;

    assign bit_clock_o = bit_clock;

    // UART RX Logic: bus handshake and bit clock generation (clk domain)
    always @ (posedge clk or negedge resetn) begin
        if (!resetn) begin
            bufferEmpty <= 1'b1; // empty
            bitTimer    <= 0;
            rdy         <= 0;
            bit_clock   <= 0;
        end else begin
            if (mem_valid & enable) begin
                // A read (no write strobes) of a full buffer marks it empty again.
                if (((|mem_wstrb) == 1'b0) && (bufferEmpty == 1'b0)) begin
                    bufferEmpty <= 1'b1;
                end
                rdy <= 1;
            end else begin
                rdy <= 0;
            end
            // Generate the oversampling bit clock (about 8x 115200 baud from a 50 MHz clk).
            // bitTimer counts 0..BAUD_DIVIDER/2, so bit_clock toggles every
            // BAUD_DIVIDER/2 + 1 clk cycles.
            if (bitTimer == BAUD_DIVIDER / 2) begin
                bitTimer  <= 'h0;
                bit_clock <= ~bit_clock;
            end else
                bitTimer <= bitTimer + 1;
            // NOTE: this condition stays true for as long as the receiver idles in
            // state 2, and being the last assignment it overrides the clear above.
            if ((state == 2'd2) && (started == 1'b0))
                bufferEmpty <= 1'b0; // received
        end
    end

    // Receiver state machine, clocked by the oversampling bit clock
    always @(posedge bit_clock) begin
        old_serial_in  <= serial_in;
        old2_serial_in <= old_serial_in;
        if ((old2_serial_in == 1'b1) && (old_serial_in == 1'b0)) begin // start condition
            if (started == 0) begin
                state    <= 2'h0;
                started  <= 1'b1;
                clkCount <= 3'h5; // 2 cycles already elapsed
            end
        end
        if (started) begin
            case (state)
                // Start bit: wait out the remainder of it
                2'd0: begin
                    bitCount <= 7;
                    if (clkCount == 3'h0) begin
                        state    <= 2'd1;
                        clkCount <= 3'h7;
                    end else
                        clkCount <= clkCount - 3'd1;
                end
                // Data bits: sample near the middle of each bit, LSB first
                2'd1: begin
                    if (clkCount == 3'd4)
                        shifter <= { old_serial_in, shifter[7:1] }; // shift in
                    if (clkCount == 3'd0) begin
                        clkCount <= 3'h7;
                        if (bitCount == 4'd0) begin
                            state <= 2'd2;
                        end else begin
                            bitCount <= bitCount - 4'd1;
                        end
                    end else
                        clkCount <= clkCount - 3'd1;
                end
                2'd2: begin // stop bit
                    buffer <= shifter;
                    //if (clkCount == 3'h0)
                    //    begin
                    started <= 1'b0;
                    //    end
                    //else
                    //    clkCount <= clkCount - 3'd1;
                end
                default: ;
            endcase
        end
    end

    // Drive zeros when not selected so the bus outputs can be ORed together.
    assign mem_rdata = enable ? { bufferEmpty, buffer } : 'b0;
    assign mem_ready = rdy;

    initial
        started = 0;
endmodule
I think that most UART receivers which expect a single stop bit will start searching for the next start bit immediately after confirming the presence of half of a stop bit. This is so they can tolerate ~5% differences in the tx/rx baud rates. So in your test bench you might try running the transmitter a little faster than the receiver and see what happens.
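The ~5% figure can be checked with a little arithmetic. The sketch below assumes an idealized receiver that resynchronizes on the start edge and then samples at the center of each of the 10 bits in an 8N1 frame; it ignores oversampling quantization and slow edges, and the function names are invented for this example. The last sample falls 9.5 receiver bit times after the start edge and may drift by at most half a bit, which gives 0.5 / 9.5, or about 5.3%.

```python
def samples_land_in_correct_bits(rx_over_tx, frame_bits=10):
    """True if every mid-bit sample falls inside the bit it is meant to read.

    rx_over_tx is the receiver bit period divided by the transmitter bit
    period; time is measured in transmitter bit periods from the start edge.
    """
    for k in range(frame_bits):
        t = (k + 0.5) * rx_over_tx   # when the receiver samples bit k
        if not (k <= t < k + 1):     # transmitter bit k spans [k, k + 1)
            return False
    return True


def max_mismatch(frame_bits=10):
    """Largest tolerable relative period error for center sampling."""
    return 0.5 / (frame_bits - 0.5)


print(round(max_mismatch() * 100, 2), "% for an 8N1 frame")
print(samples_land_in_correct_bits(1.04), samples_land_in_correct_bits(1.06))
```

A real receiver has less margin than this, because its estimate of the start edge is already off by up to one oversampling tick.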
The idea with the audio bootloader was to use a digital input with a Schmitt trigger (the MAX10 has such inputs), a resistor divider and a series capacitor. When the audio signal swings with enough amplitude, like 3 V, you can detect the edges of a Manchester-encoded signal. It is a method used on some ATmegas and ATtinys. I thought it was kind of neat. I may still give it another go, but I'll do a small board without flying wires; I think cross-talk was playing a role in not letting me detect the signal correctly. I'll post my code later on GitHub. You need a resettable timer (makes the process easier).
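As an illustration of why a resettable timer makes this easy, here is a rough Python sketch of Manchester decoding from edge-to-edge times. It is not the code mentioned above. It assumes the timer is reset on every edge so that each entry of `intervals` is the time since the previous edge, that decoding starts on the mid-bit edge of a bit whose value is already known (for example from a preamble), and that the half-bit time is known. The name `manchester_decode` and the thresholds are made up for the example.

```python
def manchester_decode(intervals, half_bit, first_bit=1):
    """Decode bits from the times between consecutive edges.

    A long interval (about two half bits) runs from one mid-bit edge to the
    next with no edge at the bit boundary, so the bit value flipped. Two short
    intervals (mid-bit to boundary, boundary to mid-bit) mean the bit repeated.
    """
    bits = [first_bit]
    current = first_bit
    at_boundary = False   # True after a single short interval
    for d in intervals:
        if 0.5 * half_bit <= d < 1.5 * half_bit:      # short interval
            if at_boundary:
                bits.append(current)                  # same bit value again
            at_boundary = not at_boundary
        elif 1.5 * half_bit <= d < 2.5 * half_bit:    # long interval
            if at_boundary:
                raise ValueError("long interval starting at a bit boundary")
            current ^= 1
            bits.append(current)
        else:
            raise ValueError(f"interval {d} is neither a half nor a full bit")
    return bits


# Timer ticks between edges, with a little jitter, for a 10-tick half bit.
print(manchester_decode([11, 9, 21, 10, 10, 19], half_bit=10))
```

Because the decoder only tracks whether a bit repeats or flips, it works for either Manchester polarity convention as long as `first_bit` is right.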
I love you. I have been fighting with this Quartus memory thing all day. Your post above has fixed things.
At some point I realized that Quartus did not understand Icarus style HEX files. So I pulled Intel hex format out of the firmware build with "riscv32-unknown-elf-objcopy -O ihex firmware/firmware.elf firmware/firmware.hex"
But loading that into the Quartus HEX editor did not look right at all.
So I thought creating a .mif file from the firmware.bin was the way to go. I made a makemif.py script to do that. But loading that into the Quartus hex editor showed everything correct except address zero which always had a content of zero! No idea why, no matter what I did address zero always contained zeros. For sure not in my .mif file output.
But now, looking at your Python code I see you use "ADDRESS_RADIX=UNS;" so I changed my script to do that, instead of HEX, as well. BINGO! When loaded into the Quartus hex editor address zero now contains the correct data!
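For anyone hitting the same problem, a .mif generator along these lines might look like the sketch below. This is neither of the scripts discussed here: `make_mif` is a hypothetical name, and the sketch assumes a 32-bit wide memory, a little-endian firmware.bin, and unused words filled with zeros. The one detail taken from the thread is writing addresses with ADDRESS_RADIX = UNS.

```python
import struct


def make_mif(data: bytes, depth_words: int) -> str:
    """Return the text of a Quartus .mif file for a 32-bit wide memory."""
    padded = data + b"\x00" * (-len(data) % 4)            # pad to whole words
    words = list(struct.unpack("<%dI" % (len(padded) // 4), padded))
    if len(words) > depth_words:
        raise ValueError("firmware does not fit in the memory")
    words += [0] * (depth_words - len(words))             # zero-fill the rest
    lines = [
        f"DEPTH = {depth_words};",
        "WIDTH = 32;",
        "ADDRESS_RADIX = UNS;",   # unsigned decimal addresses
        "DATA_RADIX = HEX;",
        "CONTENT",
        "BEGIN",
    ]
    lines += [f"{addr} : {word:08X};" for addr, word in enumerate(words)]
    lines.append("END;")
    return "\n".join(lines) + "\n"


# Tiny example: six bytes of "firmware" in a 4-word memory.
print(make_mif(bytes([0x93, 0x00, 0x00, 0x00, 0x13, 0x01]), depth_words=4))
```

For a real build one would read firmware.bin with `open(path, "rb").read()` and write the returned string to the .mif file named in the memory wizard.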
Then the last little detail. That pesky little "~" on the clock. I happened to notice, late in the day, that there are little "not" symbols on the little schematic the Wizard displays. I had tried adding the "~" but of course nothing worked because the .mif file was not loaded correctly at the time....
And Quartus does not generate any error when it cannot find or understand a memory initialization file...
Grrr....
Working xoro with 64K of Quartus Wizard memory is now pushed to the repo.
Now, I'm not sure how your memory.v works without tri-state mem_rdata and mem_ready?
Also you have ".wren(|mem_wstrb)," which surely means memory writes can happen even when the memory device is not enabled. How does that work?
The enables are only for writes; reads have no enables. The byte enables select the bytes to write and are ORed together to form the master write enable.
In my case I needed the "~", as I am using 50 MHz instead of 100. If you OR the bus together, you can raise the clock a few MHz.
And Quartus does not generate any error when it cannot find or understand a memory initialization file...
Here are the two waveforms, with inverted clock (works) and without. I think one could use the look-ahead interface for RAM access with a normal clock.
I forgot to use the posedge for rdy. With that, everything is delayed and it should work, but you lose one clock per instruction. On the other hand, the Fmax goes up to 91 MHz for the slow model. Also, the VCD files I posted are garbage: Quartus does not group the buses together.
I understand that the byte enables are only for selecting bytes on write. I just worry that if wren is not gated by enable and mem_valid then writes can occur when other devices are being addressed. Which might cause some head scratching bugs.
I had some fun with the UART. At first I thought it ate some received bytes, and that some bytes were wrongly received because of timing problems. In reality, I am quite positive that the problem was the CPU bus, so to speak: the mechanism I was using to signal that the buffer was full or empty. I had to delay setting the buffer as full for as long as the current memory access cycle needed. If not, the CPU sees old data or doesn't see the buffer-full flag. I have LA traces. The code doesn't strike me as very robust; it may have some other unseen corner cases. I had to reduce the RAM allocated to the CPU (not a problem for such tests) to allow Signal Tap to allocate 768 kBits for the signals. I wanted to see as many received chars as possible (I had to hike up the bit rate to 1 MBit to be able to see something with so little data).
I posted my code and the changes to Heater's code on github. https://github.com/raps500/07_PicoRV32
I've updated my RISC-V emulator to support the rdcycle instruction, and to run on both Propeller1 and Propeller2. To build for Propeller2 requires a recent version of fastspin (at least 3.6.2) and some way to load binaries to the platform. Dave Hein's loadp2 works wonderfully for this. The multiply and divide instructions use the P2 qmul and qdiv, so you'll need an FPGA that has those (I think some of the smaller ones don't?).
Performance of the emulated RISC-V binaries on P1 is pretty similar to the performance of PropGCC CMM, although the binaries are much larger (more like LMM size). The P2 performance on my DE2-115 is substantially better (something like 50% fewer cycles than on the P1 for most tests I've run).
I'm porting my project from the MAX10 (DE10-Lite) to the Cyclone V (BeMicro CV board), using the same modules and only regenerating the memories and the PLL. Everything has another name (why!?):
The first thing to notice is that the Fmax went down to 56 MHz. It works at 64 anyway...
The problem with the FTDI USB-to-serial adapter is also present. I have to disconnect the terminal from the port, and only then is the USB Blaster recognized again.
The compile times went up...
I have another working build for the Lattice MachXO2. It should be good up to a 40 MHz clock, without mul and div and so on, with 16 kBytes of RAM.
I'll post the updates to GitHub; first I have to shuffle the files around a bit.
I did it! I got the RISC-V to talk to the DDR3 on the BeMicro board. Writing is a matter of one or two clocks; reading is an 11 to 13 clock (of 20 ns each) affair, not what I'd call a speed demon. That is using the "ddr3_example" provided by the Altera wiki on the BeMicro CV page. It uses a 32-bit port with a burst count of 1. I wonder if one could read more than one word using a higher count; I'll have to try that. I dropped the traffic generator and replaced it with the PicoRV32. Not pretty, not fast, but it is a proof of concept.
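To put those read latencies in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above (one 32-bit word per read, 11 to 13 clocks of 20 ns) and assumes reads are issued back to back with nothing else on the bus. The function name is invented for the example.

```python
def read_bandwidth_mb_s(clocks_per_read, clock_ns=20.0, bytes_per_read=4):
    """Sustained read bandwidth in MB/s (1 MB = 10**6 bytes)."""
    # bytes per nanosecond, scaled to bytes per second, then to MB/s
    return bytes_per_read / (clocks_per_read * clock_ns) * 1000.0


for clocks in (11, 13):
    print(clocks, "clocks per read:", round(read_bandwidth_mb_s(clocks), 2), "MB/s")
```

That works out to roughly 15 to 18 MB/s for single-word reads. Whether a higher burst count improves on this depends on how the controller handles bursts, which is not measured here.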
I don't understand the Cinque board specification. Seems to be very short of RAM, 16KB, for such a powerful 32 bit processor.
At that price it's more attractive for me to get a DE0 Nano FPGA board and put a RISC V core on it, which I'm working on. The performance will be down by a factor of 4 or so, but the Nano offers so much more. If more performance is required, multiple picorv32 cores can be put on there, which would be better for real-time work, like the Propeller concept.
The Arduino form factor is a big turn off for me, what with its wonky layout. I have yet to become desirous of any shields that would make me ignore that.
Having said that, I'll be ordering one, it's a first of sorts and well ... gotta support the cause.
I don't understand the Cinque board specification. Seems to be very short of RAM, 16KB, for such a powerful 32 bit processor.
Well, yes, but RAM costs money and they wanted the 300MHz banner more than they wanted a practical MPU.
With only QuadSPI support, this will be fairly ordinary in final real-world XIP speed.
Thing is, the ESP32 has two 32 bit processors running at 240MHz with 520KB internal SRAM, 4MB FLASH, 28 GPIO, besides all its wireless goodies.
One wonders what the point of having the puny RISC-V implementation on the board is.
hehe, yes a wry smile is needed there....
ESP32 is interesting on a few fronts, it is RAM based, but manages to include the QSPI flash die inside the compact QFN packages.
I also see it has 64b U/D counters, and Ethernet MAC, and ADC and DACs .... ( and no ARM core in sight )
The RISC V guys make the claim that smaller companies who want to build chips nowadays have a real problem. They can't use Intel because it's too big and power hungry and only available from Intel. They can't use ARM because of the hassle and expense of licensing the instruction set, never mind actual IP cores. They could use some other open core, but then they have a problem with software support. They could come up with their own ISA and core and software support, but that is a hassle and expense. The RISC V guys claim that an open and free instruction set architecture is what these small guys need, especially if open and free software support is available, which it is for RISC V.
The ESP devices use an Espressif core, wherever that came from. It is not ARM. It kind of shows the RISC V guys have a good argument there.
Comments
For reference here are a couple of simple UARTs that you could compare your code to - the second has a lot of magic constants.
https://github.com/cliffordwolf/icotools/blob/master/icosoc/mod_rs232/mod_rs232.v
https://www.inf.ethz.ch/personal/wirth/ProjectOberon/SourcesVerilog/RS232R.v
No output registers, and don't forget to invert the clock! Maybe that is what you forgot...
iram.qip:
iram.v
iram_bb.v
The memory file:
Anyway, a very big thanks to you Ale.
I don't know what is best.
In my case it also crashed...
I posted my code and the changes to Heater's code on github. https://github.com/raps500/07_PicoRV32
That's a lot of changes you have made, and all the files have moved around. I won't have time to look into it all for a few days now.
Performance of the emulated RISC-V binaries on P1 is pretty similar to the performance of PropGCC CMM, although the binaries are much larger (more like LMM size). The P2 performance on my DE2-115 is substantially better (something like 50% fewer cycles than on the P1 for most tests I've run).
The github repo is https://github.com/totalspectrum/riscvemu.
Of course, all values of 'run' are allowed
e.g. someone has Linux 'running' on 8-bit AVRs! (just at glacial speed)
Given a P1 with a Gadget Gangster 32MB SDRAM board, or some such, it could be doable.
http://hackaday.com/2017/05/20/arduino-cinque-the-risc-v-esp32-wifi-bluetooth-arduino/
I will work on some pre-fetch logic to avoid such an overhead (DRAM read latency).
With only QuadSPI support, this will be fairly ordinary in final real-world XIP speed.
Better to wait for the next silicon iteration?
One wonders what the point of having the puny RISC-V implementation on the board is.
It does not seem attractive, except for those like me that want to cheer along for open hardware and the RISC V.
I'd be very happy if they supplied that RISC V chip on a tiny break out board.