RISC V ?

Tubular · 2018-09-28 05:03

Interesting

What does successfully streamed mean, do they mean synthesized?

jmg · 2018-09-28 05:13

Tubular wrote: »

Interesting

What does successfully streamed mean, do they mean synthesized?

Yes, that was strange English - it may even mean engineering samples, if they indicate 2019 deliveries.

Heater. · 2018-09-28 08:37

It may well mean streamed all the way from Verilog design to actual silicon samples.

As happened to the P2 yesterday. Yeah ha!

KeithE · 2018-10-01 03:47

VLSI types often use the word "stream" when referring to reading or writing GDS files. This may be because they used to be written to tape, and then sent via snail mail. I believe that this is why we still use the term tape-out. I remember at my first job after college, there was a person who was in charge of creating, verifying, and shipping the tape. This still happened in the early 90's.

Here's a typical quote:

"Once the layout is complete, it is streamed out to a graphic data system (GDS) file, and a full sign-off verification run is completed."

Heater. · 2018-10-01 06:11

Still today "bitstream" is that binary blob of data that gets loaded to an FPGA to configure it:

The term bitstream is frequently used to describe the configuration data to be loaded into a field-programmable gate array (FPGA).

https://en.wikipedia.org/wiki/Bitstream

ozpropdev · 2018-10-01 06:19

Heater. wrote: »

Still today "bitstream" is that binary blob of data that gets loaded to an FPGA to configure it:

Or some days "bitstream" is that binary blob of data that gets loaded to an FPGA to confuse it.

AJL · 2018-10-01 07:36

KeithE wrote: »

VLSI types often use the word "stream" when referring to reading or writing GDS files. This may be because they used to be written to tape, and then sent via snail mail. I believe that this is why we still use the term tape-out.

It goes further back than that. "Tape-out" originated with the use of opaque tape to create the artwork manually on a transparent sheet, which was the input to the photo-lithography process.

To give a time frame, the MOS 6502 was taped out this way.

Heater. · 2018-10-01 08:15

Even today FPGAs are configured with "streams". They often boot up by reading a serial stream from an attached memory device.

Cluso99 · 2018-10-01 09:01

AJL wrote: »

KeithE wrote: »

VLSI types often use the word "stream" when referring to reading or writing GDS files. This may be because they used to be written to tape, and then sent via snail mail. I believe that this is why we still use the term tape-out.

It goes further back than that. "Tape-out" originated with the use of opaque tape to create the artwork manually on a transparent sheet, which was the input to the photo-lithography process.

To give a time frame, the MOS 6502 was taped out this way.

And before that, pcbs were "taped out". Red and blue on the same transparency or black on multiple transparencies.

kwinn · 2018-10-01 15:06

Cluso99 wrote: »

AJL wrote: »

KeithE wrote: »

VLSI types often use the word "stream" when referring to reading or writing GDS files. This may be because they used to be written to tape, and then sent via snail mail. I believe that this is why we still use the term tape-out.

It goes further back than that. "Tape-out" originated with the use of opaque tape to create the artwork manually on a transparent sheet, which was the input to the photo-lithography process.

To give a time frame, the MOS 6502 was taped out this way.

And before that, pcbs were "taped out". Red and blue on the same transparency or black on multiple transparencies.

Ah, those were the good old days. Been there, done that, so happy when PC software to do that and more came along.

Heater. · 2018-10-01 16:41

Ah, those were the good old days. The engineers in the company I started work at had a whole department of young girls to do PCB layout with sticky tape on transparencies.

I was so happy some years later when I worked on a team creating software for schematic capture and PCB layout. Which I'm amazed to find is still on the market: https://www.zuken.com/en/products/pcb-design/cadstar

jmg · 2018-10-08 22:35

A RISC-V design contest ...

https://www10.edacafe.com/nbc/articles/1/1619547/RISC-V-Design-Contest-Calls-Embedded-Designers-Push-Limits-Innovation

"There are four categories the entries will compete in:

Smallest Microsemi SmartFusion®2 or IGLOO®2 implementation
Smallest Lattice iCE40 UltraPlus™ implementation
Highest-performance Microsemi SmartFusion®2 or IGLOO®2 implementation
Highest-performance Lattice iCE40 UltraPlus™ implementation"

"The contest is sponsored by Google, Antmicro and Microchip Technology, through its Microsemi subsidiary, which are Founding Platinum members of the RISC-V Foundation."

First place prize: $6,000 USD
Second place prize: $3,000 USD + Splash Kit + iCE40 UltraPlus MDP
Third place prize: $1,000 USD + PolarFire Evaluation Kit + iCE40 UltraPlus Breakout Board

Heater. · 2018-10-09 08:38

No way is my RISC V core effort going to be ready by the closing date, November the 26th. Not that I would expect it to be competitive anyway.

It will be interesting to see how my naive RISC V core implementation compares to the winner of the smallest category. I have to keep my effort simple, all that caching and pipelining stuff starts to boggle my mind.

Evil thought: One could implement a really small 32 bit core by using the ZPU instruction set architecture. Then write an emulation of the RISC V for it. Boom smallest RISC V implementation!

Heater. · 2018-10-09 08:48

Hey, I missed the memo about Microsemi being bought by Microchip earlier this year.

Has anyone here got any experience of Microsemi FPGAs? I only heard of the company quite recently and never heard of any hobbyist types using Microsemi devices.

KeithE · 2018-10-09 16:37

Is there any affordable Linux box using RISC V that we can buy? I only saw a ~$1k FPGA board. Maybe a VM is the way to go for now.

It will be interesting to see how the Lattice entries compare to Clifford's picorv32. They should give him an honorary prize regardless.

jmg · 2018-10-09 16:51

Heater. wrote: »

Evil thought: One could implement a really small 32 bit core by using the ZPU instruction set architecture. Then write an emulation of the RISC V for it. Boom smallest RISC V implementation!

Someone has done almost exactly that, with highly compact x86 and 8051 cores using microcode ROMs.
From what I recall, they had very low LE counts of a few hundred and managed around 5-10 MIPS for 100+ MHz SysCLKs.
I.e. not fast, but VERY small, and a useful idea for adding more cores to a design.

Heater. · 2018-10-09 19:05

@KeithE,

Is there any affordable Linux box using RISC V that we can buy?

As far as I can tell the closest thing we have to that so far is the HiFive Unleashed board from SiFive. A real RISC V chip that runs Linux. It's not cheap.
https://www.sifive.com/boards/hifive-unleashed

I keep goading the Raspberry Pi guys that the next Raspberry Pi should be a RISC V machine. After all their old Cambridge university pals are just across town doing this https://www.lowrisc.org/

My guess is that any winner of this competition, in either category, already has a core design in place. There is not enough time to create a new one from scratch. And they have to be Open Sourced on Github. That might only leave picorv32 at the low end and the Berkeley BOOM at the high end. It would be nice to be surprised though.

@jmg,

Yep. Somebody even emulated an ARM processor well enough to run Linux on an 8 bit ATMEL chip. https://dmitry.gr/?r=05.Projects&proj=07. Linux on 8bit

I'm sure an 8 bit ATMEL is very small in FPGA logic blocks. And could emulate a RISC V as well.

Question is, having moved the RISC V implementation from hardware logic blocks to software, will it actually use less resources of the FPGA?

How are the judges weighing up logic block usage vs RAM block usage?

Could get tricky.

Dolu1990 · 2018-10-09 19:15

Heater. wrote: »

@KeithE,

Is there any affordable Linux box using RISC V that we can buy?

As far as I can tell the closest thing we have to that so far is the HiFive Unleashed board from SiFive. A real RISC V chip that runs Linux. It's not cheap.
https://www.sifive.com/boards/hifive-unleashed

I keep goading the Raspberry Pi guys that the next Raspberry Pi should be a RISC V machine. After all their old Cambridge university pals are just across town doing this https://www.lowrisc.org/

My guess is that any winner of this competition, in either category, already has a core design in place. There is not enough time to create a new one from scratch. And they have to be Open Sourced on Github. That might only leave picorv32 at the low end and the Berkeley BOOM at the high end. It would be nice to be surprised though.

BOOM has several issues: low frequency on FPGA, and it is probably too big for an iCE40.
But VexRiscv ...

About emulation, you are right, it opens a lot of doors, especially since the requirements for the contest are quite high. A lot of CSRs that are useless for a softcore have to be implemented in order to pass the compliance tests.

Dolu1990 · 2018-10-09 19:19

Currently picorv32 isn't compliant with https://github.com/riscv/riscv-compliance/tree/master/riscv-test-suite/rv32i
as it implements its own CSR model.

Honestly, a regular RISC-V softcore optimized for area will not be small with the current requirements, unless it emulates things.

Heater. · 2018-10-09 19:29

So the field is wide open then.

A picorv32 for example cannot handle misaligned memory accesses, as the RISC V spec says it should.

We don't need the BOOM to satisfy these competition rules. Or at least not all of it.

This all gets very messy.
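To make the misaligned-access point concrete, here is a minimal Python sketch of how a core (or its trap handler) that can only perform aligned 32-bit reads could still satisfy a misaligned word load. The function name `load_word_misaligned` and the word-list memory model are illustrative assumptions, not anything taken from picorv32 or the contest rules.

```python
MASK32 = 0xFFFFFFFF


def load_word_misaligned(mem_words, addr):
    """Little-endian 32-bit load at any byte address, using only aligned word reads.

    mem_words is a hypothetical memory model: a list of 32-bit words,
    where word k covers byte addresses 4*k .. 4*k+3.
    """
    base = addr >> 2            # index of the aligned word containing the first byte
    shift = (addr & 3) * 8      # bit offset of the first byte inside that word
    lo = mem_words[base]
    if shift == 0:
        return lo               # already aligned: a single read is enough
    hi = mem_words[base + 1]    # the remaining bytes live in the next word
    return ((lo >> shift) | (hi << (32 - shift))) & MASK32


if __name__ == "__main__":
    # Bytes in memory order: 11 22 33 44 55 66 77 88
    memory = [0x44332211, 0x88776655]
    print(hex(load_word_misaligned(memory, 1)))
```

The cost is a second memory read plus some shifting, which is the kind of extra logic (or emulation code) a very small core may prefer to leave out.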

DiodeRed · 2018-10-09 21:03

For size optimization, maybe rather than pure emulation one could take a hybrid approach: directly implement the simple cases and microcode the complex cases.

Edit: By "directly implement" what I have in mind is to define a microcode based mostly on a modified subset of RISC-V instructions, such that some instructions translate trivially to single microcode instructions, and those that don't can be implemented with a sequence of microcode instructions.

Heater. · 2018-10-10 03:13

Whether going with emulation or microcode one is making a similar trade off, replacing FPGA LUTs with FPGA memory blocks.

It's not clear to me how the judges in this competition will weigh LUTs vs RAM in estimating the size of the design.

KeithE · 2018-10-10 04:57

Jan Gray made some posts on Twitter. There were some ideas about how to judge. I think one simple metric for size would be based on how many cores one could squeeze into a given FPGA.

Here's one sequence - really in response to the guy who is trying to make a discrete RISC V, but in the context of the contest. I didn't have time to think about it too much, but maybe it could inspire some interesting approaches.

"#FPGA 1/ If I wanted to build a compact @risc_v from discrete logic, and if 'discrete' includes discrete SRAM chips, then I would build a simple bit-serial FPGA (a few thousand 4-LUTs) in TTL+SRAM, and build a @risc_v MCU on top of that. ... @azonenberg @babbageboole

#FPGA 2/The idea (http://fpgacpu.org/usenet/emulating_fpgas.html) is build a n x 4-LUT FPGA with two RAMs:
* uint8 REGS[] # to-space/from-space flip-flops
* uint16 LUTS[] # combinational logic
and assume a simple logic cell architecture where every LUT output is always registered in its local FF.

#FPGA 3/ (This LUT arch simplification punts some things to logic synthesis but you can still do multiple level LUT logic across multiple clock cycles; some FFs will have to hold by incorporating 'clock enable' behavior in their LUT logic equation.)

#FPGA 4/ The two SRAMs:

REGS: Each LUT's flip-flop is stored in bit0 of REGS.

LUTS. A logical LUT is stored in five uint16's at LUTS[i*8]:
0: A # index of first LUT input
1: B
2: C
3: D # index of last LUT input
4: LUTMASK # 16b lookup table

#FPGA 5/ To implement the N-LUT FPGA, the discrete logic + SRAMs performs:
initial { from = 0, to = N }

for i in range(N):
    REGS[to+i] = LUTMASK[{REGS[from|A],REGS[from|B],REGS[from|C],REGS[from|D]}]
{ to = from, from = to }

#FPGA 6/ Discrete logic implements a little state machine + counter to read the four bits REGS[from|A] etc., read the LUTMASK, pass the 16:1 LUTMASK mux, write to REGS[to|i].

#FPGA 7/ I used from- and to-space partition of REGS[] to make order of evaluation of FPGA LUTs insignificant, but a cleverer synthesis tool could also do away with that affordance.

#FPGA 8/ Perhaps some output-only REGS are mapped to explicit FFs ('374s) to implement MAR and MDRout; other input-only REGS are mapped to input FFs for MDRin, etc.
Thus interface to SRAM memory for .text/.data, I/O CSRs, UART, parallel port, etc.

#FPGA 9/9 This is probably an adequate FPGA for a RISC-V MCU. Basic LUTs + input fan-in should be adequate even for crummy ripple-carry parallel adders. For bonus marks and extra slowness, implement parts of the MCU itself with a bit-serial microarchitecture. :-)

#FPGA An unappreciated property of RAMs (be they discrete SRAMs, or BRAMs) is (besides the storage cells) each RAM provides one or more free 2^N : 1 multiplexers. The bit-serial FPGA above takes advantage of this for LUT input fan-in.

#FPGA BTW the most convenient hardware platform to explore this compact bit-serial FPGA implementation is of course in an FPGA.

For the cost of a 36 Kb BRAM and some LUTs and LUTRAM, implement a super slow vFPGA with n<=512 4-LUTs — one vFPGA cycle per ~n clock cycles."
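The quoted scheme may be easier to follow as running code. Below is a minimal Python sketch of it, under stated assumptions: each LUT is stored as a tuple `(A, B, C, D, LUTMASK)` instead of packed uint16 words, input A is taken as the most significant index bit (following the order of the concatenation in the tweet), and the names `VirtualFpga`, `make_mask` and `counter_values` are invented for illustration. `src` and `dst` stand in for the tweet's `from` and `to`, since `from` is a Python keyword. The demo wires two LUTs into a 2-bit counter.

```python
def make_mask(fn):
    """Build a 16-bit LUT mask from a Boolean function of four inputs."""
    mask = 0
    for idx in range(16):
        a, b, c, d = (idx >> 3) & 1, (idx >> 2) & 1, (idx >> 1) & 1, idx & 1
        if fn(a, b, c, d):
            mask |= 1 << idx
    return mask


class VirtualFpga:
    """N registered 4-LUTs evaluated one at a time, as in the quoted tweets."""

    def __init__(self, luts):
        self.luts = luts                    # list of (A, B, C, D, LUTMASK)
        self.n = len(luts)
        self.regs = [0] * (2 * self.n)      # from-space and to-space flip-flops
        self.src = 0                        # "from" in the tweet
        self.dst = self.n                   # "to" in the tweet

    def step(self):
        """One virtual FPGA clock: evaluate every LUT, then swap the spaces."""
        for i, (a, b, c, d, mask) in enumerate(self.luts):
            idx = ((self.regs[self.src + a] << 3)
                   | (self.regs[self.src + b] << 2)
                   | (self.regs[self.src + c] << 1)
                   | self.regs[self.src + d])
            self.regs[self.dst + i] = (mask >> idx) & 1   # 16:1 mux on the mask
        self.src, self.dst = self.dst, self.src

    def state(self):
        """Current flip-flop values, one per LUT."""
        return self.regs[self.src:self.src + self.n]


def counter_values(steps):
    """Run a 2-bit counter built from two LUTs and return its successive values."""
    luts = [
        (0, 0, 0, 0, make_mask(lambda a, b, c, d: 1 - a)),  # q0' = NOT q0
        (1, 0, 0, 0, make_mask(lambda a, b, c, d: a ^ b)),  # q1' = q1 XOR q0
    ]
    fpga = VirtualFpga(luts)
    values = []
    for _ in range(steps):
        fpga.step()
        q0, q1 = fpga.state()
        values.append(q1 * 2 + q0)
    return values


if __name__ == "__main__":
    print(counter_values(5))  # prints [1, 2, 3, 0, 1]
```

Because every LUT reads only from the old space and writes only to the new one, the order in which the LUTs are evaluated does not matter, which is the point made in tweet 7.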

Heater. · 2018-10-10 08:25

Wow, I can't make any sense out of that Twitter stream. Is it always like that? They seem to be suggesting building the ALU and such using RAM lookups rather than the normal logic circuits.

Robert Baruch is building a RISC V out of logic chips and RAMS etc here:

and that is his approach as well.

Robert has had to backtrack and redesign things a couple of times along the way already. I thought it would be useful to play with the design in FPGA first and then implement the result using real chips. But perhaps that's not fun for him, as he has already done a CPU design in FPGA.

But the Twitter guys mention "bit serial", which was my original thought for a discrete RISC V in TTL. In my bit serial design:

1) There would be a one bit ALU. For ALU operations the 32 bit operands are shifted in one bit at a time and the result shifted out.

2) All the registers would be 32 bit shift registers.

3) This has the neat outcome that one gets dual port registers for free. As bits are shifted in one at a time for the write they are coming out the other end for the read.

4) Of course there would be 32 bit clocks, at least, for every RISC V clock cycle so we are not running fast. But hey, who cares? We want a small, cheap, simple to build logic circuit.
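The bit-serial idea in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: `serial_add` is a made-up name, the operands are modelled as integers shifted right one bit per clock rather than as real shift-register chips, and subtraction is done the usual way by inverting the second operand and setting the initial carry.

```python
MASK32 = 0xFFFFFFFF


def serial_add(x, y, subtract=False):
    """Add (or subtract) two 32-bit values one bit per clock with a single full adder."""
    x &= MASK32
    y &= MASK32
    invert = 1 if subtract else 0
    carry = invert                  # carry-in of 1 completes the two's complement
    result = 0
    for clock in range(32):         # 32 bit clocks per 32-bit operation
        a = x & 1                   # bit shifted out of the first operand register
        b = (y & 1) ^ invert        # bit shifted out of the second, inverted for SUB
        s = a ^ b ^ carry           # one-bit full adder: sum
        carry = (a & b) | (carry & (a ^ b))   # one-bit full adder: carry out
        result |= s << clock        # sum bit shifted into the destination register
        x >>= 1
        y >>= 1
    return result


if __name__ == "__main__":
    print(serial_add(5, 7))         # prints 12
```

The only state carried between bit clocks is the single carry flip-flop, which is why the ALU itself stays so small.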

KeithE · 2018-10-10 17:15

>Is it always like that?

Definitely not a great choice for giving detailed explanations. I couldn't immediately follow all of the details, but will revisit when I have more time. He's done a lot of FPGA work. e.g. "Jan Gray gets 1680 RISC-V processors to dance on the head of a Xilinx Virtex UltraScale+ VU9P FPGA at Hot Chips"

>I thought it would be useful to play with the design in FPGA first

Or just simulation. You could even just write models for the discrete parts that you intend to use. If you are making schematics, then you should be able to extract and simulate a netlist. To me it seems like an extreme case of premature fabrication.

>We want a small, cheap, simple to build logic circuit

Yes - having something really small that uses a standard tool-chain is a nice tool to have available.

Heater. · 2018-10-10 19:08

KeithE,

Wow, sounds like Jan Gray can easily clean up in the "small" category of this competition: https://forums.xilinx.com/t5/Xcell-Daily-Blog-Archived/Shazam-Jan-Gray-gets-1680-RISC-V-processors-to-dance-on-the-head/ba-p/789549

Or just simulation....

Yes, my thoughts exactly. If you have an idea to use this or that TTL/CMOS logic chips or RAMs/ROMs in your hardware solution then it's not such a stretch to write the logic that does what they do in Verilog or Chisel or SpinalHDL. Then assemble those components into your proposed design, then simulate it using Icarus or Verilator. Moving that to an actual FPGA is optional. When it checks out start building it using the real chips.

Personally I would not want to be dealing with schematics at all.

Yes - having something really small that uses a standard tool-chain is a nice tool to have available.

This idea is going to bug me now. What is the minimum amount of TTL/CMOS logic chips or ROMs/RAMs required to build a machine capable of running code generated by the RISC V GCC compiler? No matter how slowly.

Robert Baruch is leading the way but his approach requires multiple 32 bit buses. I'm not sure but I think that by serializing everything we could make it much smaller and cheaper.

jmg · 2018-10-10 19:18

Heater. wrote: »

This idea is going to bug me now. What is the minimum amount of TTL/CMOS logic chips or ROMs/RAMs required to build a machine capable of running code generated by the RISC V GCC compiler? No matter how slowly.

The minimum is likely a 25c MCU (3x3mm), with a SPI flash(1.5x1.5mm) attached for the code.
The limitation here would likely be emulated RAM, (1k+ byte region) but you did not say what code it needs to run

The next step of ~ 70c, gets you XMC1xxx ARM, VQFN24, with 16K RAM as emulation host

Heater. · 2018-10-10 20:12

jmg,

Well, yes, one can use whatever cheapo processor to run an emulation of the machine you want.

That's how Linux runs on an 8 bit AVR that is in turn running an emulation of a 32 bit ARM processor. Somebody here has already written a RISC V emulator that runs on the Propeller.

That reduces the whole problem down to a software problem. How to write the emulator.

That is not an interesting challenge. Not in the spirit of wanting to build the CPU from good old fashioned hardware, logic chips.

In the extreme one would want to build the whole thing from individual discrete transistors. Like the Monster 6502. https://monster6502.com/

Or looked at another way, the challenge is to build a processor without using a processor to do it, since using one is kind of cheating.

...but you did not say what code it needs to run

Hmm...given that you have a cheapo 8 bit MCU that can somehow access a pile of memory, by whatever means, then the code that it needs to run on that MCU is an emulation of a RISC V machine. Which I suspect can be done in a handful of kilobytes. The RISC V instruction set is really small and simple.
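As a rough illustration of how little decoding such an emulator needs, here is a Python sketch that handles just three RV32I instructions (ADDI, ADD, SUB). It is nowhere near a complete or compliant emulator: there are no loads, stores, branches or CSRs, and the names `run` and `sign_extend` are invented for this example. The field positions follow the standard RV32I I-type and R-type encodings.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def run(program):
    """Execute a list of 32-bit instruction words and return the register file."""
    regs = [0] * 32
    pc = 0
    while pc < len(program) * 4:
        insn = program[pc >> 2]
        opcode = insn & 0x7F
        rd = (insn >> 7) & 0x1F
        funct3 = (insn >> 12) & 0x7
        rs1 = (insn >> 15) & 0x1F
        rs2 = (insn >> 20) & 0x1F
        funct7 = insn >> 25
        if opcode == 0x13 and funct3 == 0:                       # ADDI
            result = regs[rs1] + sign_extend(insn >> 20, 12)
        elif opcode == 0x33 and funct3 == 0 and funct7 == 0x00:  # ADD
            result = regs[rs1] + regs[rs2]
        elif opcode == 0x33 and funct3 == 0 and funct7 == 0x20:  # SUB
            result = regs[rs1] - regs[rs2]
        else:
            raise NotImplementedError(hex(insn))
        if rd != 0:                                              # x0 is hardwired to zero
            regs[rd] = result & MASK32
        pc += 4
    return regs


if __name__ == "__main__":
    program = [
        0x00500093,  # addi x1, x0, 5
        0x00700113,  # addi x2, x0, 7
        0x002081B3,  # add  x3, x1, x2
    ]
    print(run(program)[3])  # prints 12
```

A real emulator on a small MCU would add the remaining base instructions and a memory interface, but the decode step stays this regular throughout.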

Heater. · 2018-10-10 20:16

jmg,

But yes, in terms of the RISC V design contest, is it so that to create the smallest FPGA implementation one could implement a simple 8 bit CPU in Verilog and have that emulate the RISC V? Would it really be smaller? I don't know.

KeithE · 2018-10-11 01:05

Many years ago Chuck Moore was shopping around the idea of emulating an x86 with his little machine Forth cores. The J1A Forth core (not Chuck's) has been ported to the small Lattice HX20-1K part - http://www.excamera.com/sphinx/article-j1a-swapforth.html

"The j1a and its current peripherals (LEDs, uart) take 1162 of the available 1280 logic cells on the iCE40 HX1K. So there is room for some more peripherals"

So ignoring the RAM you could fit 4.5 of those on the UP5K part.
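The 4.5 figure can be checked with a quick calculation. The 1162 cell count comes from the quote above; the 5280 logic cell count for the iCE40 UP5K is an assumption taken from the usual datasheet figure, not from this thread.

```python
J1A_CELLS = 1162    # j1a plus LEDs and UART, from the quoted article
UP5K_CELLS = 5280   # assumed logic cell count of the iCE40 UP5K

copies = UP5K_CELLS / J1A_CELLS          # fractional number of cores by cell count
whole_copies = UP5K_CELLS // J1A_CELLS   # cores that would actually fit

if __name__ == "__main__":
    print(round(copies, 1), whole_copies)  # prints 4.5 4
```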
