RISC V?

Comments

  • KeithE wrote: »
    There is a flow included with picorv32. I probably missed something, given you said 5280, but here are results for the 7680-LC part, the HX8K.

    Thanks, but I may have been unclear.
    I want to target the iCE40UP5K-SG48, a new member of the family with 5280 LUTs. More importantly, it includes 128 kBytes of SRAM and 8 x (16 x 16 multiply and 32-bit accumulator) blocks, in a QFN48 package.

    (The HX8K has much less on-chip RAM: just 128 kbits.)
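    For a sense of scale, the two memory figures above differ by a factor of eight. A quick Python check, assuming "k" means 1,024 in both figures (the numbers are taken from this post, not from a datasheet):

```python
# Figures as quoted in the post; assumption: "k" = 1024 in both.
UP5K_SRAM_BYTES = 128 * 1024   # "128kBytes SRAM" on the iCE40UP5K
HX8K_RAM_BITS = 128 * 1024     # "just 128k Bits" on the HX8K

up5k_sram_bits = UP5K_SRAM_BYTES * 8
ratio = up5k_sram_bits // HX8K_RAM_BITS

print(f"UP5K SRAM: {up5k_sram_bits:,} bits")
print(f"HX8K RAM:  {HX8K_RAM_BITS:,} bits")
print(f"UP5K / HX8K: {ratio}x")
```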

    I have Lattice iCECube2 tool flows for this, but the open-source efforts do not yet generate bitstreams for this new iCE40.
    So I was looking for design flows for the Lattice tool chain (Lattice iCECube2).

  • Ramon (Posts: 348)
    edited April 2017
    Heater. wrote: »
    Here is what I am seeing on the nano (In order of removing things) :
    Everything:        2717 LE, 1276 regs, 12%
    No barrel shifter: 2564 LE, 1219 regs, 11%
    No div:            2143 LE, 1019 regs, 10%
    No mul:            1964 LE,  928 regs,  9%
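    Since each row also has everything above it removed, the cost of a single option is the difference between consecutive rows. A small Python sketch using only Heater's figures:

```python
# (configuration, logic elements, registers) from the DE0-Nano builds above,
# in the order options were removed; each row also lacks everything above it.
builds = [
    ("Everything", 2717, 1276),
    ("No barrel shifter", 2564, 1219),
    ("No div", 2143, 1019),
    ("No mul", 1964, 928),
]

def savings(rows):
    """Return (option removed, LEs saved, registers saved) for each step."""
    out = []
    for (_, le_prev, reg_prev), (name, le, reg) in zip(rows, rows[1:]):
        out.append((name, le_prev - le, reg_prev - reg))
    return out

for name, le, reg in savings(builds):
    print(f"{name:18s} saves {le:4d} LE, {reg:4d} regs")
```

    On these numbers the divider is the biggest single item, at 421 LEs.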
    

    Heater, have you tried the RV32E version?

    (from https://github.com/cliffordwolf/picorv32

    For even smaller size it is possible to disable support for registers x16 .. x31 as well as RDCYCLE[H] , RDTIME[H] , and RDINSTRET[H] instructions, turning the processor into an RV32E core.

    Furthermore it is possible to choose between a dual-port and a single-port register file implementation. The former provides better performance while the latter results in a smaller core.

    Note: In architectures that implement the register file in dedicated memory resources, such as many FPGAs, disabling the 16 upper registers and/or disabling the dual-port register file may not further reduce the core size. )


    Does anyone know how to force Quartus to implement the registers without block RAM? I don't know if there is any directive or parameter to do that.

    I remember there was an old thread on the Yosys subreddit discussing how block RAM is inferred: https://www.reddit.com/r/yosys/comments/1yisxp/using_yosys_to_techmap_block_rams/
  • I think I have found something (thanks to this link: http://www.alteraforum.com/forum/archive/index.php/t-50792.html)

    A Google search for "no_rw_check attribute" led me to: http://quartushelp.altera.com/15.0/mergedProjects/hdl/vlog/vlog_file_dir_ram.htm

    "To use the ramstyle attribute in a Verilog Design File (.v), specify the synthesis attribute delimited by (* and *) preceding the Variable Declaration of an inferred RAM whose implementation you want to control. Specify the synthesis attribute value as "logic", "M512", "M4K", "M9K", "M20K", "M144K", "MLAB", or "M-RAM" depending on the type of memory block you want the Quartus II software to use when implementing the inferred RAM. If you use the synthesis attribute on anything but a variable that represents a RAM, or if you specify an illegal value, the Quartus II software ignores that synthesis attribute.

    By setting the value to "M512", "M4K", "M9K", "M20K", "M144K", "MLAB", or "M-RAM", you can choose the type of memory block that the Quartus II software uses when implementing the inferred RAM. If the attribute is set to "logic", then the RAM is implemented in logic cells."
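    Applied to picorv32.v, that would mean putting the attribute in front of the cpuregs declaration. The sketch below is a hypothetical Python helper (ramstyle_decl is not part of any tool) that emits such a declaration and, unlike Quartus, rejects values outside the list quoted above. The 32 x 32 size assumes the default register file:

```python
# Values accepted for the ramstyle attribute, per the Quartus II help text quoted above.
LEGAL_RAMSTYLES = {"logic", "M512", "M4K", "M9K", "M20K", "M144K", "MLAB", "M-RAM"}

def ramstyle_decl(style, width, depth, name):
    """Build a Verilog memory declaration preceded by a ramstyle synthesis attribute.

    Hypothetical helper for illustration. Quartus silently ignores illegal values,
    so this raises instead, to catch typos early.
    """
    if style not in LEGAL_RAMSTYLES:
        raise ValueError(f"illegal ramstyle value: {style!r}")
    return f'(* ramstyle = "{style}" *) reg [{width - 1}:0] {name} [0:{depth - 1}];'

print(ramstyle_decl("logic", 32, 32, "cpuregs"))
# (* ramstyle = "logic" *) reg [31:0] cpuregs [0:31];
```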


  • jmg wrote: »
    ... the open-source efforts do not yet generate bitstreams for this new iCE40.
    So I was looking for design flows for the Lattice tool chain (Lattice iCECube2).
    Yeah - doesn't sound like this will be supported by the open source tools any time soon.
    https://github.com/cliffordwolf/icestorm/issues/68
  • (Related to register implementation) How to implement register file to block RAM manually?
    https://github.com/cliffordwolf/picorv32/issues/30
  • Ramon wrote: »
    Does anyone know how to force Quartus to implement the registers without block RAM? I don't know if there is any directive or parameter to do that.
    Why would you want to do this? The block ram will give you an efficient dual port ram. I think anything else will be larger and slower.
  • KeithE wrote: »
    Yeah - doesn't sound like this will be supported by the open source tools any time soon.
    https://github.com/cliffordwolf/icestorm/issues/68

    Thanks, there is a comment there
    "make sense to start with the flagman chip on Eval board: http://www.digikey.com/short/3fpvm3"
    but the link fails, and searching for "Lattice flagman" finds nothing?

  • Ramon (Posts: 348)
    edited April 2017
    KeithE wrote: »
    Ramon wrote: »
    Does anyone know how to force Quartus to implement the registers without block RAM? I don't know if there is any directive or parameter to do that.
    Why would you want to do this? The block ram will give you an efficient dual port ram. I think anything else will be larger and slower.

    I just want to check how many LEs are traded off against M9K blocks. Yes, it probably won't make any sense: it will take a lot of LEs instead of one M9K block.

    A quick look at picorv32.v to check how many 64- and 32-bit registers are used shows this:
    localparam integer regfile_size = (ENABLE_REGS_16_31 ? 32 : 16) + 4*ENABLE_IRQ*ENABLE_IRQ_QREGS;
    
    reg [63:0] count_cycle, count_instr;
    reg [31:0] reg_pc, reg_next_pc, reg_op1, reg_op2, reg_out;
    reg [31:0] cpuregs [0:regfile_size-1];
    

    Interestingly, if you disable the counters ('ENABLE_COUNTERS64 = 0' for the RV32E version) they are still defined but not used (they take space). I don't know if they can be commented out (I think other code would need to be commented out too).

    There are 5 control registers (reg_xxxx) that cannot be changed.

    As for cpuregs: since ENABLE_IRQ is disabled by default, there are 32 of them by default, but that can be reduced to 16.

    That means the minimum number of 32-bit CPU registers is 21 (16 cpuregs + reg_pc, reg_next_pc, reg_op1, reg_op2, reg_out) if we don't use the counters. That is 672 bits, which almost fits in an Arria MLAB (640 bits of LUTRAM).
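    The count above can be reproduced from the regfile_size expression in picorv32.v. A Python sketch that mirrors that expression and adds the five control registers, assuming IRQs and the counters are disabled (the 640-bit MLAB size is the figure quoted above):

```python
def regfile_size(enable_regs_16_31, enable_irq, enable_irq_qregs):
    """Mirrors (ENABLE_REGS_16_31 ? 32 : 16) + 4*ENABLE_IRQ*ENABLE_IRQ_QREGS."""
    return (32 if enable_regs_16_31 else 16) + 4 * int(enable_irq) * int(enable_irq_qregs)

# The five 32-bit control registers that stay in every configuration.
CONTROL_REGS = ["reg_pc", "reg_next_pc", "reg_op1", "reg_op2", "reg_out"]
MLAB_BITS = 640  # Arria MLAB size as quoted in the post

def min_state_bits(enable_regs_16_31):
    """Bits in cpuregs plus the control registers, with IRQs and counters off."""
    n_regs = regfile_size(enable_regs_16_31, False, False) + len(CONTROL_REGS)
    return n_regs * 32

rv32e_bits = min_state_bits(False)
print(rv32e_bits, "bits;", rv32e_bits - MLAB_BITS, "more than one MLAB")
# 672 bits; 32 more than one MLAB
```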

    It could be interesting to check how 'ramstyle MLAB', 'ramstyle M9K' and 'ramstyle logic' compare.

    As the DE0-Nano (Cyclone IV) does not have MLABs, those registers use only 14% of one M9K block. Yes, those are not the only 32-bit registers in the design, so this says little about actual M9K usage. But it is still a good exercise for anyone who wants to check whether a minimal picoRV32E can fit in a big CPLD that has no block RAM. It could also help test ways to implement the CPU without an FPGA (like Yosys/standard cells, OpenRAM, opencircuitdesign/Magic), or serve a future Yosys vs. Quartus synthesis comparison (now that they seem to be starting to support that option too).
  • Ramon (Posts: 348)
    edited April 2017
    Heater, please can you upload the hex files into github?

    Quartus complains that it cannot find the firmware: 'Error (10054): can't open Verilog Design File "firmware/firmware0.hex"'

    After a two-hour download of Quartus 16.1,
    30 minutes installing,
    another 30 minutes downloading and installing Python 3.x
    (because makehex.py doesn't work on 2.x versions),

    ... now I have found that I need to download the riscv toolchain and compile it for msys? minGW? to get 'make'

    Please, have mercy! (My 5-year-old computer took 4 minutes to reach 11% and throw that error.)
  • Ramon,

    I tried the RV32E version. The whole build got down to 7% of the Nano. I forget the actual number of LEs.

    But my code would not run. I guess it needs recompiling with some option or other. And the start.S would need changing.
    I gave up messing with it.

    It also won't run with ENABLE_REGS_DUALPORT=0. At least not at 100MHz.

    I disabled those counters and got down to 1,726 LE ( 8 % )
  • Thank you, those numbers are interesting. I am curious about how far we can go on minimal resource usage.

    I haven't found much info about the RV32E or RV32EC variants. It seems that almost nobody cares about a minimalist RISC-V (maybe because it won't let you run a Linux kernel or any other fancy multiuser OS), and maybe the toolchains are not configured for that option by default.

    There is also a report (don't remember if it is 'Analysis & Synthesis > Summary') where it will show the number of memory blocks used and other interesting things. Do you remember, or did you keep that info?
  • You mean like:

    Flow Status Successful - Sun Apr 16 15:59:16 2017
    Quartus Prime Version 16.1.0 Build 196 10/24/2016 SJ Lite Edition
    Revision Name xoro
    Top-level Entity Name xoro_top
    Family Cyclone IV E
    Device EP4CE22F17C6
    Timing Models Final
    Total logic elements 1,726 / 22,320 ( 8 % )
    Total registers 758
    Total pins 15 / 154 ( 10 % )
    Total virtual pins 0
    Total memory bits 395,264 / 608,256 ( 65 % )
    Embedded Multiplier 9-bit elements 0 / 132 ( 0 % )
    Total PLLs 1 / 4 ( 25 % )

  • Thanks, I was looking for this line:

    'Total memory bits 395,264 / 608,256 ( 65 % )'

    The EP4CE22F17C6 has exactly 66 M9K blocks (8,192 bits each, or 9,216 bits including parity).

    It seems that this CPU uses around 48/49 blocks (50 KB)? I don't know how much of that is for the hex files.
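    The report's figures are consistent with that block count. A Python check, assuming "Total memory bits" counts only the 8,192 data bits per block as used. Quartus can leave blocks partly filled, so 49 is a lower bound, not the actual count:

```python
import math

M9K_DATA_BITS = 8192    # per block, without parity
M9K_TOTAL_BITS = 9216   # per block, including parity

total_bits = 608_256    # denominator of "Total memory bits" in the report
used_bits = 395_264     # numerator of "Total memory bits" in the report

blocks_on_chip = total_bits // M9K_TOTAL_BITS
min_blocks_used = math.ceil(used_bits / M9K_DATA_BITS)

print(blocks_on_chip)             # 66
print(used_bits / M9K_DATA_BITS)  # 48.25
print(min_blocks_used)            # 49
print(used_bits // 8)             # 49408 bytes, roughly the "50 KB" above
```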

    I will try tomorrow to compile riscv toolchain for mingw/msys and get more detailed info about M9K blocks/LE usage.
  • KeithE (Posts: 937)
    edited April 2017
    Ramon wrote: »
    Heater, please can you upload the hex files into github?
    He attached them earlier in the thread.
    http://forums.parallax.com/discussion/download/120163/firmware.zip
  • KeithE (Posts: 937)
    edited April 2017
    Ramon wrote: »
    Interestingly, if you disable the counters ('ENABLE_COUNTERS64 = 0' for the RV32E version) they are still defined but not used (they take space). I don't know if they can be commented out (I think other code would need to be commented out too).
    Registers which are just declared and not used should get pruned out of the design.

    Edited to add: also reg doesn't necessarily imply a storage element (flip-flop or RAM). It can be used for combinatorial logic. There's now a datatype called logic which was supposed to make this more clear. Not sure that worked.
  • Ramon,

    Well, no. HEX files are built from source. I don't like to put derived files in a git repo.

    I do feel your pain with downloading, building and installing things. There has been a lot of that going on around here.

    The only memory blocks the actual CPU uses are for its regs, 32 times 32 bits. So most of the RAM block usage reported is the 48 KBytes of memory space.
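    Heater's explanation accounts for the Quartus figure almost exactly. Python arithmetic, assuming "KBytes" means 1,024 bytes. Reading the leftover 2,048 bits as two copies of the 32 x 32 register file (for example, one copy per register read port) is an assumption, not something the report states:

```python
KIB = 1024

memory_bits = 48 * KIB * 8   # "48KBytes of memory space"
regfile_bits = 32 * 32       # "32 times 32 bits"
reported_bits = 395_264      # "Total memory bits" from the Quartus report

leftover = reported_bits - memory_bits
copies = leftover // regfile_bits   # assumption: leftover is whole register-file copies

print(memory_bits)   # 393216
print(leftover)      # 2048
print(copies)        # 2
```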
  • It's Easter: too bad that Heater didn't discover picorv32 before making the big California road trip. Could have made a nice prop2 "Easter Egg" if he had access to the right files ;-)

    Anyways - if anyone is interested in working on RISC V related projects, there may be some inspiration here: http://www.lowrisc.org/docs/gsoc-2017-ideas/
  • KeithE (Posts: 937)
    edited April 2017
    In case there are any nutballs like me that want to build RISC V C code on a Pi. I confirmed that the icoboard image mentioned previously does work. I didn't want to reimage an SD card, so I did the following. Maybe someone will point out a much easier way. (I did this under Mint Linux)
    1 - grabbed the .img from here http://icoboard.org/get-started-with-your-icoboard-and-a-raspi.html
    2 - unzip 2017-03-02-raspbian-jessie-icotools.zip
    3 - parted 2017-03-02-raspbian-jessie-icotools.img
    (Used this to figure out the offset 70254592 below - see online for details)
    4 - sudo mkdir /mnt/pi
    5 - sudo mount -o loop,offset=70254592 2017-03-02-raspbian-jessie-icotools.img /mnt/pi
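    The offset passed to mount is the partition's start sector times the 512-byte sector size, so 70254592 is sector 137216. Instead of reading it from parted, it can be pulled straight from the image's partition table. A minimal Python sketch, assuming a classic MBR (DOS) partition table and 512-byte sectors; partition_offsets is a hypothetical helper:

```python
import struct

SECTOR_SIZE = 512          # assumption: 512-byte sectors
PART_TABLE_OFFSET = 446    # MBR partition table starts here
ENTRY_SIZE = 16            # four 16-byte primary partition entries

def partition_offsets(mbr):
    """Byte offsets of the non-empty primary partitions in an MBR (hypothetical helper)."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    offsets = []
    for i in range(4):
        start = PART_TABLE_OFFSET + ENTRY_SIZE * i
        entry = mbr[start:start + ENTRY_SIZE]
        part_type = entry[4]                               # 0 means the slot is unused
        (start_lba,) = struct.unpack_from("<I", entry, 8)  # little-endian start sector
        if part_type != 0:
            offsets.append(start_lba * SECTOR_SIZE)
    return offsets

print(137216 * SECTOR_SIZE)  # 70254592, the offset used above
```

    For example, partition_offsets(open("2017-03-02-raspbian-jessie-icotools.img", "rb").read(512)) should list the byte offset of each primary partition. On a Raspbian image the root filesystem is normally the second one, which is worth confirming against parted.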
    
    Then I tarred up the appropriate subdirectories in /mnt/pi/opt. Things are a little large and I only wanted riscv32i. Here are the filesizes after the tarballs are compressed. They are roughly 600 MB each when uncompressed. I'm sure that there's a lot of cruft that could be cleaned up to reduce this.
    -rw-r--r--  1 keith keith  239664868 Apr 16 15:34 riscv32ic.tar.bz2
    -rw-r--r--  1 keith keith  239654975 Apr 16 15:36 riscv32imc.tar.bz2
    -rw-r--r--  1 keith keith  239559853 Apr 16 15:35 riscv32im.tar.bz2
    -rw-r--r--  1 keith keith  239568188 Apr 16 15:33 riscv32i.tar.bz2
    Edited to add: Is it not completely INSANE to have a separate HUGE set of files for each little tweak? m = multiply, c = compressed instructions. Why aren't these switches?
    
    I used this to build Heater's helloWorld on a Pi, and ran a simulation. I can see "Hello world!" being printed in gtkwave. Look at uartTx.buffer and view as ASCII - the simulation does NOT terminate, so you need to wait a minute or two and Ctrl-C it.
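    If you'd rather check the TX pin itself than uartTx.buffer, the characters go out as ordinary asynchronous serial frames. A Python sketch of 8N1 framing and decoding, assuming 8 data bits, no parity, one stop bit, LSB first, an idle-high line and one sample per bit. The real UART in Heater's design may differ in details:

```python
def uart_8n1_encode(data):
    """Line levels for 8N1 frames, one sample per bit: start (0), 8 data bits LSB first, stop (1)."""
    bits = []
    for byte in data:
        bits.append(0)                                  # start bit
        bits.extend((byte >> i) & 1 for i in range(8))  # data bits, LSB first
        bits.append(1)                                  # stop bit
    return bits

def uart_8n1_decode(bits):
    """Inverse of uart_8n1_encode; idle (high) samples between frames are skipped."""
    out = bytearray()
    i = 0
    while i + 10 <= len(bits):
        if bits[i] == 1:          # line idle, keep looking for a start bit
            i += 1
            continue
        byte = sum(bits[i + 1 + k] << k for k in range(8))
        if bits[i + 9] != 1:
            raise ValueError(f"framing error at sample {i}")
        out.append(byte)
        i += 10
    return bytes(out)

line = [1, 1] + uart_8n1_encode(b"Hello world!") + [1, 1]
print(uart_8n1_decode(line).decode("ascii"))  # Hello world!
```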
    This is much easier than building these tools on the Pi. (I filled up a USB stick when I tried this a couple of nights ago - 8 GB of working space is not quite enough. Also for anyone that tries it's probably best not to use FAT. Or if you do, then you probably need to mount it with the exec option.)
  • Ramon wrote: »
    Thank you, those numbers are interesting. I am curious about how far we can go on minimal resource usage.
    Is this focus on size because you want to pack multiple cores into some larger FPGA ?
    The gain from shrinking registers seems minimal, as that means tool chains and libraries need careful mode control.
    With multiple cores, you might be able to share BOOT memory, and with XIP you may have a choice of lock-step boot on initial reset, or soft-reset done one at a time?

  • I'm trying to port a bootloader (audioboot loader, yes audio) to the riscv32, I added an input port, modified the output port, added resettable timers... let's see.
    (Original from: https://github.com/ChrisMicro/AttinySound)
  • Ale,

    An audio bootloader? What an awesome idea!

    Back in 1980 or so we built a Motorola 6809 board, all wire wrap. The guy I was working with added cassette tape storage. Programs were written out to a UART whose output was then FSK modulated. He managed to decode the audio for retrieval with just a flop or two and some C and R. It was good for 300 baud!

    We never thought to make it into an actual boot loader. We had a monitor program in EEPROM that had a load command to read programs from tape.

    What kind of modulation scheme does this bootloader use? I always thought we could transfer data over audio a lot faster than 300 baud!
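    The thread doesn't say which scheme the audio bootloader uses. For comparison with the cassette interface described above, here is a Python sketch of plain binary FSK at 300 baud with a zero-crossing demodulator. The tones are an assumption borrowed from the Kansas City Standard (1200 Hz for 0, 2400 Hz for 1), as is the 9600 Hz sample rate:

```python
import math

SAMPLE_RATE = 9600             # samples per second (assumption)
BAUD = 300
F0, F1 = 1200, 2400            # tone for 0 and for 1 (assumption: Kansas City Standard)
SPB = SAMPLE_RATE // BAUD      # 32 samples per bit

def fsk_modulate(bits):
    """Phase-continuous binary FSK: each bit is SPB samples of its tone."""
    samples, phase = [], 0.0
    for bit in bits:
        step = 2 * math.pi * (F1 if bit else F0) / SAMPLE_RATE
        for _ in range(SPB):
            samples.append(math.sin(phase))
            phase += step
    return samples

def fsk_demodulate(samples):
    """Count sign changes in each bit period; the higher tone has twice as many."""
    # A tone of f Hz crosses zero about 2*f/BAUD times per bit: 8 for F0, 16 for F1.
    threshold = (2 * F0 / BAUD + 2 * F1 / BAUD) / 2
    bits = []
    for start in range(0, len(samples) - SPB + 1, SPB):
        chunk = samples[start:start + SPB]
        crossings = sum(1 for x, y in zip(chunk, chunk[1:]) if (x < 0) != (y < 0))
        bits.append(1 if crossings > threshold else 0)
    return bits

message = [0, 1, 1, 0, 1, 0, 0, 1]
print(fsk_demodulate(fsk_modulate(message)) == message)  # True
```

    A real decoder would also need bit-clock recovery; this sketch skips that by assuming the bit periods are perfectly aligned.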
  • KeithE,

    Of course. There are plenty of nutballs out here. Getting the Pi building FPGA
    configs is on the end of my very long TODO list.

    Well done!

  • Meanwhile, down in Columbia they have an actual RISC V microcontroller chip:

  • KeithE wrote: »
    Ramon wrote: »

    Keith, thank you. I recall that Heater posted that before, but would you believe I searched every page of the thread for that attachment and couldn't find it!

    And I still cannot find the original post. I searched for "firmware" and "attachment" and I can only find your post, not the original post by Heater. I guess there should be an attachment icon somewhere that I am not able to see yet.
  • KeithE wrote: »
    Registers which are just declared and not used should get pruned out of the design.

    Yes, you are right. I ran Quartus today and didn't find the counters in the block RAM, so they must actually be pruned.

    I was confused about this code (lines 1339 .. 1345):
    		if (ENABLE_COUNTERS) begin
    			count_cycle <= resetn ? count_cycle + 1 : 0;
    			if (!ENABLE_COUNTERS64) count_cycle[63:32] <= 0;
    		end else begin
    			count_cycle <= 'bx;
    			count_instr <= 'bx;
    		end
    

    Why does he need to assign 'x' / don't care? for simulation maybe?
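    Whatever the reason for the 'bx, the enabled branch is worth reading carefully: the later non-blocking assignment to count_cycle[63:32] overrides the first one for those bits, so with ENABLE_COUNTERS64 = 0 the counter acts as a 32-bit counter that wraps. A Python behavioural model of one clock edge (a sketch written from the snippet above, not generated from picorv32.v):

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def next_count_cycle(count, resetn, counters64):
    """Value of count_cycle after one clock edge when ENABLE_COUNTERS is set.

    count_cycle <= resetn ? count_cycle + 1 : 0;
    if (!ENABLE_COUNTERS64) count_cycle[63:32] <= 0;   // later assignment wins for those bits
    """
    nxt = (count + 1) & MASK64 if resetn else 0
    if not counters64:
        nxt &= MASK32      # upper 32 bits forced to zero
    return nxt

print(next_count_cycle(MASK32, True, counters64=True))   # 4294967296
print(next_count_cycle(MASK32, True, counters64=False))  # 0 (wraps at 32 bits)
print(next_count_cycle(12345, False, counters64=True))   # 0 (held in reset)
```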
  • Heater. (Posts: 21,174)
    edited April 2017
    Now, now.

    The current firmware is attached. It spits out "Hello World!" to the serial TX pin, PIN_A13, whatever that is on the Nano header.

    It's probably the same as it was before. The hex files are good, most of the rest is not used.

  • Heater. wrote: »
    I do feel your pain with downloading, building and installing things. There has been a lot of that going on around here.

    I actually like downloading, building and installing. It is just that 'gcc' is soooooo special ...

    I spent 4 hours today building the RISC-V toolchain. This time I used a relatively modern computer, an E6600 at 3.00 GHz.

    So I followed all the steps: git clone ... (I was lucky to have a 7 MB download speed, otherwise I would still be downloading hundreds of MB), ./configure, make ... and the first error: I need a compiler with C++11 support.

    My 13.04 install has gcc 4.7.3 by default, and the RISC-V developers consider this version 'old', although it is only 4 years old. OK, if I need 4.8 for the latest RISC-V toolchain commit, I will install it.
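    The version question can be checked before starting a multi-hour build. A small Python sketch, assuming the output format of gcc -dumpversion (e.g. "4.7.3"; newer GCC releases may print only the major number) and taking 4.8 as the minimum from this post rather than from the toolchain's documentation:

```python
MIN_GCC = (4, 8)  # minimum taken from the post (assumption)

def parse_version(text):
    """'4.7.3' -> (4, 7, 3); '7' -> (7, 0, 0). Expects `gcc -dumpversion`-style output."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))

def new_enough(text, minimum=MIN_GCC):
    """True if the parsed version is at least `minimum` (compared on major.minor)."""
    return parse_version(text)[:len(minimum)] >= minimum

print(new_enough("4.7.3"))  # False
print(new_enough("4.8.1"))  # True
```

    For example, feed it subprocess.run(["gcc", "-dumpversion"], capture_output=True, text=True).stdout.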

    Then I had to find a PPA for 13.04 and GCC 4.8. Great, Google saved my life: two posts with complete instructions on how to upgrade gcc/g++ to 4.8. So I followed all the steps and started trying again to compile the RISC-V toolchain with my new gcc 4.8.1.

    After 15 minutes of garbage scrolling on the console, the second error: something about an unknown opcode from as. OK, no problem, probably some first-stage gcc glitch. I won't modify anything and will try again ...

    Then a third 'make', and it goes better, but after 20 minutes it crashes badly with a core dump message.

    Then I tried again ... 30 minutes of garbage on the console (now I can understand why so many changed to clang instead of gcc, with that wonderful percentage bar) ... and the same core dump message.

    At this point I was thinking that maybe I should try something else. I guess someone else would have made that decision earlier, but I gave it another try (another make) ... and in parallel I opened a remote shell to a Xeon with 18 cores and 16 GB ECC to check what the hell was going on with this toolchain. I repeated all the steps on that computer (with gcc 4.8.2) and it worked the first time.

    Then I went back to my computer and found that, after five failed attempts, it had worked. Now I also have the toolchain in /opt.

    I don't know whether the problem was bad RAM in my computer, or that I need at least gcc 4.8.2 to compile it correctly. I think I have had enough, and I will not repeat the experience for MinGW/MSYS.

    I can understand some hate for how things are done nowadays.

    First: what is wrong with a simple tar.gz?
    Second: what is wrong with a .deb, or .rpm, or even a windows .zip installer?
    Third: why do they need to use the latest compiler?
    Is it really that c++2011 is so much better than c++2003?

    I remember those old days when a compiler fit in a floppy disk ... too many things are going wrong.
  • Ramon wrote: »
    Why does he need to assign 'x' / don't care? for simulation maybe?

    I think so. When I was browsing around I saw him do a similar thing for simulation elsewhere.
  • Ramon wrote: »
    ..........

    I can understand some hate for how things are done nowadays.

    First: what is wrong with a simple tar.gz?
    Second: what is wrong with a .deb, or .rpm, or even a windows .zip installer?
    Third: why do they need to use the latest compiler?
    Is it really that c++2011 is so much better than c++2003?

    I remember those old days when a compiler fit in a floppy disk ... too many things are going wrong.

    I remember those days as well, and even earlier, including upgrading from a 1702 eprom board to a 2708 board so I could fit a small assembler in with the eprom monitor. Those were fun days, and finding the Propeller chip and this forum brought a lot of that fun back.
  • Ramon,

    That sounds like many of my attempts to build this and that over the years.

    What platform are you building on?

    I built the riscv toolchain on a tiny Atom industrial PC running Debian Jessie. Worked first time. Debian is not famous for having the latest cutting edge everything.

    To be fair:

    a simple tar.gz will not work on all platforms.

    A .deb, .rpm, or windows .zip install is not going to work everywhere. And besides that is extra work to maintain. These guys have better things to do. Like create the software. This is all very new remember.

    C++2011 is six years old now. I don't think they are asking too much to use that.

    But, yep. All these things trip us up. Thank God for Debian and GCC.
    (now I can understand why so many changed to clang instead of gcc, with that wonderful percentage bar)
    Hmmm...last time I built stuff with clang there was the same verbose output. No percentage bar.

    Clang is great mind. It has kept the GCC guys on their toes!
