RISC V ?


Comments

  • KeithE Posts: 883
    edited April 11
    You'll want to clone Heater's https://github.com/ZiCog/xoro

    But you'll also need to clone https://github.com/cliffordwolf/picorv32 and follow the instructions to build the RISC V tools. That is, unless Heater gives you the helloworld image. (You might be able to build the RISC V tools from Heater's xoro, but I figure it's safest to build from picorv32.)
  • Great.

    My little project is here: https://github.com/ZiCog/xoro

    If you happen to have a DE0 Nano it should work out of the box.

    Baud rate on the UART turns out to be 38400 baud at the moment.
  • Heater. wrote: »
    Great.
    Baud rate on the UART turns out to be 38400 baud at the moment.
    I was wondering how the math worked out. A comment in the UART states:
    115200 baud when driven from 50Mhz clock
    I borked my Pi when trying to build the RISC V tools.
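
For what it's worth, the usual arithmetic behind a comment like that is clock frequency divided by baud rate. A minimal sketch, assuming the UART counts a fixed number of clock cycles per bit (the name `baud_divider` is made up here, and none of this is taken from the xoro source):

```python
def baud_divider(clock_hz, baud):
    """Return (divider, actual_baud) for a UART that counts clock cycles per bit."""
    divider = round(clock_hz / baud)
    return divider, clock_hz / divider


# Divider a 50 MHz clock would need for a nominal 115200 baud.
divider, actual = baud_divider(50_000_000, 115_200)
print(divider, round(actual))  # 434 115207

# Clock frequency at which that same divider would produce 38400 baud.
print(divider * 38_400)  # 16665600
```

Note that 38400 is exactly one third of 115200, so a divider sized for 50 MHz would land near 38400 baud if the UART were actually clocked at about 16.67 MHz. Whether that is what happens in xoro is an assumption; the thread does not confirm it.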
  • Heater. wrote: »
    My little project is here: https://github.com/ZiCog/xoro
    ....
    Thanks! Does it include instructions on how to run it in iverilog?

  • I added some rudimentary instructions to the repo: https://github.com/ZiCog/xoro

    A little something on rebuilding the firmware, and running on a DE0 Nano or under Icarus.

    You don't really need the picorv32 repository but the instructions Clifford has there for building the riscv tool chain are great.

    Yes, don't use my Makefile to try and download and build the riscv tool chain. I should remove all that stuff from there.
  • Heater. wrote: »
    I added some rudimentary instructions to the repo: https://github.com/ZiCog/xoro
    ....
    Thanks! That's very helpful. Partly I just wanted to see how to invoke the Icarus tools so I can play with my own simple CPU.

  • This "own CPU" thing seems to be an itch that many softies have. Some kind of innate desire to get down from the heights of database design, high level languages and such to the bottom of the computing stack.

    For example, the other day I met a guy hacking some x86 assembler on an old laptop. Turned out he was a SAP guy but decided to learn something about the lower levels of computing. He was hacking boot code to put in the MBR of disk drives so he could boot straight into games with no operating system at all. Excellent!

    I have been fascinated by this "own CPU" thing for decades. It never happened because: I barely have an idea how to design one, the work involved in building all that hardware seemed too much, and I would then need a compiler or at least an assembler to do anything with it.

    Now, I seem to be a breath away from my "own CPU". It would have to be a ZPU or RISC V so that I have dev tools available to use with it. It would have to be a Verilog design for FPGA, to save all that hardware building.

    All of a sudden that all seems like cheating:)



  • Heater. wrote: »
    This "own CPU" thing seems to be an itch that many softies have.
    ....


    I have an assembler already written for my simple CPU. There never will be a C compiler although it supports an instruction and data cache so I suppose it could address a lot of memory.

  • Curiouser and curiouser.

    Do you already have a software simulation of your CPU that the assembler works against?
  • Heater. wrote: »
    Curiouser and curiouser.

    Do you already have a software simulation of your CPU that the assembler works against?
    No. The CPU is utterly trivial. I'm not sure it would even be possible to write a C compiler for it. I'd have to add lots of new instructions.

  • Even more curious.

    My idea of an utterly trivial CPU is the SUBLEQ. It only has one instruction "subleq", subtract and branch if the result is less than or equal to zero.

    As there is only one instruction there is no need for an instruction decoder!

    So:

    1) Read an operand from memory.
    2) Read another operand from the next memory location.
    3) Read a branch address from the next memory location.
    4) Subtract the first operand from the second and write the result back to the first. (or is it vice versa?)
    5) If the result is less than or equal to zero jump to the branch address.
    6) Continue from 1)

    With this we have a "Turing Complete" machine that can do anything any other computer can do.

    There is even a compiler for a C like language to subleq code. I'm kind of tempted to try that.

    So, I guess if your CPU has at least that capability a C compiler, or similar, is possible.
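
The six steps above can be sketched as a small interpreter. This assumes the commonly described convention where each instruction is three addresses a, b, c and the result goes back to the second location (mem[b] = mem[b] - mem[a]), which settles the "vice versa" question one way. Halting on a negative jump target is also an assumption, and the names are made up for illustration:

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Run SUBLEQ code held in `mem` (a list of ints), modifying it in place.

    Stops when pc goes negative, runs off the end of memory, or
    max_steps instructions have been executed.
    """
    steps = 0
    while 0 <= pc and pc + 2 < len(mem) and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                    # subtract
        pc = c if mem[b] <= 0 else pc + 3   # branch if result <= 0
        steps += 1
    return mem


# Program: Y = Y + X, using Z as a scratch cell that starts at zero.
# Addresses: X = 9, Y = 10, Z = 11.
program = [
    9, 11, 3,     # Z = Z - X  (Z becomes -X)
    11, 10, 6,    # Y = Y - Z  (Y becomes Y + X)
    11, 11, -1,   # Z = Z - Z  (result is 0, so jump to -1 and halt)
    7, 5, 0,      # data: X = 7, Y = 5, Z = 0
]
result = run_subleq(program)
print(result[10])  # 12
```

Even a simple addition takes three instructions and nine words here, which gives a feel for how quickly subleq code grows.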

  • Heater. wrote: »
    My idea of an utterly trivial CPU is the SUBLEQ. It only has one instruction "subleq", subtract and branch if the result is less than or equal to zero.
    ....

    I sometimes wonder what computing would be like if performance were unlimited, never minding memory. If that were enabled, somehow, a SUBLEQ architecture would be as good as any. Maybe quantum computing could get away with something so simple, if every possible state were to be evaluated in a small amount of time.
  • I sometimes also fantasize about such things. Given infinite performance we would not need multiple cores. The one computer could do it all. In fact we would only need one computer ever. It could do everything.

    There seem to be some limits though. Computing requires energy input. A vast amount of compute power would require a vast amount of energy input. Einstein tells us of the equivalence of mass and energy. So that energy is like a mass, it has gravity. That gravity would collapse the machine into a black hole!

    I don't fathom the quantum computing thing at all. Sure the thing can hold a gazillion states at the same time, in superposition. How does one ever get a "correct" answer out of it? Quantum mechanics talks of probabilities. Does not sound like a recipe for getting concrete results.

  • Heater. wrote: »
    My idea of an utterly trivial CPU is the SUBLEQ. It only has one instruction "subleq", subtract and branch if the result is less than or equal to zero.
    ....
    I've heard of that. I would hate to have to write a compiler for it.

    I very stupidly never documented the instruction set of my processor so I guess I'll have to reverse engineer it from either the Verilog or the assembler. I do know that it had 16 32-bit registers and used 3 addresses for arithmetic instructions and an immediate value of 12 bits, so I guess it must have been limited to 4K of memory. Here are the instructions:
    {   "halt", 0x00,   NoArgHandler    },
    {   "add",  0x01,   RegHandler      },
    {   "sub",  0x02,   RegHandler      },
    {   "and",  0x03,   RegHandler      },
    {   "or",   0x04,   RegHandler      },
    {   "xor",  0x05,   RegHandler      },
    {   "ld",   0x86,   MemHandler      },
    {   "st",   0x87,   MemHandler      },
    {   "br",   0x08,   BranchHandler   },  /* beq r0,r0 */
    {   "beq",  0x08,   CBranchHandler  },
    {   "bne",  0x09,   CBranchHandler  },
    {   "bsr",  0x0a,   BsrHandler      },
    {   "jmp",  0x0b,   JmpHandler      },
    {   "nop",  0x7f,   NoArgHandler    },
    

  • There seem to be a lot of DIY CPU makers who just do it to get Forth up. b16, J1, ...

    I've wondered about Chip's thoughts too - at least along the lines of what is an optimal ISA.
  • KeithE,
    I've wondered about Chip's thoughts too - at least along the lines of what is an optimal ISA.
    What is an optimal ISA?

    Optimal for what? As fast as possible, as little hardware as possible, as energy efficient as possible, optimal for compilers or human ASM programmers, etc, etc.

    Sounds like the CISC vs RISC debate that has been going on for two decades or more.

    In the meantime Intel with their CISC ISA adopted RISC internally for speed. They translate the horrible x86 instructions you see on the outside to simpler RISC instructions, in hardware, for internal evaluation.

    Also in the meantime, in the other direction, the simple RISC instruction set of ARM has grown like crazy over the years and is now very CISC.

    Also in the meantime, those RISC pioneers and their students at Berkeley have stuck to their ideals over the years and honed the RISC idea into RISC V.

    They seem to have been homing in on the balance between what the hardware can do and the compilers can make use of.

    Is that optimal for everything? Probably not.





  • I know - it all depends on your goals. What's optimal for one application may not be for another. I've just seen papers where they profile what gets used and then make decisions based on that, but there are potential flaws with that approach. It may be good for deciding what to drop, but it doesn't tell you where you could add something. There is a genealogy chart up on the RISC V site - https://riscv.org/risc-v-geneology/

    Anyways here's some feedback on the log that you posted earlier for what it's worth:
    Warning (10858): Verilog HDL warning at xoro_top.v(50): object CLOCK_100_SHIFTED used but never assigned
    Warning (10030): Net "CLOCK_100_SHIFTED" at xoro_top.v(50) has no driver or initial value, using a default initial value '0'

    Warning (10858): Verilog HDL warning at xoro_top.v(51): object CLOCK_10 used but never assigned
    Warning (10030): Net "CLOCK_10" at xoro_top.v(51) has no driver or initial value, using a default initial value '0'

    Warning (10858): Verilog HDL warning at xoro_top.v(52): object CLOCK_LOCKED used but never assigned
    Warning (10030): Net "CLOCK_LOCKED" at xoro_top.v(52) has no driver or initial value, using a default initial value '0'
    These are because you are sending floating wires to RND_OUT here (Instantiation of pll_sys is commented out, perhaps because you didn't meet timing?):

    assign RND_OUT = {CLOCK_100, CLOCK_100_SHIFTED, CLOCK_10, CLOCK_LOCKED};
    Warning (10230): Verilog HDL assignment warning at xoro_top.v(56): truncated value with size 32 to match size of target (8)

    Warning (10030): Net "pcpi_rd" at xoro_top.v(31) has no driver or initial value, using a default initial value '0'
    Warning (10030): Net "irq" at xoro_top.v(36) has no driver or initial value, using a default initial value '0'
    Warning (10030): Net "pcpi_wr" at xoro_top.v(30) has no driver or initial value, using a default initial value '0'
    Warning (10030): Net "pcpi_wait" at xoro_top.v(32) has no driver or initial value, using a default initial value '0'
    Warning (10030): Net "pcpi_ready" at xoro_top.v(33) has no driver or initial value, using a default initial value '0'
    These are because you declared these as regs, but never initialized them. They might as well be wires with zeros assigned.
    Info (12128): Elaborating entity "gpio" for hierarchy "gpio:gpio"
    Warning (10230): Verilog HDL assignment warning at gpio.v(46): truncated value with size 32 to match size of target (8)
    q <= gpio; // add gpio[7:0] to eliminate the warning
    Info (12128): Elaborating entity "prng" for hierarchy "prng:prng"
    Warning (10230): Verilog HDL assignment warning at prng.v(42): truncated value with size 64 to match size of target (1)
    q <= prng_out; // prng_out[0]
  • jmg Posts: 10,611
    edited April 11
    Heater. wrote: »
    My idea of an utterly trivial CPU is the SUBLEQ. It only has one instruction "subleq", subtract and branch if the result is less than or equal to zero.
    ....
    The SUBLEQ core is very simple, but the code size and speed suffer greatly, so this is mainly an academic curiosity.
    (Like the guy who boots Linux (over some hours?) on an 8-bit AVR by emulating ARM opcodes in software.)

    Of more practical use are compact cores that also give efficient code.

    The Lattice Mico8 is quite compact, and well suited to FPGA, but when I asked, Lattice (quite strangely) said they have no plans to offer a Mico8 example for iCE40 ?!
    I'd have thought the 128kBytes in the new ice40up5ksg48 made such a port a no-brainer.
  • Any idea what the "N" extension in this RISC V posting is referring to? Everything else is pretty obvious from the ISA specification, but I don't know what it could be.

    https://groups.google.com/a/groups.riscv.org/forum/#!searchin/isa-dev/whirlwind/isa-dev/sUQk6pFG0eA/AGNF44e4AAAJ
  • jmg Posts: 10,611
    KeithE wrote: »
    Any idea what the "N" extension in this RISC V posting is referring to?

    Wow, Twelve extensions already, tho only 5 are showing as frozen & V2.0


  • I think when Lattice first released the Mico it was more open too. Then they decided to lock it to their parts. Or am I not remembering correctly? If it were open then I would think a port would be trivial. Maybe OpenCores has a clone?

    The extensions aren't so bad. If I weren't on a phone I would post the descriptions.
  • jmg Posts: 10,611
    edited April 11
    KeithE wrote: »
    I think when Lattice first released the Mico it was more open too. Then they decided to lock it to their parts. Or am I not remembering correctly? If it were open then I would think a port would be trivial. Maybe OpenCores has a clone?
    The source is there, when I try a build of the Mico8 (v3.15) it seems to pass synth, with this result
    Resource Usage Report for isp8 
    Mapping to part: ice40up5ksg48
    Cell usage:
    GND             5 uses
    SB_CARRY        8 uses
    SB_DFFER        29 uses
    SB_DFFR         41 uses
    SB_DFFS         4 uses
    SB_GB           1 use
    VCC             5 uses
    pmi_addsub_8s_8s_off_XP_pmi_addsub  1 use
    pmi_distributed_dpram_32s_5s_8s_noreg_none_binary_XP_pmi_distributed_dpram_Z4  2 uses
    pmi_distributed_spram_16s_4s_11s_noreg_none_binary_XP_pmi_distributed_spram_Z1  1 use
    pmi_distributed_spram_32s_5s_8s_noreg_none_binary_XP_pmi_distributed_spram_Z5  1 use
    pmi_rom_512s_9s_18s_noreg_disable_async_prom_init\.hex_hex_XP_pmi_rom_Z2  1 use
    SB_LUT4         292 uses
    I/O ports: 40
    I/O primitives: 40
    SB_GB_IO       1 use
    SB_IO          39 uses
    I/O Register bits:                  0
    Register bits not including I/Os:   74 (1%)
    Total load per clock:
       isp8|clk: 1
    @S |Mapping Summary:
    Total  LUTs: 292 (5%)
    Distribution of All Consumed LUTs = LUT4 
    Distribution of All Consumed Luts 292 = 292 
    Mapper successful!
    

    ie the Source seems ok, and reports about as expected, but the next step fails with
    Error: Module pmi_distributed_spram_32s_5s_8s_noreg_none_binary_XP_pmi_distributed_spram_Z5 is not a valid primitive. Please check!
    Error: Module pmi_addsub_8s_8s_off_XP_pmi_addsub is not a valid primitive. Please check!
    Error: Module pmi_distributed_spram_16s_4s_11s_noreg_none_binary_XP_pmi_distributed_spram_Z1 is not a valid primitive. Please check!
    Error: Module pmi_distributed_dpram_32s_5s_8s_noreg_none_binary_XP_pmi_distributed_dpram_Z4 is not a valid primitive. Please check!
    Error: Module pmi_distributed_dpram_32s_5s_8s_noreg_none_binary_XP_pmi_distributed_dpram_Z4 is not a valid primitive. Please check!
    Error: Module pmi_rom_512s_9s_18s_noreg_disable_async_prom_init_hex_hex_XP_pmi_rom_Z2 is not a valid primitive. Please check!
    

    So it seems to be somewhat primitive-locked onto the MachXO (or XP?) devices, and someone inside Lattice could remap those primitives in seconds, one would think ?
  • KeithE Posts: 883
    edited April 11
    Right that's the problem - Xilinx does that as well. It's not really the high-level source.

    Here are the RISC V extensions:
    M - integer multiplication and division
    A - atomic
    F - single-precision floating point
    D - double-precision floating point
    Q - quad-precision floating point
    
    L - decimal floating point
    C - compressed instructions
    V - vector operations
    B - bit manipulation
    T - transactional memory
    P - packed-SIMD instructions
    N - ???typo or ???
    
  • KeithE,

    Thanks for looking over my Quartus warnings.
    These are because you are sending floating wires to RND_OUT here (Instantiation
    of pll_sys is commented out, perhaps because you didn't meet timing?):
    Actually it runs at 100MHz just fine. I commented out the pll whilst hacking on the test bench. I'll put it back again. I could not make it go any faster than 100MHz.
    These are because you declared these as regs, but never initialized them. They might as well be wires with zeros assigned.
    If that gets rid of the warnings that will be great. The only one of those signals I'm going to use is irq, eventually.

  • jmg,

    Yes, the subleq is more an academic curiosity. When I first heard of such things it was something of an eye opener that a Turing complete machine could be made so easily.

    Did you know that you can compile C code down to nothing but MOV instructions for x86? Intel's MOV can do all the arithmetic and flow control you need. Amazing.

    subleq code size could be huge. Every instruction is three words which gets a bit out of hand if you want a 32 or 64 bit machine!

    I was wondering about compressing the subleq code? What if the operands were encoded into variable length byte streams? Something like UTF-8 encoding.
    The Lattice Mico8 is quite compact, and well suited to FPGA
    I wonder about these things. Altera has its NIOS, Xilinx has its MicroBlaze. I presume that these are closed IP blocks that only work on the vendor's FPGAs. As such I'm not much interested.
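
On the idea of packing subleq operands into variable-length byte streams: here is a rough sketch of one way it could look. It uses a LEB128-style scheme (7 payload bits per byte, top bit set when more bytes follow) rather than UTF-8's exact bit layout, and it only handles non-negative values, so negative operands would need an extra mapping. The function names are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative int as LEB128-style bytes (7 bits per byte)."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, top bit clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one value starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Three small operand addresses fit in three bytes instead of three words.
packed = b"".join(encode_varint(v) for v in (9, 11, 3))
print(len(packed))  # 3
```

The catch is that instructions are no longer a fixed size, so branch targets would have to be byte offsets into the stream, or the code would have to be unpacked before running.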


  • Ariba Posts: 2,104
    edited April 12
    jmg wrote: »
    ....
    Of more practical use are compact cores that also give efficient code.

    The Lattice Mico8 is quite compact, and well suited to FPGA, but when I asked, Lattice (quite strangely) said they have no plans to offer a Mico8 example for iCE40 ?!
    I'd have thought the 128kBytes in the new ice40up5ksg48 made such a port a no-brainer.

    The Mico8 toolchain is tightly integrated into Diamond, Lattice's FPGA IDE and synthesis tool. But Diamond does not support the ICE40 family. So it would be quite an effort for Lattice to integrate the Mico8/32 tools into ICEcube.

    Further: Mico8 uses 18-bit wide instructions because the native Lattice FPGAs all have 9-bit wide block memories. But ICE40 has only 8-bit wide memories, and the big single-port RAM is 16 bits wide. It would be quite inefficient to pack 18-bit instructions into the 16-bit RAM.

    Lattice is working on its own RISC-V version; maybe that will also be available for ICE40 Ultra, where the memory width would fit perfectly. Maybe you can ask them about the RISC-V IP. Here is some info about it: https://riscv.org/wp-content/uploads/2016/07/Wed0900_SoftwareProgrammableFPGAIoTPlatform.pdf (page 23+)

    Andy
  • Interesting. They want to use a RISC V core and allow for hardware design of accelerators to be done in C. No more Verilog or VHDL.

    Typical, I start to learn something and before I'm done it's obsolete!
  • Xoro now running at 100MHz. UART transmitting "Hello world!" continuously at 115200 baud.

    I had to make uartTX output two stop bits instead of one to get the terminal in SimpleIDE to sync on to this.

    The Rigol DS1054 Z is doing a very poor job of decoding this. Perhaps because I have a crappy flying lead in the path and the signal shows a lot of huge inductive spikes on the edges.
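
For reference, a sketch of what one character looks like on the wire with two stop bits. It assumes the common layout of one start bit, 8 data bits sent LSB first, and no parity; the thread does not state the data format, and `uart_frame` is a hypothetical helper, not xoro code.

```python
def uart_frame(byte, stop_bits=2):
    """Return the line levels for one character, in transmission order."""
    bits = [0]                                    # start bit (line pulled low)
    bits += [(byte >> i) & 1 for i in range(8)]   # data bits, LSB first
    bits += [1] * stop_bits                       # stop bits (line idles high)
    return bits


frame = uart_frame(ord("H"))
print(frame)       # [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
print(len(frame))  # 11
```

With 11 bit times per character, a continuous stream at 115200 baud works out to roughly 10,470 characters per second. The extra stop bit also gives a receiver one more idle bit time in which to find the next start bit, which may be why it helps a terminal sync on a back-to-back stream.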
  • Heater. wrote: »
    ....
    The Rigol DS1054 Z is doing a very poor job of decoding this. Perhaps because I have a crappy flying lead in the path and the signal shows a lot of huge inductive spikes on the edges.

    Use a series resistor to help filter out those spikes, say 100k or more. Mostly though this is a problem of a poorly grounded probe but then again it's not always easy to find a ground to clip onto.

    (btw, I paid for my Rigol decoders but I know there is that other option....)

    btw again, I'm following this thread with interest.

  • I do have a ground connection, but it's all done via long dangly bits of wire. I'll try and neaten it up a bit. And the resistor, thanks.

    Anyway, the Prop Plug likes my serial output so we are good to go.

    Just been working on an SPI interface for the ADC on the DE0 Nano. I guess I could borrow a ready-made one from somewhere but I want to do things myself.
