
RISC V ?


Comments

  • That's exactly what my suggestion is.

    A currently unused bit in the ADD, SUB, etc. instruction encodings could steer the destination to one of thirty-two 32-bit output ports rather than to a register.

    Similarly, another two bits could make the source operands read from one of thirty-two 32-bit input ports rather than from registers.

    That is, instructions would operate directly on ports without going through memory space with loads and stores, as in the Propeller.

    Setting a pin would be:
    or io1, io1, r1          // Assuming r1 has a bit set for a pin
    
    Clearing a pin is:
    and io1, io1, r1        // Where r1 has the inverse pattern.
    

    Not sure I see the need for instructions that work on single pins if we can do the above.
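    A minimal C model of the OR/AND idiom above, only to show the bit logic. The name `io1` follows the pseudo-assembly; representing the port as a plain variable and the helper names `pins_set`, `pins_clear` and `pin_mask` are illustrative assumptions (in the proposal the port would be an instruction operand, not memory).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: io1 stands in for one of the proposed 32-bit
   output ports. A plain variable is used only to show the bit logic. */
static uint32_t io1 = 0;

/* Equivalent of "or io1, io1, r1": set every pin whose bit is 1 in mask. */
static void pins_set(uint32_t mask) { io1 |= mask; }

/* Equivalent of "and io1, io1, r1" where r1 holds the inverse pattern:
   clear every pin whose bit is 1 in mask. */
static void pins_clear(uint32_t mask) { io1 &= ~mask; }

/* Build a one-pin mask; pins are numbered 0..31. */
static uint32_t pin_mask(unsigned pin) {
    assert(pin < 32);
    return (uint32_t)1u << pin;
}
```

    Because both forms take a mask, a single instruction can set or clear any group of pins at once, which is why dedicated single-pin instructions look unnecessary.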

    Sadly I did not get very far with my XMOS dev boards, after some initial tinkering. Clocked I/O and such gets us into a whole other world...




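  • I think it's better to put the pin registers in CSRs rather than memory mapping them. There are already standard instructions to set and clear bits in a CSR directly: CSRRS ORs a mask into the CSR, and CSRRC clears the bits that are set in the mask (an AND with the inverted mask).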
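    For reference, a sketch of what CSRRS and CSRRC do, modelled in C on an ordinary variable standing in for a CSR. Both return the old CSR value, which is what the hardware writes to the destination register; the function names are illustrative, not a real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of CSRRS: read the old value, OR in the mask.
   On real hardware, if the source register is x0 no write happens at
   all; this model just ORs with a zero mask. */
static uint32_t csrrs(uint32_t *csr, uint32_t mask) {
    assert(csr != NULL);
    uint32_t old = *csr;
    *csr = old | mask;    /* set the bits that are 1 in mask */
    return old;
}

/* Software model of CSRRC: read the old value, clear the masked bits. */
static uint32_t csrrc(uint32_t *csr, uint32_t mask) {
    assert(csr != NULL);
    uint32_t old = *csr;
    *csr = old & ~mask;   /* clear the bits that are 1 in mask */
    return old;
}
```

    Returning the old value means a pin update and a read of the previous pin state cost one instruction, with no separate load.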

  • So they do. Cool. I have hardly looked at the CSR stuff yet.

    Certainly we want to get away from memory mapping them.
  • Ale Posts: 2,322
    There is space for 4096 CSRs, but some of them are already defined. The advantage of a RISCV core for the P3 would be a proven design without tiered memory and with an established, working ecosystem of tools. OTOH, one would need quite a few extensions to support what the P2 does in terms of assembly language.
    Ataradov's RISCV core doesn't implement CSRs. They seem easy to add; I just wonder if such a proto-P3 is the way to go now. I like the idea already :), but I think we would end up with something like the Epiphany core: it needs some serious infrastructure to get going, something the P1 doesn't need and the P2 probably would not need either.
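    The 4096 figure comes from the 12-bit CSR address field. The RISC-V privileged spec also encodes access rules in the address bits, which matters when choosing where pin registers would live. A small C sketch of that decoding (the helper names are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CSR addresses are 12 bits wide, hence 4096 of them. Per the privileged
   spec, bits [11:10] == 0b11 mark a read-only CSR and bits [9:8] give the
   lowest privilege level allowed to access it (0 = user, 1 = supervisor,
   3 = machine). */
static bool csr_is_read_only(uint16_t addr) {
    assert(addr < 4096);
    return ((addr >> 10) & 3u) == 3u;
}

static unsigned csr_min_privilege(uint16_t addr) {
    assert(addr < 4096);
    return (addr >> 8) & 3u;
}

/* 0x800-0x8FF is the user-level range reserved for custom read/write
   CSRs, a plausible home for pin registers such as OUTA/DIRA. */
static bool csr_is_custom_user_rw(uint16_t addr) {
    assert(addr < 4096);
    return addr >= 0x800 && addr <= 0x8FF;
}
```

    For example, the standard `cycle` counter at 0xC00 decodes as read-only and user-accessible, while `mhartid` at 0xF14 is read-only and machine-level.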
  • jmg Posts: 11,259
    Ale wrote: »
    There is space for 4096 CSRs, but some of them are already defined. The advantage of a RISCV core for the P3 would be a proven design without tiered memory and with an established, working ecosystem of tools. OTOH, one would need quite a few extensions to support what the P2 does in terms of assembly language.
    That depends: do you want to mix RISCV and P3 cores?
    If you have some COGs there, the RISCV only needs to support some of the WAIT opcodes.
    Or you could split RISCV into a large-memory version and smaller-memory versions with lockable caches that become the new COGs... That would need more opcode work.

  • jmg wrote: »
    Ale wrote: »
    There is space for 4096 CSRs, but some of them are already defined. The advantage of a RISCV core for the P3 would be a proven design without tiered memory and with an established, working ecosystem of tools. OTOH, one would need quite a few extensions to support what the P2 does in terms of assembly language.
    That depends: do you want to mix RISCV and P3 cores?
    If you have some COGs there, the RISCV only needs to support some of the WAIT opcodes.

    I think the idea is that the P3 COGs would themselves be RISCV cores, that is, they would use the Risc-V instruction set (suitably extended, of course). That way all the Risc-V tools (assembler, linker, languages like C, C++, Java, Rust, etc.) would be available right from the start. We'd want to enhance those tools to add the P3-specific features (like pin control, waiting, and so forth), but that's a lot less work than starting a new instruction set from scratch.

    The P2 instruction set has a lot of redundancy, frankly, and most of the instructions aren't really needed. They're convenient for assembler programmers, but as memories get bigger and chips get faster less and less work is done directly in assembler. It would be interesting to factor out what's really key to the "Propeller" experience and add just those instructions / registers to a Risc-V.

    My take on doing a P1V type core with Risc-V ISA:

    1. OUTA, DIRA, INA, and all other pin control registers would be control/status registers (CSRs)
    2. Ditto for the rest of the P1 hardware registers
    3. Risc-V already has the equivalent of CNT, and a register holding the COG id, both also in CSRs.
    4. Risc-V already has a full set of math instructions (add, sub, shift, etc.). The ones P1 has beyond these can be synthesized from Risc-V ones, so they can be implemented as macros.
    5. coginit/cogstop could be done as subroutines that manipulate some internal registers.
    6. Equivalents of lockclr/locknew/lockret/lockset could be implemented with the Risc-V atomic read/modify/write instructions.
    7. movd/movs/movi would be left out (no self modifying code)
    8. muxc/muxz/etc. would also be left out (no C or Z bits in Risc-V); programmers would use and/or instead
    9. waitcnt/waitpeq/waitpne/waitvid would become custom instructions
    A. djnz/tjnz could be left out initially (Risc-V has compare-and-branch instructions that cover most of these). There is apparently a proposal for DSP extensions to Risc-V that includes a zero-overhead looping construct; we could consider including that extension.
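    Item 6 could look roughly like this in C11, assuming the eight P1 locks become bits of one shared word in hub memory. That placement and the helper names are hypothetical; allocating lock IDs (locknew/lockret) is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hub-resident lock word: bit n is lock n (P1 has 8 locks). */
static _Atomic uint32_t hub_locks = 0;

/* Like P1 lockset: set the lock and report whether it was already set.
   On a RISC-V target with the A extension, compilers typically turn
   atomic_fetch_or into a single AMO instruction (amoor.w). */
static bool lock_set(unsigned id) {
    assert(id < 8);
    uint32_t bit = (uint32_t)1u << id;
    return (atomic_fetch_or(&hub_locks, bit) & bit) != 0;
}

/* Like P1 lockclr: clear the lock and report whether it was set. */
static bool lock_clr(unsigned id) {
    assert(id < 8);
    uint32_t bit = (uint32_t)1u << id;
    return (atomic_fetch_and(&hub_locks, ~bit) & bit) != 0;
}

/* Spin until lock_set reports the lock was previously clear. */
static void lock_acquire(unsigned id) {
    while (lock_set(id)) {
        /* another core holds the lock; retry */
    }
}
```

    Because each operation is a single atomic read-modify-write on shared memory, no extra hub hardware is needed beyond what the A extension already specifies.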

    The overall architecture would have 8 Risc-V cores, acting very much like COGs. Each core would have a 2K (or so) instruction cache, a 2K (or so) scratchpad RAM that it has exclusive access to, and slower access to a shared (HUB) memory. As in P1 COGs, there would be no interrupts.
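    To make "slower access to a shared (HUB) memory" concrete, here is a sketch of a P1-style round-robin hub arbiter, where each of 8 cores gets a window once every 16 clocks. The slot length and the rule that access starts only at the beginning of a core's window are assumptions for illustration, not a spec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical round-robin hub: 8 cores, 2-clock slots, 16-clock rotation. */
enum { HUB_CORES = 8, HUB_SLOT_CYCLES = 2, HUB_ROTATION = HUB_CORES * HUB_SLOT_CYCLES };

/* Cycles core `core` must stall, starting at cycle `now`, before its
   hub window opens. The result is always in [0, HUB_ROTATION). */
static unsigned hub_wait_cycles(unsigned core, uint32_t now) {
    assert(core < HUB_CORES);
    unsigned window_start = core * HUB_SLOT_CYCLES;  /* offset within rotation */
    unsigned phase = now % HUB_ROTATION;             /* position in rotation */
    return (window_start + HUB_ROTATION - phase) % HUB_ROTATION;
}
```

    The worst case is 15 cycles and depends only on the cycle count, never on what other cores are doing, which keeps COG-style timing deterministic even with shared memory.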

    Have I missed anything? I don't think any of that is particularly outlandish, nor would there be much work required on the Risc-V tool side (just adding the few new custom instructions for waitxxx). Heck, we could even implement the wait instructions implicitly as writes to special CSRs, which would mean the standard Risc-V toolchain could be used for everything.
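    If the wait instructions became writes to a special CSR, the waitcnt case might behave like the model below. The CSR itself is hypothetical; the wrap-around behaviour and the target += delta step mirror P1's `waitcnt`, which waits for the counter to equal the target and then adds the delta to it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: writing `target` to a waitcnt CSR stalls the core until
   the 32-bit cycle counter equals `target`. Unsigned subtraction gives
   the stall length modulo 2^32, so a target that has already passed
   costs a full counter wrap, the classic P1 waitcnt pitfall. */
static uint32_t waitcnt_stall(uint32_t now, uint32_t target) {
    return (uint32_t)(target - now);
}

/* After the wait, P1 adds delta to the target so that loop overhead
   does not accumulate. A zero delta would leave the target in the past
   and make the next wait a full wrap. */
static uint32_t next_target(uint32_t target, uint32_t delta) {
    assert(delta != 0);
    return target + delta;
}
```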

    P2 has a few new features. Frankly only the interrupts and smartpins represent a real change to the core abilities, and from the core perspective the smartpins are just exposed as additional registers to manipulate.
  • Heater. Posts: 20,698
    My original idea in opening this thread was to drop a RISC V core into a Propeller design as a good way to use existing C/C++ and other language compilers and tools on the Propeller. The RISC V core would run the big application C code; the COGs would do what they do for real-time, deterministic work.

    This was just a playful idea with no expectation of it happening.

    As the months go by, it looks like a more attractive idea to use the RISC V instruction set for all the COGs. No "master" RISC V core, all COGs equal, like the current prop.

    I guess this is really never going to happen. Chip has just spent 10 years refining the P2 instruction set. I cannot imagine he would want to throw it all away.

    As Eric points out the RISC V instruction set would need some extension to handle some Propeller features. That is OK. RISC V is designed to be extensible/customized.

    And as Eric says, high-level languages will never use most of the P2 instructions.

    As a practical matter, which is faster? My picorv32 on a DE0-nano runs at about 25 MIPS. There are other RISC V designs that do better, at the cost of timing determinism. What is the speed, in instructions per second, of a P2 on a DE0-nano?


  • Ale Posts: 2,322
    I thought P2 images run at a max 80 MHz with 2 cycles per instruction meaning up to 40 MIPS.
  • Ale wrote: »
    I thought P2 images run at a max 80 MHz with 2 cycles per instruction meaning up to 40 MIPS.
    That's correct.
    And for a brief moment the FPGAs ran the P2 at 120 MHz, giving 60 MIPS. :cool:
  • Ale wrote: »
    I thought P2 images run at a max 80 MHz with 2 cycles per instruction meaning up to 40 MIPS.

    Of course it's hard to compare "instructions per second". Sometimes Risc-V instructions do more (like "add A,B,C" is one instruction in Risc-V, two in P2). Sometimes P2 instructions do more (some of the P2 bit manipulation instructions could replace 2 or 3 Risc-V instructions).

    The real test would be getting fftbench or something similar running on both setups.
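    The arithmetic behind the numbers quoted above, plus a crude adjustment for instructions that do more work. Only figures from this thread are used, and the helper names are illustrative; it ignores memory stalls, branches and everything else a benchmark like fftbench would capture.

```c
#include <assert.h>

/* Instructions per second in millions: clock (MHz) / clocks per instruction. */
static double mips(double clock_mhz, double clocks_per_instr) {
    assert(clocks_per_instr > 0.0);
    return clock_mhz / clocks_per_instr;
}

/* Useful operations per second (millions) when one operation needs
   `instrs_per_op` instructions, e.g. a three-operand add: 1 instruction
   on RISC-V, 2 on the two-operand P2. */
static double mops(double mips_rate, double instrs_per_op) {
    assert(instrs_per_op > 0.0);
    return mips_rate / instrs_per_op;
}
```

    With the thread's figures, the P2 image gives 80 / 2 = 40 MIPS but only 20 million three-operand adds per second, against 25 million on the 25 MIPS picorv32; the balance tips the other way wherever a P2 bit-manipulation instruction replaces several Risc-V ones.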
  • jmg Posts: 11,259
    edited May 2
    ersmith wrote: »
    The real test would be getting fftbench or something similar running on both setups.

    Yes, and the FPGA LUT usage of each core also needs to be defined.
    I see there is a RISC-V topic, "Fast Interrupts for RISC-V", which means something like the P2 ROM could be ported to RISC-V as another benchmark.


  • Heater. Posts: 20,698
    Why would a P2 like RISC V machine need LUT?

    What about the P2 MIPS when running from HUB RAM?
  • jmg Posts: 11,259
    Heater. wrote: »
    Why would a P2 like RISC V machine need LUT?
    :) I was meaning the FPGA logic fabric usage, not the P2 LUT RAM, so edited my post a little...

  • Heater. Posts: 20,698
    Ah, I see.

    I forget the exact LUT usage of the picorv32 I was playing with last year, but it was less than 10% of a DE0-nano. I remember thinking 8 of them should fit in there.
  • jmg Posts: 11,259
    edited May 3
    Some interesting embedded CPU features are mentioned here, in a new MIPS MPU, along with what they did to move toward hard real time...

    http://www.eejournal.com/article/mips-i7200-breaks-the-chain/

    relevant bits
    "Like its more conventional brethren, the I7200 does multithreading, which has become a MIPS hallmark. The CPU core can handle up to nine threads and switch between threads with zero overhead. Snippets of code can also be preloaded and “parked,” ready for instant deployment in the case of an interrupt handler or a high-priority task. This feature, combined with new scratchpad RAMs that bypass the cache, is designed to make the I7200 more deterministic – another important feature for exotic 5G or LTE Advanced modems."
    The new MIPS I7200 marks the debut of a brand-new instruction set called nanoMIPS, and it’s – gasp! – variable-length. The new nanoMIPS ISA is about 12% smaller than ARM’s Thumb2, and a good 15% to 20% smaller than MIPS16e,

    Addit: The venerable 8051 uses variable-length opcodes of 8, 16 and 24 bits, and some of the very newest 8051 designs execute all of those in 1 SysCLK. Most microcontrollers have fast local RAM.
  • Cluso99 Posts: 13,619
    jmg,
    You beat me to it. Was just about to post the link ;)
  • Heater. Posts: 20,698
    I almost stopped reading the article at "The High Sparrow and Lord Protector of RISC canon, MIPS Technologies, has decided that the RISC code is more what you’d call guidelines than actual rules."

    Since when did MIPS "own" the RISC idea? They just followed what Patterson and co. described at the time, as did SPARC, ARM, IBM's POWER, Intel's i860, etc.

    That's not to say they don't have some neat ideas there to increase timing determinism and reduce latency, like bypassing the cache and so on.

    The big idea there seems to be a compressed instruction set, as per ARM Thumb. That is good: it saves memory, cache space and memory bandwidth.

    Except they are making the compressed ISA compulsory, unlike the optional ARM Thumb.

    Is it better than the compressed ISA spec. of ARM or RISC V?
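    On the RISC-V side, the compressed ISA (the optional C extension) stays cheap to decode because the instruction length is fixed by the low bits of the first 16-bit parcel. A sketch of that length decoding in C; the function names are illustrative, and 48-bit and longer formats are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bits[1:0] != 0b11                      -> 16-bit compressed instruction
   bits[1:0] == 0b11 and bits[4:2] != 0b111 -> 32-bit instruction
   anything else (longer formats)          -> 0, not handled here */
static unsigned rv_insn_bytes(uint16_t first_parcel) {
    if ((first_parcel & 0x3u) != 0x3u) return 2;
    if ((first_parcel & 0x1Cu) != 0x1Cu) return 4;
    return 0;
}

/* Count instructions in a stream of 16-bit parcels (lowest parcel of
   each instruction first). Stops at an unhandled or truncated one. */
static size_t rv_count_insns(const uint16_t *parcels, size_t n) {
    assert(parcels != NULL || n == 0);
    size_t i = 0, count = 0;
    while (i < n) {
        unsigned len = rv_insn_bytes(parcels[i]);
        if (len == 0 || i + len / 2 > n) break;
        i += len / 2;
        count++;
    }
    return count;
}
```

    For example, `c.nop` (0x0001) decodes as 2 bytes and the 32-bit `nop` (0x00000013, low parcel 0x0013) as 4.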

    Anyway, it's all useless until there is support for the ISA from GCC or Clang/LLVM. It's certainly not something open for use in a future Propeller.
  • jmg Posts: 11,259
    Heater. wrote: »
    I almost stopped reading the article at "The High Sparrow and Lord Protector of RISC canon, MIPS Technologies, has decided that the RISC code is more what you’d call guidelines than actual rules."

    Well, yes, the prose was of a certain style...
    Heater. wrote: »
    That's not to say they don't have some neat ideas there to increase timing determinism and reduce latency, like bypassing the cache and so on.
    The big idea there seems to be a compressed instruction set, as per ARM Thumb. That is good: it saves memory, cache space and memory bandwidth.
    Except they are making the compressed ISA compulsory, unlike the optional ARM Thumb.
    Is it better than the compressed ISA spec. of ARM or RISC V?

    Some numbers are quoted, if you wade thru the prose far enough.
    Locking or bypass of cache has been done before, and local RAM is also seen in microcontrollers.
    Variable-length opcodes have been around since CISC, but it is interesting that they focused on an optimal merge of features over backward compatibility, and put all of this in an MPU-level part.

    From the sound of it, they have a large customer, which lets you do that type of design (the result is smaller silicon).

    It would be interesting to see real devices, if they ever appear on the open market.
    Tools will come, as they always do...

    Imagine if Microchip released a 2 GHz version of this, in a Raspberry Pi form factor?
  • Heater. Posts: 20,698
    I cannot see this as an MPU level part. It can run Linux for goodness sake!

    My prediction is we will never see them available on the open market.

    Either way, it's useless: it's not an instruction set that is open for use, any more than x86 or ARM. It's not something Parallax could consider adding to a future Propeller.

  • jmg Posts: 11,259
    Heater. wrote: »
    I cannot see this as an MPU level part. It can run Linux for goodness sake!
    ?
    Err, yes, I did say MPU, not MCU?
    This is what most understand by MPU, and yes, running Linux is quite common.


  • Heater. Posts: 20,698
    jmg,

    Sorry, yes, I'm getting my wires crossed. My synapses are collapsing after 5 days of hacking C++ 20 hours per day.

    Mind you, the lines between microprocessor, microcontroller, system on a chip and such are pretty blurry nowadays, what with being able to cram so much into such small chips.

    It would be interesting to see a comparison of the code sizes of the new MIPS instruction set vs the compressed RISC V ISA, which beats out even ARM Thumb.
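    Whichever ISA wins, such a comparison comes down to simple arithmetic on the instruction mix. A sketch for a 16/32-bit mix such as RV32IC; the function names are illustrative, and it relies on each RISC-V compressed instruction expanding to exactly one 32-bit instruction.

```c
#include <assert.h>

/* Bytes for a program with n16 compressed and n32 full-size instructions. */
static unsigned long code_bytes(unsigned long n16, unsigned long n32) {
    return 2ul * n16 + 4ul * n32;
}

/* Size relative to the same program with every instruction at 32 bits,
   in percent. */
static double percent_of_uncompressed(unsigned long n16, unsigned long n32) {
    unsigned long total = n16 + n32;
    assert(total > 0);
    return 100.0 * (double)code_bytes(n16, n32) / (4.0 * (double)total);
}
```

    For example, a program in which 60 of every 100 instructions compress (an arbitrary split, not a measurement) comes out at 70% of its all-32-bit size.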

    The rest of those new MIPS features could be implemented in a RISC V core of course.

