RISC V ? - Page 19 — Parallax Forums

RISC V ?


Comments

  • Heater. Posts: 21,230
    Eric,
    Wow, sounds impressive! I hope I can get a chance to take a look at that soon.
    Probably not so impressive in the end. I had a brainstorm over the Easter holiday. Looking over The RISC-V Instruction Set Manual I started to scratch my head badly over those crazy immediate encodings, wondering how they would look in hardware. Then I thought I'd see how they look in Verilog and what kind of circuit Quartus built out of it. Then I thought, heck, let's try a little SpinalHDL experiment...

    Then I disappeared from the planet for 4 days, non-stop fighting with Scala/Spinal syntax/semantics, getting the IntelliJ IDE to work nicely (I thought IDEs were supposed to make life easier), decoding the RISC V document, etc.

    When I came up for air I seemed to have an almost complete RISC V in Spinal!

    It's got a way to go. For example, so far it has separate instruction and data memories, as per the datapath block diagram of the Sodor RISC V 1-Stage here: https://passlab.github.io/CSE564/notes/lecture08_RISCV_Impl.pdf

    I don't yet have the BranchCondition block and some fiddly control path stuff is missing.

    Anyway. About those immediates. I was mightily confused when reading the "Preface to Version 2.0" of the manual. It states that JAL is now a U-Type instruction and that the J-Type format was removed. But lo, JAL is still a J-Type in version 2.2. I started to feel I was walking on thin ice.

    Perhaps you could do me a favour. The following is my Spinal code for decoding all the immediate types. I think it's easy enough to follow even if you have never seen Scala or Spinal before. Do my immediate decoders look like what you have working in riscvemu?
    class JumpRegTargetGen extends Component {
      val io = new Bundle {
        val iTypeImmediate = in SInt (32 bits)
        val rs1 = in SInt (32 bits)
        val jalr = out SInt (32 bits)
      }
      io.jalr := io.iTypeImmediate + io.rs1
    }
    
    class BranchTargetGen extends Component {
      val io = new Bundle {
        val instruction = in UInt(32 bits)
        val pc = in SInt(32 bits)
        val branch = out SInt(32 bits)
      }
      val branchSignExtension = Bits(20 bits)
      branchSignExtension.setAllTo(io.instruction(31))
      val bTypeImmediate =  S(branchSignExtension ## io.instruction(7) ## io.instruction(30 downto 25) ## io.instruction(11 downto 8) ## B"0")
      io.branch := io.pc + bTypeImmediate
    }
    
    class JumpTargetGen extends Component {
      val io = new Bundle {
        val instruction = in UInt(32 bits)
        val pc = in SInt(32 bits)
        val jump = out SInt(32 bits)
      }
      val jSignExtension = Bits(12 bits)
      jSignExtension.setAllTo(io.instruction(31))
      val jTypeImmediate = S(jSignExtension ## io.instruction(19 downto 12) ## io.instruction(20) ## io.instruction(30 downto 21) ## B"0")
      io.jump := io.pc + jTypeImmediate
    }
    
    class ITypeSignExtend extends Component {
      val io = new Bundle {
        val instruction = in UInt(32 bits)
        val iTypeImmediate = out SInt(32 bits)
      }
      val iSignExtension = Bits(21 bits)
      iSignExtension.setAllTo(io.instruction(31))
      io.iTypeImmediate := S(iSignExtension ## io.instruction(30 downto 20))
    }
    
    class STypeSignExtend extends Component {
      val io = new Bundle {
        val instruction = in UInt(32 bits)
        val sTypeImmediate = out SInt (32 bits)
      }
      val sSignExtension = Bits(21 bits)
      sSignExtension.setAllTo(io.instruction(31))
      io.sTypeImmediate := S(sSignExtension ## io.instruction(30 downto 25) ## io.instruction(11 downto 7))
    }
    
    class UType extends Component {
      val io = new Bundle {
        val instruction = in UInt(32 bits)
        val uTypeImmediate = out SInt (32 bits)
      }
      io.uTypeImmediate := S(io.instruction(31 downto 12) ## B(0, 12 bits))
    }
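    To eyeball the Verilog that comes out of these, something like this should do the trick, assuming a standard SpinalHDL sbt setup (this wrapper is just for illustration, it is not part of the code above):
    import spinal.core._
    
    object GenImmediates {
      def main(args: Array[String]): Unit = {
        SpinalVerilog(new UType)          // emits UType.v in the working directory
        SpinalVerilog(new JumpTargetGen)  // likewise JumpTargetGen.v
      }
    }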
    


  • Heater. Posts: 21,230
    By the way, SpinalHDL is pretty amazing. Consider the following:
      def lookupControlSignals(s: String ): UInt = {
        val (insValid, branchType, op1Sel, op2Sel, aluFun, wbSel, rfWen, memEnable, memWr, memMask, csrCmd) = controlSignals(s)
        io.aluFun := aluFun.asBits
        io.branchType := branchType
        io.op1Sel := op1Sel.asBits
        io.op2Sel := op2Sel.asBits
        io.wbSel := wbSel.asBits
        io.rfWen := rfWen
        io.memVal := memEnable
        io.memRw := memWr
        return 0
      }
    
    That is a function that takes a string and uses it as the key to look up something in a Scala Map object (controlSignals). That something is a tuple containing a bunch of variables (insValid, branchType, etc.). Those variables get assigned to our signals, io.aluFun, io.branchType, etc.
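    If you are wondering what controlSignals itself looks like, it's just an ordinary Scala Map from mnemonic to a tuple. A plain-Scala illustration of the shape (the values here are made up, and the real table holds Spinal enum values rather than Ints and Booleans):
      val controlSignals = Map(
        //        insValid, branch, op1, op2, alu, wb, rfWen, memEn, memWr, mask, csr
        "ADD" -> (true,     0,      0,   1,   0,   0,  true,  false, false, 0,    0),
        "SUB" -> (true,     0,      0,   1,   1,   0,  true,  false, false, 0,    0)
      )
      val (insValid, branchType, op1Sel, op2Sel, aluFun, wbSel, rfWen, memEnable, memWr, memMask, csrCmd) = controlSignals("ADD")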

    That's odd: this gets converted to Verilog, yet Verilog has no Map or Tuple and such. Never mind String handling!

    No, this is a "generator". When your SpinalHDL code is run it generates a bunch of Verilog at each call site. For example, this:
          switch(funct3x) {
            is (opFunct3x.add.asBits) {
              lookupControlSignals("ADD")
            }
            is (opFunct3x.sub.asBits) {
              lookupControlSignals("SUB")
            }
            ...
          }
    
    Generates this Verilog:
            if((funct3x == `opFunct3x_staticEncoding_add)) begin
                io_aluFun = `ALU_binary_sequancial_ADD;
                io_pcSel = (3'b000);
                io_op1Sel = `OP1_binary_sequancial_RS1;
                io_op2Sel = `OP2_binary_sequancial_RS2;
                io_wbSel = `WB_binary_sequancial_ALU_1;
                io_rfWen = 1'b1;
                io_memVal = 1'b0;
                io_memRw = 1'b0;
            end else if((funct3x == `opFunct3x_staticEncoding_sub)) begin
                io_aluFun = `ALU_binary_sequancial_SUB;
                io_pcSel = (3'b000);
                io_op1Sel = `OP1_binary_sequancial_RS1;
                io_op2Sel = `OP2_binary_sequancial_RS2;
                io_wbSel = `WB_binary_sequancial_ALU_1;
                io_rfWen = 1'b1;
                io_memVal = 1'b0;
                io_memRw = 1'b0;
            end else if((funct3x == `opFunct3x_staticEncoding_sll_1)) begin
            ...
    

    Cool hey?


  • Cluso99 Posts: 18,066
    _binary_sequancial_ ???
  • Heater. Posts: 21,230
    Ha! Hadn't noticed that. That is not my spelling; that is generated by SpinalHDL.

    SpinalHDL is created by Charles Papon, who is French I believe. There are a few nice Frenchisms scattered around his code and documentation. That's OK, Spinal is a brilliant piece of work.

    By the way ALU_binary_sequancial_SUB is just a Verilog definition of some bit pattern:
    `define ALU_binary_sequancial_SUB 4'b0001
    
    The "sequancial" part comes about because it's part of a Spinal enumeration (Think C enum) Like so:
    object opFunct3x extends SpinalEnum {
      val add, sub, sll, slt, sltu, xor, srl, sra, or, and = newElement()
      defaultEncoding = SpinalEnumEncoding("staticEncoding")(
        add  -> Integer.parseInt("0000", 2),
        sub  -> Integer.parseInt("1000", 2),
        sll  -> Integer.parseInt("0001", 2),
        slt  -> Integer.parseInt("0010", 2),
        sltu -> Integer.parseInt("0011", 2),
        xor  -> Integer.parseInt("0100", 2),
        srl  -> Integer.parseInt("0101", 2),
        sra  -> Integer.parseInt("1101", 2),
        or   -> Integer.parseInt("0110", 2),
        and  -> Integer.parseInt("0111", 2)
      )
    }
    
    Well, except they are not sequential in this case because I have overridden the default enum encoding.
  • Heater. Posts: 21,230
    SpinalHDL : An alternative hardware description language



    The RISC-V guys have their own alternative to Verilog/VHDL, Chisel, also written in Scala.
  • Heater. wrote: »
    Anyway. About those immediates. I was mightily confused when reading the "Preface to Version 2.0" of the manual. It states that JAL is now a U-Type instruction and that the J-Type format was removed. But lo, JAL is still a J-Type in version 2.2. I started to feel I am walking on thin ice.
    I think the change in the 2.0 instruction set was that the old J instruction (a plain jump, with no link register) was removed, along with its corresponding immediate encoding. Now to do a plain jump you do JAL with x0 as the destination (link) register.

    JAL has a "UJ-Type" immediate. Like the U type it has the immediate in the upper 20 bits of the instruction, but unlike U the bits don't go straight through to the output, they need shuffling; there's some discussion of this in section 2.3 of the spec ("Immediate Encoding Variants").
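    To make the shuffle concrete, in plain Scala (not Spinal, and the function name is just mine for illustration) the UJ-immediate comes out of the instruction word like this:
    def decodeJImmediate(instr: Int): Int = {
      val imm20    = (instr >>> 31) & 0x1     // instr[31]    -> imm[20]
      val imm10_1  = (instr >>> 21) & 0x3FF   // instr[30:21] -> imm[10:1]
      val imm11    = (instr >>> 20) & 0x1     // instr[20]    -> imm[11]
      val imm19_12 = (instr >>> 12) & 0xFF    // instr[19:12] -> imm[19:12]
      val imm = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
      if (imm20 == 1) imm | 0xFFE00000 else imm   // sign-extend from bit 20
    }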
    class JumpRegTargetGen extends Component {
      val io = new Bundle {
        val iTypeImmediate = in SInt (32 bits)
        val rs1 = in SInt (32 bits)
        val jalr = out SInt (32 bits)
      }
      io.jalr := io.iTypeImmediate + io.rs1
    }
    
    This doesn't look right to me. JALR has a 12 bit offset, not 32 bits (bits 31:20 of the opcode are the signed offset added to the register).

    The other immediate forms look OK, at least to my limited understanding of Scala/Spinal.

    Regards,
    Eric
  • Heater. Posts: 21,230
    Grrr... this is giving me headache.

    From the manual:

    "The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target
    address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the
    least-significant bit of the result to zero"


    The I-Type immediate is just the top 12 bits of the instruction, with the sign in bit 31.

    So my I-Type Immediate above is wrong and could be as simple as this:
    class ITypeSignExtend extends Component {
      val io = new Bundle {
        val instruction = in UInt(32 bits)
        val iTypeImmediate = out SInt(32 bits)
      }
      io.iTypeImmediate := S(io.instruction) >> U(20)
    }
    
    Given the I-Type is now correct then my JALR target is wrong, and should be:
    class JumpRegTargetGen extends Component {
      val io = new Bundle {
        val iTypeImmediate = in SInt (32 bits)
        val rs1 = in SInt (32 bits)
        val jalr = out SInt (32 bits)
      }
      io.jalr := (io.iTypeImmediate + io.rs1) & ~1
    }
    
    Assuming I connect up all the io ports correctly.

    Now, you might be wondering: why put such simple operations into a class of their own and make it all so verbose?

    Well, the plan is to have one component for every box in the Sodor 1-Stage datapath diagram, and to name the signals connecting the boxes similarly to that diagram. Consider all that verbosity as comments. After all, my code has very few actual comments!

    The only exception is the "+4" box, which is currently just "pc4 := pc + 4" thrown in somewhere.

  • Heater. Posts: 21,230
    Eric,

    The sodor-spinal RISC V is almost complete and now has installation and run instructions.

    I'm continuing this thread here:
    https://forums.parallax.com/discussion/168355/sodor-spinal-a-risc-v-core-in-spinalhdl

    Saves cluttering up the P2 section all the time.
  • Heater. Posts: 21,230
    edited 2018-04-26 08:16
    ersmith,

    Does your RISC V emulator support interrupts?

    I ask because after much ferreting around I can't for the life of me see how my simple RISC V HDL should handle interrupts.

    For example: a subroutine call in RISC V is done with the JAL or JALR instructions. These set the program counter to the subroutine address and store the return address in a register. The ABI specifies using register x1 (a.k.a. "ra") to hold the return address. There is no stack in use at the instruction level.

    As far as I can make out, exceptions, traps and interrupts are implemented by effectively forcing a JAL into the machine.

    Well that means x1 gets clobbered and the interrupted code loses any return address it had in there!

    One could use some other register for the interrupt's return address, but that could well clobber something else.

    The only explanation of interrupt handling I have found is here https://github.com/riscv/riscv-asm-manual/blob/master/riscv-asm.md but the example does not seem to handle any of this and in fact never returns from the interrupt.

    And it uses the mysterious "WFI" instruction. WTF? Seems to get encoded as a LUI R0 with some weird immediate value by the assembler.

    Any ideas ?

    Or perhaps I should just forget about interrupts and just have multiple RISC V cores. Propeller style :)

    Edit: Peeking at UCB's Sodor RISC V https://github.com/ucb-bar/riscv-sodor I see that exceptions just stuff a new value into the PC. There seems to be no mechanism for ever returning from an interrupt, the return address is lost.

    Other edit: Hmm... in UCB Sodor the magic is in the statement "csr.io.pc := pc_reg". Sneaky, they are stashing the return address of exceptions into a Control and Status Register!





  • Heater. wrote: »
    Does your RISC V emulator support interrupts?
    No, I skipped that whole mess. I really like the way Prop1 handles interrupts :).

    Have you looked at the "Draft Privileged ISA Specification" (I think it's available from riscv.org)? It appears that there is an MRET instruction to return from interrupt, which uses register MEPC (which I guess is where the return address is placed, rather than x1).

  • Heater. Posts: 21,230
    edited 2018-04-26 12:25
    I did have a very brief scan of the "Draft Privileged ISA Specification". There is also a spec for interrupts which runs away with specifying a complicated Platform Level Interrupt Controller (PLIC) which is suitable for multi-core and privileged machines.

    Thing is, in the interests of minimalism (not to mention ignorance and stupidity), I don't want to include anything in my Sodor that I don't need or don't understand. I also don't want to spend all year working it out.

    The UCB Sodor design does not implement an MRET, and neither does picorv32 if I remember correctly, but they can both handle exceptions and interrupts. Perhaps it's time to look into what the picorv32 does with interrupts.

  • Heater. Posts: 21,230
    edited 2018-04-29 07:36
    BINGO! Found it.

    SiFive has a nice "RISC V 101" document here:
    https://cdn2.hubspot.net/hubfs/3020607/SiFive - RISCV 101 (1).pdf?t=1508537822079

    On page 35 "Interrupt Handler – Entry and Exit" we see that on interrupt the current state is saved:

    PC => MEPC
    Priv => MPP
    MIE => MPIE

    And there is pseudo code for an interrupt handler:
    Push Registers
    …
    async_irq = mcause.msb
    if async_irq
        branch async_handler[mcause.code]
    else
        branch synch_handler[mcause.code]
    …
    Pop Registers
    MRET
    

    Looks like sodor-spinal will be acquiring some CSRs and an MRET instruction.
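    So the mechanics are tiny. Something like this ought to do it (a rough sketch with names of my own choosing, not the actual Sodor code):
    class TrapReturn extends Component {
      val io = new Bundle {
        val pc       = in UInt(32 bits)   // PC of the interrupted instruction
        val trap     = in Bool()          // exception/interrupt taken this cycle
        val mret     = in Bool()          // MRET decoded
        val trapBase = in UInt(32 bits)   // handler address (a fixed mtvec here)
        val nextPc   = out UInt(32 bits)
        val redirect = out Bool()
      }
      val mepc = Reg(UInt(32 bits)) init(0)   // the CSR that stashes the return address
      when(io.trap) { mepc := io.pc }
      io.redirect := io.trap || io.mret
      io.nextPc   := Mux(io.trap, io.trapBase, mepc)
    }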

    What I really want is some Propeller-like INA, OUTA, DIRA instructions and so on.




  • Heater. wrote: »
    What I really want is some Propeller-like INA, OUTA, DIRA instructions and so on.

    I handled INA, OUTA, DIRA, etc. as CSRs in the emulator (in fact all the COG memory was mapped into CSRs). waitcnt was also handled by writing a value to a special CSR which caused the emulator to wait until the specified cycle. I never got around to handling waitpne, waitpeq, etc. but I suppose they could also use a special CSR. On the other hand perhaps those and waitcnt should be custom instructions, I'm not sure that writing to CSR registers should have such drastic side effects. On the third hand, we don't have to be purists :). I guess whatever works.
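    Just to show the shape of the idea (these CSR numbers and this toy dispatch are made up for illustration, not what riscvemu actually does):
    val CSR_INA  = 0x7C0   // invented CSR numbers in the custom range
    val CSR_OUTA = 0x7C1
    val CSR_DIRA = 0x7C2
    
    var outa = 0
    var dira = 0
    def csrRead(csr: Int, pins: Int): Int = csr match {
      case CSR_INA  => pins    // reading INA samples the pins
      case CSR_OUTA => outa
      case CSR_DIRA => dira
      case _        => 0
    }
    def csrWrite(csr: Int, value: Int): Unit = csr match {
      case CSR_OUTA => outa = value   // writing OUTA drives the pins
      case CSR_DIRA => dira = value
      case _        =>
    }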

    Wouldn't it be nice if Prop3 used the Risc-V ISA (with some appropriate extension)? It would save a lot of work on tools.
  • ersmith wrote: »
    Wouldn't it be nice if Prop3 used the Risc-V ISA (with some appropriate extension)? It would save a lot of work on tools.
    Agreed!

  • jmg Posts: 15,140
    ersmith wrote: »
    Wouldn't it be nice if Prop3 used the Risc-V ISA (with some appropriate extension)? It would save a lot of work on tools.
    Yes, add smart pins, and WAITxx type opcodes, and enough local fast memory to remove jitter from the cores, and it looks interesting.
    If the P2 PAD ring proves solid from the FAB, that has very nice analog features missing from many RISC-V designs, and the inner bulk is 'just verilog'....
    This would need better XIP support than P2 currently has, but the memory choices for that are expanding.

  • Heater. Posts: 21,230
    ersmith,
    Wouldn't it be nice if Prop3 used the Risc-V ISA (with some appropriate extension)? It would save a lot of work on tools.
    Oh boy, that is kind of where I started with this thread. Although I did not say it so directly.
    ...we don't have to be purists...
    No, we don't. This is the Parallax forum. Aren't we supposed to look at things sideways, from different angles... then hack something that works?!

    On the other hand, such Propeller like extensions to RISC V should not end up conflicting with whatever the compilers generate.

    From time to time I have been pondering what it would take to make a Propeller-like RISC V machine, for example that dictates:

    1) Multiple cores. Of course.

    2) Tight coupling between CPU core and I/O.

    3) Deterministic timing.

    4) Ability to wait on pins and counters etc.

    5) Shared memory between cores.

    6) Whatever else I have not thought of.

    In order to achieve 1), 3) and 5) I think we need memory that is local to a RISC V core that can be accessed deterministically, like COG space. Then we need memory that is shared between cores like HUB RAM. Perhaps with a rotating HUB mechanism like the Prop.
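    For the rotating HUB part the arbitration is trivial. In Spinal it could be as dumb as this (a sketch of the idea only, nothing to do with any real Propeller internals):
    class HubArbiter(cores: Int) extends Component {
      val io = new Bundle {
        val grant = out UInt(log2Up(cores) bits)   // which core owns the shared RAM this cycle
      }
      val slot = Reg(UInt(log2Up(cores) bits)) init(0)
      when(slot === cores - 1) { slot := 0 } otherwise { slot := slot + 1 }
      io.grant := slot
    }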

    I have not looked into the suggested methods of adding extensions to RISC V that hard, but it seems there are many ways to achieve 2) and 4): write I/O to status/control registers, or use some of the opcode space that is reserved for custom extensions.

    As for 6), what have I not thought of?

    Now, how does a RISC V compare to a Propeller in terms of raw performance? I have the picorv32 running on a DE0-nano at 100MHz which translates to 25MIPS. It takes about 10% of the LUTS. What speed is a P2 running on a DE0-nano?

    With all that in place we need Chip's magic sauce around it, the smart pins, analogie stuff etc.

  • jmg Posts: 15,140
    Heater. wrote: »
    ....
    In order to achieve 1), 3) and 5) I think we need memory that is local to a RISC V core that can be accessed deterministically, like COG space. Then we need memory that is shared between cores like HUB RAM. Perhaps with a rotating HUB mechanism like the Prop.
    ....
    With all that in place we need Chip's magic sauce around it, the smart pins, analogie stuff etc.

    Looking at the newest Nuvoton 192MHz parts, I see peripheral support like this

    • Communication Interface
    - Up to 6 low power UART interfaces (17 Mbps), including 2 LIN interfaces
    - Up to 3 ISO-7816 interfaces (3.4 MHz), supporting full duplex UART mode
    - Three I²C interfaces (Up to 3.4 Mbps)
    - One SPI Flash interface (Up to 96 MHz) supports quad mode
    - One Quad-SPI interface (Up to 96 MHz)
    - Up to 4 SPI/ I²S interfaces (SPI up to 96 MHz, I²S up to 192 kHz/16-bit)
    - One I²S interface (192 kHz/32-bit)
    - Two USCI interfaces, supporting configurable UART/ SPI/ I²C
    - Two Secure Digital Host Controllers (50 MHz)
    • Memory
    - 512 KB zero-wait state Flash memory
    - 160 KB RAM, including 32 KB external SPI Flash cache
    - 4 KB Secure Protection ROM
    - 2 KB One-Time-Programmable ROM

    No mention in the overview of DTR/DDR memory, but the general industry trends are clear here.

    If that 32 KB flash cache can be locked per core, to act like fast local memory, that should give close to 'the best of both worlds'...
    A part with many cores might choose less cache size, or even differing cache sizes.
  • Heater. Posts: 21,230
    edited 2018-04-27 01:40
    I'm old school. Never heard of Nuvoton.

    Sounds like they have an ARM surrounded by typical SoC peripherals. Perhaps, faster, smaller, better...

    I don't think the Propeller is competing in that market, it's saturated already.

  • jmg Posts: 15,140
    Heater. wrote: »
    I'm old school. Never heard of Nuvoton.

    Sounds like they have an ARM surrounded by typical SoC peripherals. Perhaps, faster, smaller, better...

    I don't think the Propeller is competing in that market, it's saturated already.
    I agree P2 does not compete directly, but RISC-V moves more in that direction, and the point I was making was not core related, but about the peripheral support and the XIP support, with a specific large SPI buffer.


  • Heater. Posts: 21,230
    jmg,

    I see what you mean.

    Thing is, RISC V is not a thing. It's just an idea. An idea that has been formalized, specified, and agreed on by many. An instruction set without an implementation.

    Perhaps people take notice and adopt it, perhaps they don't. Perhaps it changes the world, like the Unix API, hence POSIX, BSD, Linux, WSL. Perhaps it does not.

    Certainly at the moment the momentum is at the low end. As it was for other subversive ideas like the personal computer, the C language, TCP/IP and so on.

    I'm not sure I understand your comments re: XIP and SPI buffers. As far as I can tell they are not supportive of real-time, deterministic processor behavior.




  • jmg Posts: 15,140
    Heater. wrote: »
    I'm not sure I understand your comments re: XIP and SPI buffers. As far as I can tell they are not supportive of real-time, deterministic processor behavior.
    If you have a large SPI buffer, that can become a lockable cache, where the time critical code loads once, and runs from that fast local memory. = Now Very Deterministic.
    i.e. pretty much what you already said: "I think we need memory that is local to a RISC V core that can be accessed deterministically, like COG space."


    If you have very large code that does not fit in the buffer, or on chip, then it can work like a more usual cache, loading the program a subroutine at a time.

  • Heater. Posts: 21,230
    OK, I think I'm with you.

    As long as every core has its own space, such that one core thrashing through a big lot of code does not impact the timing of the others.

  • Ale Posts: 2,363
    There is a pipelined small RISCV core here: https://github.com/ataradov/riscv

    I moved part of my work on the picoRV32 over to this one; at least running from local memory it works very well. It is a bit larger than the picoRV32 and instructions take different amounts of cycles. Somewhere I have a couple of pictures from SignalTap to post.
  • Heater. Posts: 21,230
    Thanks, hadn't spotted that one before. RISC V's are growing like weeds!

    Did you check how the performance compares to picorv32?

    Just wondering how much pipelining helps.
  • Ale Posts: 2,363
    Some opcodes take 1 cycle, some like memory read and write take 2 cycles (they take like 5 but because operations are overlapped... you know). It supports compressed opcodes too (mixed 16/32 bit opcodes).
    
    00000000 <__start>:
       0:	6b20006f          	jal	x0,6b2 <entry>
    000006b2 <entry>:
     6b2:	7159                	c.addi16sp	x2,-112
     6b4:	6785                	c.lui	x15,0x1
     6b6:	6705                	c.lui	x14,0x1
     6b8:	d686                	c.swsp	x1,108(x2)
     6ba:	d4a2                	c.swsp	x8,104(x2)
     6bc:	d2a6                	c.swsp	x9,100(x2)
     6be:	d0ca                	c.swsp	x18,96(x2)
     6c0:	cece                	c.swsp	x19,92(x2)
     6c2:	ccd2                	c.swsp	x20,88(x2)
     6c4:	cad6                	c.swsp	x21,84(x2)
     6c6:	c8da                	c.swsp	x22,80(x2)
     6c8:	c6de                	c.swsp	x23,76(x2)
    
    
  • Heater. Posts: 21,230
    Yes, but, the idea of pipelining is that you can crank the clock rate up, as there is less to do in each stage of the pipe, and get more MIPS.

    Question is, did you get more MIPS?


  • Heater. wrote: »
    From time to time I have been pondering what it would take to make a Propeller-like RISC V machine, for example that dictates:

    1) Multiple cores. Of course.

    2) Tight coupling between CPU core and I/O.

    3) Deterministic timing.

    4) Ability to wait on pins and counters etc.

    5) Shared memory between cores.

    6) Whatever else I have not thought of.

    In order to achieve 1), 3) and 5) I think we need memory that is local to a RISC V core that can be accessed deterministically, like COG space. Then we need memory that is shared between cores like HUB RAM. Perhaps with a rotating HUB mechanism like the Prop.
    I think if each RISC-V core has its own cache memory that will go a long way. Say each one has 2K instruction cache and 2K data cache. Make the cache replacement code simple and easy to figure out (e.g. just a plain direct mapped cache). Then if your code fits in 2K then it fits in the cache and runs out of there. Nice and simple.

    Data is a bit more of a pain since there's a cache synchronization issue. Maybe instead of data cache we give them each 2K of workspace RAM that only they can access? It should appear at the same address in all cores (so the same binary can run on any core).
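    For the "plain direct mapped cache" the address split really is trivial. With 2K and (say) 16 byte lines (the line size is my assumption), something like:
    class DirectMappedIndex extends Component {
      val io = new Bundle {
        val addr   = in UInt(32 bits)
        val offset = out UInt(4 bits)    // byte within a 16-byte line
        val index  = out UInt(7 bits)    // which of the 128 lines
        val tag    = out UInt(21 bits)   // what must match for a hit
      }
      io.offset := io.addr(3 downto 0)
      io.index  := io.addr(10 downto 4)
      io.tag    := io.addr(31 downto 11)
    }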
    Now, how does a RISC V compare to a Propeller in terms of raw performance? I have the picorv32 running on a DE0-nano at 100MHz which translates to 25MIPS. It takes about 10% of the LUTS. What speed is a P2 running on a DE0-nano?
    Oh, I think a RISC-V is going to be able to be clocked faster than a Propeller, and also (generally) accomplish more in each instruction (since there are two source operands). picorv32 isn't pipelined at all; I think more sophisticated designs (like orca) can do much better. Orca is claiming ~1 dhrystone MIP per MHz, which at 80 MHz translates to 140k dhrystones/second. The best I've seen Prop1 do (with PropGCC) is about 7k dhrystones/second. Even if we multiply by 5 to account for LMM overhead we're still looking at Orca being 4x as fast at the same clock speed, and most likely it can be clocked higher than Prop1.
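    (Back of the envelope, taking 1 DMIPS as roughly 1757 Dhrystones/second:)
    val orca  = 1.0 * 80 * 1757    // ~140,000 Dhrystones/second at 80 MHz
    val prop1 = 7000.0 * 5         // ~35,000 after crediting LMM overhead
    val orcaVsProp1 = orca / prop1 // ~4x at these clock speeds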
    With all that in place we need Chip's magic sauce around it, the smart pins, analogie stuff etc.

    Yep. That's what would make the Prop3 a "Propeller".
  • Ale Posts: 2,363
    edited 2018-04-28 04:38
    Sorry I forgot to mention that it does 55 MHz on the MAX10-C7 I used as target (DE10-Lite board). So, it achieves 55 to 27.5 MIPS, I'd say. Maybe I should compile the dhrystone benchmark :)
  • Heater. Posts: 21,230
    edited 2018-04-28 06:50
    That's not bad.

    My picorv32 is clocked at 100MHz. But at 3 clocks for most things, 5 for load/store/branch and 6 for JALR it's only making about 25MIPS.

    Never mind dhrystone, the real question in Propeller Land is: How fast can you toggle a pin? :)

    As it stands I have a memory mapped I/O port. That requires a lot of work:

    1) Toggle some bit in a register - 3 clocks
    2) Store that register out to the I/O port - 5 clocks
    3) Loop 1) - 5 or 6 clocks

    Total: 14 clocks. Or 7.7MHz in my case.

    What if we did something crazy like allowing arithmetic operations directly on I/O registers? In the instructions for ADD, SUB etc. there is a 7 bit function field (funct7) that is mostly unused; we could use 3 bits of that to indicate that the operation's source/destination registers are not the actual registers but I/O registers.

    so:
    xor x2, x2, x1               // x2 := x2 EXOR x1
    
    effectively becomes (for example)
    xor io2, io2, x1             // io2 := io2 EXOR x1
    
    Which starts to look like the Propeller's way of doing I/O.

    Heck you could do port to port operations on different ports directly:
    add io1, io2, io3            // io1 := io2 + io3 
    
    Hmm... that saves us 5 clocks in the toggle loop and gets to 12.5 MHz. Not so much really but sort of in P1 territory clock for clock.
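    On the decode side it would only take a sliver of logic. Something like this (the bit assignments are picked arbitrarily, just to illustrate):
    class IoRegSelect extends Component {
      val io = new Bundle {
        val instruction = in UInt(32 bits)
        val rdIsIo      = out Bool()   // destination is an I/O register
        val rs1IsIo     = out Bool()
        val rs2IsIo     = out Bool()
      }
      val funct7 = io.instruction(31 downto 25)
      io.rdIsIo  := funct7(2)   // three of the otherwise unused funct7 bits
      io.rs1IsIo := funct7(1)
      io.rs2IsIo := funct7(0)
    }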

    Of course this is all nuts if you are using an FPGA. Why not just do your high speed bit wiggling in Verilog? But for a P3 chip ....



  • Ale Posts: 2,363
    Maybe one could add some "extensions" that toggle, set and clear pins directly. What I really found useful on that other chip were the clocked ports and the timers. I haven't followed all the P2 discussion in detail but I think we got them now. The XMOS chip has special instructions for handling port input and output, and yes you could toggle at 100 MHz; I only used the original G4 devices that could only go to 400 MHz. My VGA drivers used clocked bit-bang.