P3 ideas

Title says it all :)

Just had a thought...

The condition codes are possibly decoded early in the pipeline. If so, could an instruction whose condition fails skip the loading of its S & D values, making it a 1-clock instruction?
We might want this action to be cog selectable, since otherwise we would lose determinism.
My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
Website: www.clusos.com
Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
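
To make the determinism trade-off concrete, here is a rough sketch that counts clocks for a short instruction stream. It assumes 2 clocks per executed instruction and a hypothetical 1-clock cost for an instruction whose condition fails. The function name, the `fast_skip` per-cog switch, and the clock costs are illustrative assumptions, not a description of any real pipeline.

```python
# Toy clock-count model. Assumptions (not taken from any datasheet):
#   - every executed instruction costs 2 clocks
#   - with fast_skip enabled, an instruction whose condition fails costs 1 clock
def count_clocks(conditions_met, fast_skip=False):
    """conditions_met: list of bools, True if that instruction's condition passes."""
    total = 0
    for met in conditions_met:
        if met or not fast_skip:
            total += 2  # normal path: S and D are loaded, full instruction time
        else:
            total += 1  # hypothetical shortcut: S and D load is skipped
    return total

stream = [True, False, False, True]
print(count_clocks(stream))                  # fast_skip off: 8 clocks
print(count_clocks(stream, fast_skip=True))  # fast_skip on: 6 clocks
```

With the switch off, any four-instruction stream costs the same number of clocks regardless of the flags. With it on, the total depends on run-time conditions, which is why a per-cog setting might be wanted.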

Comments

  • jmg Posts: 13,092
    I think it is one clock to load and decode the opcode and then one clock to act on it.

    The 2-clock opcodes in P2 fit the silicon quite well, as the counters/timers/smart pins can all run ~2x faster than the core.
    Once you have a PLL, the exact clocks per opcode matter less, as the clock is internally generated.

    More widely useful in P3 would be a means to prescale the non-wait opcodes, so they draw less power while the peripherals still run at SysCLK.
    Right now, all COGS run at SysCLK, and there is minimal clock gating, and no scope to run COGs at differing speeds.

    Chip has briefly mentioned clock gating is going into P2r2, so that could get part way to solving this.
  • It would need a lot of design work to adjust for that. For the Prop2, the whole pipeline is a tick-tock arrangement, and there aren't any spare fetch slots, for cogRAM at least, for prefetching extra instructions. It could be opportunity based, but that adds even more indeterminacy.
    "There's no huge amount of massive material
    hidden in the rings that we can't see,
    the rings are almost pure ice."
  • As for my suggestion for Prop3 features, I'll throw in the "alt" for everything idea: "Have a bit-field in all opcodes just for specifying if the ALU result goes to specified D or to next instruction's D input."
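
One way to picture the redirect bit is a tiny interpreter in which each instruction either writes its result back to D or hands it to the next instruction in place of that instruction's D value. This is only a sketch: the tuple layout, the op names, and the 32-bit masking are assumptions made for illustration.

```python
# Toy model of the "redirect" bit. The tuple layout and op names are
# hypothetical, chosen only for illustration.
OPS = {
    "add": lambda d, s: (d + s) & 0xFFFFFFFF,
    "shl": lambda d, s: (d << (s & 31)) & 0xFFFFFFFF,
}

def run(program, regs):
    """program: list of (op, d_index, s_index, redirect) tuples."""
    forwarded = None  # result held over from a redirecting instruction
    for op, d, s, redirect in program:
        d_value = regs[d] if forwarded is None else forwarded
        result = OPS[op](d_value, regs[s])
        if redirect:
            forwarded = result  # feeds the next instruction's D input; regs[d] untouched
        else:
            forwarded = None
            regs[d] = result    # normal write-back to D
    return regs

regs = [5, 3, 2, 0]
# r0 + r1 is forwarded (r0 is left alone), then shifted left by r2 and stored in r3.
run([("add", 0, 1, True), ("shl", 3, 2, False)], regs)
print(regs)  # [5, 3, 2, 32]
```

The point of the example is that r0 keeps its value, so no scratch register is consumed by the intermediate sum.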
  • I still think some sort of code protection / obfuscation would really help the Propeller be seen as a more professional product. While this isn't nearly as big of a deal as it was 5 years ago, there's still a lot of R&D departments who wouldn't even consider using a chip without it. I also think simple JTAG support would really help attract volume customers. I know that Parallax's focus is on education but I'm sure they'd like to see some high volume demand too!
  • Well, if we're dreaming of P3 I'll throw out some wild ideas:

    (1) 64 bit registers. We have 64 pins, so it seems like 64 bit registers is a natural extension of that, and would somewhat uniquely position a hypothetical P3 as a 64-bit MCU.

    (2) RISC-V instruction set with Propeller specific augments. This would make getting tools way easier, and would probably be attractive to the educational market. RISC-V has a pretty big reserved space for user defined instructions, so all kinds of things could be put in there while keeping the core RISC-V instructions for compilers to use.
  • jmg Posts: 13,092
    cheezus wrote: »
    ... I also think simple JTAG support would really help attract volume customers...
    Which aspects "would really help attract volume customers" - Do you mean for Board Testing, Debug, or programming ?
    I'm not sure about JTAG Bypass instructions (used for chain of multiple JTAG parts), but I'd expect P2 should be able to connect to JTAG board testers, with a small stub loaded ?

    Debug over JTAG is also possible, but would those volume customers really want to lose 4 smart pins for that ?

    Those volume customers likely have something along the lines of this
    https://www.xjtag.com/products/hardware/xjtag-expert-bst-oscilloscope/
    which means they can program P2 SPI/QSPI Flash quickly, and run serial links as well.
    TAQOZ may give enough resource to manage first-pass board testing ?
  • ersmith wrote: »
    (1) 64 bit registers. We have 64 pins, so it seems like 64 bit registers is a natural extension of that, and would somewhat uniquely position a hypothetical P3 as a 64-bit MCU.
    Ah, but we'll be wanting 128 pins by then! :P
  • Something that would be nice in a P3 is the ability to cache instructions when executing from the hub. When running a loop the streamer has to be reloaded when jumping back to the start of the loop. It would be nice for the chip to keep these instructions in a cache memory so it doesn't have to reload them.
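
As a rough illustration of the saving, the sketch below counts hub reads for a loop with and without a small instruction cache. The direct-mapped organisation, the cache size, and the one-fetch-per-instruction cost are assumptions for the example, not a proposal for the actual hardware.

```python
# Toy instruction cache in front of hub memory. The cache size and
# direct-mapped organisation are assumptions for illustration.
def hub_fetches(loop_start, loop_len, iterations, cache_lines=0):
    """Count hub reads needed to run a loop of `loop_len` instructions."""
    cache = {}  # line index -> instruction address currently held
    fetches = 0
    for _ in range(iterations):
        for addr in range(loop_start, loop_start + loop_len):
            if cache_lines:
                line = addr % cache_lines
                if cache.get(line) == addr:
                    continue  # hit: no hub access needed
                cache[line] = addr
            fetches += 1
    return fetches

print(hub_fetches(0x400, 8, 100))                  # no cache: 800
print(hub_fetches(0x400, 8, 100, cache_lines=16))  # loop fits in cache: 8
```

A loop larger than the cache gets no benefit in this simple model, since each pass evicts the lines the next pass needs.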
  • How about a TLB for executing code from external memory?
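
For readers unfamiliar with the term, a TLB caches recent virtual-to-physical page translations so most accesses avoid a slow page-table lookup. The sketch below is a minimal model. The 4 KiB page size, the 4-entry capacity, and the oldest-first eviction are all assumptions chosen for illustration.

```python
# Toy TLB: caches virtual page number -> physical page number translations.
# Page size, capacity and oldest-first eviction are illustrative assumptions.
PAGE_BITS = 12  # 4 KiB pages (assumed)

class TLB:
    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # full mapping, slow to consult
        self.capacity = capacity
        self.entries = {}             # cached vpn -> ppn, in insertion order
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn not in self.entries:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))  # evict oldest entry
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << PAGE_BITS) | offset

tlb = TLB({0: 7, 1: 3})
print(hex(tlb.translate(0x0010)), hex(tlb.translate(0x1234)), tlb.misses)
```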
  • cheezus wrote: »
    I still think some sort of code protection / obfuscation would really help the Propeller be seen as a more professional product. While this isn't nearly as big of a deal as it was 5 years ago, there's still a lot of R&D departments who wouldn't even consider using a chip without it. I also think simple JTAG support would really help attract volume customers. I know that Parallax's focus is on education but I'm sure they'd like to see some high volume demand too!

    I wonder how useful/extensive the jtag test logic that OnSemi bake in, is?
  • Tubular wrote: »
    I wonder how useful/extensive the jtag test logic that OnSemi bake in, is?

    Yes, I thought the same thing :wink:
  • jmg Posts: 13,092
    Tubular wrote: »
    I wonder how useful/extensive the jtag test logic that OnSemi bake in, is?
    Is that test logic genuine jtag, or something more custom, tuned for faster ASIC testing ?
  • RISC-V
    Heck no! Will be too many players in the future, sort of like ARM but different, all with their own quirks.

    P2+
    I would like a P2 with...

    * 1MB HUB memory (that's the easy maximum without instruction changes)

    * One COG with extra features
    We all usually have one big program and lots of little ones, so
    One COG (say Cog 0) with more COG/LUT memory, say 8-64KB
    If it could have a modifier instruction AUGLUT to extend D & S instructions to use the dual port like the existing COG memory that would be fantastic as it effectively increases the register set. Otherwise, could it be single port (smaller)?

    * Smartpins
    A thought to using a mini-sized P1 style CPU instead of the existing smartpins
    Could it run double the normal P2 clock speed?
    Only base instructions like AND/OR/XOR/NOT/SHL/SHR/JMPRET/ADD/SUB... (no fancy instructions or MUL etc)
    Only 512B (128*32) cog-like memory, no HUB access

    * 90-110nm for 500MHz standard plus overclocking :smiley:
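
Two of the sizes in this wish list can be checked with quick arithmetic. The sketch assumes the 1MB ceiling comes from 20-bit byte addresses, which is an assumption here rather than something stated in the post.

```python
# Arithmetic behind two of the figures above.
# Assumption: hub addresses are 20 bits wide and byte addressed.
HUB_ADDRESS_BITS = 20
hub_bytes = 1 << HUB_ADDRESS_BITS
print(hub_bytes, hub_bytes // (1024 * 1024))  # 1048576 bytes, 1 MiB

# 128 longs of 32 bits each, as proposed for the mini CPU.
mini_cog_bytes = 128 * 32 // 8
print(mini_cog_bytes)  # 512
```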
  • I wonder if the current DEBUG event system could be used to talk to, or simulate, JTAG?

    I really like the D redirect idea.

    Also agree with David. A means to manage external memory takes P3 to general CPU status. Really big programs will come with a process shrink.
    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin
  • jmg Posts: 13,092
    Cluso99 wrote: »
    * One COG with extra features
    We all usually have one big program and lots of little ones, so
    One COG (say Cog 0) with more COG/LUT memory, say 8-64KB
    If it could have a modifier instruction AUGLUT to extend D & S instructions to use the dual port like the existing COG memory that would be fantastic as it effectively increases the register set. Otherwise, could it be single port (smaller)?

    * Smartpins
    A thought to using a mini-sized P1 style CPU instead of the existing smartpins
    Could it run double the normal P2 clock speed?
    Only base instructions like AND/OR/XOR/NOT/SHL/SHR/JMPRET/ADD/SUB... (no fancy instructions or MUL etc)
    Only 512B (128*32) cog-like memory, no HUB access

    Asymmetric COGs is a clear possible path.
    However, replacing the smartpins makes little sense: they are a key P2 feature, SW is never as fast as HW, and there are 64 smart pins to replace....
    - but what could be useful, would be a simple COG designed to service smart pins.

    As more Smart pin examples appear, it becomes clear just how powerful they are, and how set-and-forget applies in many cases.
    The service overhead is also often low-mips. Only in top-end Display or fast COMs are high data rates needed.

    A blanket 'no hub access' is also too inflexible, but there is merit in being able to yield a hub slot to another COG.
    Cluso99 wrote: »
    RISC-V
    Heck no! Will be too many players in the future, sort of like ARM but different, all with their own quirks.
    That will depend more on the process (which will set MHz and Memory numbers), and customer demand.
    'Too many players' is far from a problem, it means mature tools, and a good education base.

    When P3 is rolled, there may be ByteCode engines proven enough to compile into Verilog ?

    More ROM has not been mentioned yet, but that is a relatively easy extension.
  • Cluso99 wrote: »
    RISC-V
    Heck no! Will be too many players in the future, sort of like ARM but different, all with their own quirks.

    So having broad support from the rest of the industry is a bad thing?

    But anyway, a RISC-V P3 would be a very unique RISC-V, with a bunch of its own instructions. So yeah, it would certainly have its own quirks, but the whole point would be to make them *good* quirks. And non-quirky code could be ported right over from other chips easily (in particular compilers will be very easy to obtain, at least basic compilers that don't need P3 specific instructions -- for those we would expect assembly to be used).
    Cluso99 wrote: »
    * One COG with extra features
    We all usually have one big program and lots of little ones, so
    One COG (say Cog 0) with more COG/LUT memory, say 8-64KB
    If it could have a modifier instruction AUGLUT to extend D & S instructions to use the dual port like the existing COG memory that would be fantastic as it effectively increases the register set. Otherwise, could it be single port (smaller)?

    Of course if P3 is 64 bits then we could have 64 bit instructions, which means D & S space could be a lot bigger. For example, 16 bits each for D, S1, and S2 (two sources per instruction, D just gets the result) with 16 bits left over for the instruction bits themselves. That's if Chip doesn't go the RISC-V route, which would have a small set of registers but a huge address space, some of which could be mapped to COG memory, some to HUB, and some to external RAM.
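
The 64-bit layout described above can be sketched as a simple pack/unpack pair. Only the field widths come from the post. The field order (opcode in the top 16 bits, then D, S1, S2) and the function names are assumptions for illustration.

```python
# Hypothetical 64-bit instruction layout: [opcode:16][D:16][S1:16][S2:16].
# The field order is an assumption; only the widths come from the post.
FIELD_MASK = 0xFFFF

def encode(opcode, d, s1, s2):
    for value in (opcode, d, s1, s2):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("field does not fit in 16 bits")
    return (opcode << 48) | (d << 32) | (s1 << 16) | s2

def decode(word):
    return ((word >> 48) & FIELD_MASK, (word >> 32) & FIELD_MASK,
            (word >> 16) & FIELD_MASK, word & FIELD_MASK)

word = encode(0x0001, 0x1234, 0x00FF, 0xFFFF)
print(hex(word), decode(word))
```

With 16-bit fields, each operand could address 65,536 registers directly, compared with the much smaller D and S fields of a 32-bit instruction.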

  • Exec from HyperRam or HyperFlash would be neat.
    Prop Info and Apps: http://www.rayslogic.com/
  • cheezus Posts: 126
    edited 2019-02-20 - 15:59:10
    jmg wrote: »
    Which aspects "would really help attract volume customers" - Do you mean for Board Testing, Debug, or programming ?
    I'm not sure about JTAG Bypass instructions (used for chain of multiple JTAG parts), but I'd expect P2 should be able to connect to JTAG board testers, with a small stub loaded ?

    Debug over JTAG is also possible, but would those volume customers really want to lose 4 smart pins for that ?

    Those volume customers likely have something along the lines of this
    https://www.xjtag.com/products/hardware/xjtag-expert-bst-oscilloscope/
    which means they can program P2 SPI/QSPI Flash quickly, and run serial links as well.
    TAQOZ may give enough resource to manage first-pass board testing ?

    I was mainly thinking Boundary Scan would be nice for board testing. I realize that the Propeller could do this with a simple program, but that does not help if there's a short or open on the programming pins (or possibly other pins). Bypass shouldn't be that hard to implement. My thought was to use 4 pins for TDI, TDO, CLK, MODE, and connect the Reset line to one of the VIO group pins so these 4 pins could still be used. I'm not sure if this would work and there's probably plenty of reasons NOT to do this. Just a thought...

    *edit-
    The idea of an accessible JTAG port came from reading Chip's posts about the process. From my understanding the JTAG port is disabled (reset pulled high) when bonding the wafer to the package. I would imagine that Boundary Scan and Bypass are at LEAST implemented. I could be wrong though.
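
For context on why Bypass is considered cheap to implement: in BYPASS a device places a single 1-bit register between TDI and TDO, so a chain of N bypassed devices simply delays the data by N clocks. The sketch below models only that shift behaviour. It ignores the TAP state machine and clock-edge timing, and the function name is made up for the example.

```python
# Toy model of a JTAG chain with every device in BYPASS.
# In BYPASS each device places a single 1-bit register between TDI and TDO.
def shift_through_bypass(bits_in, num_devices):
    """Clock bits_in into TDI of the first device; return bits seen at the last TDO.

    num_devices must be at least 1.
    """
    bypass = [0] * num_devices  # one bypass bit per device, cleared at start
    bits_out = []
    for bit in bits_in:
        bits_out.append(bypass[-1])   # TDO of the last device in the chain
        bypass = [bit] + bypass[:-1]  # each device captures its neighbour's output
    return bits_out

pattern = [1, 0, 1, 1, 0, 0, 0]
print(shift_through_bypass(pattern, 3))  # [0, 0, 0, 1, 0, 1, 1]
```

The output is the input pattern delayed by three clocks, one per device, which is what lets a board tester skip past parts it is not currently exercising.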