P3 ideas

Cluso99 · 2019-02-05 20:32

Title says it all

Just had a thought...

The condition codes are possibly decoded early in the pipeline. Could this shortcut the loading of S & D values, resulting in a 1 clock instruction?
We might want this action to be cog selectable, since otherwise we would lose determinism.

jmg · 2019-02-05 21:52

I think it is one clock to load and decode the opcode and then one clock to act on it.

The 2 Clock opcodes in P2, fit the silicon quite well as the counters/timers/smart pins can all run ~ 2x faster than the core.
Once you have a PLL, the exact clocks per opcode matter less, as the clock is internally generated.

More widely useful in P3, would be a means to prescale the non-wait opcodes, so they draw less power but still run the peripherals at SysCLK.
Right now, all COGS run at SysCLK, and there is minimal clock gating, and no scope to run COGs at differing speeds.

Chip has briefly mentioned clock gating is going into P2r2, so that could get part way to solving this.

evanh · 2019-02-20 00:32

It would need a lot of design work to adjust for that. For the Prop2, the whole pipeline is a tick-tock arrangement. And there aren't any spare fetch slots, for cogRAM at least, for prefetching extra instructions. It could be opportunity based, but that adds even more indeterminacy.

evanh · 2019-02-20 00:37

As for my suggestion for Prop3 features, I'll throw in the "alt" for everything idea: "Have a bit-field in all opcodes just for specifying if the ALU result goes to specified D or to next instruction's D input."
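One way to picture the redirect bit is a toy interpreter. This is only a sketch under stated assumptions: the tuple layout `(op, d, s, redirect)`, the `OPS` table, and the rule that a forwarded result replaces the next instruction's D read are all made up for illustration, and none of it is P2 behaviour.

```python
# Toy model of a per-instruction "redirect" bit (hypothetical, not P2 behaviour).
OPS = {
    "add": lambda d, s: (d + s) & 0xFFFFFFFF,
    "and": lambda d, s: d & s,
    "shl": lambda d, s: (d << (s & 31)) & 0xFFFFFFFF,
}

def run(program, regs):
    """Execute (op, d, s, redirect) tuples against a register dict."""
    forwarded = None  # result waiting to be used as the next D input
    for op, d, s, redirect in program:
        # Assumption: a forwarded result replaces the register read for D.
        d_value = regs[d] if forwarded is None else forwarded
        result = OPS[op](d_value, regs[s])
        if redirect:
            forwarded = result   # skip write-back, feed the next instruction
        else:
            regs[d] = result     # normal write-back to D
            forwarded = None
    return regs

regs = run(
    [("add", "a", "b", True),      # a + b is forwarded; "a" is left untouched
     ("shl", "c", "two", False)],  # c = (a + b) << 2
    {"a": 3, "b": 4, "c": 0, "two": 2},
)
print(regs)
```

In this model the first instruction never modifies `a`, which is the same effect the ALTx prefix instructions give for selected cases today, just generalised to every opcode.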

cheezus · 2019-02-20 01:37

I still think some sort of code protection / obfuscation would really help the Propeller be seen as a more professional product. While this isn't nearly as big of a deal as it was 5 years ago, there's still a lot of R&D departments who wouldn't even consider using a chip without it. I also think simple JTAG support would really help attract volume customers. I know that Parallax's focus is on education but I'm sure they'd like to see some high volume demand too!

ersmith · 2019-02-20 01:50

Well, if we're dreaming of P3 I'll throw out some wild ideas:

(1) 64 bit registers. We have 64 pins, so it seems like 64 bit registers is a natural extension of that, and would somewhat uniquely position a hypothetical P3 as a 64-bit MCU.

(2) RISC-V instruction set with Propeller specific augments. This would make getting tools way easier, and would probably be attractive to the educational market. RISC-V has a pretty big reserved space for user defined instructions, so all kinds of things could be put in there while keeping the core RISC-V instructions for compilers to use.

jmg · 2019-02-20 01:53

cheezus wrote: »

... I also think simple JTAG support would really help attract volume customers...

Which aspects "would really help attract volume customers" - Do you mean for Board Testing, Debug, or programming ?
I'm not sure about JTAG Bypass instructions (used for chain of multiple JTAG parts), but I'd expect P2 should be able to connect to JTAG board testers, with a small stub loaded ?

Debug over JTAG is also possible, but would those volume customers really want to lose 4 smart pins for that ?

Those volume customers likely have something along the lines of this
https://www.xjtag.com/products/hardware/xjtag-expert-bst-oscilloscope/
which means they can program P2 SPI/QSPI Flash quickly, and run serial links as well.
TAQOZ may give enough resource to manage first-pass board testing ?

evanh · 2019-02-20 02:07

ersmith wrote: »

(1) 64 bit registers. We have 64 pins, so it seems like 64 bit registers is a natural extension of that, and would somewhat uniquely position a hypothetical P3 as a 64-bit MCU.

Ah, but we'll be wanting 128 pins by then! :P

Dave Hein · 2019-02-20 02:15

Something that would be nice in a P3 is the ability to cache instructions when executing from the hub. When running a loop the streamer has to be reloaded when jumping back to the start of the loop. It would be nice for the chip to keep these instructions in a cache memory so it doesn't have to reload them.

David Betz · 2019-02-20 02:54

How about a TLB for executing code from external memory?

Tubular · 2019-02-20 02:57

cheezus wrote: »

I still think some sort of code protection / obfuscation would really help the Propeller be seen as a more professional product. While this isn't nearly as big of a deal as it was 5 years ago, there's still a lot of R&D departments who wouldn't even consider using a chip without it. I also think simple JTAG support would really help attract volume customers. I know that Parallax's focus is on education but I'm sure they'd like to see some high volume demand too!

I wonder how useful/extensive the jtag test logic that OnSemi bake in, is?

Cluso99 · 2019-02-20 05:39

Tubular wrote: »

cheezus wrote: »

I still think some sort of code protection / obfuscation would really help the Propeller be seen as a more professional product. While this isn't nearly as big of a deal as it was 5 years ago, there's still a lot of R&D departments who wouldn't even consider using a chip without it. I also think simple JTAG support would really help attract volume customers. I know that Parallax's focus is on education but I'm sure they'd like to see some high volume demand too!

I wonder how useful/extensive the jtag test logic that OnSemi bake in, is?

Yes, I thought the same thing

jmg · 2019-02-20 05:44

Tubular wrote: »

I wonder how useful/extensive the jtag test logic that OnSemi bake in, is?

Is that test logic genuine jtag, or something more custom, tuned for faster ASIC testing ?

Cluso99 · 2019-02-20 05:57

RISC-V
Heck no! Will be too many players in the future, sort of like ARM but different, all with their own quirks.

P2+
I would like a P2 with...

* 1MB HUB memory (that's the easy maximum without instruction changes)

* One COG with extra features
We all usually have one big program and lots of little ones, so
One COG (say Cog 0) with more COG/LUT memory, say 8-64KB
If it could have a modifier instruction AUGLUT to extend D & S instructions to use the dual port like the existing COG memory that would be fantastic as it effectively increases the register set. Otherwise, could it be single port (smaller)?

* Smartpins
A thought to using a mini-sized P1 style CPU instead of the existing smartpins
Could it run double the normal P2 clock speed?
Only base instructions like AND/OR/XOR/NOT/SHL/SHR/JMPRET/ADD/SUB... (no fancy instructions or MUL etc)
Only 512B (128*32) cog-like memory, no HUB access

* 90-110nm for 500MHz standard plus overclocking

potatohead · 2019-02-20 06:30

I wonder if the current DEBUG event system could be used to talk, or simulate JTAG?

I really like the D redirect idea.

Also agree with David. A means to manage external memory takes P3 to general CPU status. Really big programs will come with a process shrink.

jmg · 2019-02-20 07:28

Cluso99 wrote: »

* One COG with extra features
We all usually have one big program and lots of little ones, so
One COG (say Cog 0) with more COG/LUT memory, say 8-64KB
If it could have a modifier instruction AUGLUT to extend D & S instructions to use the dual port like the existing COG memory that would be fantastic as it effectively increases the register set. Otherwise, could it be single port (smaller)?

* Smartpins
A thought to using a mini-sized P1 style CPU instead of the existing smartpins
Could it run double the normal P2 clock speed?
Only base instructions like AND/OR/XOR/NOT/SHL/SHR/JMPRET/ADD/SUB... (no fancy instructions or MUL etc)
Only 512B (128*32) cog-like memory, no HUB access

Asymmetric COGs is a clear possible path.
However, replacing the smartpins makes little sense : they are a key P2 feature, SW is never as fast as HW, and there are 64 smart pins to replace....
- but what could be useful, would be a simple COG designed to service smart pins.

As more Smart pin examples appear, it becomes clear just how powerful they are, and how set-and-forget applies in many cases.
The service overhead is also often low-mips. Only in top-end Display or fast COMs are high data rates needed.

A blanket 'no hub access' is also too inflexible, but there is merit in being able to yield a hub slot to another COG.

Cluso99 wrote: »

RISC-V
Heck no! Will be too many players in the future, sort of like ARM but different, all with their own quirks.

That will depend more on the process (which will set MHz and Memory numbers), and customer demand.
'Too many players' is far from a problem, it means mature tools, and a good education base.

When P3 is rolled, there may be ByteCode engines proven enough to compile into verilog ?

More ROM has not been mentioned yet, but that is a relatively easy extension.

ersmith · 2019-02-20 13:19

Cluso99 wrote: »

RISC-V
Heck no! Will be too many players in the future, sort of like ARM but different, all with their own quirks.

So having broad support from the rest of the industry is a bad thing?

But anyway, a Risc-V P3 would be a very unique Risc-V, with a bunch of its own instructions. So yeah, it would certainly have its own quirks, but the whole point would be to make them *good* quirks. And non-quirky code could be ported right over from other chips easily (in particular compilers will be very easy to obtain, at least basic compilers that don't need P3 specific instructions -- for those we would expect assembly to be used).

* One COG with extra features
We all usually have one big program and lots of little ones, so
One COG (say Cog 0) with more COG/LUT memory, say 8-64KB
If it could have a modifier instruction AUGLUT to extend D & S instructions to use the dual port like the existing COG memory that would be fantastic as it effectively increases the register set. Otherwise, could it be single port (smaller)?

Of course if P3 is 64 bits then we could have 64 bit instructions, which means D & S space could be a lot bigger. For example, 16 bits D, S1, and S2 (two sources per instruction, D just gets the result) with 16 bits left over for the instruction bits themselves. That's if Chip doesn't go the Risc-V route, which would have a small set of registers but a huge address space, some of which could be mapped to COG memory, some to HUB, and some to external RAM.
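To make the 16/16/16/16 split concrete, here is a rough sketch of packing and unpacking such a 64-bit instruction word. The field order (opcode in the top 16 bits, then D, S1, S2) and the function names are assumptions for illustration only; nothing in this thread fixes a layout.

```python
FIELD_MASK = 0xFFFF  # each field is 16 bits wide

def encode(opcode, d, s1, s2):
    """Pack four 16-bit fields into one 64-bit word (hypothetical layout)."""
    for name, value in (("opcode", opcode), ("d", d), ("s1", s1), ("s2", s2)):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{name} does not fit in 16 bits: {value}")
    return (opcode << 48) | (d << 32) | (s1 << 16) | s2

def decode(word):
    """Unpack a 64-bit word back into (opcode, d, s1, s2)."""
    return (
        (word >> 48) & FIELD_MASK,
        (word >> 32) & FIELD_MASK,
        (word >> 16) & FIELD_MASK,
        word & FIELD_MASK,
    )

word = encode(0x0001, 0x1234, 0x00FF, 0xFFFF)
print(hex(word))
print(decode(word))
print(2 ** 16, "addressable registers per field, versus", 2 ** 9, "with 9-bit fields")
```

A 16-bit D or S field reaches 65536 locations directly, compared with the 512 that a 9-bit field allows.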

Rayman · 2019-02-20 13:51

Exec from HyperRam or HyperFlash would be neat.

cheezus · 2019-02-20 15:55

jmg wrote: »

Which aspects "would really help attract volume customers" - Do you mean for Board Testing, Debug, or programming ?
I'm not sure about JTAG Bypass instructions (used for chain of multiple JTAG parts), but I'd expect P2 should be able to connect to JTAG board testers, with a small stub loaded ?

Debug over JTAG is also possible, but would those volume customers really want to lose 4 smart pins for that ?

Those volume customers likely have something along the lines of this
https://www.xjtag.com/products/hardware/xjtag-expert-bst-oscilloscope/
which means they can program P2 SPI/QSPI Flash quickly, and run serial links as well.
TAQOZ may give enough resource to manage first-pass board testing ?

I was mainly thinking Boundary Scan would be nice for board testing. I realize that the Propeller could do this with a simple program, but what if there's a short or open on the programming pins (or possibly other pins)? Bypass shouldn't be that hard to implement. My thought was to use 4 pins for TDI, TDO, CLK, MODE, and connect the Reset line to one of the VIO group pins so these 4 pins could still be used. I'm not sure if this would work and there are probably plenty of reasons NOT to do this. Just a thought...

*edit-
The idea of an accessible JTAG port came from reading Chip's posts about the process. From my understanding the JTAG port is disabled (reset pulled high) when bonding the wafer to the package. I would imagine that Boundary Scan and Bypass are at LEAST implemented. I could be wrong though.

Rayman · 2019-02-20 17:25

It'd also be nice to have 1080p out on HDMI natively...

Cluso99 · 2019-02-20 17:37

Rayman wrote: »

It'd also be nice to have 1080p out on HDMI natively...

Isn’t this in the respin???

jmg · 2019-02-20 19:00

Rayman wrote: »

Exec from HyperRam or HyperFlash would be neat.

Rayman wrote: »

It'd also be nice to have 1080p out on HDMI natively...

.. and a better call stack for software...

ersmith wrote: »

Of course if P3 is 64 bits then we could have 64 bit instructions, which means D & S space could be a lot bigger. For example, 16 bits D, S1, and S2 (two sources per instruction, D just gets the result) with 16 bits left over for the instruction bits themselves. That's if Chip doesn't go the Risc-V route, which would have a small set of registers but a huge address space, some of which could be mapped to COG memory, some to HUB, and some to external RAM.

There are opcode sizes between 32 and 64, and even variable length opcodes to consider.
eg using 64b for a RET seems hugely wasteful
P2 already has some 64b opcodes.

jmg · 2019-02-20 19:02

Cluso99 wrote: »

Rayman wrote: »

It'd also be nice to have 1080p out on HDMI natively...

Isn’t this in the respin???

Not at 1080p. The respin targets only 250MHz, and that presupposes that P&R of the newly added hardware can produce a die that can be spec'd to reach 250MHz....

evanh · 2019-02-20 20:27

Given HDMI needs a data rate 10x the pixel rate, 640x480, with its 250MHz data rate, would normally be considered the only viable resolution for the Prop2. In an attempt to go higher I've been playing with the display timing of Eric's vgatile driver. My LCD TV seems extremely forgiving of timings for any particular known resolution ... but it doesn't accept an unknown resolution.

I suspect modern 16:9 monitors/TVs will generally have similar "reduced blanking" capabilities. After all, they have no need of retrace times.

What I've been able to do is start from the spec'd 34MHz pixel rate of 848x480@60Hz mode and chop down the timings until I've even got 848x480 at 25MHz working.

Question is, does this work for anyone else at all?
PS: You need Eric's vgatile files and fastspin to make this work.
EDIT: Attachment moved to https://forums.parallax.com/discussion/comment/1465323/#Comment_1465323
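The arithmetic behind the post above can be sketched as follows. The 10x factor and the 25MHz and 34MHz figures come from the post; the assumption that the refresh rate stays at 60Hz while the blanking is trimmed is mine, and the function names are made up for illustration.

```python
def tmds_bit_rate_mhz(pixel_clock_mhz):
    """Per-lane serial rate: 10 bits are sent for every pixel clock."""
    return pixel_clock_mhz * 10

def blanking_budget(pixel_clock_hz, active_w, active_h, refresh_hz=60):
    """Pixel periods per frame left over for blanking.

    Assumes the refresh rate is held fixed while the timings are trimmed.
    """
    total_per_frame = pixel_clock_hz // refresh_hz
    return total_per_frame - active_w * active_h

print(tmds_bit_rate_mhz(25))                  # serial rate for a 25MHz pixel clock
print(blanking_budget(34_000_000, 848, 480))  # spec'd 848x480 pixel rate
print(blanking_budget(25_000_000, 848, 480))  # trimmed 848x480 pixel rate
```

Under those assumptions, dropping 848x480 from 34MHz to 25MHz leaves only a few percent of each frame for blanking, which is why it depends on the display tolerating reduced-blanking timings.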

Electrodude · 2019-02-20 21:10

There should be a pathway from the CORDIC output to LUTRAM. It's currently nearly impossible to use the CORDIC at full bandwidth because you have to juggle submitting new commands with reading results. It would be much easier if you could arrange for results to be automatically written to increasing addresses in a selectable area of LUTRAM; this would allow you to decouple submitting new commands from having to read results.
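A small software model may help show what that decoupling buys. This is a sketch only: `LATENCY` is an arbitrary demo value rather than the real CORDIC depth, `math.hypot` merely stands in for a CORDIC operation, and the list named `lut` stands in for LUTRAM.

```python
from collections import deque
import math

LATENCY = 8  # hypothetical pipeline depth in ticks, chosen only for the demo

def run_pipeline(commands, lut, base):
    """Submit one command per tick; results land in lut[base], lut[base+1], ..."""
    pending = deque(commands)
    in_flight = deque()      # (ready_tick, result) pairs in submission order
    write_index = base
    tick = 0
    while pending or in_flight:
        if pending:
            x, y = pending.popleft()
            in_flight.append((tick + LATENCY, math.hypot(x, y)))
        # The "hardware" stores a finished result without any read instruction.
        if in_flight and in_flight[0][0] <= tick:
            lut[write_index] = in_flight.popleft()[1]
            write_index += 1
        tick += 1
    return tick

lut = [0.0] * 16
ticks = run_pipeline([(3, 4), (5, 12), (8, 15)], lut, base=4)
print(ticks, lut[4:7])
```

In the model the submitting code never reads a result, so it can issue a command on every tick, and the whole batch finishes in the number of commands plus the pipeline latency.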

cgracey · 2019-02-20 21:19

Using the colorspace converter, analog 1080p HDMI can be signaled over just 3 pins. It only needs 148.5MHz to work. I will make a demo as soon as I can.

evanh · 2019-02-20 21:24

Chip! HDMI doesn't have analogue. You could call it full-HD video I guess.

EDIT: I wonder if component video on TV's might handle that. I note my cheapo TV doesn't have component video inputs.

Phil Pilgrim (PhiPi) · 2019-02-21 19:05

For the Prop 3, I just want what could've been offered years ago in Prop 1.5: more counters per cog and more counter modes. Maybe higher speed. Don't even want more pins. Just that sweet, simple elegance of the Prop 1 architecture that makes programming such a pleasure!

-Phil

David Betz · 2019-02-21 19:12

Phil Pilgrim (PhiPi) wrote: »

For the Prop 3, I just want what could've been offered years ago in Prop 1.5: more counters per cog and more counter modes. Maybe higher speed. Don't even want more pins. Just that sweet, simple elegance of the Prop 1 architecture that makes programming such a pleasure!

-Phil

You don't even want more memory? 64K with a loadable Spin interpreter would be nice.

Rayman · 2019-02-21 19:13

Be fun to have an ATSC demodulator...

BTW: I'm surprised Phil hasn't done an FM demodulator for P2 yet...

Phil Pilgrim (PhiPi) · 2019-02-21 20:54

David Betz wrote:

You don't even want more memory? 64K with a loadable Spin interpreter would be nice.

Nope. I've never needed more RAM. The P1's ROM/RAM combo is perfect.

-Phil
