P3 ideas
Cluso99
in Propeller 2
Title says it all
Just had a thought...
The condition codes are possibly decoded early in the pipeline. Could this shortcut the loading of S & D values, resulting in a 1 clock instruction?
We might want this action to be cog selectable, since otherwise we would lose determinism.
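A rough way to see both the gain and the determinism cost is a toy cycle model. It assumes the condition field is a 4-bit truth table indexed by the C and Z flags (as in the existing Propeller condition encoding), that every instruction normally takes 2 clocks, and that a failed condition could retire in 1 clock. The 1-clock path and all function names here are hypothetical, not anything Chip has described.

```python
# Toy model only: the 1-clock "fast skip" is the idea from the post, not real P2 behaviour.
IF_ALWAYS = 0b1111   # passes for every C/Z combination
IF_Z = 0b1010        # passes when Z=1 (C=0 or C=1)


def condition_passes(cond, c, z):
    """Look up bit (C*2 + Z) of the 4-bit condition field."""
    return bool((cond >> ((c << 1) | z)) & 1)


def clocks(program, c, z, fast_skip):
    """Total clocks for a list of condition fields with fixed flags."""
    total = 0
    for cond in program:
        if condition_passes(cond, c, z) or not fast_skip:
            total += 2   # normal 2-clock instruction (assumed)
        else:
            total += 1   # hypothetical early-out for a failed condition
    return total


program = [IF_ALWAYS, IF_Z, IF_Z, IF_ALWAYS]
for z in (0, 1):
    print("Z =", z,
          "fixed:", clocks(program, 0, z, fast_skip=False),
          "fast skip:", clocks(program, 0, z, fast_skip=True))
```

With fast skip enabled, the same four instructions take a different number of clocks depending on Z, which is exactly the loss of determinism a per-cog enable would guard against.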
Comments
The 2-clock opcodes in P2 fit the silicon quite well, as the counters/timers/smart pins can all run ~2x faster than the core.
Once you have a PLL, the exact clocks per opcode matter less, as the clock is internally generated.
More widely useful in P3 would be a means to prescale the non-wait opcodes, so they draw less power while the peripherals still run at SysCLK.
Right now, all COGS run at SysCLK, and there is minimal clock gating, and no scope to run COGs at differing speeds.
Chip has briefly mentioned clock gating is going into P2r2, so that could get part way to solving this.
(1) 64 bit registers. We have 64 pins, so it seems like 64 bit registers is a natural extension of that, and would somewhat uniquely position a hypothetical P3 as a 64-bit MCU.
(2) RISC-V instruction set with Propeller specific augments. This would make getting tools way easier, and would probably be attractive to the educational market. RISC-V has a pretty big reserved space for user defined instructions, so all kinds of things could be put in there while keeping the core RISC-V instructions for compilers to use.
I'm not sure about JTAG Bypass instructions (used for chain of multiple JTAG parts), but I'd expect P2 should be able to connect to JTAG board testers, with a small stub loaded ?
Debug over JTAG is also possible, but would those volume customers really want to lose 4 smart pins for that ?
Those volume customers likely have something along the lines of this
https://www.xjtag.com/products/hardware/xjtag-expert-bst-oscilloscope/
which means they can program P2 SPI/QSPI Flash quickly, and run serial links as well.
TAQOZ may give enough resource to manage first-pass board testing ?
I wonder how useful/extensive the jtag test logic that OnSemi bake in, is?
Yes, I thought the same thing
Heck no! Will be too many players in the future, sort of like ARM but different, all with their own quirks.
P2+
I would like a P2 with...
* 1MB HUB memory (that's the easy maximum without instruction changes)
* One COG with extra features
We all usually have one big program and lots of little ones, so
One COG (say Cog 0) with more COG/LUT memory, say 8-64KB
If it could have a modifier instruction, AUGLUT, that extends the D & S fields of the following instruction so the extra memory is used dual-port like the existing COG memory, that would be fantastic, as it effectively increases the register set. Otherwise, could it be single port (smaller)?
* Smartpins
One thought: use a mini-sized P1-style CPU instead of the existing smartpins
Could it run double the normal P2 clock speed?
Only base instructions like AND/OR/XOR/NOT/SHL/SHR/JMPRET/ADD/SUB... (no fancy instructions or MUL etc)
Only 512B (128*32) cog-like memory, no HUB access
* 90-110nm for 500MHz standard plus overclocking
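The sizes in this wish list can be checked with a little arithmetic. The sketch below assumes the "easy maximum" for HUB comes from 20-bit hub addresses, and that D & S are 9-bit fields reaching 512 longs directly; `address_bits` is just an illustrative helper.

```python
BYTES_PER_LONG = 4
DIRECT_BITS = 9   # assumed width of the D and S fields


def address_bits(locations):
    """Bits needed to address `locations` distinct items."""
    return (locations - 1).bit_length()


hub_bytes = 2 ** 20                      # 20-bit byte address (assumed)
mini_cog_bytes = 128 * BYTES_PER_LONG    # the "128*32" mini CPU memory

print("HUB reach:", hub_bytes // 1024, "KB")
print("Mini CPU memory:", mini_cog_bytes, "bytes,",
      address_bits(128), "address bits")

# How many extra address bits a bigger COG/LUT would need from an AUGLUT-style prefix
for kb in (8, 64):
    longs = kb * 1024 // BYTES_PER_LONG
    bits = address_bits(longs)
    print(kb, "KB =", longs, "longs,", bits, "address bits,",
          bits - DIRECT_BITS, "more than D/S can hold")
```

On these assumptions an 8KB cog needs 2 extra address bits per operand and a 64KB cog needs 5, which is the gap a prefix instruction would have to fill.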
I really like the D redirect idea.
Also agree with David. A means to manage external memory takes P3 to general CPU status. Really big programs will come with a process shrink.
Asymmetric COGs is a clear possible path.
However, replacing smartpins makes little sense: they are a key P2 feature, SW is never as fast as HW, and there are 64 smart pins to replace...
- but what could be useful, would be a simple COG designed to service smart pins.
As more Smart pin examples appear, it becomes clear just how powerful they are, and how set-and-forget applies in many cases.
The service overhead is also often low-mips. Only in top-end Display or fast COMs are high data rates needed.
A blanket 'no hub access' is also too inflexible, but there is merit in being able to yield a hub slot to another COG.
That will depend more on the process (which will set MHz and Memory numbers), and customer demand.
'Too many players' is far from a problem, it means mature tools, and a good education base.
When P3 is rolled, there may be ByteCode engines proven enough to compile into verilog ?
More ROM has not been mentioned yet, but that is a relatively easy extension.
So having broad support from the rest of the industry is a bad thing?
But anyway, a Risc-V P3 would be a very unique Risc-V, with a bunch of its own instructions. So yeah, it would certainly have its own quirks, but the whole point would be to make them *good* quirks. And non-quirky code could be ported right over from other chips easily (in particular compilers will be very easy to obtain, at least basic compilers that don't need P3 specific instructions -- for those we would expect assembly to be used).
Of course if P3 is 64 bits then we could have 64 bit instructions, which means D & S space could be a lot bigger. For example, 16 bits D, S1, and S2 (two sources per instruction, D just gets the result) with 16 bits left over for the instruction bits themselves. That's if Chip doesn't go the Risc-V route, which would have a small set of registers but a huge address space, some of which could be mapped to COG memory, some to HUB, and some to external RAM.
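As a sketch of what that 64-bit format could look like, here is one possible packing of the four 16-bit fields. The field order (opcode in the top bits, then D, S1, S2) and the names `encode`/`decode` are assumptions for illustration only; nothing here is a proposed P3 encoding.

```python
MASK16 = 0xFFFF


def encode(opcode, d, s1, s2):
    """Pack four 16-bit fields into one 64-bit instruction word."""
    for value in (opcode, d, s1, s2):
        if not 0 <= value <= MASK16:
            raise ValueError("field does not fit in 16 bits")
    return (opcode << 48) | (d << 32) | (s1 << 16) | s2


def decode(word):
    """Unpack a 64-bit word into (opcode, d, s1, s2)."""
    return ((word >> 48) & MASK16,
            (word >> 32) & MASK16,
            (word >> 16) & MASK16,
            word & MASK16)


word = encode(0x0123, 0x4567, 0x89AB, 0xCDEF)
print(hex(word))
print([hex(field) for field in decode(word)])
print("registers reachable per operand:", MASK16 + 1)
```

Each operand field could then name any of 65,536 registers directly, compared with 512 for a 9-bit field.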
I was mainly thinking Boundary Scan would be nice for board testing. I realize that the propeller could do this with a simple program, but if there's a short or open on the programming pins (or possibly other pins).. Bypass shouldn't be that hard to implement. My thought was to use 4 pins for TDI, TDO, CLK, MODE, connect the Reset line to one of the VIO group pins so these 4 pins could still be used. I'm not sure if this would work and there's probably plenty of reasons NOT to do this. Just a thought...
*edit-
The idea of an accessible JTAG port came from reading Chip's posts about the process. From my understanding the JTAG port is disabled (reset pulled high) when bonding the wafer to the package. I would imagine that Boundary Scan and Bypass are at LEAST implemented. I could be wrong though.
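For anyone unfamiliar with why Bypass matters in a multi-part chain: in standard JTAG, a device in BYPASS puts a single 1-bit register between TDI and TDO, so data for other parts passes through with one clock of delay per bypassed device. The sketch below is a plain software model of that behaviour, not P2 code, and `shift_through_bypass` is a made-up name.

```python
def shift_through_bypass(chain_length, bits_in):
    """Shift bits through `chain_length` devices that are all in BYPASS.

    Each device is modelled as one flip-flop that starts at 0.
    Returns the bit seen on the final TDO at each clock.
    """
    if chain_length < 1:
        raise ValueError("need at least one device in the chain")
    regs = [0] * chain_length
    out = []
    for bit in bits_in:
        out.append(regs[-1])          # last device drives TDO
        regs = [bit] + regs[:-1]      # every register shifts one place
    return out


pattern = [1, 0, 1, 1, 0, 0, 0]
print(shift_through_bypass(3, pattern))
```

The pattern comes out unchanged but three clocks late, which is how a board tester can reach one part in the chain while the others are bypassed.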
.. and a better call stack for software...
There are opcode sizes between 32 and 64, and even variable length opcodes to consider.
eg using 64b for a RET seems hugely wasteful
P2 already has some 64b opcodes.
Not at 1080p: the respin targets only 250MHz, and that presupposes the newly added hardware can be placed and routed into a die that can be spec'd to reach 250MHz...
I suspect modern 16:9 monitors/TVs will generally have similar "reduced blanking" capabilities. After all, they have no need of retrace times.
What I've been able to do is start from the spec'd 34MHz pixel rate of 848x480@60Hz mode and chop down the timings until I've even got 848x480 at 25MHz working.
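To put numbers on how much blanking gets chopped, here is a small budget calculation. It assumes the refresh rate stays at 60Hz after the pixel clock is lowered, which the post does not actually state, and `blanking_budget` is an illustrative helper, not part of the vgatile files.

```python
def blanking_budget(pixel_clock_hz, width, height, refresh_hz):
    """Pixel clocks per frame left over for blanking and sync."""
    total_per_frame = pixel_clock_hz // refresh_hz
    active_per_frame = width * height
    return total_per_frame - active_per_frame


for clock_hz in (34_000_000, 25_000_000):
    spare = blanking_budget(clock_hz, 848, 480, 60)
    total = clock_hz // 60
    print(clock_hz // 1_000_000, "MHz:", spare, "blanking clocks per frame,",
          round(100 * spare / total, 1), "% of the frame")
```

On that assumption the spec'd 34MHz mode spends roughly 28% of each frame in blanking, while 25MHz leaves only about 2%, so the monitor has to tolerate almost no retrace time.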
Question is, does this work for anyone else at all?
PS: You need Eric's vgatile files and fastspin to make this work.
EDIT: Attachment moved to https://forums.parallax.com/discussion/comment/1465323/#Comment_1465323
EDIT: I wonder if component video on TVs might handle that. I note my cheapo TV doesn't have component video inputs.
-Phil
BTW: I'm surprised Phil hasn't done a FM demodulator for P2 yet...
-Phil