SmartPins... What could a few specialised CPUs do instead ???
Cluso99
in Propeller 2
The concept behind the P1, and initially the P2, was that some of the cogs would be dedicated to handling the I/O pins.
After the P2-HOT came a design with 16 cogs and more hub RAM, but lower-spec cogs.
Somewhere along the way came the idea of putting some smarts into the I/O pins themselves. This has grown into a large LUT/ALE space for all sorts of things, including moving the counters there too. This means we also need some primitive counters in the cogs.
I wonder what could be achieved with a number (say 8?) of tiny CPUs specialised for handling a group of I/O pins?
For fun (I have seen the other thread about naming the smart pins), let's call these "tappets" (part of an engine) for now.
What would these "tappets" (CPUs) require?
The CPU could be 8 bits, with a small SRAM and, say, 16 internal addressable registers.
The CPU could access a group of 8 I/Os selected from the 64. Multiple CPUs could operate in parallel like cogs (with their outputs ORed together).
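To make the pin-group idea concrete, here is a minimal sketch. It assumes (my assumption, the post does not say) that "a group of 8 I/Os selected from 64" means one of eight aligned 8-pin blocks, and that each tappet drives an 8-bit output byte which is ORed onto the 64-bit pin word, the same way cog outputs are combined. `place_group` and `combine_outputs` are hypothetical names for illustration only.

```python
NUM_PINS = 64
GROUP_SIZE = 8
NUM_GROUPS = NUM_PINS // GROUP_SIZE  # assumed: 8 aligned groups of 8 pins


def place_group(group, byte):
    """Map one tappet's 8-bit output byte onto the 64-bit pin word (hypothetical model)."""
    if not 0 <= group < NUM_GROUPS:
        raise ValueError("group must be 0..7")
    return (byte & 0xFF) << (group * GROUP_SIZE)


def combine_outputs(tappets):
    """OR together all tappet outputs, given as (group, byte) pairs, like cog outputs."""
    pins = 0
    for group, byte in tappets:
        pins |= place_group(group, byte)
    return pins


# Two tappets share pins 8..15 (group 1); a third drives pins 40..47 (group 5).
print(hex(combine_outputs([(1, 0x0F), (1, 0xF0), (5, 0x81)])))  # prints 0x81000000ff00
```

Because the outputs are ORed, two tappets can share a group without any arbitration, but neither can force a pin low that the other is driving high.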
A few simple instructions such as AND/OR/XOR/ANDN, plus SHIFT IN/OUT PIN instructions, would make accumulating the bits easy.
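A rough sketch of what such a tappet's register file and instruction set might look like. Everything here is hypothetical: the `Tappet` class, its method names, and the bit order of SHIFT IN/OUT (new bit enters at bit 0, bit 7 leaves first, i.e. MSB-first like SPI) are my assumptions, not anything specified in the post.

```python
class Tappet:
    """Hypothetical 8-bit pin CPU: 16 addressable 8-bit registers."""

    def __init__(self):
        self.r = [0] * 16  # 16 internal addressable registers

    def _w(self, d, value):
        self.r[d] = value & 0xFF  # every result is truncated to 8 bits

    # Logic ops: register d = register d OP register s
    def and_(self, d, s):
        self._w(d, self.r[d] & self.r[s])

    def or_(self, d, s):
        self._w(d, self.r[d] | self.r[s])

    def xor(self, d, s):
        self._w(d, self.r[d] ^ self.r[s])

    def andn(self, d, s):
        self._w(d, self.r[d] & ~self.r[s])

    def shift_in(self, d, pin_level):
        """SHIFT IN PIN: shift register d left, pin level enters at bit 0."""
        self._w(d, (self.r[d] << 1) | (pin_level & 1))

    def shift_out(self, d):
        """SHIFT OUT PIN: return bit 7 of register d as the pin level, then shift left."""
        bit = (self.r[d] >> 7) & 1
        self._w(d, self.r[d] << 1)
        return bit


t = Tappet()
for level in [1, 0, 1, 1, 0, 0, 1, 0]:  # eight sampled pin levels
    t.shift_in(0, level)
print(hex(t.r[0]))  # prints 0xb2
```

Eight SHIFT IN instructions assemble a byte with no masking or OR-ing in the loop, which is the "accumulate the bits easily" case.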
Perhaps it could clock faster than the cogs because of the smaller and simpler instruction set.
Comments
The P2 pin cells take those 32 counters out of the cogs, so the logic cost is not nearly as high as it seems at first glance.
All that remains in the cog are some very small state-sync and message-passing counters.
Flexi-pins follow on from the many MCUs that now have USARTs able to do any of SPI, async or I2C; pulling a counter feature into that block is the next logical step.
Chip has made them unidirectional for async, so one cell does not carry the overhead of both Tx and Rx (but you do need to use two cells for duplex).
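A small model of why unidirectional async cells mean duplex takes two of them. It assumes standard 8N1 framing (start bit 0, eight data bits LSB first, stop bit 1); the function names are hypothetical and this does not reflect the actual P2 smart-pin register interface.

```python
def tx_frame(byte):
    """Tx-only cell (hypothetical model): 8N1 frame, start 0, LSB first, stop 1."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def rx_frame(bits):
    """Rx-only cell: check start/stop bits, then reassemble the byte LSB first."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


def duplex_exchange(a_byte, b_byte):
    """Full duplex: each direction has its own wire, Tx cell and Rx cell."""
    a_to_b = tx_frame(a_byte)  # A's Tx cell drives the A->B wire
    b_to_a = tx_frame(b_byte)  # B's Tx cell drives the B->A wire
    return rx_frame(b_to_a), rx_frame(a_to_b)  # (A receives, B receives)


print(tx_frame(0x55))  # prints [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(duplex_exchange(0x12, 0x34))  # prints (52, 18)
```

Each cell only has to hold one direction's shifter and state, which is the overhead saving described above.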
It would need to clock a LOT faster to be as useful.
Software is not the way to get the lowest power or highest speed, and there are already software engines (the cogs) in a P2.
Then you need to somehow load the nano-code into these cells and debug it there.
Such software-management issues are mainly what limited the success of the Freescale TPU pin co-processors.
I think the current P2 pin cell allows 63 Tx and one Rx from one cog, with fully granular boundaries (no blocking into groups of 8 I/Os).
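A quick illustration of what "no 8 I/O blocking" buys. Assuming (hypothetically) that pins were allocated to a function in whole aligned 8-pin blocks, a function whose pins straddle a block boundary ties up two full blocks; with granular allocation it uses only its own pins. The helper names are made up for this sketch.

```python
def blocks_touched(pins, block_size=8):
    """Indices of the aligned pin blocks that a set of pins falls into."""
    return sorted({p // block_size for p in pins})


def pins_tied_up(pins, block_size=8):
    """Pins committed to one function if allocation is in whole blocks."""
    return len(blocks_touched(pins, block_size)) * block_size


uart = [7, 8]  # hypothetical: Tx on pin 7, Rx on pin 8
print(blocks_touched(uart))  # prints [0, 1]
print(pins_tied_up(uart))  # prints 16 (with 8-pin blocking)
print(pins_tied_up(uart, block_size=1))  # prints 2 (fully granular)
```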
It's not clear to me how this is going to be quick.
My dream P3 is a 64-bit RISC-V engine running Linux, with 16 P2 cores as its "tappets".
-Phil
One instruction per clock ... suddenly not so small!
I was thinking more along the lines of a heavily simplified P1 cog with, say, 12-bit instructions.
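For comparison, P1 cog instructions are 32 bits wide, with 9-bit destination and source fields addressing the cog's 512 longs. With only 16 registers, 12 bits is enough for a 4-bit opcode plus 4-bit destination and source fields. The layout and opcode numbers below are purely hypothetical.

```python
# Hypothetical 12-bit format: [opcode:4][dest:4][src:4]
OPCODES = {"AND": 0x0, "OR": 0x1, "XOR": 0x2, "ANDN": 0x3, "SHIN": 0x4, "SHOUT": 0x5}
NAMES = {v: k for k, v in OPCODES.items()}


def encode(op, d, s):
    """Pack an instruction into 12 bits; d and s select one of 16 registers."""
    if not (0 <= d < 16 and 0 <= s < 16):
        raise ValueError("register out of range")
    return (OPCODES[op] << 8) | (d << 4) | s


def decode(word):
    """Unpack a 12-bit word into (mnemonic, dest, src)."""
    return NAMES[(word >> 8) & 0xF], (word >> 4) & 0xF, word & 0xF


print(hex(encode("XOR", 3, 10)))  # prints 0x23a
print(decode(0x23A))  # prints ('XOR', 3, 10)
```

Leaving 10 of the 16 opcode slots free would allow for branches or pin-wait instructions, at the cost of no condition or flag-write bits like the P1 has.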
Phil,
Yes, I remember those. They were much more common in the 80s, though. I even built some for the ICL mini.
On the same die as the Prop2?! Poor Chip ... Actually, 100 MIPS is no faster than the cogs!