SmartPins... What could a few specialised cpus do instead ???

Cluso99 · 2016-01-16 07:22

The concept behind the P1, and initially the P2, was that some of the cogs would be dedicated to handling the I/O Pins.
After the P2-HOT came 16 cogs and more hub ram, but lower spec cogs.

Somewhere along the way came some ideas about putting some smarts into the I/O pins themselves. This has grown to a large LUT/ALE space for all sorts of things, including placing the counters there too. This means that we also need some primitive counters in the cogs as well.

I wonder what could be achieved with a number (say 8?) of tiny cpu's specialised for handling a group of I/O pins?
For fun (I have seen the other thread about naming the smart pins), let's call these "tappets" (part of an engine) for now.

What would these "tappets" (cpu's) require?

The cpu could be 8 bits with a small sram memory and say 16 internal addressable registers.
The cpu could access a group of 8 I/Os selected from 64. Multiple cpu's could operate in parallel like cogs (OR outputs).
A few simple instructions like AND/OR/XOR/ANDN and SHIFT IN/OUT PIN instructions to make accumulating the bits easy.
Perhaps it could clock faster than the cogs because of the smaller and simpler instruction set.
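
To make the idea a bit more concrete, here is a rough Python model of one such "tappet". Everything in it is hypothetical (the class name, the method names, shifting MSB-first, the 8-pin window starting at a base pin); it only illustrates the 8-bit registers, the group of 8 I/Os out of 64, and the OR-ing of outputs described above, and is not a proposed design.

```python
# Hypothetical model of a "tappet": an 8-bit pin processor with 16 registers
# that sees a window of 8 pins out of 64. Illustration only, not a real design.

MASK8 = 0xFF

class Tappet:
    def __init__(self, base_pin):
        assert 0 <= base_pin <= 56      # window of 8 pins must fit in 64
        self.base = base_pin            # first pin of this tappet's group
        self.regs = [0] * 16            # 16 internal 8-bit registers
        self.out = 0                    # 8-bit output latch for the group

    def op(self, name, d, s):
        """Simple logic instruction: regs[d] = regs[d] <op> regs[s]."""
        a, b = self.regs[d], self.regs[s]
        if name == "AND":
            r = a & b
        elif name == "OR":
            r = a | b
        elif name == "XOR":
            r = a ^ b
        elif name == "ANDN":
            r = a & ~b
        else:
            raise ValueError(name)
        self.regs[d] = r & MASK8

    def shift_in(self, d, pins, bit):
        """Shift regs[d] left and put the state of one group pin in bit 0."""
        level = (pins >> (self.base + bit)) & 1
        self.regs[d] = ((self.regs[d] << 1) | level) & MASK8

    def shift_out(self, d, bit):
        """Drive bit 7 of regs[d] onto one group pin, then shift left."""
        level = (self.regs[d] >> 7) & 1
        self.out = ((self.out & ~(1 << bit)) | (level << bit)) & MASK8
        self.regs[d] = (self.regs[d] << 1) & MASK8

    def drive(self):
        """This tappet's contribution to the 64-bit pin bus."""
        return self.out << self.base

def combined_outputs(tappets):
    """Outputs of all tappets are OR-ed together, like cog outputs."""
    result = 0
    for t in tappets:
        result |= t.drive()
    return result
```

For example, a tappet on pins 8..15 could accumulate a byte arriving MSB-first on pin 11 with eight calls to `shift_in(0, pins, 3)`, one per sample of the 64-bit pin state.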

cgracey · 2016-01-16 07:30

An 8-bit processor could be really small. After this chip is done, we could develop that.

Cluso99 · 2016-01-16 07:32

Could a little cpu like this (single port instruction ram and internal registers) run at 2x P2 COG speed?

jmg · 2016-01-16 07:37

Cluso99 wrote: »

Somewhere along the way came some ideas about putting some smarts into the I/O pins themselves. This has grown to a large LUT/ALE space for all sorts of things, including placing the counters there too. This means that we also need some primitive counters in the cogs as well.

The P1 counters were in each COG, and they have been removed (there would have been 32 of those).
The P2 Pin cell shifts those 32 counters, so the logic cost is not nearly as high as it seems at first glance.
All that remains in the COG is some very small state-sync & message passing counters.

Flexi-pins follow on from the many MCUs that now have USARTs which can do any of SPI, Async and I2C
- pulling a counter feature into that block is the next logical step.
Chip has made them uni-directional on Async, so the overhead of both Tx and Rx does not impact one cell
(but you do need to use 2 cells for Duplex)

Cluso99 wrote: »

The cpu could be 8 bits with a small sram memory and say 16 internal addressable registers.
The cpu could access a group of 8 I/Os selected from 64. Multiple cpu's could operate in parallel like cogs (OR outputs).
A few simple instructions like AND/OR/XOR/ANDN and SHIFT IN/OUT PIN instructions to make accumulating the bits easy.
Perhaps it could clock faster than the cogs because of the smaller and simpler instruction set.

It would need to clock a LOT faster to be as useful.
SW is not the way to get lowest power or highest speed, and there are already SW engines in a P2.

Then you need to somehow load the nano-code into these cells, and debug it there.

Such Software management issues are mainly what limited the success of the Freescale TPU Pin co-processors.

I think the current P2 pin cell allows 63 TX and one Rx from one COG, with fully granular boundaries (no 8 i/o blocking).

Heater. · 2016-01-16 07:51

My issue with this kind of idea is that I would have to deal with two architectures and two instruction sets. One is quite enough for me.

It's not clear to me how this is going to be quick.

My dream P3 is a 64 bit RISC-V engine running Linux and 16 P2 cores as its "tappets".

Phil Pilgrim (PhiPi) · 2016-01-16 08:07

The idea is not new. As far back as the '70s, mainframes had "channels," which were micro-coded peripheral processors.

-Phil

evanh · 2016-01-16 08:17

Cluso99 wrote: »

Could a little cpu like this (single port instruction ram and internal registers) run at 2x P2 COG speed?

One instruction per clock ... suddenly not so small!

evanh · 2016-01-16 08:21

Funnily, the Prop2 Cogs have far more counters than the Prop1 Cogs ever had. Starting with the three IRQ timers, correct? Then there is the FIFO, which will have its own counters for pacing and chunking. And how many are being used in the Streamer for the same? I've probably missed some ...

Cluso99 · 2016-01-16 08:28

evanh wrote: »

Cluso99 wrote: »

Could a little cpu like this (single port instruction ram and internal registers) run at 2x P2 COG speed?

One instruction per clock ... suddenly not so small!

I didn't say single clock instructions. Cogs are slower because of the big ALU. So the clocking might be able to be twice as fast. Using internal registers means that the clock to fetch the source and target registers is not required. So you have fetch, execute and result clocks. So if they could run at 300MHz, then 100MHz per instruction cycle.
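
The arithmetic behind that estimate, as a small sketch. It assumes the three clocks (fetch, execute, result) run strictly one after another with no overlap between instructions; the function name is just for illustration, and the 300MHz figure is the guess from the paragraph above.

```python
def instruction_rate_mhz(clock_mhz, clocks_per_instruction):
    """Instruction rate for a non-pipelined CPU (no overlap assumed)."""
    return clock_mhz / clocks_per_instruction

# fetch + execute + result = 3 clocks per instruction at a guessed 300 MHz
tappet_rate = instruction_rate_mhz(300, 3)
print(tappet_rate)  # 100.0
```

Under the same assumption, doubling the instruction rate would need either twice the clock or overlapping those three clocks.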

I was more thinking along the lines of a heavily simplified P1 cog with say 12 bit instructions.

Phil,
Yes I remember those. They were much more common in the 80s though. I even built some for the ICL mini.

evanh · 2016-01-16 08:29

Sorry, not trying to poo-poo everything. Reality is that SmartPins is a bit bulky and there's not likely any way to make them tiny. Fingers crossed they'll be worth it.

evanh · 2016-01-16 08:33

Cluso99 wrote: »

I didn't say single clock instructions. Cogs are slower because of the big ALU. So the clocking might be able to be twice as fast. Using internal registers means that the clock to fetch the source and target registers is not required. So you have fetch, execute and result clocks. So if they could run at 300MHz, then 100MHz per instruction cycle.

On the same die as the Prop2?! Poor Chip ... Actually, 100 MIPS is no faster than the Cogs!
