"Smart Pins" as Programmable Gate Array?

Phil Pilgrim (PhiPi) · 2014-05-27 14:31

There are so many things that could put demands on the P2's smart pins, including a plethora of serializing and deserializing protocols, that I was wondering, "Why not just configure the smart pins as a sea of gates and flip-flops interconnected by programmable RAM cells, and let the programmer decide how to configure them?" This would be very Prop-like in that it provides just the most rudimentary building blocks, yet would provide the performance boost at the pin level that people are looking for. Moreover, it defers a lot of sticky decisions until after the chip is built -- hopefully hastening the advent of that day. Finally, it allows the configuration of functions that maybe nobody thought about in the chip design phase.
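For readers who haven't worked with FPGAs: "gates and flip-flops interconnected by programmable RAM cells" is roughly how SRAM-based FPGAs work. Each logic cell's function is stored as a small truth table, called a lookup table (LUT). The Python sketch below models one such cell, assuming a 4-input LUT followed by an optional flip-flop. The class and helper names are hypothetical and say nothing about how a P2 would actually implement it.

```python
class LogicCell:
    """One hypothetical logic cell: a 4-input LUT plus an optional flip-flop."""

    def __init__(self, lut_bits, registered=False):
        self.lut_bits = lut_bits & 0xFFFF  # 16 config-RAM bits = truth table
        self.registered = registered
        self.q = 0                          # flip-flop state

    def comb(self, a, b, c, d):
        """Combinational output: the inputs form an index into the truth table."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.lut_bits >> index) & 1

    def clock(self, a, b, c, d):
        """Clock edge: a registered cell latches its LUT output into q."""
        value = self.comb(a, b, c, d)
        if self.registered:
            self.q = value
        return value


def make_lut(fn):
    """Build the 16-bit truth table for a 4-input boolean function."""
    bits = 0
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        if fn(a, b, c, d):
            bits |= 1 << i
    return bits


xor_cell = LogicCell(make_lut(lambda a, b, c, d: a ^ b))  # truth table 0x6666
# A divide-by-2 (toggle) stage: LUT = NOT a, with q fed back into input a
toggle = LogicCell(make_lut(lambda a, b, c, d: 1 - a), registered=True)
print([toggle.clock(toggle.q, 0, 0, 0) for _ in range(4)])  # [1, 0, 1, 0]
```

Reprogramming the 16 RAM bits changes the function without changing the wiring. That flexibility is the whole appeal, and it is also where the silicon cost comes from.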

Anyway, it works for the PSoC; it might also work for the P2. Thoughts?

-Phil

rjo__ · 2014-05-27 14:56

Hmmmm. I like it.

Sounds like a perfect educational opportunity.

But maybe I like it because I don't understand exactly what would be required.

Where would the programming environment for the gate array come from?

Should we ask Chip to integrate a limited FPGA programming environment into Spin?

mark · 2014-05-27 15:29

Probably would take up significantly more die space than what's currently planned, and cutting down on the gates might limit their functionality. That said, I don't mind if not all pins are equal, so letting some be programmable is fine by me. I was actually going to start a thread on the practicality (or lack of it) of making all pins "smart". For one, because there are so many of them, there are some inherent compromises in the internal connectivity between them and the cogs that might be eliminated if fewer pins had that functionality. Another issue is that with Chip considering making every 5th pin an I/O VDD pin, and VSS being tied to the large power pad on the bottom of the package, board layout will be very difficult for a simple 2-sided board. If only, say, half the pins are smart, with the rest being perhaps simpler digital I/O, that could reduce the number of VDD pins, at least for those pin banks, possibly making routing easier.

tonyp12 · 2014-05-27 15:32

How many gates do you need to do something useful, 5000?
The lowest count on the market today seems to be the IGLOO nano AGLN010, with 10K gates and 260 VersaTiles,
though I have a hard time seeing which cores it can be loaded with.

http://www.microsemi.com/products/fpga-soc/fpga/igloo-nano#product-tables

jmg · 2014-05-27 15:35

Phil Pilgrim (PhiPi) wrote: »

"Why not just configure the smart pins as a sea of gates and flip-flops interconnected by programmable RAM cells, and let the programmer decide how to configure them?"

Easy to say, and certainly flexible, but the silicon area needed is prohibitive.

We still do not have an area cost for even the simplest Counter/PWM/NCO cells, but 32b counters/capture/compare are very costly in "a sea of gates and flip-flops".

What may be possible is programmable logic managed the way Silego does it; see
http://www.silego.com/products/greenpak3.html

They have the blocks hard-coded (counters, shift registers, etc.), and the programmable part is in the connections between the elements, done with simple schematic entry. This slashes the 'config fuse' count.

It comes down to how many config bits make silicon sense, but even for configurable peripherals with, let's say, 4 longs to manage all the interconnect & options, it can be useful to have wizard-like software tools to generate the bits and give a resulting report of what the user actually has.

That allows both ways to program: someone can use assembler to build up the longs, if they really want to, or they may prefer to use the form/wizard to define what they want and have the tools create a config binary bitset.
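As an illustration of the two routes (building the config longs by hand versus letting a tool pack them and report back what they contain), here is a small Python sketch. The field layout (clock/enable/direction source selects, a mode field, a capture bit) is entirely hypothetical; no actual P2 register map is implied.

```python
# Hypothetical config-long layout, for illustration only:
#   bits  0..4   clock source select
#   bits  5..9   enable source select
#   bits 10..14  direction source select
#   bits 15..18  counter mode
#   bit  19      capture enable
FIELDS = {
    "clk_src": (0, 5),   # (shift, width)
    "en_src": (5, 5),
    "dir_src": (10, 5),
    "mode": (15, 4),
    "capture": (19, 1),
}


def pack_config(**values):
    """The 'wizard' side: named settings in, one 32-bit config long out."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]  # unknown field names raise KeyError
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def report(word):
    """The 'resulting report': decode a long back into named fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}


cfg = pack_config(clk_src=3, mode=2, capture=1)
print(hex(cfg), report(cfg))
```

Someone writing assembler would simply hard-code the same long; the report function is what lets a novice check what a hand-built value actually selects.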

Tubular · 2014-05-27 15:59

Could be awesome, Phil. We'd need to have some very easy to use standard configurations, for users to be able to easily grasp, but that'd be easy enough.

The real estate around the pin pads is probably "cheaper" than that in the cog area. Is there a way to work out roughly the available area (transistors/flops) that might be on offer?

jmg · 2014-05-27 16:20


Lawson · 2014-05-27 16:20

jmg wrote: »

Easy to say, and certainly flexible, but the silicon area needed is prohibitive.

We still do not have an area cost for even the simplest Counter/PWM/NCO cells, but 32b counters/capture/compare are very costly in "a sea of gates and flip-flops".

What may be possible is programmable logic managed the way Silego does it; see
http://www.silego.com/products/greenpak3.html

They have the blocks hard-coded (counters, shift registers, etc.), and the programmable part is in the connections between the elements, done with simple schematic entry. This slashes the 'config fuse' count.

It comes down to how many config bits make silicon sense, but even for configurable peripherals with, let's say, 4 longs to manage all the interconnect & options, it can be useful to have wizard-like software tools to generate the bits and give a resulting report of what the user actually has.

That allows both ways to program: someone can use assembler to build up the longs, if they really want to, or they may prefer to use the form/wizard to define what they want and have the tools create a config binary bitset.

Sounds a lot like what I suggested in the smart pins mega-thread (page 2, posts #34 and #39). A key thing to include is the state of the adjacent pins. This will allow 2+ pins to be ganged to support functions that don't fit in a single "smart pin".

To simplify configuration, how about if SPIN/ASM had built-in macros to configure common use cases, e.g. NCO, PWM, logic modes, shift-register modes, etc.? The novice could then throw in a "CONFIGPIN NCO" macro followed by the frequency, while more sophisticated users could upload the raw config bits for maximum functionality. The big cost I see is configuration overhead (and the area/power it consumes).
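A "CONFIGPIN NCO" macro would mostly hide arithmetic like the following. This sketch assumes a P1-style NCO, where a 32-bit phase accumulator adds FRQ on every system clock and the output is the accumulator's top bit, so f_out = FRQ × f_clk / 2^32. The function names and the 80 MHz clock are illustrative assumptions, not a proposed P2 API.

```python
def nco_frq(target_hz, clk_hz):
    """FRQ value for a P1-style NCO, where f_out = FRQ * clk_hz / 2**32."""
    if not 0 < target_hz < clk_hz / 2:
        raise ValueError("target must lie between 0 and clk_hz / 2")
    return round(target_hz * 2**32 / clk_hz)


def nco_actual_hz(frq, clk_hz):
    """Frequency actually produced once FRQ has been rounded to an integer."""
    return frq * clk_hz / 2**32


# Hypothetical "CONFIGPIN NCO 1_000_000" at an assumed 80 MHz system clock
frq = nco_frq(1_000_000, 80_000_000)
print(frq, nco_actual_hz(frq, 80_000_000))
```

The novice only supplies the frequency; the macro (or tool) does the scaling and could also report the rounding error, which here is a few thousandths of a hertz.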

Marty

jmg · 2014-05-27 16:26

Lawson wrote: »

To simplify configuration, how about if SPIN/ASM had built-in macros to configure common use cases, e.g. NCO, PWM, logic modes, shift-register modes, etc.? The novice could then throw in a "CONFIGPIN NCO" macro followed by the frequency, while more sophisticated users could upload the raw config bits for maximum functionality.

Sounds good, but to make it flexible it is perhaps better to use a simple include syntax for those macros. That allows updates/fixes to CONFIG_NCO.MAC, and the provided macro files can include help descriptions.

brucee · 2014-05-27 16:43

This is a great idea, and it is called an FPGA.

Actually, my dream chip would be an ARM core or two and a small FPGA surrounding the ARM and I/O section. There are already some very high-end products aimed at this space: Zynq from Xilinx, and Atmel has some hard/soft-core ARMs inside an FPGA.

But there really is a hole at the low end: take a CPU core or more, and add about 2K programmable gates (this was state of the art around 1990), so it should not be that big in terms of die area. With that you could do the serializing/deserializing with various encode/decode features. Attach a digital PLL and it could handle most serial protocols out there.

There are actually some public-domain designs for such a thing that are part of various master's/PhD theses out there, even including programming tools. I know; I've done the research. Frankly, on the tool side I'd be happy programming the LUTs and connections by hand (kind of like assembly language for hardware; I have experience, as we used some early Xilinx parts before there were place-and-route tools).

I'm not sure Parallax could really tackle something like this, as I'd estimate it would add a couple years to the design schedule.

Phil Pilgrim (PhiPi) · 2014-05-27 16:50

brucee wrote:

... I'd estimate it would add a couple years to the design schedule.

That would be the opposite of what I hoped the idea would accomplish: to defer the hard decisions until after the chip is built.

-Phil

jazzed · 2014-05-27 16:50

Sounds like a great follow-on feature.

Why not just have such a thing as a blob or two like in the case of cordic?

I'm afraid that smart-pins on every pin that are too smart would be another black hole for cash and time.

Phil Pilgrim (PhiPi) · 2014-05-27 16:54

The nice thing about a sea of gates is that they can be shared amongst neighboring pins. You don't have to have a dedicated group for each pin. It would be up to the programmer to optimize the use of shared, finite resources.

-Phil

Roy Eltham · 2014-05-27 16:58

I could see a setup with a "sea of gates" per 4 or 8 pins. And you could configure the gates via the MSGOUT mechanism maybe?
This idea is really interesting and the potential for fun and awesomeness is crazy high!

jmg · 2014-05-27 16:58

Phil Pilgrim (PhiPi) wrote: »

That would be the opposite of what I hoped the idea would accomplish: to defer the hard decisions until after the chip is built.

Sea of gates is expensive to implement, and needs a lot of Chip testing time as well - so it is not zero-cost - and it will be slower.

jazzed · 2014-05-27 17:05

So to use such a thing would require Verilog?

I suppose one could find a way to make "VeriSpin" by combining things .... Then there is SystemC, LOL.

AHDL is probably too dated, but even I was able to design a PLD controller for a production board with that 20 years ago.

jmg · 2014-05-27 17:08

brucee wrote: »

But there really is a hole at the low end: take a CPU core or more, and add about 2K programmable gates (this was state of the art around 1990), so it should not be that big in terms of die area. ...

Atmel had the FpSLIC, which flopped, and there was Triscend, which also vanished....
Microsemi have some ARM M3 + FPGAs now, which I think come under $20.
Cypress has PSoC, but they have moved a little from configurable to more hardened peripherals, as the earlier chips cost more than an external CPLD and a generic uC.
Some of the Cypress PSoC fabric speeds were frankly underwhelming.

brucee wrote: »

I'm not sure Parallax could really tackle something like this, as I'd estimate it would add a couple years to the design schedule.

Yes, the design and testing costs of this are not for the faint-hearted.

It will also be SLOWER in P2 done as a sea of gates. I would rather have a proper 32b 200MHz counter, with config choices on clocks/capture, than a sea of gates that swallowed silicon and gave 75MHz speeds...

jmg · 2014-05-27 17:11

jazzed wrote: »

So to use such a thing would require Verilog?

If it did, Parallax would need to provide the back end fitter tools, which are not trivial.

Silego (that I linked to above) do not use Verilog as they are more focused on their Config choices.

mark · 2014-05-27 17:45

jmg wrote: »

It will also be SLOWER in P2 done as a sea of gates. I would rather have a proper 32b 200MHz counter, with config choices on clocks/capture, than a sea of gates that swallowed silicon and gave 75MHz speeds...

And that makes it not worth it, imo. With capable I/Os that can de/serialize with a few options, and ADC/DACs that route to 16 cogs, quite a lot of functionality can be realized with good performance, while still being more efficient overall and simpler to use than some kind of programmable gate array.

Cluso99 · 2014-05-27 19:01

Phil, the concept sounds fantastic. But I am unsure about the time to implement it. Definitely worth fleshing out though.

I am sure the blocks could be configured using the MSG pins/protocol that Chip has implemented.

I am sure we could work out the sw required so that a block of gates could be defined as a sw sequence. There would be so many playing with this that solutions would be found. I hand-coded some of the early Xilinx parts, even after they had autorouting, so that I could get the best out of the chip. I don't think Verilog would be needed.

jmg: It does not have to be slower because you just permit a bypass circuit to bypass the sea of gates.

General:

Some blocks would still need to be there, such as counter blocks.
Probably nice to have them in groups of 4 pins, so that single-, dual-, or quad-pin functions would fit nicely, with a signal or two going between adjacent quad-pin blocks so that larger blocks could be built.

As for silicon space, Chip has said that we could go to a larger die. IIRC the costs Chip gave quite some time ago showed that the silicon is a rather small part of the overall actual chip cost.

I would also be a little concerned about power usage.

jmg · 2014-05-27 19:12

Cluso99 wrote: »

jmg: It does not have to be slower because you just permit a bypass circuit to bypass the sea of gates.

? Then you need to clarify exactly what your 'sea of gates' does.
If you can bypass it to feed a full-speed 32b counter directly, then it is no longer a sea of gates, just a multiplexer on steroids, which has already been suggested above.

Flexible configuration is fine, but I would not call that 'a sea of gates'.

Things that would need to be in silicon, for speed and size reasons:
* Counter cells, 32b
* Capture cells, 32b
* PWM - Chip has already proposed a Dual-Ctr true PWM, can use the blocks above
* Edge Management - edge detect & logic to cover the P1 Counter modes
* Baud Prescaler (fractional Baud support would be good)
* DPLL variant of Baud prescaler, locks to data edges (used for USB & other protocols)
* Serial Shifters/Buffers - CLK, DO, DI, CLKEN control lines.
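To show why fractional baud support matters, the sketch below compares an integer-only divisor with a divider that spreads the remainder across bits. The fractional scheme here is a Bresenham-style carry, one common way to do it and not necessarily what a P2 prescaler would use. The 80 MHz clock and 3 Mbaud rate are illustrative numbers.

```python
def integer_divisor(clk_hz, baud):
    """Nearest whole-clock divisor and the resulting relative baud error."""
    div = round(clk_hz / baud)
    return div, (clk_hz / div - baud) / baud


def fractional_bit_edges(clk_hz, baud, nbits):
    """Clock count at the end of each bit when using a fractional divider.

    Bit period = clk_hz / baud = whole + rem / baud clocks. Each bit gets
    `whole` clocks, plus one more whenever the accumulated remainder
    reaches `baud`, so edge k lands at floor(k * clk_hz / baud).
    """
    whole, rem = divmod(clk_hz, baud)
    edges, t, acc = [], 0, 0
    for _ in range(nbits):
        acc += rem
        t += whole
        if acc >= baud:      # carry: stretch this bit by one clock
            acc -= baud
            t += 1
        edges.append(t)
    return edges


clk, baud = 80_000_000, 3_000_000
div, err = integer_divisor(clk, baud)
print(div, f"{err:.2%}")                     # integer divisor and its baud error
print(fractional_bit_edges(clk, baud, 10))   # one 10-bit UART frame
```

With the integer divisor of 27, the tenth bit edge lands at 270 clocks instead of the ideal 266.7; the fractional divider never strays by a whole clock from the ideal edge, whatever the frame length.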

The Logic fabric then just does interconnect of the above Silicon Blocks, more like a cross point + Config bitsets.
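A crosspoint of this kind boils down to one source-select field per block input. The Python sketch below assumes 8 hypothetical sources (so 3 config bits per input) and wires a simplified quadrature-style hookup onto a counter's CLK, DIRN and CLK_EN inputs. The signal names are made up for illustration.

```python
# Hypothetical source signals; a real design would fix this list in silicon.
SOURCES = ["zero", "one", "pin_a", "pin_b", "pin_a_prev", "ctr_tc", "baud_tick", "sysclk"]
BITS = (len(SOURCES) - 1).bit_length()  # config bits per sink (3 for 8 sources)


class CrossPoint:
    """Routes one selected source to each block input (sink)."""

    def __init__(self, sinks):
        self.sinks = list(sinks)
        self.select = {s: 0 for s in self.sinks}  # default: every sink tied to "zero"

    def connect(self, sink, source):
        self.select[sink] = SOURCES.index(source)

    def config_bits(self):
        """Pack the select fields, in sink order, into one config word."""
        word = 0
        for i, sink in enumerate(self.sinks):
            word |= self.select[sink] << (BITS * i)
        return word

    def route(self, signals):
        """Given current source levels, return the level seen by each sink."""
        values = {"zero": 0, "one": 1, **signals}
        return {sink: values[SOURCES[self.select[sink]]] for sink in self.sinks}


xp = CrossPoint(["CLK", "DIRN", "CLK_EN"])
xp.connect("CLK", "pin_a")    # encoder channel A clocks the counter
xp.connect("DIRN", "pin_b")   # channel B sets the count direction
xp.connect("CLK_EN", "one")   # always enabled
print(xp.config_bits(), xp.route({"pin_a": 1, "pin_b": 0}))
```

The config cost scales as (number of sinks) × log2(number of sources), which is why the crosspoint choices drive how many longs the configuration ends up needing.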

Phil Pilgrim (PhiPi) · 2014-05-27 19:36

I have nothing against macro blocks, so long as they're very basic, e.g. counters, shift registers, and the like. But the more specific they become, the less useful they will be in general. What I do not want to do with this idea is to delay the P2's roll-out. If it complicates rather than simplifies, it should be discarded!

-Phil

jazzed · 2014-05-27 19:39

Phil Pilgrim (PhiPi) wrote: »

What I do not want to do with this idea is to delay the P2's roll-out. If it complicates rather than simplifies, it should be discarded!

Maybe you were expecting shoe-maker elves to deliver a design? ;-)

jmg · 2014-05-27 19:54

Phil Pilgrim (PhiPi) wrote: »

I have nothing against macro blocks, so long as they're very basic, e.g. counters, shift registers, and the like. But the more specific they become, the less useful they will be in general.

It depends what 'very basic' means. The most basic counter is CLK, QN, but that is not quite useful enough.

To support Quadrature IP, for example, you need CLK, DIRN.
To support P1 modes, you need CLK, CLK_EN (+ other logic)
To support PWM, as Chip described, you need ReLoad and Saturate
To capture you need a capture Enable

- So the Minimum Useful Counter Block becomes CLK, DIRN, CLK_EN, PL, CAPT, SAT, TC, RST as booleans, and Qo, QCAPT, PIn as 32b data paths.

Configurable logic / bitsets can set up what drives/senses those boolean IP/OP, and the 32b paths would be memory mapped.
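A behavioural model makes the signal list above concrete. This Python sketch assumes one possible set of semantics: RST takes priority over PL, PL over counting; CAPT latches the count present at the edge; TC flags a wrap; SAT holds at the end stop instead of wrapping. The post names the signals but does not define their priorities, so treat all of this as an assumption.

```python
MASK = 0xFFFF_FFFF  # 32-bit data paths


class CounterBlock:
    """Hypothetical 'Minimum Useful Counter Block' model."""

    def __init__(self):
        self.qo = 0      # running count (Qo)
        self.qcapt = 0   # captured value (QCAPT)
        self.pin = 0     # preload value (PIn)
        self.tc = 0      # terminal-count output (TC)

    def clk(self, dirn=0, clk_en=1, pl=0, capt=0, sat=0, rst=0):
        """One active CLK edge. dirn=1 counts down. Priority: RST > PL > count."""
        if capt:
            self.qcapt = self.qo  # capture the value present at the edge
        self.tc = 0
        if rst:
            self.qo = 0
        elif pl:
            self.qo = self.pin & MASK
        elif clk_en:
            nxt = self.qo - 1 if dirn else self.qo + 1
            if nxt < 0 or nxt > MASK:         # this step would wrap
                self.tc = 1
                nxt = self.qo if sat else nxt & MASK  # hold, or wrap around
            self.qo = nxt
        return self.qo


c = CounterBlock()
for _ in range(3):
    c.clk()                       # count up to 3
c.clk(dirn=1, capt=1)             # capture 3, then count down to 2
c.pin = MASK - 1
c.clk(pl=1)                       # preload 0xFFFFFFFE
print(c.qcapt, hex(c.clk(sat=1)), hex(c.clk(sat=1)), c.tc)
```

With CLK driven by one encoder channel and DIRN by the other, the same block gives a simple (1x) quadrature counter; the 32b values would be the parts exposed through memory-mapped access.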

Phil Pilgrim (PhiPi) · 2014-05-27 19:57

jazzed wrote:

Maybe you were expecting shoe-maker elves to deliver a design?

Not expecting, but hoping that maybe unloading macro-function design decisions onto the programmers would hasten the silicon design phase. IOW, just throw a bunch of gates and basic function blocks onto the wafer with some cross-point switches, and call it good for now. As a programmer, I would rather be the arbiter of how things are connected than have it done for me a priori and have it be a compromise between what I need and what's been predetermined.

-Phil

Cluso99 · 2014-05-27 20:03

jmg wrote: »

It depends what 'very basic' means. The most basic counter is CLK, QN, but that is not quite useful enough.

To support Quadrature IP, for example, you need CLK, DIRN.
To support P1 modes, you need CLK, CLK_EN (+ other logic)
To support PWM, as Chip described, you need ReLoad and Saturate
To capture you need a capture Enable

- So the Minimum Useful Counter Block becomes CLK, DIRN, CLK_EN, PL, CAPT, SAT, TC, RST as booleans, and Qo, QCAPT, PIn as 32b data paths.

Configurable logic / bitsets can set up what drives/senses those boolean IP/OP, and the 32b paths would be memory mapped.

The problem is that there is no direct data path to cog ram. It all has to go either from the pin(s) or via MSG.

Phil Pilgrim (PhiPi) · 2014-05-27 20:11

Cluso99 wrote:

The problem is that there is no direct data path to cog ram. It all has to go either from the pin(s) or via MSG.

Several of the functional blocks could be registers mapped into cog memory space, similar to DIRx, INx, and OUTx.

-Phil

Cluso99 · 2014-05-27 20:12

Here is a good starting point - one of the P1 Counters (from AN001)

What extras do we need?

The video shifter as a serialiser, including using it as an input deserialiser
Potential to chain counters ???
Anything else ???

How do we break this down to basic building blocks and some programmable gates ???

jmg · 2014-05-27 20:14

Cluso99 wrote: »

The problem is that there is no direct data path to cog ram. It all has to go either from the pin(s) or via MSG.

Yes, the 32b values would be memory mapped over the MSG config link.
I would guess 8-12 longs will be needed for settings of both data and connections, maybe more than 12 if the crosspoint choices are made more comprehensive.

jmg · 2014-05-27 20:17

Cluso99 wrote: »

Anything else ???

Yes, quite a few things - see my list of a Minimum Useful Counter Block above.
The P1 counter is actually an adder, which expands the booleans to include a mode, or a 32b value to add/subtract (which can default to 1).

Then you might get
CLK, DIRN, CLK_EN, PL, CAPT, SAT, TC, RST, Mode as booleans, and Qo (RW), QCAPT (R), PIn (W), AddV (W) as 32b data paths.

Possibly PIn and AddV could merge, if you did not mind not having a reload choice on adder values <> 1.
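The adder view can be sketched in a few lines. Qo is updated by ±AddV each clock, so AddV = 1 gives the ordinary counter, while a large AddV turns the top bit into a square wave. This is a simplified Python model assuming P1-style modulo-2^32 wraparound, not a description of P2 hardware; the function names are hypothetical.

```python
from itertools import islice

MASK = 0xFFFF_FFFF  # 32-bit wraparound


def adder_counter(addv=1, mode_sub=False, start=0):
    """Yield Qo after each clock: Qo = Qo +/- AddV (mod 2**32)."""
    q = start
    while True:
        q = (q - addv if mode_sub else q + addv) & MASK
        yield q


def first(n, gen):
    """Take the first n values from a generator."""
    return list(islice(gen, n))


# AddV = 1 (the default): an ordinary up counter
print(first(3, adder_counter()))                                  # [1, 2, 3]
# Mode = subtract: counts down, wrapping below zero
print([hex(q) for q in first(2, adder_counter(mode_sub=True))])   # ['0xffffffff', '0xfffffffe']
# AddV = 2**30: bit 31 repeats every 4 clocks, i.e. a square wave at clk/4
print([q >> 31 for q in first(8, adder_counter(addv=1 << 30))])   # [0, 1, 1, 0, 0, 1, 1, 0]
```

One adder with a 32b AddV register thus covers both the plain counter (AddV = 1) and NCO-style uses.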

mark · 2014-05-27 20:20

I thought a good portion of the smart pins was already completed.
