"Smart Pins" as Programmable Gate Array?
Phil Pilgrim (PhiPi)
There are so many things that could put demands on the P2's smart pins, including a plethora of serializing and deserializing protocols, that I was wondering, "Why not just configure the smart pins as a sea of gates and flip-flops interconnected by programmable RAM cells, and let the programmer decide how to configure them?" This would be very Prop-like in that it provides just the most rudimentary building blocks, yet would provide the performance boost at the pin level that people are looking for. Moreover, it defers a lot of sticky decisions until after the chip is built -- hopefully hastening the advent of that day. Finally, it allows the configuration of functions that maybe nobody thought about in the chip design phase.
Anyway, it works for the PSoC; it might also work for the P2. Thoughts?
-Phil
Comments
Sounds like a perfect educational opportunity.
But maybe I like it because I don't understand exactly what would be required.
Where would the programming environment for the gate array come from?
Should we ask Chip to integrate a limited FPGA programming environment into Spin?
The lowest gate count on the market today seems to be an IGLOO nano AGLN010 with 10K gates and 260 VersaTiles.
Though I have a hard time seeing what selection of cores it can be loaded with.
http://www.microsemi.com/products/fpga-soc/fpga/igloo-nano#product-tables
Easy to say, and certainly flexible, but the silicon area needed is prohibitive.
We still do not have an area cost for even the simplest Counter/PWM/NCO cells, but 32b Counters/Capture/Compare are very costly in "a sea of gates and flip-flops".
What may be possible is Programmable Logic managed the way Silego does it - see
http://www.silego.com/products/greenpak3.html
They have the blocks hard-coded (Counters, Shift registers, etc.), and the Programmable part is in the connections between the elements, done with simple schematic entry. This slashes the 'config fuse' count.
It comes down to how many config bits make silicon sense, but even for configurable peripherals with, let's say, 4 longs to manage all the interconnect & options, it can be useful to have wizard-like software tools to generate the bits and give a resulting report of what the user actually has.
That allows both ways to program: someone can use assembler to build up the longs, if they really want to, or they may prefer to use the form/wizard to define what they want, and have the tools create a config binary bitset.
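To make the "build the longs by hand or let a wizard do it" idea concrete, here is a minimal Python sketch of a tool that packs named options into one config long and prints back a report of what was set. The field names, bit positions, and widths are purely hypothetical and are not any real P2 or GreenPAK register layout.

```python
# Hypothetical field layout: (name, bit offset, width in bits).
# This is an illustration only, not a real register map.
FIELDS = [
    ("mode",     0, 4),   # which hard block is selected (counter, PWM, shifter...)
    ("clk_src",  4, 3),   # which signal clocks the block
    ("capt_src", 7, 3),   # which signal triggers a capture
    ("invert",  10, 1),   # invert the pin output
]


def pack_config(**values):
    """Pack named field values into a single 32-bit config long."""
    layout = {name: (offset, width) for name, offset, width in FIELDS}
    word = 0
    for name, value in values.items():
        if name not in layout:
            raise KeyError(f"unknown field: {name}")
        offset, width = layout[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << offset
    return word


def report(word):
    """Decode a config long back into readable 'name = value' lines."""
    return [
        f"{name} = {(word >> offset) & ((1 << width) - 1)}"
        for name, offset, width in FIELDS
    ]


config = pack_config(mode=2, clk_src=5, invert=1)
print(hex(config))
print("\n".join(report(config)))
```

The same table of fields drives both directions, so the hand-coder and the wizard user would end up with identical bits.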
The real estate around the pin pads is probably "cheaper" than that in the cog area. Is there a way to work out roughly the available area (transistors/flops) that might be on offer?
Sounds a lot like what I suggested in the smart pins mega-thread. (page 2 post #34 and #39) A key thing to include is the state of the adjacent pins. This will allow 2+ pins to be ganged to support functions that don't fit in a single "smart pin".
To simplify configuration, how about if SPIN/ASM had built-in macros to configure common use cases? I.e. NCO, PWM, logic modes, shift register modes, etc. The novice could then throw in a "CONFIGPIN NCO" macro followed by the frequency, while more sophisticated users could upload the raw config bits for maximum functionality. The big cost I see is configuration overhead (and the area/power it consumes).
Marty
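As a sketch of what a "CONFIGPIN NCO" style macro might expand to, the Python below turns a requested frequency into a 32-bit phase-accumulator increment, which is the usual way an NCO is programmed (increment = f_out * 2^32 / f_clk). The function name, the mode code, and the 200 MHz clock in the example are assumptions for illustration, not anything Chip has defined.

```python
MODE_NCO = 0x1  # hypothetical mode code, for illustration only


def configpin_nco(f_out_hz, f_clk_hz):
    """Return the (mode, increment) pair a novice-level NCO macro could emit.

    The increment is added to a 32-bit accumulator on every clock; the
    accumulator's top bit then toggles at roughly f_out_hz.
    """
    if not 0 < f_out_hz <= f_clk_hz // 2:
        raise ValueError("output frequency must be between 0 and f_clk/2")
    # Rounded integer division, so no floating point error creeps in.
    increment = ((f_out_hz << 32) + f_clk_hz // 2) // f_clk_hz
    return MODE_NCO, increment


def actual_frequency(increment, f_clk_hz):
    """Frequency actually produced by a given increment."""
    return increment * f_clk_hz / 2**32


mode, frq = configpin_nco(1_000_000, 200_000_000)
print(mode, frq, actual_frequency(frq, 200_000_000))
```

A more sophisticated user could skip the macro and write the increment directly.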
Sounds good, but to make it flexible it is perhaps better to use a simple include syntax for those macros, to allow updates/fixes to CONFIG_NCO.MAC. The provided macro files can also include help descriptions.
Actually my dream chip would be an ARM core or 2 and a small FPGA surrounding the ARM and IO section. There are some very high end products aimed at this space already: Zynq from Xilinx, and Atmel has some hard/soft core ARMs inside an FPGA.
But there really is a hole in the low end: take a CPU core or more, and add about 2K programmable gates (this was state of the art around 1990), so it should not be that big in terms of die area. With that you could do the serial/deserial stuff with various encode/decode features. Attach a digital PLL and it could handle most serial protocols out there.
There are actually some public domain designs for such a thing, that are part of various master's/PhD theses out there, even including programming tools. I know, I've done the research. Frankly, on the tool side I'd be happy with programming the LUTs and connections by hand (kind of like assembly language for hardware, and I have experience as we used some early Xilinx parts before there were place and route tools).
I'm not sure Parallax could really tackle something like this, as I'd estimate it would add a couple years to the design schedule.
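For anyone who has not hand-programmed a LUT, the "assembly language for hardware" comparison above can be sketched in a few lines of Python: a 4-input LUT is just a 16-bit truth table indexed by its inputs. The helper names here are made up for illustration, and no particular FPGA's bit ordering is implied.

```python
def make_table(func):
    """Build the 16-bit truth table for a 4-input boolean function."""
    table = 0
    for index in range(16):
        a = index & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        if func(a, b, c, d):
            table |= 1 << index
    return table


def lut4(table, a, b, c, d):
    """Evaluate a 4-input LUT: the inputs select one bit of the table."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1


# Example: output = (a XOR b) AND c, with input d ignored.
table = make_table(lambda a, b, c, d: (a ^ b) & c)
print(hex(table))
print(lut4(table, 1, 0, 1, 0), lut4(table, 1, 1, 1, 0))
```

Programming by hand means working out constants like this for every LUT, plus the routing bits that connect them.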
-Phil
Why not just have such a thing as a blob or two like in the case of cordic?
I'm afraid that smart-pins on every pin that are too smart would be another black hole for cash and time.
-Phil
This idea is really interesting and the potential for fun and awesomeness is crazy high!
Sea of gates is expensive to implement, and needs a lot of Chip testing time as well - so it is not zero-cost - and it will be slower.
I suppose one could find a way to make "VeriSpin" by combining things .... Then there is System-C LOL
AHDL is probably too dated, but even I was able to design a PLD controller for a production board with that 20 years ago.
Atmel had the FpSLIC, which flopped, and there was Triscend, which also vanished....
Microsemi have some ARM M3 + FPGAs now, which I think come under $20.
Cypress has PSoC, but they have moved a little from the Configurable to more hardened peripherals, as the earlier chips cost more than an external CPLD and generic uC.
Some of the Cypress PSoC fabric speeds were frankly underwhelming.
Yes, the design and testing costs of this are not for the faint-hearted.
It will also be SLOWER in P2 done as sea of gates - I would rather have a proper 32b 200MHz counter, with config choices on Clocks/capture, than a sea of gates that swallowed silicon, and gave 75MHz speeds...
If it did, Parallax would need to provide the back end fitter tools, which are not trivial.
Silego (that I linked to above) do not use Verilog as they are more focused on their Config choices.
And that makes it not worth it, imo. With capable IOs which can de/serialize with a few options, and ADC/DACs which route to 16 cogs, quite a lot of functionality can be realized with good performance, while still being more efficient overall and simpler to use than some kind of programmable gate array.
I am sure the blocks could be configured using the MSG pins/protocol that Chip has implemented.
I am sure we could work out the sw required so that a block of gates could be defined as a sw sequence. There would be so many playing with this that solutions would be found. I hand coded some of the early Xilinx parts, even after they had autorouting, so that I could get the best out of the chip. I don't think Verilog would be needed.
jmg: It does not have to be slower because you just permit a bypass circuit to bypass the sea of gates.
General:
Some blocks would still need to be there, such as counter blocks.
Probably nice to have them in groups of 4 pins, so that individual, paired or quad pins would fit nicely, with a signal or two going between adjacent quad pin blocks, so that larger blocks could be built.
As for silicon space, Chip has said that we could go to a larger die. IIRC the costs Chip gave quite some time ago showed that the silicon is a rather small part of the overall actual chip cost.
I would also be a little concerned about power usage.
? Then you need to clarify exactly what your 'sea of gates' does.
If you can bypass it to feed a full speed 32b counter directly, then it is no longer a sea of gates, just a Multiplexer on steroids - which has already been suggested above.
Flexible configuration is fine, but I would not call that 'a sea of gates'.
Things that would need to be in Silicon, for speed and size reasons
* Counter cells, 32b
* Capture cells, 32b
* PWM - Chip has already proposed a Dual-Ctr true PWM, can use the blocks above
* Edge Management - edge detect & logic to cover the P1 Counter modes
* Baud Prescaler (fractional Baud support would be good)
* DPLL variant of Baud prescaler, locks to data edges (used for USB & other protocols)
* Serial Shifters/Buffers - CLK, DO, DI, CLKEN control lines.
The Logic fabric then just does interconnect of the above Silicon Blocks, more like a cross point + Config bitsets.
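A minimal sketch of the "cross point + config bitsets" idea, in Python: each input of a hard silicon block gets a small select field that picks one of a fixed list of sources. The source and input names and the 3-bit select width are invented for this example and do not reflect any actual proposal for the P2.

```python
# Hypothetical signal sources; 8 entries, so each select field is 3 bits.
SOURCES = ["zero", "one", "pin_in", "adjacent_pin_in",
           "sys_clk", "counter_tc", "baud_tick", "shifter_out"]

# Hypothetical inputs of the hard blocks, in config-bit order.
INPUTS = ["counter_clk", "counter_clk_en", "capture_trigger",
          "shifter_clk", "pin_out"]

SELECT_BITS = 3


def build_crosspoint(routes):
    """Pack {input: source} choices into one config word (default: 'zero')."""
    word = 0
    for position, name in enumerate(INPUTS):
        source = routes.get(name, "zero")
        word |= SOURCES.index(source) << (position * SELECT_BITS)
    return word


def resolve(word, source_values):
    """Given the current value of every source, return what each input sees."""
    mask = (1 << SELECT_BITS) - 1
    result = {}
    for position, name in enumerate(INPUTS):
        select = (word >> (position * SELECT_BITS)) & mask
        result[name] = source_values[SOURCES[select]]
    return result


# Example: a receive shifter clocked by the baud tick, counter clocked by the pin.
routes = {"counter_clk": "pin_in", "shifter_clk": "baud_tick",
          "pin_out": "shifter_out"}
xpoint = build_crosspoint(routes)
print(xpoint)
```

With 5 inputs at 3 bits each, the whole interconnect fits in 15 bits, which is the kind of config-bit budget a crosspoint keeps small compared with a general gate fabric.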
-Phil
Maybe you were expecting shoe-maker elves to deliver a design? ;-)
Depends what 'very basic' means? The most basic counter is CLK, QN, but that is not quite useful enough.
To support Quadrature IP, for example, you need CLK, DIRN.
To Support P1 modes, you need CLK, CLK_EN (+ other logic)
To support PWM, as Chip described, you need ReLoad and Saturate
To capture you need a capture Enable
- So the Minimum Useful Counter Block becomes CLK, DIRN, CLK_EN, PL, CAPT, SAT, TC, RST as booleans,
& Qo,QCAPT, PIn as 32b data paths.
Configurable logic / bitsets can set up what drives/senses those boolean IP/OP, and the 32b paths would be memory mapped.
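The Minimum Useful Counter Block listed above can be modelled behaviourally in a few lines of Python. The signal names come from the list; the exact semantics (priority of RST over PL over counting, DIRN high meaning count up, SAT clamping instead of wrapping, TC flagging the terminal value) are assumptions made for this sketch, not a specification.

```python
MASK32 = 0xFFFFFFFF


class CounterBlock:
    """Behavioural model of a 32b counter with the boolean controls above."""

    def __init__(self):
        self.qo = 0        # Qo: current count (32b)
        self.qcapt = 0     # QCAPT: last captured count (32b)
        self.pin = 0       # PIn: value loaded when PL is asserted (32b)
        self.tc = False    # TC: terminal count flag

    def clock(self, dirn=True, clk_en=True, pl=False,
              capt=False, sat=False, rst=False):
        """Apply one active CLK edge with the given control inputs."""
        if capt:
            self.qcapt = self.qo          # capture the pre-edge count
        if rst:
            self.qo = 0                   # assumed: reset has top priority
        elif pl:
            self.qo = self.pin & MASK32   # parallel load / reload
        elif clk_en:
            nxt = self.qo + (1 if dirn else -1)
            if sat:
                nxt = min(max(nxt, 0), MASK32)   # saturate at the limits
            self.qo = nxt & MASK32               # otherwise wrap at 32 bits
        # Assumed: TC is high when the count sits at the limit for this direction.
        self.tc = self.qo == (MASK32 if dirn else 0)
        return self.qo


counter = CounterBlock()
counter.pin = MASK32 - 1
counter.clock(pl=True)       # load 0xFFFFFFFE
counter.clock()              # count up to 0xFFFFFFFF, TC goes high
counter.clock(sat=True)      # saturating: stays at 0xFFFFFFFF
print(hex(counter.qo), counter.tc)
```

In a real block the boolean arguments would be driven by the configurable logic, and `qo`, `qcapt`, and `pin` would be the memory-mapped 32b paths.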
-Phil
What extras do we need?
- The Video shifter as a serialiser, and including using it as an input de-serialiser
- Potential to chain counters ???
- Anything else ???
How do we break this down to basic building blocks and some programmable gates ???
Yes, the 32b values would be memory mapped over the MSG config link.
I would guess 8-12 longs will be needed for settings of both Data and Connections, maybe more than 12 if the crosspoint choices are made more comprehensive.
Yes, quite a few things - see my list of a Minimum Useful Counter Block above.
The P1 Counter is actually an adder, which expands the Booleans, to include mode, or a 32b value to Add/Subtract.
(can default to 1).
Then you might get
CLK, DIRN, CLK_EN, PL, CAPT, SAT, TC, RST, Mode as booleans,
& Qo(RW),QCAPT(R), PIn(W),AddV(W) as 32b data paths.
Possibly PIn and AddV could merge, if you did not mind not having a ReLoad choice on Adder values <> 1.