P2 Smart Pins

ErNa · 2016-02-07 14:16

cgracey wrote: »

I wonder if 64 smart pins are going to make 16 cogs look like total overkill.

Isn't that intended? Overkill: have a little of this, but a lot of that. What will happen when we crosslink the pins, creating analog signals that are converted to digital again and XORed with some time delay? One quadrature signal counting up, the other down. A lot to do for some generations! Keep on going strong, Chip!

Heater. · 2016-02-07 14:16

In that case, perhaps what I mean to say is that maybe I agree with myself.

Rayman · 2016-02-07 16:09

Well, I always seem to be out of cogs with my P1 projects...
Rather have dumb pins and 16 cogs...
Although, there are a few of these smart pin modes that will be very useful to me.

On the other hand, if something has to give, I suppose 8 cogs would be enough.

potatohead · 2016-02-07 17:11

Seems to me, it's all about the fab. Didn't we have a software logic limit in play? If this has exceeded it, then what is worth what?

Say exceeding it takes money and more time. Is that worth 16 COGS, or not?

Personally, I would love to see it produced as currently designed. I do agree with Heater, 16 cores, plus pin minions? Holy cow! Will sure pack a punch on the spec sheet.

But, I want to see it produced more.

My .02

potatohead · 2016-02-07 17:15

Also, unlike the P1, the software chain and data is intact.

Technically, variants make more sense.

8 cogs, bigger RAM, 16 COGS, etc...

If we commit and just make one, the others could be funded by that one, and other means.

No matter what, we should focus hard on making one.

jmg · 2016-02-07 20:45

potatohead wrote: »

Seems to me, it's all about the fab. Didn't we have a software logic limit in play?

Good point - did they already pay to increase that ?
The other limit is package die area, and that one is rather a harder ceiling, which could force trade-offs.
Is BGA viable ?

jmg · 2016-02-07 21:08

cgracey wrote: »

jmg wrote: »

Is it possible to 'cross-configure' two pin cells, both as Quadrature ? (uses 2 pins and 2 pin cells, one fast update, one slower delta)

Yes, you can do that.

Cool! I think that's a great demo of the pins - extract both position and velocity, all done in HW, and with no extra pins consumed.
Repeat as many times as you want! 31 anyone ?

Nothing else out there comes close to that. (bar FPGA)
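To make the position-plus-velocity idea concrete, here is a rough software model in Python. It is only a sketch of the arithmetic, not smart pin configuration code: the names `decode`, `_STEP` and `window` are made up for illustration, and the "slower delta" cell is modelled as a count difference taken every `window` samples.

```python
# Hypothetical software model of two quadrature cells on the same A/B pair:
# one accumulates position, the other reports the count change per window.

# (previous AB state, new AB state) -> step, using Gray-code order 00,01,11,10
_STEP = {
    (0b00, 0b01): +1, (0b01, 0b11): +1, (0b11, 0b10): +1, (0b10, 0b00): +1,
    (0b00, 0b10): -1, (0b10, 0b11): -1, (0b11, 0b01): -1, (0b01, 0b00): -1,
}

def decode(samples, window):
    """samples: iterable of 2-bit AB states, one per clock.
    Returns (position, deltas); deltas holds the count change per window."""
    position = 0
    window_start = 0
    deltas = []
    prev = None
    for i, state in enumerate(samples):
        if prev is not None:
            # Unchanged or invalid (double-step) transitions count as 0.
            position += _STEP.get((prev, state), 0)
        prev = state
        if (i + 1) % window == 0:
            deltas.append(position - window_start)  # "velocity" reading
            window_start = position
    return position, deltas

forward = [0b00, 0b01, 0b11, 0b10] * 2
print(decode(forward, window=4))
```

Reading `position` gives the fast-updating count, while each entry in `deltas` is the slower per-period velocity figure.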

Is that reach-nearby feature totally pin relative, or is it a binary-lsb effect, within a pin group ?
Seems Binary-LSB would need less logic & may be tolerable ?

cgracey wrote: »

Think of this, though: If you set up three encoders and six triangle PWMs, all with the same period, and start them on the same clock, you can use any one of them to trigger a single interrupt in which all three encoders can be read for speed and accumulated for position, and all six PWMs can be updated with next samples. Actually, it would be better to compute the PID loops and just wait for any of those pins' IN to go high, and then read and update everything. All with one cog in a simple loop.

Good example.

I just came across another practical variant, where Reciprocal Counting becomes important.

Normally, I am used to Quad encoders pushing MHz as the PPR goes up and spindle speeds go up, however another case occurs in an idler-wheel, such as in paper feeding.
Now, the pulse rate is far lower, and in the example I have, it is down around the Motor Step rate (which is also modest, both sub 100Hz).
For simplest control, you update at Motor-step rates, but if you want paper-speed, you have a variable number of encoder pulses, and you want to measure not just one, but a (variable) few, to average better.
Cleanest way to manage that is to capture both Cycles and Time, and that nicely auto-scales with every measurement. If 14 cycles fit, use those; if 7 fit, use those.
Precision is very high, even though the cycle count is low and variable. (limited by encoder phase jitter)

At high MHz, I think being able to arm the capture matters, at low speeds a live-read can give the most-recent velocity, with best precision, and no waits.

A beauty of Reciprocal (dual) Capture is, you can read at almost any time/rate you like, provided you know Cycles and Time are same-edge based.
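A small Python sketch of the Reciprocal Counting arithmetic described above may help. It assumes each read returns a (cycles, ticks) pair captured on the same input edge; the function name, the tuple layout and the 80 MHz clock are assumptions for illustration, not anything from the P2 design.

```python
from fractions import Fraction

def reciprocal_frequency(prev, curr, clock_hz):
    """prev and curr are (cycles, ticks) snapshots, both latched on the
    same input edge. Returns the average frequency in Hz between the two
    snapshots as an exact Fraction, or None if no full cycle has elapsed."""
    d_cycles = curr[0] - prev[0]
    d_ticks = curr[1] - prev[1]
    if d_cycles <= 0 or d_ticks <= 0:
        return None  # nothing new to measure yet
    # frequency = whole cycles / elapsed time, elapsed time = ticks / clock
    return Fraction(d_cycles * clock_hz, d_ticks)

CLOCK_HZ = 80_000_000  # assumed system clock

first = (0, 0)
second = (14, 16_000_000)   # 14 cycles fit in this read interval
third = (21, 24_000_000)    # only 7 cycles fit in the next one

print(float(reciprocal_frequency(first, second, CLOCK_HZ)))
print(float(reciprocal_frequency(second, third, CLOCK_HZ)))
```

Both reads give the same frequency even though the cycle counts differ, because the time base scales with the number of whole cycles captured.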

jmg · 2016-02-07 21:12

cgracey wrote: »

I wonder if 64 smart pins are going to make 16 cogs look like total overkill.

Maybe.
I think the COG count is rather locked to a binary number, by the eggbeater mux, is that right ?
12 might make sense, but I think the (next choice of?) 8 may look like no change from P1.

Perhaps what this opens is a 4 or 8 COG, 24 smart pins in a TQFP32/QFN32 package, as a future P2-100 sibling ?

potatohead · 2016-02-07 21:19

I'm hoping we don't have to make that tradeoff. (Crosses fingers)

Maxing out COGS and Smart pins would be crazy! And somebody will do it too.

jmg · 2016-02-07 21:36

potatohead wrote: »

I'm hoping we don't have to make that tradeoff. (Crosses fingers)

* Crosses toes too*

potatohead wrote: »

Maxing out COGS and Smart pins would be crazy! And somebody will do it too.

Hehe, yes.
Maxing out pins will be easy, some relatively low end FPGA T&M type designs can port to P2, and do that.
Maxing out COGS will take more effort.

Cluso99 · 2016-02-07 23:53

Chip,

Please keep 16 cogs and 512KB hub ram.

If it comes down to die space (seems like it has) then I would rather only 32 pins have smart pins.

evanh · 2016-02-08 00:35

I don't think Chip was trying to imply it's either or just yet, just that 16 Cogs won't end up being used very often because so much of the grunt bit bashing on the Prop1 can now be handled by the hardware on the Prop2. If space does permit I'd like to have that flexibility personally.

As for the gate count limit, 300k I think, of the synthesis software, Chip is well beyond it already, probably doubled at this stage. Dumping 8 Cogs may not even be enough to bring the design back within that limit. He was resigned to paying the extra for next grade up long ago.

evanh · 2016-02-08 00:40

For educational purposes, having spare Cogs will be a boon for learning bit bashing techniques without the burden of knowing the project will have to be redone to make it fit. And get to solve the problem at the lowest level without loading up someone else’s object nor having to get to grips with the additional complexities of Smartpins and Streamers.

cgracey · 2016-02-08 00:51

Judging from ALM usage, cutting the number of cogs from 16 to 8 would only reduce the logic by 25%. Better to stay with 16, in that case. It's just tight in 180nm.

Dave Hein · 2016-02-08 01:40

One advantage to reducing the number of cogs to 8 is that it would cut the number of hub stall cycles in half for random memory accesses. With the eggbeater design it will be very difficult to write tight loops that minimize hub stalls. In the P1 the hub slots are predictable, but with the P2 it will be much harder to determine when a hub slot occurs for random hub accesses.
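A back-of-envelope Python model of that stall argument, under a deliberately simplified assumption: hub RAM is split into one slice per cog, the slices rotate past each cog one per clock, and a random access is equally likely to target any slice. The function name is made up and the real eggbeater timing may differ.

```python
def mean_wait_clocks(n_cogs):
    """Average clocks a cog waits for a uniformly random hub slice,
    assuming one slice rotates past per clock, so the wait for a given
    target slice is somewhere between 0 and n_cogs - 1 clocks."""
    waits = [target % n_cogs for target in range(n_cogs)]
    return sum(waits) / n_cogs

for cogs in (16, 8):
    print(cogs, "cogs:", mean_wait_clocks(cogs), "clocks average,",
          cogs - 1, "clocks worst case")
```

Under this model the average wait drops from 7.5 to 3.5 clocks when going from 16 to 8 cogs, which is roughly the halving described.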

With that said, I would still like to see a 16-cog P2 with 64 smart pins and 512K of hub RAM. However, an 8-cog P2 with 64 smart pins and 1M of hub RAM would be interesting if that were possible.

Bob Lawrence (VE1RLL) · 2016-02-08 01:41

re: It's just tight in 180nm.

I'm all for 16 cogs. However if we can afford more HUB memory then great and it would be a beast LOL

With 14 nanometer (14 nm) advanced manufacturing processes available, is it cost that keeps us at 180nm?

You would think that at least 130 nanometer (130 nm) or (smaller) would be affordable by now.

Bob Lawrence (VE1RLL) · 2016-02-08 02:00

10 µm – 1971
6 µm – 1974
3 µm – 1977
1.5 µm – 1982
1 µm – 1985
800 nm – 1989
600 nm – 1994
350 nm – 1995
250 nm – 1997
180 nm – 1999
130 nm – 2001
90 nm – 2004
65 nm – 2006
45 nm – 2008
32 nm – 2010
22 nm – 2012
14 nm – 2014
10 nm – 2016–2017
7 nm – 2018–2019
5 nm – 2020–2021

https://en.wikipedia.org/wiki/Semiconductor_device_fabrication

msrobots · 2016-02-08 02:12

Bob Lawrence (VE1RLL) wrote: »

re: It's just tight in 180nm.

I'm all for 16 cogs. However if we can afford more HUB memory then great and it would be a beast LOL

With 14 nanometer (14 nm) advanced manufacturing processes available, is it cost that keeps us at 180nm?

You would think that at least 130 nanometer (130 nm) or (smaller) would be affordable by now.

Question is what you think affordable is. And what is affordable for Parallax. I think changing the process now would be - hmm - a stupid move.

Get the thing out and refine software and Tools support. Shrinking down to another process may be nice for another iteration, but not now.

just my two,

Mike

cgracey · 2016-02-08 02:24

Bob Lawrence (VE1RLL) wrote: »

10 µm – 1971
6 µm – 1974
3 µm – 1977
1.5 µm – 1982
1 µm – 1985
800 nm – 1989
600 nm – 1994
350 nm – 1995
250 nm – 1997
180 nm – 1999
130 nm – 2001
90 nm – 2004
65 nm – 2006
45 nm – 2008
32 nm – 2010
22 nm – 2012
14 nm – 2014
10 nm – 2016–2017
7 nm – 2018–2019
5 nm – 2020–2021

https://en.wikipedia.org/wiki/Semiconductor_device_fabrication

We need to see a graph of cost vs process technology. 350nm masks are currently $35k, while 45nm masks are about $1M, or ~30x cost for ~60x the density. I wonder what 14nm masks would cost.

evanh · 2016-02-08 02:26

As a point of interest, estimating the entire Prop2 gate count at 500k gates, including Smartpins, it will consume about 6.4 mm² which is a mere 25% of the 25.1 mm² real-estate that's needed for 512kB (4Mbit) HubRAM. See http://forums.parallax.com/discussion/comment/1340864/#Comment_1340864

What this means is that if a larger HubRAM is expected from a finer process, then the die size won't shrink. Price of finished product goes up because of increased development costs.
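As a rough check of those area figures, the Python sketch below combines them with the 25% logic reduction Chip estimated earlier in the thread for going from 16 to 8 cogs. Applying an FPGA ALM-based percentage to silicon area is an assumption, and the total ignores the pad frame, analog and anything else on the die; the names are made up for illustration.

```python
LOGIC_MM2 = 6.4    # estimated area of ~500k gates (from the post above)
RAM_MM2 = 25.1     # estimated area of 512kB HubRAM (from the post above)

def area_saved_fraction(logic_cut):
    """Fraction of (logic + RAM) area saved if logic shrinks by logic_cut.
    Assumes logic area scales linearly with the logic reduction."""
    total = LOGIC_MM2 + RAM_MM2
    return (LOGIC_MM2 * logic_cut) / total

print(f"logic vs RAM area: {LOGIC_MM2 / RAM_MM2:.1%}")
print(f"saved by a 25% logic cut: {area_saved_fraction(0.25):.1%} of logic + RAM")
```

On these numbers, dropping to 8 cogs would free only about 5% of the combined logic and RAM area, since the RAM dominates.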

cgracey · 2016-02-08 02:41

Here's some pricing info from ~2.5 years ago:

Scott gave me some ballpark numbers on the non-recurring engineering (NRE) costs associated with a new device. At 28 nm, the design cost is $50 million to $90 million. Each mask set is $2 million to $3 million. If you're lucky, you'll get away with going through the mask creation process twice to get the design done. Not lucky? Then more iteration.

Going down to 22/20 nm, the design cost is $120 million to $500 million, and a mask set costs $5 million to $8 million per iteration. Finer lines, higher cost.

Scott rightly notes that not many products can support that much NRE. Only ICs that end up in products that everyone buys can support this level of cost -- for example, products like cellphones. Scott said, "How many $1 chips do you need to sell to recoup $100 million? And then at these really high volumes, the margins are low. Very hard to make money."

At 180 nm, mask costs are around $150,000, and design cost is around $1 million to $2 million. So more companies pursue 180 nm -- or maybe down to 130 nm. We may see some of the companies that are mostly doing analog push down to 90 nm, but probably not much farther.

http://www.planetanalog.com/author.asp?section_id=385&doc_id=561004

Beau Schwabe · 2016-02-08 03:46

"We need to see a graph of cost vs process technology. 350nm masks are currently $35k, while 45nm masks are about $1M, or ~30x cost for ~60x the density. I wonder what 14nm masks would cost." - Chip G

Right ... 45nm is 60 times the density ...

Density = (350x350) / (45x45) = 60.49

So if the cost for 350nm is $35k, then at 60.49 times the density using 45nm that's a steal !!!

$2,117,283.95 for the equivalent number of ICs at 350nm vs $1,000,000.00 for the same number of ICs at 45nm.

While it is true that some devices don't scale such as Capacitors, Inductors, etc. in most cases it's only a matter of circuit design to utilize a smaller component.

Even at 180nm compared to 45nm ... 45nm is a steal...

Density = (180x180) / (45x45) = 16

16 x $150,000 = $2,400,000.00 .... Again, 45nm with the equivalent number of ICs would be $1,000,000.00

No matter how you slice it, design cost for ANY process is going to be about the same $1 million to $2 million.
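The density arithmetic above can be reproduced with a few lines of Python. This is only the ideal area-scaling calculation from the post, using the mask prices quoted in the thread; the function names are made up, and it ignores components that don't scale as well as design and iteration costs.

```python
def density_ratio(old_nm, new_nm):
    """Ideal density gain from shrinking the feature size (area scaling)."""
    return (old_nm / new_nm) ** 2

def equivalent_mask_cost(old_mask_cost, old_nm, new_nm):
    """Mask spend at the old node scaled up to match the number of ICs
    one mask set yields at the new node, assuming ideal scaling."""
    return old_mask_cost * density_ratio(old_nm, new_nm)

MASK_45NM = 1_000_000  # quoted in the thread

for node_nm, mask_cost in ((350, 35_000), (180, 150_000)):
    ratio = density_ratio(node_nm, 45)
    equiv = equivalent_mask_cost(mask_cost, node_nm, 45)
    print(f"{node_nm}nm -> 45nm: {ratio:.2f}x density, "
          f"${equiv:,.2f} equivalent vs ${MASK_45NM:,.2f} at 45nm")
```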

jmg · 2016-02-08 04:18

cgracey wrote: »

Here's some pricing info from ~2.5 years ago:

At 180 nm, mask costs are around $150,000, and design cost is around $1 million to $2 million. So more companies pursue 180 nm -- or maybe down to 130 nm.

In that ~2.5 year time-frame, has there been any 'next process step' type price movement,
ie can you now get 130nm masks for ~ $150,000 ?

Have OnSemi added ( or plan to add) anything finer than 180nm to their FABs ?

cgracey · 2016-02-08 04:35

Beau Schwabe wrote: »

No matter how you slice it, design cost for ANY process is going to be about the same $1 million to $2 million.

If the volume is there, right? Starting out without that much money, you would have to use a cheaper process. That's where we have to start. It's sure interesting to contemplate those numbers, though.

jmg · 2016-02-08 04:42

Beau Schwabe wrote: »

No matter how you slice it, design cost for ANY process is going to be about the same $1 million to $2 million.

Depends on what you consider 'design cost' ?
Iteration cost is all part of the cost of getting ship-able silicon, so that cannot be ignored.
It is that cost that varies wildly with process.

Cluso99 · 2016-02-08 04:50

I wonder about the costs of the next geometry sizing (130nm or 90nm).

But it also depends on Treehouse and OnSemi, or the alternatives.

Beau Schwabe · 2016-02-08 05:02

"Depends on what you consider 'design cost' ?
Iteration cost is all part of the cost of getting ship-able silicon, so that cannot be ignored.
It is that cost that varies wildly with process." - jmg

Agreed ! ... it also doesn't help changing a process in the middle of a design or designing in one process and going to production with another.

It must also be realized that there is a difference between technology and process. The terminology has been confused a little bit here I think. The technology for example is 180nm where the process is what recipe is used to reach a final design. With different processes, you run into a scenario where all apples are fruit, but not all fruit is an apple. Same goes for 180nm and whether you use TSMC, On-Semi, or Jack-b-Quick.

Roy Eltham · 2016-02-08 06:50

10nm is delayed (Intel announced it recently) until 2017 at least. That pushes out the other ones after it too.

The other thing that comes with smaller processes is lower yields, much more power leakage, and often more complex power feeds.

I think it would be a huge mistake to drop cogs down to 8. I'd be more in favor of fewer pins if it's required.

Beau Schwabe · 2016-02-08 07:45

"The other thing that comes with smaller processes is lower yields, much more power leakage, and often more complex power feeds." - Roy

FinFETs are built vertically to minimize leakage at the smaller technologies by minimizing contact with the substrate.

http://www.realworldtech.com/intel-22nm-finfet/

ctwardell · 2016-02-08 08:20

This particular effort seems to be bound more by "The Yogi Berra Theorem of Repetition" than by Moore's Law...

I say this respectfully, yet from a point of growing frustration.

C.W.
