I wonder if 64 smart pins are going to make 16 cogs look like total overkill.
Isn't that intended? Overkill, not just a little of it, but a lot. What will happen when we cross-link the pins, creating analog signals that are converted back to digital and XORed with some time delay? One quadrature signal counting up, the other down. A lot to do for some generations! Keep on going strong, Chip!
Well, I always seem to be out of cogs with my P1 projects...
Rather have dumb pins and 16 cogs...
Although, there are a few of these smart pin modes that will be very useful to me.
On the other hand, if something has to give, I suppose 8 cogs would be enough.
Seems to me, it's all about the fab. Didn't we have a software logic limit in play? If this has exceeded it, then what is worth what?
Say exceeding it takes money and more time. Is that worth 16 COGS, or not?
Personally, I would love to see it produced as currently designed. I do agree with Heater, 16 cores, plus pin minions? Holy cow! Will sure pack a punch on the spec sheet.
Seems to me, it's all about the fab. Didn't we have a software logic limit in play?
Good point - did they already pay to increase that ?
The other limit is package die area, and that one is rather a harder ceiling, which could force trade-offs.
Is BGA viable ?
Is it possible to 'cross-configure' two pin cells, both as Quadrature ? (uses 2 pins and 2 pin cells, one fast update, one slower delta)
Yes, you can do that.
Cool! I think that's a great demo of the pins - extract both position and velocity, all done in HW, and with no extra pins consumed. Repeat as many times as you want! 31 anyone ?
Nothing else out there comes close to that. (bar FPGA)
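Just to make the idea concrete, here's a rough C sketch of what reading such a pair could look like from software. The accessor and cell roles are hypothetical stand-ins (not actual P2 mode or register names); the point is that position and velocity both arrive already computed by the pin cells.

/* Minimal sketch (not real P2 names): two pin cells are assumed cross-wired to
 * the same A/B quadrature pair -- one free-running accumulator for position,
 * one latching the count delta per fixed window for velocity. */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for reading a smart pin cell's result register. */
static uint32_t read_pin_cell(unsigned cell)
{
    static const uint32_t fake[2] = { 12345u, 7u };   /* simulated readings */
    return fake[cell & 1];
}

int main(void)
{
    int32_t position = (int32_t)read_pin_cell(0);   /* total counts since start  */
    int32_t velocity = (int32_t)read_pin_cell(1);   /* counts in the last window */

    printf("position = %ld counts, velocity = %ld counts/window\n",
           (long)position, (long)velocity);
    return 0;
}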
Is that reach-nearby feature totally pin relative, or is it a binary-lsb effect, within a pin group ?
Seems Binary-LSB would need less logic & may be tolerable ?
Think of this, though: If you set up three encoders and six triangle PWMs, all with the same period, and start them on the same clock, you can use any one of them to trigger a single interrupt in which all three encoders can be read for speed and accumulated for position, and all six PWMs can be updated with the next samples. Actually, it would be better to compute the PID loops and just wait for any of those pins' IN to go high, and then read and update everything. All with one cog in a simple loop.
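A minimal sketch of that loop structure, just to show how little the cog itself has to do. The helpers here (wait_period, read_encoder_delta, write_pwm) and the proportional-only controller are hypothetical stand-ins, not actual P2 smart pin calls.

#include <stdint.h>

#define N_AXES 3

/* --- hypothetical hardware stand-ins, replace with real pin access ------- */
static void    wait_period(void)                 { /* wait for a pin's IN to go high */ }
static int32_t read_encoder_delta(unsigned axis) { (void)axis; return 0; }
static void    write_pwm(unsigned ch, int32_t d) { (void)ch; (void)d; }

/* trivial proportional-only controller, standing in for a full PID */
static int32_t pid_step(int32_t target, int32_t position)
{
    const int32_t kp = 4;
    return kp * (target - position);
}

static void servo_loop(const int32_t target[N_AXES], unsigned periods)
{
    int32_t position[N_AXES] = { 0 };

    while (periods--) {
        wait_period();                                /* one wait syncs all channels  */
        for (unsigned a = 0; a < N_AXES; a++) {
            int32_t vel = read_encoder_delta(a);      /* speed = counts this period   */
            position[a] += vel;                       /* accumulate for position      */
            int32_t out = pid_step(target[a], position[a]);
            write_pwm(2 * a,     +out);               /* two PWM channels per axis    */
            write_pwm(2 * a + 1, -out);
        }
    }
}

int main(void)
{
    const int32_t target[N_AXES] = { 1000, -500, 250 };   /* made-up setpoints   */
    servo_loop(target, 100);                              /* run 100 PWM periods */
    return 0;
}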
Good example.
I just came across another practical variant, where Reciprocal Counting becomes important.
Normally I am used to Quad encoders pushing MHz as the PPR goes up and spindle speeds go up; another case, however, occurs with an idler wheel, such as in paper feeding.
Here the pulse rate is far lower, and in the example I have it is down around the motor step rate (which is also modest, both sub 100 Hz).
For the simplest control you update at motor-step rates, but if you want paper speed you have a variable number of encoder pulses, and you want to measure not just one but a (variable) few, to average better.
The cleanest way to manage that is to capture both Cycles and Time, and that auto-scales nicely with every measurement: if 14 cycles fit, use those; if 7 fit, use that.
Precision is very high even though the cycle count is low and variable (limited by encoder phase jitter).
At high MHz I think being able to arm the capture matters; at low speeds a live read can give the most recent velocity, with best precision and no waits.
A beauty of Reciprocal (dual) Capture is that you can read at almost any time/rate you like, provided you know Cycles and Time are same-edge based.
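For anyone wanting to see the arithmetic, here's a small C sketch of the reciprocal-capture math, under the assumption that the hardware latches an edge count and a timestamp on the same encoder edge. The capture values and clock rate are made up for illustration.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t cycles;   /* edge count, latched on the last edge  */
    uint32_t ticks;    /* system clock at that same edge        */
} capture_t;

/* pulse frequency in milli-hertz between two reads (same-edge based) */
static int64_t pulse_rate_mhz(capture_t prev, capture_t now, uint32_t sysclk_hz)
{
    uint32_t dc = now.cycles - prev.cycles;   /* however many edges fit: 14, 7, ... */
    uint32_t dt = now.ticks  - prev.ticks;    /* elapsed sysclk ticks between them  */

    if (dc == 0 || dt == 0)
        return 0;                             /* no new edge yet: no fresh velocity */

    return ((int64_t)dc * sysclk_hz * 1000) / dt;
}

int main(void)
{
    /* hypothetical captures: 7 edges spanning 5,600,000 ticks at an 80 MHz clock */
    capture_t a = { .cycles = 100, .ticks = 1000000 };
    capture_t b = { .cycles = 107, .ticks = 6600000 };

    printf("pulse rate = %lld mHz\n", (long long)pulse_rate_mhz(a, b, 80000000u));
    return 0;
}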
I wonder if 64 smart pins are going to make 16 cogs look like total overkill.
Maybe.
I think the COG count is rather locked to a power of two by the eggbeater mux, is that right?
12 might make sense, but I think the (next choice of?) 8 may look like no change from P1.
Perhaps what this opens is a 4- or 8-COG, 24-smart-pin part in a TQFP32/QFN32 package, as a future P2-100 sibling?
Maxing out COGS and Smart pins would be crazy! And somebody will do it too.
Hehe, yes.
Maxing out pins will be easy, some relatively low end FPGA T&M type designs can port to P2, and do that.
Maxing out COGS will take more effort.
I don't think Chip was trying to imply it's either/or just yet, just that 16 cogs won't end up being used very often because so much of the grunt bit-bashing on the Prop1 can now be handled by the hardware on the Prop2. If space does permit, I'd personally like to have that flexibility.
As for the synthesis software's gate-count limit (300k, I think), Chip is well beyond it already, probably double at this stage. Dumping 8 cogs may not even be enough to bring the design back within that limit. He was resigned to paying the extra for the next grade up long ago.
For educational purposes, having spare cogs will be a boon for learning bit-bashing techniques without the burden of knowing the project will have to be redone to make it fit, and getting to solve the problem at the lowest level without loading up someone else's object or having to get to grips with the additional complexities of Smartpins and Streamers.
Judging from ALM usage, cutting the number of cogs from 16 to 8 would only reduce the logic by 25%. Better to stay with 16, in that case. It's just tight in 180nm.
One advantage to reducing the number of cogs to 8 is that it would cut the number of hub stall cycles in half for random memory accesses. With the eggbeater design it will be very difficult to write tight loops that minimize hub stalls. In the P1 the hub slots are predictable, but with the P2 it will be much harder to determine when a hub slot occurs for random hub accesses.
With that said, I would still like to see a 16-cog P2 with 64 smart pins and 512K of hub RAM. However, an 8-cog P2 with 64 smart pins and 1M of hub RAM would be interesting if that were possible.
We need to see a graph of cost vs process technology. 350nm masks are currently $35k, while 45nm masks are about $1M, or ~30x cost for ~60x the density. I wonder what 14nm masks would cost.
As a point of interest, estimating the entire Prop2 gate count at 500k gates, including Smartpins, the logic will consume about 6.4 mm², a mere 25% of the 25.1 mm² of real estate needed for 512kB (4Mbit) of HubRAM. See http://forums.parallax.com/discussion/comment/1340864/#Comment_1340864
What this means is that if a larger HubRAM is expected of a finer-geometry process, then the die size won't shrink, and the price of the finished product goes up because of the increased development costs.
Scott gave me some ballpark numbers on the non-recurring engineering (NRE) costs associated with a new device. At 28 nm, the design cost is $50 million to $90 million. Each mask set is $2 million to $3 million. If you're lucky, you'll get away with going through the mask creation process twice to get the design done. Not lucky? Then more iteration.
Going down to 22/20 nm, the design cost is $120 million to $500 million, and a mask set costs $5 million to $8 million per iteration. Finer lines, higher cost.
Scott rightly notes that not many products can support that much NRE. Only ICs that end up in products that everyone buys can support this level of cost -- for example, products like cellphones. Scott said, "How many $1 chips do you need to sell to recoup $100 million? And then at these really high volumes, the margins are low. Very hard to make money."
At 180 nm, mask costs are around $150,000, and design cost is around $1 million to $2 million. So more companies pursue 180 nm -- or maybe down to 130 nm. We may see some of the companies that are mostly doing analog push down to 90 nm, but probably not much farther.
"We need to see a graph of cost vs process technology. 350nm masks are currently $35k, while 45nm masks are about $1M, or ~30x cost for ~60x the density. I wonder what 14nm masks would cost." - Chip G
Right ... 45nm is 60 times the density ...
Density = (350x350) / (45x45) = 60.49
So if the cost for 350nm is $35k, then at 60.49 times the density using 45nm that's a steal !!!
$2,117,283.95 for the equivalent number of ICs at 350nm vs $1,000,000.00 for the same number of ICs at 45nm.
While it is true that some devices, such as capacitors, inductors, etc., don't scale, in most cases it's only a matter of circuit design to utilize a smaller component.
Even at 180nm compared to 45nm ... 45nm is a steal...
Density = (180x180) / (45x45) = 16
16 x $150,000 = $2,400,000.00 .... Again, 45nm with the equivalent number of ICs would be $1,000,000.00.
No matter how you slice it, design cost for ANY process is going to be about the same $1 million to $2 million.
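For anyone who wants to play with those ratios, here's a quick C check of the arithmetic above (the dollar figures are just the ballpark numbers quoted in this thread, not foundry pricing).

#include <stdio.h>

/* area density gained by moving from a coarse node to a fine node */
static double density_ratio(double coarse_nm, double fine_nm)
{
    return (coarse_nm * coarse_nm) / (fine_nm * fine_nm);
}

int main(void)
{
    double r350 = density_ratio(350, 45);            /* ~60.49x */
    double r180 = density_ratio(180, 45);            /* 16x     */

    printf("350nm -> 45nm: %.2fx density, equiv mask cost $%.2f vs $1,000,000 at 45nm\n",
           r350, r350 * 35000.0);                    /* ~$2.12M */
    printf("180nm -> 45nm: %.2fx density, equiv mask cost $%.2f vs $1,000,000 at 45nm\n",
           r180, r180 * 150000.0);                   /* $2.4M   */
    return 0;
}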
At 180 nm, mask costs are around $150,000, and design cost is around $1 million to $2 million. So more companies pursue 180 nm -- or maybe down to 130 nm.
In that ~2.5 year time-frame, has there been any 'next process step' type price movement, i.e. can you now get 130nm masks for ~$150,000?
Have OnSemi added (or plan to add) anything finer than 180nm to their fabs?
No matter how you slice it, design cost for ANY process is going to be about the same $1 million to $2 million.
If the volume is there, right? Starting out without that much money, you would have to use a cheaper process. That's where we have to start. It's sure interesting to contemplate those numbers, though.
No matter how you slice it, design cost for ANY process is going to be about the same $1 million to $2 million.
Depends on what you consider 'design cost' ?
Iteration cost is all part of the cost of getting ship-able silicon, so that cannot be ignored.
It is that cost that varies wildly with process.
"Depends on what you consider 'design cost' ?
Iteration cost is all part of the cost of getting ship-able silicon, so that cannot be ignored.
It is that cost that varies wildly with process." - jmg
Agreed ! ... it also doesn't help changing a process in the middle of a design or designing in one process and going to production with another.
It must also be realized that there is a difference between technology and process; the terminology has been confused a little bit here, I think. The technology, for example, is 180nm, whereas the process is the recipe used to reach a final design. With different processes you run into a scenario where all apples are fruit, but not all fruit is an apple. Same goes for 180nm and whether you use TSMC, On-Semi, or Jack-b-Quick.
But more than anything, I want to see it produced.
My .02
Technically, variants make more sense:
8 cogs, bigger RAM, 16 cogs, etc...
If we commit and just make one, the others could be funded by that one, and other means.
No matter what, we should focus hard on making one.
Please keep 16 cogs and 512KB hub RAM.
If it comes down to die space (it seems like it has), then I would rather have only 32 of the pins be smart pins.
I'm all for 16 cogs. However, if we can afford more hub memory then great; it would be a beast, LOL.
With 14 nanometer (14 nm) advanced manufacturing processes available, is it cost that keeps us at 180nm?
You would think that at least 130 nanometer (130 nm) or smaller would be affordable by now.
6 µm – 1974
3 µm – 1977
1.5 µm – 1982
1 µm – 1985
800 nm – 1989
600 nm – 1994
350 nm – 1995
250 nm – 1997
180 nm – 1999
130 nm – 2001
90 nm – 2004
65 nm – 2006
45 nm – 2008
32 nm – 2010
22 nm – 2012
14 nm – 2014
10 nm – 2016–2017
7 nm – 2018–2019
5 nm – 2020–2021
https://en.wikipedia.org/wiki/Semiconductor_device_fabrication
Question is what you think affordable is, and what is affordable for Parallax. I think changing the process now would be - hmm - a stupid move.
Get the thing out and refine software and Tools support. Shrinking down to another process may be nice for another iteration, but not now.
just my two,
Mike
Related Planet Analog article: http://www.planetanalog.com/author.asp?section_id=385&doc_id=561004
The design cost also depends on Treehouse and OnSemi, or the alternatives.
The other thing that comes with smaller processes is lower yields, much more power leakage, and often more complex power feeds.
I think it would be a huge mistake to drop the cogs down to 8. I'd be more in favor of fewer pins if it's required.
FinFETs are built vertically to minimize leakage at the smaller technologies by minimizing contact with the substrate.
http://www.realworldtech.com/intel-22nm-finfet/
I say this respectfully, yet from a point of growing frustration.
C.W.