We're looking at 5 Watts in a BGA!

jmg · 2014-04-05 16:00

cgracey wrote: »

We have a huge need for MORE power pins, no matter what we do next.

By reducing the Prop2 to 64 I/O's, we could get there in a 14x14mm, 100-pin Exposed-Pad TQFP. This package has a Tja of 20 and a huge 10.3 x 10.3mm die pad that would accommodate down-bonds for all the GND connections, freeing up pins for all the VDD connections that we need. This would work perfectly:

http://www.amkor.com/go/packaging/all-packages/exposedpad-lqfp-/-tqfp/exposedpad-lqfp-/-tqfp
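
As a rough illustration of why the thermal figure matters, the sketch below estimates junction temperature from a package thermal resistance. It reads the "Tja of 20" above as a junction-to-ambient thermal resistance of 20 °C/W (an assumption about the units), takes the 5 W figure from the thread title, and assumes a 25 °C ambient; the function names are hypothetical.

```python
def junction_rise_c(power_w: float, theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature rise above ambient, in deg C."""
    return power_w * theta_ja_c_per_w


def junction_temp_c(power_w: float, theta_ja_c_per_w: float, ambient_c: float) -> float:
    """Estimated junction temperature for a given ambient, in deg C."""
    return ambient_c + junction_rise_c(power_w, theta_ja_c_per_w)


if __name__ == "__main__":
    theta_ja = 20.0   # "Tja of 20" from the post, read as deg C per watt (assumption)
    ambient = 25.0    # assumed ambient temperature, not from the thread
    for power in (1.0, 2.5, 5.0):  # 5 W is the figure in the thread title
        rise = junction_rise_c(power, theta_ja)
        temp = junction_temp_c(power, theta_ja, ambient)
        print(f"{power:.1f} W: rise {rise:.0f} C, junction about {temp:.0f} C")
```

Under these assumptions, 5 W through 20 °C/W is a 100 °C rise above ambient, which is why dissipated power and thermal pad size dominate the package discussion.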

With 28 fewer I/O's the C port could become another D port with internal connections between cogs. This would leave about 20 pins after an SDRAM hookup.

Whatever chip we make next, this package is the way to go. It lets our die be whatever size it needs to be and lets us make all our GND connections directly to the exposed pad on the bottom.

Yes, I agree Thermal Pad size is king here.

Note that the 14x14 with 10.3mm Thermal, probably does come in 128 pins (or could do with minimal tooling)
The images suggest two frames tooling, so it may be as simple as a PAD-frame and lead frame.

Amkor also mention a 16 x 16 mm 144 pin package, (0.4mm?) but do not show their thermal pad choices on that one.

That sits nicely between 14x14 and 14x20 or 20x20 and allows a larger copper-spreader under the device, before the thermal vias.

KeithE · 2014-04-05 16:00

cgracey wrote: »

There will also be an increase of area due to BIST

So they are going to put in both logic and memory BIST for you? This sounds like good news to me. It should save you a lot of time developing ATE tests and fault grading functional vectors. It's also good for customer engagements.

rjo__ · 2014-04-05 16:01

Chip,

I find it fascinating that when you ask if people want a faster P1 with more memory, they say yes. I do too, so that is no surprise.
Would I be disappointed if the P2 stayed in FPGA form for a while? Not at all. It might take me five years to figure out :)

But when given a choice between a P2 that is roughly twice the speed with more memory, using half the power, and another P2, the basic answer is:
we want 8 Cogs, so that we can turn 4 of them off.

I understand that this is a huge mischaracterization of anything that anyone has said, and it isn't directed at anyone; that is just how it strikes me :)

I love the conversation. Thank you for inviting us in.

Rich

RossH · 2014-04-05 16:06

cgracey wrote: »

Argh! Just when I make up my mind!

We could yet go either way. I feel like the package issue is resolved, though.

Added: As Prop1-types go, I'd rather do 16 than 32 cogs, too.

This is only my interpretation, but I think one thing the consensus thread is telling Parallax is that there are still a lot of people with a lot of interest in the current P1 architecture, who could just use a bit more "oomph". The P2 is too far removed (both in a technical sense and in its likely timing) to fill the gap between the original P1 and their current needs. That's not to say the P1 doesn't have a life ahead of it yet, just that the niche for which it is an appropriate solution is getting smaller, and may disappear altogether before a workable P2 finally arrives.

From my own perspective (again, this has not been discussed on that thread) I find the idea of a 4-cog version of the P2 less than compelling. It just doesn't seem to have enough flexibility, apart from use as a high-level language execution engine, for which we already have many other alternatives. To me, this chip feels like a stopgap on the way to the "real" P2. It seems mainly to be under consideration because it happens to fit the power and packaging envelope, and not because it is what people really wanted out of the P2. The P16X32B seems to be closer to that.

Ross.

Rayman · 2014-04-05 16:10

Can we put ROM in any extra space?

jmg · 2014-04-05 16:11

RossH wrote: »

From my own perspective (again, this has not been discussed on that thread) I find the idea of a 4-cog version of the P2 less than compelling. It just doesn't seem to have enough flexibility, apart from use as a high-level language execution engine, for which we already have many other alternatives.

Even without knowing the RAM on 4 COG P2 ?

Keep in mind, there is always this solution

cgracey wrote:

Anything is possible, even making Prop1 cogs alongside Prop2 cogs, but that would take more time to develop.

Note that P1E is far from 'instant pudding', it needs to draw on P2 work, and be FPGA proven, not to mention Power simulations too..

Cluso99 · 2014-04-05 16:12

Chip,
Just had a little more time to digest the 4 cog P2.

I have been commercially designing micro interfaces (and programming them) for almost 40 years now. I am quite happy with handling interrupts but I came to the Prop P1 because I didn't have to worry about those and I could just add simple program blocks and drivers in its own cog.

A 4 cog P2 makes this a non-starter because the cog drivers become complex because multi-tasking in cogs will be a requirement.

Therefore, (without the existing P1) I would go back to using an ARM or other micro with interrupts over grasping with multi-tasking. Remember, most new users will find this challenging, and prefer to stick to what they know.

I am not after a P2 for the sake of it. I want it to be a commercial success. A 4 cog P2 doesn't inspire me at all.

IMHO, the P16X32 can be done now. Consider doing that while you think about how to reduce the P2 power and give us plenty of useful cogs.

Just my 2c.

jmg · 2014-04-05 16:12

Rayman wrote: »

Can we put ROM in any extra space?

Do you mean real ROM, or RAM patched as ROM ?
I think more RAM is far more useful than more ROM, and ROM can have bugs....

RossH · 2014-04-05 16:19

jmg wrote: »

Even without knowing the RAM on 4 COG P2 ?

This is only an issue for those who want to run Linux or suchlike. For most microcontroller applications we know 32K is a bit shy, and 64K is likely to be limiting as well ... but that 256K or 512K would be ample.

jmg wrote: »

Keep in mind, there is always this solution

A chimera.

jmg wrote: »

Note that P1E is far from 'instant pudding', it needs to draw on P2 work, and be FPGA proven, not to mention Power simulations too..

The advantage of the P16X32B is that we know it can be developed in a finite timeframe, we know it will work, and we know it will sell. You can't claim any of these is true for the P2.

rjo__ · 2014-04-05 16:32

Ross,

That is the conundrum. And the one thing no-one would want to do on the way to the eventual P2 is lose people along the way.
As an amateur, I want the P2 so that I can do things that I can't presently do with the P1. I could get a lot of bang for the buck with a new P1 and
with practically no learning curve. My best guess is that the things I can't do now, I probably couldn't do with the new P1 either.
I really don't care… I would want the new P1 for the memory and the speed bump.

One issue I see is that a new P1 would rather quickly replace the old P1. I have no idea what that would do for total sales or what the transition to a new P1 would do to the bottom line.
I don't think the P2 would have a short term impact on P1 sales at all unless it (on-net) drives developers away.

I have no way of knowing if there is a broad-based urgency for a new P1 or if there are just a few developers who are at their limit.

rogloh · 2014-04-05 16:34

Not sure if this came up but one further benefit of the 4 COG P2 is that it implies hub accesses could be made 1:4 which would automatically give all hubexec code twice as much hub bandwidth which could improve performance during the cache misses. This is a nice gain. But having 6 COGs if they could fit size/power wise would be nicer, 4 still seems rather tight.
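
The "twice as much hub bandwidth" point can be checked with a small sketch. It assumes the hub hands out access slots in strict round-robin order, one slot per clock; the real slot length may differ, but the ratio between the 8-cog and 4-cog cases does not depend on it. The function name is hypothetical.

```python
def hub_slots(num_cogs: int, total_clocks: int) -> list[int]:
    """Count hub access slots per cog under a simple round-robin rotation.

    Assumes one slot per clock, handed to cogs in fixed order (an assumption).
    """
    slots = [0] * num_cogs
    for clock in range(total_clocks):
        slots[clock % num_cogs] += 1
    return slots


if __name__ == "__main__":
    clocks = 64
    eight = hub_slots(8, clocks)[0]  # slots seen by one cog at 1:8
    four = hub_slots(4, clocks)[0]   # slots seen by one cog at 1:4
    print(f"1:8 hub: {eight} slots per cog in {clocks} clocks")
    print(f"1:4 hub: {four} slots per cog in {clocks} clocks")
    print(f"ratio: {four / eight:.1f}x")
```

Halving the number of cogs in the rotation doubles the share each remaining cog gets, which is the gain described above.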

jmg · 2014-04-05 16:37

RossH wrote: »

This is only an issue for those who want to run Linux or suchlike.

Nope, there are significant LCD and video applications opened with more RAM.

cgracey · 2014-04-05 16:37

jmg wrote: »

A 4 COG P2 will allow a big jump from the 256k memory, (waiting on that number)

Actually, the die will still be pretty big, even with 4 cogs and 256KB RAM.

The difference between Prop2 and Prop1 is that Prop2 has tons of DSP capability and lots of creature comforts, whereas Prop1 is more bare-bones. The two chips we've been talking about are very different animals. Each has a claim to existence.

jmg · 2014-04-05 16:38

rogloh wrote: »

But having 6 COGs if they could fit size/power wise would be nicer, 4 still seems rather tight.

5 is also a solution. We need to see the RAM combinations, with COG count, to make a better choice.

jmg · 2014-04-05 16:40

cgracey wrote: »

Actually, the die will still be pretty big, even with 4 cogs and 256KB RAM.

Sure, but there must be a P2-COG => Extra Memory Available figure, for the same die size we had before.

RossH · 2014-04-05 16:41

cgracey wrote: »

The two chips we've been talking about are very different animals. Each has a claim to existence.

Of course. It's just a matter of which one we can have now, and which one needs more time to incubate.

Ross.

rogloh · 2014-04-05 16:43

jmg wrote: »

5 is also a solution. We need to see the RAM combinations, with COG count, to make a better choice.

Yeah 5 is better than 4. What I like is that the P2 COGs just have so much more usable bandwidth/performance with hubexec and tasking/threading compared to the P1 or its proposed variant. P1 was so memory bandwidth constrained.

Bill Henning · 2014-04-05 16:55

Ross,

A poll, in a thread called "Consensus on the P16X32B", comes up strongly in favor of the P16X32B???

I just about fell out of my chair in surprise.

(sorry about the sarcasm, but that is as stacked and invalid a poll as one can make)

RossH wrote: »

All,

Over in the "consensus" thread we now have 34 in favor of developing a P16X32B and 2 against. Opinion also seems to favor the simpler (16-cog) variant, and more compatibility with the P1.

Ross.

rogloh · 2014-04-05 16:57

I just put my name down in the other thread against, so it's 34 to 3 now....

Bill Henning · 2014-04-05 16:59

As the next device, I'd be perfectly happy with a four cog P2 with 256KB hub

I'll add timing comparisons for it in my performance thread, however given that a single P2x8 cog outperformed a single P16X32 cog by a factor of 10 with hubexec, I am happy.

I am not as happy about the pin limits, but I do understand them. I guess Morpheus 2 will have to use two P2x4's.

Unlike others, I am perfectly happy packing four drivers in one cog, so a P2x4 with tasks is actually still more powerful than a P16E32. And at least 10 times faster for running C code.

cgracey wrote: »

Anything is possible, even making Prop1 cogs alongside Prop2 cogs, but that would take more time to develop.

I'm going to make a 4-cog Prop2 file set for OnSemi to analyze. This will also run on the DE2-115.

In thinking about all the neat things that the Prop2 can do, it seems that we need to build it first, for maximum bang. All of your technical and marketing analyses here have been helpful in realizing that.

Note to Bill: We could use a bigger package and get more pins, but the die growth would be too much for separate SDRAM pins. Large dies not only cost more, but they yield lower. We need to keep this die around 7x7mm, or smaller. We can do that with 4 cogs and 64 I/O's. We'd have 28 fewer I/O's than the current 92. This would leave about 21 pins free for a single SDRAM chip.

So, Prop2 development will continue, but with four cogs, for now.

Bill Henning · 2014-04-05 17:01

In all fairness, please visit MY performance comparison thread at

http://forums.parallax.com/forumdisplay.php/97-Propeller-2-Multicore-Microcontroller

where I have actual performance data, with source code, methodology, and math you can check, unlike other threads.

cgracey wrote: »

Argh! Just when I make up my mind!

We could yet go either way. I feel like the package issue is resolved, though.

Added: As Prop1-types go, I'd rather do 16 than 32 cogs, too.

Bill Henning · 2014-04-05 17:08

Ray,

Set the mapping so each task gets 128 longs. Presto, 4 baby cogs.

With 4 cogs, each cog gets hub access every 4 cycles, so the four tasks, if they needed it, can each get a hub cycle every 16 clocks.

P1E16 gets 1 long every 16 clocks (8 instructions), 0.5 bytes per instruction.

P2x4 gets 8 longs every 4 clocks, 16 bytes per instruction.

32 times the hub bandwidth.

Divide by four tasks.

Each task can get 8x the hub bandwidth a P16X32 cog can.
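
The arithmetic above can be reproduced with a short sketch. The P1E16 figure is derived from the numbers stated in the post (1 long per 8 instructions); the P2x4 figure of 16 bytes per instruction is taken as quoted rather than derived, since the post does not state how many instructions fit in a hub window. All names are hypothetical.

```python
BYTES_PER_LONG = 4


def bytes_per_instruction(longs_per_window: int, instructions_per_window: int) -> float:
    """Hub bytes available per executed instruction."""
    return longs_per_window * BYTES_PER_LONG / instructions_per_window


def hub_bandwidth_comparison(tasks_per_cog: int = 4) -> dict[str, float]:
    """Compare per-instruction hub bandwidth using the figures from the post."""
    p1e16 = bytes_per_instruction(1, 8)  # 1 long every 16 clocks (8 instructions)
    p2x4 = 16.0                          # as quoted in the post, not derived here
    ratio = p2x4 / p1e16                 # whole-cog advantage
    per_task = ratio / tasks_per_cog     # advantage once split across tasks
    return {"p1e16": p1e16, "p2x4": p2x4, "ratio": ratio, "per_task": per_task}


if __name__ == "__main__":
    for name, value in hub_bandwidth_comparison().items():
        print(f"{name}: {value}")
```

With four tasks sharing a cog, the 32x whole-cog figure divides down to the 8x per-task figure given above.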

Writing drivers for tasks is NOT the disaster you make it out to be; for things like serial etc it is practically identical to separate cogs.

P1E16 cannot do high resolution video

P2x4 easily can do HD.

Cluso99 wrote: »

Chip,
Just had a little more time to digest the 4 cog P2.

I have been commercially designing micro interfaces (and programming them) for almost 40 years now. I am quite happy with handling interrupts but I came to the Prop P1 because I didn't have to worry about those and I could just add simple program blocks and drivers in its own cog.

A 4 cog P2 makes this a non-starter because the cog drivers become complex because multi-tasking in cogs will be a requirement.

Therefore, (without the existing P1) I would go back to using an ARM or other micro with interrupts over grasping with multi-tasking. Remember, most new users will find this challenging, and prefer to stick to what they know.

I am not after a P2 for the sake of it. I want it to be a commercial success. A 4 cog P2 doesn't inspire me at all.

IMHO, the P16X32 can be done now. Consider doing that while you think about how to reduce the P2 power and give us plenty of useful cogs.

Just my 2c.

jmg · 2014-04-05 17:09

rogloh wrote: »

Yeah 5 is better than 4. What I like is that the P2 COGs just have so much more usable bandwidth/performance with hubexec and tasking/threading compared to the P1 or its proposed variant. P1 was so memory bandwidth constrained.

Yes, but there are ways to manage Power envelope and memory bandwidth - see my plan here :
http://forums.parallax.com/showthread.php/155091-P1-vs-P1E32-vs-P1E16-vs-P2-cycle-counting-benchmark-comparison!?p=1256524&viewfull=1#post1256524

(would work in either of P1E and P2)

At 180nm, designs look to be Power Envelope dictated. ( Chip has worked the speed up )

Bill Henning · 2014-04-05 17:10

Heater,

NO.

AUX is needed for:

- CLUT for video out
- FAST stack
- LIFO, FIFO

Matter of fact, if we go to 4 P2 cogs, it would be great to increase it to 512 longs.

Pre-emptive threading lets us run basically unlimited hubexec threads in one cog. With the cogs being trimmed back to four, we need it. It does not impact the heat envelope.

Heater. wrote: »

Chip,

I'm pretty much convinced that the 4 COG PII is the way to go.
Bill's performance calculations based on 4 COGs would look even better due to increased COG/HUB bandwidth.
The threads mean there is less demand for actual COGs.
Fewer COGs presumably means more space for RAM. A huge win assuming the logic can be easily adapted to it.

I have my reservations about its hundreds of instructions but you have sweated blood to get to where it is and every step along the way has been convincingly justified so it can't be all nuts. Perhaps trimming out some features like the AUX RAM, or whatever it is called now, and preemptive threading is worth a look.

Hopefully a process shrink will become possible at some point.

RossH · 2014-04-05 17:15

Bill Henning wrote: »

Ross,

A poll, in a thread called "Consensus on the P16X32B", comes up strongly in favor of the P16X32B???

I just about fell out of my chair in surprise.

(sorry about the sarcasm, but that is as stacked and invalid a poll as one can make)

I don't get it. You voted No, so have two others. What's the problem?

RossH · 2014-04-05 17:17

Bill Henning wrote: »

In all fairness, please visit MY performance comparison thread at

http://forums.parallax.com/forumdisplay.php/97-Propeller-2-Multicore-Microcontroller

where I have actual performance data, with source code, methodology, and math you can check, unlike other threads.

Your performance thread is good - it shows a P16X32B would perform between 2 and 5 times faster than a P1. I think we could count on a four-fold performance improvement for many applications.

Ross.

mindrobots · 2014-04-05 17:18

@Rossh, I keep seeing you using words like "now", "finite timeframe", "real soon" when referring to a P1 variant...can you put less abstract dates and timeframes to those terms?

Now means you can buy one in a month. Real soon means you can buy one in 3 to 4 months. Finite timeframe means Parallax can announce the firm target date we can purchase it in a week or so.

Easy means you can pull the existing verilog blocks together, build it, emulate it, test it, synthesize it and send it off to the shuttle without surprises or designing any new verilog beyond glue. Easy means there is an existing I/O frame that has been in production before. Easy means it's a package that's been used. Easy means there aren't questions about the pin layout being backward compatible with P1 or forward compatible with P2. I don't see much of this being easy or now or finite.

RossH · 2014-04-05 17:21

mindrobots wrote: »

@Rossh, I keep seeing you using words like "now", "finite timeframe", "real soon" when referring to a P1 variant...can you put less abstract dates and timeframes to those terms?

Now means you can buy one in a month. Real soon means you can buy one in 3 to 4 months. Finite timeframe means Parallax can announce the firm target date we can purchase it in a week or so.

Easy means you can pull the existing verilog blocks together, build it, emulate it, test it, synthesize it and send it off to the shuttle without surprises or designing any new verilog beyond glue. Easy means there is an existing I/O frame that has been in production before. Easy means it's a package that's been used. Easy means there aren't questions about the pin layout being backward compatible with P1 or forward compatible with P2. I don't see much of this being easy or now or finite.

I believe Ken said "two months". Of course, Chip is the only one who can really tell us.

potatohead · 2014-04-05 17:22

The assumption that we could get EITHER chip in 2 months is a very weak assumption.

Ken Gracey · 2014-04-05 17:26

RossH wrote: »

I believe Ken said "two months". Of course, Chip is the only one who can really tell us.

I'd never suggest it takes us two months, ever.

Ken Gracey
