We're looking at 5 Watts in a BGA!

RossH · 2014-04-04 16:36

jmg wrote: »

Strange contradiction, in engineering, the 'pudding' is the numbers. (Which you admit we do not actually have yet).

I think Cluso's point was that you can crunch numbers as much as you like - but the best estimates by the best qualified are currently the OnSemi numbers, so if your calculations don't match these numbers, then you are just kidding yourself.

Ross.

jmg · 2014-04-04 17:05

RossH wrote: »

I think Cluso's point was that you can crunch numbers as much as you like - but the best estimates by the best qualified are currently the OnSemi numbers, so if your calculations don't match these numbers, then you are just kidding yourself.

I fully agree that until OnSemi Power Simulation numbers arrive for the P1, there is no hard information on P1 Power levels at 180nm. That's exactly what I was saying.

cgracey · 2014-04-04 17:30

RossH wrote: »

Hi Chip,

Going back through this thread, it is not surprising that this post (by you) looks like the most practical and thoroughly thought out proposal (even including pricing!) that has been suggested. It also happens to be very close to what many people originally wanted out of the P2.

One thing this thread shows is that while there is a real hunger for the "bells and whistles" P2, there is also a significant pent-up demand for a simpler P1X32B. And I don't think the one undermines the other. In fact, I think a chip like the P1X32B could both help Parallax fund the P2, and also significantly enhance interest in the P2 when it does finally arrive, by rekindling interest in symmetrical multiprocessing microcontrollers (the kind of interest the P1 had initially, and which got many of us here interested in the first place!).

If you believe that this P1X32B would be low technical risk, quick and easy to make happen, and (importantly) not delay the work on the P2 - i.e. if it could be done during the current hiatus in P2 development - then why not continue to develop the P2 using normal Parallax funding, but develop the P132X16 using some kind of reward-based (not equity-based) crowd-funding model. This would make it cost-neutral to Parallax.

Just set up a crowdsourcing site, specifying the target that we would need to reach to make it happen, and - importantly - a date by which the target would have to be achieved to avoid impacting the P2 development, and see what happens.

Let people vote with their wallets. I'd be willing to commit some $$$ to it, and I'm sure there are many others who would also.

If the target is reached, the chip goes ahead.

Ross.

This would be a very simple chip to complete. It would be kind of a relief, for now.

Prop2 still needs a serdes and some thorough testing. I'd hate to have another false start.

I could see making a 32-cog 3200 MIPS Prop1 w/64 analog I/Os and 512KB RAM for now and then continue to get Prop2 ready for a smaller process.

If I felt some consensus here about doing that, it would be pretty easy to attack that project in short order.
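As a rough check on the 32-cog, 3200 MIPS figure above, the arithmetic can be sketched as follows. The Prop1 baseline (8 cogs, 80 MHz, 4 clocks per instruction) is not stated in this thread; it is the commonly quoted P8X32A figure and is treated here as an assumption.

```python
# Assumed Prop1 baseline (commonly quoted P8X32A figures, not from this thread).
P1_COGS = 8
P1_CLOCK_MHZ = 80
P1_CLOCKS_PER_INSTRUCTION = 4

# Figures from the proposal in this post.
NEW_COGS = 32
NEW_TOTAL_MIPS = 3200

p1_mips_per_cog = P1_CLOCK_MHZ / P1_CLOCKS_PER_INSTRUCTION   # MIPS for one Prop1 cog
p1_total_mips = p1_mips_per_cog * P1_COGS                    # MIPS for the whole Prop1

new_mips_per_cog = NEW_TOTAL_MIPS / NEW_COGS                 # MIPS for one proposed cog
total_speedup = NEW_TOTAL_MIPS / p1_total_mips               # whole-chip ratio
per_cog_speedup = new_mips_per_cog / p1_mips_per_cog         # single-cog ratio

print(f"Prop1: {p1_mips_per_cog:.0f} MIPS/cog, {p1_total_mips:.0f} MIPS total")
print(f"Proposed: {new_mips_per_cog:.0f} MIPS/cog, {NEW_TOTAL_MIPS} MIPS total")
print(f"Speedup: {total_speedup:.0f}x overall, {per_cog_speedup:.0f}x per cog")
```

Under those assumptions the whole-chip gain splits into four times as many cogs and five times the speed per cog.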

RossH · 2014-04-04 17:40

cgracey wrote: »

If I felt some consensus here about doing that, it would be pretty easy to attack that project in short order.

So, no chance at all then?

rogloh · 2014-04-04 17:44

cgracey wrote: »

This would be a very simple chip to complete. It would be kind of a relief, for now.

Prop2 still needs a serdes and some thorough testing. I'd hate to have another false start.

I could see making a 32-cog 3200 MIPS Prop1 w/64 analog I/Os and 512KB RAM for now and then continue to get Prop2 ready for a smaller process.

If I felt some consensus here about doing that, it would be pretty easy to attack that project in short order.

My vote would be to keep going until you have exhausted all your P2 options before reverting to a P1 variant.

Phil Pilgrim (PhiPi) · 2014-04-04 17:48

Chip,

1. Would the sixteen cogs share one hub or two? (I'm still thinking two hubs with mailbox comms between them might provide the most throughput.)
2. Would all cogs have access to all pins? Analog, too?
3. If so, are there any scaling issues that could bite?
4. Again, if so, are we back to using the carry flag to discriminate between the two waitpeq/pne instructions?

Having 32 counters would be great, and I wouldn't mind so much that there are still only two per cog in that case. But please keep all the counter registers fully visible without special instructions.

For the record, I'd be in favor of a P1 variant ahead of the P2. I think the P2 needs some time off before diving headlong into the thicket again. I don't see the P1+ as a punt, necessarily, just a gap filler and something to keep the market interested while the P2 gestates.

Thanks,
-Phil

potatohead · 2014-04-04 17:55

I want what will best fund our futures.

mindrobots · 2014-04-04 17:57

cgracey wrote: »

This would be a very simple chip to complete. It would be kind of a relief, for now.

Prop2 still needs a serdes and some thorough testing. I'd hate to have another false start.

I could see making a 32-cog 3200 MIPS Prop1 w/64 analog I/Os and 512KB RAM for now and then continue to get Prop2 ready for a smaller process.

If I felt some consensus here about doing that, it would be pretty easy to attack that project in short order.

Let's all think of this as the Prop2 NOT a Prop1 variant.

A Prop1 with the B port is a Prop1 variant.

What Chip is proposing is closer to the Prop2 we all started shuttle runs with than what is currently being called a Prop2.

So let's call Chip's new proposal the Prop2. I'd still like 'relatively easy' and 'quick to make' quantified. Also some idea of a release date would be nice.

Which brings us all to the mythical beast that we have been chasing through the woods and tacking features onto every time we corner it. This chip is SURELY the Propeller3 and it has grown to the point that for it to live in anything other than an emulated world, it NEEDS 65nm silicon. So, it IS the Prop3 and should be the follow-on to the Prop2 (32 cog, 64 pin, 512kb chip from above).

Chip needs to set the design specs for the Prop2, design it, put it into an FPGA for a bit of testing and then send it off to be made. No feature creep, no "but if only it had....", no changes unless Chip comes out and says, "Guys, I just added this feature." If this does not happen, it will bloat its way into mythology just like the P3 did. It has already started to bloat and the features have already started creeping in the two days that it has been discussed as a possibility.

Chip, it is your vision. The P3 vision became really amazing through "forum think" but we also killed it. That can't happen again to the new Prop2 (32x64x512)!

Phil Pilgrim (PhiPi) · 2014-04-04 18:01

mindrobots wrote:

Which brings us all to the mythical beast that we have been chasing through the woods and tacking features onto every time we corner it.

Nice metaphor and certainly more tactfully put than I was capable of!

-Phil

mindrobots · 2014-04-04 18:06

Phil Pilgrim (PhiPi) wrote: »

Nice metaphor and certainly more tactfully put than I was capable of!

-Phil

Haha! It goes well with your "diving into the thickets" which I hadn't even seen yet!

Roy Eltham · 2014-04-04 18:26

I want to add my vote to the pile for doing a P1B (or whatever) in the short term and getting the P2 done right later. Assuming Parallax feels that is a viable business solution.

dr hydra · 2014-04-04 18:33

I vote for the p1b...it sounds awesome...512 ram...more cogs...short turn around time.

David Betz · 2014-04-04 18:33

Chip mentioned the possibility that hub execution would be part of this enhanced P1 he described in message #564. It seems that would require lots of changes to P1. Unless you only allow indirect branches you need the JMP instructions with 16 bit absolute addresses as well as a wider PC. In fact, 16 bits isn't enough if there is 512k of hub memory. You also need the logic that handles COG and hub addresses differently depending on the high order bits of the address. Is all of this likely to start consuming too much power like the full P2 did? Is there a way to add hub execution in a simpler way than was done in P2 but still have it be useful? I guess you could do without hardware stacks if you just have a jmp instruction that puts its return address in a register like the P2 LINK instruction does. The stack could then be implemented in software with SUB #4, WRLONG and RDLONG, ADD #4 instead of PUSH and POP. I guess the biggest problem is how to handle the 17 bit addresses in JMP and CALL.
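To make the address-width point and the software-stack idea above concrete, here is a minimal Python sketch. It only models the arithmetic and the order of operations (SUB #4 then WRLONG to push, RDLONG then ADD #4 to pop); the names `address_bits` and `SoftStack` are made up for illustration, the hub is modelled as a plain dictionary, and nothing here reflects an actual Prop1 or Prop2 implementation.

```python
def address_bits(size_bytes: int, unit_bytes: int = 1) -> int:
    """Bits needed to address size_bytes of memory in units of unit_bytes."""
    units = size_bytes // unit_bytes
    return max(1, (units - 1).bit_length())


HUB_BYTES = 512 * 1024                      # 512k of hub memory, as discussed above

byte_bits = address_bits(HUB_BYTES)         # width of a byte address
long_bits = address_bits(HUB_BYTES, 4)      # width of a long (4-byte) address


class SoftStack:
    """Hypothetical software stack that grows downward in hub memory."""

    def __init__(self, top: int):
        self.sp = top                       # stack pointer held in a cog register
        self.hub = {}                       # stand-in for hub RAM, keyed by byte address

    def push(self, value: int) -> None:
        self.sp -= 4                        # SUB sp, #4
        self.hub[self.sp] = value & 0xFFFFFFFF   # WRLONG value, sp

    def pop(self) -> int:
        value = self.hub[self.sp]           # RDLONG value, sp
        self.sp += 4                        # ADD sp, #4
        return value


print(byte_bits, long_bits)
```

A 16-bit field covers only 64 KB of byte addresses, so 512k needs either a wider field or long-aligned branch targets, which is where the 17-bit figure comes from.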

RossH · 2014-04-04 18:34

All,

I have started a new thread specifically to gauge consensus on the P16X32B (i.e. some kind of P1 variant rather than a P2) suggested by Chip earlier in this thread. See here.

Ross.

Peter Jakacki · 2014-04-04 18:48

cgracey wrote: »

This would be a very simple chip to complete. It would be kind of a relief, for now.

Prop2 still needs a serdes and some thorough testing. I'd hate to have another false start.

I could see making a 32-cog 3200 MIPS Prop1 w/64 analog I/Os and 512KB RAM for now and then continue to get Prop2 ready for a smaller process.

If I felt some consensus here about doing that, it would be pretty easy to attack that project in short order.

You have my vote for this proposal, wholeheartedly, but please, refer to this as the P2 which we originally wanted, otherwise a Superfortress is really just a B17 "B" variant. Your proposal quadruples the cogs, x20 speed, etc. This is no variant, it still has "Propellers" like the original but it is a very different beast, as is the still in-flux serdesless nextgen P3 design which when it is finalized and fabricated will be very warmly welcomed.

jmg · 2014-04-04 19:03

cgracey wrote: »

This would be a very simple chip to complete. It would be kind of a relief, for now.

Prop2 still needs a serdes and some thorough testing. I'd hate to have another false start.

I could see making a 32-cog 3200 MIPS Prop1 w/64 analog I/Os and 512KB RAM for now and then continue to get Prop2 ready for a smaller process.

If I felt some consensus here about doing that, it would be pretty easy to attack that project in short order.

It would need Sim Power figures done asap on the OnSemi, to get a proper handle on where the power envelope is.

What is the memory equivalent of a P1 cog?

32 COGs and 512K sounds skewed, and better could be 16/20/24 COGs and 1M RAM, as that would open the LCD driver replacement business market and there are a LOT of small LCDs made every year.

Note that I think a serious LCD chip really has to have QuadSPI support in Silicon, to slash the otherwise high power cost of bit-banging, and improve the COG bandwidth.

That is also a growing industry standard bus, that other micros can talk to this one over.

rogloh · 2014-04-04 19:09

David Betz wrote: »

Chip mentioned the possibility that hub execution would be part of this enhanced P1 he described in message #564. It seems that would require lots of changes to P1. Unless you only allow indirect branches you need the JMP instructions with 16 bit absolute addresses as well as a wider PC. In fact, 16 bits isn't enough if there is 512k of hub memory. You also need the logic that handles COG and hub addresses differently depending on the high order bits of the address. Is all of this likely to start consuming too much power like the full P2 did? Is there a way to add hub execution in a simpler way than was done in P2 but still have it be useful? I guess you could do without hardware stacks if you just have a jmp instruction that puts its return address in a register like the P2 LINK instruction does. The stack could then be implemented in software with SUB #4, WRLONG and RDLONG, ADD #4 instead of PUSH and POP. I guess the biggest problem is how to handle the 17 bit addresses in JMP and CALL.

Rinse. Spin. Repeat.

P1b or is it P2 redux. I've forgotten which now.

Totally agree hub exec makes the 512k memory very useful but as soon as you start adding enhancements and major features you will have a very hard time stopping and the whole cycle repeats.

Just saying.

mindrobots · 2014-04-04 19:10

Peter Jakacki wrote: »

...P3...warmly welcomed.

That SHOULD be fixed by then!!

David Betz · 2014-04-04 19:16

rogloh wrote: »

Rinse. Spin. Repeat. P1b or is it P2 redux. I've forgotten which now.

Totally agree hub exec makes the 512k memory very useful but as soon as you start adding enhancements and major features you will have a very hard time stopping and the whole cycle repeats.

Just saying.

Yes, that's a worry. I only mentioned hub execution because Chip did and I assume he believes it can be done relatively easily.

jmg · 2014-04-04 19:23

David Betz wrote: »

Yes, that's a worry. I only mentioned hub execution because Chip did and I assume he believes it can be done relatively easily.

HubExec makes sense, it would be very blinkered, not to mention short sighted, to ignore development that has been proven on the P2.
This is really more of a 180nm Power Envelope design exercise.

rogloh · 2014-04-04 19:28

It makes a lot of sense, so does full speed USB, tasking, QuadSPI support..........you can see where I am going here.

EDIT: actually tasking is not a good example, if there are 16 COGs, it's not so important. But there will be plenty of other things that will come up and be wanted. How do you stop? When you run out of power or $ or time or die size or the first of the above?

jmg · 2014-04-04 19:33

brucee wrote: »

As you have the multi threading, make it optional and drop back to 4 COGS, which could still mean up to 16 virtual CPUs. And increase the RAM, as that still IMHO is the real bottleneck. And you can let 1 or 2 COGs run at full speed (turn off threading).

I think lowering the P2 COG count (and trading for more RAM) is a perfectly valid way of tuning the design to match a practical 180nm Power Envelope.

The Single-COG power figures of a P2, are nothing scary at all, especially if you apply Vcore scaling.
The low cost packages look like they will struggle to cool 8 x P2 COGs, all at 100% speed.

Once P1E power values are available, another combination could be 4 P2 COGS (16 virtual CPUs) and a blob of P1E COGS, with the memory expanded into the spare die space.

RossH · 2014-04-04 19:36

jmg wrote: »

HubExec makes sense, it would be very blinkered, not to mention short sighted, to ignore development that has been proven on the P2.

Chip is not ignoring development that has been proven on the P2. He has already said (for example) that he would re-use the work done on the "analog rich" I/O pins.

Ross.

jmg · 2014-04-04 19:51

RossH wrote: »

Chip is not ignoring development that has been proven on the P2. He has already said (for example) that he would re-use the work done on the "analog rich" I/O pins.

Correct, I was talking more about the FPGA proven Verilog, not the Custom parts of P2.

msrobots · 2014-04-04 20:21

I wish I would be able to handle this language like mindrobots does.

Which brings us all to the mythical beast that we have been chasing through the woods and tacking features onto every time we corner it.

Well said.

I even understand David's wish to add hubexec to it, this may be a big feature for PropGCC and other compiled languages.

But 16/24 whatever cogs running at 10 times the speed with the (almost) same instruction set as the P1 has would greatly ease the changes to PropGCC, Catalina, Spin, PropBasic, Forth and whatever is there already.

Change as little as needed in the PASM opcodes. All OBEX stuff will run. All Documentation and Tutorials will fit with some added comments. Incremental. Like BS1 and BS2.

I think Hubexec is too much to ask. LMM will be 10 times faster already.

Maybe Bill can remember his first proposals what would be needed to speed up LMM without even thinking about Hubexec at that time.

Keep the instruction set as close as doable to the P1. Add what is needed for ADC/DAC/Video/Counter.

Leave all that fancy stuff out for the upcoming P3.

It is about the time to market not features. IMHO even 16 cogs will do.

This would help all existing customers who struggle with - shoot one more cog please or just 5 more pins please or some more speed please...

32 cogs is already overkill with 64 pins.

But if this is an (almost) drop-in replacement for the Prop1 - from the software point of view - the existing customer base could easily adapt.

Somebody said here already that until you really reach high volumes the software would be the main cost to account for.

One who outed himself here was Man_At_Work, selling 1-2k props a year. I just take him as an example.
If he could even double the speed of his controllers or can extend his product line with some features and STILL run almost the same code he might be very happy.

Read the excitement in Peter Jakacki's post. He is not really a hobbyist in his field. But like others he has stable software for various products. Porting Tachyon from P1 to P1xx will be a way easier, faster and far less dangerous enterprise than moving to the new mythical beast now called P3.

Do not make the same mistake again. Avoid adding features at all cost. Keep it at 16 P1 cogs, 2 clock/ins, ADC/DAC, more ram, inter cog communication.

I am too stupid to understand the implications of 512k or even 1m RAM...

But PLEASE change as little as needed on the instruction set. And keep them (almost) P1 cogs.

my 2 cents.

Enjoy!

Mike

David Betz · 2014-04-04 20:24

msrobots wrote: »

I even understand David's wish to add hubexec to it, this may be a big feature for PropGCC and other compiled languages.

I didn't ask for it. Chip mentioned it himself. If he has had second thoughts about it, I certainly won't push it. I was just following up on something he mentioned as a possibility.

jazzed · 2014-04-04 20:24

mindrobots wrote: »

Originally Posted by Peter Jakacki

...P3...warmly welcomed.

That SHOULD be fixed by then!!

You read my mind! Again.

Cluso99 · 2014-04-04 20:35

Could the same P2 QFP package and pinout be used on the P32X32B using only 64 I/O pins and the others N.C. ?

This could make our boards compatible with the following P2 footprint (a lower power version at least).

cgracey · 2014-04-04 20:42

In reviewing the Prop1 cog RTL today, I looked into what it would take to add hub exec. It may be too complicated, in that we'd need several new instructions, a cache line, and some type of stack mechanism to go with it. We could easily double the cog's logic complexity by doing just those things. I think that the key is to leave the Prop1 cog as it is. That way, we get to keep software compatibility with the current Prop1.

The video could simplify a little, as the 8 bit output could just go straight to a DAC, since we are now unable to clock digital pins from the PLL, anyway. Also, only CTRA would need a PLL. It would be neat if we could come up with some really simple video modulator. RGB color-space conversion could be done, but it would be best as some resource that lives outside of the cogs, maybe between the cogs and the DACs.

It would be nice to keep a simple universal hub memory map of 512k. With 16 cogs, each having a two-clock instruction cycle, the cogs would each get a hub turn every 8 instructions, disregarding any hub cycle allocation array that could be implemented outside of the cogs. This is the same effective hub:cog relationship that Prop1 has.
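The "hub turn every 8 instructions" figure above can be reproduced with a small calculation. This is only a sketch under assumptions the post does not spell out: a plain round-robin hub that hands out one slot per clock, with no allocation array. The function name and the `clocks_per_hub_slot` parameter are invented for illustration.

```python
def instructions_per_hub_turn(cogs: int,
                              clocks_per_hub_slot: int,
                              clocks_per_instruction: int) -> float:
    """Instructions a cog executes between its hub turns under plain round-robin."""
    clocks_per_rotation = cogs * clocks_per_hub_slot   # one full trip around all cogs
    return clocks_per_rotation / clocks_per_instruction


# 16 cogs, assumed one clock per hub slot, two-clock instructions (from the post).
sixteen_cogs = instructions_per_hub_turn(16, 1, 2)

# Same assumptions with 32 cogs: each cog waits twice as long for the hub.
thirty_two_cogs = instructions_per_hub_turn(32, 1, 2)

print(sixteen_cogs, thirty_two_cogs)
```

Under the same assumptions, doubling to 32 cogs doubles the wait between hub turns, which is one way to see why a hub-cycle allocation scheme comes up for the larger cog count.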

We could implement 32 cogs best if we used a hub-cycle allocation array as someone suggested.

At any rate, this becomes an effort of REMOVING, not adding, as Phil likes to talk about. I had really forgotten how simple Prop1 cogs were. They are balanced little machines that can do just about anything, though not as fast as some things might require. It's easier to just add more of them than trying to soup them up.

cgracey · 2014-04-04 20:48

About that ugliness of using C to select which port the "WAITPEQ/WAITPNE D,S/#" uses:

We could select the port based on whether the D register is at an odd or even address. I'd need to make a general-purpose "align" directive for the assembler.
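A minimal sketch of how the odd/even idea and an "align" directive might behave, written in Python rather than PASM. Which parity maps to which port is not stated above, so the even = A, odd = B mapping is an assumption, and `waitpeq_port` and `align` are hypothetical names, not existing assembler features.

```python
def waitpeq_port(d_address: int) -> str:
    """Pick the port from the low bit of the D register address (assumed mapping)."""
    return "B" if d_address & 1 else "A"    # assumption: even -> port A, odd -> port B


def align(address: int, modulus: int, remainder: int = 0) -> int:
    """Return the next address >= address with address % modulus == remainder."""
    return address + ((remainder - address) % modulus)


# Place a mask register at an odd cog address so WAITPEQ would watch port B.
next_free = 0x1F4                            # hypothetical next free cog register
mask_for_port_b = align(next_free, 2, 1)     # bump to the next odd address

print(hex(mask_for_port_b), waitpeq_port(mask_for_port_b))
```

The cost of such a scheme is an occasional wasted register when the assembler has to skip one to reach the required parity.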
