We're looking at 5 Watts in a BGA! - Page 21 — Parallax Forums

We're looking at 5 Watts in a BGA!


Comments

  • bruceebrucee Posts: 239
    edited 2014-04-04 13:19
    Not as nice to develop for as the P1/P2, but definitely interesting for some uses.

    I won't argue with you there, as those parts are a real pain in the *** to program, but when you build lots of something you can invest the time. So for some quick and dirty projects the P2 or whatever (I would vote enhanced P1 with one or more COGs able to do hub-exec) it might find some low volume, non-power sensitive, non-cost sensitive applications.

    The big issue is WHEN? It is already way way too late to market. And where is the code when it gets there? I can buy most of the pieces in the ARM world or find many contributed ones. Like I say software sells hardware, and there ain't much of either for the P2.

    I too would shy away from that can of worms called Linux (as you need a Linux guru to do much of anything), but if you need network stacks, file systems, lots of video compress/decompress, you can pretty much get all that from some git hub somewhere.
  • jazzedjazzed Posts: 11,803
    edited 2014-04-04 13:22
    AAAARRRRGGGHHHHH!!!!

    Gentlemen,

    Seriously.

    Really.

    Would you kindly stop saying 3W-5W will be required to run a P2 until we have a better handle on it?

    ....
    I propose that every power-cutting measure be required to show HOW it will cut power utilization, backed up with references and numbers. No hand waving.


    When the number goes down, we will be greatly relieved.

    Meanwhile, you may continue waving your arms.

  • User NameUser Name Posts: 1,451
    edited 2014-04-04 13:58
    ARM chips are a PITA. I should know.

    They are cranked out too quickly and with too little care (and no love). Even the manufacturer's sample code often doesn't work.

    The chips are complicated, the tools are complicated, and the problems are complicated. Hair pulling is an occupational hazard.

    Just in the past two days I've had to deal with both SSI and power management issues where the official NXP documentation was wrong. But how can you blame them? The freaking manual is 1000 pages long! It was torture to create. It's torture to use.

    Give me a Propeller any day.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-04 14:06
    Ditching the features that you don't like will not reduce power consumption, and greatly limit the chip.
    Bill,
    Haven't you yet realised that this is precisely where the extra power consumption is being burned!
    The P1 cogs were lean and used very little power.
    Sorry Bill, but either you just don't get it or you are ignoring in an attempt to mislead.
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-04 14:11
    jmg wrote: »
    what part was that, and does it have full speed, or high speed usb. High speed usb seems to be an emerging feature.
    pic32mx795f512l
  • jmgjmg Posts: 15,159
    edited 2014-04-04 14:12
    cgracey wrote: »
    I just talked to OnSemi about where this whole project is headed.

    The understanding is that we need to determine how much power the P2 will take before we know if it's viable to build at 180nm. They said doing this at 65nm would cost between $1-2M and they get asked all the time to partner with hopefuls like Parallax, but that's not their business model.

    Their engineer got back to me after doing a test where the ALU inputs were kept from toggling. A cog's core power went from 700mW to 450mW, or down by 36%. So, with aggressive flop gating into the ALU, we could maybe reduce power by 30%. That's still 500mW per cog, or 4W, total, for just the core.

    Here are some numbers he gave me the other day:
    Core voltage   180MHz    100MHz
    1.8V           5.929W    3.757W
    1.5V           3.887W    2.462W

    We could probably get the 1.5V 100MHz case under 2W with aggressive flop gating in the cogs' ALUs. He's also setting up some memory considerations, which are going to give us a realistic idea of total core power.

    Interesting: relative to Pt = Sum(Cpd*Ft*Vc^2), those figures scale slightly less than proportionally with MHz, and slightly more than with Vcc^2, i.e. power in those sims goes as roughly Vcc ^ 2.315 when Vcc is dropped.

    It does show how scaling Vcore makes a very large difference.
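A quick way to check both scaling claims against the quoted table is sketched below. It is only a two-point fit along each axis, so treat the exponents as rough. The `power_w` dictionary and the `exponent` helper are names made up for this illustration, not anything from OnSemi or Parallax.

```python
import math

# Simulated core power in watts, copied from the quoted table,
# keyed by (core voltage in V, clock in MHz).
power_w = {
    (1.8, 180): 5.929,
    (1.8, 100): 3.757,
    (1.5, 180): 3.887,
    (1.5, 100): 2.462,
}


def exponent(p_hi, p_lo, x_hi, x_lo):
    """Return n such that p_hi / p_lo == (x_hi / x_lo) ** n."""
    return math.log(p_hi / p_lo) / math.log(x_hi / x_lo)


# Voltage exponent at a fixed clock: n == 2 would be the ideal C*F*V^2 law.
voltage_exp_180 = exponent(power_w[(1.8, 180)], power_w[(1.5, 180)], 1.8, 1.5)
voltage_exp_100 = exponent(power_w[(1.8, 100)], power_w[(1.5, 100)], 1.8, 1.5)

# Frequency exponent at a fixed voltage: n == 1 would be strictly proportional.
freq_exp_1v8 = exponent(power_w[(1.8, 180)], power_w[(1.8, 100)], 180, 100)
freq_exp_1v5 = exponent(power_w[(1.5, 180)], power_w[(1.5, 100)], 180, 100)

print(f"voltage exponent @180MHz: {voltage_exp_180:.3f}")
print(f"voltage exponent @100MHz: {voltage_exp_100:.3f}")
print(f"frequency exponent @1.8V: {freq_exp_1v8:.3f}")
print(f"frequency exponent @1.5V: {freq_exp_1v5:.3f}")
```

A frequency exponent below 1 suggests a part of the simulated power does not scale with clock, while a voltage exponent above 2 matches the "slightly over ^2" observation.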


    Memory numbers will be useful to add here.

    If we scale those earlier numbers, with the 30% case we get
    1.8V 180MHz 4W 500mW/COG @ 100% Profile, worst case, COG MHz = Timer MHz
    1.5V 100MHz 1.66W or 207mW/COG @ 100% Profile, worst case, COG MHz = Timer MHz

    A more typical usage case will be COG MHz = 50% Timer MHz, so that would make it worthwhile to have some tuning focus on pushing up timers margin, so as to allow the lowest possible Vcc for the Core.
    A Gated 50% is not going to allow as low a Vcc as an all-frequencies-lowered by 50% will.

    Taking that 100MHz 1.5V case, and running the hungry COG at 50%, and apply ~ 60% scale,
    we have
    1.5V 125mW/COG @ 100% Profile, worst case, 50 MOP COG, 100MHz Timer
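The per-COG figures above can be reproduced with a few lines of arithmetic. This sketch assumes the 1.5V 100MHz number was obtained by scaling the 4W gated estimate by the ratio of the two simulated table entries; that reading is an assumption, though it does land on the 1.66W quoted. All variable names are illustrative.

```python
COGS = 8

# From the quoted post: ~500 mW per cog after ~30% flop gating, 1.8V / 180MHz.
gated_core_w_1v8_180 = 0.500 * COGS

# Simulated (ungated) totals from the quoted table.
sim_w_1v8_180 = 5.929
sim_w_1v5_100 = 2.462

# Assumption: the gated estimate scales with voltage and clock the same way
# the ungated simulation does.
gated_core_w_1v5_100 = gated_core_w_1v8_180 * sim_w_1v5_100 / sim_w_1v8_180
per_cog_mw = gated_core_w_1v5_100 / COGS * 1000

# Running the COG at 50% of the timer clock, with the ~60% scale from the post.
half_rate_scale = 0.60
half_rate_cog_mw = per_cog_mw * half_rate_scale

print(f"1.5V 100MHz core total: {gated_core_w_1v5_100:.2f} W")
print(f"per COG at full rate:   {per_cog_mw:.0f} mW")
print(f"per COG at 50 MOPs:     {half_rate_cog_mw:.0f} mW")
```

The 60% factor is taken from the post as given; it is larger than 50% because gating the COG clock does not stop everything in the core from drawing power.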
  • jmgjmg Posts: 15,159
    edited 2014-04-04 14:19
    Cluso99 wrote: »
    The P1 cogs were lean and used very little power.

    We do NOT yet have Sims, of P1 COGs on OnSemi 180nm.

    Reality Check: Do not imagine that a 32 bit core, running 32 bit opcodes, will magically drop by x16(!), over another 32 bit core, running 32 bit opcodes. (one claim made already)

    Any power comparison also needs to compare a 50MHz P2, Lower Vcc, with a 100MHz, 2 Clock P1, just to match MOPs
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-04 14:21
    Chip,
    You mentioned adding hubexec for the P32X32B proposal. Would the 512KB hub be WIDE (8 longs wide) to support the single 8 long hubexec cache?

    A number of us have suggested at different times that not all cogs need to be equal. jazzed recently suggested hubexec mode for only one P1 cog.

    If my first statement is correct, how about Cog #0 only has hubexec and has a 4 * 8 long instruction cache with the LRU tags (like you did for P2) ? ie Cog #0 has an extra feature or two.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-04 14:25
    Ray,

    I never attempt to mislead, and I find it fascinating that you refuse to provide technical arguments backed by facts, and instead engage in personal attacks. Like you just did.

    In my experience, when people refuse to provide technical arguments, but engage in personal attacks, it is usually because they are:

    (a) incapable of providing technical arguments

    (b) know that they cannot provide valid technical arguments

    Funny how you are still avoiding making a technical argument!

    Please show how:

    1) tasks significantly increase power consumption over non-tasked code

    HOW?

    As you have consistently refused to say HOW tasks significantly increase power consumption, I am forced to hypothesize:

    1a) *IF* your argument is, four tasks allow you to do the work of four simple cogs, while using more power, then YOU are disingenuous by proposing four simpler cogs to replace them, which would use just as much power, possibly more due to more cog memory.

    1b) *IF* your argument is against pipelining, as it leads to higher performance which uses more power, then make the argument against pipelining, instead of providing a straw man argument against tasks.

    If the above are NOT your arguments, then explain HOW do tasks increase power utilization?

    2) hubexec significantly increases power consumption

    How?

    3) threads significantly increase power consumption

    How?

    All three recover wasted / idle time, so you can do something useful instead of waitpne or equivalent.

    I make arguments based on numbers and facts.

    You wave your hands and say "they use a lot more power".

    Prove it.

    What appears to be happening is that with the eight cogs running full blast, with all logic firing (which is unrealistic), a P2 *may* use up to 5W according to simulations. That's all the transistors in the cogs.

    The tasks/hubexec/threads are a small fraction of the transistors of a cog - Chip would have the exact numbers.

    Because it suits your philosophical bent towards simpler cogs, you are painting tasks/hubexec/threads as the "evil features" that cause all that excessive power usage, when in fact the blamed features are a very small percentage of the logic.

    Your argument is faulty.

    Now if you can show that hubexec/tasks/threads consume 30% of the logic, you may have a point. But they don't.


    Cluso99 wrote: »
    Bill,
    Haven't you yet realised that this is precisely where the extra power consumption is being burned!
    The P1 cogs were lean and used very little power.
    Sorry Bill, but either you just don't get it or you are ignoring in an attempt to mislead.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-04 14:26
    jmg wrote: »
    .....
    Reality Check: Do not imagine that a 32 bit core, running 32 bit opcodes, will magically drop by x16(!), over another 32 bit core, running 32 bit opcodes. (one claim made already)
    .....
    Chip has stated this fact as the reason that the P2 cogs are consuming so much additional power over P1 cogs - please go back and re-read his posts.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-04 14:33
    1) link please

    2) that is at most a hypothesis, and not a fact
    Cluso99 wrote: »
    Chip has stated this fact as the reason that the P2 cogs are consuming so much additional power over P1 cogs - please go back and re-read his posts.
  • jmgjmg Posts: 15,159
    edited 2014-04-04 14:34
    pedward wrote: »
    I really think that 1MB of ram is the right target for a P1B because there are so many applications where the P1 is fast enough, but doesn't have enough ram to do a job. Video will require big RAM, datalogging/high speed sampling requires big RAM, heck, just complex programs require big ram.

    Agreed, and it opens markets the other players cannot reach either.

    I think the practical number of P1 COGs will be determined by Power-Envelope, not by die-area.

    This already has P2 PLL and ADC/DAC, and I think it would need P2 Timers as well, and probably some P2 SerDes.
  • jmgjmg Posts: 15,159
    edited 2014-04-04 14:35
    Cluso99 wrote: »
    Chip has stated this fact as the reason that the P2 cogs are consuming so much additional power over P1 cogs - please go back and re-read his posts.

    Link for OnSemi Sim results for P1 COGs please ?
  • Roy ElthamRoy Eltham Posts: 3,000
    edited 2014-04-04 14:50
    Chip's posts about P1 cogs in the 180nm process have largely all been about LE counts and die area, and not about power usage. The P1 was done in a 300nm process (if I recall right) that is not comparable to the OnSemi 180nm process.

    For all we know, the P1 cogs built on OnSemi's 180nm process could draw much more power than they do in the 300nm process we have now. You can't make any assumptions here.
  • jmgjmg Posts: 15,159
    edited 2014-04-04 14:56
    Mike Green wrote: »
    Power switching helps, but the P2, even if it were functionally similar to the P1, would not be able to compete on idle or wait state power consumption.

    A P2 running the same 20 MOPs as a P1, with the Vcc adjusted to 20MHz-run, will come in similar/below a P1. ( Static Icc, P1 easily wins, with P2 at about 1mA)
  • cgraceycgracey Posts: 14,133
    edited 2014-04-04 14:58
    User Name wrote: »
    ARM chips are a PITA. I should know.

    They are cranked out too quickly and with too little care (and no love). Even the manufacturer's sample code often doesn't work.

    The chips are complicated, the tools are complicated, and the problems are complicated. Hair pulling is an occupational hazard.

    Just in the past two days I've had to deal with both SSI and power management issues where the official NXP documentation was wrong. But how can you blame them? The freaking manual is 1000 pages long! It was torture to create. It's torture to use.

    Give me a Propeller any day.


    Thanks for posting this.
  • Kerry SKerry S Posts: 163
    edited 2014-04-04 14:58
    jmg wrote: »
    Link for OnSemi Sim results for P1 COGs please ?

    That was based on Chip's Verilog experiment with a P1 cog where it was 1/16 the size of a P2 cog and thus he extrapolated the power curve (as I recall).

    Unfortunately that is not a real comparison as there are changes that need to be made, SERDES, Analog, some Math, along with some quality of life instructions and in the end we might get an updated P1 cog at 1/4 of the current P2 cog but at a high functional cost.

    That is where your P2 @ 50MHz vs new P1 @ 100MHz comparison would be good to see real data for. Closest thing to Apples2Apples we could get.

    While Chip seemed to think that hardware TASKS and THREADING were 'nearly free' when he implemented them it would be interesting to see just how much that complicated the cogs.

    The guy is a wizard with this stuff and it will be interesting to see what he can do with power management on the P2 cog... I have a lot of faith in him and really believe he can get real world power usage down to something reasonable (for the power we are being granted). Even if it does require a few pages in the P2 design manual. That STMicro ARM has 30 pages on power profiles, modes and management!
  • bruceebrucee Posts: 239
    edited 2014-04-04 14:59
    I think power has been an issue all along, earlier estimates for core power ran up to 1.5A. That is 3W and I asked at the time whether that was for real. The answer was maybe. For me that looked like a deal killer, this is supposed to be a micro controller after all.

    Just seems like now reality is striking.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-04 15:02
    Bill & jmg,
    Chip has spoken with OnSemi and realises precisely where and why the P2 cogs are consuming so much power. That is why he is looking at gating the D & S paths.

    You can go back and read his posts as I am sick of the biased posts. You don't need numbers to prove anything - the proof is in the pudding.

    I told you the complex P2 instruction core was using the power but you didn't believe me (as if there was somewhere else that was magically using the extra power) - there are really only P2 cogs, and rams and I/O in the whole chip and the rams had been excluded. OnSemi and Chip have now confirmed what I already knew - there was nowhere else it could possibly be.

    Now, sure I would like to know what a P1 cog will use, but as Chip said, it's absolutely nowhere near a P2 cog.
    If you are happy to have variable power scales on the P2 then the same logic should apply to P32X32B - apples to apples!

    Next, doing a P32X32B shuttle run could be done much faster than a P2 shuttle run. The P32X32B shuttle would prove the OnSemi process and give Parallax an immediate sales opportunity. Quite likely this diverted time would be recovered by the fact that while the shuttle was in progress, the P2 work could be continued (USB/SERDES plus power reduction in the cogs). Chip and Ken have both given the impression that a lot of FPGA (P2) testing needs to be done before a shuttle run is attempted - I could be wrong but it is my impression and it makes sense too. Chip has said P2 will still happen.

    So, I am all for a P32X32B now with as much hub as possible 512KB+. There are niches where this will be untouchable.

    When the P2 comes, it would then be the 3rd Parallax designed Chip which also brings credibility too.

    Just because I pushed for the P1B options to be explored didn't mean that is what I hoped for. I really hoped for the P2 problems to be solved. However, 24 hours ago Chip's comments (and others) totally changed my mind. I see the P32X32B not only being the better option, I see more markets for it and hence volume than P2, and I see little delays, if any, to the P2 production providing Parallax can fund both. Certainty is better than Blue Sky, and the P32X32B is a certainty at this time. (certainty meaning that it will work if that option is chosen)

    I will build boards for the P32X32B the moment Chip releases pinouts and package. So I will have pcbs for the shuttle chips. I am this confident that it will work.
  • jmgjmg Posts: 15,159
    edited 2014-04-04 15:16
    The solution I propose for sharing the 512KB hub bandwidth is as follows:
    a 64 entry five bit table, and six bit counter incremented each clock cycle.
    The hub slot is assigned by the content of the table, indexed by the counter.
    Allows TOTAL flexibility in allocating slots in a deterministic fashion.

    Nice idea, an array table has some logic cost, but it is very easy to understand, and very granular. Gives the Pgmr control.

    I would also include in this a No-COG choice ( 5 bits allows that for-free for 16..31 COGs), for power profile control.

    Good allocation of hub bandwidth will be critical, and most Apps I can think of will need quite asymmetric bandwidth,
    The old 1:8 or 1:16 will just not cut it, but an Array table can default to that very easily, so is 100% backward compatible too.
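The table-driven slot scheme quoted above can be modelled in a few lines. This is a behavioural sketch only, not anything from Chip's Verilog: the `NO_COG` code, the helper names and the example tables are all assumptions made for illustration. It shows a 64-entry table indexed by a 6-bit counter, the plain round-robin default, and an asymmetric allocation with idle slots.

```python
TABLE_SIZE = 64   # 64 entries, indexed by a 6-bit counter
NO_COG = 31       # hypothetical 5-bit code meaning "nobody gets this slot"


def round_robin_table(num_cogs):
    """Default table: behaves like the old fixed 1:num_cogs rotation."""
    return [i % num_cogs for i in range(TABLE_SIZE)]


def hub_owner(table, clock):
    """Return which cog owns the hub on a given clock cycle."""
    return table[clock & (TABLE_SIZE - 1)]  # the 6-bit counter wraps at 64


def slot_share(table):
    """Return the fraction of hub slots each cog receives."""
    counts = {}
    for cog in table:
        if cog != NO_COG:
            counts[cog] = counts.get(cog, 0) + 1
    return {cog: n / TABLE_SIZE for cog, n in counts.items()}


# Backward-compatible default for 16 cogs: every cog gets 1 slot in 16.
default_table = round_robin_table(16)

# Asymmetric example: cog 0 gets every other slot, cog 1 every fourth,
# and the remaining quarter of the slots are left idle.
asymmetric_table = [0, 1, 0, NO_COG] * (TABLE_SIZE // 4)

print(slot_share(default_table)[0])
print(slot_share(asymmetric_table))
print([hub_owner(asymmetric_table, clk) for clk in range(8)])
```

Because the table repeats every 64 clocks, each cog's hub timing stays deterministic whatever allocation is loaded.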
    Is it really smart to change designs at the last minute, over unproved power envelope fears?

    I doubt any design change decisions will be made before power envelope data for P2 is fully defined.
    I also doubt any serious P1E work will be done before power envelope data for P1E COGs is fully defined.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-04 15:21
    Ray,

    Of course the P2 cogs use a lot more power - I never dis-believed that.

    What I had a problem with - and still do - is your unsupported assertion that all the extra power is due to tasks/hubexec/threads, and your refusal to make technical arguments.

    A P1 cog, running at 160MHz clock frequency, pipelined, at 180nm, will use a lot more power than at 20MHz non-pipelined frequency @ 300nm. Fact. No one disputed that, except perhaps yourself. I think this is where the major power difference is.

    Had your argument been that pipelining, that is one instruction per clock versus four clocks per instruction, was a cause of a major difference, I would not have argued, as I do believe pipelining is a major cause of the thus far unquantified power difference.

    That would have been a technical argument. Just trying to assert that it was tasks/hubexec/threads is NOT a technical argument.

    A P2 cog has a lot more LE's. The extra power usage comes from how many of those gates toggle at what frequency. They toggle faster for a pipelined cog than a non-pipelined four clocks per instruction cog. Duh.

    If the gates do not toggle often, they do not have as much effect.

    My point was, that the percentage of gates ascribed to tasks/hubexec/threads was insufficient to explain the power utilization difference between a 300nm 20MHz P1 cog and a 160MHz P2 cog, and that we do not have an estimate for the power utilization of a 180nm P1 cog on the OnSemi process.

    Yet you still go on with personal, instead of logical or technical attacks.

    "I am sick of the biased posts"

    Would you care to show how the posts are biased?

    Amazing how I still refrain from personal attacks, and keep my arguments at a technical level. When others don't.
    Cluso99 wrote: »
    Bill & jmg,
    Chip has spoken with OnSemi and realises precisely where and why the P2 cogs are consuming so much power. That is why he is looking at gating the D & S paths.

    You can go back and read his posts as I am sick of the biased posts. You don't need numbers to prove anything - the proof is in the pudding.

    I told you the complex P2 instruction core was using the power but you didn't believe me (as if there was somewhere else that was magically using the extra power) - there are really only P2 cogs, and rams and I/O in the whole chip and the rams had been excluded. OnSemi and Chip have now confirmed what I already knew - there was nowhere else it could possibly be.

    Now, sure I would like to know what a P1 cog will use, but as Chip said, it's absolutely nowhere near a P2 cog.
    If you are happy to have variable power scales on the P2 then the same logic should apply to P32X32B - apples to apples!

    Next, doing a P32X32B shuttle run could be done much faster than a P2 shuttle run. The P32X32B shuttle would prove the OnSemi process and give Parallax an immediate sales opportunity. Quite likely this diverted time would be recovered by the fact that while the shuttle was in progress, the P2 work could be continued (USB/SERDES plus power reduction in the cogs). Chip and Ken have both given the impression that a lot of FPGA (P2) testing needs to be done before a shuttle run is attempted - I could be wrong but it is my impression and it makes sense too. Chip has said P2 will still happen.

    So, I am all for a P32X32B now with as much hub as possible 512KB+. There are niches where this will be untouchable.

    When the P2 comes, it would then be the 3rd Parallax designed Chip which also brings credibility too.

    Just because I pushed for the P1B options to be explored didn't mean that is what I hoped for. I really hoped for the P2 problems to be solved. However, 24 hours ago Chip's comments (and others) totally changed my mind. I see the P32X32B not only being the better option, I see more markets for it and hence volume than P2, and I see little delays, if any, to the P2 production providing Parallax can fund both. Certainty is better than Blue Sky, and the P32X32B is a certainty at this time. (certainty meaning that it will work if that option is chosen)

    I will build boards for the P32X32B the moment Chip releases pinouts and package. So I will have pcbs for the shuttle chips. I am this confident that it will work.
  • jmgjmg Posts: 15,159
    edited 2014-04-04 15:23
    Cluso99 wrote: »
    You don't need numbers to prove anything - the proof is in the pudding.

    Strange contradiction, in engineering, the 'pudding' is the numbers. (Which you admit we do not actually have yet).
  • Dave HeinDave Hein Posts: 6,347
    edited 2014-04-04 15:31
    In post #385 Chip stated that P2 uses about 15.8 times as many LEs as P1. The P1 datasheet shows that the Prop draws about 80mA with all 8 cogs running a Spin REPEAT at 80 MHz. At 3.3V this equals 264mW. 15.8 times this is 4.2W. At 160MHz the power dissipation would be twice as high, or 8.4W. If P1 uses 300nm, then I think this would scale to 8.4*(180/300)**2, which equals 3W, which is in the ballpark of 5W.
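That back-of-envelope chain is easy to replay step by step. The sketch below just restates the numbers from the post; the 3.3V supply is inferred from 264mW / 80mA, and the assumption that power scales with the square of the feature size is the post's own rough guess, not a process model.

```python
# Inputs taken from the post above.
p1_current_a = 0.080      # all 8 cogs running Spin REPEAT at 80 MHz
p1_supply_v = 3.3         # inferred: 264 mW / 80 mA
le_ratio = 15.8           # P2 logic elements relative to P1
p1_clock_mhz = 80
p2_clock_mhz = 160
p1_process_nm = 300       # as assumed in the post
p2_process_nm = 180

p1_power_w = p1_current_a * p1_supply_v

# Step 1: scale by logic size at the same clock.
p2_power_same_clock_w = p1_power_w * le_ratio

# Step 2: scale linearly with clock frequency.
p2_power_fast_w = p2_power_same_clock_w * p2_clock_mhz / p1_clock_mhz

# Step 3: rough process scaling, proportional to feature size squared.
p2_power_scaled_w = p2_power_fast_w * (p2_process_nm / p1_process_nm) ** 2

print(f"P1 measured:          {p1_power_w * 1000:.0f} mW")
print(f"x{le_ratio} logic:         {p2_power_same_clock_w:.1f} W")
print(f"at {p2_clock_mhz} MHz:           {p2_power_fast_w:.1f} W")
print(f"scaled to {p2_process_nm} nm:     {p2_power_scaled_w:.1f} W")
```

Changing `le_ratio` shows directly why the factor of 15.8 dominates: every later step is just a multiplier on top of it.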

    The real killer is the factor of 15.8. It's doubtful that removing any one feature would reduce this significantly. As I stated earlier, if you want to whack away at the P2 features then the resource requirements of each feature should be determined. The features that use the most transistors, and are the least useful should be the first to go.

    Personally, I think we should keep the P2 the way it is, and follow the proposal that I stated twice before.
  • jmgjmg Posts: 15,159
    edited 2014-04-04 15:34
    potatohead wrote: »
    BTW, at 80Mhz, Video and SDRAM could be one COG, so long as one does not need an insane fill rate.

    Using Chips latest OnSemi figures, I get this estimate for that single P2 COG, (100MHz peripheral CLK, 50MOP P2 COG.)

    1.5V 125mW/COG @ 100% Profile, worst case, 50 MOP P2 COG, 100MHz Timer
  • bruceebrucee Posts: 239
    edited 2014-04-04 15:37
    Might I make a suggestion (everybody else is). I haven't dug into what you guys have been calling multi-threading, but I assume it was similar to what Echelon used to have 4 CPUs. Basically it uses the pipeline to run 4 instruction streams thru the ALU on separate clocks. So you get the effect of 4 independent yet predictable code streams.

    As you have the multi threading make it optional and drop back to 4 COGS, which could still mean up to 16 virtual CPUs. And increase the RAM, as that still IMHO is the real bottleneck. And you can let 1 or 2 COGs run at full speed (turn off threading)
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-04 15:38
    We have been calling that hardware tasks.

    I think a 4 cog P2 would be better than a 16 cog P1 derivative.
    brucee wrote: »
    Might I make a suggestion (everybody else is). I haven't dug into what you guys have been calling multi-threading, but I assume it was similar to what Echelon used to have 4 CPUs. Basically it uses the pipeline to run 4 instruction streams thru the ALU on separate clocks. So you get the effect of 4 independent yet predictable code streams.

    As you have the multi threading make it optional and drop back to 4 COGS, which could still mean up to 16 virtual CPUs. And increase the RAM, as that still IMHO is the real bottleneck. And you can let 1 or 2 COGs run at full speed (turn off threading)
  • LawsonLawson Posts: 870
    edited 2014-04-04 15:53
    cgracey wrote: »
    I just talked to OnSemi about where this whole project is headed.

    The understanding is that we need to determine how much power the P2 will take before we know if it's viable to build at 180nm. They said doing this at 65nm would cost between $1-2M and they get asked all the time to partner with hopefuls like Parallax, but that's not their business model.

    Their engineer got back to me after doing a test where the ALU inputs were kept from toggling. A cog's core power went from 700mW to 450mW, or down by 36%. So, with aggressive flop gating into the ALU, we could maybe reduce power by 30%. That's still 500mW per cog, or 4W, total, for just the core.

    Here are some numbers he gave me the other day:
    Core voltage   180MHz    100MHz
    1.8V           5.929W    3.757W
    1.5V           3.887W    2.462W


    We could probably get the 1.5V 100MHz case under 2W with aggressive flop gating in the cogs' ALUs. He's also setting up some memory considerations, which are going to give us a realistic idea of total core power.

    Let's see what he comes back with.

    Just to add fuel to the fire. The current Propeller happily boots at 1.5V and looks to be able to run 20MHz down at 1.0-1.1 volts. Prop Limbo, post #29 in particular. Can you toss a 1.0 and 1.2 volt Vcc into the next batch of sims? While the power supply will be a bit of a pain, 80-100MHz and 1.5W or less is not a bad spot to be with the P2's feature set.

    Marty
  • Roy ElthamRoy Eltham Posts: 3,000
    edited 2014-04-04 15:55
    I'd love a P1 with 64 I/O pins, more hub memory, more cogs, and higher clock rate (which all seem reasonably doable). That would unlock the P1 cog's potential dramatically. However, I don't want that instead of the P2.
    If Parallax can get a P1B out quickly and within budget, AND they feel it would provide enough revenue to meet their needs, then it could allow them to finish the P2 in a more comfortable time frame and without needing to sacrifice things along the way. It's more than just a technical problem, and only Ken and Parallax can properly determine that.
  • jmgjmg Posts: 15,159
    edited 2014-04-04 16:23
    Lawson wrote: »
    Just to add fuel to the fire. ... Can you toss a 1.0 and 1.2 volt Vcc into the next batch of sims? While the power supply will be a bit of a pain,

    as well as 1.8 thru 1.0V sims, also getting the indicated MHz capability at those voltage would help define the Vcc/MHz curve, in addition to the Vcc/Power one.

    Power supply is not really a pain, most switching regulators these days have references of 1v or below.
    eg
    http://diodes.com/catalog/buck_converter_57/ap65400.html (2500 $0.5600)
  • RossHRossH Posts: 5,436
    edited 2014-04-04 16:33
    cgracey wrote: »
    I'm thinking more than 16 cogs is excessive for only 64 I/O pins. So, what about this:


    100 pin 14x14mm exposed thermal pad TQFP package (Tja=20) with internal down-bonds to GND, so no pins needed for GND

    16 x 1.8V VDD pins, at four per side, with internal down-bonds for GND
    XI, XO, RESn, BOEn pins
    64 I/O pins with a unique 3.3V VDD pin for every 4 pins (and internal down-bond for GND) - this is important for analog and high-speed switching
    (that makes 100 pins, not including 32 internal down-bonds for GND)


    16 two-clocks-per-instruction Prop1 cogs
    256KB hub RAM with simple round-robin cog access - this maintains the same 8:1 instruction:hub-cycle ratio as the current Prop1
    200MHz clock - cogs run at 100 MIPS, CTRs at 200 MHz
    1600 total MIPS - 10x faster than current Prop1 chip

    Center of die is only 4.3 x 4.3mm. Die is 5.4 x 5.4mm with pad frame, or 29.16 square mm (54% of current projected Prop2 die size)


    This chip could be called P16X32B, as per current convention.

    Manufacturing cost would be ~$1.90, assuming $950/wafer, 950 die per wafer, 80% yield, $0.50 per package, $0.15 total testing. 1K piece price would be ~$6.00.

    This chip would behave like Prop1 with twice the cogs, all running 5x faster, and with 8x the hub RAM, plus 2x the I/O pins with new analog capabilities.

    This would be really easy to make happen and the chance of latent bugs would be very low.
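The cost and throughput figures in the quoted proposal can be checked with simple arithmetic. The sketch below uses only the numbers from the quote; the variable names are illustrative, and the model (wafer cost spread over good die, plus package and test) is an assumption about how the ~$1.90 was derived, although it does reproduce it.

```python
# Manufacturing inputs from the quoted proposal.
wafer_cost_usd = 950.0
die_per_wafer = 950
yield_fraction = 0.80
package_cost_usd = 0.50
test_cost_usd = 0.15

# Only the good die carry the wafer cost.
good_die = die_per_wafer * yield_fraction
die_cost_usd = wafer_cost_usd / good_die
unit_cost_usd = die_cost_usd + package_cost_usd + test_cost_usd

# Throughput inputs from the quoted proposal.
cogs = 16
clock_mhz = 200
clocks_per_instruction = 2
mips_per_cog = clock_mhz // clocks_per_instruction
total_mips = cogs * mips_per_cog

# Die size including the pad frame.
die_side_mm = 5.4
die_area_mm2 = die_side_mm ** 2

print(f"cost per good die: ${die_cost_usd:.2f}")
print(f"unit cost:         ${unit_cost_usd:.2f}")
print(f"MIPS per cog:      {mips_per_cog}")
print(f"total MIPS:        {total_mips}")
print(f"die area:          {die_area_mm2:.2f} mm^2")
```

Against the ~$6.00 1K price in the quote, the unit cost leaves a little over $4 per chip before any development or overhead costs are counted.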

    Hi Chip,

    Going back through this thread, it is not surprising that this post (by you) looks like the most practical and thoroughly thought out proposal (even including pricing!) that has been suggested. It also happens to be very close to what many people originally wanted out of the P2.

    One thing this thread shows is that while there is a real hunger for the "bells and whistles" P2, there is also a significant pent-up demand for a simpler P16X32B. And I don't think the one undermines the other. In fact, I think a chip like the P16X32B could both help Parallax fund the P2, and also significantly enhance interest in the P2 when it does finally arrive, by rekindling interest in symmetrical multiprocessing microcontrollers (the kind of interest the P1 had initially, and which got many of us here interested in the first place!).

    If you believe that this P16X32B would be low technical risk, quick and easy to make happen, and (importantly) not delay the work on the P2 - i.e. if it could be done during the current hiatus in P2 development - then why not continue to develop the P2 using normal Parallax funding, but develop the P16X32B using some kind of reward-based (not equity-based) crowd-funding model. This would make it cost-neutral to Parallax.

    Just set up a crowdsourcing site, specifying the target that we would need to reach to make it happen, and - importantly - a date by which the target would have to be achieved to avoid impacting the P2 development, and see what happens.

    Let people vote with their wallets. I'd be willing to commit some $$$ to it, and I'm sure there are many others who would also.

    If the target is reached, the chip goes ahead.

    Ross.
This discussion has been closed.