We're looking at 5 Watts in a BGA! - Page 15 — Parallax Forums

We're looking at 5 Watts in a BGA!


Comments

  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 20:11
    cgracey wrote: »
    whatever we do, we keep the analog-rich i/o pins that we've already designed. They make all kinds of wild things possible.
    fantastic !!!
  • RossHRossH Posts: 5,458
    edited 2014-04-03 20:15
    Cluso99 wrote: »
    How about 4 x P1 with 64KB (8 cogs, 256KB) and 4 x 32bit pathways (looks like I/O ports) between P1s ?
    Use Dual Port Cog to get some speed.

    Or have 32 "analog rich" I/O pins from each P1 (port A), plus 32 pins on a common internal bus between all 4 P1s (port B).

    That would rock!

    Ross.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 20:15
    Chip,
    What are the chances of using OnSemi's RAMs instead of yours?
  • jmgjmg Posts: 15,172
    edited 2014-04-03 20:15
    cgracey wrote: »
    Whatever we do, we keep the analog-rich I/O pins that we've already designed. They make all kinds of wild things possible.

    Phil, they have internal clocking in them, if you want it, for super-low jitter. You would approve, I'm quite sure.

    Have these Cells (& PLLs etc) been OnSemi process proven ? (ie on a shuttle run, while the main design is being done ?)
  • msrobotsmsrobots Posts: 3,709
    edited 2014-04-03 20:15
    can anybody close the door to the opium den, please?

    Enjoy!

    Mike
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2014-04-03 20:16
    Cluso99 wrote:
    How about 4 x P1 ...
    'Similar to my proposal. I think it's important to keep the pin count (hence, package size) manageable by limiting the total physical I/Os to 64.

    -Phil
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 20:17
    RossH wrote: »
    Or have 32 "analog rich" I/O pins from each P1 (port A), plus 32 pins on a common internal bus between all 4 P1s (port B).

    That would rock!

    Ross.
    Sure would!

    In between the frenetic posts here, Chip said we will get the new analog pins from P2 WTG :)
  • jmgjmg Posts: 15,172
    edited 2014-04-03 20:19
    Cluso99 wrote: »
    Postedit:
    Wouldn't this still be high power? But perhaps some lesser cogs could work.

    Yes, it could even be higher power, as there is less of each COG unused/idling, and there are now
    8 TIMES as many COGs.

    Even a seemingly meagre 80mW per COG, @64 COGS, will blow past that 5W we are all talking about.
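The arithmetic behind that remark can be checked in a few lines. This is only a back-of-envelope sketch using the figures quoted in the post; the variable names are made up for illustration, and real power would depend on clock rate, toggle rates and memory, none of which are modelled here.

```python
# Rough power budget: per-cog power times cog count, compared with the 5 W figure.
MILLIWATTS_PER_COG = 80   # figure quoted in the post
COG_COUNT = 64            # cog count quoted in the post
BUDGET_WATTS = 5.0        # the 5 W figure from the thread title

total_watts = MILLIWATTS_PER_COG * COG_COUNT / 1000
over_budget = total_watts > BUDGET_WATTS
print(f"{total_watts:.2f} W total, over budget: {over_budget}")
```

On these numbers the cogs alone come to 5.12 W, before any I/O or hub memory is counted.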
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2014-04-03 20:20
    RossH wrote:
    ... plus 32 pins on a common internal bus between all 4 P1s (port B).

    I never liked the port B idea, vis-a-vis waitpeq/pne, since it relies on the carry flag. Better to occupy that address space with an additional counter and use hub-centric mailboxes/locks for inter-hub comms.

    -Phil
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 20:20
    'Similar to my proposal. I think it's important to keep the pin count (hence, package size) manageable by limiting the total physical I/Os to 64.

    -Phil
    Reading between the lines, I think Chip wants to keep the QFP128 package and the Pin Layout Beau has done. So it would be 92 I/O pins :)
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 20:23
    I never liked the port B idea, vis-a-vis waitpeq/pne, since it relies on the carry flag. Better to occupy that address space with an additional counter and use hub-centric mailboxes/locks for inter-hub comms.

    -Phil
    I don't like the carry flag use for PortB.

    But a 32bit register like PortB is nice for direct intercog comms without waiting for hub cycles.
  • RossHRossH Posts: 5,458
    edited 2014-04-03 20:31
    Cluso99 wrote: »
    Sure would!

    In between the frenetic posts here, Chip said we will get the new analog pins from P2 WTG :)

    Introducing the new Parallax "Quad" Propeller - a true "symmetric multiprocessing system on a chip" - 3,200 MIPS in a low-power microcontroller form factor, 64 "analog rich" I/O pins, 32 processors and 1Mb RAM (4 cores, each with 256Kb RAM and 8 32-bit symmetric microprocessors), with all cores sharing a 32 bit bus. Support for "programmable peripherals" using any of the 64 I/O pins, such as video, UART, I2C, SPI etc.
    :lol:
  • AribaAriba Posts: 2,690
    edited 2014-04-03 20:32
    How about a mix of P1 and P2 cogs?
    For example, 2 x P2 cogs and 12 x P1 cogs on one chip. P2 cogs access hub every 8 cycles, P1 cogs every 16 cycles.

    We would have the brilliant new video generator, the SDRAM access, and HubExec on two cogs.
    And we would have P1 compatibility (Spin1, OBEX objects) on the other 12 cogs.

    The chip may even be smaller than a P1 chip. I think the P1 cogs should add the multiplier instructions.

    Andy
  • jmgjmg Posts: 15,172
    edited 2014-04-03 20:37
    Cluso99 wrote: »
    Reading between the lines, I think Chip wants to keep the QFP128 package and the Pin Layout Beau has done. So it would be 92 I/O pins :)

    That sounds like 96 COGs then ? ;)
  • Peter JakackiPeter Jakacki Posts: 10,193
    edited 2014-04-03 20:39
    cgracey wrote: »
    We could run 128 Prop1 cogs at 200MHz, for 50 MIPS, each. That would be 6,400 MIPS, total.

    We could maybe even do a two-clock version (which I've already had working) using a dual-port 256x32 cog RAM, for 100 MIPS per cog. That could yield 12,800 MIPS. That's 10x the MIPS of a 160MHz Prop2, albeit "lesser" MIPS with half the cog-RAM size.

    We could do a 64-cog version, COG-per-pin for DAC, 100 MIPS per cog, 512K hub RAM version.

    These would all fit in the current die area and be quick to finish.

    Ohh, 64 cogs, 512k RAM, I do love the sound of that very much :)
    and QFP128 is just fine by me
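The MIPS figures in the quoted post follow from cogs x clock / clocks-per-instruction. A small sketch, assuming every cog runs flat out with no hub stalls, and assuming the Prop2 baseline is 8 cogs at one clock per instruction (the cog count is not stated in this post); `total_mips` is a hypothetical helper, not anything from Parallax:

```python
def total_mips(cogs, clock_mhz, clocks_per_instruction):
    """Aggregate MIPS, assuming no cog ever stalls."""
    return cogs * clock_mhz / clocks_per_instruction

four_clock = total_mips(128, 200, 4)    # 50 MIPS per cog
two_clock = total_mips(128, 200, 2)     # 100 MIPS per cog, dual-port cog RAM
sixty_four = total_mips(64, 200, 2)     # the 64-cog, cog-per-pin variant

# Assumption: Prop2 baseline of 8 cogs, one clock per instruction, 160 MHz.
prop2 = total_mips(8, 160, 1)
ratio = two_clock / prop2

print(four_clock, two_clock, sixty_four, prop2, ratio)
```

Under those assumptions the two-clock, 128-cog case works out to ten times the Prop2 figure, which matches the "10x" in the quote.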
  • FredBlaisFredBlais Posts: 370
    edited 2014-04-03 20:46
    Wow, that thread escalated quickly!
  • jmgjmg Posts: 15,172
    edited 2014-04-03 20:47
    Ariba wrote: »
    How about a mix of P1 and P2 cogs?
    For example, 2 x P2 cogs and 12 x P1 cogs on one chip. P2 cogs access hub every 8 cycles, P1 cogs every 16 cycles.

    We would have the brilliant new video generator, the SDRAM access, and HubExec on two cogs.
    And we would have P1 compatibility (Spin1, OBEX objects) on the other 12 cogs.

    The chip may even be smaller than a P1 chip. I think the P1 cogs should add the multiplier instructions.

    Andy

    Asymmetric cores do make sense, and given the area ratios, the P1 and P2 are certainly Asymmetric.

    P2 COGs deliver high-end Maths, Tasks, HUBEXEC, SDRAM, SerDes and better timers.

    Some of the Better timers and SerDes might make it over into P1 COGS too ?
  • jmgjmg Posts: 15,172
    edited 2014-04-03 20:53
    cgracey wrote: »
    I explained to the engineer that the S and D flops change on every clock, while other flops could be considered to toggle at a 20% rate.

    Wouldn't random data be an average of 50% toggle rate, as half the time any cell will not change state ?
  • cgraceycgracey Posts: 14,139
    edited 2014-04-03 20:54
    Cluso99 wrote: »
    Chip,
    What are the chances of using OnSemi's RAMs instead of yours?


    We're there, already. Their RAMs will work better with their design flow. The area cost for Prop2 is 2 square mm more than our own RAMs, since they don't have an efficient 3R1W RAM, but must use three 1R1W RAMs. This means that OnSemi must just build a giant square of logic and RAM that will hook up to our pad frame. This is WAY simpler than before.
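For readers wondering how three 1R1W RAMs can stand in for one 3R1W RAM: the usual trick is to mirror every write into all three copies and give each read port its own copy. The post does not say this is exactly what OnSemi will do, so treat the following as a behavioural sketch only; the class names and the 512-word depth are illustrative assumptions.

```python
class OneReadOneWriteRam:
    """Hypothetical model of a 1R1W RAM: one read port, one write port."""

    def __init__(self, depth):
        self.cells = [0] * depth

    def write(self, address, value):
        self.cells[address] = value & 0xFFFFFFFF  # 32-bit words

    def read(self, address):
        return self.cells[address]


class ThreeReadOneWriteRam:
    """3R1W behaviour from three 1R1W copies: every write is mirrored."""

    def __init__(self, depth):
        self.copies = [OneReadOneWriteRam(depth) for _ in range(3)]

    def write(self, address, value):
        # The single write port drives all three copies at once.
        for copy in self.copies:
            copy.write(address, value)

    def read(self, port, address):
        # Each read port (0, 1 or 2) has a private copy, so three reads
        # at different addresses never contend with each other.
        return self.copies[port].read(address)


ram = ThreeReadOneWriteRam(depth=512)
ram.write(5, 0x12345678)
ram.write(9, 0xCAFEF00D)
reads = (ram.read(0, 5), ram.read(1, 9), ram.read(2, 5))
```

The cost is three times the storage, which is consistent with the extra silicon area mentioned above.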
  • cgraceycgracey Posts: 14,139
    edited 2014-04-03 20:55
    jmg wrote: »
    Have these Cells (& PLLs etc) been OnSemi process proven ? (ie on a shuttle run, while the main design is being done ?)


    They've been proven at TSMC, but will be tweaked for OnSemi. This is all SPICE-level work, so outcomes are pretty certain.
  • jmgjmg Posts: 15,172
    edited 2014-04-03 20:57
    cgracey wrote: »
    No memory power considerations yet, though we determined today that we could use their memories, instead of our own, and it would cost an additional 2 square mm of silicon, since they would have to build the 3-read-port/1-write-port cog RAM out of three separate 1-read/1-write port RAMs.

    Does that mean a 2 clock P2 COG could shrink the size of the COG ram, making maybe 100 MOP P2 COGS, with 200 MHz timers just possible ? (and easing the power envelope at the same time ? )
  • cgraceycgracey Posts: 14,139
    edited 2014-04-03 20:58
    Cluso99 wrote: »
    Reading between the lines, I think Chip wants to keep the QFP128 package and the Pin Layout Beau has done. So it would be 92 I/O pins :)


    We'll need to reduce the I/O pin count to make way for more VDD/GND pins. Like Phil said, 64 pins is a good number for keeping things sane. This is all predicated on using Prop1 cogs, which is hypothetical, at this point. It is intriguing, though.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2014-04-03 21:03
    chip said
    Today, I compiled the original Prop1 design for a Cyclone IV device, like we have on the DE0-Nano and DE2-115 boards.

    The total required LE's were 15,926. That would take only 71% of the DE0-Nano FPGA, though that FPGA wouldn't have enough RAM for the 64KB hub memory.

    Lots of exciting experiments at the high end of the FPGAs. But also lots of interesting possibilities at the other end too with the cheaper sub $20 FPGAs with 5000-10000 LE's - ok, less cogs but in return maybe hand off some cog functions to VHDL/Verilog blocks, more pins, maybe external ram, and much more flexibility.
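The 71% figure can be sanity-checked if we assume the DE0-Nano's Cyclone IV part has 22,320 logic elements. That capacity is not given anywhere in this thread; it is an assumption brought in for the check.

```python
PROP1_LOGIC_ELEMENTS = 15_926      # from the quoted post
DE0_NANO_LOGIC_ELEMENTS = 22_320   # assumption: device capacity, not stated in the thread

utilization = PROP1_LOGIC_ELEMENTS / DE0_NANO_LOGIC_ELEMENTS
percent = round(utilization * 100)
print(f"{percent}% of the device")
```

By the same measure, a 5,000 to 10,000 LE part would hold only a fraction of the full eight-cog design, hence the "fewer cogs" trade-off.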
  • cgraceycgracey Posts: 14,139
    edited 2014-04-03 21:04
    Ariba wrote: »
    How about a mix of P1 and P2 cogs?
    For example, 2 x P2 cogs and 12 x P1 cogs on one chip. P2 cogs access hub every 8 cycles, P1 cogs every 16 cycles.

    We would have the brilliant new video generator, the SDRAM access, and HubExec on two cogs.
    And we would have P1 compatibility (Spin1, OBEX objects) on the other 12 cogs.

    The chip may even be smaller than a P1 chip. I think the P1 cogs should add the multiplier instructions.

    Andy


    This is an interesting idea - a few Prop2 cogs and a bunch of Prop1 cogs!

    That way, we could get the best of both - a few Cadillacs, plus a few dozen Pintos, for economy!
  • cgraceycgracey Posts: 14,139
    edited 2014-04-03 21:05
    jmg wrote: »
    Wouldn't random data be an average of 50% toggle rate, as half the time any cell will not change state ?


    Yes, he used a factor of 0.5.
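The 0.5 factor is easy to confirm by brute force: for two independent, uniformly random values, each bit differs half the time. A small exhaustive check over every pair of 4-bit values (the width is arbitrary, and `average_toggle_rate` is just an illustrative helper):

```python
from itertools import product

def average_toggle_rate(width):
    """Fraction of bits that flip between two independent uniform values."""
    flips = 0
    pairs = 0
    for previous, current in product(range(1 << width), repeat=2):
        flips += bin(previous ^ current).count("1")  # bits that changed
        pairs += 1
    return flips / (pairs * width)

rate = average_toggle_rate(4)
print(rate)
```

This only covers truly random data; the 20% figure for other flops is an estimate about real logic activity and cannot be derived this way.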
  • cgraceycgracey Posts: 14,139
    edited 2014-04-03 21:07
    jmg wrote: »
    Does that mean a 2 clock P2 COG could shrink the size of the COG ram, making maybe 100 MOP P2 COGS, with 200 MHz timers just possible ? (and easing the power envelope at the same time ? )


    It's too late to make Prop2 cogs take two clocks per instruction, instead of one.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-03 21:07
    Now THAT is an interesting thought... we would need at least two P2 cogs (1080p, hubexec), and a whole passel (as many as fits) of little P1 cogs.

    Of course it makes Obex more complex, but probably worth it.
    cgracey wrote: »
    This is an interesting idea - a few Prop2 cogs and a bunch of Prop1 cogs!

    That way, we could get the best of both - a few Cadillacs, plus a few dozen Pintos, for economy!
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 21:08
    What about 32 Cogs and 512KB (or 1MB) hub with this hub access method...
    Cogs 0, 8, 16 & 24 each can access the whole hub memory at slots 0, 2, 4 & 6 respectively.
    Cogs 1-7 each can access only 1 block of 64KB (128KB if 1MB) of the hub (Cog 1=64-128KB, 2=128-192KB, etc), in slot 1
    Cogs 9-15 are the same as Cogs 1-7 but get their access in slot 3
    Cogs 17-23 same as Cogs 1-7 but in slot 5
    Cogs 25-31 same as Cogs 1-7 but in slot 7
    Slot------------0-------1-------2-------3-------4-------5-------6-------7-------
    Cog 00          0-512K
    Cog 01                  64K                           
    Cog 02                  128K    
    Cog 03                  192K
    Cog 04                  256K
    Cog 05                  320K
    Cog 06                  384K
    Cog 07                  448K
    Slot------------0-------1-------2-------3-------4-------5-------6-------7-------
    Cog 08                          0-512K
    Cog 09                                  64K 
    Cog 10                                  128K
    Cog 11                                  192K
    Cog 12                                  256K
    Cog 13                                  320K
    Cog 14                                  384K
    Cog 15                                  448K
    Slot------------0-------1-------2-------3-------4-------5-------6-------7-------
    Cog 16                                          0-512K      
    Cog 17                                                  64K 
    Cog 18                                                  128K
    Cog 19                                                  192K
    Cog 20                                                  256K
    Cog 21                                                  320K
    Cog 22                                                  384K
    Cog 23                                                  448K
    Slot------------0-------1-------2-------3-------4-------5-------6-------7-------
    Cog 24                                                          0-512K      
    Cog 25                                                                  64K 
    Cog 26                                                                  128K
    Cog 27                                                                  192K
    Cog 28                                                                  256K
    Cog 29                                                                  320K
    Cog 30                                                                  384K
    Cog 31                                                                  448K
    Slot------------0-------1-------2-------3-------4-------5-------6-------7-------
    
    This permits cogs 0, 8, 16 & 24 to access the full hub memory so they could act as a server between the cog groups hub blocks. By being distributed 2 slots apart permits quicker responses between cog groups (if they all run in parallel).

    I would still like to see some common I/O style block between all cogs.
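The slot and address assignment in the table above can be written as a small function. This is a sketch of the 512KB variant only, following the table as posted; `hub_access` and the constant names are hypothetical.

```python
BLOCK_KB = 64    # block size per worker cog in the 512KB variant
HUB_KB = 512     # total hub size in the 512KB variant

def hub_access(cog):
    """Return (slot, first_kb, last_kb) for a cog under the proposed scheme."""
    if not 0 <= cog < 32:
        raise ValueError("cog must be in 0..31")
    group, index = divmod(cog, 8)
    if index == 0:
        # Cogs 0, 8, 16, 24: whole hub, in the even slot of their group.
        return (2 * group, 0, HUB_KB)
    # Remaining cogs: one 64KB block each, sharing the odd slot of their group.
    return (2 * group + 1, index * BLOCK_KB, (index + 1) * BLOCK_KB)

for cog in (0, 1, 7, 8, 17, 24, 31):
    print(cog, hub_access(cog))
```

Two things fall out of the table when written this way: the seven cogs sharing a slot never collide because each touches a different block, and the first 64KB block is reachable only by cogs 0, 8, 16 and 24.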
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 21:13
    Now THAT is an interesting thought... we would need at least two P2 cogs (1080p, hubexec), and a whole passel (as many as fits) of little P1 cogs.

    Of course it makes Obex more complex, but probably worth it.
    2 x P2 cogs and a pile of P1 cogs - nice

    Perhaps more hub would fit too. And memory does not add so much to power.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-03 21:15
    Chip,
    Since you are going to use their memory, might it then be possible to make the cog/aux WIDE ?
    If we only had 2 x P2 cogs then only 2 lots would need to be wide.
This discussion has been closed.