P1B & P1C (P1B+) A possible alternative if the P2 cannot fly yet — Parallax Forums


Cluso99 Posts: 18,069
edited 2014-04-03 16:33 in Propeller 2
From the "We're looking at 5 Watts in a BGA" thread
http://forums.parallax.com/showthread.php/155014-We-re-looking-at-5-Watts-in-a-BGA!
we were discussing options for possible changes to P2 to reduce power consumption.

Part of this was a possibility of doing a P1B in the short term, followed by P2 later in a smaller geometry.

The P1B basic is fairly well understood, but perhaps there is a possibility of leveraging some more P2 features without going overboard. I have dubbed this the P1C (previously P1B+).

Here is a quote from the thread that likely started this discussion...
http://forums.parallax.com/showthread.php/155014-We-re-looking-at-5-Watts-in-a-BGA!?p=1255125&viewfull=1#post1255125
Cluso99 wrote: »
Last night Seairth, RossH and I met for dinner in Sydney. Seairth was visiting for work, so we couldn't miss the opportunity to catch up :)

A few things came out of that:
  1. What we all originally wanted was a few more I/O pins - 64 would be way more than adequate.
  2. ADC on some/all I/O pins.
  3. More HUB RAM.
  4. Faster clock speed.
And IIRC it was in that order. So not much more than the proposed original Prop 1B.

All of us would be happy with this first and keep the P2 for a 65nm followup.

I described my idea for AUX / COG RAM. They liked the idea.
  1. Make COG RAM WIDE
    • RD/WRWIDE would take 1 clock to transfer 8 longs, sync'd to hub
    • Ditch AUX RAM
    • Permit a block of 128 or 256 Cog longs be used as CLUT
      • Requires a simple MUX on the Instruction read port for the block(s)
      • A simple instruction to allocate the block(s) to CLUT usage
  2. Increase the LIFO depth for CALL/RET
    • Simpler design; uses fewer transistors than AUX, as it is only 18 bits wide.
  3. Saves 0.20mm2 x8 = 1.6mm2 of silicon
    • Maybe that could be used to add 128KB extra HUB RAM (128KB=1.76mm2)
  4. We all agreed this now makes sense because
    • HUBEXEC removes the critical shortage of COG RAM
    • Simpler to use and describe
    • Saves a tiny amount of power x8
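The silicon arithmetic in point 3 can be checked in a few lines of Python. This is a back-of-envelope sketch that takes the figures in the post (0.20mm2 saved per cog, 1.76mm2 per 128KB of hub RAM) at face value and assumes hub RAM area scales linearly with capacity; neither figure is independently verified here.

```python
# Back-of-envelope area check using the figures quoted in the post.
# Assumption: hub RAM area scales linearly with capacity.
COGS = 8
AUX_SAVING_PER_COG_MM2 = 0.20   # quoted saving from ditching AUX RAM, per cog
HUB_RAM_128KB_MM2 = 1.76        # quoted area of 128KB of hub RAM


def total_saving_mm2(cogs: int = COGS) -> float:
    """Silicon freed across all cogs."""
    return AUX_SAVING_PER_COG_MM2 * cogs


def hub_kb_for_area(area_mm2: float) -> float:
    """How many KB of hub RAM fit in a given area (linear model)."""
    mm2_per_kb = HUB_RAM_128KB_MM2 / 128
    return area_mm2 / mm2_per_kb


saving = total_saving_mm2()
print(f"saved: {saving:.2f} mm2")
print(f"fits roughly {hub_kb_for_area(saving):.0f}KB of hub RAM")
print(f"shortfall for a full 128KB: {HUB_RAM_128KB_MM2 - saving:.2f} mm2")
```

On these numbers the AUX saving alone covers roughly 116KB, so a full extra 128KB would need about 0.16mm2 more than it frees up.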
Regarding Speed & Cogs:
  1. None of us liked reducing cogs to 4
  2. Not all cogs need to be equal
    • We did agree, although against previous Prop philosophy, that not all cogs need to be equal
    • Only 1 or 2 cogs are likely to really run large programs using HUBEXEC
      • None of us are intending to use these cogs for multitasking or multithreading as provided by P2
      • IIRC Ross said he would implement his own software model for multitasking/multithreading
    • Only 1 or 2 cogs are likely to use Video
    • Most cogs will be used to perform intelligent I/O in software
    • Not all cogs are equal now (Video DAC pins)
    • This gives scope to make two types of cogs, reducing appropriate features from each set.
  3. We do not actually require the current speed, even though we like it
    • Means we can trade power for speed
  4. Power vs die geometry
    • 5W at 1.8V = 2.8A (180nm)
    • 2.5W at 1.2V = 2.1A (65/90nm???)
  5. We do question the methodology behind the 5W calculations
    • Not all features of the chip will be in use at the same time
      • Not all 8 videos will be active concurrently
      • Not all maths routines will be active concurrently
      • What other logic blocks are not active concurrently?
      • How much of the instruction block is not active concurrently?
      • What blocks use the power?
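The current figures in point 4 are simply I = P/V. The sketch below reproduces them and, as a rough cross-check on the 2.5W guess, applies the standard CMOS dynamic-power relation P ≈ αCV²f. It assumes, purely for illustration, that switched capacitance and clock frequency stay the same when the supply drops from 1.8V to 1.2V; in practice a smaller geometry also reduces capacitance, and leakage is ignored entirely.

```python
def supply_current_amps(power_w: float, vdd: float) -> float:
    """Core supply current from power and voltage (I = P / V)."""
    return power_w / vdd


def scale_dynamic_power(power_w: float, v_old: float, v_new: float) -> float:
    """Scale dynamic power by (V_new / V_old)^2.

    Assumes the same switched capacitance and clock frequency; ignores leakage.
    """
    return power_w * (v_new / v_old) ** 2


print(f"180nm: 5.0W @ 1.8V -> {supply_current_amps(5.0, 1.8):.1f}A")
print(f"smaller node: 2.5W @ 1.2V -> {supply_current_amps(2.5, 1.2):.1f}A")
print(f"5W rescaled to 1.2V by V^2 alone: {scale_dynamic_power(5.0, 1.8, 1.2):.2f}W")
```

Voltage scaling alone takes the 5W figure to about 2.2W, so the 2.5W guess is in a plausible range even before any capacitance reduction from the smaller node is counted.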
An improved P1B
Overnight I have added these thoughts
  1. I like the P2 being multi-specified:
    • 5W @ 160MHz
    • 2.5W @ 80MHz
    • 1W @ 32MHz
    • and a graph W vs MHz
  2. Could we use the multitasking feature as a simple clock reduction:
    • We can specify up to 16 slots for tasks
    • If an instruction set up an "idle mode" for tasks 1, 2 & 3, then:
      • The cog would only get time for the slots where SETTASK allocated task 0
      • All other SETTASK tasks would "idle" the cog
      • This would reduce the power by effectively reducing the clock to n:16
  3. If Parallax went the P1B route and followed up soon with P2 updated to 65nm or similar
    • Most instructions are already in Verilog (opcode 0xxxxxx)
    • Increase Hub ram size
      • Whatever fits (with fewer instructions, could be 512KB or more!!!)
    • Reduce hub cycles to 1:8 from 1:16
    • Possibly increase to 160MHz or 200MHz
    • Possibly remove ROM tables and use RAM like P2
    • Possibly use SPI Flash instead of EEPROM
    • This could likely be done by Chip in a couple of weeks or so
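Points 1 and 2 can be modelled together. The three ratings in point 1 all work out to the same 31.25mW per MHz, i.e. a straight line through the origin, and the "idle mode" in point 2 scales a cog's effective clock by the fraction of the 16 task slots given to task 0. The sketch represents the slot table as a plain, hypothetical list of 16 task numbers (0-3) rather than whatever register encoding the real instruction uses, and it keeps the chip-level power line separate from the per-cog clock, since applying one to the other is not something the post states.

```python
# Linear power model implied by the three ratings in point 1:
# 5W @ 160MHz, 2.5W @ 80MHz and 1W @ 32MHz are all 31.25mW per MHz.
WATTS_PER_MHZ = 5.0 / 160

SLOTS = 16  # the task scheduler has up to 16 time slots


def chip_power_w(mhz: float) -> float:
    """Estimated chip power at a given clock, using the linear model."""
    return WATTS_PER_MHZ * mhz


def effective_clock_mhz(sysclk_mhz: float, slot_table: list[int]) -> float:
    """Effective clock seen by task 0 when every other slot idles the cog.

    slot_table is a hypothetical list of 16 task numbers (0-3), one per
    slot; only slots assigned to task 0 do useful work (n:16 reduction).
    """
    if len(slot_table) != SLOTS:
        raise ValueError("expected 16 slots")
    active = sum(1 for task in slot_table if task == 0)
    return sysclk_mhz * active / SLOTS


for mhz in (160, 80, 32):
    print(f"{mhz:>3}MHz -> {chip_power_w(mhz):.2f}W")

# Task 0 owns 4 of the 16 slots; tasks 1-3 are set to "idle".
table = [0, 1, 2, 3] * 4
print(f"effective clock: {effective_clock_mhz(160, table):.0f}MHz")
```

With task 0 holding 4 of the 16 slots, a 160MHz cog behaves like a 40MHz one (4:16).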

Comments

  • Cluso99 Posts: 18,069
    edited 2014-04-02 19:16
    reserved for summary
  • jmg Posts: 15,171
    edited 2014-04-02 19:34
    I doubt Ken or Chip would get too excited by the diversion of a P1B. Once it is out there, it will be compared with the P2 and dilute the P2 focus...

    But the important details are going to be:

    * How many pins - there will need to be a manually designed PAD ring.
    * What target Price - this sets the die area, which sets the RAM
    * Is this /4 or /1 clocked (four clocks per instruction, as on the P1, or one)? Users are going to expect a x1 clock on any new device, but x4 is somewhat simpler.
  • RossH Posts: 5,454
    edited 2014-04-02 19:39
    Resuming from the other thread ...

    I'd be happy with 48 external I/O pins - remember that nearly all the board designs for the P1 are compromised by the shortage of only a few pins! If we had 48 pins on the P1, we would have needed only a fraction of the various development boards we ended up with, and life would have been a whole lot easier.

    Then we could simply wire together internally the other 16 pins of the B register, to be used for inter-prop synchronization and communications.

    Ross.
  • Cluso99 Posts: 18,069
    edited 2014-04-02 19:41
    Here are a few interesting items from the P2 that are nice. Some will probably again use too much power, but let's put them on the table.
    Add any features you like, but let's not go overboard.

    Basic P1B (starting point)
    1. 180nm
    2. 48-64 I/O pins with ADC and maybe 10K pullups
    3. 128KB hub ram minimum
    4. 2-4x speed of P1
      • hub 1:8 (was 1:16)
      • 160-200MHz
    5. No ROM except P2 style monitor
    6. Security (Ken raised this) and fuses
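Item 4 combines two independent speed-ups: a faster system clock and a shorter hub rotation. A quick sketch of how they multiply, assuming the P1 baseline of 80MHz with a 1:16 hub rotation and ignoring the fixed part of each hub instruction's execution time:

```python
def rotation_ns(sysclk_mhz: float, hub_ratio: int) -> float:
    """Time between successive hub windows for one cog, in nanoseconds."""
    return hub_ratio * 1000 / sysclk_mhz


def worst_case_wait_clocks(hub_ratio: int) -> int:
    """Clocks a cog may wait if it has just missed its hub window."""
    return hub_ratio - 1


P1 = rotation_ns(80, 16)  # P1 baseline: 80MHz, 1:16 rotation

for mhz in (160, 200):
    t = rotation_ns(mhz, 8)
    print(f"{mhz}MHz, 1:8 hub: window every {t:.0f}ns "
          f"({P1 / t:.0f}x the P1's {P1:.0f}ns), "
          f"worst-case wait {worst_case_wait_clocks(8)} clocks")
```

On this model hub access improves 4-5x, while raw instruction throughput at four clocks per instruction scales only with the clock (2-2.5x).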
    Extra P1C (possible nice additional features from P2)
    1. 256KB hub (or more if space and power ok)
    2. Single clock pipelined instruction ???
      • May support slower clock, say 96/100MHz
      • Needs quad port cog ram
    3. Some additional counter modes
    4. Some additional video modes
      • DAC ???
      • be conscious of power
      • not all cogs ???
    5. WIDE mode ???
    6. Use a block of 128 or 256 Cog ram as CLUT ???
      • Needs quad port cog ram
    7. Hubexec with 4 Wide cache ???
    8. 1 wide read cache ???
    9. Single LIFO 18bits, 16 deep supported by CALLS/RETS
    10. 1 SERA per cog
    11. 1 single bit USB instruction ???
    12. Simple SERDES ???
    13. No multi-tasking, no multi-threading (simpler implementation)
    14. INDA/PTRA ???
    15. CALLRET
    16. No CALL & RET A/B/X/Y
    17. No Cordic, SIN/COS/TAN/ROTATE
    18. MULT & DIV yes/no ???
    19. Delete most of the other special instructions
    20. PortD intercog comms
      • Could be simpler 32 bit I/Os without other hw support
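Item 9 describes a small hardware stack in place of AUX RAM. The Python model below illustrates what "18 bits wide, 16 deep" implies. What each entry holds and what happens on overflow are not specified in the post; the sketch assumes, hypothetically, a 16-bit return address plus the Z and C flags (18 bits in total), and that pushing onto a full stack silently discards the oldest entry.

```python
from collections import deque

WIDTH_BITS = 18
DEPTH = 16
ENTRY_MASK = (1 << WIDTH_BITS) - 1


def pack_entry(ret_addr: int, z: bool, c: bool) -> int:
    """Hypothetical layout: Z in bit 17, C in bit 16, return address in bits 15..0."""
    return (int(z) << 17) | (int(c) << 16) | (ret_addr & 0xFFFF)


class CallLifo:
    """Fixed-depth LIFO for CALLS/RETS-style return entries."""

    def __init__(self) -> None:
        # deque(maxlen=...) drops the oldest entry when a push overflows it
        self._entries = deque(maxlen=DEPTH)

    def push(self, entry: int) -> None:
        self._entries.append(entry & ENTRY_MASK)  # keep only 18 bits

    def pop(self) -> int:
        if not self._entries:
            raise IndexError("LIFO underflow")
        return self._entries.pop()

    def __len__(self) -> int:
        return len(self._entries)


stack = CallLifo()
stack.push(pack_entry(0x1234, z=True, c=False))
entry = stack.pop()
print(hex(entry), bool(entry >> 17 & 1), bool(entry >> 16 & 1))
```

Dropping the oldest entry on overflow is only one possible policy; a real design might instead flag an error or simply wrap.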
  • David Betz Posts: 14,516
    edited 2014-04-02 20:02
    I'm looking for some sort of hardware assist for CMM to make it run as fast as LMM but I'm afraid it might require an opcode translation table. It would also work best with the quad cache feature that the early P2 design had.

    General features:

    1) expand narrow S/D register fields (4 or 5 bits) into 9 bit fields.
    2) translate a smaller, maybe 6 bit, opcode to a full P1 opcode.
    3) provide a cache to allow back-to-back instruction execution without waiting for a hub slot.
    4) maybe provide a special jump instruction to handle exceptions to the 16 bit instruction set standard encoding.

    This would be like an RDWORDC instruction in the early P2 except that the bits would get spread out into a full 32 bit long written to D.

    This could make it possible to more efficiently use the hub memory on a P1+ processor but still achieve LMM speeds or faster if the cache is added.
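Features 1 and 2 amount to a decode step that widens each compressed word into an ordinary P1 instruction before execution. A minimal Python sketch, assuming a hypothetical 16-bit format (6-bit opcode index, 5-bit D, 5-bit S) and a hypothetical two-entry translation table. The table values are the standard P1 encodings for MOV and ADD with the result-write flag set and condition "always", and the output uses the P1 field layout (INSTR 31..26, ZCRI 25..22, CON 21..18, D 17..9, S 8..0). The optional register-base offset is an assumption about how narrow fields might map into the 9-bit register space, not something the post specifies.

```python
# Hypothetical 16-bit compressed format:
#   bits 15..10  opcode index into TRANSLATE (6 bits)
#   bits  9..5   D register (5 bits)
#   bits  4..0   S register (5 bits)
# Output is a standard 32-bit P1 instruction:
#   INSTR[31:26] ZCRI[25:22] CON[21:18] D[17:9] S[8:0]

# Hypothetical table: index -> P1 instruction with D and S fields left zero.
TRANSLATE = {
    0: 0xA0BC0000,  # MOV D, S  (R=1, CON=always)
    1: 0x80BC0000,  # ADD D, S  (R=1, CON=always)
}


def expand(word: int, reg_base: int = 0) -> int:
    """Expand one compressed 16-bit word into a full 32-bit P1 instruction."""
    index = (word >> 10) & 0x3F
    d = (word >> 5) & 0x1F
    s = word & 0x1F
    template = TRANSLATE[index]    # feature 2; a missing index would need feature 4's escape
    dest = (reg_base + d) & 0x1FF  # feature 1: widen 5-bit fields to 9 bits
    src = (reg_base + s) & 0x1FF
    return template | (dest << 9) | src


def compress(index: int, d: int, s: int) -> int:
    """Build a compressed word (handy for exercising the expander)."""
    return ((index & 0x3F) << 10) | ((d & 0x1F) << 5) | (s & 0x1F)


print(hex(expand(compress(0, 3, 4))))                  # MOV D=3, S=4
print(hex(expand(compress(1, 3, 4), reg_base=0x100)))  # ADD D=$103, S=$104
```

In hardware this would be a small lookup plus wiring rather than code; the cache in feature 3 is what would let expanded instructions issue back to back.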
  • jazzed Posts: 11,803
    edited 2014-04-02 20:15
    My list is P1 with these additions:

    64KB HUB RAM
    48 or more IO in a QFP64
    Spin/PASM compatible except no mask ROM
    No SPIN interpreter, no character set, no tables
    MUL,DIV,ENC,DEC functional
    Simple loader not in 64KB memory map
    Possible DIP32 with limited IO
    Pull-up/down per pin
    Limited 12bit ADCs (not all pins!)
    I2C or SPI program storage
    Clocked as fast as possible ;-)

    Added: IOVDD to all IO pins for 5V tolerance if possible.
  • jmg Posts: 15,171
    edited 2014-04-02 21:25
    jazzed wrote: »
    Added: IOVDD to all IO pins for 5V tolerance if possible.

    Wide Vcc would have appeal; it is a growing trend amongst small micros (and not-so-small ones: Nuvoton has a 5V Cortex-M4 coming).
    However, I don't think the process supports it.

    jazzed wrote: »
    64KB HUB RAM
    Spin/PASM compatible except no mask ROM
    No SPIN interpreter, no character set, no tables

    What size is the ROM now?
    A new device should be able to load the ROM into RAM and still be larger than the P1.
    64K may be a bit small?
  • jazzed Posts: 11,803
    edited 2014-04-02 21:38
    jmg wrote: »
    64K may be a bit small?
    I'm interested in running existing programs. The current SPIN interpreter will not address anything beyond 64KB, so I don't see much point in it being bigger ... except for C/C++ programs.

    If HUB RAM could be bigger for little incremental cost (sweat equity, die, power, temperature, etc...) I'm sure it could be used.
  • RossH Posts: 5,454
    edited 2014-04-02 21:39
    jazzed wrote: »
    I'm interested in running existing programs. The current SPIN compiler will not address anything beyond 64KB.

    The SPIN compiler is software and easily rewritten. I'd opt for 256KB of Hub RAM. Even if the initial version of the Spin compiler can't use it, this would (in 99% of cases) eliminate the need for external SRAM (freeing up even more I/O pins!).

    Ross.

    EDIT: Add reason.
  • jazzed Posts: 11,803
    edited 2014-04-02 21:51
    Ross, It's all speculation at this point ;-)

    Lots of opportunities have been lost in all corners of this place though. The bleeding has to stop.
  • Heater. Posts: 21,230
    edited 2014-04-03 01:25
    In my mind, the current design is a P3. It's so far removed from the P2 that went to shuttle run a year or so ago. It deserves to be 3.

    The idea of starting on a fourth design, P1b or whatever, at this point in time seems like crazy talk. We have been waiting years to get this ship launched. Let's not start building another one.
  • RossH Posts: 5,454
    edited 2014-04-03 01:52
    Heater. wrote: »
    In my mind, the current design is a P3. It's so far removed from the P2 that went to shuttle run a year or so ago. It deserves to be 3.

    The idea of starting on a fourth design, P1b or whatever, at this point in time seems like crazy talk. We have been waiting years to get this ship launched. Let's not start building another one.

    But these are not new designs - or at least they are not if we can convince Chip not to "tinker" too much :smile:

    We could have both the P1b or P1c relatively quickly, perhaps followed by the original P2 sometime next year, followed by the P3 sometime thereafter (when Parallax can afford it).

    Ross.
  • Heater. Posts: 21,230
    edited 2014-04-03 01:57
    RossH,
    But these are not new designs - or at least they are not if we can convince Chip not to "tinker" too much
    Yeah right.
  • jmg Posts: 15,171
    edited 2014-04-03 02:41
    RossH wrote: »
    But these are not new designs ...

    We could have both the P1b or P1c relatively quickly...

    They are new designs : the IO ring is full manual custom design. Slow and error prone.
    That means the 'relatively quickly' claim has a very skewed 'relatively' - on glacial timeframes, certainly.
  • Baggers Posts: 3,019
    edited 2014-04-03 05:12
    My 2c for what it's worth,

    I think doing a P1B would be a huge step backwards and a waste of all the tech Parallax now has with the P2. It would also take a while to get to market, given all the changes needed. I reckon the P2 could still make the next shuttle run, be out quickly, and be a bigger, better hit than any P1B would be.

    Rating it at 80MHz would halve the 5W usage to a more usable value, while still leaving the grunt for users to throw more MHz at it if they need it, rather than halving it to 4 cogs.

    I just don't think going back to P1B or P1C is a move forward for Parallax.

    Making it backward compatible isn't a good idea either. RAM and I/O count were probably the biggest issues we had to overcome, so giving it more RAM etc. would involve lots of testing to make sure all the old programs still worked. I just think it's too late for a P1B now, especially when we're on the P2 and others are even talking about a P3!
  • ozpropdev Posts: 2,792
    edited 2014-04-03 05:26
    Yet another 2 cents.
    I think P1B would have been a good thing 6 years ago!
    P2 more than compensates for P1B's no show. :)

    Is it just me, or did someone leave the door of the Opium Den ajar? :lol:
  • Dr_Acula Posts: 5,484
    edited 2014-04-03 06:40
    Since this is a blue sky thread...

    A question out of left field. If the P2 can be emulated on an FPGA, can the P1 be emulated too?

    If so, roughly how much horsepower is needed?

    Ok, I'm playing around with this project http://zx80.netai.net/grant/Multicomp/index.html

    Getting to learn how to code in VHDL and getting right inside FPGAs. In a very rough sense, in terms of $ per horsepower and $ per internal ram, the propeller and FPGAs come out rather similar.

    What I am thinking is that rather than just talk and talk about P1B and P1C, it may be actually possible to test the ideas using FPGAs. And if this works, I think the people that make FPGAs can turn them into custom chips if you want them.

    This stuff needn't be dreaming. Or asking someone else to do all the work...
  • mindrobots Posts: 6,506
    edited 2014-04-03 06:47
    Dr_Acula wrote: »
    Since this is a blue sky thread...

    A question out of left field. If the P2 can be emulated on an FPGA, can the P1 be emulated too?

    If so, roughly how much horsepower is needed?

    Ok, I'm playing around with this project http://zx80.netai.net/grant/Multicomp/index.html

    Getting to learn how to code in VHDL and getting right inside FPGAs. In a very rough sense, in terms of $ per horsepower and $ per internal ram, the propeller and FPGAs come out rather similar.

    What I am thinking is that rather than just talk and talk about P1B and P1C, it may be actually possible to test the ideas using FPGAs. And if this works, I think the people that make FPGAs can turn them into custom chips if you want them.

    This stuff needn't be dreaming. Or asking someone else to do all the work...

    Drac,

    Sign up here and you will find out soon!

    Ale has a treat coming for those that want to play!!
  • 4x5n Posts: 745
    edited 2014-04-03 10:59
    RossH wrote: »
    Resuming from the other thread ...

    I'd be happy with 48 external I/O pins - remember that nearly all the board designs for the P1 are compromised by the shortage of only a few pins! If we had 48 pins on the P1, we would have needed only a fraction of the various development boards we ended up with, and life would have been a whole lot easier.

    Then we could simply wire together internally the other 16 pins of the B register, to be used for inter-prop synchronization and communications.

    Ross.

    Interesting idea. I have to admit when I first got through this post I thought to myself "instead of 48 IO pins, using half of the pins in the port would be better!" Then I did the math! :nerd:

    The ability to run C programs larger than the 32KB of internal hub RAM has made that less of an issue, but of course more would be better. For a lot of us, the 16 extra pins and the inter-cog communication would go a long way and should be relatively inexpensive to make.
  • mindrobots Posts: 6,506
    edited 2014-04-03 11:29
    Am I missing something? (And/or my memory may be failing me.)

    I thought the P1 was pretty much a lost design as far as any expansion, updating or follow-on product was concerned, due to the way it was designed, laid out and/or fabricated. As many P1s can be cranked out as wanted, but any changes are next to impossible WITHOUT totally redoing it in Verilog.

    How is that easy, inexpensive, or a shortcut or short-term solution? Take the P2 design and back it up until it looks more like a P1? The end result is a totally new design that needs to be tested for basic functionality and then checked to make sure it is 100% backward compatible with the existing P1. Plus, its new fab costs would be similar to the 180nm fab costs for the P2.

    Where am I missing the quicker-to-market and cheaper-to-produce part of this? How much will a P1C+ cost the consumer? Does that squeeze the P2 pricing when the P2 comes out? How much does it add to Parallax's expenses in NRE and in the cost of inventorying a 2nd (soon to be 3rd) chip in the line? How much does it cost to produce the documentation for a P1C+? What else has to be redone (or duplicated) to support the P1C+, which will then have to be redone (or duplicated) again to support the P2? How many Parallax human resources are tied up producing the P1C+ that aren't able to work on other projects (like the P2), and what kinds of delays and stress will that put on the company's limited resources?

    How can any of this be done relatively quickly or relatively inexpensively?

    You folks have the technical ideas; do you also have a financial plan that you can pitch to Ken to show him how this will work without seriously derailing other efforts (including the P2)?

    If the underlying business can't support it and maintain itself, the best P1C+ design in the world won't fly very far.

    5 watts is looking pretty good about now to me! :smile:
  • jmg Posts: 15,171
    edited 2014-04-03 12:15
    Dr_Acula wrote: »
    Since this is a blue sky thread...

    A question out of left field. If the P2 can be emulated on an FPGA, can the P1 be emulated too?

    The P2 is only partially emulated on an FPGA.
    Likewise for the P1, the PLLs, RC Osc, and any (quasi) analog pin functions are custom and not emulated in the FPGA.

    Ale has a partial design, and says a Cyclone V, A2, can run at 80MHz (/4 opcodes = 20 MOPS) and gives ~160K of RAM
    I think this fits on Arrow's Bemicro CV (Cyclone V-based), not sure how much headroom, for missing features.
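The "/4" in Ale's figure is the P1's four clocks per instruction, so per-cog throughput is just the clock divided by clocks per instruction. The sketch below reproduces the 20 MOPS figure and compares it with the x1 (single-clock pipelined) case raised earlier in the thread; the 160MHz and 100MHz values are taken from Cluso99's P1B/P1C lists and are used here only for illustration.

```python
def mops_per_cog(sysclk_mhz: float, clocks_per_instr: int) -> float:
    """Millions of instructions per second for one cog."""
    return sysclk_mhz / clocks_per_instr


COGS = 8

for label, mhz, cpi in [
    ("FPGA P1 at 80MHz, /4", 80, 4),
    ("P1B at 160MHz, /4", 160, 4),
    ("P1C at 100MHz, x1 pipelined", 100, 1),
]:
    per_cog = mops_per_cog(mhz, cpi)
    print(f"{label}: {per_cog:.0f} MOPS per cog, {per_cog * COGS:.0f} MOPS for 8 cogs")
```

This is why a single-clock pipeline at a lower clock can still outrun a faster /4 design, at the cost of the quad-port cog RAM mentioned in the P1C list.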
  • jmg Posts: 15,171
    edited 2014-04-03 12:18
    mindrobots wrote: »
    Do you also have a financial plan that you can pitch to Ken to show him how this will work without seriously derailing other efforts (including the P2)?

    That is the key issue, and I think a P1C cannot really fly, given their resources. P2 is far from a 'drop dead' problem.
  • Heater. Posts: 21,230
    edited 2014-04-03 13:01
    @mindrobots,

    As far as I know, you are correct. The P1 was not designed using Verilog.

    There is no "going back" to the P1. There is no going back to P1 plus whatever features of P2 you like.

    Seems to me that kind of "hack job" on the P2 to make it look like a P1 ++ will take more months and years.

    It's not tenable.

    @Dr A,

    This is not the time for "blue sky". This is where the rubber hits the road. Concept meets reality.
  • Dr_Acula Posts: 5,484
    edited 2014-04-03 15:10
    Drac,

    Sign up here and you will find out soon!

    Ale has a treat coming for those that want to play!!

    I love this forum! I post an idea - just a pipedream, go to sleep, wake up 8 hours later and someone has done what I asked for last week. How good is that?!

    jmg said
    Likewise for the P1, the PLLs, RC Osc, and any (quasi) analog pin functions are custom and not emulated in the FPGA.

    Looking at the specs, maybe not everything can be emulated exactly the same way. But some things might even work out better - eg more pins, faster clock speed and more memory. Looking at what is in Opencores, there are other options too - eg if you think of a "cog" as being a self contained block, you can consider USB drivers, ethernet drivers, serial drivers etc as cogs too. There would be some very interesting synergies between propeller and fpga. At the very least, having more pins would be useful.

    heater said
    Seems to me that kind of "hack job" on the P2 to make it look like a P1 ++ will take more months and years.

    Looking at what Ale is doing, we could be experimenting with this right now.
  • 4x5n Posts: 745
    edited 2014-04-03 15:18
    Heater. wrote: »
    @mindrobots,

    As far as I know, you are correct. The P1 was not designed using Verilog.

    There is no "going back" to the P1. There is no going back to P1 plus whatever features of P2 you like.

    Seems to me that kind of "hack job" on the P2 to make it look like a P1 ++ will take more months and years.

    It's not tenable.

    @Dr A,

    This is not the time for "blue sky". This is where the rubber hits the road. Concept meets reality.

    I don't think the P2 as it currently is should be given up on. I don't, however, see a reason for giving up on the P1 either. From day one of the P1 (I wasn't involved with micros back then) there was an implied promise that there would be a second version with 64 I/O pins. I don't see why releasing that promised chip is "going back". I understand that Parallax is a smallish "mom and pop" company with limited resources, but I don't see why the P1 should be abandoned. It's not uncommon for chip manufacturers to maintain multiple lines of chips. Let the P1a & P1b be the low-power chips for mobile/low-power designs, and the P2 for more demanding situations where power consumption is far less of a concern. Back in my EE days, the company I worked for used the board out of the Radio Shack CoCo (Color Computer), with interface or I/O boards attached, as our "microcontroller". That was in the late '80s, but I know it consumed more than 5W of power and ran at under 2MHz. These days a switching power supply capable of supplying 10+W is trivial.
  • Cluso99 Posts: 18,069
    edited 2014-04-03 16:26
    Chip has apparently said doing a P1 now in Verilog would be quite easy and quick. So a P1B or P1C would not be much harder - just add ADC. Much of the Verilog can be culled from the P2, giving the P1C some of the P2's improvements as well.

    IMHO the P1C would have its own market somewhere between the P1 & P2. It is more a matter of whether Parallax could fund the P1C and the P2, and what delays to the P2 would result.

    I know a number of us could use a P1C now.
  • jmg Posts: 15,171
    edited 2014-04-03 16:27
    Dr_Acula wrote: »
    Looking at what Ale is doing, we could be experimenting with this right now.

    Yes, it seems well matched to the Arrow's Bemicro CV (Cyclone V-based)

    I did find this
    http://www.altera.com/literature/an/an661.pdf
    which I think means you can reconfigure the PLLs, which would open up more clock control?

    From memory, Parallax do have 180nm test cells for things like PLL and DACs, but I think not actually OnSemi proven, so it could be a good idea to do an OnSemi shuttle run, to prove those for P2 anyway.
  • jmg Posts: 15,171
    edited 2014-04-03 16:33
    Cluso99 wrote: »
    IMHO the P1C would have its own market somewhere between the P1 & P2. It is more a matter as to whether Parallax could fund P1C and the P2 and what delays to P2 would result.

    One detail that may give this legs would be if an on-chip regulator cell could be used to permit a P1 bonding version.
    The larger pin count parts would bring out VCore, to allow more MHz via less chip self-heating.