P1V specs for "possible silicon version" ??? — Parallax Forums

P1V specs for "possible silicon version" ???

Cluso99 Posts: 18,069
edited 2014-10-10 21:03 in Propeller 1
What would a P1V require that "could" be used as the basis for a "silicon" version?
  • 160nm (same as projected for P2)
  • 160-200MHz (at least 2x P1)
  • 64 I/O (at least some with basic analog support)
  • 4KB Minimal boot ROM/FLASH with...
    • Security Fuses and code (from P2)
    • Monitor (simple version from P2)
    • Boot from external EEPROM or SPI FLASH (determined by pullup/pulldown?)
    • If OnSemi process can support internal FLASH this would be even better.
  • 128-512KB Hub RAM
    • Hub access in single clock (P1 is 2 clocks)
    • Hub 32 bits wide (same as P1)
  • 10 COGs as follows...
    • 8 standard COGs support...
      • 2KB COG RAM (single port as per P1)
      • 1:16 hub access (as per P1)
      • Ports A & B (with additional mask registers - see P1V discussions)
      • Counters (as per P1)
      • No VGA/Composite (unless stated otherwise - only on 2 super COGs)
      • 2 COGs could have an extra 2 basic counters (using unused VGA register space)
      • 2 COGs could have 2 simple serial 8-32/36 bit USARTs (using unused VGA register space)
    • 2 super COGs support (in addition to the standard COGs support)...
      • 4-8KB Dual Port COG RAM
        • Pipelined for 2 clock instructions (as per latest P2 test version - ie extra 2x faster than standard COGs)
      • 1:4 hub access (4x faster access than standard COGs)
      • P1 instruction set plus...
        • MUL/MULS
        • AUGxx/LDPTR/STPTR/CALLx, COG RAM Stack etc
          • provides limited extended cog ram data/execution and hubexec support
      • VGA/Composite (as per P1) - Needs another counter/PLL ???
Hub Access...
Hub Clocks:   -0-1-2-3-4-5-6-7-8-9-A-B-C-D-E-F-
Cog 0:         . 1 . . . . . . . . . . . . . .
Cog 1:         . . . 3 . . . . . . . . . . . .
Cog 2:         . . . . . 5 . . . . . . . . . .
Cog 3:         . . . . . . . 7 . . . . . . . .
Cog 4:         . . . . . . . . . 9 . . . . . .
Cog 5:         . . . . . . . . . . . B . . . .
Cog 6:         . . . . . . . . . . . . . D . .
Cog 7:         . . . . . . . . . . . . . . . F
Cog 8: (super) 0 . . . 4 . . . 8 . . . C . . .
Cog 9: (super) . . 2 . . . 6 . . . A . . . E .
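
The slot table above can be reproduced with a short sketch. This is only an illustration of the proposed rotation, assuming (as the table shows) that standard cog k owns odd slot 2k+1 and that the two super cogs alternate on the even slots; the names `hub_schedule` and `render` are made up for this example and are not part of any P1V code.

```python
def hub_schedule(n_standard=8, n_super=2, slots=16):
    """Return a list mapping each hub slot to the cog that owns it.

    Assumed layout (taken from the table in the post): standard cog k gets
    odd slot 2k+1, and the super cogs share the even slots in rotation.
    """
    if slots != 2 * n_standard:
        raise ValueError("this sketch assumes one odd slot per standard cog")
    owner = [None] * slots
    for cog in range(n_standard):
        owner[2 * cog + 1] = cog  # 1:16 access for each standard cog
    for slot in range(0, slots, 2):
        owner[slot] = n_standard + (slot // 2) % n_super  # 1:4 for each super cog
    return owner


def render(owner):
    """Format the schedule as one row per cog, with hex slot numbers."""
    lines = []
    for cog in sorted(set(owner)):
        row = " ".join(format(s, "X") if c == cog else "."
                       for s, c in enumerate(owner))
        lines.append(f"Cog {cog}: {row}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(hub_schedule()))
```

With the defaults, every one of the 16 slots is used: each standard cog gets one slot per rotation and each super cog gets four.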



To minimise power...
  • Hub RAM divided into 16KB ?? blocks to minimise power
  • ALU divided into blocks to minimise power
Would this spec serve as a P2 (interim) replacement?
It could serve as an excellent test for the new process with little risk of failure.

Comments

  • 6581 Posts: 132
    edited 2014-10-10 00:33
    Sounds pretty good to me, but I don't think it is really an option for Parallax to bring out something like this
    (P2 development has also taken, as far as I know, at least 8 years). Putting an FPGA design into silicon
    seems to be much more complex, even if there are companies offering FPGA-to-ASIC conversion.

    PS: I would like one then ;)
  • Cluso99 Posts: 18,069
    edited 2014-10-10 00:37
    160MHz would give a standard COG 40 MIPS and a super COG 80 MIPS.
    Total 8 * 40 + 2 * 80 = 480 MIPS.

    A super COG running hubexec without buffering with hub 1:4 and 2 clock instructions would yield 40 MIPS max.

    A super COG with unravelled RDLONG or WRLONG could maintain 4 bytes each 4 clocks yielding 160MB/s transfer bursts.
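
The arithmetic in this post can be checked with a small sketch. It assumes a standard COG takes 4 clocks per instruction (as on the P1, and as implied by 40 MIPS at 160MHz), a super COG takes 2 clocks, and a super COG gets one hub slot every 4 clocks; the constant and function names are illustrative only.

```python
CLOCK_MHZ = 160                 # proposed clock from the spec
STANDARD_CLOCKS_PER_INSTR = 4   # assumption: same as P1
SUPER_CLOCKS_PER_INSTR = 2      # pipelined super COG, per the spec
SUPER_HUB_INTERVAL = 4          # 1:4 hub access for a super COG
BYTES_PER_LONG = 4


def mips(clock_mhz, clocks_per_instr):
    """Millions of instructions per second for a given clocks-per-instruction."""
    return clock_mhz / clocks_per_instr


standard_mips = mips(CLOCK_MHZ, STANDARD_CLOCKS_PER_INSTR)
super_mips = mips(CLOCK_MHZ, SUPER_CLOCKS_PER_INSTR)
total_mips = 8 * standard_mips + 2 * super_mips

# Unbuffered hubexec: each instruction waits for a hub slot, so the slower
# of the instruction time and the hub interval sets the rate.
hubexec_mips = mips(CLOCK_MHZ, max(SUPER_CLOCKS_PER_INSTR, SUPER_HUB_INTERVAL))

# Unrolled RDLONG/WRLONG: one long (4 bytes) per hub slot.
burst_mb_per_s = BYTES_PER_LONG * CLOCK_MHZ / SUPER_HUB_INTERVAL

if __name__ == "__main__":
    print(standard_mips, super_mips, total_mips, hubexec_mips, burst_mb_per_s)
```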
  • Cluso99 Posts: 18,069
    edited 2014-10-10 00:41
    6581 wrote: »
    Sounds pretty good to me, but I don't think it is really an option for Parallax to bring out something like this
    (P2 development has also taken, as far as I know, at least 8 years). Putting an FPGA design into silicon
    seems to be much more complex, even if there are companies offering FPGA-to-ASIC conversion.

    PS: I would like one then ;)
    Actually P2 was being designed with Verilog and verified in FPGA. The failed P2 was done using a mix of this process and hand laid sections, and it was this overlap that caused the failure by shorting the joins.
  • Dave Hein Posts: 6,347
    edited 2014-10-10 06:55
    So when will the Kickstarter campaign start? I understand there's a chip designer that might be available who's familiar with the Propeller. :)
  • jmg Posts: 15,173
    edited 2014-10-10 16:26
    Cluso99 wrote: »
    What would a P1V require that "could" be used as the basis for a "silicon" version?

    The problem here is the NRE costs are very large, and the risk is non-zero, and such a device would encroach into P2 market.
    Cluso99 wrote: »
    If OnSemi process can support internal FLASH this would be even better.

    FLASH adds a lot to the FAB layers, Mask costs and NRE (and risk). Devices cost more.
    Cluso99 wrote: »
    Would this spec serve as a P2 (interim) replacement?
    It could serve as an excellent test for the new process with little risk of failure.

    It can only be interim if it is footprint compatible with P2. Problem is then you have two devices fighting for the same sockets, which makes little sense.

    Once you have Verilog code, the risk of failure is related to test coverage, so is never zero.

    To me, the smartest use of P1V is to act as a test vehicle for opcodes and Verilog IO extensions (such as USB Minimal HW and SmartPins )

    Chip has earlier indicated he likes the idea of testing the SmartIO pins on P1V, as most of the work there is in application coverage and config confirmation and actual running code.
  • Cluso99 Posts: 18,069
    edited 2014-10-10 18:16
    jmg wrote: »
    The problem here is the NRE costs are very large, and the risk is non-zero, and such a device would encroach into P2 market.

    My thoughts are that this would be a Parallax chip. They understand their market, and from discussions on the P2 forum, this may solve the commercial P2 requirements.
    jmg wrote: »
    FLASH adds a lot to the FAB layers, Mask costs and NRE (and risk). Devices cost more.

    I don't know how many extra layers are required, nor mask and NRE costs. Using OnSemi's process and designers should minimise the risk.
    The actual die is the cheapest part of the retail cost. Using the OnSemi recommended contractors should minimise the risk. I am presuming OnSemi can do flash. It would definitely be worthwhile to revisit this exercise. Remember, with small internal flash, an external flash chip to boot may not be required, saving cost. The internal flash might then boot from SD card. The boot could be faster because it wouldn't necessarily have to wait to check the serial line, etc. Part of the flash would be used for the security fuse bits. Not sure the fuses were able to be checked with OnSemi's process.
    jmg wrote: »
    It can only be interim if it is footprint compatible with P2. Problem is then you have two devices fighting for the same sockets, which makes little sense.

    I don't mean interim as in "will be made obsolete". I mean a chip positioned as a super P1 (renamed P2), so the P2 will have the additional time to test and add all the features required for a renamed P3.
    jmg wrote: »
    Once you have Verilog code, the risk of failure is related to test coverage, so is never zero.

    You mean Chip should abandon the P2 ???
    jmg wrote: »
    To me, the smartest use of P1V is to act as a test vehicle for opcodes and Verilog IO extensions (such as USB Minimal HW and SmartPins )

    Chip has earlier indicated he likes the idea of testing the SmartIO pins on P1V, as most of the work there is in application coverage and config confirmation and actual running code.

    Doesn't preclude any of this.
  • jmg Posts: 15,173
    edited 2014-10-10 18:50
    Cluso99 wrote: »
    I am presuming OnSemi can do flash. It would definitely be worthwhile to revisit this exercise. Remember, with small internal flash, an external flash chip to boot may not be required, saving cost. The internal flash might then boot from SD card. The boot could be faster because it wouldn't necessarily have to wait to check the serial line, etc. Part of the flash would be used for the security fuse bits. Not sure the fuses were able to be checked with OnSemi's process.

    OnSemi certainly can make flash, but if you mean it for small boot-level use, full Flash is probably not cost effective.
    There are other, lower cost-impact choices, like OTP (which can also do the fuses).
    Another approach, given OnSemi already have the die, is a stacked die design where the lowest cost Logic process is used for the main die, and a Serial Loader flip bonded inside the package.
    Looks like they can do 2Mb in 2x3mm package ~30c/2k ( not QuadSPI ?)
    Or, the choice could be to allow just a one-COG stage-1 boot, from a smaller, lower cost serial EE.
    I see Verical quote $0.0408 for a small CAT24C16VP2I



    That could allow Parallax to offer two part variants, from a single die investment.
  • mark Posts: 252
    edited 2014-10-10 20:21
    RE: Flash. It doesn't have to be on-die. I've seen where dies were stacked on top of flash chips and then wire bonded together, or in some cases, the chip die was in one corner of the packaging, and the flash die in the other - also wire bonded together. Just a matter of finding an IC packaging house with those capabilities. (edit: I see JMG already got around to this)

    I'm still fond of the idea of making a modular chip, one where not everything is on a single die, but which instead offers various options (ram, flash, quality DAC/ADCs, pin count, etc) and mounts the dies for each component to a given board via COB (Chip-On-Board https://learn.sparkfun.com/tutorials/how-chip-on-boards-are-made) or perhaps flip chip. The end product might be more expensive than if it was all on one die (although there may be a benefit in terms of yield if you can test each component before it's mounted vs. having to throw out an entire chip if everything was on one die), but it's the cheapest way in terms of development cost to offer various chips that share a common uC core. Even modern CPUs use this technique (and have for a while) where a separate CPU and cache/memory die are mounted to the same packaging substrate. Don't need 128 I/Os? Simple! Just mount the universal uC die to a packaging substrate with fewer pins broken out!


    As far as additional features go, one of my top requests would probably be a very simple thread scheduler (maybe a max of 4 threads?) per cog. Shouldn't this be very doable since the execution pipeline of the P1V is very simple? Another thing that crossed my mind is soft-peripheral helper instructions. Instead of implementing something like, say, I2C in hardware, what would be some instructions that would greatly aid in increasing performance of common peripherals? How generic could they be so that their potential usefulness would be broad? I think such features would still be in-line with the ethos of the Propeller design while making it vastly more capable.
  • jmg Posts: 15,173
    edited 2014-10-10 21:03
    [QUOTE=mark