P1V specs for "possible silicon version" ???
Cluso99
What would a P1V require that "could" be used as the basis for a "silicon" version?
- 160nm (same as projected for P2)
- 160-200MHz (at least 2x P1)
- 64 I/O (at least some with basic analog support)
- 4KB minimal boot ROM/FLASH with...
  - Security Fuses and code (from P2)
  - Monitor (simple version from P2)
  - Boot from external EEPROM or SPI FLASH (determined by pullup/pulldown?)
  - If the OnSemi process can support internal FLASH, this would be even better.
- 128-512KB Hub RAM
  - Hub access in a single clock (P1 is 2 clocks)
  - Hub 32 bits wide (same as P1)
- 10 COGs as follows...
  - 8 standard COGs support...
    - 2KB COG RAM (single port as per P1)
    - 1:16 hub access (as per P1)
    - Ports A & B (with additional mask registers; see P1V discussions)
    - Counters (as per P1)
    - No VGA/Composite (unless stated otherwise; only on the 2 super COGs)
    - 2 COGs could have an extra 2 basic counters (using unused VGA register space)
    - 2 COGs could have 2 simple serial 8-32/36 bit USARTs (using unused VGA register space)
  - 2 super COGs support (in addition to the standard COGs support)...
    - 4-8KB Dual Port COG RAM
    - Pipelined for 2-clock instructions (as per the latest P2 test version, i.e. 2x faster than standard COGs)
    - 1:4 hub access (4x faster access than standard COGs)
    - P1 instruction set plus...
      - MUL/MULS
      - AUGxx/LDPTR/STPTR/CALLx, COG RAM stack, etc.
      - provides limited extended COG RAM data/execution and hubexec support
    - VGA/Composite (as per P1); needs another counter/PLL ???
Hub Clocks:    0 1 2 3 4 5 6 7 8 9 A B C D E F
Cog 0:         . 1 . . . . . . . . . . . . . .
Cog 1:         . . . 3 . . . . . . . . . . . .
Cog 2:         . . . . . 5 . . . . . . . . . .
Cog 3:         . . . . . . . 7 . . . . . . . .
Cog 4:         . . . . . . . . . 9 . . . . . .
Cog 5:         . . . . . . . . . . . B . . . .
Cog 6:         . . . . . . . . . . . . . D . .
Cog 7:         . . . . . . . . . . . . . . . F
Cog 8 (super): 0 . . . 4 . . . 8 . . . C . . .
Cog 9 (super): . . 2 . . . 6 . . . A . . . E .
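The rotation in the hub clock table can be written as a simple slot-to-COG mapping: the odd slots go to the eight standard COGs in turn, and the even slots alternate between the two super COGs. A minimal Python sketch, assuming the 16-slot rotation exactly as drawn; `hub_owner` and `schedule` are hypothetical helper names, not anything from the P1V Verilog.

```python
SLOTS = 16  # one full hub rotation, hub clocks 0..F


def hub_owner(slot: int) -> int:
    """Return the COG that owns hub clock `slot` in the proposed rotation."""
    s = slot % SLOTS
    if s % 2 == 1:
        return s // 2   # odd slots 1,3,...,F -> standard COGs 0..7 (1:16 each)
    if s % 4 == 0:
        return 8        # slots 0,4,8,C -> super COG 8 (1:4)
    return 9            # slots 2,6,A,E -> super COG 9 (1:4)


def schedule() -> dict:
    """Map each COG number to the list of hub slots it owns in one rotation."""
    table: dict = {}
    for s in range(SLOTS):
        table.setdefault(hub_owner(s), []).append(s)
    return table


if __name__ == "__main__":
    for cog, slots in sorted(schedule().items()):
        print(f"Cog {cog}: " + " ".join(f"{s:X}" for s in slots))
```

Every slot has exactly one owner, so the 8 x 1 + 2 x 4 = 16 slots fill one rotation with no gaps.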
To minimise power...
- Hub RAM divided into 16KB ?? blocks to minimise power
- ALU divided into blocks to minimise power
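Splitting hub RAM into blocks presumably saves power by enabling only the block that holds the addressed location on each access. A minimal Python sketch of that decode, assuming 16KB blocks (the proposal itself marks the size with "??") and a one-hot enable mask; `block_enable` and `block_count` are hypothetical names.

```python
BLOCK_BYTES = 16 * 1024  # assumed 16KB block size


def block_count(hub_bytes: int) -> int:
    """Number of blocks needed to cover `hub_bytes` of hub RAM (ceiling division)."""
    return -(-hub_bytes // BLOCK_BYTES)


def block_enable(addr: int, hub_bytes: int) -> int:
    """One-hot enable mask: only the block holding byte address `addr` is switched on."""
    if not 0 <= addr < hub_bytes:
        raise ValueError("address outside hub RAM")
    return 1 << (addr // BLOCK_BYTES)
```

With 128KB of hub RAM this gives 8 blocks, and with 512KB it gives 32, so on any one access only 1/8 to 1/32 of the array would be active.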
It could serve as an excellent test for the new process with little risk of failure.
Comments
(also the P2 development time has been, as far as I know, at least 8 years). Putting something into silicon from FPGA designs seems to be much more complex, even if there are companies offering FPGA-to-ASIC conversion.
PS: I would like one, then.
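Total: assuming a 160MHz clock, a standard COG (4 clocks per instruction) gives 40 MIPS and a super COG (2 clocks per instruction) gives 80 MIPS, so 8 * 40 + 2 * 80 = 480 MIPS.
A super COG running hubexec without buffering, with 1:4 hub access and 2-clock instructions, would yield 40 MIPS max, since it can fetch only one instruction per hub window (every 4 clocks).
A super COG with unrolled RDLONG or WRLONG could sustain 4 bytes every 4 clocks, yielding 160MB/s transfer bursts.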
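The throughput figures are plain clock arithmetic. A quick Python check, assuming a 160MHz clock (the low end of the 160-200MHz target) and MB meaning 10^6 bytes; `mips` is a hypothetical helper.

```python
CLOCK_HZ = 160_000_000  # assumed 160MHz system clock


def mips(clocks_per_instr: int, clock_hz: int = CLOCK_HZ) -> float:
    """Millions of instructions per second at a fixed number of clocks per instruction."""
    return clock_hz / clocks_per_instr / 1e6


standard = mips(4)    # P1-style COG: 4 clocks per instruction
super_cog = mips(2)   # pipelined super COG: 2 clocks per instruction
total = 8 * standard + 2 * super_cog

# Unbuffered hubexec: one instruction per 1:4 hub window, i.e. one every 4 clocks.
hubexec = mips(4)

# Unrolled RDLONG/WRLONG: one long (4 bytes) per hub window, every 4 clocks.
burst_mb_s = 4 * (CLOCK_HZ / 4) / 1e6

print(f"standard={standard} super={super_cog} total={total} "
      f"hubexec={hubexec} burst={burst_mb_s}MB/s")
```

At the 200MHz end of the target range, every figure scales by 1.25 (e.g. 50 MIPS per standard COG).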
The problem here is that the NRE costs are very large, the risk is non-zero, and such a device would encroach on the P2 market.
FLASH adds a lot to the fab layers, mask costs and NRE (and risk), so devices cost more.
It can only be interim if it is footprint-compatible with P2. The problem then is that you have two devices fighting for the same sockets, which makes little sense.
Once you have Verilog code, the risk of failure is related to test coverage, so is never zero.
To me, the smartest use of P1V is to act as a test vehicle for opcodes and Verilog IO extensions (such as USB Minimal HW and SmartPins).
Chip has earlier indicated he likes the idea of testing the SmartIO pins on P1V, as most of the work there is in application coverage and config confirmation and actual running code.
The actual die is the cheapest part of the retail cost. Using the OnSemi recommended contractors should minimise the risk.
I am presuming OnSemi can do flash. It would definitely be worthwhile to revisit this exercise. Remember, with small internal flash, an external flash chip to boot may not be required, saving cost. The internal flash might then boot from SD card. The boot could be faster because it wouldn't necessarily have to wait to check the serial line, etc. Part of the flash would be used for the security fuse bits. Not sure the fuses were able to be checked with OnSemi's process.
I don't mean interim as in it will be made obsolete. I mean a chip positioned as a super P1 (renamed P2), with the P2 then having the additional time to test and add all the features required for a renamed P3.
You mean Chip should abandon the P2 ??? Doesn't preclude any of this.
There are other, lower cost-impact choices, like OTP (which can also do the fuses).
Another approach, given OnSemi already have the die, is a stacked-die design where the lowest-cost logic process is used for the main die, and a serial loader is flip-bonded inside the package.
Looks like they can do 2Mb in a 2x3mm package at ~30c in 2k quantities (not QuadSPI?).
Or the choice could be to allow just a one-COG stage-1 boot from a smaller, lower-cost serial EEPROM.
I see Verical quote $0.0408 for a small CAT24C16VP2I
That could allow Parallax to offer two part variants, from a single die investment.
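A one-COG stage-1 boot works with a tiny part because 2KB of COG RAM (512 longs, as on P1) is exactly 16Kbit, the capacity of a 24C16-class EEPROM such as the CAT24C16. A small Python sizing check, assuming "2Mb" above means 2 Mbit; `fits` is a hypothetical helper.

```python
COG_RAM_BYTES = 512 * 4  # 2KB: one P1-style COG image (512 longs)


def fits(image_bytes: int, device_kbit: int) -> bool:
    """True if an image of `image_bytes` bytes fits a device of `device_kbit` Kbit (1 Kbit = 1024 bits)."""
    return image_bytes * 8 <= device_kbit * 1024


if __name__ == "__main__":
    print("stage-1 COG image in 16Kbit EEPROM:", fits(COG_RAM_BYTES, 16))
    print("128KB hub image in 2Mbit flash:", fits(128 * 1024, 2 * 1024))
    print("512KB hub image in 2Mbit flash:", fits(512 * 1024, 2 * 1024))
```

By the same arithmetic, a 2Mbit part holds a full 128KB or 256KB hub image, but not a 512KB one.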
I'm still fond of the idea of making a modular chip, one where not everything is on a single die, but which instead offers various options (RAM, flash, quality DACs/ADCs, pin count, etc.) by mounting the dies for each component to a given board via COB (Chip-On-Board https://learn.sparkfun.com/tutorials/how-chip-on-boards-are-made) or perhaps flip chip. The end product might be more expensive than if it were all on one die (although there may be a yield benefit if you can test each component before it's mounted, vs. having to throw out an entire chip if everything were on one die), but it's the cheapest way, in terms of development cost, to offer various chips that share a common uC core. Even modern CPUs use this technique (and have for a while), with separate CPU and cache/memory dies mounted on the same packaging substrate. Don't need 128 I/Os? Simple! Just mount the universal uC die on a packaging substrate with fewer pins broken out!
As far as additional features go, one of my top requests would probably be a very simple thread scheduler (maybe a max of 4 threads?) per cog. Shouldn't this be very doable since the execution pipeline of the P1V is very simple? Another thing that crossed my mind is soft-peripheral helper instructions. Instead of implementing something like, say, I2C in hardware, what would be some instructions that would greatly aid in increasing performance of common peripherals? How generic could they be so that their potential usefulness would be broad? I think such features would still be in-line with the ethos of the Propeller design while making it vastly more capable.
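A very simple scheduler of the kind asked for here could be a round-robin over up to 4 per-thread program counters, issuing one instruction from the next runnable thread each slot. The Python model below is a behavioural sketch only: `Cog`, `next_thread` and `issue` are hypothetical names, the 512-long PC wrap assumes P1-sized COG RAM, and it ignores pipeline hazards and the per-thread C/Z flags a real design would presumably also need.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Tuple

MAX_THREADS = 4   # assumed per-COG limit ("maybe a max of 4 threads?")
COG_LONGS = 512   # P1-sized COG RAM, so a PC wraps at 512


@dataclass
class Cog:
    # One program counter and one "runnable" flag per hardware thread.
    pcs: list = field(default_factory=lambda: [0] * MAX_THREADS)
    active: list = field(default_factory=lambda: [False] * MAX_THREADS)
    last: int = MAX_THREADS - 1  # start here so thread 0 is tried first

    def next_thread(self) -> Optional[int]:
        """Round-robin: return the next active thread after `last`, or None if all are idle."""
        for step in range(1, MAX_THREADS + 1):
            t = (self.last + step) % MAX_THREADS
            if self.active[t]:
                self.last = t
                return t
        return None

    def issue(self) -> Optional[Tuple[int, int]]:
        """Issue one instruction slot: return (thread, pc) and advance that thread's PC."""
        t = self.next_thread()
        if t is None:
            return None
        pc = self.pcs[t]
        self.pcs[t] = (pc + 1) % COG_LONGS
        return t, pc


if __name__ == "__main__":
    cog = Cog()
    cog.active[0], cog.pcs[0] = True, 0
    cog.active[2], cog.pcs[2] = True, 100
    print([cog.issue() for _ in range(4)])  # threads 0 and 2 alternate
```

Because idle threads are skipped, a single running thread still gets every slot, so unthreaded code would run exactly as on a normal COG.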