P1B & P1C (P1B+) A possible alternative if the P2 cannot fly yet
Cluso99
Posts: 18,069
From the "We're looking at 5 Watts in a BGA" thread
http://forums.parallax.com/showthread.php/155014-We-re-looking-at-5-Watts-in-a-BGA!
we were discussing options for possible changes to P2 to reduce power consumption.
Part of this was a possibility of doing a P1B in the short term, followed by P2 later in a smaller geometry.
The basic P1B is fairly well understood, but perhaps there is a possibility of leveraging some more P2 features without going overboard. I have dubbed this the P1C (previously P1B+).
Here is a quote from the thread that likely started this discussion...
http://forums.parallax.com/showthread.php/155014-We-re-looking-at-5-Watts-in-a-BGA!?p=1255125&viewfull=1#post1255125
Last night Seairth, RossH and I met for dinner in Sydney. Seairth was visiting for work, so we couldn't miss the opportunity to catch up.
A few things came out of that:
- What we all originally wanted was a few more I/O pins - 64 would be way more than adequate.
- ADC on some/all I/O pins.
- More HUB RAM.
- Faster clock speed.
And IIRC it was in that order. So not much more than the proposed original Prop 1B.
All of us would be happy with this first and keep the P2 for a 65nm followup.
I described my idea for AUX / COG RAM. They liked the idea. Regarding Speed & Cogs:
- Make COG RAM WIDE
- RD/WRWIDE would take 1 clock to transfer 8 longs, sync'd to hub
- Ditch AUX RAM
- Permit a block of 128 or 256 Cog longs be used as CLUT
- Requires a simple MUX on the Instruction read port for the block(s)
- A simple instruction to allocate the block(s) to CLUT usage
- Increase the LIFO depth for CALL/RET
- Simpler design uses fewer transistors than AUX as only 18 bits wide.
- Saves 0.20mm2 x8 = 1.6mm2 of silicon
- Maybe that could be used to add 128KB extra HUB RAM (128KB=1.76mm2)
- We all agreed this now makes sense because
- HUBEXEC removes the critical shortage of COG RAM
- Simpler to use and describe
- Saves a tiny amount of power x8
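As a rough check of the area figures quoted in the list above, here is a minimal sketch. It assumes hub RAM area scales linearly with size, which the post does not state; the constants are the numbers from the post, and the function names are made up for illustration.

```python
# Figures quoted in the list above (taken from the post, not measured here).
AUX_AREA_PER_COG_MM2 = 0.20   # area saved per cog by dropping AUX RAM
NUM_COGS = 8
HUB_128KB_AREA_MM2 = 1.76     # quoted area for 128KB of hub RAM


def area_saved_mm2(per_cog=AUX_AREA_PER_COG_MM2, cogs=NUM_COGS):
    """Total silicon area freed across all cogs, in mm2."""
    return per_cog * cogs


def hub_kb_that_fits(area_mm2, kb=128, kb_area_mm2=HUB_128KB_AREA_MM2):
    """Hub RAM (in KB) that fits in area_mm2.

    Assumption: RAM area grows linearly with capacity.
    """
    return area_mm2 / kb_area_mm2 * kb


if __name__ == "__main__":
    saved = area_saved_mm2()
    print(f"saved {saved:.2f} mm2, room for about {hub_kb_that_fits(saved):.0f} KB")
```

On these numbers the saving is a little short of a full extra 128KB (roughly 116KB), which fits the "maybe" in the post.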
An improved P1B
- None of us liked reducing cogs to 4
- Not all cogs need to be equal
- We did agree, although against previous Prop philosophy, that not all cogs need to be equal
- Only 1 or 2 cogs are likely to really run large programs using HUBEXEC
- None of us are intending to use these cogs for multitasking or multithreading as provided by P2
- IIRC Ross said he would implement his own software model for multitasking/multithreading
- Only 1 or 2 cogs are likely to use Video
- Most cogs will be used to perform intelligent I/O in software
- Not all cogs are equal now (Video DAC pins)
- This gives scope to make two types of cogs, reducing appropriate features from each set.
- We do not actually require the current speed even tho' we like it
- Means we can trade power for speed
- Power vs die geometry
- 5W at 1.8V = 2.8A (180nm)
- 2.5W at 1.2V = 2.1A (65/90nm???)
- We do question the methodology behind the 5W calculations
- Not all features of the chip will be in use at the same time
- Not all 8 videos will be active concurrently
- Not all maths routines will be active concurrently
- What other logic blocks are not active concurrently?
- How much of the instruction block is not active concurrently?
- What blocks use the power?
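The supply currents quoted under "Power vs die geometry" above follow from I = P / V. A small sketch to reproduce them (the function name is hypothetical, and the 65/90nm figures are the post's own guess):

```python
def supply_current_amps(power_watts, core_volts):
    """Current drawn from a single supply rail: I = P / V."""
    if core_volts <= 0:
        raise ValueError("core voltage must be positive")
    return power_watts / core_volts


if __name__ == "__main__":
    # (power in W, core voltage in V, process label as given in the post)
    for watts, volts, process in [(5.0, 1.8, "180nm"), (2.5, 1.2, "65/90nm???")]:
        amps = supply_current_amps(watts, volts)
        print(f"{watts}W at {volts}V = {amps:.1f}A ({process})")
```

Note that halving the power does not halve the current once the core voltage drops as well, which matters for regulator and package design.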
Overnight I have added these thoughts
- I like the P2 being multi-specified:
- 5W @ 160MHz
- 2.5W @ 80MHz
- 1W @ 32MHz
- and a graph W vs MHz
- Could we use the multitasking feature as a simple clock reduction:
- We can specify up to 16 slots for tasks
- If an instruction setup "idle mode" for tasks 1, 2 & 3 then:
- The cog would only get time for the slots when SETASK allocated task 0
- All other SETASK tasks would "idle" the cog
- This would reduce the power by effectively reducing the clock to n:16
- If Parallax went the P1B route and followed up soon with P2 updated to 65nm or similar
- Most instructions are already in Verilog (opcode 0xxxxxx)
- Increase Hub ram size
- Whatever fits (with fewer instructions, could be 512KB or more!!!)
- Reduce hub cycles to 1:8 from 1:16
- Possibly increase to 160MHz or 200MHz
- Possibly remove ROM tables and use RAM like P2
- Possibly use SPI Flash instead of EEPROM
- This could likely be done by Chip in a couple of weeks, or a few
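The multi-specified ratings and the "idle slot" idea above can be sketched with a simple linear model. This assumes power scales in direct proportion to effective clock rate, which is what the three quoted points (5W @ 160MHz, 2.5W @ 80MHz, 1W @ 32MHz) imply; it ignores static leakage, and the function names are hypothetical.

```python
REF_POWER_W = 5.0       # quoted full-speed estimate
REF_CLOCK_MHZ = 160.0   # clock at which that estimate applies
TOTAL_SLOTS = 16        # task slots available per cog


def power_at(clock_mhz):
    """Estimated power in watts, assuming power is proportional to clock."""
    return REF_POWER_W * clock_mhz / REF_CLOCK_MHZ


def effective_clock_mhz(clock_mhz, active_slots, total_slots=TOTAL_SLOTS):
    """Effective clock when only active_slots of total_slots do work.

    Models the n:16 reduction: slots given to idle tasks burn no cycles.
    """
    if not 0 <= active_slots <= total_slots:
        raise ValueError("active_slots must be between 0 and total_slots")
    return clock_mhz * active_slots / total_slots


if __name__ == "__main__":
    for mhz in (160, 80, 32):
        print(f"{power_at(mhz):.1f}W @ {mhz}MHz")
    # A cog with task 0 in 4 of 16 slots behaves like a 40MHz cog.
    print(effective_clock_mhz(160, 4), "MHz effective")
```

This is per-chip arithmetic only; in practice each cog could idle a different number of slots, so the real saving would be a sum over cogs.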
Comments
but the important details are going to be
* How many pins - there will need to be a manually designed PAD ring.
* What target Price - this sets the die area, which sets the RAM
* Is this /4 or /1 clocked - users are going to expect a x1 clock on any new device, but x4 is somewhat simpler.
I'd be happy with 48 external I/O pins - remember that nearly all the board designs for the P1 are compromised by the shortage of only a few pins! If we had 48 pins on the P1, we would have needed only a fraction of the various development boards we ended up with, and life would have been a whole lot easier.
Then we could simply wire together internally the other 16 pins of the B register, to be used for inter-prop synchronization and communications.
Ross.
Add any features you like, but let's not go overboard.
Basic P1B (starting point)
- 180nm
- 48-64 I/O pins with ADC and maybe 10K pullups
- 128KB hub ram minimum
- 2-4x speed of P1
- hub 1:8 (was 1:16)
- 160-200MHz
- No ROM except P2 style monitor
- Security (Ken raised this) and fuses
Extra P1C (possible nice additional features from P2)
General features:
1) expand narrow S/D register fields (4 or 5 bits) into 9 bit fields.
2) translate a smaller, maybe 6 bit, opcode to a full P1 opcode.
3) provide a cache to allow back-to-back instruction execution without waiting for a hub slot.
4) maybe provide a special jump instruction to handle exceptions to the 16 bit instruction set standard encoding.
This would be like an RDWORDC instruction in the early P2 except that the bits would get spread out into a full 32 bit long written to D.
This could make it possible to more efficiently use the hub memory on a P1+ processor but still achieve LMM speeds or faster if the cache is added.
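To make the 16 bit idea concrete, here is a minimal sketch of how such an expansion might work. Everything about the compact format is an assumption for illustration: a 6 bit opcode index in bits 15..10, a 5 bit D field in bits 9..5 and a 5 bit S field in bits 4..0. The translation table is hypothetical too; its two entries use the standard P1 long layout (opcode 31..26, ZCRI 25..22, CON 21..18, D 17..9, S 8..0) for a plain ADD and MOV.

```python
# Hypothetical table: compact 6-bit opcode index -> upper bits of a P1 long
# (opcode, ZCRI and CON fields already in place; D and S left as zero).
TEMPLATES = {
    0: 0x80BC0000,  # ADD D,S  (opcode %100000, write result, always execute)
    1: 0xA0BC0000,  # MOV D,S  (opcode %101000, write result, always execute)
}


def expand(compact):
    """Spread a 16-bit compact instruction into a full 32-bit P1 long.

    Assumed compact layout: [15:10] opcode index, [9:5] D, [4:0] S.
    The narrow D and S fields are zero-extended into the 9-bit fields.
    """
    if not 0 <= compact <= 0xFFFF:
        raise ValueError("compact instruction must fit in 16 bits")
    op_index = (compact >> 10) & 0x3F
    d_field = (compact >> 5) & 0x1F
    s_field = compact & 0x1F
    if op_index not in TEMPLATES:
        # Item 4 in the list suggests a special jump for such exceptions.
        raise ValueError(f"no translation for opcode index {op_index}")
    return TEMPLATES[op_index] | (d_field << 9) | s_field


if __name__ == "__main__":
    word = (1 << 10) | (3 << 5) | 7   # "MOV 3,7" in the assumed compact form
    print(f"{word:#06x} -> {expand(word):#010x}")
```

With 5 bit fields only cog registers 0..31 are reachable, so a real design would probably add a base offset or use the exception path for anything else.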
64KB HUB RAM
48 or more IO in a QFP64
Spin/PASM compatible except no mask ROM
No SPIN interpreter, no character set, no tables
MUL,DIV,ENC,DEC functional
Simple loader not in 64KB memory map
Possible DIP32 with limited IO
Pull-up/down per pin
Limited 12bit ADCs (not all pins!)
I2C or SPI program storage
Clocked as fast as possible ;-)
Added: IOVDD to all IO pins for 5V tolerance if possible.
Wide Vcc would have appeal, it is a growing trend amongst small Micros (and not so small - Nuvoton have 5V Cortex M4 coming)
However, I don't think the process supports it.
What size is the ROM now ?
A new device should be able to load ROM, into RAM, and still be larger than P1.
64K may be a bit small ?
If HUB RAM could be bigger for little incremental cost (sweat equity, die, power, temperature, etc...) I'm sure it could be used.
The SPIN compiler is software, and easily rewritten. I'd opt for 256kb of Hub RAM. Even if the initial version of the Spin compiler can't use it, this would (in 99% of cases) eliminate the need for external SRAM (freeing up even more I/O pins!).
Ross.
EDIT: Add reason.
Lots of opportunities have been lost in all corners of this place though. The bleeding has to stop.
The idea of starting on a fourth design, P1b or whatever, at this point in time seems like crazy talk. We have been waiting years to get this ship launched. Let's not start building another one.
But these are not new designs - or at least they are not if we can convince Chip not to "tinker" too much
We could have both the P1b or P1c relatively quickly, perhaps followed by the original P2 sometime next year, followed by the P3 sometime thereafter (when Parallax can afford it).
Ross.
They are new designs : the IO ring is full manual custom design. Slow and error prone.
That means the 'relatively quickly' claim has a very skewed 'relatively' - on Glacial time frames, certainly.
I think doing a P1B would be like taking a huge step backwards, and a waste of all the tech Parallax have now with P2. It would also take a while to get to market given all the changes needed. I reckon the P2 could still make the next shuttle run and be out quick, and have a bigger, better hit than any P1B would make.
Rating it at 80MHz would halve the 5W usage to a more usable value, while still leaving the grunt for users to throw more MHz at it if they need it, rather than halving it to 4 cogs.
I just don't think going back to P1B or P1C is a move forward for Parallax.
Making it backward compatible isn't a good idea either. The RAM and IO count was probably the most major of the issues we had to overcome, so giving it more RAM etc. would involve lots of testing to make sure all the old programs worked. I just think it's too late for P1B now, especially when we're on P2 and others are even talking about P3!
I think P1B would have been a good thing 6 years ago!
P2 more than compensates for P1B's no show.
Is it just me or did someone leave the door ajar of the Opium Den?
A question out of left field. If the P2 can be emulated on a FPGA, can the P1 be emulated too?
If so, roughly how much horsepower is needed?
Ok, I'm playing around with this project http://zx80.netai.net/grant/Multicomp/index.html
Getting to learn how to code in VHDL and getting right inside FPGAs. In a very rough sense, in terms of $ per horsepower and $ per internal ram, the propeller and FPGAs come out rather similar.
What I am thinking is that rather than just talk and talk about P1B and P1C, it may be actually possible to test the ideas using FPGAs. And if this works, I think the people that make FPGAs can turn them into custom chips if you want them.
This stuff needn't be dreaming. Or asking someone else to do all the work...
Drac,
Sign up here and you will find out soon!
Ale has a treat coming for those that want to play!!
Interesting idea. I have to admit when I first got through this post I thought to myself "instead of 48 IO pins, using half of the pins in the port would be better!" Then I did the math! :nerd:
The ability to run C programs larger than the 32KB of internal hub RAM has made that less of an issue, but of course more would be better. For a lot of us the 16 extra pins and the inter-cog communication would go a long way and should be relatively inexpensive to make.
I thought the P1 was pretty much a lost design as far as any expansion, updating or follow-on product was concerned, due to the way it was designed and laid out and/or fabricated. As many P1s can be cranked out as wanted, but any changes are next to impossible WITHOUT totally redoing it in Verilog.
How is that easy, inexpensive or a short cut or short term solution? Take the P2 design and back it up until it looks more like a P1? The end result is a totally new design that needs to be tested for basic functionality and then made sure it is 100% backward compatible with the existing P1. Plus, its new fab costs would be similar to the 180nm fab costs for the P2.
Where am I missing the quicker to market and cheaper to produce part of this? How much will a P1C+ cost to the consumer? Does that squeeze the P2 pricing when the P2 comes out? How much does it add to Parallax's expenses in NRE, cost to inventory a 2nd (soon after to be 3rd) chip in the line? How much does it cost to produce the documentation for a P1C+? What else has to be redone (or duplicated) to support the P1C+ which will then have to be redone (or duplicated) to support the P2? How many Parallax human resources are tied up producing the P1C+ that aren't able to work on other projects (like the P2) and what kinds of delays and stress will that put on the company's limited resources?
How can any of this be done relatively quickly or relatively inexpensively?
You folks got the technical ideas, do you also have a financial plan that you can pitch to Ken to show him how this will work without seriously derailing other efforts (including the P2)?
If the underlying business can't support it and maintain itself, the best P1C+ design in the world won't fly very far.
5 watts is looking pretty good about now to me!
The P2 is only partially emulated on a FPGA
Likewise for the P1: the PLLs, RC Osc, and any (quasi) analog pin functions are custom and not emulated in FPGA.
Ale has a partial design, and says a Cyclone V, A2, can run at 80MHz (/4 opcodes = 20 MOPS) and gives ~160K of RAM
I think this fits on Arrow's Bemicro CV (Cyclone V-based), not sure how much headroom, for missing features.
That is the key issue, and I think a P1C cannot really fly, given their resources. P2 is far from a 'drop dead' problem.
As far as I know, you are correct. The P1 was not designed using Verilog.
There is no "going back" to the P1. There is no going back to P1 plus whatever features of P2 you like.
Seems to me that kind of "hack job" on the P2 to make it look like a P1 ++ will take more months and years.
It's not tenable.
@Dr A,
This is not the time for "blue sky". This is where the rubber hits the road. Concept meets reality.
I love this forum! I post an idea - just a pipedream, go to sleep, wake up 8 hours later and someone has done what I asked for last week. How good is that?!
jmg said
Looking at the specs, maybe not everything can be emulated exactly the same way. But some things might even work out better - eg more pins, faster clock speed and more memory. Looking at what is in Opencores, there are other options too - eg if you think of a "cog" as being a self contained block, you can consider USB drivers, ethernet drivers, serial drivers etc as cogs too. There would be some very interesting synergies between propeller and fpga. At the very least, having more pins would be useful.
heater said
Looking at what Ale is doing, we could be experimenting with this right now.
I don't think the P2 as it currently is should be given up on. I don't, however, see a reason for giving up on the P1. From day one of the P1 (I wasn't involved with micros back then) there was an implied promise that there would be a second version with 64 IO pins. I don't see why releasing that promised chip is "going back". I understand that Parallax is a smallish "mom and pop" company with limited resources, but I don't see why the P1 should be abandoned. It's not uncommon for chip manufacturers to maintain multiple lines of chips. Let the P1a & P1b be the low powered chips for mobile / low power designs and the P2 for more demanding situations where power consumption is far less of a concern. Back in my EE days the company I worked for used the board out of the Radio Shack CoCo (Color Computer), with interface or IO boards attached, as our "micro controller". While that was during the late '80s, I know it consumed more than 5W of power and ran at under 2MHz. These days a switching power supply capable of supplying 10+W is trivial.
IMHO the P1C would have its own market somewhere between the P1 & P2. It is more a matter of whether Parallax could fund both the P1C and the P2, and what delays to P2 would result.
I know a number of us could use a P1C now.
Yes, it seems well matched to the Arrow's Bemicro CV (Cyclone V-based)
I did find this
http://www.altera.com/literature/an/an661.pdf
which I think means you can reconfigure the PLLs which would open more clock control ?
From memory, Parallax do have 180nm test cells for things like PLL and DACs, but I think not actually OnSemi proven, so it could be a good idea to do an OnSemi shuttle run, to prove those for P2 anyway.
One detail that may give this legs would be if an on-chip regulator cell could be used, to permit a P1 bonding version.
The larger pin count parts would bring out the VCore to allow more MHz via less chip self heating.