P1B & P1C (P1B+) A possible alternative if the P2 cannot fly yet

Cluso99 · 2014-04-02 19:15

From the "We're looking at 5 Watts in a BGA" thread
http://forums.parallax.com/showthread.php/155014-We-re-looking-at-5-Watts-in-a-BGA!
we were discussing options for possible changes to P2 to reduce power consumption.

Part of this was a possibility of doing a P1B in the short term, followed by P2 later in a smaller geometry.

The P1B basic is fairly well understood, but perhaps there is a possibility of leveraging some more P2 features without going overboard. I have dubbed this the P1C (previously P1B+).

Here is a quote from the thread that likely started this discussion...
http://forums.parallax.com/showthread.php/155014-We-re-looking-at-5-Watts-in-a-BGA!?p=1255125&viewfull=1#post1255125

Cluso99 wrote: »

Last night Seairth, RossH and I met for dinner in Sydney. Seairth was visiting for work, so we couldn't miss the opportunity to catch up.

A few things came out of that:
What we all originally wanted was a few more I/O pins - 64 would be way more than adequate.

ADC on some/all I/O pins.

More HUB RAM.

Faster clock speed.

And IIRC it was in that order. So not much more than the proposed original Prop 1B.

All of us would be happy with this first and keep the P2 for a 65nm followup.

I described my idea for AUX / COG RAM. They liked the idea.
Make COG RAM WIDE
RD/WRWIDE would take 1 clock to transfer 8 longs, sync'd to hub

Ditch AUX RAM

Permit a block of 128 or 256 Cog longs be used as CLUT
Requires a simple MUX on the Instruction read port for the block(s)

A simple instruction to allocate the block(s) to CLUT usage

Increase the LIFO depth for CALL/RET
Simpler design uses fewer transistors than AUX as only 18 bits wide.

Saves 0.20mm2 x8 = 1.6mm2 silicon
Maybe that could be used to add 128KB extra HUB RAM (128KB=1.76mm2)
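A quick check of that area trade, using only the two figures quoted above (0.20mm2 of AUX RAM per cog, and 1.76mm2 per 128KB of hub RAM). The variable names are mine, and the assumption that hub RAM area scales linearly with size is a simplification:

```python
# Figures quoted in the post; names are illustrative only.
AUX_AREA_PER_COG_MM2 = 0.20   # AUX RAM area saved per cog
COGS = 8
HUB_128KB_AREA_MM2 = 1.76     # quoted area of a 128KB hub RAM block

# Total silicon freed by dropping AUX RAM from all cogs.
saved_mm2 = AUX_AREA_PER_COG_MM2 * COGS

# How much hub RAM that area would buy, assuming area scales linearly with size.
extra_hub_kb = 128 * saved_mm2 / HUB_128KB_AREA_MM2

print(f"Area saved: {saved_mm2:.2f} mm2")
print(f"Equivalent hub RAM: about {extra_hub_kb:.0f} KB")
```

On these numbers the freed area is a little short of a full 128KB block, so the "maybe" is warranted.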

We all agreed this now makes sense because
HUBEXEC removes the critical shortage of COG RAM

Simpler to use and describe

Saves a tiny amount of power x8

Regarding Speed & Cogs:
None of us liked reducing cogs to 4

Not all cogs need to be equal
We did agree, although against previous Prop philosophy, that not all cogs need to be equal

Only 1 or 2 cogs are likely to really run large programs using HUBEXEC
None of us are intending to use these cogs for multitasking or multithreading as provided by P2

IIRC Ross said he would implement his own software model for multitasking/multithreading

Only 1 or 2 cogs are likely to use Video

Most cogs will be used to perform intelligent I/O in software

Not all cogs are equal now (Video DAC pins)

This gives scope to make two types of cogs, reducing appropriate features from each set.

We do not actually require the current speed even tho' we like it
Means we can trade power for speed

Power vs die geometry
5W at 1.8V = 2.8A (180nm)

2.5W at 1.2V = 2.1A (65/90nm???)
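The current figures above are just I = P / V. A minimal sketch to reproduce them (the function name is mine; the 2.5W at 1.2V case is the guess from the post, not a measured number):

```python
def supply_current_a(power_w, core_voltage_v):
    """Core supply current in amps from power (W) and core voltage (V)."""
    return power_w / core_voltage_v

i_180nm = supply_current_a(5.0, 1.8)    # 180nm case from the post
i_smaller = supply_current_a(2.5, 1.2)  # speculative 65/90nm case from the post

print(f"5W at 1.8V   = {i_180nm:.1f}A")
print(f"2.5W at 1.2V = {i_smaller:.1f}A")
```

Note that halving the power only cuts the current by about a quarter, because the core voltage drops as well.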

We do question the methodology behind the 5W calculations
Not all features of the chip will be in use at the same time
Not all 8 videos will be active concurrently

Not all maths routines will be active concurrently

What other logic blocks are not active concurrently?

How much of the instruction block is not active concurrently?

What blocks use the power?

An improved P1B
Overnight I have added these thoughts
I like the P2 being multi-specified:
5W @ 160MHz

2.5W @ 80MHz

1W @ 32MHz

and a graph W vs MHz
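The three points above sit on a straight line through zero, so the "graph W vs MHz" can be roughed out with a linear model. This is a sketch that assumes power scales linearly with clock at a fixed voltage and ignores static leakage; the constant and function name are mine:

```python
# Slope taken from the 5W @ 160MHz point quoted above.
WATTS_PER_MHZ = 5.0 / 160.0

def estimated_power_w(clock_mhz):
    """Estimated power, assuming it is proportional to clock frequency."""
    return WATTS_PER_MHZ * clock_mhz

for mhz in (160, 80, 32):
    print(f"{mhz:>4} MHz -> {estimated_power_w(mhz):.2f} W")
```

A real curve would have a non-zero intercept from leakage and analog blocks, so treat this as the optimistic end.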

Could we use the multitasking feature as a simple clock reduction:
We can specify up to 16 slots for tasks

If an instruction set up "idle mode" for tasks 1, 2 & 3 then:
The cog would only get time for the slots when SETASK allocated task 0

All other SETASK tasks would "idle" the cog

This would reduce the power by effectively reducing the clock to n:16
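The n:16 idea can be modelled as counting how many of the 16 slots go to task 0. The sketch below is only a model of the proposal, not of any existing P2 behaviour; the "idle mode" is hypothetical, and idle slots would in practice still burn some power in the clock tree:

```python
SLOTS = 16  # number of task slots mentioned in the post

def active_fraction(slot_tasks, running_task=0):
    """Fraction of slots in which the cog does work.

    slot_tasks lists the task number assigned to each of the 16 slots.
    Slots assigned to any task other than running_task are treated as idle.
    """
    if len(slot_tasks) != SLOTS:
        raise ValueError(f"expected {SLOTS} slots, got {len(slot_tasks)}")
    active = sum(1 for task in slot_tasks if task == running_task)
    return active / SLOTS

# Task 0 gets every fourth slot; tasks 1, 2 and 3 are idle.
schedule = [0, 1, 2, 3] * 4
fraction = active_fraction(schedule)
effective_mhz = 160 * fraction

print(f"Active slots: {fraction:.0%}, effective clock: {effective_mhz:.0f} MHz")
```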

If Parallax went the P1B route and followed up soon with P2 updated to 65nm or similar
Most instructions are already in Verilog (opcode 0xxxxxx)

Increase Hub ram size
Whatever fits (with less instructions, could be 512KB or more!!!)

Reduce hub cycles to 1:8 from 1:16

Possibly increase to 160MHz or 200MHz

Possibly remove ROM tables and use RAM like P2

Possibly use SPI Flash instead of EEPROM

This could likely be done by Chip in a few weeks
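To put rough numbers on the hub change in that list: on the P1 each cog gets a hub window once every 16 clocks. A sketch of what 1:8 plus a faster clock would do to per-cog hub throughput, assuming one long is moved per window (function name and the one-long assumption are mine):

```python
def hub_longs_per_second(clock_hz, clocks_per_hub_window):
    """Per-cog hub transfers per second, assuming one long per hub window."""
    return clock_hz / clocks_per_hub_window

p1_rate = hub_longs_per_second(80_000_000, 16)    # P1 at 80MHz, hub 1:16
p1b_rate = hub_longs_per_second(160_000_000, 8)   # proposed 160MHz, hub 1:8
speedup = p1b_rate / p1_rate

print(f"P1:  {p1_rate / 1e6:.0f} M longs/s per cog")
print(f"P1B: {p1b_rate / 1e6:.0f} M longs/s per cog ({speedup:.0f}x)")
```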

Cluso99 · 2014-04-02 19:16

reserved for summary

jmg · 2014-04-02 19:34

I doubt Ken, or Chip would get too excited by the diversion of a P1B, and once this is out there, it will be compared with a P2 and dilute the P2 focus....

but the important details are going to be

* How many pins - there will need to be a manually designed PAD ring.
* What target Price - this sets the die area, which sets the RAM
* Is this /4 or /1 clocked - users are going to expect a x1 clock on any new device, but x4 is somewhat simpler.

RossH · 2014-04-02 19:39

Resuming from the other thread ...

I'd be happy with 48 external I/O pins - remember that nearly all the board designs for the P1 are compromised by the shortage of only a few pins! If we had 48 pins on the P1, we would have needed only a fraction of the various development boards we ended up with, and life would have been a whole lot easier.

Then we could simply wire together internally the other 16 pins of the B register, to be used for inter-prop synchronization and communications.

Ross.

Cluso99 · 2014-04-02 19:41

Here are a few interesting items from the P2 that are nice. Some will probably again use too much power, but let's put them on the table.
Add any features you like, but let's not go overboard.

Basic P1B (starting point)

180nm
48-64 I/O pins with ADC and maybe 10K pullups
128KB hub ram minimum
2-4x speed of P1
- hub 1:8 (was 1:16)
- 160-200MHz
No ROM except P2 style monitor
Security (Ken raised this) and fuses

Extra P1C (possible nice additional features from P2)

256KB hub (or more if space and power ok)
Single clock pipelined instruction ???
- May support slower clock, say 96/100MHz
- Needs quad port cog ram
Some additional counter modes
Some additional video modes
- DAC ???
- be conscious of power
- not all cogs ???
WIDE mode ???
Use a block of 128 or 256 Cog ram as CLUT ???
- Needs quad port cog ram
Hubexec with 4 Wide cache ???
1 wide read cache ???
Single LIFO 18bits, 16 deep supported by CALLS/RETS
1 SERA per cog
1 single bit USB instruction ???
Simple SERDES ???
No multi-tasking, not multi-threading (simpler implementation)
INDA/PTRA ???
CALLRET
No CALL & RET A/B/X/Y
No Cordic, SIN/COS/TAN/ROTATE
MULT & DIV yes/no ???
Delete most of the other special instructions
PortD intercog comms
- Could be simpler 32 bit I/Os without other hw support

David Betz · 2014-04-02 20:02

I'm looking for some sort of hardware assist for CMM to make it run as fast as LMM but I'm afraid it might require an opcode translation table. It would also work best with the quad cache feature that the early P2 design had.

General features:

1) expand narrow S/D register fields (4 or 5 bits) into 9 bit fields.
2) translate a smaller, maybe 6 bit, opcode to a full P1 opcode.
3) provide a cache to allow back-to-back instruction execution without waiting for a hub slot.
4) maybe provide a special jump instruction to handle exceptions to the 16 bit instruction set standard encoding.

This would be like an RDWORDC instruction in the early P2 except that the bits would get spread out into a full 32 bit long written to D.

This could make it possible to more efficiently use the hub memory on a P1+ processor but still achieve LMM speeds or faster if the cache is added.
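Items 1 and 2 above can be sketched in software to show what the hardware assist would do. Everything about the 16 bit format here is hypothetical (6 bit opcode index, two 5 bit register fields, a register window at REG_BASE); only the P1 long layout (INSTR in bits 31..26, ZCRI in 25..22, CON in 21..18, D in 17..9, S in 8..0) comes from the P1 itself, and the MOV/ADD table entries should be checked against the Propeller manual:

```python
# Hypothetical: narrow register n maps to cog address REG_BASE + n,
# placing 32 registers just below the special registers at $1F0.
REG_BASE = 0x1D0

def p1_upper(instr, zcri, con=0b1111):
    """Pack INSTR (6 bits), ZCRI (4 bits) and CON (4 bits) into bits 31..18."""
    return (instr << 26) | (zcri << 22) | (con << 18)

# Hypothetical translation table: 6 bit opcode index -> upper bits of a P1 long.
OPCODE_TABLE = {
    0: p1_upper(0b101000, 0b0010),  # MOV d, s (write result, no flags)
    1: p1_upper(0b100000, 0b0010),  # ADD d, s (write result, no flags)
}

def expand(word16):
    """Expand a hypothetical 16 bit compressed instruction to a 32 bit P1 long."""
    op = (word16 >> 10) & 0x3F   # 6 bit opcode index
    d = (word16 >> 5) & 0x1F     # 5 bit destination register
    s = word16 & 0x1F            # 5 bit source register
    return OPCODE_TABLE[op] | ((REG_BASE + d) << 9) | (REG_BASE + s)

# "ADD r2, r3" in the hypothetical compressed encoding.
compressed = (1 << 10) | (2 << 5) | 3
expanded = expand(compressed)
print(f"{compressed:#06x} -> {expanded:#010x}")
```

Instructions that do not fit the table (immediates, conditionals, flag writes) are what item 4's special jump would have to cover.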

jazzed · 2014-04-02 20:15

My list is P1 with these additions:

64KB HUB RAM
48 or more IO in a QFP64
Spin/PASM compatible except no mask ROM
No SPIN interpreter, no character set, no tables
MUL,DIV,ENC,DEC functional
Simple loader not in 64KB memory map
Possible DIP32 with limited IO
Pull-up/down per pin
Limited 12bit ADCs (not all pins!)
I2C or SPI program storage
Clocked as fast as possible ;-)

Added: IOVDD to all IO pins for 5V tolerance if possible.

jmg · 2014-04-02 21:25

jazzed wrote: »

Added: IOVDD to all IO pins for 5V tolerance if possible.

Wide Vcc would have appeal, it is a growing trend amongst small Micros (and not so small - Nuvoton have 5V Cortex M4 coming)
However, I don't think the process supports it.

jazzed wrote: »

64KB HUB RAM
Spin/PASM compatible except no mask ROM
No SPIN interpreter, no character set, no tables

What size is the ROM now ?
A new device should be able to load ROM, into RAM, and still be larger than P1.
64K may be a bit small ?

jazzed · 2014-04-02 21:38

jmg wrote: »

64K may be a bit small ?

I'm interested in running existing programs. The current SPIN interpreter will not address anything beyond 64KB, so I don't see much point in it being bigger ... except for C/C++ programs.

If HUB RAM could be bigger for little incremental cost (sweat equity, die, power, temperature, etc...) I'm sure it could be used.

RossH · 2014-04-02 21:39

jazzed wrote: »

I'm interested in running existing programs. The current SPIN compiler will not address anything beyond 64KB.

The SPIN compiler is software, and easily rewritten. I'd opt for 256kb of Hub RAM. Even if the initial version of the Spin compiler can't use it, this would (in 99% of cases) eliminate the need for external SRAM (freeing up even more I/O pins!).

Ross.

EDIT: Add reason.

jazzed · 2014-04-02 21:51

Ross, It's all speculation at this point ;-)

Lots of opportunities have been lost in all corners of this place though. The bleeding has to stop.

Heater. · 2014-04-03 01:25

In my mind, the current design is a P3. It's so far removed from the P2 that went to shuttle run a year or so ago. It deserves to be 3.

The idea of starting on a fourth design, P1b or whatever, at this point in time seems like crazy talk. We have been waiting years to get this ship launched. Let's not start building another one.

RossH · 2014-04-03 01:52

Heater. wrote: »

In my mind, the current design is a P3. It's so far removed from the P2 that went to shuttle run a year or so ago. It deserves to be 3.

The idea of starting on a fourth design, P1b or whatever, at this point in time seems like crazy talk. We have been waiting years to get this ship launched. Let's not start building another one.

But these are not new designs - or at least they are not if we can convince Chip not to "tinker" too much

We could have both the P1b or P1c relatively quickly, perhaps followed by the original P2 sometime next year, followed by the P3 sometime thereafter (when Parallax can afford it).

Ross.

Heater. · 2014-04-03 01:57

RossH,

But these are not new designs - or at least they are not if we can convince Chip not to "tinker" too much

Yeah right.

jmg · 2014-04-03 02:41

RossH wrote: »

But these are not new designs ...

We could have both the P1b or P1c relatively quickly...

They are new designs : the IO ring is full manual custom design. Slow and error prone.
That means the 'relatively quickly' claim has a very skewed 'relatively' - on Glacial time frames, certainly.

Baggers · 2014-04-03 05:12

My 2c for what it's worth,

I think doing a P1B would be like taking a huge step backwards, and a waste of all the tech Parallax have now with P2. It would also take a while to get to market given all the changes needed. I reckon the P2 could still make the next shuttle run and be out quickly, and have a bigger, better hit than any P1B would make.

Rating it at 80MHz would halve the 5W usage down to a more usable value, but still leave the grunt for users to throw more MHz at it if they need it, rather than halving it to 4 cogs.

I just don't think going back to P1B or P1C is a move forward for Parallax.

Making it backward compatible isn't a good idea either. The RAM and IO count was probably the most major of the issues we had to overcome, so giving it more RAM etc. would involve lots of testing to make sure all the old programs worked. I just think it's too late for P1B now, especially when we're on P2 and others are talking about P3 even!

ozpropdev · 2014-04-03 05:26

Yet another 2 cents.
I think P1B would have been a good thing 6 years ago!
P2 more than compensates for P1B's no show.

Is it just me or did someone leave the door ajar of the Opium Den?

Dr_Acula · 2014-04-03 06:40

Since this is a blue sky thread...

A question out of left field. If the P2 can be emulated on a FPGA, can the P1 be emulated too?

If so, roughly how much horsepower is needed?

Ok, I'm playing around with this project http://zx80.netai.net/grant/Multicomp/index.html

Getting to learn how to code in VHDL and getting right inside FPGAs. In a very rough sense, in terms of $ per horsepower and $ per internal ram, the propeller and FPGAs come out rather similar.

What I am thinking is that rather than just talk and talk about P1B and P1C, it may be actually possible to test the ideas using FPGAs. And if this works, I think the people that make FPGAs can turn them into custom chips if you want them.

This stuff needn't be dreaming. Or asking someone else to do all the work...

mindrobots · 2014-04-03 06:47

Dr_Acula wrote: »

Since this is a blue sky thread...

A question out of left field. If the P2 can be emulated on a FPGA, can the P1 be emulated too?

If so, roughly how much horsepower is needed?

Ok, I'm playing around with this project http://zx80.netai.net/grant/Multicomp/index.html

Getting to learn how to code in VHDL and getting right inside FPGAs. In a very rough sense, in terms of $ per horsepower and $ per internal ram, the propeller and FPGAs come out rather similar.

What I am thinking is that rather than just talk and talk about P1B and P1C, it may be actually possible to test the ideas using FPGAs. And if this works, I think the people that make FPGAs can turn them into custom chips if you want them.

This stuff needn't be dreaming. Or asking someone else to do all the work...

Drac,

Sign up here and you will find out soon!

Ale has a treat coming for those that want to play!!

4x5n · 2014-04-03 10:59

RossH wrote: »

Resuming from the other thread ...

I'd be happy with 48 external I/O pins - remember that nearly all the board designs for the P1 are compromised by the shortage of only a few pins! If we had 48 pins on the P1, we would have needed only a fraction of the various development boards we ended up with, and life would have been a whole lot easier.

Then we could simply wire together internally the other 16 pins of the B register, to be used for inter-prop synchronization and communications.

Ross.

Interesting idea. I have to admit when I first got through this post I thought to myself "instead of 48 IO pins, using half of the pins in the port would be better!" Then I did the math! :nerd:

Being able to run C programs larger than the 32KB of internal hub RAM has made that less of an issue, but of course more would be better. For a lot of us the 16 extra pins and the inter-cog communication would go a long way and should be relatively inexpensive to make.

mindrobots · 2014-04-03 11:29

Am I missing something? (And/or my memory may be failing me)

I thought the P1 was pretty much a lost design as far as any expansion or updating or follow-on product were concerned, due to the way it was designed and laid out and/or fabricated. As many P1s can be cranked out as wanted but any changes are next to impossible WITHOUT totally redoing it in Verilog.

How is that easy, inexpensive or a short cut or short term solution? Take the P2 design and back it up until it looks more like a P1? The end result is a totally new design that needs to be tested for basic functionality and then made sure it is 100% backward compatible with the existing P1. Plus, its new fab costs would be similar to the 180nm fab costs for the P2.

Where am I missing the quicker to market and cheaper to produce part of this? How much will a P1C+ cost to the consumer? Does that squeeze the P2 pricing when the P2 comes out? How much does it add to Parallax's expenses in NRE, cost to inventory a 2nd (soon after to be 3rd) chip in the line? How much does it cost to produce the documentation for a P1C+? What else has to be redone (or duplicated) to support the P1C+ which will then have to be redone (or duplicated) to support the P2? How many Parallax human resources are tied up producing the P1C+ that aren't able to work on other projects (like the P2) and what kinds of delays and stress will that put on the company's limited resources?

How can any of this be done relatively quickly or relatively inexpensively?

You folks got the technical ideas, do you also have a financial plan that you can pitch to Ken to show him how this will work without seriously derailing other efforts (including the P2)?

If the underlying business can't support it and maintain itself, the best P1C+ design in the world won't fly very far.

5 watts is looking pretty good about now to me!

jmg · 2014-04-03 12:15

Dr_Acula wrote: »

Since this is a blue sky thread...

A question out of left field. If the P2 can be emulated on a FPGA, can the P1 be emulated too?

The P2 is only partially emulated on a FPGA
Likewise for the P1, The PLL's, RC Osc, and any (quasi) Analog pin functions are custom and not emulated in FPGA.

Ale has a partial design, and says a Cyclone V, A2, can run at 80MHz (/4 opcodes = 20 MOPS) and gives ~160K of RAM
I think this fits on Arrow's Bemicro CV (Cyclone V-based), not sure how much headroom, for missing features.
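The 20 MOPS figure is just clock divided by clocks per opcode, which also shows what is at stake in the /4 versus /1 clocking question raised earlier in the thread. A small sketch (function name is mine; the /1 row is a what-if, not something Ale's design claims):

```python
def mops(clock_mhz, clocks_per_opcode):
    """Millions of opcodes per second for a given clock and clocks per opcode."""
    return clock_mhz / clocks_per_opcode

fpga_p1 = mops(80, 4)        # the Cyclone V figure quoted above
single_clock = mops(80, 1)   # same clock with a hypothetical /1 pipeline

print(f"80MHz /4 = {fpga_p1:.0f} MOPS")
print(f"80MHz /1 = {single_clock:.0f} MOPS")
```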

jmg · 2014-04-03 12:18

mindrobots wrote: »

Do you also have a financial plan that you can pitch to Ken to show him how this will work without seriously derailing other efforts (including the P2)?

That is the key issue, and I think a P1C cannot really fly, given their resources. P2 is far from a 'drop dead' problem.

Heater. · 2014-04-03 13:01

@mindrobots,

As far as I know, you are correct. The P1 was not designed using Verilog.

There is no "going back" to the P1. There is no going back to P1 plus whatever features of P2 you like.

Seems to me that kind of "hack job" on the P2 to make it look like a P1 ++ will take more months and years.

It's not tenable.

@Dr A,

This is not the time for "blue sky". This is where the rubber hits the road. Concept meets reality.

Dr_Acula · 2014-04-03 15:10

Drac,

Sign up here and you will find out soon!

Ale has a treat coming for those that want to play!!

I love this forum! I post an idea - just a pipedream, go to sleep, wake up 8 hours later and someone has done what I asked for last week. How good is that?!

jmg said

Likewise for the P1, The PLL's, RC Osc, and any (quasi) Analog pin functions are custom and not emulated in FPGA.

Looking at the specs, maybe not everything can be emulated exactly the same way. But some things might even work out better - eg more pins, faster clock speed and more memory. Looking at what is in Opencores, there are other options too - eg if you think of a "cog" as being a self contained block, you can consider USB drivers, ethernet drivers, serial drivers etc as cogs too. There would be some very interesting synergies between propeller and fpga. At the very least, having more pins would be useful.

heater said

Seems to me that kind of "hack job" on the P2 to make it look like a P1 ++ will take more months and years.

Looking at what Ale is doing, we could be experimenting with this right now.

4x5n · 2014-04-03 15:18

Heater. wrote: »

@mindrobots,

As far as I know, you are correct. The P1 was not designed using Verilog.

There is no "going back" to the P1. There is no going back to P1 plus whatever features of P2 you like.

Seems to me that kind of "hack job" on the P2 to make it look like a P1 ++ will take more months and years.

It's not tenable.

@Dr A,

This is not the time for "blue sky". This is where the rubber hits the road. Concept meets reality.

I don't think the P2 as it currently is should be given up on. I don't however see a reason for giving up on the P1. From day one of the P1 (I wasn't involved with micros back then) there was an implied promise that there would be a second version with 64 IO pins. I don't see why releasing that promised chip is "going back". I understand that Parallax is a smallish "mom and pop" company with limited resources but I don't see why the P1 should be abandoned. It's not uncommon for chip manufacturers to maintain multiple lines of chips. Let the P1a & P1b be the low-power chips for mobile / low power designs and the P2 for more demanding situations where power consumption is far less of a concern. Back in my EE days the company I worked for used the board out of the Radio Shack CoCo (Color Computer) with "interface" or IO boards attached as our "micro controller". While that was during the late '80s, I know it consumed more than 5W of power and ran at under 2MHz. These days a switching power supply capable of supplying 10+W is trivial.

Cluso99 · 2014-04-03 16:26

Chip has apparently said doing a P1 now in Verilog would be quite easy and quick. So a P1B or P1C would not be much harder - just add ADC. Much of the Verilog can be culled from the P2, giving P1C some of P2's improvements as well.

IMHO the P1C would have its own market somewhere between the P1 & P2. It is more a matter as to whether Parallax could fund P1C and the P2 and what delays to P2 would result.

I know a number of us could use a P1C now.

jmg · 2014-04-03 16:27

Dr_Acula wrote: »

Looking at what Ale is doing, we could be experimenting with this right now.

Yes, it seems well matched to the Arrow's Bemicro CV (Cyclone V-based)

I did find this
http://www.altera.com/literature/an/an661.pdf
which I think means you can reconfigure the PLLs which would open more clock control ?

From memory, Parallax do have 180nm test cells for things like PLL and DACs, but I think not actually OnSemi proven, so it could be a good idea to do an OnSemi shuttle run, to prove those for P2 anyway.

jmg · 2014-04-03 16:33

Cluso99 wrote: »

IMHO the P1C would have its own market somewhere between the P1 & P2. It is more a matter as to whether Parallax could fund P1C and the P2 and what delays to P2 would result.

One detail that may give this legs, would be if an on-chip regulator cell could be used, to permit a P1 bonding version.
The larger pin count parts, would bring out the VCore to allow more MHz via less chip self heating.
