Consensus on the P16X32B?

mindrobots · 2014-04-05 16:59

rogloh wrote: »

My vote is no. Because to make this device perform well, you will have to bring in a bunch of features from the P2 project like hubexec and wides, autoincrementing memory pointers etc. My guess is this will ultimately turn into another P2 development before it is done. It won't be as quick as people think.

Whatever it turns into, I guarantee it won't be as quick to delivery as people think. Not being negative just realistic.

tonyp12 · 2014-04-05 17:43

>34bit P1 cog longs
C.W. Yes I forgot that the COG code is first stored in hub,
So that would make it hard but not impossible, something like every 16 longs having their upper bit pairs ORed onto the 32 bits already there.

So instead I propose bank switching, done with jmp and the wc flag.
You would need to arrange your code as two distinct program blocks, each at org 0, and then jump between them with jmp #label wc (but the compiler handles that).

jmpret will work with some care, you could make bank2 only have subroutines that always return to bank1
as you can not determine if ret should have wc or not at compile time.

Any bank switching is done just before jmpret writes the return address (pc+1) to the new bank ret location.
I actually like the idea of two windows in PropTool, main routine in one and bank2 subroutines in the second

There is only one set of special registers, so only 1.94KB of RAM die space is needed for a second set of 496 longs.
And only one set of shadow registers too, which could be used for some light data sharing between the banks' programs.
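
The 1.94KB figure can be checked with a quick sketch, assuming 32-bit (4-byte) longs and 1 KB = 1024 bytes; the constant names are only illustrative.

```python
# Size of a second bank of cog RAM.
# Assumptions: 4-byte longs, 1 KB = 1024 bytes.
LONGS_PER_BANK = 496   # general-purpose longs; the special registers are shared
BYTES_PER_LONG = 4

bank_bytes = LONGS_PER_BANK * BYTES_PER_LONG
bank_kb = bank_bytes / 1024

print(f"{bank_bytes} bytes = {bank_kb:.2f} KB")
```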

Tubular · 2014-04-05 17:55

I'll have a good think about this while enjoying the sunshine today. I assume we could have code protection, though it's not mentioned in RossH's original post; I think it was mentioned elsewhere.

Like Rogloh I suspect there is a very big elephant in the room in the P2 design already completed and its features will be plucked across to the P1E. That's not an altogether bad thing in some respects.

Power dissipation may still have to be dealt with, given the smaller and hotter process.

Electrodude · 2014-04-05 18:05

Nothing from the P2 should end up in a P1E except maybe quads and dacs. Otherwise feature creep will make us end up with two almost identical P2s, both of which draw 8W of power and really should be made in 65nm but can't yet due to lack of money. The only reason I support the P1E is because it might avoid the danger of a 4 cog P2.

tonyp12 wrote: »

>34bit P1 cog longs
...
Maybe Bank-Switching is simpler and if you don't want to do bank switching you stick to one bank and be happy with the 496longs.

Or we could just wait for the P3 or 4 or 5, which will be 64 bit and have instructions that look like %iiiiiiiiiiii_zcp_ril_cccccccc_ddddddddddddddddddd_sssssssssssssssssss.
12 bit instructions, 3 flags (parity?), optional wr, immediate source and destination flags, 8 conditional bits to accommodate all 3 flags, and 19 bit D and S. It would probably be easier to do LMM the hard P1 way than have to worry about loading 34 bit instructions from 32 bit hubram. Bank switching is the worst thing that could ever possibly happen to a processor, as we have seen from the x86 and for other reasons.

But how about we finish the P1E or P2 first? I am almost completely certain that no 32 bit propeller will ever have more than 512 longs of cogram per cog.

electrodude

msrobots · 2014-04-05 18:15

well spoken ElectroDude.

Either we create a P1.5 as close as doable to the P1 so that Software and Documentation just need small upgrades or this is another mythical beast chased around and tacked with features.

Enjoy!

Mike

mindrobots · 2014-04-05 18:17

Electrodude wrote: »

Nothing from the P2 should end up in a P1E except maybe quads and dacs. Otherwise feature creep will make us end up with two almost identical P2s, both of which draw 8W of power and really should be made in 65nm but can't yet due to lack of money. The only reason I support the P1E is because it might avoid the danger of a 4 cog P2.

Or we could just wait for the P3 or 4 or 5, which will be 64 bit and have instructions that look like %iiiiiiiiiiii_zcp_ril_cccccccc_ddddddddddddddddddd_sssssssssssssssssss.
12 bit instructions, 3 flags (parity?), optional wr, immediate source and destination flags, 8 conditional bits to accommodate all 3 flags, and 19 bit D and S. It would probably be easier to do LMM the hard P1 way than have to worry about loading 34 bit instructions from 32 bit hubram. Bank switching is the worst thing that could ever possibly happen to a processor, as we have seen from the x86 and for other reasons.

But how about we finish the P1E or P2 first? I am almost completely certain that no 32 bit propeller will ever have more than 512 longs of cogram per cog.

electrodude

You forgot the indirection bit....there's been talk of indirect addressing.

potatohead · 2014-04-05 18:19

Nothing from the P2 should end up in a P1E except

LOL Just watch. First it's this, then it's that, and we really need this, etc...

mindrobots · 2014-04-05 18:22

Electrodude wrote: »

Nothing from the P2 should end up in a P1E except maybe quads and dacs.

You mean except the things that have already been deemed as absolutely necessary for the P1E to have any chance at being a success and being finished in a week or two?

Yup, nothing from the P2!

Electrodude · 2014-04-05 18:26

mindrobots wrote: »

You forgot the indirection bit....there's been talk of indirect addressing.

Of course! How silly of me... There has to be one for both source and destination, bringing the amount of D and S bits down to a measly 18.

%iiiiiiiiiiii_zcp_ril_cccccccc_in_dddddddddddddddddd_ssssssssssssssssss
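
As a sanity check on the joke encoding, here is a small sketch that rebuilds both layouts from the field widths stated in the posts and confirms each one fills exactly 64 bits. The helper names are hypothetical, and this is not a real instruction set.

```python
# Field layouts as described in the posts: 19-bit D and S without the
# indirection bits, 18-bit D and S once the two indirection bits are added.
FIELDS_19 = ["i" * 12, "zcp", "ril", "c" * 8, "d" * 19, "s" * 19]
FIELDS_18 = ["i" * 12, "zcp", "ril", "c" * 8, "in", "d" * 18, "s" * 18]

def total_bits(fields):
    """Sum the widths of all fields in a layout."""
    return sum(len(field) for field in fields)

def as_pattern(fields):
    """Render a layout in the %field_field_... notation used in the thread."""
    return "%" + "_".join(fields)

for fields in (FIELDS_19, FIELDS_18):
    print(as_pattern(fields), "->", total_bits(fields), "bits")
```

Both layouts come to 64 bits, so adding the two indirection bits costs exactly one bit each from D and S.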

mindrobots wrote: »

You mean except the things that have already been deemed as absolutely necessary for the P1E to have any chance at being a success and being finished in a week or two?

Yup, nothing from the P2!

No, quads and dacs (and 32 bit mul and muls!) are the only P2 features that should be added to the P1E from the P2. Did you have two other features in mind?

mindrobots · 2014-04-05 18:35

Electrodude wrote: »

Of course! How silly of me... There has to be one for both source and destination, bringing the amount of D and S bits down to a measly 18.

No, quads and dacs (and 32 bit mul and muls!) are the only P2 features that should be added to the P1E from the P2. Did you have two other features in mind?

Me? No, I'm not sponsoring any special features. I've just seen others mentioned....hub exec I think was one. The creep has started.......

RossH · 2014-04-05 18:42

mindrobots wrote: »

Me? No, I'm not sponsoring any special features. I've just seen others mentioned....hub exec I think was one. The creep has started.......

The only "official" additions (i.e. suggested by Chip himself) are the "analog rich" I/O pins.

If it were up to me I'd leave everything else out. Save those for the P2.

Ross.

GordonMcComb · 2014-04-05 18:46

I vote for whatever brings revenue to Parallax, helps Chip enjoy his family, and allows Ken to sleep at night. Not that they aren't already doing these things, of course, but given the latest threads about high power consumption and such, I'm getting a bit concerned over the prospects of having John A. not live long enough to see the P2.

A P+1 sounds perfectly fine to me. Atmel and other chip makers come out with incremental variants all the time. A few options in the Propeller line won't hurt anything.

I'd be opposed to continued open debate on what would be in this new variant. No more than a *closed* focus group for bouncing ideas, if even that.

Dave Hein · 2014-04-05 18:47

Doing a P1+ is just the wrong way to go. It would be much better to start with the P2 as it currently is, and whack away features to get the power down below a couple of watts. A list of the P2 features is shown below in order of my preference. I've drawn a line after the 256K RAM to indicate the minimum features that I'd like to see in P2. If we can move the line down and still stay below 2 Watts that would be great.

P1 Core
4-port cog memory
20-bit multiplier
4 tasks
256K RAM
--------
Data Cache
PTRA/PTRB
INDA/INDB
Aux RAM
PTRX/PTRY
4 tasks
32-bit Multiply/Divide
Cordic Engine
hubex
4 Instruction Caches
serial I/O
Pre-emptive threads
200+ Special Instructions

msrobots · 2014-04-05 19:02

The 64 bit thing is a whole new can of worms and belongs to the P2 thread.

As I understood Chip, the whole idea here is to take the existing - hand-laid-out(?) - outer ring of the P2 design and fill the inner area with different content synthesized from Verilog.

Thus the reason to have ADC/DAC and no PLL directly on the pin. It comes from the P2 design. And it is quite powerful stuff.

Even two independent 8 cog Hubs with some common hub-hub would fit.

But as I said before, change as little as absolutely needed in PASM. So most/a lot/some stuff can even run unchanged directly from the OBEX.

Just faster. with more cogs and more ram. and 64 P2 pins.

what a great product between P1 and P2.

And PropGCC is already running faster than Spin on the P1. With 5x the speed and 16 cogs, even without HubExec C and C++ will run circles around a PASM P1 on that proposed P1.5 with 16 cogs and 512 ram. LMM or CMM. No problem.

No need for dedicated external ram support. Just use a cog and do it in SW.

I am repeating myself, but PLEASE keep it simple and add no features. That is what P2 is for.

P1.5 is for being as doable as possible for Chip and Beau without adding too much new stuff, and as close as possible to the P1 for the rest of the crew, to minimize support/documentation/development and to maximize source reuse.

Count me in for it, Ross. And I would fund this also.

Enjoy!

Mike

ps. P2 will do all the magic. Just later. Even in 64 bit ?

Electrodude · 2014-04-05 19:05

msrobots wrote: »

The 64 bit thing Is a whole new can of worms and belongs to the P2 thread.

64 bits was a joke. I'm saying more than 512 longs in a P1E or P2 is just as likely as 64 bits happening any time soon. 64 bits may happen in the distant future, but >512 longs certainly won't come first.

electrodude

Invent-O-Doc · 2014-04-05 19:10

Yes on P16x32B. I'll also commit $500 to a kickstarter.

Invent-O-Doc · 2014-04-05 19:13

Oh, that P1 variant should look more like a P1 with some added cool stuff, but closer to P1 than the hungry P2 design.

John Abshier · 2014-04-05 19:37

Gordon, I am hopeful. My grandmother lived to 99 and my dad is still active at 92.

John Abshier

Phil Pilgrim (PhiPi) · 2014-04-05 19:44

GordonMcComb wrote: »

I'd be opposed to continued open debate on what would be in this new variant. No more than a *closed* focus group for bouncing ideas, if even that.

I totally agree. Chip's asking us for technical input on the P1+ or P2 is like a touring rock group asking its groupies for advice on arrangements, sound mixing, and lighting effects.

-Phil

jmg · 2014-04-05 20:08

Dave Hein wrote: »

Doing a P1+ is just the wrong way to go. It would be much better to start with the P2 as it currently is, and whack away features to get the power down below a couple of watts.

Chip has already taken this path with a 4 COG P2 build passed to OnSemi for Sim checks.

I would like to see that spare die space converted to more RAM, but this ~4 COGs should fit better within the power envelopes available. (5 COGs is another valid option.)

Until a P1E power figure is available, any P1E feature mix is undefined, and is likely different for every poster!

Dave Hein · 2014-04-05 20:27

Thanks, I missed that post in the other thread. Somewhere I think he showed that the full-blown P2 was around 8 Watts, so that only gets us down to 4 Watts. We still need to do some pruning to get the power below 2 Watts.

I really hope Parallax pursues the reduced P2, and doesn't spend another minute on the P1+ idea. Parallax needs to move forward and not backward if they want to produce a competitive chip.

JRetSapDoog · 2014-04-05 20:58

Add me to the "in favor/backing" side with a "resounding yes" for an enhanced P1 (or P1-style chip) to be quickly turned around (read "produced this year").

I'd be willing to back a 512KB P1 with 8/16/32 cogs up to the $1,000 level. My enthusiasm drops off with less memory (Update: I originally said "drops off considerably," but felt it might be a bit rude and I'd still be interested in a chip with less memory, but the possible apps would decrease). I think 512KB would fill a niche/be a sweet spot. Update: As pointed out below, some more RAM would be necessary to get up to higher bits per pixel. Apologies for posting quickly (I had a low number of bits per pixel on my brain but still all-points-addressable (i.e., not tiles)).

For me, multiple cogs allow getting rid of the complexity of an operating system (the P2 design seems to be down to 4 cogs due to power concerns, though it has tasks and threading).

If analog features such as DACs are added to the pins, I hope P1-style "digital" video will still be supported (it's already designed and tested and working on P1 chips, of course).

As far as HUB throughput to (or from) a single COG is concerned, I think it should be able to sustain an absolute minimum of 5.76MB/s (which, when used for video, would allow 800x480x2bpp @ 60Hz). Yeah, that's only 4 colors. Obviously, more bits-per-pixel (bpp) would up that considerably. Something like 23MB/s (8-bit color) or even 46MB/s (15- or 16-bit color, update: with more RAM) could be better. I do feel that one COG should be able to handle basic video.
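
The bandwidth figures above can be reproduced with a short sketch. It assumes the frame buffer is streamed once per refresh, counts only visible pixels (no blanking), and takes 1 MB as 10**6 bytes; the function name is illustrative.

```python
def video_bandwidth_mb_s(width, height, bpp, refresh_hz):
    """Sustained hub-to-cog rate needed to stream a frame buffer, in MB/s.

    Assumes 1 MB = 10**6 bytes and ignores blanking intervals.
    """
    bits_per_frame = width * height * bpp
    bytes_per_second = bits_per_frame / 8 * refresh_hz
    return bytes_per_second / 1e6

for bpp in (2, 8, 16):
    rate = video_bandwidth_mb_s(800, 480, bpp, 60)
    print(f"800x480 @ 60Hz, {bpp:2d} bpp: {rate:.2f} MB/s")
```

This gives 5.76, 23.04 and 46.08 MB/s, in line with the 5.76, 23 and 46 MB/s quoted in the post.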

NWCCTV · 2014-04-05 21:11

Count me as a yes, but should we not first get the P2 finished????

jmg · 2014-04-05 21:20

Dave Hein wrote: »

Thanks, I missed that post in the other thread. Somewhere I think he showed that the full-blown P2 was around 8 Watts, so that only gets us down to 4 Watts. We still need to do some pruning to get the power below 2 Watts.

I really hope Parallax pursues the reduced P2, and doesn't spend another minute on the P1+ idea. Parallax needs to move forward and not backward if they want to produce a competitive chip.

That 4 COG ~4 watts is for 180MHz, and 100% usage, so a 50% typical Power Envelope control will meet ~2 Watts.
A perhaps even more typical Power Profile setting of
180MHz on 1 COG, 180MHz/2 on 1 COG, 180MHz/4 on 1 COG, 180MHz/8 on 1 COG
I make as ~1.909W

The same usage profiles at 100MHz gives ~ 0.7277W

The 100MHz case is comfortably under ~2W (4 COG) at 100% Power envelope, and I make the
100% & ~2W point approx. 120MHz on all COGs, so Parallax could spec this as 120 MOP / COG continuous capable, or maybe ~500 MOP total?
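
The exact power model behind these numbers is not given in the thread. As a rough sketch, assume each cog draws an equal share of the full-load power and that a cog clocked at 1/n speed draws 1/n of its share. The 100MHz full-load figure is taken as half of the 3.1W recalled elsewhere in the thread for 8 cogs, which is also an assumption.

```python
# Assumptions (not from the post): equal power share per cog, and per-cog
# power proportional to that cog's clock rate.
FULL_LOAD_WATTS_4_COGS = {
    180: 4.0,      # ~4 W for 4 cogs at 180MHz, as stated in the post
    100: 3.1 / 2,  # assumed: half of the 8-cog 3.1 W figure recalled in the thread
}
CLOCK_DIVIDERS = (1, 2, 4, 8)  # one cog each at full, /2, /4 and /8 speed

def profile_watts(mhz):
    """Estimated power for the four-cog profile at the given base clock."""
    per_cog = FULL_LOAD_WATTS_4_COGS[mhz] / len(CLOCK_DIVIDERS)
    return sum(per_cog / divider for divider in CLOCK_DIVIDERS)

for mhz in FULL_LOAD_WATTS_4_COGS:
    print(f"{mhz}MHz profile: ~{profile_watts(mhz):.3f} W")
```

Under these assumptions the profile comes to roughly 1.875 W at 180MHz and 0.727 W at 100MHz, close to but not identical with the ~1.909W and ~0.7277W quoted above.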

jmg · 2014-04-05 21:26

JRetSapDoog wrote: »

... My enthusiasm drops off considerably with less memory. I think 512KB would fill a niche/be a sweet spot. ...

As far as HUB throughput to (or from) a single COG is concerned, I think it should be able to sustain an absolute minimum of 5.76MB/s (which, when used for video, would allow 800x480x2bpp @ 60Hz). Yeah, that's only 4 colors. Obviously, more bits-per-pixel (bpp) would up that considerably. Something like 23MB/s (8-bit color) or even 46MB/s (15- or 16-bit color) would be better. I do feel that one COG should be able to handle basic video.

At 800x480, that 512K limits you to 10 bits per pixel or less; 768000 bytes allows 16 bpp, 864000 is 18 bpp, and 1152000 allows 24 bpp
(plus you need somewhere for code and font buffers... )
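
These frame buffer sizes follow directly from the pixel count. A minimal sketch, assuming a packed linear frame buffer with no padding and 512K meaning 512 * 1024 bytes (function names are illustrative):

```python
PIXELS = 800 * 480  # 384,000 pixels

def framebuffer_bytes(bpp):
    """Bytes needed for a packed 800x480 frame buffer at the given depth."""
    return PIXELS * bpp // 8

def max_bpp(ram_bytes):
    """Largest whole number of bits per pixel that fits in ram_bytes."""
    return ram_bytes * 8 // PIXELS

print("512K holds at most", max_bpp(512 * 1024), "bpp")
for bpp in (16, 18, 24):
    print(f"{bpp} bpp needs {framebuffer_bytes(bpp)} bytes")
```

This reproduces the 10 bpp limit for 512K and the 768000, 864000 and 1152000 byte figures, before any space is set aside for code and fonts.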

Bill Henning · 2014-04-05 21:35

As I recall:

- full blown 8 core P2 @ 180MHz, running full bore (i.e. every subsystem in every cog going full blast, CORDIC/MUL/DIV/MAC/Video etc. all going at once, full blast), it was 8W

- full blown 8 core P2 @ 100MHz, running full bore (i.e. every subsystem in every cog going full blast, CORDIC/MUL/DIV/MAC/Video etc. all going at once, full blast), it was 3.1W

Both tests are unrealistic worst case (I doubt you can even code that test in pasm)

Going by my past experiences, others may have had different experiences:

- a realistic maximum is roughly half that, even when using 8 cogs at basically pasm capacity, about 1.6W

- typical usage depends on the application, an informed guess would be 1/2 - 1/10 of the realistic maximum, ie 160mW to 800mW
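
The rules of thumb above can be turned into a quick sketch. It applies "half of worst case" and "1/10 to 1/2 of the realistic maximum" to both recalled worst-case figures; the helper names are hypothetical. The 1.6W and 160mW to 800mW values in the list appear to correspond to the 100MHz case, which is an inference rather than something the post states.

```python
WORST_CASE_WATTS = {"180MHz": 8.0, "100MHz": 3.1}  # figures recalled in the post

def realistic_max(worst_case_watts):
    """Rule of thumb from the post: roughly half of the worst case."""
    return worst_case_watts / 2

def typical_range(realistic_watts):
    """Rule of thumb from the post: 1/10 to 1/2 of the realistic maximum."""
    return realistic_watts / 10, realistic_watts / 2

for clock, watts in WORST_CASE_WATTS.items():
    realistic = realistic_max(watts)
    low, high = typical_range(realistic)
    print(f"{clock}: realistic max ~{realistic:.2f} W, "
          f"typical ~{low * 1000:.0f} to {high * 1000:.0f} mW")
```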

Dave Hein wrote: »

Thanks, I missed that post in the other thread. Somewhere I think he showed that the full-blown P2 was around 8 Watts, so that only gets us down to 4 Watts. We still need to do some pruning to get the power below 2 Watts.

I really hope Parallax pursues the reduced P2, and doesn't spend another minute on the P1+ idea. Parallax needs to move forward and not backward if they want to produce a competitive chip.

GordonMcComb · 2014-04-05 21:56

NWCCTV wrote: »

Count me as a yes, but should we not first get the P2 finished????

I may be totally out of line and presumptuous -- I often am! -- but I think the idea of a P+1 (or P1+ or whatever) is to get some additional revenue flowing into Rocklin so they can do the P2 the way it should be done.

msrobots · 2014-04-05 22:18

JMG JRetSapDoog

As far as I remember the sweet spot was 786kb. Nobody would ever need more.

We just need the a20 gate...

Enjoy

Mike

Bill Henning · 2014-04-05 22:25

Was it not 640KB?

(leaving space for Hercules and CGA adapters)

msrobots wrote: »

JMG JRetSapDoog

As far as I remember the sweet spot was 786kb. Nobody would ever need more.

We just need the a20 gate...

Enjoy

Mike

msrobots · 2014-04-05 22:43

@Bill,

Oops - yes, you are right. 640. Where did I get that 786 from?

There was a time where I was young AND smart. Now I am just AND.

Enjoy!

Mike
