P2 Tricks, Traps & Differences between P1 (general discussion)

evanh · 2018-10-08 02:14

As it stands, future models:
a: Won't be compatible without reordering/pacing the cordic commands in software.
b: Software pacing means unused hardware in smaller models.
c: The 16-cog model will be slower than this model which again means we shouldn't be using even the full hardware of the 8-cog models.

Yanomani · 2018-10-08 03:16

evanh wrote: »

c: The 16-cog model will be slower than this model which again means we shouldn't be using even the full hardware of the 8-cog models.

I'm wondering about a 16-cog version based on the present architecture that also implements a full 1MByte of Hub RAM...

It appears that the P2 design will need to be moved to a 45nm process, which also means... it could have a dual-ported Hub RAM, with two simultaneous egg-beaters serving Cog <=> Hub memory transactions.

If one egg-beater handles only even-numbered Cogs, while the other one deals exclusively with odd-numbered ones, every Cog will have its chance at the same pace it has now.
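As a rough check on the even/odd split, here is a minimal sketch of the slot arithmetic. It assumes a simplified model in which an egg-beater gives each cog it serves one turn at a given hub slice per rotation, and a rotation lasts as many clocks as the egg-beater serves cogs. The function name `slot_period` and the `eggbeaters` parameter are hypothetical, made up for illustration.

```python
def slot_period(cogs, eggbeaters=1):
    """Clocks between one cog's successive turns at the same hub slice.

    Simplified model (an assumption, not a hardware spec): the cogs are
    split evenly across the egg-beaters, and each egg-beater rotates
    through its own cogs one clock at a time.
    """
    if eggbeaters < 1 or cogs % eggbeaters != 0:
        raise ValueError("cogs must divide evenly across egg-beaters")
    return cogs // eggbeaters


# Current 8-cog design, one egg-beater.
print(slot_period(8))
# 16 cogs sharing one egg-beater.
print(slot_period(16))
# 16 cogs split even/odd across two egg-beaters.
print(slot_period(16, eggbeaters=2))
```

Under this model the two-egg-beater 16-cog part keeps the period of the current 8-cog design, while a single egg-beater would double it.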

Combined with Cog pairing and Lut sharing, oh boy....(Dreaming? Perhaps...)

Could the above be seen as feasible, or am I forgetting/missing something?

cgracey · 2018-10-08 03:20

I think we're just going to have to have a global variable in the tool that says what device we are compiling for, so it can conditionally generate appropriately-optimized code.
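A minimal sketch of how such a tool-side setting might drive code generation. Everything here is hypothetical: the profile names, `TARGET`, and the pacing rule (one CORDIC command per hub rotation, two clocks per ordinary instruction) are assumptions for illustration, not how any Parallax tool works.

```python
# Hypothetical device profiles; only an 8-cog part is described as
# existing in this thread, the 16-cog entry is speculative.
DEVICE_PROFILES = {
    "p2-8cog": {"cogs": 8},
    "p2-16cog": {"cogs": 16},
}

# The "global variable in the tool" naming the device being compiled for.
TARGET = "p2-8cog"


def filler_slots_between_cordic(target=None, clocks_per_instr=2):
    """Ordinary instructions that fit between two paced CORDIC commands.

    Assumes one CORDIC command per hub rotation and a rotation of
    `cogs` clocks; the CORDIC command itself occupies one slot.
    """
    cogs = DEVICE_PROFILES[target or TARGET]["cogs"]
    return cogs // clocks_per_instr - 1


print(filler_slots_between_cordic())
print(filler_slots_between_cordic("p2-16cog"))
```

A code generator could read this number to decide how much unrelated work to place between consecutive CORDIC commands for the selected device.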

Cluso99 · 2018-10-08 06:37

cgracey wrote: »

I think we're just going to have to have a global variable in the tool that says what device we are compiling for, so it can conditionally generate appropriately-optimized code.

Yep. Otherwise we will be sacrificing performance when we shouldn't be doing so. If someone really wants compatibility then they will have to work out how to operate on the lowest common denominator. And who has the crystal ball to know what future variants will be built???

evanh · 2018-10-08 07:58

No we aren't sacrificing at all. No one's using the potential performance and even when trying to it's full of compromises. And at the same time we're losing out to the hub rotations.

Which all results in the cordic being hugely oversized, particularly for any smaller parts ...

DiodeRed · 2018-10-08 08:21

I find myself imagining someone writing intrinsics for CORDIC operations in a C compiler, having a bit of fun making the compiler automatically interleave the instructions to start the operation and get the result, with other instructions and reading out the result on the exact right cycle (with a compiler flag specifying cog count). Seems quite doable from that perspective.
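A minimal sketch of the interleaving idea, as a toy scheduler rather than a real compiler pass. The latency figure is left as a parameter because the thread does not give one; `interleave` and the two-clocks-per-instruction figure are assumptions for illustration. QROTATE and GETQX are used only as example mnemonics.

```python
def interleave(start, fillers, readout, latency_clocks, clocks_per_instr=2):
    """Place independent instructions between a CORDIC start and its readout.

    As many fillers as fit inside the latency window go between the two;
    the rest follow the readout. Fillers are assumed to be independent of
    the CORDIC result (the caller must guarantee that).
    """
    window = latency_clocks // clocks_per_instr
    between = fillers[:window]
    after = fillers[window:]
    return [start, *between, readout, *after]


# Hypothetical 4-clock window, chosen only to keep the example short.
program = interleave("QROTATE", ["ADD a,b", "SHL c,#1", "XOR d,e"], "GETQX", 4)
print(program)
```

With a per-device latency supplied by a compiler flag, the same routine would simply move the readout earlier or later in the instruction stream.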

evanh · 2018-10-08 08:51

Red,
It'd be painful to make that work. To get consistent performance for the varying prop2 models would require spreading the load across varying numbers of cogs.

From memory, logic wise, the cordic is bigger than a whole cog. It's just hugely oversized for the work it does.

evanh · 2018-10-08 10:54

CORDIC commands have a 2-clock alignment lag with respect to RD/WRLONG. EDIT: Which means it's faster to follow a cordic command by a hubram access rather than other way round.

EDIT: This has proven to be a hasty conclusion. The alignment of hubram access is so variable that any attempt at adjusting timing of hub-ops, like cordic commands, is futile. Chip is fully aware.

cgracey · 2018-10-08 13:24

evanh wrote: »

CORDIC commands have a 2-clock alignment lag with respect to RD/WRLONG. EDIT: Which means it's faster to follow a cordic command by a hubram access rather than other way round.

Would some other alignment relationship be better? That would be trivial to change.

evanh · 2018-10-08 13:55

Hmm, don't know. For ease of remembering maybe zero.

But I've just remembered that RDLONG's 9-clock minimum spreads right past its next slot. The current 2-clock lag pretty much suits this. So the suitability actually differs for RDLONG vs WRLONG.

If the CORDICs are moved by one to a 3-clock lag then they would sit nicely on the end of a WRLONG, leaving only a 2-clock gap after a RDLONG.
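To put numbers on that, a minimal sketch under a simplified model: hub slots come every 8 clocks, the hub op starts exactly on its slot at clock 0, and the CORDIC slot trails the hub slot by the lag. The RDLONG 9-clock minimum is from the post above; the WRLONG 3-clock minimum is an assumption not stated in this thread, and `clocks_idle_before_cordic` is a hypothetical helper name. Note the correction in the earlier post: real hubram alignment varies too much for this kind of tuning to hold.

```python
def clocks_idle_before_cordic(op_clocks, lag, period=8):
    """Idle clocks between the end of a hub op and the next CORDIC slot.

    CORDIC slots fall at lag, lag + period, lag + 2*period, ...
    measured from the hub slot the op started on (simplified model).
    """
    # Smallest k >= 0 with lag + k*period >= op_clocks (ceiling division).
    k = max(0, -(-(op_clocks - lag) // period))
    return lag + k * period - op_clocks


RDLONG_MIN = 9  # from the thread
WRLONG_MIN = 3  # assumption, not stated in the thread

for lag in (2, 3):
    print(lag,
          clocks_idle_before_cordic(RDLONG_MIN, lag),
          clocks_idle_before_cordic(WRLONG_MIN, lag))
```

In this model a 3-clock lag leaves a 2-clock gap after RDLONG and none after WRLONG, while the 2-clock lag just misses the slot after a minimum-length WRLONG.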

EDIT: I was about to say don't do an odd numbered lag because that'll just knock the hub to instruction alignment around more. But then I realised that instruction alignment will nearly always be odd anyway because every hubram access returns on an odd numbered alignment.

EDIT: See correction in earlier post.

cgracey · 2018-10-08 14:54

So, should we do anything?

potatohead · 2018-10-08 15:14

Looks to me like evanh is suggesting a +1 to +3 additional delay to optimize throughput.

I agree with doing that, as it is very likely the P2 will see reads, CORDIC ops, then writes, which currently miss a hub slot opportunity, if I understand evanh correctly.

This is imbalanced. Getting arguments has better throughput than storing results currently does.

It is hard to understand if one cycle is enough.

(I have real chip envy right now, lol. And it is fine, I can wait.)

evanh · 2018-10-08 15:17

right, yep, I've convinced myself I like a lag of 3-clocks. So, +1 to existing.
EDIT: See correction in earlier post.

pik33 · 2018-10-08 15:22

Now this is a trap of "one family, many chips". The P1 has no such problem, because there is only one type of P1 chip.

But then
(1) we have now one kind of P2 with 8 cogs onboard at the testing stage;
(2) we do not know if there will be any other P2 models in production;

so the discussion about code compatibility with future P2s is simply living in the future, which is unknown.

Maybe the solution is - if there will be any 16-cog P2 version in the future - use 2 CORDIC units if this is possible?
Maybe another solution is - if there will be any 4-cog P2 version, use NO cordic in it? P2 Lite - less silicon, less power, not everyone needs the CORDIC unit for what he wants to do with P2.

potatohead · 2018-10-08 15:25

Personally, I think we optimize this one, and when, if there is another one, we deal with it then.

kwinn · 2018-10-08 16:44

potatohead wrote: »

Personally, I think we optimize this one, and when, if there is another one, we deal with it then.

+1 - Best idea so far. A P2 in the hand now is worth more than any future ones.

evanh · 2018-10-08 17:11

Calling it the best idea is disingenuous. One of the features of the Obex is they just work.

koehler · 2018-10-08 18:01

I would think that focusing attention on each and every instruction and subsystem of the P2 that can be tested would be most productive.
Discussing how Parallax is going to make different P2 I3/I5 types of devices and all that entails is rather fanciful at this point.
P2 itself has to be brutally tested, abused and a comprehensive errata completed, not to mention a compendium of User Example/Project Notes.
And once that's done, it'll be up to the public to decide whether or not it's in Parallax's best interest to even look at devoting more resources to such niche of niche products.

One of the issues the Prop has had which has limited wider adoption has always been the soft peripherals vs on-board hardware dilemma.
A long time ago I suggested Gold-level objects in the OBEX.
That seems to have died, however I still think most engineers want to do engineering and not software development, so having as solid a set of pre-built soft hardware as possible is still relevant.
Parallax should maybe start a P2 Bounty program of some type, with a list of devices needed, and perhaps have the community do testing on candidates, with some sort of prize/recognition for those picked as the most robust, versatile, etc?
Maybe the top 5 or 10 contributors get a mounted, signed copy of one of these 1st spin die, or other reward?

kwinn · 2018-10-08 18:26

evanh wrote: »

Calling it the best idea is disingenuous. One of the features of the Obex is they just work.

That's true for the P1, but now we will need a P2 version of the objects, so it is also quite possible that we will need objects that are specific for each model of the P2. Trying to make "one object fits all P2's" is fine as long as it does not delay P2 hardware and software availability. It would be nice to have such universality, but probably not practical since the hardware will differ.

evanh · 2018-10-08 19:51

I believe it was one of the founding ideas.

jmg · 2018-10-08 20:18

kwinn wrote: »

evanh wrote: »

Calling it the best idea is disingenuous. One of the features of the Obex is they just work.

That's true for the P1, but now we will need a P2 version of the objects, so it is also quite possible that we will need objects that are specific for each model of the P2. Trying to make "one object fits all P2's" is fine as long as it does not delay P2 hardware and software availability. It would be nice to have such universality, but probably not practical since the hardware will differ.

Conditional defines are a common way the SW industry manages "similar code, differing details", and most of the new P2 tool flows will/do have Conditional Defines.
That also means some of those flows, that can/will output P1 code, can be used to build P1 or P2.

Of course, a P2 object that is Smart Pin cell, or Streamer/LUT, intensive makes little sense to port back to P1, but simple generic IO like bit-bang i2c, should be portable, as the tools mature.

idbruce · 2018-10-08 20:28

Personally, I think that Chip should take a breather, kick back, take another six months to a year and get the first production chip exactly where he wants it, instead of being pressured by outside influences.

Chip, give us the best chip possible, without rushing to the finish line like everyone wants. Take your time and get it right.

Dave Hein · 2018-10-08 20:35

Then again, I think Parallax is ready for the P2 to start paying the bills. Not to mention that we, the customers, have been waiting years to get our hands on the P2. Personally, I hope that Parallax spends several weeks to determine what works and what doesn't work, and only fixes the things that don't work. This will get the chip out faster, and minimize the probability that something else gets broken.

Once the P2 is shipping, and the development tools are done, then Parallax could look into developing a new and improved P2.

Just my 2 cents worth.

idbruce · 2018-10-08 20:43

Dave

Chip is being rushed by outside influences. If he or Parallax succumbs to the pressure, then we will never realize the greatness of what the P2 could truly be and Parallax will never reap the rewards of Chip's intelligence.

Just my 2 cents worth.

EDIT: I can wait one more year for uC perfection. LOL... Providing my health holds up.

potatohead · 2018-10-08 20:50

I am sure the plan is fix it, ship it.

idbruce · 2018-10-08 20:50

To be perfectly honest.... I would like to see Chip's best before I die.

K2 · 2018-10-08 20:53

I'm not sure how much Chip is being pressured by outside influences right now. Only he could say for sure, but I think he's in something of a payoff phase right now. After years of intense work, he now has the fruit of his labor in his hands, and it is great! It's only natural to want to check it out thoroughly and make the last few tweaks.

I know that if it were me, "kicking back" wouldn't be the thing I most craved right now. I've given birth to a few projects, and at this phase of the process I simply can't get enough of my new baby. If there's any imperfection, I jump on it like a Vurtego Pro pogo stick!

idbruce · 2018-10-08 20:58

K2

I know that if it were me, "kicking back" wouldn't be the thing I most craved right now. I've given birth to a few projects, and at this phase of the process I simply can't get enough of my new baby. If there's any imperfection, I jump on it like a Vurtego Pro pogo stick!

You do have a very good point. I would not be kicking back either, but on the other hand, I would not be letting the forum or "Ken" (sorry Ken) rush me to my greatest creation.

jmg · 2018-10-08 21:16

Dave Hein wrote: »

Then again, I think Parallax is ready for the P2 to start paying the bills. Not to mention that we, the customers, have been waiting years to get our hands on the P2. Personally, I hope that Parallax spends several weeks to determine what works and what doesn't work, and only fixes the things that don't work. This will get the chip out faster, and minimize the probability that something else gets broken.

Once the P2 is shipping, and the development tools are done, then Parallax could look into developing a new and improved P2.

Exactly, cash flow is now king, and I'm sure that's what is happening.

The errata right now is looking 'good enough' that the 1500 parts can be the first production chips, on a somewhat rationed basis.
i.e. Parallax needs to avoid a single customer vacuuming up all of those parts; they need to seed the 'most design wins'.

If it was me, I would allocate most of the 1500 parts to Chip's advanced Eval Board (tbd?), and some to P2D2's, and ration a few to customers who can make a good enough case for custom board use. That would typically be something not on the Eval Boards.

If there is enough short term demand, (aka customers with opened cheque books) OnSemi could even run another run of Rev A parts.

kwinn · 2018-10-09 04:14

jmg wrote: »

kwinn wrote: »

evanh wrote: »

Calling it the best idea is disingenuous. One of the features of the Obex is they just work.

That's true for the P1, but now we will need a P2 version of the objects, so it is also quite possible that we will need objects that are specific for each model of the P2. Trying to make "one object fits all P2's" is fine as long as it does not delay P2 hardware and software availability. It would be nice to have such universality, but probably not practical since the hardware will differ.

Conditional defines are a common way the SW industry manages "similar code, differing details", and most of the new P2 tool flows will/do have Conditional Defines.
That also means some of those flows, that can/will output P1 code, can be used to build P1 or P2.

Of course, a P2 object that is Smart Pin cell, or Streamer/LUT, intensive makes little sense to port back to P1, but simple generic IO like bit-bang i2c, should be portable, as the tools mature.

No arguments regarding the utility of conditionals and the potential portability of code between P1 and P2 variants they make possible. I was simply agreeing with potatohead's statement that "we optimize this one, and when, if there is another one, we deal with it then" rather than hold up delivery of the current P2 to make more changes. Better to deal with the P2 variant differences when we have a better idea of what they are.
