Propeller II update - BLOG

T Chap · 2013-12-01 06:36

Just have an advanced OBEX section. Anyone wanting to use the advanced stuff must be willing to invest in the learning curve to deal with the potential conflicts. Rather than two versions though, if it is easy enough, have the object be selectable. If there are some that could definitely benefit from the slot borrowing, then why restrict the potential? All or nothing is a bad approach for the sake of tidiness alone, especially since the messy objects can be easily segregated for advanced users.

Heater. · 2013-12-01 07:25

potatohead,

The "it doesn't work" sets of questions would be much different too.

You get it.

ctwardell

With power comes responsibility to use it wisely, I think we are up to the task.

and T Chap,

Just have an advanced OBEX section. Anyone wanting to use the advanced stuff must be willing to invest in the learning curve to deal with... messy objects can be easily segregated for advanced users.

This attitude is the road to ruin.

We may be up to the task but prospective users should not have to be Prop gurus in order to be tempted towards using the Prop. In fact the vast majority of them won't be.

It should not be necessary to be a guru to use "advanced" objects. Users don't care if it's advanced or not, it's just functionality they want to use.

What we can end up with is a chip with no useful peripherals built in, as intended, but also no easily useful software objects for the naive to get such functionality.

What better way to tell your potential customers to bugger off!

Whatever happens here, it had better be simple and surprise-free for end users of objects.

Again, perhaps I'm over emphasising the point, but it is a point.

ctwardell · 2013-12-01 07:31

Heater. wrote: »

This attitude is the road to ruin.

We may be up to the task but prospective users should not have to be Prop gurus in order to be tempted towards using the Prop. In fact the vast majority of them won't be.

It should not be necessary to be a guru to use "advanced" objects. Users don't care if it's advanced or not, it's just functionality they want to use.

What we can end up with is a chip with no useful peripherals built in, as intended, but also no easily useful software objects for the naive to get such functionality.

What better way to tell your potential customers to bugger off!

Whatever happens here, it had better be simple and surprise-free for end users of objects.

Again, perhaps I'm over emphasising the point, but it is a point.

I see what you're saying, but I think there are ways to mitigate the issue without hampering the functionality.

If complete orthogonality is required then we need to throw out a lot of the P2.

C.W.

Heater. · 2013-12-01 08:08

I might have missed some developments. What else on the P2 breaks orthogonality?
I had thought it was doing quite well.

Bill Henning · 2013-12-01 08:10

Yes, it seems comprehensive to me

cgracey wrote: »
I thought about this for a few hours after reading all that you guys wrote, and I came up with this:
SETSLOT D/#      - Set cog's slot register to %PPP_CCC

                   %PPP = priority code
                   %CCC = a cog given priority

        MY SLOT PRIORITY
PPP     1st    2nd    3rd       WHAT OTHER SLOTS CAN I USE?
--------------------------------------------------------------------
000     ME     ANY    ANY       OTHER COG'S CCC = ME   (default)
001     ME     ANY    ANY       ANY
010     ME     CCC    ANY       OTHER COG'S CCC = ME
011     ME     CCC    ANY       ANY
100     CCC    ME     ANY       OTHER COG'S CCC = ME
101     CCC    ME     ANY       ANY
110     CCC    ANY    ANY       OTHER COG'S CCC = ME
111     CCC    ANY    ANY       ANY

PPP in plain English:
000 = If I don't use my hub slot, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.

001 = If I don't use my hub slot, anyone else can use it.
If any slot is free and I'm next in line, I can use it.

010 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.

011 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any slot is free and I'm next in line, I can use it.

100 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.

101 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any slot is free and I'm next in line, I can use it.

110 = If CCC doesn't want my hub slot, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.

111 = If CCC doesn't want my hub slot, anyone else can use it.
If any slot is free and I'm next in line, I can use it.

Does this seem comprehensive?
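
To make the %PPP_CCC encoding quoted above easier to follow, here is a minimal Python sketch of a decoder. It only restates the table from Chip's post. The function name decode_setslot and the dictionary keys are invented for illustration; they are not part of any Parallax tool, and SETSLOT itself is only a proposal at this point in the thread.

```python
# Illustrative decoder for the proposed 6-bit %PPP_CCC slot register value.
# The field layout and the table rows come from the quoted post; every name
# below is hypothetical.

# Upper two bits of PPP select who gets first, second and third claim
# on this cog's own hub slot.
OWN_SLOT_ORDER = {
    0b00: ("ME", "ANY", "ANY"),
    0b01: ("ME", "CCC", "ANY"),
    0b10: ("CCC", "ME", "ANY"),
    0b11: ("CCC", "ANY", "ANY"),
}

# Lowest bit of PPP selects which other slots this cog may use.
OTHER_SLOTS = {
    0: "OTHER COG'S CCC = ME",
    1: "ANY",
}


def decode_setslot(value):
    """Split a %PPP_CCC value into its fields and look up the table row."""
    if not 0 <= value <= 0b111_111:
        raise ValueError("slot register value must fit in 6 bits")
    ppp = (value >> 3) & 0b111   # priority code
    ccc = value & 0b111          # the cog given priority
    return {
        "ppp": ppp,
        "ccc": ccc,
        "own_slot_order": OWN_SLOT_ORDER[ppp >> 1],
        "other_slots": OTHER_SLOTS[ppp & 1],
    }
```

For example, decode_setslot(0b010_101) describes a cog that uses its own slot first, offers it to cog 5 second, and may borrow only slots from cogs whose CCC names it.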

Bill Henning · 2013-12-01 08:21

You do have a good point.

I think the hub utilization is an application level issue.

It should be addressable in Proptool and OpenSpin - each object should declare the number of cogs it requires, and the number of guaranteed hub slots out of 8.

Then when the compiler pulls in all the objects, it can add up the cog requirements, and the hub slot requirements, and say

"Application exceeds available cogs"

or

"Application requires more hub slots than available"

or

"Application exceeds number of cogs and hub slots available:"

The post-compile report could be amended with "N cogs used" and "M hub slots used"

For dynamically loaded/replaced cogs, we may need a "used x dynamic cogs" as well, or better yet, treat any dynamic cogs as a fixed cog.

This could be implemented simply with two constants in the top level file for an object:

CON
COGSUSED = 1 ' if not specified, 1 implied
HUBSLOTS = 1 ' if not specified, 1 implied
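
The bookkeeping described above could look roughly like the following Python sketch. It is not how Proptool or OpenSpin work today; ObjectDecl and check_application are hypothetical names, and the limit of 8 cogs and 8 hub slots is an assumption taken from the "out of 8" wording in this post.

```python
from dataclasses import dataclass

TOTAL_COGS = 8       # assumed chip limit
TOTAL_HUB_SLOTS = 8  # "guaranteed hub slots out of 8"


@dataclass
class ObjectDecl:
    """What one object declares in its top-level CON block (hypothetical)."""
    name: str
    cogs_used: int = 1   # COGSUSED, 1 implied if not specified
    hub_slots: int = 1   # HUBSLOTS, 1 implied if not specified


def check_application(objects):
    """Add up the declared requirements and return (cogs, slots, message)."""
    cogs = sum(o.cogs_used for o in objects)
    slots = sum(o.hub_slots for o in objects)
    over_cogs = cogs > TOTAL_COGS
    over_slots = slots > TOTAL_HUB_SLOTS
    if over_cogs and over_slots:
        message = "Application exceeds number of cogs and hub slots available"
    elif over_cogs:
        message = "Application exceeds available cogs"
    elif over_slots:
        message = "Application requires more hub slots than available"
    else:
        message = f"{cogs} cogs used, {slots} hub slots used"
    return cogs, slots, message
```

A dynamically loaded cog would simply be declared as one more fixed cog, as suggested above.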

Heater. wrote: »

Whatever happens here, it had better be simple and surprise-free for end users of objects.

Again, perhaps I'm over emphasising the point, but it is a point.

ctwardell · 2013-12-01 08:33

Heater. wrote: »

I might have missed some developments. What else on the P2 breaks orthogonality?
I had thought it was doing quite well.

Video DAC's now.

PORTD is a shared resource, so it will be difficult to mix and match objects that use PORTD.

Within a COG you have to keep track of thread usage to determine how many 'spacer' instructions are needed for some instructions.

With multitasking we will inevitably end up with objects that can share a COG, so the same issues you bring up with hub slots will apply there.

C.W.

ctwardell · 2013-12-01 08:41

Bill Henning wrote: »

It should be addressable in Proptool and OpenSpin - each object should declare the number of cogs it requires, and the number of guaranteed hub slots out of 8.

It won't really be that simple because "hub slots are not equal". The timing of the hub slots is also important.

For example getting an extra slot directly before the regular slot will likely just end up in wasting the regular slot.

I imagine the most useful case will be to use slots shifted by 4 from the 'regular' slot.

C.W.
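
The point that "hub slots are not equal" can be illustrated with a little slot arithmetic. This Python sketch assumes a plain 8-slot round robin with one slot per clock and cogs numbered 0 to 7, where cog n owns slot n; the function names are made up for the example.

```python
HUB_SLOTS = 8  # assumed: one slot per cog, visited in round-robin order


def regular_slot(cog):
    """Slot that the round robin gives to this cog."""
    return cog % HUB_SLOTS


def opposing_slot(cog):
    """Slot shifted by 4 from the cog's regular slot."""
    return (cog + HUB_SLOTS // 2) % HUB_SLOTS


def clocks_until_regular(cog, extra_slot):
    """How many slots after extra_slot the cog's regular slot comes around."""
    return (regular_slot(cog) - extra_slot) % HUB_SLOTS
```

An extra slot directly before the regular one gives clocks_until_regular(3, 2) == 1, which leaves almost no time to do anything before the regular slot arrives, while the opposing slot gives clocks_until_regular(3, 7) == 4 and spreads the two accesses evenly.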

Invent-O-Doc · 2013-12-01 09:00

Simple things often work better than complex ones; and they sell better too. P1 has had fragmented development board form factors and a fragmented development language/IDE problem that has limited its sales. Having different levels of OBEX and objects that may screw with your own program's determinacy will not help. What is the simplest solution that allows for analog pins to be read from all cogs and permits enough space on the die?

cgracey · 2013-12-01 09:20

Invent-O-Doc wrote: »

Simple things often work better than complex ones; and they sell better too. P1 has had fragmented development board form factors and a fragmented development language/IDE problem that has limited its sales. Having different levels of OBEX and objects that may screw with your own program's determinacy will not help. What is the simplest solution that allows for analog pins to be read from all cogs and permits enough space on the die?

Any ADC pin can be read from any cog. The only limitations are the update-per-clock DAC modes which are video or analog CTR output - every 12 pins is associated with a particular cog. For every other pin function, there are no special cases.

Phil Pilgrim (PhiPi) · 2013-12-01 09:25

Guys,

'Just a reminder of what mission creep and featureitis can produce:

I haven't studied the details enough to speak with much authority, but to me the P2 is looking less and less like a Propeller. I remember a saying about IBM's PL/I language:

"It's like a Swiss army knife. Once you figure out how to use one of the blades, another one stabs you in the back."

Please, let's not let this happen to the P2. The P1 is like a Lego set with just simple bricks and mechanical parts. All the pieces fit together -- orthogonally [cough, cough] -- and with enough imagination, you can build anything with it. That simple elegance is what makes it fun to play with and to build things its creators could never have envisioned. I would like to see the P2 return to that original concept.

It's all well and good that Chip is open to suggestions from the forum. But we are not the customers that Parallax requires ultimately to pay for the P2's more-than-substantial development costs. And if the chip is encumbered by hard-to-understand features, exceptions, and nonorthogonality, volume customers will not buy it. Moreover, the delay caused by this endless process of tweaking and adding features only increases the front-end burden on Parallax.

Antoine de Saint-Exupery said it best:

La perfection est atteinte
non quand il ne reste rien

Heater. · 2013-12-01 09:31

What he said. And the French dude.

cgracey · 2013-12-01 09:40

Do you guys think that some programming guidelines would stave off the object-compatibility conundrum?

How about these:

1) Any single-cog program must be able to get by on its own hub slot. Any extra slots cannot be relied upon.
2) Cooperative cog programs can only rely on extra hub slots that they gift to one another. These will most likely need to be staggered by 4 slots.

One way to stop the potential for inter-object incompatibility is to say (or make the hardware such) that the ONLY way for a cog to get an extra slot is to have it gifted by another cog within the same object, without any relative-slot requirements. There is no way that would allow for problems, but by not stipulating slots, it may not be that valuable.

Perhaps the main OBEX section needs to be for simple objects which might use multiple cogs, but do not rely on any special slot allocation.

I understand Heater's concern about making a mess of inter-operability. That would not be a good thing and would, most likely, happen. Can it be mitigated, though, by strict programming guidelines, where everything is possible, but objects must declare their type forthrightly and slot-agnostic programming is strongly encouraged?

ctwardell · 2013-12-01 09:55

Chip,

I really like the more general solution that you came up with, but if it alleviates fears and enforces pairing to help eliminate side effects maybe we could go back to something like what I mentioned in post 3082.

I'd hate to lose this functionality completely.

C.W.

cgracey · 2013-12-01 09:55

It's too bad that we can't have this feature, but keep it hidden from all except those who would use it with full responsibility. Maybe putting the chip into a special mode could open up this feature.

I can picture a case where someone gets something working by using extra slots. Then, they add more cog programs, only to find that their initial performance diminished to an unacceptable level. They would lose confidence and have that familiar, lousy feeling that they get everywhere else. No "magic" in that scenario. Trying to explain to them that they could have had consistent inter-operability if they would have followed certain rules would not be that encouraging, as it could likely be said of almost anything.

If it could be done so that when they embark upon using this feature, all sorts of warnings and disclaimers are presented, they would have a realistic awareness of their likelihood of breaking compatibility with other objects.

This feature would be great to have when you are the one using the whole chip and you control the big picture, but it's inviting trouble by encouraging others to employ this feature who don't know the whole-chip application.

Is there some sort of compromise that could allow this feature to exist, but without encouraging bad behavior? I kind of see it as a tool to employ when an application is done, but you'd like to open the valve all the way to maximize performance.

potatohead · 2013-12-01 10:14

Hey, let's license it! You don't get the keys until you can demonstrate you won't drive the car into the building others are working in.

I'm not serious. That's just the first thought that came to mind.

Now I am.

What if P1 had this? What would be different today?

How about two chips?

Fab the one that is most Propeller like, then have it fund the one that isn't, and make it the balls out performance leader and see how it shakes out? Or make the decision to really blow it all out on P3?

And, that pure performance chip could include optimizations for C, etc... why not? Then there would be the education / ease of use chip, and the monster one that is insane fast, but not so easy to use. People would have a migration path. Once they build up an application, or score that design win, they can buy up to the performance model and maximize everything according to their design requirements.

Put the solid P2 out there first. It will see the development work and a big code body will surround it too. That's your center of gravity right there. Parallax education can do what it does well too.

Then release the performance one. Pro model. This one may see Parallax education, but more likely Parallax engineering services and/or outside guru services. Just some thoughts.

I'm dead serious on this. If this one fabs well, we've proven the process. Fabbing the performance one is a lower risk and a much lower timeframe. Could be a year. And it might be funded by then too, assuming Parallax and everybody involved sees enough to do that and eat and do whatever else they want and need to do with the income, of course.

And it might make sense to do it all on one chip on P3 too. P3 could have some operating modes, user mode and protected mode, which can put stuff like this into some boundary or mode that keeps people out of trouble. P3 really could be system on a chip and it could have software written to take care of people and hardware modes to do so in transparent ways too. The gurus could just write for it and have a field day, no worries.

I will say this: Had the P1 employed this feature, I know I would have written those last few video drivers that get me all the colors, 80 column text, etc... in one COG, thinking SPIN can use another one, and the other cogs don't need to be all that fast...

I don't know that I would have gone back and modified the drivers to use two COGs or intercommunicate in complex ways that preserved the other COGs' ability to function. Nor would I likely have gone and worked so hard on inter-cog synchronization, which turned out brilliant.

Wonder whether or not the C guys would have maxed out the LMM COG? etc...

potatohead · 2013-12-01 10:22

Yesterday when this was proposed, I didn't react well and removed my post and some leading up to it. Was going to just go away for a while, because I don't feel good about the thrashing going on right now.

A few hours later, I thought, "OK" let's entertain it and see where the impressions all get to.

Heater and Phil were at the top of my list. I also wonder what David and a few of the others really think about this. Not in the context of can it be done, but should it be done? Specifically, existing customer adoption and the "a COG is a COG" reuse that has paid us off so well.

I'm going to use the P2 either way. Again, the one that got botched was a perfectly awesome chip. The one we had just before this exercise is a very awesome chip, because it's fine tuned. The one we are discussing now? I don't know.

Maybe I should have left those posts up. I just didn't want to be trouble. Heater's comment brought me right back to the same place. Quick. Ugh.

Re: Can we find a way to help users?

The only way I know of is to use the "center of gravity" dynamic. We've proposed a TON of "if only people would do X" ideas as standards, guidelines, etc... and what happened? Squat. And over time, many of us realize users do what users do, and that includes what they can do instead of should do and that includes me.

If we had a body of code that really was inclusive and that did play by all the rules, many of the additions to it would comply because nobody wants to break the utility of such a code body.

And so that means strongly curating code contributions, leaving compliant ones, and all the other ones out there.

Not sure where that leads, but that's what I think would have to happen.

There would be a ton of, "if only ALL the COGS could go this fast..." discussions too. Many of those would result in users seeking a solution. What solution?

potatohead · 2013-12-01 10:49

Re: Break orthogonality

Yes, the video DACS are going to end up pin range limited by COG. Personally, I don't have a big problem with this one. The ranges are linear, and a quick multiply by the COGID tells somebody what the range is.

The trade-off on that one is a smarter core, which ozpropdev has demonstrated to us as useful as has Chip, and the big bus required to move those signals around takes a lot of chip area and it imposes timing limits all over the place. Worthy trade. Worst case, somebody does a layout like they would with P1, only to find the best solution for that high speed DAC output is on "that side" of the chip. So they fire off the necessary COG and assign the pins and go. The rest runs as it normally would with a simple COGNEW that allocates first available COG as we are used to doing.

There are some odd cases where pins may cross a boundary, and those are just going to have to be addressed in layout. So it's not perfect, but it's way better than "use this pin for feature X" out there otherwise.
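
The "quick multiply by the COGID" can be written out as below. This is a sketch under assumptions: the thread only says that every 12 pins is associated with a particular cog and that the ranges are linear, so the mapping of cog 0 to pins 0..11, cog 1 to pins 12..23 and so on is a guess, and dac_pin_range is a made-up helper name.

```python
PINS_PER_COG = 12  # from the thread: every 12 pins is associated with a cog
NUM_COGS = 8       # assumed cog count


def dac_pin_range(cogid):
    """Pins assumed to carry the update-per-clock DAC modes for this cog."""
    if not 0 <= cogid < NUM_COGS:
        raise ValueError("cogid must be 0..7")
    first = cogid * PINS_PER_COG  # assumed linear mapping starting at pin 0
    return range(first, first + PINS_PER_COG)
```

Under that assumption, a layout that wants high speed DAC output on pins 24..35 would start its video cog explicitly as cog 2 and let everything else take the first available cog.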

One last comment related to feature freeze. I'm really wanting to write code for P2. Enough has changed recently to invalidate all that was written, so I'm just sort of stuck waiting.

Others are in the same place. C, for example. Would be really nice to settle on the P2 sans this functionality, leaving instruction / bugs and the SERDES... Just so code can start to accumulate as we need it to.

I don't see tasks as that big of a deal. The COG is the reuse unit, IMHO. Tasks can be, but the user will have to do some fiddling.

Re: PORTD

Doesn't PORTD have a COG MASK to help share the resource? Still, it's shared, so yes it will require fiddling too. Personally, I think that will result in multi-COG objects that get loaded together and perhaps a quick check of the COGID can see to it they set their masks correctly, leaving just whatever the user has done with PORTD. This one is a don't have to use feature, IMHO. Not such a problem.

ctwardell · 2013-12-01 11:26

I said earlier I'd like to see a wrap on features and get this thing shipped.

When Ray first suggested this, I agreed with the post Phil deleted and since he said it strongly, didn't pile on.

Then Chip brought up that this would actually be simple to do, so it seemed worth doing.

It looks like the time and risk involved from the chip point of view is minimal, but it has a high philosophical cost.

All I can say at this point is users can be educated, once the chip is made it's done. It's easy to say we can add it to another chip later, but in reality that later is going to be later than you think.

Maybe it isn't that important of a feature, I'm not going to be the one writing any high performance video or USB drivers so I really can't argue that strongly in favor of it.

I can say that as an engineer it bothers me to leave resources underutilized due to a philosophy.

C.W.

Heater. · 2013-12-01 11:32

It's not looking good:

1) Software guidelines.

They will end up about as effective as the "Gold Standard" object idea some time back. i.e. useless.

2) license

Err, no. Not now that Parallax is in the world of open source hardware and software. That does not sit at all well.

3) Two chips?

Yeah right. We have not got one out the door yet. An expensive option. But yes perhaps part of the Prop III exploration.

4) Putting the chip into a special mode.

More complexity. I am allergic to modes for most things. They never seem to be in the right mode or I can't get a combination of modes I want (Think alt pin functions on ARM chips)

Chip,

I can picture a case where someone gets something working by using extra slots. Then, they add more cog programs, only to find that their initial performance diminished to an unacceptable level. They would lose confidence and have that familiar, lousy feeling that they get everywhere else. No "magic" in that scenario.

Exactly. Horrible scenario. It's interrupts all over again.

It's not just a case of losing a bit of performance. It may be finding that your desired video mode is not possible or that some external hardware you want to drive cannot be driven at all. USB for example?

A question:

Given that the Prop II is already stunningly more capable than the P1 what is it that people want to do with that little extra performance to be had at the huge cost of breaking orthogonality, simplicity and determinism?

jazzed · 2013-12-01 11:34

Chip,

We have discussed possible ways for COGs to not be restricted to round-robin access dozens of times before this. All were summarily rejected by you as being too complicated before. This time complexity creeps in again as it is simply unavoidable. If it wasn't for the special video processing bandwidth requirement, this probably wouldn't even be discussed.

At this point, I think COG yielding or whatever you call it should be dropped and left on the design table for another product. In the unlikely possibility that it will draw many more customers to achieve a good ROI (beyond satisfying the intellectual), then maybe it's important to add.

However it's still not really useful for large code performance (I.E. programs that won't fit in a COG). A HUB hogging COG will not be of much extra value because instructions need to be interpreted anyway. In the case of higher code density, all instructions need to be interpreted. In the case of using 32 bit instructions, we still need to interpret "LMM jumps", etc.... If all instructions could simply be fetched and executed from HUB without interpretation, the proposition would be more encouraging. Many instructions can be fetched and executed as is, but there are several that must be interpreted like a long jump that requires extra instructions maintaining the pointers, etc....

In the end P2 or whatever you call it has to make a positive difference to the bottom line. Someone needs to decide what is enough.

Heater. · 2013-12-01 11:42

ctwardell,

I can say that as an engineer it bothers me to leave resources underutilized due to a philosophy.

I sort of agree. Many customers of internet providers would disagree. They sign up for X Mbits/sec on their line, which can be delivered, until there are many subscribers around and they only get X/10. "Overselling" is hated.

Jazzed,

Someone needs to decide what is enough.

OK. I'll do that if no one else wants to.

I decide there shall be no changes to the round robin HUB access timing.

Further, I decide all other changes should be kept to the bare essentials and the chip shipped out as soon as possible.

How am I doing?

David Betz · 2013-12-01 11:43

jazzed wrote: »

If all instructions could simply be fetched and executed from HUB without interpretation, the proposition would be more encouraging. Many instructions can be fetched and executed as is, but there are several that must be interpreted like a long jump that requires extra instructions maintaining the pointers, etc.....

Here's a radical idea. Get rid of "aux" RAM and replace it with an instruction and data cache and extend the PC enough to allow it to address all of hub memory. That of course doesn't solve the entire problem since we still need branch instructions that can reach all of hub memory rather than just the 9 bits required to address COG memory.

Heater. · 2013-12-01 11:46

David,

Here's a radical idea. Get rid of "aux"...

ArrgrgggH....

Bill Henning · 2013-12-01 11:47

After chewing on all the recent posts, perhaps there is a slightly less perfect version that would get around the obex friendliness argument.

Principle #1:

All cogs get their own regular hub cycle except if

- they don't have a pending request
- they have specifically 'yielded' their slot

Principle #2:

- a cog that can use extra bandwidth can grab unused hub cycles for its own use, if it is ready to access the hub

Analysis:

I think we only get into the Obex issue if we try to add priorities and guaranteed extra bandwidth.

Now looking at RDQUAD etc., it seems that in most cases, extra slots are only useful if they are 3-4 cycles later.

At most, a cog can only make use of a long every clock cycle.

Careful use of RDQUAD's and RDxxxC's means that a cog can really only use its opposing cog's timing slot to get the most bandwidth it can use.

Someone - I think Ray - suggested pairing cogs, for voluntary bandwidth sharing - one voluntarily gives up its hub slot to its twin. Pairs would be: Cog 1/5, 2/6, 3/7, 4/8

I think that makes sense, and it is easy to document - and easy to enforce for objects (start the pair of cogs in the object)

This reduces the hub states for any cog to:

00 - normal, P1 style round robin - but unused hub cycles are available to any cog
01 - "giving" cog of a cog pair (it volunteers its hub slot to its pair) - unused slots can be used by any cog
10 - "greedy" cog of a cog pair (first claim to unused pair-cog hub cycles) - unused slots can be used by any cog
11 - low bandwidth cog, only uses totally unclaimed cog hub cycles

Note: "giving" cogs still get a chance to use otherwise unused hub slots

I think the above would keep Obex simple, yet allow for up to four pairs of high bandwidth cogs; the only Obex rule would be that paired cogs should be in a single object, which would allow mixing paired / single cog objects freely.
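
One possible reading of the four hub states above, written as a small Python arbiter. This is only a sketch of the idea, not a description of any hardware: cogs are numbered 0 to 7 here (so the pairs are 0/4, 1/5, 2/6, 3/7), the order in which unused slots are offered to other cogs is an assumption, and grant and pair_of are invented names.

```python
NORMAL, GIVING, GREEDY, LOW = 0b00, 0b01, 0b10, 0b11
NUM_COGS = 8


def pair_of(cog):
    """Twin cog whose slot sits 4 slots away."""
    return (cog + NUM_COGS // 2) % NUM_COGS


def grant(slot, states, pending):
    """Return the cog that gets hub access in this slot, or None.

    states[c]  is the 2-bit hub state of cog c.
    pending[c] is True when cog c has a hub request waiting.
    """
    owner = slot % NUM_COGS
    partner = pair_of(owner)
    if states[owner] == GIVING:
        # A giving cog volunteers its slot; its greedy twin has first claim.
        if states[partner] == GREEDY and pending[partner]:
            return partner
    elif states[owner] != LOW and pending[owner]:
        # Normal and greedy cogs keep their own slot when they need it.
        return owner
    # Otherwise the slot is unused. Assumption: offer it in round-robin
    # order starting after the owner, so the owner itself comes last.
    for offset in range(1, NUM_COGS + 1):
        cog = (owner + offset) % NUM_COGS
        if pending[cog]:
            return cog
    return None
```

In this reading a giving cog can still pick up its own slot when nobody else wants it, which matches the note that giving cogs still get a chance at otherwise unused slots.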

I love simplicity and orthogonality too - but I hate to see wasted hub cycles!

cgracey wrote: »

It's too bad that we can't have this feature, but keep it hidden from all except those who would use it with full responsibility. Maybe putting the chip into a special mode could open up this feature.

I can picture a case where someone gets something working by using extra slots. Then, they add more cog programs, only to find that their initial performance diminished to an unacceptable level. They would lose confidence and have that familiar, lousy feeling that they get everywhere else. No "magic" in that scenario. Trying to explain to them that they could have had consistent inter-operability if they would have followed certain rules would not be that encouraging, as it could likely be said of almost anything.

If it could be done so that when they embark upon using this feature, all sorts of warnings and disclaimers are presented, they would have a realistic awareness of their likelihood of breaking compatibility with other objects.

This feature would be great to have when you are the one using the whole chip and you control the big picture, but it's inviting trouble by encouraging others to employ this feature who don't know the whole-chip application.

Is there some sort of compromise that could allow this feature to exist, but without encouraging bad behavior? I kind of see it as a tool to employ when an application is done, but you'd like to open the valve all the way to maximize performance.

potatohead · 2013-12-01 11:47

Re: The engineer in me hates wasted...

Isn't maximizing the user experience and minimizing risk part of engineering?

Let me relate a short story. (well for me, short)

So companies make big CNC machines. Lasers, punches, routers, mills, lathes, and so on. Just about every one of these machines leaves capability either hidden, or locked away from the end users, who own the machines.

They do this because exposing those things means increasing risk while not also increasing the returns in like kind. Nobody wants to market against "watch out for company x, they promise y, but fail, unless you are careful to only do z" We run that risk with P2. X is possible, with Y issues, leaving Z consistent. Think hard about that.

An example from the laser is wattage and duty cycle limits. Back in the day, I convinced a company I worked for to obtain one of these. They were using punches, which require lots of tooling setup. Additionally, a lot of their work was short run work. Non-optimal at all.

Once I made that case, the machine showed up. The tech and I set it up and got it working very nicely. It didn't take long before I realized we didn't get 100 percent of the machine. It turns out that extra 15 percent or so was just enough to do a few difficult jobs efficiently.

This angered and bothered me, so I went seeking. Didn't take long before I got a call from the regional machine applications engineer. We discussed the use case, and he agreed. The machine was capable, but...

And then came the risk discussion. In his experience dealing with a few hundred installs of this machine, the failures were all due to exceeding absolute limits, not the safe "as shipped and warrantied" limits.

So I got a special paper tape mailed to me. On it was written, "If you do this, you are on your own. Regards, xxx"

They used paper tape as the distribution because most people were using computers and increasingly few had the ability to make new paper tapes. Thought that kind of funny, and I duplicated it on our ancient paper tape read/write machine, just because. That darn tape is probably still in my box of curios somewhere right along with the little 8K invaders game I wrote to tape for use in DOS.

Along with that tape was a full technical description of the machine, absolute limits and on the tape were the secret parameter numbers that enabled the machine to perform in various ways.

So I had a choice.

I could open it all the way up, and the chance of machine failure and repair was going to go up by a very high factor, and none of it would be under warranty, or I could leave it alone and just accept the as shipped limits, or I could split the middle and open it up just enough and in just the right ways to take the work. New work is a very compelling thing, particularly in the 90's when this was all going on. Everybody wanted more, and the pressure to perform here lest the work go overseas was very high.

Worse, I knew what the owners would say. "Do it, we need the work, and be careful." In other words, they wanted the work, but the risk was going to be on me.

So I opened it up just enough to do the work, and I kept the info to myself, simply telling the shop that the manufacturer made an adjustment to maximize our particular machine, implying that not all of them would perform and that there was a baseline. Truth is there was a baseline and not all of them would perform, but I was sent the data for our exact machine and knew where that limit really was.

When I left, so did those tapes and that data. I ended up talking with that engineer until he retired. Just so happens our paths crossed enough for that to happen. Eventually, there ended up being a small club of people who could really drive those machines, and I was in that club for a time, and I felt the danger keenly. Decided it wasn't going to be me that pushed it to blow 5 figure repairs.

The company I worked for never did see an off warranty failure. Technically, they did consume some things like lenses and mirrors more quickly than the spec called for, but those are simple and the spend on those is expected anyway. I figured the new work capability likely paid those off nicely. Never did find out though.

Also technically, they were off warranty, but for my conversation with the engineer. The two of us sorted it out best case, and he never filed that report disclaiming the warranty, because he basically trusted that I wasn't going to leave that data out there at risk. Normally, just sending that tape is enough to void it all. Interesting, yes?

So here we are, performing the tasks of engineering. It's not just about peak performance. It's the whole solution, risks, etc... that we are engineering for. Let's do solid engineering. Everybody involved will be better for it.

potatohead · 2013-12-01 11:52

Re: If it wasn't for the special video processing bandwidth requirement, this probably wouldn't even be discussed.

I'm sorry, but full color 1080p isn't needed for any use case we've thought up yet. The video bandwidth we've got is, quite frankly, stunning. Running it at more modest resolutions may appear a loss to us close to the project, but just wait until what we've got right now hits userland!

It's amazing, and if that means reduced color on ultra-high resolution displays, so be it. Besides, we know how video is going to work, and the short of it is it will do a lot more than we think today. Put some people on it and some clever code, and the magic that happened on P1 will also happen with P2.

No way would I trade the reuse for peak video capability.

What we have right now is so darn good it's silly. And it will do textures, and it will pixel blend, and it will drive multiple displays concurrently, and it will render a pile of polys too. Sheesh!

potatohead · 2013-12-01 11:54

Re: If all instructions could simply be fetched and executed from HUB without interpretation, the proposition would be more encouraging.

A hardware LMM engine for P3 would be near top of my list.

Heater. · 2013-12-01 12:01

Bill,

You may have an acceptable "non-intrusive" solution there. I'm too tired to tell.

Careful use of RDQUAD's and RDxxxC's means that a cog can really only use its opposing cogs timing slot to get the most bandwidth it can use.

This sort of statement inclines me to believe that the performance improvements we could expect from all this "HUB busting" are not so great.

Any idea what gains it actually does get us?

Ahhg...what am I saying. I already decided we are not going to do this.

ctwardell · 2013-12-01 12:07

Bill Henning wrote: »

00 - normal, P1 style round robin - but unused hub cycles are available to any cog
01 - "giving" cog of a cog pair (it volunteers its hub slot to its pair) - unused slots can be used by any cog
10 - "greedy" cog of a cog pair (first claim to unused pair-cog hub cycles) - unused slots can be used by any cog
11 - low bandwidth cog, only uses totally unclaimed cog hub cycles

I think the "unused slots can be used by any cog" is what causes trouble.
This introduces additional jitter in how long a hub op may take and an object that works when there are a lot of 'free slots' may break when there aren't so many slots.
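
To put a number on that worry, here is a rough sketch, not anything from the actual design. It assumes a hypothetical cog 0 that asks for the hub on every slot and is the only cog scavenging unused slots; the function name and the arbitration rule are my own assumptions for illustration.

```python
NUM_COGS = 8  # P1-style hub rotation: one slot per cog


def grants_per_rotation(busy_cogs):
    """Count the slots cog 0 wins in one 8-slot rotation.

    busy_cogs: set of other cog numbers that use their own slot this rotation.
    Assumption: any slot its owner leaves unused goes to cog 0, which is the
    only opportunistic requester in this sketch.
    """
    grants = 0
    for owner in range(NUM_COGS):
        if owner == 0 or owner not in busy_cogs:
            grants += 1
    return grants


for busy in (set(), {1, 2, 3}, {1, 2, 3, 4, 5, 6, 7}):
    print(len(busy), "busy cogs ->", grants_per_rotation(busy), "slots for cog 0")
```

Under these assumptions cog 0 sees anything from 8 slots per rotation down to 1, depending only on what the other cogs happen to be doing. An object tuned against the quiet case is the one that breaks.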

My proposed solution from post #3082 (it was #3083 at one point...) was more restrictive:

Hub Access Pairs

COG0 and COG4
COG1 and COG5
COG2 and COG6
COG3 and COG7

Option 1: Use 'my' hub slot only
Option 2: Use 'my' hub slot and 'my twin's' hub slot if available
Option 3: Use 'my twin's' hub slot if available

Setting the "high performance" COG to Option 2 gives it guaranteed 2X hub slots, while the "donor" set to Option 3 gets whatever hub slots its twin doesn't use.
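
A minimal sketch of how that arbitration could behave, assuming the slot owner always has first claim and only its twin may pick up a slot the owner leaves. The names `HubMode`, `grant_slot` and `count_grants` are hypothetical, made up for this illustration, and this is a software model rather than the hardware logic.

```python
from enum import Enum

NUM_COGS = 8


class HubMode(Enum):
    OWN_ONLY = 1       # Option 1: use 'my' hub slot only
    OWN_PLUS_TWIN = 2  # Option 2: 'my' slot plus twin's slot if available
    TWIN_ONLY = 3      # Option 3: twin's slot if available


def twin_of(cog):
    """COG0<->COG4, COG1<->COG5, COG2<->COG6, COG3<->COG7."""
    return (cog + NUM_COGS // 2) % NUM_COGS


def grant_slot(slot, modes, wants):
    """Return the cog that gets this hub slot, or None if it goes unused."""
    owner = slot % NUM_COGS
    if wants[owner] and modes[owner] in (HubMode.OWN_ONLY, HubMode.OWN_PLUS_TWIN):
        return owner
    twin = twin_of(owner)
    if wants[twin] and modes[twin] in (HubMode.OWN_PLUS_TWIN, HubMode.TWIN_ONLY):
        return twin
    return None  # never handed to an unrelated cog


def count_grants(modes, wants, slots=16):
    """Tally grants per cog over a number of consecutive hub slots."""
    counts = [0] * NUM_COGS
    for slot in range(slots):
        winner = grant_slot(slot, modes, wants)
        if winner is not None:
            counts[winner] += 1
    return counts


modes = [HubMode.OWN_ONLY] * NUM_COGS
modes[0] = HubMode.OWN_PLUS_TWIN  # "high performance" cog
modes[4] = HubMode.TWIN_ONLY      # "donor" cog

print(count_grants(modes, [True] * NUM_COGS))            # cog 0 always requesting
print(count_grants(modes, [False] + [True] * 7))         # cog 0 idle
```

In this model the cogs outside the pair get the same number of slots in both runs, which is the point of keeping the sharing inside a pair.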

C.W.
