Your user now needs to know what the cog table looks like, to figure out if his object will run in the cog he intends to load it in.
Not my user, your user.
Your user will be pulling only from the Plug-and-Play basket, and using the (default) Plug-and-Play settings.
They have nothing to worry about. To them, there is no cog table.
Myself, and many others, will be using the better design controls to craft more powerful system designs.
I expect this group will have no problems grasping a cog table.
Oh perfect!
We have our solution!
"Nice object there jmg, too bad it uses one of those ****** table schemes. It bumps into my xxx driver."
Or we get this, "Hey, spud! Nice video object. Can you rejigger it for table allocation X?" -- "No, you've got full control, please use it." lol
Maybe this? "Can anybody help me figure out how come spiffyfotz, warblubble, and fringly won't work with this table allocation needed for dwarbubs, and frotz?"
More generally, "Wish all the cogs were fast..." Gotta love that one.
And when the table changes dynamically?
"These objects are stable right now, but when I run this one, frotz and blaarg breaks, but only every so often..."
"I changed it to mesh better with my slot allocation table, and now it does this! Can you, object author, help me out on this one plz" --My standard answer, "jmg says you've got full control, go ask him about it." (half serious)
"The only way my code works is with a fast cog, but I need, blooze, frobozz, and wiffle..." Answer: "Use more cogs together." Followed by some variations of, "how, can't I jigger the table some, wish all the cogs were fast...."
Or this: I only need fast for a little while, so I'll just change the table here, then put it back, etc... Gonna love sorting those out.
At its simplest, I would expect that a cog object could "declare" in the code itself how many hub slots it expects, and would fail to load if you tried to load it into a cog that has not been allocated the required number of slots in the various tables. There may be other things you might need to specify, such as maximum latency between hub slots as well.
Of course, if you don't specify anything for these values in your object, I presume it could be taken to mean your object could accept any level of hub access - but having this warning facility would prevent heartbreak, aggravation and grief when an object that worked with one table configuration silently fails under another table configuration.
Ross.
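To make that declare-and-check idea concrete, here is a minimal sketch in C. Everything in it is hypothetical (no such loader exists): a requirements record an object author would embed, a grant record from the integrator's slot table, and a `cog_load_checked` that refuses to start the cog rather than failing silently later.

```c
#include <stdio.h>

/* Hypothetical per-object declaration, written by the object author. */
typedef struct {
    unsigned slots_per_16;     /* hub slots needed per 16-clock rotation */
    unsigned max_latency_clk;  /* worst tolerable clocks between slots */
} hub_reqs_t;

/* What the integrator's slot table actually grants a given cog. */
typedef struct {
    unsigned slots_per_16;
    unsigned max_latency_clk;
} hub_grant_t;

/* Refuse to start an object in a cog whose table entry can't satisfy it.
 * An object that declares nothing (zero slots, unbounded latency) would
 * be taken to accept any level of hub access, as suggested above. */
static int cog_load_checked(int cog, const hub_reqs_t *need,
                            const hub_grant_t *grant)
{
    if (grant->slots_per_16 < need->slots_per_16 ||
        grant->max_latency_clk > need->max_latency_clk) {
        fprintf(stderr,
                "cog %d: object wants %u slots/16, latency <= %u; "
                "table grants %u slots/16, latency %u -- not loading\n",
                cog, need->slots_per_16, need->max_latency_clk,
                grant->slots_per_16, grant->max_latency_clk);
        return -1;  /* fail loudly at load time, not silently at run time */
    }
    /* ...actually start the cog here (cognew or equivalent)... */
    return 0;
}

int main(void)
{
    hub_reqs_t  need = { 2, 31 };  /* e.g. a video driver's declaration */
    hub_grant_t got  = { 1, 15 };  /* what the table gives cog 3 */
    return cog_load_checked(3, &need, &got) ? 1 : 0;  /* refuses: slots */
}
```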
lulz
And that's it. Entertainment. Carry on kids. Be careful what you ask for.
You miss that anyone changing table mapping, is only doing so, because the present scheme has failed them.
All your imagined fail cases, exist only as an admission the present 1:16 is simply not good enough for all potential designs.
Seems we agree ?
Not at all. With the default design, solutions would be found, and once found, are useful for all time. The COGS get used together, or in some sequence, and they get used when needed, repurposed when not, etc... And with 16 of them, this is even more practical / possible than it is on P1 currently.
The only case I've seen yet which merits this is running a large program more quickly, and even that one can be done with multiple large programs running in parallel.
More importantly, those useful for all time solutions may not be found precisely because people will crank the COG / HUB access scheme up to make it work for them, rather than employ parallel programming to get the same result.
More important still: We have a culture here of helping people get the tough stuff sorted. We do that because once it's sorted, it's good for everybody.
With the table schemes, those same efforts will frequently solve for that person or a few cases only. I'm not doing it. "Ask these guys, they will help you out with the tables they asked for..." I'm not likely to go off and help provide table support for objects either. There are much better things to do, and I'm being frank and honest.
And what do you mean imagined?
Those are gonna happen. Bookmark my post. Every single word written there will appear in this forum within a month or two of release, and they will endure too. Over and over and over. They will show up, jigger the table, and begin attending class at the school of hard knocks, until the bell rings, again, and again.
Most likely scenario is: You touch the table, you are on your own. Cheers, thanks for playing.
Yes, it's not so much a "cog machine" any more, but a "slot machine"!
Ka-ching!
Ross.
Not at all. With the default design, solutions would be found, and once found, useful for all time.
Then your objects never hit the use cases you posted, as you seem to think 'solutions would be found' = No problem.
- you cannot have it both ways... to admit some will be using the smarter mapping is to admit the default is not a solution for all.
Read it again. People will crank the speed rather than use parallel programming, and that's going to push the "works no matter what" solutions off the table in favor of "it works for me this time".
And no. You are wrong about that. When the cogs are used together, the cases get addressed, and they always work in the future no matter what they are combined with.
Framing it extra narrow doesn't work jmg.
I didn't say "no problem", and in fact, not saying that places emphasis on most of my post above. Sometimes it's tough to use the COGS together, but once those solutions are found, they always work in the future. And that "works for all time" means everybody has an incentive to help find those solutions as has been going on here for years. We are good at it now.
It works like this:
Use case hits hub access limit.
Either:
use more cogs, smarter data arrangement / alignment, etc...
or
crank the access time on that COG.
When we go for the latter, it's a case by case solution. When we do the former, it's a general case, always works solution.
The former is sometimes harder, but once done, done, with very few exceptions. The latter is going to be easier, because nobody wants to rethink the problem when they can make it go just a little faster...
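As a sketch of what the "use more cogs" branch typically looks like in practice: two routines, each of which would be loaded into its own cog (cognew or equivalent is assumed, not shown), coupled only by a one-deep mailbox in hub RAM. Generic C, illustrative only.

```c
#include <stdint.h>

/* One-deep mailbox in shared hub RAM: the only coupling between cogs. */
typedef struct {
    volatile uint32_t full;   /* 0 = empty, 1 = data valid */
    volatile uint32_t data;
} mailbox_t;

static mailbox_t mbox;

/* Cog A: stage 1 of the work, results handed over through the mailbox. */
void producer(void)
{
    for (uint32_t i = 0; ; i++) {
        uint32_t partial = i * i;      /* stand-in for stage-1 work */
        while (mbox.full) ;            /* spin until consumer empties it */
        mbox.data = partial;
        mbox.full = 1;
    }
}

/* Cog B: stage 2. Neither cog needs extra hub slots to do this. */
void consumer(void)
{
    for (;;) {
        while (!mbox.full) ;           /* spin until producer fills it */
        uint32_t v = mbox.data;
        mbox.full = 0;
        (void)v;                       /* stand-in for stage-2 work */
    }
}
```

The design point: the mailbox behaves identically under any slot table, which is exactly the "always works" property being argued for here.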
Soon, our support here becomes lots of cases, all done case by case, and it's a mess for those who author objects and for those who want to use them.
Hey, perhaps that's an opportunity. Paid object support!
And perhaps that's best as I know absolutely I'm not going to blow time on lots of hub access cases for other people when I could be just doing my own thing. Have people thought about that dynamic? They should, and it's part of the "What is worth what?" discussion I put here a week or so ago.
It's not going to be all that easy to make general case objects. PASM generally requires people to consider those hub access points, pack instructions in out of order, whatever, to make it work. Not a big deal, if it always works. But when it doesn't always work, somebody has to go into the guts of difficult objects and sort out the hub access details for a different time window scheme, or hope they can just adjust the table to make it work, "this time".
PASM written to the default will frequently perform poorly under a non-default table. This impact has been very understated. And it's real because of how code typically gets written to take best advantage of hub access times.
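A rough back-of-envelope of that effect, using assumed P1-style numbers (4 clocks per ordinary instruction, slot spacing set by the table) rather than real P2 timings:

```c
#include <stdio.h>

/* Assumed model, for illustration only: an ordinary instruction takes
 * 4 clocks; a hub op completes at this cog's next granted slot, and the
 * table controls the spacing (in clocks) between granted slots. */
#define INSN_CLKS 4

/* Clocks per loop iteration: do 'work_insns' instructions, then wait
 * for the next slot this cog owns to finish the hub op. */
static unsigned loop_clocks(unsigned work_insns, unsigned slot_spacing)
{
    unsigned busy = work_insns * INSN_CLKS;
    unsigned rounds = (busy + slot_spacing - 1) / slot_spacing;
    if (rounds == 0)
        rounds = 1;                 /* always need at least one slot */
    return rounds * slot_spacing;
}

int main(void)
{
    /* Hand-packed for the default: 3 instructions = 12 clocks, so the
     * hub op lands exactly on the 16-clock window. */
    printf("default  1:16 : %u clocks/iteration\n", loop_clocks(3, 16));
    /* Same code, cog demoted to a slot every 32 clocks: iteration
     * time doubles even though only 12 clocks are real work. */
    printf("demoted  1:32 : %u clocks/iteration\n", loop_clocks(3, 32));
    /* Promoted to a slot every 8 clocks: the packing no longer fits,
     * the loop spans two slots, and nothing is gained. */
    printf("promoted 1:8  : %u clocks/iteration\n", loop_clocks(3, 8));
    return 0;
}
```

The hand-packed loop is only optimal at the spacing it was written for: halve this cog's slots and it runs at half speed; double them and it gains nothing.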
I suppose object authors could write programs to write programs for various table cases too. Author something, then author a program to author that something to work for cases a, b, c, d, e, f, g, h, etc... And testing for those?
And BTW: That "faithful" **** needs to end. The concerns put here are completely valid and I would use a nicer word for it, but... I just don't want to. Let's put it that way.
"Nice object there jmg, too bad it uses one of those ****** table schemes. It bumps into my xxx driver."
Or we get this, "Hey, spud! Nice video object. Can you rejigger it for table allocation X?" -- "No, you've got full control, please use it." lol
Maybe this? "Can anybody help me figure out how come spiffyfotz, warblubble, and fringly won't work with this table allocation needed for dwarbubs, and frotz?"
More generally, "Wish all the cogs were fast..." Gotta love that one.
And when the table changes dynamically?
"These objects are stable right now, but when I run this one, frotz and blaarg breaks, but only every so often..."
"I changed it to mesh better with my slot allocation table, and now it does this! Can you, object author, help me out on this one plz" --My standard answer, "jmg says you've got full control, go ask him about it." (half serious)
"The only way my code works is with a fast cog, but I need, blooze, frobozz, and wiffle..." Answer: "Use more cogs together." Followed by some variations of, "how, can't I jigger the table some, which all the cogs were fast...."
Or this: I only need fast for a little while, so I'll just change the table here, then put it back, etc... Gonna love sorting those out.
Gee! We best get rid of hub ram altogether because it's way too complex for your users to manage. Come on now! If you don't like it, don't use it and don't use my objects. It really is that simple!
We have already agreed not to post our co-operating objects in obex, so you're not going to be able to use them anyway. So it's FUD.
Your user will be pulling only from the Plug-and-Play basket, and using the (default) Plug-and-Play settings.
They have nothing to worry about. To them, there is no cog table.
Myself, and many others, will be using the better design controls to craft more powerful system designs.
I expect this group will have no problems grasping a cog table.
Cluso,
If you don't like it, don't use it and don't use my objects. It really is that simple!
Ah, the light goes on.
I think I now see the difference between those who value determinism, as in all objects/library codes will work for anyone, anywhere, all the time, and those who value fine grained control of HUB access to suit their purposes.
The former group is of the community, open source, sharing, "let's all muck in together" mindset.
The latter group is of the closed source, "I don't care what any community is doing, they are too dumb to use my code even if I did release it" mindset.
You see what happens when there are hundreds of "jmgs" and Clusos producing these super duper codes and they attempt to share them with the open source pool? Bah, they don't work together because they all make conflicting demands on the system.
I firmly believe that given the current and foreseeable market of the Propeller it absolutely requires the "let's all muck in together" community to thrive. Composing systems from parts created by others has to be easy. Break that and great damage is done.
Greedy programmers with their greedy COGs don't care about causing such breakage. As long as it's good for them.
Tangentially related question:
With all these funky HUB slot tables, and now two layer HUB tables, how on earth does anyone use it?
So I have collected a bunch of objects/library codes, whatever. I've carefully read all the supplied documentation and see that they need this or that many extra HUB slots over the default case. They have been written by some clever guys who worked out how to get the things working given some HUB slot juggling.
Now I have to contrive some entries to go in these tables to suit the system I'm building. What a pain.
I also have to ensure that I get everything started up in the right order, or make sure I start the right object in the right COG, else my COG allocation in the tables does not match what actually runs. What a pain.
God help me if my program dynamically starts and stops COGs at run time. Or if, like my FFT, it can automagically spread itself over 2, 4 or 8 COGS.
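The start-order hazard is easy to show. A self-contained toy below, with an assumed P1-style allocator that hands out whichever cog is free, and a hypothetical table indexed by fixed cog number: nothing in the code is wrong, yet the grant a driver gets depends on what was started before it.

```c
#include <stdio.h>

/* Hypothetical slot table, fixed before anything starts: entry N is the
 * extra hub slots granted to cog N beyond the default rotation. The
 * integrator meant cog 1 for the video driver. */
static const unsigned extra_slots[16] = { [1] = 3 };

/* Stand-in for a P1-style allocator: returns whichever cog is free. */
static int next_free_cog = 1;
static int cognew(void (*entry)(void), void *stack)
{
    (void)entry; (void)stack;
    return next_free_cog++;
}

static void serial_driver(void) {}
static void video_driver(void)  {}

int main(void)
{
    /* Start the serial driver first and the video driver lands in cog 2,
     * which the table grants nothing extra -- the code is "correct",
     * only the start order changed. */
    int serial_cog = cognew(serial_driver, 0);
    int video_cog  = cognew(video_driver, 0);
    printf("serial: cog %d (+%u slots), video: cog %d (+%u slots)\n",
           serial_cog, extra_slots[serial_cog],
           video_cog, extra_slots[video_cog]);
    return 0;
}
```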
This all sounds like allocating interrupts to interrupt levels and assigning priorities to handlers. It's a pain to reason about. Get it wrong and weird intermittent failures start to happen.
It has been pointed out repeatedly that the actual HDL implementation is very small and simple. Fine. Using it is not.
It has been said that I worry about this too much. For example, in the "mooching" system, if a driver cannot get the slots then it should just gracefully degrade in performance and not fail catastrophically. In that case my question is "Why on earth is the thing written to use mooching if it does not actually need it?" Presumably because it does actually need it and will fail without it.
BTW: That dynamic is why I liked mooch. It would improve big programs. That's the only case I find even remotely compelling. The others are a question of whether or not the work to do things in parallel gets done, or not. Not being able to depend on the moocher basically puts your "why was it written to mooch in the first place?" case off to the side. That will be the first question asked, and from there, "why not use two COGS?" and the end result is something that always works.
Anybody planning on releasing binary blobs with the code protect? Do you think those will see any use at all if they always work? How about when they don't always work? Compare and contrast support in both cases.
Anybody planning on releasing binary blobs with the code protect?
I don't think this is actually possible. Surely code protect works on the whole program not individual objects within it.
But of course those C programmers who are still resisting the opensource revolution might think to put out drivers and such as binary only libraries to be linked with user code. Gack.
potatohead,
I don't think this is actually possible. Surely code protect works on the whole program not individual objects within it.
Yes my understanding is the code protect would be just on the global image?
Binaries do have their place, we're better off with Linus's Turbulence demo, for instance, than without it.
But of course those C programmers who are still resisting the opensource revolution might think to put out drivers and such as binary only libraries to be linked with user code. Gack.
Could be an argument for banning C (ducks for cover)
Correct, we agree !
and not using it, is even simpler still !!
This is an entirely optional feature.
If the Px really is as wonderful as you claim with a locked 1:16 scan, then it will never be needed or used.
Again, use case hits hub access limit and either:
use more cogs, optimize code better, smarter data structures, etc...
or
crank the speed up. Once changed, even default objects can fail or perform very differently.
Please quit framing it that way jmg. It's not accurate at all.
We had this very same discussion in the year or so after P1 was released. The answer was to do it in parallel. Turns out that is harder than cranking the speed up, and that's precisely what people will do first.
Notably, that stuff from day one still works for everybody at all times. Huge knowledge base there, filled with solutions to all sorts of tough problems. Find one, learn it, use it, done, next. If somebody made an object, use that, done, next. At most, some variables need to be moved, a mailbox added, or a pin definition or timing constant edited (if the object does not dynamically compute it for various clock speeds), etc... but those things are seriously easy compared to recoding the guts of something intricate.
The fantastic support is one often cited advantage of buying Propeller chips. This dynamic is why that support is fantastic. We all know we all benefit when code is shared here, objects shared here, and the solution works.
That fantastic support is going to go down a notch when people start figuring out they are solving for limited or even singular cases. Again, I'm not going to be doing it.
Look at the work Ross has put into Catalina. He's got a whole pile of "stuff that always works" nicely integrated in, easy cheezy. That's not going to play out the same if we break the "always works" part of things. I could go on, but I think it's clear enough.
He only left a binary as a puzzle to see if there was going to be any demo scene activity on the P1. He later released code and an assembler. Just playing for fun 'n games in the spirit of those things.
We aren't better for it, unless we go looking, which was his point in the first place.
The greater point is binaries will require a lot more meta-information to even be viable with a variable HUB scheme.
heater & potatohead,
You guys love to have it both ways!
First you want to limit objects that go into the OBEX so that my co-operating cogs don't get posted there. So I agree.
Now you say because they are not there, that is what I wanted, to keep them private.
Come on guys, be realistic. You just don't want the P2 to benefit from smart programmers. Are all your programmers (and followers) that dumb???
Anyway, I am done on the subject online. I will wait until Chip surfaces and discuss it offline in the meantime. It's just not worth the aggro!
Lucky I love the prop and have great respect for Parallax (Chip and Ken, etc). That's why I want it to succeed. Otherwise I would go elsewhere.
If the Px really is as wonderful as you claim with a locked 1:16 scan, then it will never be needed or used.
Optional yes. But it's very different from say choosing to use the REV instruction in my code or not. That has no impact on the community of other codes running on my Propeller. They know nothing about it.
These HUB schemes have the potential to bugger up the architecture globally: they make the chip harder to use, and make it fail, when you have a community of programmers all trying to use it.
Throwing features into the design that may or may not be useful to someone somewhere, and are best left switched off, seems a bit crazy to me.
Oh, wait - you ran out of COGs - how ever did that happen ? - but we were told 'just use more COGs' was the solution ?
This is a bogus argument.
Of course resources run out; we cannot defy the laws of physics. Memory, COGS, pins, speed are all finite.
That is totally different from a debate about how to use the resources we have in a simple and efficient way.
Personally I have grave doubts about the 1:16 "round-robin". That seems nuts as well.
The 8 way HUB of the Prop I made sense. Bandwidth was shared equally. Latency on HUB misses was not show stopping. Bandwidth available to each COG was reasonable given the COGs speed.
Stretching that to a 16 way HUB seems, well, a stretch too far.
I'd be happy to chop the design down the middle and have two 8 way HUBs each serving 8 COGs with 256K of RAM. Instantly doubling performance for HUB intensive code.
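The arithmetic behind "instantly doubling", assuming each hub grants one access per cog per rotation (the clock frequency here is an arbitrary example):

```c
#include <stdio.h>

int main(void)
{
    const double f_clk = 200e6;  /* assumed clock, Hz; pick any figure */

    /* One 16-way hub: each cog gets one access per 16-clock rotation. */
    printf("one 16-way hub : %4.1f M accesses/s per cog, worst wait 15 clocks\n",
           f_clk / 16 / 1e6);
    /* Two 8-way hubs: each cog gets one access per 8-clock rotation. */
    printf("two 8-way hubs : %4.1f M accesses/s per cog, worst wait  7 clocks\n",
           f_clk / 8 / 1e6);
    return 0;
}
```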
Unlikely to happen as we don't seem to have a way to get communications going between the left and right halves of that "brain".
I'm also very fond of the Monte Carlo HUB Arbiter (Pat. Pending). I suggested it with tongue in cheek but the more I think about it the more it grows on me. It has some great properties.
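For the curious, those properties are easy to simulate (a toy model of the tongue-in-cheek idea, not a real design): each tick the slot goes to a uniformly random requester, so bandwidth is shared fairly in expectation with no table to configure, but the worst-case wait is unbounded rather than the fixed 15 clocks of 1:16.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy model: 16 cogs all requesting, one hub slot per tick granted to
 * a uniformly random requester. Measure cog 0's share of the bandwidth
 * and the worst wait any cog ever saw. */
int main(void)
{
    enum { COGS = 16, TICKS = 1000000 };
    long grants[COGS] = {0};
    long waiting[COGS] = {0}, worst = 0;

    srand(42);
    for (long t = 0; t < TICKS; t++) {
        int winner = rand() % COGS;
        grants[winner]++;
        for (int c = 0; c < COGS; c++) {
            if (c == winner) {
                if (waiting[c] > worst)
                    worst = waiting[c];
                waiting[c] = 0;
            } else {
                waiting[c]++;
            }
        }
    }
    printf("cog 0 share of slots: %.4f (fair share = %.4f)\n",
           (double)grants[0] / TICKS, 1.0 / COGS);
    printf("worst observed wait: %ld ticks (deterministic 1:16: 15)\n",
           worst);
    return 0;
}
```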
@cluso: I never implied nor stated, nor intended either of these:
First you want to limit objects that go into the OBEX so that my co-operating cogs don't get posted there. So I agree.
Now you say because they are not there, that is what I wanted, to keep them private.
Right now, I don't even know what an OBEX will look like. What I do know is what the forum is likely to look like, and I wrote to some dynamics that will change and what I think the implications could be.
I also wrote to the idea that, "if it's awesome, the table scheme will never be used" and how I see it getting used and what those implications are.
We aren't better for it, unless we go looking, which was his point in the first place.
The greater point is binaries will require a lot more meta-information to even be viable with a variable HUB scheme.
I think there's some confusion here; wouldn't the binary be the whole image, with whatever hub option is selected, crystal frequency, etc. built into it, i.e. it would "just work" given the correct oscillator driving it? That is, it's most useful from an end user point of view in that the hub details are hidden.
As for going looking, yes that's always where the benefit is. A binary blob gives us a focus to look for and perhaps reverse engineer, or strive for. That's better than nothing.
First you want to limit objects that go into the OBEX so that my co-operating cogs don't get posted there. So I agree.
Err what?
I have never said anything about censoring code that goes into OBEX. I'm not even concerned about OBEX. I have a fuzzy dream of a future when millions of Propeller users are sharing their code on Github or SourceForge or on their blogs and random web pages....
Now you say because they are not there, that is what I wanted, to keep them private.
No. Never said that.
You just don't want the P2 to benefit from smart programmers.
Quite the opposite.
In order for smart programmers to contribute, those contributions have to work together. Not only that, they have to work with whatever the dumb programmers are trying to create.
To that end your scheme of being able to allocate pairs of COGs "atomically" and have them use their HUB slots internally in anyway they see fit is about the only scheme suggested so far that fits the requirement of making contributed code work everywhere, for everyone all the time. It's an atomic unit. It has no impact on the timing of anything outside itself. Great!
Hey... that's cheating!
I will wait until Chip surfaces and discuss it offline in the meantime.
Personally I have grave doubts about the 1:16 "round-robin". That seems nuts as well.
Yup, & lots of other designers will look at Px and also go: 1:16 "round-robin"?? That seems nuts...
Since we both agree the 1:16 "round-robin" seems nuts, why not let designers do what they have done for decades, and choose for themselves?!
And yet you still want to try to veto a simple method to remove that drawback? Now that's logic I cannot fathom!!
Any merge of parallel Cores to common shared RAM, is going to have similar issues. It is not a surprise.
At some stage, system allocation decisions need to be made. Locking those decisions in silicon is going to shrink the Px available market, because it has reduced coverage.
It's that simple.
...wouldn't the binary be the whole image, including whatever hub option is selected, crystal frequency etc built into it, ie it would "just work" given the correct oscillator driving it
Not necessarily.
If we are programming in C I can write a driver and compile it into an object file or library archive and give that binary file to you.
You can then link it into the C program you are creating.
Of course you had better read the instructions carefully because my code will require 6 COGs all with mooching enabled or with some weird slot allocation table being set up. If you get that wrong it will probably fail intermittently or when subjected to heavy load.
Of course Murphy will ensure that such failures won't happen until your gadget is out in the field and you will never be able to produce a test that reproduces the failure case.
In fact it will always work fine in your test harness but be unreliable when placed in your real application.
At that point you will be cursing those who came up with non-deterministic HUB sharing ideas....:)
"Heater was right" you will be thinking to yourself.
To that end your scheme of being able to allocate pairs of COGs "atomically" and have them use their HUB slots internally in anyway they see fit is about the only scheme suggested so far that fits the requirement of making contributed code work everywhere, for everyone all the time. It's an atomic unit. It has no impact on the timing of anything outside itself. Great!
Agreed on this. End game is an 8 COG chip. And I'm not writing that as a negative! This does get us the Heater "split" without having to have two brains.
Re: done for decades
Yeah, you are right. Really tough problems and you know one Chip Gracey sorted out how to make them not so tough problems, and got "works always" out of it all as a bonus. Imagine that!
As for reduced coverage, nobody has shown anything other than the general case of large programs being improved, and even that can be done with multiple COGS. Cluso actually does suggest using multiple COGS with pairing! And it's still "always works", and that's why I liked pairing too, along with mooch. Mooch can speed up a big program, or a few of them, while not impacting anything else. Always works, but it might work faster. And mooch doesn't cost COGS. Pairing does cost a COG, or maybe they can tag-team on things as a unit (likely), but it can also be depended on too, unlike mooch. Nice trade-offs there. And guess what? You got it! "Always works".
I don't like the table schemes because they don't "always work", and I think the cost of that is really high.
What I've not done is PM or message Chip or call him and say "don't do it". And I could. I don't think he needs that at the moment, but that's just me.
Just putting the costs out there, that's all. Every time I do, people get really ****** off. Sorry. Costs are costs. Don't blame me for the basic realities of the world. Tagging it all as FUD, etc... is just a dodge really. Lame.
So if it happens with a table, then "cheap entertainment" will likely ensue. If not, then not. We can count up the bodies, or not, and have this discussion then too.
I stayed off the topic for a while, for the most part, looking to see the different schemes and cases. Thought it was only fair, since I made a nice post about how resolving those down to a well defined set would benefit everybody.
That didn't happen. Wish it had. Seems like we have pairing and mooch as candidates though. Maybe.
As far as I can tell, the impasse is still "always works" vs "works for me". Potentially unresolvable.
To Cluso's credit with pairing, and to Bill's credit with mooching, both schemes get at the goal without breaking important dynamics and incurring the costs associated with that. I've no problem at all recognizing those things, and feel both of those are the best options to date. Maybe ever. "Always works" is worth a lot. Worth more than "works for me", and that's just how I roll.
So this painful discussion got us two options that preserve "always works", and both of them address "run a big program faster" and "using multiple COGS is hard" too. (mooch, pairs) Progress!!
I'm happy. If it were not for the endless and painful tables, I might actually get back into the mooch and pair camp where I started some time ago. I really dislike the dynamics surrounding the tables.
...why not let designers do what they have done for decades, and choose for themselves
Great idea. The issue here is that when you get the power to mess with HUB timing you are not "choosing for yourself" you are choosing for the other 15 programmers who created the other codes running on the other COGs.
That's where my comment about "greedy programmers" came in. Who says those other 15 guys don't want to be greedy as well?
Of course none of this matters for the truly "greedy programmer" who produces all his own code. I suggest that is and will continue to be a rare phenomenon in the Propeller world.
...you still want to try to veto a simple method to remove that drawback ?
I have yet to see a simple method to remove that drawback. At least not one that is generally applicable to the community of Prop programmers.
Any merge of parallel Cores to common shared RAM, is going to have similar issues.
Yep, no surprise.
At some stage, system allocation decisions need to be made.
Yep, that is why we are all having the interminable debate.
Mooch can speed up a big program, or a few of them, while not impacting anything else.
Not exactly.
Let's build an application with some of Cluso's and jmg's clever code that uses mooching. Let's throw in some mooching of my own. And perhaps you have a moocher that might be useful in the application as well.
Well, now all the moochers are fighting over scraps in my Propeller.
That's OK, you say, they will run and work correctly even if they don't get those scraps. They just run slower.
Well, if that is true, everything continues to work fine. Then clearly they did not need mooching in the first place! We have built a feature into the Propeller that is not even needed!
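That scraps argument can be put in numbers. Under the usual description of mooching (a cog may take any slot its owner leaves idle), the spare bandwidth is a fixed pot, so each added moocher shrinks every moocher's expected share. The figures below are illustrative assumptions, not measurements:

```c
#include <stdio.h>

/* Toy model of mooching: 16 slots per rotation, 'owners_using' of them
 * are consumed by their owners, and the leftovers are shared out among
 * however many moochers are present. */
int main(void)
{
    const int slots_per_round = 16;
    const int owners_using = 10;                 /* assumed load */
    const int spare = slots_per_round - owners_using;

    for (int moochers = 1; moochers <= 6; moochers++)
        printf("%d moocher(s): ~%.2f spare slots/round each\n",
               moochers, (double)spare / moochers);
    return 0;
}
```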