If the cog wants its slot, it gets it. If a cog does not use its slot, the slot becomes available to others and that cog has no further part in the process. This is the normal default and requires no settings. It remains deterministic. It makes no difference to this cog if another cog uses its unwanted slot, so any existing driver should work untouched.
Optional (I am in two minds about whether this is even necessary):
If the cog wants to yield its slot (even if it wants it), then it must set a yield flag and the cog# it will yield to, e.g. YIELD #0..7
A cog can claim extra slots for itself by:
Setting that it can use additional unused slots, and which unused slots it can use, with two settings: high priority and low priority.
Note: If we implement the optional YIELD, then a priority slot with yield will steal that cog's slot if required.
Each slot can only have a single high priority cog# assigned to it.
Do we really require more than one priority?
Does this sound acceptable? We can work out the best instruction if we can get the requirements done.
Boss Hogg gets all the left over slots that 0,4,7 don't want and 1,2,5,6,7 don't claim during their turn
7 gets all its own slots that it wants, but 0,4 get priority claim if they need it, otherwise Hogg claims it
BTW the first table does not share the unused slots between 0 & 4 (only 0 gets them)
Yes, you have a 2 level priority system, and then a Hog to collect the dregs, for each slot. Plus a yield bit to pass it to either priority in preference to itself, but not to the Hog if it wants it itself.
Bill,
I think we will require a separate YIELD instruction because this should not change other slot settings, and can only be done by the slot's own cog.
YIELD #n 'n=1 to yield to either Priority#1 or Priority#2 cogs and not to others.
The set priority cogslots would become (based on your definition, but using the current instruction formats and Chip's naming)...
COGSLOT D/#,#0..7 where D/# = a_bbb_ccc; 0..7 is the cog/slot#
bbb: is first priority cog# for this slot if the current cog does not require it, or yields it
ccc: is the second priority cog# for this slot, if bbb was offered it and does not require it
a: if =1 then this cog# can also use any available cog slot that is not required by: its own cog (whether yielded or not), the first priority cog bbb, or the second priority cog ccc.
This way, the primary cog could set up the complete slot regime, with the exception that it cannot take priority over a cog's own priority for its own slot use unless that cog explicitly uses the YIELD instruction. Of course, whether the cog has yielded its slot or not, if it does not require it, it is available.
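A minimal sketch of how the a_bbb_ccc operand above could be packed and unpacked, in Python purely for illustration (the function names are hypothetical, not part of any Propeller tool). It assumes the layout exactly as described: bit 6 is the 'a' flag (may use any available slot), bits 5..3 are bbb and bits 2..0 are ccc. One thing it makes visible: a field of 000 names cog 0, so "no priority" has to be encoded some other way, for example by putting the slot's own cog# in the field.

```python
def pack_cogslot(hog: bool, first: int, second: int) -> int:
    """Pack the a_bbb_ccc operand: a = hog flag, bbb = first-priority cog, ccc = second-priority cog."""
    for cog in (first, second):
        if not 0 <= cog <= 7:
            raise ValueError(f"cog number out of range: {cog}")
    return (int(hog) << 6) | (first << 3) | second


def unpack_cogslot(value: int) -> tuple:
    """Split a 7-bit a_bbb_ccc operand back into (hog, first, second)."""
    if not 0 <= value <= 0b1_111_111:
        raise ValueError(f"operand does not fit in 7 bits: {value}")
    return bool(value >> 6), (value >> 3) & 0b111, value & 0b111


# Caution: a field of 000 names cog 0, not "nobody". One way to say "no
# priority" for slot n (assumption) is to put n itself in both fields.
NO_PRIORITY_SLOT5 = pack_cogslot(False, 5, 5)
```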
I'm surely far from fully understanding the consequences of the present discussion, but I'm trying to figure out whether, by carefully managing this new feature, we can almost completely get rid of threads stuck by hub slot sync starvation.
I'm trying to devise a programmed, slowly shifting hub slot offering that can satisfy the access needs of each and every one of the possible 32 running tasks, and that could ensure no task is ever blocked almost indefinitely without at least a chance to dispatch its pending hub ops.
If anyone could help me figure this out, I'd really appreciate it.
I thought about this for a few hours after reading all that you guys wrote, and I came up with this:
SETSLOT D/# - Set cog's slot register to %PPP_CCC
%PPP = priority code
%CCC = a cog given priority
MY SLOT PRIORITY
PPP 1st 2nd 3rd WHAT OTHER SLOTS CAN I USE?
--------------------------------------------------------------------
000 ME ANY ANY OTHER COG'S CCC = ME (default)
001 ME ANY ANY ANY
010 ME CCC ANY OTHER COG'S CCC = ME
011 ME CCC ANY ANY
100 CCC ME ANY OTHER COG'S CCC = ME
101 CCC ME ANY ANY
110 CCC ANY ANY OTHER COG'S CCC = ME
111 CCC ANY ANY ANY
PPP in plain English
000 = If I don't use my hub slot, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
001 = If I don't use my hub slot, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
010 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
011 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
100 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
101 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
110 = If CCC doesn't want my hub slot, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
111 = If CCC doesn't want my hub slot, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
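To check the table against the plain-English rules, here is a rough Python model of a single hub slot decision under SETSLOT (hypothetical names; not Chip's implementation). It assumes bits 2..1 of PPP pick the ME/CCC ordering from the table, bit 0 lets a cog use any free slot, and a cog whose bit 0 is clear only enters the "ANY" tier of a slot whose owner's CCC names it. The table does not say how "next in line" is decided, so the sketch scans the other seven cogs in rotating order starting after the owner, and it does not let the owner re-enter the ANY tier of its own slot under PPP = 11x; both are guesses.

```python
from typing import List, Optional, Tuple

# Bits 2..1 of PPP choose the order of the first tiers ("ME" = slot owner).
ORDER = {0b00: ("ME",), 0b01: ("ME", "CCC"), 0b10: ("CCC", "ME"), 0b11: ("CCC",)}


def setslot_winner(slot: int, regs: List[Tuple[int, int]], wants: List[bool]) -> Optional[int]:
    """Return the cog that gets hub slot `slot`, or None if the slot goes unused.

    regs[c] is the (ppp, ccc) pair cog c wrote with SETSLOT; wants[c] is True
    when cog c has a hub operation pending.
    """
    ppp, ccc = regs[slot]
    named = {"ME": slot, "CCC": ccc}
    for who in ORDER[ppp >> 1]:
        if wants[named[who]]:
            return named[who]
    # "ANY" tier. Assumption: scan the other seven cogs in rotating order,
    # starting just after the owner; the owner itself does not re-enter here.
    for offset in range(1, 8):
        cog = (slot + offset) % 8
        may_use_any = (regs[cog][0] & 1) == 1  # PPP bit 0 set
        named_by_owner = ccc == cog            # owner's CCC = this cog
        if wants[cog] and (may_use_any or named_by_owner):
            return cog
    return None


def slots_per_rotation(regs: List[Tuple[int, int]], wants: List[bool]) -> List[int]:
    """Count how many of the eight slots each cog wins while `wants` stays fixed."""
    counts = [0] * 8
    for slot in range(8):
        winner = setslot_winner(slot, regs, wants)
        if winner is not None:
            counts[winner] += 1
    return counts
```

As a usage example, take the COG1/COG5 pairing discussed later in the thread (COG1 at %100 with CCC = 5, COG5 at %010 with CCC = 1, every other cog at PPP = 000 with CCC set to its own ID). With all eight cogs wanting the hub, slots_per_rotation returns [1, 0, 1, 1, 1, 2, 1, 1]: COG5 gets two slots per rotation and COG1 none.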
Looking good to me.
Defaults to the way it is now, no changes required.
If a cog requires extra slots, it can get easy access to spare slots. If another cog has gifted it a priority slot, it can use that too.
A cog can gift its slot to another cog, either if it does not require it, or to the other cog in priority to itself.
There is no way for a cog to block other cogs.
A smart programmer can write/modify programs to cooperate for extra slots.
So, standard round-robin access was not compromised. The secret sauce was not
adulterated.
Oh but it was...
One major piece of Propeller magic is that all COGs are equal and all COGs get exactly the same shot at HUB.
This provides a firm guarantee that any combination of objects I pick up from OBEX or elsewhere will work together given enough COGs, Pins and HUB memory space. There is no possibility that any timing interactions between them will bugger up my plans. I don't ever have to even think about it as I would if interrupts or an RTOS were involved. Coupling between subsystems is as minimal as it can be. This is a very good thing.
As soon as one introduces the possibility that a COG or COGs can grab more than their fair share of HUB access time the above guarantee is broken, coupling between subsystems (objects) is introduced. That coupling undermines my faith that my object "mashup" will work. I now have to check the timing requirements of all the parts I use and perhaps even have to start changing them or managing them in special ways. This is not a good thing.
"Ah", one might say, "these special HUB slot grabbing tricks need only be used by people who know what they are doing. You still get your simple round robin access if you are not into working all this out"
True, but, I can imagine one day picking up SuperVideoDriverX and CosmicHighSpeedCommsY and perhaps something else, all written by gurus who know what they are doing and all needing max HUB access to function. Only problem is trying to put them all together in my humble program fails as there just is not enough HUB bandwidth to accommodate them all.
Of course the cost of that firm guarantee is performance. The bottom line is: do we want to give it up to gain a bit more speed at the risk of introducing undecidability into the entire P2 software infrastructure of the future?
P.S. Often I have found myself talking about this guarantee and referring to it as "determinism". That is a bit different from the usual meaning of cycle-by-cycle determinism. It is the idea that I can easily determine ahead of time that various objects will work together without having to do a timing analysis.
Ultimately, though, it arises from that cycle-by-cycle determinism of HUB access.
P.P.S. Just noticed that Cluso said:
A smart programmer can write/ modify programs to cooperate for extra slots.
Which nicely sums up why I don't like the idea. When mixing and matching objects I ultimately end up with many smart programmers in my code. If they all need the bandwidth, which they do else they would not have done the "smart" thing, then my program does not work.
One major piece of Propeller magic is that all COGs are equal and all COGs get exactly the same shot at HUB...
I get what you're saying, but I think it is a luxury that comes at too much cost.
It's frustrating to have a task that is starving for hub access while other COGs are not using the hub.
The P2 needs to let engineers wring out maximum performance from the available resources and to do that engineers need to be able to direct those resources where they are needed.
We can add 'resource requirements' as part of the OBEX info.
On one hand I see merit in Chip's new concept. It seems to fit quite well.
On the other hand I see where Heater is coming from in terms of OBEX compatibility.
In an earlier post I suggested a simple model.
In my model all COGs will ALWAYS get their slot if they need it.
When a COG requests a HUBop it waits for its normal slot, but if WC is included in the HUBop it will use the first FREE slot it encounters before its own slot.
This way no COG misses out on its own slot, but it can take advantage of free slots if they're available and the HUBop is allowed to look for them. This would not MESS with any other OBEX code and is simpler for programmers to capitalize on.
WRLONG reg,addr 'would operate the same as it does now. Predictable SYNC to HUB cycles.
WRLONG reg,addr WC 'will use a free slot if it finds one before its own slot
I'm thinking multi-tasking as usual and trying to minimize pipeline stall between tasks.
If 4 tasks just happen to align so that each wants a HUBop, you're looking at a 10+8+8+8 cycle period.
If free slots were available this would be drastically reduced. Maybe 10+1+1+1 cycles?
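The 8+8+8 versus 1+1+1 spacing can be reproduced with a small Python model (hypothetical, not real hub timing). It assumes the slot at clock t belongs to cog t % 8, that each task issues its hub op on the clock after the previous one was granted, and that free_slots lists the cogs whose slots go unused; pipeline overhead such as the leading 10 cycles is ignored.

```python
from typing import List, Set


def hub_grant_times(cog: int, n_ops: int, free_slots: Set[int], use_wc: bool, start: int = 0) -> List[int]:
    """Clock at which each of `n_ops` back-to-back hub ops from `cog` is granted.

    Model assumptions (not P2 timing data): the slot at clock t belongs to cog
    t % 8, the next op is issued on the clock after the previous grant, and
    `free_slots` lists the cogs whose slots go unused for the whole run.
    """
    times = []
    t = start
    for _ in range(n_ops):
        # Without WC only our own slot will do; with WC any free slot is fine.
        while not (t % 8 == cog or (use_wc and t % 8 in free_slots)):
            t += 1
        times.append(t)
        t += 1
    return times
```

For cog 0, hub_grant_times(0, 4, set(), False) gives [0, 8, 16, 24], the 8+8+8 case; with every other slot free and WC set, hub_grant_times(0, 4, set(range(1, 8)), True) gives [0, 1, 2, 3], the 1+1+1 case. With only slots 2 and 3 free the result is [0, 2, 3, 8], so the gain depends entirely on what the other cogs leave unused.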
I thought about this for a few hours after reading all that you guys wrote, and I came up with this:
SETSLOT D/# - Set cog's slot register to %PPP_CCC
%PPP = priority code
%CCC = a cog given priority
MY SLOT PRIORITY
PPP 1st 2nd 3rd WHAT OTHER SLOTS CAN I USE?
--------------------------------------------------------------------
000 ME ANY ANY OTHER COG'S CCC = ME (default)
001 ME ANY ANY ANY
010 ME CCC ANY OTHER COG'S CCC = ME
011 ME CCC ANY ANY
100 CCC ME ANY OTHER COG'S CCC = ME
101 CCC ME ANY ANY
110 CCC ANY ANY OTHER COG'S CCC = ME
111 CCC ANY ANY ANY
PPP in plain English
000 = If I don't use my hub slot, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
001 = If I don't use my hub slot, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
010 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
011 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
100 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
101 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
110 = If CCC doesn't want my hub slot, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
111 = If CCC doesn't want my hub slot, anyone else can use it.
If any slot is free and I'm next in line, I can use it.
Does this seem comprehensive?
Very comprehensive.
It can implement what I was calling options 1, 2, and 3, so it fits the cases I had in mind.
It allows additional cases that would maximize consumed hub bandwidth, but at the cost of determinism, which in some cases may be just what is needed.
The nice part is these can be mixed, so you could have two COGs work together in a manner that lets one of them have deterministic 2X hub access while the other gets table scraps. Other COGs could behave in 'legacy mode' while others use a hub-slot free-for-all...
This way no COG misses out on its own slot but can take advantage of free slots if they're available and the HUBop is allowed to look for them. This would not MESS with any other OBEX code and is simpler for programmers to capitalize on.
But yes, it would MESS with other objects in OBEX or elsewhere. When I happen to choose a handful of "greedy" objects that rely on extra hub slots for their correct operation, my resulting system will fail.
For sure such "greedy" objects will exist. Those elusive USB drivers and video things or other functions that we have not even thought of yet.
Instead of an easy-going deterministic world we now have something akin to interrupt-driven systems. Starve a process of time and it fails.
Now, as much as I love the current firm timing guarantee of the P1 I am not going to be so insistent that it lives on in the P2. It's an engineering trade off against the "need for speed" and should be thought about.
I would like whatever solution comes up to be dead simple. They all look horribly complex to me so far.
Not only that I want a compiler error when I mix two things together that demand more HUB bandwidth than is possible. Just like we do when we run out of HUB memory.
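A sketch of the kind of build-time check asked for here, in Python for illustration. Everything in it is hypothetical: no such OBEX metadata field or compiler check is defined anywhere in this thread. It assumes each object declares how many of the eight hub slots per rotation it needs guaranteed, and fails the build, the way running out of hub memory does, when the total exceeds eight.

```python
from dataclasses import dataclass
from typing import List

HUB_SLOTS_PER_ROTATION = 8  # one slot per cog in the default round robin


@dataclass
class HubDemand:
    name: str
    slots_per_rotation: int  # hypothetical OBEX metadata: guaranteed slots needed


class HubBandwidthError(Exception):
    """Raised when a set of objects asks for more hub slots than exist."""


def check_hub_budget(objects: List[HubDemand]) -> int:
    """Return the number of spare slots per rotation, or raise if oversubscribed."""
    for obj in objects:
        if obj.slots_per_rotation < 0:
            raise ValueError(f"{obj.name}: negative slot demand")
    needed = sum(obj.slots_per_rotation for obj in objects)
    if needed > HUB_SLOTS_PER_ROTATION:
        names = ", ".join(obj.name for obj in objects)
        raise HubBandwidthError(
            f"{names} need {needed} hub slots per rotation; only {HUB_SLOTS_PER_ROTATION} exist"
        )
    return HUB_SLOTS_PER_ROTATION - needed
```

With SuperVideoDriverX and CosmicHighSpeedCommsY each declaring 4 slots, adding any third object that needs even one slot raises HubBandwidthError, which is the "humble program" failure described above, caught at build time instead of at run time.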
It really is trading one problem for another. On the one hand objects can always have their 'legacy' slot, but an object needing more slots can't be guaranteed useful additional slots.
You would still end up with OBEX incompatibilities where sometimes objects work and sometimes they don't, depending on the hub usage of other objects.
Chip's scheme lets you explicitly plan hub usage.
I just don't think we should hinder the silicon in order to keep the OBEX simple.
I have so many diagrams on the old whiteboard I think I got LOST.
I don't know how Chip does it!
I now see even more merit in Chip's concept.
The pace of the forum in the last 23hrs has made me giddy!
I just don't think we should hinder the silicon in order to keep the OBEX simple.
Ouch!
That's actually a very big decision to make. If the use of objects from OBEX and elsewhere becomes complex it loses a ton of its value, at least for hobbyists and casual programmers who don't want, or don't have the time, to be bothered with complexity. Simple things become hard.
At that point we can say goodbye to the hobbyists and casual programmers. Say goodbye to OBEX and Spin itself. It has no purpose any more. The P2 will be for engineers only (who will want to use C).
Perhaps I exaggerate this possible outcome. Perhaps Parallax does not care about hobbyists and casual programmers for the P2.
Can you explain it to us mortals in English or with a simple diagram?
What I really meant by GET is that the issue is outside my imagination.
Know your limitations seems applicable here.
As for a diagram, the whiteboard is now erased.
It's now much better to look at!
That's actually a very big decision to make. If the use of objects from OBEX and elsewhere becomes complex it loses a ton of its value, at least for hobbyists and casual programmers who don't want, or don't have the time, to be bothered with complexity. Simple things become hard.
At that point we can say goodbye to the hobbyists and casual programmers. Say goodbye to OBEX and Spin itself. It has no purpose any more. The P2 will be for engineers only (who will want to use C).
Perhaps I exaggerate this possible outcome. Perhaps Parallax does not care about hobbyists and casual programmers for the P2.
I'm just pointing out my concerns.
I don't think it's as bad as you think.
Every coin has two sides, the other side of this one could be: "Parallax only cares about keeping things simple for hobbyists and casual programmers, they force engineers to keep the training wheels on."
I just think hub slot usage will be one more item to look at when picking objects, you already have to look at pins, how much hub memory is used, etc.
I really don't see many objects needing to mess with hub slots anyways, mostly just high performance video and maybe USB.
What I really meant by GET is that the issue is outside my imagination.
I have already realized that there are many features of the PII, perhaps most of the new features, that I will never understand and hence never use. Just look at the instruction set, it's huge!
Now, I could spend a ton of time assimilating it all. But that is not going to happen. Frankly there are so many other things that attract my attention in the world of computing, programming and micros that there is no time to dedicate to it. That's before I think of the demands on time from family, work and life.
As such, it's going to be down to the dedicated gurus to create those objects and libraries I will want to use in my projects.
Which brings us back to where I started. I want those objects to be simple to use with no obscure surprises popping up due to timing conflicts and such.
It's now much better to look at!
There is something very satisfying about a nice clean white board or virgin sheet of A4. Before it gets polluted with all kinds of wrong ideas and scrawls:)
One of the first things I might do is see how far I can push my FFT calculation by grabbing as much HUB bandwidth as possible. That's after adopting all the speed up opportunities offered by the new instruction set.
A user of such an FFT will now have to be aware of that and perhaps be prepared to understand how to make it more or less greedy as required.
I'm sure there are many such examples.
Yes, but at least they have that choice.
Keep in mind, Chip's setup gives you a lot of control over allocation, it is easy to prevent a COG from being 'Boss Hogg' as someone recently put it.
The user just has to accept that your FFT will be slower if it gets fewer hub slots.
Consider a case where a user needs FFTs faster than could be delivered without slot sharing. Without this change the P2 is not a contender for the project.
Now consider that with the new option they could decide "I'm going to sacrifice a COG and let my FFT object have all the slots", and the P2 is still in the running for the project.
In reality it probably isn't that drastic; there are probably some other tasks the donor COG could perform by communicating over PORTD or using hub slots not consumed by the FFT.
... will now have to be aware of that and perhaps be prepared to understand how to make it more or less greedy as required.
Maybe objects from OBEX in some cases will require 2 versions.
The first is the high performance (greedy) version that requires a preferred/compulsory time slot allocation to operate.
The second version can be the single time slot (default) variant that plays well with others!
In a very short period from Ray first mentioning this we went through a few iterations among Chip and the forum and now have a plan for a very flexible hub-slot assignment mechanism.
I can see this having benefit for the SPIN interpreter, Chip should be able to make the code that loads execution fragments take advantage of this.
It could also speed up FCACHE loading and maybe lead to some optimized LMM loops.
Of course these things come at the cost of having other COGs donate hub slots, but in a lot of cases that should not be a problem.
C.W.
Perspective from a hobbyist AND everyday commercial user of the Propeller:
Chip's idea is easy to use and is really the only version I've fully understood. (I don't post much, but I lurk a LOT!)
I am accustomed to doing my deterministic stuff first, and then trading the data with the hub. I can imagine times where deterministic access to the hub could be advantageous, but I can't imagine why I wouldn't just use the internal port to get data from one cog to another if data determinism was that important.
If I understand correctly, Chip's way would provide a way to trade determinism for bandwidth with three easy options. I know there would be more complexity underneath, but I'm betting the compiler syntax will be almost as simple as his bulleted list.
BTW, I would use such an FFT object (with the understanding that I would need to optimize the cog access using such simple syntax) in a heartbeat.
Keep in mind there is a give and take. I can have one COG give up determinism for its hub access while granting its hub slots to another COG with full determinism.
As an example you have a video driver that needs 2X hub access while reading scan lines.
For this example video is COG5 and the donor is COG1.
COG1 could use SETSLOT %100_101 //PPP = 100, CCC = 101 (Cog 5)
COG5 could use SETSLOT %010_001 //PPP = 010, CCC = 001 (Cog 1)
No other COG has set its CCC value to COG1 or COG5.
COG1 only uses hub slots when COG5 doesn't want them.
COG5 can use all of its slots plus all of COG1's; COG1 will never prevent COG5 from using its slot.
COG5 will not get any 'extra' unplanned slots as long as no other COG sets a CCC value to COG5.
While reading a scan line COG5 gets uninterrupted 2X hub access, COG1 can access the hub between scan lines and frames when COG5 isn't using the hub.
For reference, PPP values
010 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
100 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
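A quick Python check (hypothetical helper names) that the two SETSLOT operands in this example encode what their comments say, assuming the 6-bit %PPP_CCC layout from Chip's table. COG5's CCC field names cog 1, so it has to encode as %001.

```python
def setslot_operand(ppp: int, ccc: int) -> int:
    """Build the 6-bit %PPP_CCC operand for SETSLOT."""
    if not (0 <= ppp <= 7 and 0 <= ccc <= 7):
        raise ValueError("PPP and CCC are 3-bit fields")
    return (ppp << 3) | ccc


def as_prop_binary(operand: int) -> str:
    """Format a 6-bit operand the way the thread writes it, e.g. '%100_101'."""
    bits = format(operand, "06b")
    return f"%{bits[:3]}_{bits[3:]}"


COG1_OPERAND = setslot_operand(0b100, 5)  # COG1: CCC first in my slot, CCC = cog 5
COG5_OPERAND = setslot_operand(0b010, 1)  # COG5: me first, then CCC = cog 1
```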
Comments
Basically, my latest (post #3144) had a hub table, with eight slots corresponding to the eight hub cycles for cogs 0 - 7
Only cog X could modify hub slot X's 'Release' and 'Hog' flags
Default state would have %00_000_000 in all eight slots, corresponding to the current p2 fpga hub cycles
Now say only two cogs (say 0 and 4) needed their regular deterministic slots, and actually wanted 2x normal bandwidth
0: 00_000_000
1: 10_000_000
2: 10_000_000
3: 10_000_000
4: 00_000_000
5: 10_000_000
6: 10_000_000
7: 10_000_000
The above settings would give up to 50% of hub bandwidth to cogs 0 & 4, leaving the "table scraps" to be shared round-robin between the other cogs.
Say another cog wanted its reserved slot - cog 7 - but only used a small fraction
7: 00_000_000
See map change above
Now let's say we load cog #3 with boss hog code:
3: 11_000_000
Now 0&4 the two high speed cogs will get all cycles they can use, except those needed by 7 (but including unused cycles from 7)
Boss hog can measure how many slots it gets by counting how many RDLONGs it can execute in, say, one second.
Hmmm...
Maybe all we really need is the 'Release' and 'Hog' bits?
But here is an example where we want to make sure that 0 & 4 have priority claims on free slots
0: 00_100_000
1: 10_000_100
2: 10_100_000
3: 11_000_100 ' Boss hog
4: 00_100_000
5: 10_000_100
6: 10_100_000
7: 00_000_100
Here 0,4,7 are guaranteed their normal slots
Boss Hogg gets all the left over slots that 0,4,7 don't want and 1,2,5,6,7 don't claim during their turn
7 gets all its own slots that it wants, but 0,4 get priority claim if they need it, otherwise Hogg claims it
Yes, you have a 2 level priority system, and then a Hog to collect the dregs, for each slot. Plus a yield bit to pass it to either priority in preference to itself, but not to the Hog if it wants it itself.
So your table is effectively this... So the default should be
x: 00_xxx_xxx 'where xxx=this slot# (ie skip this priority)
So, set your cogid if there is no priority for AAA and/or BBB.
Sounds perfect! I think "YIELD" and "OTHERS/AVAIL" might be better terms, but this is only semantics.
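To check the examples above, here is a small Python model of Bill's table under Cluso's reading: the owner first unless it has released its slot, then priority A, then priority B, then a releasing owner that still wants the slot, then the Hog. The names are hypothetical, and the tie-break when several cogs set the Hog bit (lowest cog number wins) is an assumption the thread does not settle.

```python
from typing import List, Optional, Tuple

Entry = Tuple[bool, bool, int, int]  # (release, hog, priority_a, priority_b)


def parse_entry(text: str) -> Entry:
    """Turn Bill's notation, e.g. '10_000_100', into (release, hog, A, B)."""
    flags, prio_a, prio_b = text.split("_")
    return flags[0] == "1", flags[1] == "1", int(prio_a, 2), int(prio_b, 2)


def hub_winner(slot: int, table: List[Entry], wants: List[bool]) -> Optional[int]:
    """Cog granted hub slot `slot` (owned by cog `slot`), or None if unused."""
    release, _, prio_a, prio_b = table[slot]
    # Released: priorities first, owner after them. Otherwise owner first.
    order = [prio_a, prio_b, slot] if release else [slot, prio_a, prio_b]
    for cog in order:
        if wants[cog]:
            return cog
    # The Hog collects the dregs. Assumption: lowest-numbered hog wins a tie.
    for cog, (_, hog, _, _) in enumerate(table):
        if hog and wants[cog]:
            return cog
    return None
```

For Bill's second table, with only cogs 0 and 4 and the boss hog (cog 3) wanting the hub, the eight slots go to [0, 0, 4, 0, 4, 0, 4, 0]; with only the hog and cog 7 wanting it, they go to [3, 3, 3, 3, 3, 3, 3, 7]. That matches the description of 7 keeping its own slot and Boss Hogg collecting the rest. It also shows that, as written, the priority fields favour cog 0 (five slots) over cog 4 (three) when both are saturated.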
I'm trying to devise a programmed slowly shifting hub slot offering, that can satisfy each and every of the possible 32 running tasks access needs, that could ensure no one else could ever be almost indefinitely blocked from having at least a chance to dispatch its pending hub ops.
If anyone else could help figuring this out, I'll sure appreciate a lot.
Yanomani
Defaults to the way it is now, no changes required.
If a cog requires extra slots, it can get easy access to spare slots. If a cog has gifted it a priority slot, it can use that too.
A cog can gift its slot to another cog, either if it does not require it, or to the other cog in priority to itself.
There is no way for a cog to block other cogs.
A smart programmer can write/ modify programs to cooperate forextra slots.
It's a win-win all round.
I get what you're saying, but I think it is a luxury that comes at too much cost.
It's frustrating to have a task that is starving for hub access while other COGs are not using the hub.
The P2 needs to let engineers wring out maximum performance from the available resources and to do that engineers need to be able to direct those resources where they are needed.
We can add 'resource requirements' as part of the OBEX info.
C.W.
On one hand I see merit in Chip's new concept. It seems to fit quite well.
On the other hand I see where Heater is coming from in terms of OBEX compatibility.
In an earlier post I suggested a simple model.
In my model all COGs will ALWAYS get their slot if they need it.
When a COG requests a HUBop it waits for its normal slot, but if WC is included in the HUBop it will use the first FREE slot it encounters before its own slot.
This way no COG misses out on its own slot, but it can take advantage of free slots if they're available and the HUBop is allowed to look for them. This would not MESS with any other OBEX code and is simpler for programmers to capitalize on.
I'm thinking multi-tasking as usual and trying to minimize pipeline stall between tasks.
If 4 tasks happen to line up so that each wants a HUBop, you're looking at a 10+8+8+8 cycle period.
If free slots were available this would be drastically reduced. Maybe 10+1+1+1 cycles?
Any thoughts?
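A minimal Python sketch of the free-slot rule proposed above, under stated assumptions: slot t belongs to cog t mod 8 with one slot per clock, `owner_wants` is a hypothetical callback saying whether a slot's owner uses it, and only one cog at a time is hunting for free slots (real hardware would have to arbitrate between several WC requests). It illustrates that a cog still reaches its own slot within one rotation, while the WC variant can only shorten the wait.

```python
NUM_COGS = 8  # the hub serves the eight cogs round robin


def hub_wait(cog, now, owner_wants, use_free):
    """Return how many slots `cog` waits, starting at slot `now`, for hub access.

    Assumed model: slot t belongs to cog t % NUM_COGS (one slot per clock),
    owner_wants(t) reports whether slot t's owner uses it, and no other cog
    is competing for the same free slots.
    """
    t = now
    while True:
        owner = t % NUM_COGS
        if owner == cog:
            return t - now  # own slot: always granted, as on the P1
        if use_free and not owner_wants(t):
            return t - now  # WC variant: take the first free slot seen
        t += 1


def only_4_and_5_busy(t):
    """Example load: only cogs 4 and 5 use their own slots."""
    return t % NUM_COGS in (4, 5)


print(hub_wait(3, 4, only_4_and_5_busy, use_free=False))  # 7
print(hub_wait(3, 4, only_4_and_5_busy, use_free=True))   # 2
```

With only cogs 4 and 5 busy, cog 3 starting at slot 4 waits 7 slots for its own turn but only 2 with WC. The stall estimate in the post works out the same way: 10+8+8+8 = 34 cycles versus 10+1+1+1 = 13 cycles if every later task finds a free slot immediately.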
Very comprehensive.
It can implement what I was calling options 1, 2, and 3, so it fits the cases I had in mind.
It allows additional cases that would maximize consumed hub bandwidth, but at the cost of determinism, which in some cases may be just what is needed.
The nice part is these can be mixed, so you could have two COGs work together in a manner that lets one of them have deterministic 2X hub access while the other gets table scraps. Other COGs could behave in the 'legacy mode' while others use a hub-slot free-for-all...
I like it.
C.W.
For sure such "greedy" objects will exist. Those elusive USB drivers and video things or other functions that we have not even thought of yet.
Instead of an easy-going deterministic world we now have something akin to interrupt-driven systems. Starve a process of time and it fails.
Now, as much as I love the current firm timing guarantee of the P1 I am not going to be so insistent that it lives on in the P2. It's an engineering trade off against the "need for speed" and should be thought about.
I would like whatever solution comes up to be dead simple. They all look horribly complex to me so far.
Not only that I want a compiler error when I mix two things together that demand more HUB bandwidth than is possible. Just like we do when we run out of HUB memory.
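The build-time check asked for here could look something like the following Python sketch, which treats hub slots per rotation as a budget in the same way as hub memory. `ObjectNeeds`, `check_hub_budget` and the per-object figures are hypothetical, invented purely for illustration, and the sketch assumes one hub slot per cog per rotation as on the P1.

```python
from dataclasses import dataclass

NUM_COGS = 8
SLOTS_PER_ROTATION = 8  # assumed: one hub slot per cog per rotation, as on the P1


@dataclass
class ObjectNeeds:
    name: str
    cogs: int       # cogs the object launches
    hub_slots: int  # hub slots per rotation it needs to meet its timing


def check_hub_budget(objects):
    """Fail the 'build' when objects ask for more cogs or hub slots than exist."""
    cogs = sum(o.cogs for o in objects)
    slots = sum(o.hub_slots for o in objects)
    if cogs > NUM_COGS:
        raise ValueError(f"{cogs} cogs requested, only {NUM_COGS} available")
    if slots > SLOTS_PER_ROTATION:
        raise ValueError(
            f"{slots} hub slots per rotation requested, only {SLOTS_PER_ROTATION} exist"
        )
    return slots


# Hypothetical figures, invented for illustration only.
project = [
    ObjectNeeds("SuperVideoDriverX", cogs=1, hub_slots=4),
    ObjectNeeds("CosmicHighSpeedCommsY", cogs=1, hub_slots=4),
    ObjectNeeds("main", cogs=1, hub_slots=1),
]

try:
    check_hub_budget(project)
except ValueError as err:
    print("build error:", err)
```

The two hypothetical drivers fit on their own (8 of 8 slots), but adding one more slot-hungry object makes the check fail up front, instead of producing a program that silently misbehaves.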
It really is trading one problem for another. On the one hand objects can always have their 'legacy' slot, but an object needing more slots can't be guaranteed useful additional slots.
You would still end up with OBEX incompatibilities where sometimes objects work and sometimes they don't, depending on the hub usage of other objects.
Chip's scheme lets you explicitly plan hub usage.
I just don't think we should hinder the silicon in order to keep the OBEX simple.
C.W.
I think I get it now!
I have so many diagrams on the old whiteboard I think I got LOST.
I don't know how Chip does it!
I now see even more merit in Chip's concept.
The pace of the forum in the last 23hrs has made me giddy!
That's actually a very big decision to make. If the use of objects from OBEX and elsewhere becomes complex it loses a ton of its value, at least for hobbyists and casual programmers who don't want, or don't have the time, to be bothered with complexity. Simple things become hard.
At that point we can say goodbye to the hobbyists and casual programmers. Say goodbye to OBEX and Spin itself; it has no purpose any more. The P2 will be for engineers only (who will want to use C).
Perhaps I exaggerate this possible outcome. Perhaps Parallax does not care about hobbyists and casual programmers for the P2.
I'm just pointing out my concerns.
I'm giddy enough that I cannot go back and try to read and understand the past day's posts.
Mind you, that is true of most of the last half year's Prop II developments. That does kind of worry me.
What I really meant by GET is that the issue is outside my imagination.
Know your limitations seems applicable here.
As for a diagram, the whiteboard is now erased.
It's now much better to look at!
I don't think it's as bad as you think.
Every coin has two sides, the other side of this one could be: "Parallax only cares about keeping things simple for hobbyists and casual programmers, they force engineers to keep the training wheels on."
I just think hub slot usage will be one more item to look at when picking objects, you already have to look at pins, how much hub memory is used, etc.
I really don't see many objects needing to mess with hub slots anyways, mostly just high performance video and maybe USB.
C.W.
Now, I could spend a ton of time assimilating it all, but that is not going to happen. Frankly there are so many other things that attract my attention in the world of computing, programming and micros that there is no time to dedicate to it. That's before I think of the demands on my time from family, work and life.
As such, it's going to be down to the dedicated gurus to create those objects and libraries I will want to use in my projects.
Which brings us back to where I started. I want those objects to be simple to use with no obscure surprises popping up due to timing conflicts and such.
There is something very satisfying about a nice clean white board or virgin sheet of A4. Before it gets polluted with all kinds of wrong ideas and scrawls:)
I always seem to take the training wheels off too early and have to put them back on to get my confidence back!
Oh, that explains all my bumps and bruises, I should have made that connection a long time ago...
One of the first things I might do is see how far I can push my FFT calculation by grabbing as much HUB bandwidth as possible. That's after adopting all the speed up opportunities offered by the new instruction set.
A user of such an FFT will now have to be aware of that and perhaps be prepared to understand how to make it more or less greedy as required.
I'm sure there are many such examples.
Yes, but at least they have that choice.
Keep in mind, Chip's setup gives you a lot of control over allocation, it is easy to prevent a COG from being 'Boss Hogg' as someone recently put it.
The user just has to accept that your FFT will be slower if it gets fewer hub slots.
Consider a case where a user needs FFTs faster than could be delivered without slot sharing; without this change the P2 is not a contender for the project.
Now consider that with the new option they could make the decision to say I'm going to sacrifice a COG and let my FFT object have all the slots, now the P2 is still in the running for the project.
In reality it probably isn't that drastic; there are probably some other tasks the donor COG could perform by communicating over PORTD or using hub slots not consumed by the FFT.
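As a rough sketch, assuming every hub slot moves the same amount of data, that the FFT cog keeps its own slot plus the slots of $k$ donor cogs, and that the work is purely hub-bound, the FFT's share of hub bandwidth and its transfer time scale as:

$$B_{\text{FFT}} = \frac{1+k}{8}\,B_{\text{hub}}, \qquad T_{\text{hub-bound}} \propto \frac{1}{1+k}, \qquad 0 \le k \le 7$$

With $k = 1$ this is the 2X access of a single donor COG; with $k = 7$ every other cog has donated its slot and the FFT cog sees the whole hub.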
C.W.
Maybe objects from OBEX in some cases will require 2 versions.
The first is the high performance (greedy) version that requires a preferred/compulsory time slot allocation to operate.
The second version can be the single time slot (default) variant that plays well with others!
In a very short period after Ray first mentioned this, we went through a few iterations between Chip and the forum and now have a plan for a very flexible hub-slot assignment mechanism.
I can see this benefiting the SPIN interpreter; Chip should be able to make the code that loads execution fragments take advantage of it.
It could also speed up FCACHE loading and maybe lead to some optimized LMM loops.
Of course these things come at the cost of having other COGs donate hub slots, but in a lot of cases that should not be a problem.
C.W.
Perspective from a hobbyist AND everyday commercial user of the Propeller:
Chip's idea is easy to use and is really the only version I've fully understood. (I don't post much, but I lurk a LOT!)
I am accustomed to doing my deterministic stuff first, and then trading the data with the hub. I can imagine times where deterministic access to the hub could be advantageous, but I can't imagine why I wouldn't just use the internal port to get data from one cog to another if data determinism was that important.
If I understand correctly, Chip's way would provide a way to trade determinism for bandwidth with three easy options. I know there would be more complexity underneath, but I'm betting the compiler syntax will be almost as simple as his bulleted list.
BTW, I would use such an FFT object (with the understanding that I would need to optimize the cog access using such simple syntax) in a heartbeat.
Keep in mind there is a give and take. I can have one COG give up determinism for its hub access while granting its hub slots to another COG with full determinism.
As an example you have a video driver that needs 2X hub access while reading scan lines.
For this example video is COG5 and the donor is COG1.
COG1 could use SETSLOT %100_101 //PPP = 100, CCC = 101 (Cog 5)
COG5 could use SETSLOT %010_001 //PPP = 010, CCC = 001 (Cog 1)
No other COG has set its CCC value to COG1 or COG5.
COG1 only uses hub slots when COG5 doesn't want them.
COG5 can use all of its slots plus all of COG1's; COG1 will never prevent COG5 from using its slot.
COG5 will not get any 'extra' unplanned slots as long as no other COG sets a CCC value to COG5.
While reading a scan line COG5 gets uninterrupted 2X hub access, COG1 can access the hub between scan lines and frames when COG5 isn't using the hub.
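To make the operand layout concrete, here is a small Python sketch that packs and unpacks the %PPP_CCC values used in this example. The bit layout (PPP in the upper three bits, CCC in the lower three) is read off the binary literals in the post; `setslot_operand` and `describe` are hypothetical helpers, and SETSLOT itself was only a proposed instruction, so the real encoding may differ.

```python
def setslot_operand(ppp, ccc):
    """Pack the 3-bit PPP mode and 3-bit CCC cog number as %PPP_CCC (assumed layout)."""
    if not (0 <= ppp <= 0b111 and 0 <= ccc <= 0b111):
        raise ValueError("PPP and CCC are 3-bit fields")
    return (ppp << 3) | ccc


def describe(operand):
    """Render a 6-bit operand back into the %PPP_CCC notation used in the post."""
    ppp, ccc = (operand >> 3) & 0b111, operand & 0b111
    return f"%{ppp:03b}_{ccc:03b} (PPP = {ppp:03b}, CCC = cog {ccc})"


cog1 = setslot_operand(0b100, 5)  # COG1 donates: COG5 has first claim on COG1's slot
cog5 = setslot_operand(0b010, 1)  # COG5 keeps its slot; COG1 gets it when unused
print(describe(cog1))  # %100_101 (PPP = 100, CCC = cog 5)
print(describe(cog5))  # %010_001 (PPP = 010, CCC = cog 1)
```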
For reference, the PPP values:
010 = If I don't use my hub slot, CCC can use it. If CCC doesn't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
100 = If CCC doesn't want my hub slot, I can use it. If I don't want it, anyone else can use it.
If any other cog set their CCC to me, and I'm next in line for that slot, I can use it.
Had this been the scheme with P1... our OBEX would look a lot different. The "it doesn't work" sets of questions would be much different too.
All the other COGs can use a PPP value of 000 and behave exactly like the P1 as long as no other COG sets a CCC value to them.
"000 = If I don't use my hub slot, anyone else can use it. If any other cog set their CCC to me, and I'm next in line for that slot, I can use it."
With power comes responsibility to use it wisely, I think we are up to the task.
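To check the COG1/COG5 example against the PPP rules, here is a small Python model of one hub rotation. Only modes 000, 010 and 100 are modelled, as worded in the definitions above. The "anyone else" case is an assumption, represented by an explicit `free_takers` list that is empty in this example, which matches the statement that COG5 gets no unplanned extra slots. `SlotConfig`, `grant` and `one_rotation` are hypothetical names.

```python
from dataclasses import dataclass

NUM_COGS = 8


@dataclass
class SlotConfig:
    ppp: int = 0b000  # only the 000, 010 and 100 modes are modelled
    ccc: int = 0      # partner cog named by CCC


def grant(owner, cfg, wanting, free_takers=()):
    """Return the cog that gets `owner`'s hub slot, or None if it goes unused."""
    if cfg.ppp == 0b100:
        order = [cfg.ccc, owner]  # CCC has first claim, then the owner
    elif cfg.ppp == 0b010:
        order = [owner, cfg.ccc]  # owner first, then CCC
    else:
        order = [owner]           # 000: plain P1-style ownership
    for cog in order:
        if cog in wanting:
            return cog
    # "Anyone else": assumed to be an explicit, hypothetical list of takers.
    for cog in free_takers:
        if cog in wanting:
            return cog
    return None


def one_rotation(configs, wanting, free_takers=()):
    """Count the slots granted to each cog over one full hub rotation."""
    counts = {cog: 0 for cog in range(NUM_COGS)}
    for owner in range(NUM_COGS):
        winner = grant(owner, configs[owner], wanting, free_takers)
        if winner is not None:
            counts[winner] += 1
    return counts


configs = {cog: SlotConfig() for cog in range(NUM_COGS)}
configs[1] = SlotConfig(ppp=0b100, ccc=5)  # COG1 donates to COG5
configs[5] = SlotConfig(ppp=0b010, ccc=1)  # COG5 lends spare slots to COG1

scan_line = one_rotation(configs, wanting={1, 5})
between_lines = one_rotation(configs, wanting={1})
print(scan_line[5], scan_line[1])          # 2 0
print(between_lines[5], between_lines[1])  # 0 2
```

During a scan line COG5 wins both slot 1 and slot 5 (2X access) and COG1 gets nothing; between lines the same two slots fall to COG1, and every other cog keeps exactly its own slot.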
C.W.