Propeller II update - BLOG - Page 155 — Parallax Forums


Comments

  • Seairth Posts: 2,474
    edited 2014-01-14 04:40
    Just a thought: could the hub access be changed to a "take a number" approach? Maintain two three-bit counters in the hub and an additional three-bit field per cog, where one is incremented for each COG that's requesting hub access (the "issued" counter) and one is incremented at the completion of each HUBOP (the "current" counter). The cog with the active slot is the one whose "issued" equals "current". Since hub operations are atomic and can't be cancelled in the middle of the operation, the only concurrency issue is when the "issued" counter is incremented by two cogs at the same time. I suspect an approach like this would add another clock cycle to the hub accesses. But the worst-case scenario still guarantees hub access every 8 cycles (I think), while the best case scenario is back-to-back, two-cycle accesses for writes and four-cycle for reads.
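    Seairth's "take a number" scheme can be modeled in a few lines. This is a hypothetical software sketch (the HubArbiter class and its method names are invented here for illustration), not actual Propeller II hardware behavior:

```python
# Hypothetical model of the "take a number" hub arbiter described above.
# Two 3-bit counters: 'issued' (tickets handed out) and 'current'
# (tickets served, bumped at the completion of each HUBOP).

class HubArbiter:
    def __init__(self):
        self.issued = 0
        self.current = 0

    def request(self):
        """A cog takes a ticket; the ticket is its 3-bit per-cog field."""
        ticket = self.issued
        self.issued = (self.issued + 1) & 0x7  # wraps; at most 8 cogs waiting
        return ticket

    def has_slot(self, ticket):
        """The cog whose ticket equals 'current' owns the active slot."""
        return ticket == self.current

    def complete(self):
        """Called when a HUBOP finishes (hub ops are atomic)."""
        self.current = (self.current + 1) & 0x7

arb = HubArbiter()
a, b = arb.request(), arb.request()      # two cogs ask at (nearly) the same time
print(arb.has_slot(a), arb.has_slot(b))  # True False: first ticket is served
arb.complete()                           # cog A's hubop completes
print(arb.has_slot(b))                   # True: cog B gets back-to-back access
```

    As the post notes, the one real concurrency hazard is two cogs incrementing "issued" in the same clock; in hardware that increment would have to be serialized by the hub.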
  • User Name Posts: 1,451
    edited 2014-01-14 06:38
    ozpropdev wrote: »
    It would be sad :( if commercial users of
    P2 were limited by OBEX restrictions when their IP would never be posted there anyway.
    Not all code written for P1 for example fits the OBEX guidelines of ease of integration.
    I know some of my own P1 code is far too interleaved (tangled) to ever be broken up into
    OBEX compatible objects.

    What ozpropdev says! As a commercial user of course I would want the option of slot sharing. It's axiomatic.
  • mindrobots Posts: 6,506
    edited 2014-01-14 06:44
    Congratulations, Chip!

    The HUBEXEC wonderment is all way above my pay grade but I do have a simple question. ( I still can't wait to play!)

    Once you kick a COG into HUBEXEC mode, what is the intended/best use of the COG register space? Except for the COG H/W registers at the top, does it just become a fast data space?
  • potatohead Posts: 10,260
    edited 2014-01-14 08:22
    Sounds too much like the problems people have with interrupts clashing.

    Yep, and that is why I'm not an advocate for this feature.
  • Bill Henning Posts: 6,445
    edited 2014-01-14 09:02
    Heater. wrote: »
    Bill,

    Naysayer here...


    Actually there is:

    A memory shortage will be notified to you by the compiler, or can even be known prior to compilation if objects were documented in detail. As far as I can tell, timing starvation due to the clash of greedy HUB use could cause failures that are only detectable at run time. Can you make a compiler that will fail with an error when building such greedy codes together that cannot meet their timing requirements?

    A compiler could issue a warning if a module used a "SETMODE HUNGRY" instruction, or if more than one module used it (which is the case you are worried about)

    In C's case, the linker could also issue a warning.
    Heater. wrote: »
    This is not about "newbies might get confused". After all the entire world is a Prop II newbie at this point in time. The same applies to professional programmers and those who don't like weird surprises causing their code to fail in mysterious and random ways.

    This is not about "it won't be deterministic" in the sense of counting cycles to get the timing right as you might do with in COG code. I don't think anyone expects to do that for HUB code. This is about that higher level determinism of being able to mix and match objects/libraries/functionality without random failures cropping up. Well, you already know all that.

    Except it will get used. Perhaps by some super duper video driver. Perhaps by some high speed communications protocol driver. Now I come along and want to use both of those in my application. BOOM, random failures, hours wasted head scratching, ultimate disappointment as I find I cannot use them together and my project is doomed.

    I am perfectly happy with a warning in case "HUNGRY" is used in ANY module/object; I am even ok with banning use of "HUNGRY" in obex.

    What I am not happy with is deliberately slowing down hubexec.

    Consider - an eight-long cache line is guaranteed to have one hub cycle for fetching the next sequential line, and very likely to have one or two more hubops.

    Without hungry, with prefetch, in an ideal case we get the pre-fetch for free.

    Say two hub ops in the cache line... no more free prefetch, and the cache line will take at least 24 clock cycles to execute.

    With HUNGRY, it can execute in as little as 8-12 cycles, assuming two extra slots are available.

    Deliberately slowing down the speed of compiled C / Spin / etc code 2x for some philosophical point makes zero sense.
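    Bill's cache-line arithmetic can be checked with a toy model. The assumptions here (1 clock per instruction; a waiting hubop costs a full 8-clock rotation unless a spare slot serves it early) are mine, for illustration, not measured silicon figures:

```python
# Toy model of the cache-line timing argument above.

LINE_LONGS = 8    # instructions in one cache line
SLOT_PERIOD = 8   # clocks between a cog's guaranteed hub slots

def line_cycles(hubops, spare_slots=0):
    """Rough clocks to execute one cache line containing `hubops` hub ops."""
    served_early = min(hubops, spare_slots)  # hubops served by spare slots
    waiting = hubops - served_early          # hubops that wait for their own slot
    return LINE_LONGS + waiting * SLOT_PERIOD

print(line_cycles(hubops=2))                 # 24: matches "at least 24 clocks"
print(line_cycles(hubops=2, spare_slots=2))  # 8: best case of the "8-12" range
```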
    Heater. wrote: »
    Sure, this is all some kind of debate about "purity" vs "I want speed and damn the complexity". Chip will have to make his own subjective value judgement of that.

    P.S. To answer my own question above, if the compilers would spit out a big red warning:

    "YOU HAVE BROKEN TIMING ISOLATION BETWEEN COG PROCESSES: RANDOM ERRORS MAY OCCUR AT RUNTIME"

    when that greedy mode is used in a program I might be happier about it:)
    But not much. Sounds too much like the problems people have with interrupts clashing.

    I disagree.

    In most use cases I see, there will likely be only one cog in HUNGRY mode - the cog that is running the large program implementing the "business logic" or "user interface".

    A better warning would be:

    "MODULE XXX wants to use extra hub access - if more than one module attempts this, the modules may not get the memory bandwidth they would like"

    Another useful warning:

    "MODULE XXX wants extra hub access - WARNING - only 1/8 of hub cycles are guaranteed, module should not assume availability of more cycles"

    As I said, banning hungry from obex, your suggestion of adding warnings etc are all good solutions.

    Note that in the hubexec case, the code would gracefully slow down to the same level of performance as without "hungry" in case spare slots are not available. Same in the case of bitmap updates. With the new read/write octal, the largest hub bandwidth consumers (high bandwidth display updates) have more bandwidth than they can use, without extra slots.

    I want P2 to succeed, which means it has to perform as well as it can - deliberately locking out a significant speed gain for hubexec is deliberately being less competitive.

    It's worth noting again:

    - no slots are "stolen", hungry would only use otherwise "wasted" hub slots
    - non-hungry cogs cannot be affected by hungry cogs
    - warnings are a great idea, eliminates the "huh why does this not work the way I expect... DUH! The warnings tell me!"
    - a hungry hubexec cog could actually run code faster than a non-hungry non-hubexec cog, great for C / Spin / etc
    - a hungry non-hubexec VM could run byte/word code much faster than a non-hungry cog
  • dMajo Posts: 855
    edited 2014-01-14 10:19
    Another useful warning:

    "MODULE XXX wants extra hub access - WARNING - only 1/8 of hub cycles are guaranteed, module should not assume availability of more cycles"

    What does it mean? That a hungry cog is not sure to have additional hub windows? That it can have them only if other cogs are not using them?

    How does this translate, let's say, to a non-hubexec hungry cog executing PASM?
    Now I know how many instructions I can have between two hub ops to keep the sync.
    If I want to, e.g., double the bandwidth, I need other cogs' hub access, so I will interleave one hub op every four instructions instead of every eight.
    Now what happens to my code if there is no cog releasing its resources ... my hungry cog will run at half its intended speed.
    If I translate this to a loop, and other cogs are occasionally accessing the hub, it can happen that the loop will execute some cycles at standard speed, some at double, and others at half speed.

    Perhaps I've got it wrong, but if this is the route to be taken, I see this working only under a fixed rule:
    the hungry cog can use another cog's access window only if that window is released by the releasing cog, in its own code, using a specific opcode; until that release is revoked, the releasing cog forfeits its hub access.
    That means that to use this feature the hungry cog will need to coginit another cog and have that second cog release the resource (and eventually waitpeq forever, on internal I/O, if it has no other use).
    This will guarantee that, e.g., in the obex or elsewhere, the object will allocate the number of cogs it really needs (even if the second one is used only to release its slot to the first).
    I hope that in the meantime other, sated objects/drivers will start appearing that release their hub window by default.

    Hopefully by pairing such objects in the main app the balance will not be affected.
  • potatohead Posts: 10,260
    edited 2014-01-14 10:26
    Those dynamics are why Chip suspected it makes best sense as an agreement between objects.
  • Bill Henning Posts: 6,445
    edited 2014-01-14 10:38
    dMajo wrote: »
    What does it mean? That a hungry cog is not sure to have additional hub windows? That it can have them only if other cogs are not using them?

    Exactly!
    dMajo wrote: »
    How does this translate, let's say, to a non-hubexec cog executing PASM?
    Now I know how many instructions I can have between two hub ops to keep the sync.

    If you don't enable hungry, absolutely no change.
    dMajo wrote: »
    If I want to, e.g., double the bandwidth, I need other cogs' hub access, so I will interleave one hub op every four instructions instead of every eight.
    Now what happens to my code if there is no cog releasing its resources ... my hungry cog will run at half its intended speed.

    I think I see where the misunderstanding is.

    hungry mode is NOT meant for additional guaranteed bandwidth (that would require paired cogs with explicit giving up of a hub slot... however that is not useful, as the cog could not make use of a higher data rate than 1 long per clock cycle, which RDOCT already gives us)

    With RDOCT, a normal cog can read 8 longs in 8 cycles ... 800MB/sec @ 200Mhz

    1080p60Hz 32bpp is 500MB/sec, so no need for more slots to display 1080p

    hungry mode is for compiled code, VM's etc., that can use all the extra hub bandwidth to great benefit... but will still work fine (albeit slower) without spare slots.
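    The bandwidth figures above are easy to verify (assuming a 200 MHz clock and one 8-long RDOCT per 8-clock hub rotation):

```python
# Checking the RDOCT and 1080p bandwidth claims above.
clock_hz = 200_000_000
rotations_per_sec = clock_hz // 8           # one guaranteed hub slot per 8 clocks
bytes_per_sec = rotations_per_sec * 8 * 4   # 8 longs per rotation, 4 bytes each
print(bytes_per_sec)                        # 800000000 -> 800 MB/s

# 1080p at 60 Hz, 32 bpp:
video_bytes_per_sec = 1920 * 1080 * 4 * 60
print(video_bytes_per_sec)                  # 497664000 -> ~500 MB/s, fits easily
```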
    dMajo wrote: »
    If I translate this to a loop, and other cogs are occasionally accessing the hub, it can happen that the loop will execute some cycles at standard speed, some at double, and others at half speed.

    Correct - and those are the use cases (compiled code, VM's) that can make use of the "bonus" speed when available.
    dMajo wrote: »
    Perhaps I've got it wrong, but if this is the route to be taken, I see this working only under a fixed rule:
    the hungry cog can use another cog's access window only if that window is released by the releasing cog, in its own code, using a specific opcode; until that release is revoked, the releasing cog forfeits its hub access.
    That means that to use this feature the hungry cog will need to coginit another cog and have that second cog release the resource (and eventually waitpeq forever, on internal I/O, if it has no other use).
    This will guarantee that, e.g., in the obex or elsewhere, the object will allocate the number of cogs it really needs (even if the second one is used only to release its slot to the first).
    I hope that in the meantime other, sated objects/drivers will start appearing that release their hub window by default.

    Hopefully by pairing such objects in the main app the balance will not be affected.

    Your usage case is only needed for guaranteed access to more than one hub slot, which is (IMHO) not needed; what I see being needed is making compiled / VM code run as fast as possible, when possible.
  • SRLM Posts: 5,045
    edited 2014-01-14 11:47
    With RDOCT, a normal cog can read 8 longs in 8 cycles ... 800MB/sec @ 200Mhz

    1080p60Hz 32bpp is 500MB/sec, so no need for more slots to display 1080p

    hungry mode is for compiled code, VM's etc., that can use all the extra hub bandwidth to great benefit... but will still work fine (albeit slower) without spare slots.

    This seems reasonable to me... My main concern would be if you had two VMs then how would you evenly divide the free slots? I can imagine that it would tend to favor one over the other. But then you test, discover that, and build that assumption into your program. Somewhere down the line you change the order that you initialize cogs, and now the other VM has the higher performance, which breaks your assumptions and your code. Unless you catch this change right away I think it would take some effort to find the bug (changed order of cog starts).

    It seems that hungry mode makes program behavior depend on cog startup order.

    ps: Thanks to evanh for successfully correcting the thread title to what it should be way back in post 4510.
  • Bill Henning Posts: 6,445
    edited 2014-01-14 11:55
    My point is that hungry cogs would only ever be guaranteed their normal slot, and would degrade gracefully.

    I seriously don't see what code would reasonably require additional bandwidth and break if it was not available - I can see the business logic running somewhat slower, lines being drawn slower etc, but not apps simply not working - that would mean very poor design, which would likely mean other bugs not related to spare slots would be there.
    SRLM wrote: »
    This seems reasonable to me... My main concern would be if you had two VMs then how would you evenly divide the free slots? I can imagine that it would tend to favor one over the other. But then you test, discover that, and build that assumption into your program. Somewhere down the line you change the order that you initialize cogs, and now the other VM has the higher performance, which breaks your assumptions and your code. Unless you catch this change right away I think it would take some effort to find the bug (changed order of cog starts).

    It seems that hungry mode makes program behavior depend on cog startup order.

    ps: Thanks to evanh for successfully correcting the thread title to what it should be way back in post 4510.
  • SRLM Posts: 5,045
    edited 2014-01-14 12:07
    My point is that hungry cogs would only ever be guaranteed their normal slot, and would degrade gracefully.

    I seriously don't see what code would reasonably require additional bandwidth and break if it was not available - I can see the business logic running somewhat slower, lines being drawn slower etc, but not apps simply not working - that would mean very poor design, which would likely mean other bugs not related to spare slots would be there.

    Here's an example: take a program that is intended to read 5 sensors. Each sensor has its own cog with code that updates a hub variable as fast as that sensor allows (100Hz, 500Hz, 1000Hz, ...). Then we dedicate a cog to taking "snapshots" of those values in the hub, maybe doing some math, and putting the result somewhere else. Finally we have a business logic cog to handle everything else (UI, recording, etc.).

    The problem comes when you test your code, and, due to the way the cogs are started, your sensor snapshot cog can run at 200Hz. So that's what you build your system around. Then one day you refactor some and discover that it only runs at 100Hz. Unbeknownst to you, the business logic cog now has extra cycles to enjoy, but the one that you really care about is now running at 1/2 speed. I suspect that this would be a challenge to figure out.

    Is this a valid use case?
  • ctwardell Posts: 1,716
    edited 2014-01-14 12:08
    My point is that hungry cogs would only ever be guaranteed their normal slot, and would degrade gracefully.

    I seriously don't see what code would reasonably require additional bandwidth and break if it was not available - I can see the business logic running somewhat slower, lines being drawn slower etc, but not apps simply not working - that would mean very poor design, which would likely mean other bugs not related to spare slots would be there.


    I agree with Bill on this. I really think the functionality should be made available for the P2.

    I believe Chip stated earlier that it is simple to implement so it seems the only issues are philosophical.

    If we need to place dire warnings on the documentation and add warnings to the dev tools that is fine with me.

    C.W.
  • ctwardell Posts: 1,716
    edited 2014-01-14 12:20
    SRLM wrote: »
    Here's an example: take a program that is intended to read 5 sensors. Each sensor has its own cog with code that updates a hub variable as fast as that sensor allows (100Hz, 500Hz, 1000Hz, ...). Then we dedicate a cog to taking "snapshots" of those values in the hub, maybe doing some math, and putting the result somewhere else. Finally we have a business logic cog to handle everything else (UI, recording, etc.).

    The problem comes when you test your code, and, due to the way the cogs are started, your sensor snapshot cog can run at 200Hz. So that's what you build your system around. Then one day you refactor some and discover that it only runs at 100Hz. Unbeknownst to you, the business logic cog now has extra cycles to enjoy, but the one that you really care about is now running at 1/2 speed. I suspect that this would be a challenge to figure out.

    Is this a valid use case?

    I would say it is a valid concern.

    My answer would be that you should not have timing critical code depend on extra slots.

    I believe Chip's initial proposal did allow for pairing COGs so that you could have some control over how slots are shared; you could possibly get around a case like you mentioned by how spare slots are allocated.

    C.W.

    Edit: Here is Chip's post that laid out a nice plan for this: http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1223254&viewfull=1#post1223254
  • jmg Posts: 15,155
    edited 2014-01-14 12:32
    ctwardell wrote: »
    If we need to place dire warnings on the documentation and add warnings to the dev tools that is fine with me.

    Correct, the silicon should never be crippled on a whim.

    Safe defaults are certainly a good idea, but removing options from users, on some misguided 'protect them from themselves', will drive away the power designers, and constrain the area under the sales curve.

    A couple of years back, I looked at a 3-core design and was unable to use it at all, simply because the chip designers forgot to allow me to control resource allocation. All that design effort missed an application, for want of simple user choice.
  • potatohead Posts: 10,260
    edited 2014-01-14 12:32
    FWIW, cog start order matters due to the DAC pin changes. Each COG gets a bank of pins assigned to it for full-speed, every-clock-update DAC functions. Any COG can operate any DAC pin, just on an instruction basis, not clocked.

    So we will be ordering COGS regularly on P2.
  • David Betz Posts: 14,514
    edited 2014-01-14 12:42
    jmg wrote: »
    Safe defaults are certainly a good idea, but removing options from users, on some misguided 'protect them from themselves', will drive away the power designers, and constrain the area under the sales curve.
    I'm not against slot sharing but couldn't you use this same argument to say that interrupts should be supported on the Propeller because they are useful in the hands of "power designers"? Why should power designers be constrained to follow a no-interrupts model?
  • Bill Henning Posts: 6,445
    edited 2014-01-14 12:52
    SRLM wrote: »
    Here's an example: take a program that is intended to read 5 sensors. Each sensor has its own cog with code that updates a hub variable as fast as that sensor allows (100Hz, 500Hz, 1000Hz, ...). Then we dedicate a cog to taking "snapshots" of those values in the hub, maybe doing some math, and putting the result somewhere else. Finally we have a business logic cog to handle everything else (UI, recording, etc.).

    The problem comes when you test your code, and, due to the way the cogs are started, your sensor snapshot cog can run at 200Hz. So that's what you build your system around. Then one day you refactor some and discover that it only runs at 100Hz. Unbeknownst to you, the business logic cog now has extra cycles to enjoy, but the one that you really care about is now running at 1/2 speed. I suspect that this would be a challenge to figure out.

    Is this a valid use case?

    Sorry, this is not a valid case against slot sharing - your example would use less than 0.1% of the hub bandwidth and could not possibly be slowed down by cog re-ordering.

    Please find explanation/analysis below:

    I would assume that the sensor cogs and snapshot cog would not depend on running in hungry mode, as you want determinism for them. If they are not "hungry" refactoring could not halve the speed.

    An interesting calculation:

    - 200Mhz cog can bit-bang to an SPI slave at 50Mbps
    - say 24 bits, /CS overhead etc., 50 "bit periods" per sample ==> 1Msps
    - At 200Mhz, each cog has 25 hub slots per microsecond, so 1Msps update to hub leaves 24 free slots
    - 5 cogs updating at 1Msps leaves 96% of their hub slots unused

    If the cogs write the sample into five consecutive longs

    - snapshot cog needs only 1M oct reads per second (4% of its slots) to read the five values 1M times per second
    - say it is doing a running average of 10 samples from each sensor, it would need 100K oct writes, or 0.4% of its hub bandwidth

    Conclusions:

    - my 1Msps example above is about three orders of magnitude worse than your example
    - Re-ordering the launching of the cogs could not affect the timing
    - at 1Msps five cogs could be averaged by a sixth cog to 100Ksps
    - those six cogs would leave approximately 93.7% of their hub slots unused
    - the business logic cog could have hub access almost 7 cycles out of 8
    - a hungry business logic cog could not possibly affect the sampling/averaging
    - non-hungry cogs are about 1000x faster than needed for your example
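    The slot arithmetic in this example checks out (assuming, as the post does, a 200 MHz clock, one guaranteed hub slot per cog every 8 clocks, and 1 Msps per sensor cog):

```python
# Verifying the hub-slot utilization figures above.
clock_hz = 200_000_000
slots_per_sec = clock_hz // 8          # 25,000,000 slots per cog per second
print(slots_per_sec // 1_000_000)      # 25 slots per microsecond per cog

sensor_writes = 1_000_000              # 1 hub write per sample at 1 Msps
print(f"{1 - sensor_writes / slots_per_sec:.0%}")  # 96% of slots unused

snapshot_reads = 1_000_000             # 1M oct-reads/sec for the snapshot cog
print(f"{snapshot_reads / slots_per_sec:.0%}")     # 4% of its slots
```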
  • Bill Henning Posts: 6,445
    edited 2014-01-14 12:58
    P2 effectively has interrupts, via several mechanisms:

    1) "interrupt cog", 4 tasks, each waiting on a pin, extremely low latency (~5-10 clocks)

    2) "interrupt cog", essentially unlimited interrupts based on "branch on pin", variable latency (approx 6*Ninterrupts/2 on average)

    3) "interrupt cog", four tasks, each with small number of interrupts based on "branch on pin", medium latency

    4) "interrupt task", given percentage (1/16 ... 15/16) of the cycles in a cog for an interrupt task behaving like 1/2/3 above, configurable latency

    Matter of fact, these are superior to conventional interrupts as the "interrupt service routine" would run in parallel with other code, instead of pre-empting.

    Also, slot sharing is different: no one is forcing anyone to use it, and it increases performance rather than decreasing it.
    David Betz wrote: »
    I'm not against slot sharing but couldn't you use this same argument to say that interrupts should be supported on the Propeller because they are useful in the hands of "power designers"? Why should power designers be constrained to follow a no-interrupts model?
  • potatohead Posts: 10,260
    edited 2014-01-14 12:59
    Safe defaults are certainly a good idea, but removing options from users, on some misguided 'protect them from themselves', will drive away the power designers, and constrain the area under the sales curve.
    I'm not against slot sharing but couldn't you use this same argument to say that interrupts should be supported on the Propeller because they are useful in the hands of "power designers"? Why should power designers be constrained to follow a no-interrupts model?

    Precisely.

    This is more than just removing safe defaults. It affects the basic nature of the Propeller. And we've not yet heard what it really means to turn HUB access, which today involves no decisions and no extra complexity, into something that involves both.

    Really, I see the primary and very easily abused case being "the super COG", where the other COGs need to be idle, or not doing much, for it to make sense. In that way, we really don't have a gain, just a trade-off. And there are costs associated with that too.

    Deffo not a freebie as often claimed.
  • potatohead Posts: 10,260
    edited 2014-01-14 13:01
    P2 effectively has interrupts

    No it doesn't. Seriously.

    In any case, if this is done, I really do want it both passive and modal per COG. That way, when people hose it all up, we can very easily say, "turn it off and do it right", nicely compartmentalizing the problem from a support standpoint.
  • Bill Henning Posts: 6,445
    edited 2014-01-14 13:05
    Sorry, we fundamentally disagree.

    As the hungry cog could not take cycles forcibly from other cogs, just use otherwise unused cycles, your argument is flawed. The super cog could not take slots away from the other cogs.

    Any compiled code (C, Spin, other VM's) would benefit greatly, so it is a "freebie"
    potatohead wrote: »
    Precisely.

    This is more than just removing safe defaults. It affects the basic nature of the Propeller. And we've not yet heard what it really means to turn HUB access, which today involves no decisions and no extra complexity, into something that involves both.

    Really, I see the primary and very easily abused case being "the super COG", where the other COGs need to be idle, or not doing much, for it to make sense. In that way, we really don't have a gain, just a trade-off. And there are costs associated with that too.

    Deffo not a freebie as often claimed.
  • ctwardell Posts: 1,716
    edited 2014-01-14 13:08
    potatohead wrote: »
    Really, I see the primary and very easily abused case being "the super COG", where the other COGs need to be idle, or not doing much, for it to make sense.

    I'm not sure I would call that abuse.

    There are a lot of cases where cogs are used as peripherals that don't need much in the way of hub access, so why not let cogs that could benefit from that access make use of those access slots?

    A prime example would be the 'interrupt service' cogs that Bill just mentioned, a cog that is busy watching some pin or set of pins isn't going to need many hub access cycles.

    C.W.
  • Bill Henning Posts: 6,445
    edited 2014-01-14 13:10
    The way I look at it, with the examples I posted, it is effectively interrupts - actually better than conventional interrupts.

    It will be a good marketing strategy.

    An interrupt is at its most fundamental an async response to an event, internal or external.

    The sample I posted would use a cog or a task to scan for events and respond to them. Using the counters, it could even get rising/falling edges, and some logic functions.

    We could argue that it is more like an event loop, but hardware interrupts are also effectively an event loop that diverts from normal code execution.

    An interrupt cog/task would have the benefit of letting the main line code run undisturbed.

    If it walks like a duck, acts like a duck, quacks like a duck, for all intents and purposes, it is a duck.
    potatohead wrote: »
    No it doesn't. Seriously.

    In any case, if this is done, I really do want it both passive and modal per COG. That way, when people hose it all up, we can very easily say, "turn it off and do it right", nicely compartmentalizing the problem from a support standpoint.

    By default, "hungry" mode would not be enabled.

    As hungry could only use slots that otherwise would go wasted, I do not understand your objections.
  • David Betz Posts: 14,514
    edited 2014-01-14 13:13
    This discussion has gotten away from me and I've lost track of what is known about slot sharing. Has Chip said it would be easy and low risk to implement slot sharing? I think that should be the first consideration. If there is either a schedule or reliability risk associated with it, it should probably not be considered. Beyond that, it seems to me it would be a nice feature as long as, as Bill keeps saying, it isn't possible for one COG to "steal" another COG's slot. However, does this slot sharing increase an already critical path? I guess only Chip can say for sure. Anyway, I sure wouldn't want this to hold up P2 or reduce the chances of it working on the next fab run.
  • ctwardell Posts: 1,716
    edited 2014-01-14 13:20
    potatohead wrote: »
    "turn it off and do it right"

    Can you expand on how using spare slots is "wrong"?

    I assume you mean that "real" engineers can parallelize the task to use multiple cogs instead of speeding up one cog. The issue with that is that often those parallel tasks need to share data, and that gets right back to the hub being a bottleneck.

    C.W.
  • Bill Henning Posts: 6,445
    edited 2014-01-14 13:23
    As I recall:

    - A simple "let a cog use spare cycles" was said to be easy; the complications arose when attempts were made to make it prioritizable by specific cogs, cog pairings, etc.

    - The case of more than one cog wanting a "spare" cycle in the same cycle was not bad to resolve with a "round robin" scheme, but prioritizing was a pain.

    At some point, when he comes up for air, Chip will let us know what is easy and what is not.

    I think all cogs should default to their current 1/8 behavior, but be able to switch into a mode capable of utilizing spare slots if any are available, while never "stealing" slots.

    My motivation is simple - I want C / Spin / VM code to be as fast as possible, and as I showed in an earlier example, this would lead to a 2x+ increase in speed for code that uses hub based code, data and/or stack.
    David Betz wrote: »
    This discussion has gotten away from me and I've lost track of what is known about slot sharing. Has Chip said it would be easy and low risk to implement slot sharing? I think that should be the first consideration. If there is either a schedule or reliablity risk associated with it it should probably not be considered. Beyond that, it seems to me it would be a nice feature as long as, as Bill keeps saying, it isn't possible for one COG to "steal" another COG's slot. However, does this slot sharing increase an already critical path? I guess only Chip can say for sure. Anyway, I sure wouldn't want this to hold up P2 or reduce the chances of it working on the next fab run.
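    The "spare slots go round-robin to hungry cogs, owners are never preempted" behavior Bill recalls could look like the following hypothetical model (allocate_slot and its arguments are invented here for illustration; this is not actual P2 arbitration logic):

```python
# One decision per hub slot: the owning cog always wins its own slot;
# otherwise a spare slot rotates among hungry cogs with pending hubops.

def allocate_slot(owner_wants_it, hungry_cogs, rr_pointer):
    """Return (winner, new_rr_pointer) for a single hub slot.

    owner_wants_it: the cog that owns this slot has a pending hubop.
    hungry_cogs: IDs of hungry-mode cogs with pending hubops.
    rr_pointer: round-robin index, advanced when a spare slot is granted.
    """
    if owner_wants_it:
        return "owner", rr_pointer     # the guaranteed 1/8 is never stolen
    if not hungry_cogs:
        return None, rr_pointer        # slot goes unused, as it would today
    winner = hungry_cogs[rr_pointer % len(hungry_cogs)]
    return winner, rr_pointer + 1      # rotate fairness among hungry cogs

print(allocate_slot(True, [3, 5], 0))   # ('owner', 0): owner always wins
w1, p = allocate_slot(False, [3, 5], 0)
w2, _ = allocate_slot(False, [3, 5], p)
print(w1, w2)                           # 3 5: spare slots alternate
```

    Note how the model captures both properties under debate: a non-hungry cog's timing is untouched, and a hungry cog's bonus slots can disappear entirely when none are spare.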
  • potatohead Posts: 10,260
    edited 2014-01-14 13:28
    I never said the super cog would take cycles. I do say it will only be super in cases where the other cogs are not really performing.

    This is all about expectations anyway. If there is the expectation that a COG can go really fast relative to the others, then the whole "others need to run slow or not really be used" caveat gets lost in there, as it has for the prime cases presented so far.

    "effective" interrupts? Sure. I think we all understand that. I would not market it as such. If we don't have interrupts, then we don't. From there we talk about what we do have and why it matters; otherwise, it looks like we did a lot of work because we really should have had interrupts, which are easy, standard, etc...

    Competition would have a field day with that. I sure do, when mine makes that kind of gaffe. :)

    Anyway, as I already mentioned, I would never advocate we do it this way. I think it is a generally bad idea.

    If it is to be done, then I would prefer it be passive and modal per COG, so the right expectations can be associated and set along with the mode bits, nicely compartmentalizing this whole matter.
  • David Betz Posts: 14,514
    edited 2014-01-14 13:40
    As I recall:

    - A simple "let a cog use spare cycles" was said to be easy; the complications arose when attempts were made to make it prioritizable by specific cogs, cog pairings, etc.

    - The case of more than one cog wanting a "spare" cycle in the same cycle was not bad to resolve with a "round robin" scheme, but prioritizing was a pain.

    At some point, when he comes up for air, Chip will let us know what is easy and what is not.

    I think all cogs should default to their current 1/8 behavior, but be able to switch into a mode capable of utilizing spare slots if any are available, while never "stealing" slots.

    My motivation is simple - I want C / Spin / VM code to be as fast as possible, and as I showed in an earlier example, this would lead to a 2x+ increase in speed for code that uses hub based code, data and/or stack.
    Thanks for bringing me up to speed. It will be interesting to hear what Chip has to say about this.
  • Seairth Posts: 2,474
    edited 2014-01-14 13:40
    The way I look at it, with the examples I posted, it is effectively interrupts - actually better than conventional interrupts.

    One minor disagreement. Typical "interrupt" sense hardware is asynchronous and continues to detect events even when there's an active interrupt. That's much more difficult to do with a single cog, even when using tasks. Event detection (since we're never really interrupting anything) on a cog is always synchronous.
  • jazzed Posts: 11,803
    edited 2014-01-14 13:42
    This sucker needs to lock down fast so that people who are actually doing compiler work have the chance to finish before the next shuttle goes. Missing the shuttle run (March/April?) would be a disaster. January is half over!

    I guess the good thing about having a thread with more than 4,500 posts is that most people won't read the whole thing and a summary page of features will be greatly appreciated. And when it's all said and done, the features that didn't make it won't be in the data sheet. There will be plenty of room for "I told you so" later if that is necessary.