Propeller II update - BLOG - Page 109 — Parallax Forums

Comments

  • potatohead Posts: 10,261
    edited 2013-12-01 13:33
    Cars are shipped to end users speed limited for precisely the same reasons we are discussing here.
  • jmg Posts: 15,182
    edited 2013-12-01 13:34
    ctwardell wrote: »
    The case I am calling 'promiscuous' are the slots yielded to 'any' instead of a specific COG.

    So you mean free slots? Those are not deterministic, but are a form of group lottery.
  • potatohead Posts: 10,261
    edited 2013-12-01 13:36
    The multi-tasking isn't the same type of comparison.

    The COG is the reuse unit. If people find multi-tasking too complex, so be it; they've got options. One is to use the JMPTASK method, another is to distribute the problem across COGS, etc...

    Point is, no matter what somebody actually does with a COG, it can be dropped in and run on any COG, multi-tasking or not. When we make assumptions about other COGS, those will get maximized, leaving scenarios where a COG is not a COG, and thus this discussion.

    This whole thing is a tragedy of the commons problem. The expectation is that people will do the right thing. The reality is that the vast majority of them will maximize it for themselves, ignoring others, with Cluso's last post a case in point.

    And that's not a personal negative. He gets to do that, and it's OK. But it does speak to the problems we face treating HUB cycles as a commons, not per COG as P1 does. P1 does not invoke a commons, so there is no tragedy.

    Every single one of these schemes so far invokes a commons, and even the least of them, pairing COGS, still has implications, the primary one being a 4 COG chip, with users then lamenting the "wasted" COGS...
  • potatohead Posts: 10,261
    edited 2013-12-01 13:36
    jmg wrote: »
    They cannot simply "make the assumption", they need a consenting pair (or more), to grant them the extra cycles.

    Where this gets murky, is what if the consenting pair wants some rare hub access ?
    If they use their slot, they risk tripping the hungry cog, but could do this on a SW permission handshake.

    Or can it rely on the group lottery of 'unused slots' to meet that, or should it have another deterministic moderate bandwidth choice ?

    I made an edit to this to refer to the 9th slot.
  • Bill Henning Posts: 6,445
    edited 2013-12-01 13:39
    Personally, I'd avoid the 9th slot and keep it to the regular 8.

    With the four modes I proposed, your example cannot happen.

    mode 00 - exactly the same as P1, except unused slots may be used by others

    mode 01/10 - high bandwidth pair, only guaranteed the slots of the pair (but mode 11 cog can use left over scraps)

    mode 11 - low bandwidth cog, only eats scrap slots from 00/01/10

    This totally avoids the case being worried about, and cannot starve except in a deliberately built degenerate case.

    Actually, I am pretty sure that mode 11 cannot be starved at all.

    Consider the worst case scenario:

    - three paired-cog high bandwidth drivers (there will be some scraps during sync periods if video drivers)
    - one mode 00 regular cog - there will be spare slots unless deliberately written to consume all slots, remember it can't get spare slots
    - one mode 11 cog - will still get at least 1/8 slots, but can use spare slots... after all, it gives up its slot, usually to itself

    potatohead wrote: »
    So I've got two objects that I really need, and they both make the assumption that the "extra" 9th slot cycles are there. What happens then?
  • ctwardell Posts: 1,716
    edited 2013-12-01 13:41
    jmg wrote: »
    Where this gets murky, is what if the consenting pair wants some rare hub access ?
    If they use their slot, they risk tripping the hungry cog, but could do this on a SW permission handshake.

    Or can it rely on the group lottery of 'unused slots' to meet that, or should it have another deterministic moderate bandwidth choice ?

    Chip's suggested solution in #3156 covers all the cases that I can see being useful.

    My post #3179 lays out using Chip's solution for just the case you mention. The 'donor' of the pair cannot stall the 'recipient'; it only gets hub slots the recipient doesn't use.

    C.W.
  • jmg Posts: 15,182
    edited 2013-12-01 13:44
    potatohead wrote: »
    So I've got two objects that I really need, and they both make the assumption that the "extra" 9th slot cycles are there. What happens then?

    New question reply:
    Nothing 'happens', they both get a slot, as expected, independent of the other.

    As mentioned, the BW on this is not fSYS/9 but fSYS/72: the 9th slot is allocated round-robin, so each COG gets it once every 72 clocks, like clockwork.

    This moderate bandwidth path is 100% deterministic, so is safe, but it also gives another more hungry COG a 100% reliable BW boost.
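A small simulation makes the fSYS/72 figure concrete. It is a sketch under assumed rules, not a description of the actual P2 hub: the rotation is taken to be 9 clocks, positions 0 to 7 are the fixed per-COG slots, and position 8 (the 9th slot) goes to one COG per rotation in round-robin order. `slot_owner` and `slot_clocks` are hypothetical names.

```python
NUM_COGS = 8
ROTATION = NUM_COGS + 1  # 8 fixed slots plus one shared 9th slot (assumed model)


def slot_owner(clock: int) -> tuple[int, bool]:
    """Return (cog, is_ninth_slot) for the hub slot at a given system clock.

    Positions 0..7 of each 9-clock rotation are the fixed per-cog slots;
    position 8 is the 9th slot, handed to one cog per rotation, round-robin.
    """
    rotation, position = divmod(clock, ROTATION)
    if position < NUM_COGS:
        return position, False
    return rotation % NUM_COGS, True


def slot_clocks(cog: int, ninth: bool, horizon: int) -> list[int]:
    """Clocks below `horizon` at which `cog` owns a fixed slot (or the 9th slot)."""
    return [c for c in range(horizon) if slot_owner(c) == (cog, ninth)]


fixed = slot_clocks(3, ninth=False, horizon=40)
extra = slot_clocks(3, ninth=True, horizon=300)
print(fixed)  # [3, 12, 21, 30, 39]
print(extra)  # [35, 107, 179, 251]
```

Under these assumptions COG 3's fixed slot recurs every 9 clocks and its 9th-slot visit every 72 clocks, which is the fSYS/9 versus fSYS/72 distinction drawn in the post.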
  • potatohead Posts: 10,261
    edited 2013-12-01 13:46
    So, I incorporate some hungry cogs, more than one making the assumption of the added throughput, and my table scraps COG isn't working, and/or the hungry ones don't perform.

    What then?

    Or, it so happens that I need a couple table scraps type COGS... and there aren't enough scraps...
  • Cluso99 Posts: 18,069
    edited 2013-12-01 13:50
    To simplify where we are up to...
    1. Each cog can gift or yield its slot to another specific cog.
      • Gift = use it if I don't need it
      • Yield = use it if you need it, else I will use it
    2. Each recipient cog of a gift or yield can use the extra cog slot if it needs it.
    3. Any cog can optionally use any spare cog slots (with no guarantees of performance increase)
    Parallax are targeting the commercial users. Many of them may take a few OBEX drivers, but they will be quite happy to modify any if required. They will want to do whatever it takes to get their project working. If that means they need extra slots, they will work out whatever is required to use them, and they will handle whatever complexity is demanded. This is just how it is in the real world. I do not see any lost design-ins because of the cog slot mechanism, but I do see lost design-ins without it. If a cog cannot perform a job without the extra slots, it is never going to work without the option to use them, no matter how simple or complex. If it's not an option, then it's not an option!
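The difference between the gift and yield options in Cluso's list comes down to who has first claim on a slot. Below is a minimal arbitration sketch for a single hub slot. It assumes that cogs opted in to spare slots are served in a fixed priority order; the post does not specify a tie-break, and real hardware might rotate it. `Policy` and `arbitrate` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    NONE = "none"    # P1-style: only the owner may use its slot
    GIFT = "gift"    # owner first; the recipient uses it only if the owner is idle
    YIELD = "yield"  # recipient first; the owner uses it only if the recipient is idle


def arbitrate(owner: int, policy: Policy, recipient: Optional[int],
              wants: set[int], scavengers: list[int]) -> Optional[int]:
    """Pick the cog that uses `owner`'s hub slot in one rotation (hypothetical).

    `wants` holds the cogs requesting hub access now; `scavengers` lists, in an
    assumed fixed priority order, cogs that opted in to any spare slot.
    """
    if policy is Policy.GIFT:
        order = [owner, recipient]
    elif policy is Policy.YIELD:
        order = [recipient, owner]
    else:
        order = [owner]
    for cog in order + scavengers:
        if cog in wants:
            return cog
    return None  # nobody wants it: the slot goes unused


# Cog 0 gifts its slot to cog 1: when both are hungry, cog 0 keeps it.
print(arbitrate(0, Policy.GIFT, 1, wants={0, 1}, scavengers=[]))   # 0
# Cog 0 yields its slot to cog 1: when both are hungry, cog 1 gets it.
print(arbitrate(0, Policy.YIELD, 1, wants={0, 1}, scavengers=[]))  # 1
# Cog 0 is idle and has no partner: a scavenger (cog 5) picks up the spare.
print(arbitrate(0, Policy.NONE, None, wants={5}, scavengers=[5]))  # 5
```

Only the first two outcomes are guaranteed by the gift/yield pairing itself; the third depends on what every other cog happens to be doing, which is the "no guarantees of performance increase" caveat in point 3.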
  • jmg Posts: 15,182
    edited 2013-12-01 13:52
    Bill Henning wrote: »
    Consider the worst case scenario:
    - three paired-cog high bandwidth drivers (there will be some scraps during sync periods if video drivers)
    True, but in time terms, that can be many microseconds.
    - one mode 00 regular cog - there will be spare slots unless deliberately written to consume all slots, remember it can't get spare slots
    - one mode 11 cog - will still get at least 1/8 slots, but can use spare slots... after all, it gives up its slot, usually to itself

    If a mode 11 cog gets at least 1/8 of the slots, then it has not yielded anything, and it is not what I'd really call a low-bandwidth slot.

    My proposal addresses the lesser of the mode 01/10 COGS: it gives the hungry COG all it needs, while the COG giving up slots does not give up everything and still has reliable fSYS/72 access.
  • Bill Henning Posts: 6,445
    edited 2013-12-01 13:57
    mode 00:
    - cannot use scraps, but does provide them when it does not use its own guaranteed slot
    - behaves exactly the same as prop1, but leaves scraps

    mode 01/10 pair - can produce scraps, cannot eat them
    - this is the way to get high bandwidth for cogs that need them
    - can still leave scraps

    mode 11 - can only eat scraps (including its own)
    - it actually degenerates to always getting its own slot, plus scraps, as only another mode 11 can get its slots
    - this is the case that worries you, BUT, if there is more than one, they share leftovers
    - theoretically this could feed its slot to pairs, but not as described

    Anyone writing drivers that need more than 1/8 hub bandwidth can get it guaranteed from pairs.

    mode 11 is more for leftovers, does NOT guarantee ability to eat scraps.
    potatohead wrote: »
    So, I incorporate some hungry cogs, more than one making the assumption of the added throughput and my table scraps COG isn't working, and or the hungry ones don't perform.

    What then?

    Or, it so happens that I need a couple table scraps type COGS... and there aren't enough scraps...
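Bill's revised mode rules can be checked with a one-rotation simulation. This is a sketch under stated assumptions rather than a hardware description: the owner of a slot always has first claim on it, a mode 01/10 cog's idle slot goes only to its partner, leftovers ("scraps") go only to mode 11 cogs, and ties among hungry mode 11 cogs go to the lowest-numbered cog. The constant and function names are made up for illustration.

```python
from typing import Optional

# Mode codes as named in the post (00, 01/10, 11); the constants are hypothetical.
MODE_P1, MODE_PAIR_A, MODE_PAIR_B, MODE_SCRAPS = 0b00, 0b01, 0b10, 0b11


def slot_winner(owner: int, modes: list[int], partner: dict[int, int],
                wants: set[int]) -> Optional[int]:
    """Return the cog that uses `owner`'s slot, or None if it goes unused."""
    if owner in wants:                      # the owner always has first claim
        return owner
    if modes[owner] in (MODE_PAIR_A, MODE_PAIR_B) and partner[owner] in wants:
        return partner[owner]               # a pair only trades slots internally
    eaters = [c for c, m in enumerate(modes) if m == MODE_SCRAPS and c in wants]
    return eaters[0] if eaters else None    # scraps feed mode 11 cogs only


def hub_share(modes: list[int], partner: dict[int, int], wants: set[int]) -> list[int]:
    """Slots won by each cog during one 8-slot rotation."""
    counts = [0] * len(modes)
    for owner in range(len(modes)):
        winner = slot_winner(owner, modes, partner, wants)
        if winner is not None:
            counts[winner] += 1
    return counts


# Worst case from the thread: three pairs, one mode 00 cog (6), one mode 11 cog (7).
modes = [MODE_PAIR_A, MODE_PAIR_B] * 3 + [MODE_P1, MODE_SCRAPS]
partner = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}
print(hub_share(modes, partner, wants=set(range(8))))       # [1, 1, 1, 1, 1, 1, 1, 1]
print(hub_share(modes, partner, wants={0, 2, 3, 4, 5, 7}))  # [2, 0, 1, 1, 1, 1, 0, 2]
```

With every cog busy, each one gets exactly its own slot. When cogs 1 and 6 go idle, cog 1's slot goes to its partner and cog 6's slot becomes a scrap for the mode 11 cog. Under these rules nothing a pair does can take a slot away from the mode 11 cog, matching Bill's "own slot + scraps" description.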
  • potatohead Posts: 10,261
    edited 2013-12-01 13:57
    Well, OK. Have at it.

    I am strongly opposed. Reasons given.

    Well, there is this. You guys are working really hard to open this door. If it's "just an option" that "people won't really need or depend on", why do that work, unless it will end up needed and depended on?

    See how that works?


    Tragedy of the commons in the classic sense. It won't end well. If I'm wrong, wonderful problem to have. Bring it. But, if I'm right? Not such a good problem to have.

    And those commercial users will come, but not before the money-making education market and the adopters from our current user base. Let's say the commercial users just don't seem to come, leaving the existing user base and various adopters? A best-case risk profile would maximize for the known case, leaving the unknowns to some future development.
  • Erik Friesen Posts: 1,071
    edited 2013-12-01 13:58
    Two bits here.

    Be careful about saying that we shouldn't do this, and other extras. If it is easy, has little time implications, and makes sense to the designer, why not?

    Everything we don't understand seems complex.

    With simplicity comes complexity.
  • Cluso99 Posts: 18,069
    edited 2013-12-01 13:58
    Please, please, forget the 9th slot idea. It is just bad news! Now divide your clkfreq by 9. All cogs are now slightly slower than they were. Really bad idea!
  • jmg Posts: 15,182
    edited 2013-12-01 14:00
    potatohead wrote: »
    So, I incorporate some hungry cogs, more than one making the assumption of the added throughput and my table scraps COG isn't working, and or the hungry ones don't perform.

    What then?

    Or, it so happens that I need a couple table scraps type COGS... and there aren't enough scraps...

    I'm not following - a pencil and paper can easily check allocations, with a 9th Round Robin slot.

    my table scraps COG isn't working ?
    There is always a fixed moderate bandwidth access, so a table scraps COG will work, and stay working, no matter what other balls are in the air.

    the hungry ones don't perform
    You have a defined bonus bandwidth, 100% deterministic, if that is still not enough, you need to respin the pencil.
    Because the moderate bandwidth COG has a separate pathway, there is no need to ever impact the hungry one, or any other COG. No SW handling is needed to share the gifted slot.

    Things are predictably encapsulated.
  • Ken Gracey Posts: 7,400
    edited 2013-12-01 14:01
    Heater. wrote: »

    OK. I'll do that if no one else wants to.

    I decide there shall be no changes to the round robin HUB access timing.

    Further, I decide all other changes should be kept to the bare essentials and the chip shipped out as soon as possible.

    How am I doing? :)

    You're doing very well! I can't begin to describe the importance of closing this project and making a successful foundry run.

    Ken Gracey
  • Bill Henning Posts: 6,445
    edited 2013-12-01 14:04
    I got confused between different versions of my modes.

    Initially I envisioned 01/10 taking each other's slots, and scraps, which meant that 11 would only eat scraps sometimes.

    Later, due to obex potential issues, I changed 01/10 to only take slots from each other, potentially leaving scraps for mode 11 cogs.

    Due to that change, mode 11 turned into own slot + scraps, because restricting 01/10 to only take the pair's slot made it impossible for 11 to give up its own slot.
    jmg wrote: »
    True, but in time terms, that can be many microseconds.

    If a mode 11 cog gets at least 1/8 of the slots, then it has not yielded anything, and it is not what I'd really call a low-bandwidth slot.

    My issue coverage is around the lesser of the mode 01/10 COGS, and gives a means where the hungry COG gets all it needs, whilst the COG that is giving up slots, does not give up everything, but still has reliable fSYS/72 access.
  • potatohead Posts: 10,261
    edited 2013-12-01 14:04
    @JMG

    Yeah, if everybody does the right thing, the resource is there and working. If it's maximized, then it's not, and maximizing it is absolutely going to happen.

    Sure, there is always some, "if people would just..." common sense case, but that isn't how it plays out. Tragedy of the commons. The solution to that, BTW, is to regulate the commons so that overuse cannot happen; and that's the secret sauce in the P1.

    We are best served by fixing up the last of the instruction issues, mix in the SERDES, and go. This discussion really should be P3, not P2.
  • ctwardell Posts: 1,716
    edited 2013-12-01 14:06
    I just can't see how a user, be they a hobbyist, engineer, or Nobel laureate, is assumed to be able to handle all of the complexities already found in the P2, yet somehow their head will explode if exposed to hub slot re-allocation voodoo...

    C.W.
  • jmg Posts: 15,182
    edited 2013-12-01 14:08
    Cluso99 wrote: »
    Please, please, forget the 9th slot idea. It is just bad news! Now divide your clkfreq by 9. All cogs are now slightly slower than they were. Really bad idea!

    The impact is 12.5% on HUB access only, not on COG clock speed, and only when in this state.
    The jump from FPGA to silicon is far greater than 12.5%.

    The upside is All cogs are NOT now slightly slower than they were, some can be boosted 2x (or more) faster.

    Both moderate and boosted bandwidths are deterministic, so pass the Prop test.
  • User Name Posts: 1,451
    edited 2013-12-01 14:09
    I can't believe we were SO close to breaking out of the old round-robin paradigm. In a million years I wouldn't have imagined that it would be concerns for OBEX, beginners, and Homer Simpson that shot it down!!!

    I'm not kidding when I say that it will take a while to get over this $*%$&%&cluck.
  • Cluso99 Posts: 18,069
    edited 2013-12-01 14:09
    I have a P1 commercial design. It uses 3 x P1s.

    The main P1 runs Catalina C in XMM mode and has 512KB SRAM, an SD card and a serial port.
    The cogs used are:
    1. Catalina C XMM
    2. SD Driver
    3. Clock
    4. FullDuplexSerial (modified), 115200 baud
    This P1 is overclocked to 104MHz using a 6.5MHz crystal.

    I have 4 totally unused cogs. With the "new" abilities, I would give cog 1 (Catalina) the ability to use all available slots. This would greatly improve response time for the user when we perform some complex file processing routines.

    Why should I be deprived of this possibility in the P2 because some of you don't like it?
    Perhaps it could mean the difference between a saleable product or not, or using another processor. Maybe if I had to consider this, I might just decide to hell with it and use an ARM for the whole job - a possible loss of 3 props???
  • potatohead Posts: 10,261
    edited 2013-12-01 14:11
    It's all about a COG is a COG. That is the core reuse case right now. No matter what, a COG will run as expected.

    If we treat the hub cycles as a commons that COGS compete for, or one where COGS can exceed their allocation schedules, THEY WILL, which limits the reuse of the COGS. They won't run as expected.
  • Heater. Posts: 21,230
    edited 2013-12-01 14:11
    And to think. All I ever wanted from a PII originally was an order of magnitude faster execution speed and 256K RAM and 64 pins....
  • Bill Henning Posts: 6,445
    edited 2013-12-01 14:12
    potatohead wrote: »
    We are best served by fixing up the last of the instruction issues, mix in the SERDES, and go. This discussion really should be P3, not P2.

    I agree, with one addition:

    - what will be done with the 20% of die space freed up by getting rid of the video ring?

    At one point Chip suggested using the extra transistors for AUX memory.

    That makes perfect sense to me - it would allow for significantly bigger stacks, which is great for LMM, and is a low-risk way of using the transistors.

    The alternatives would be more hub ram (nice, but may not be a convenient size), and more cog ram - which is WAY TOO BIG a can of worms to open.

    Due to the process change, Beau will be busy changing transistors, so discussion - perhaps oriented to the P3 - is not a waste of time.

    Concentrating on SERDES, USB/CRC helper instructions etc is more important - and lower risk.
  • ctwardell Posts: 1,716
    edited 2013-12-01 14:17
    potatohead wrote: »
    It's all about a COG is a COG. That is the core reuse case right now. No matter what, a COG will run as expected.

    If we treat the hub cycles as a commons that COGS compete for, or one where COGS can exceed their allocation schedules, THEY WILL, which limits the reuse of the COGS. They won't run as expected.

    What if we got rid of the 'any slot' cases and went with the pairs. A hub sharing pair cannot benefit or hinder any other COGS.

    I assume you are fine with OBEX items that use multiple COGS, why not let this be a case of using multiple COGS?

    C.W.
  • Cluso99 Posts: 18,069
    edited 2013-12-01 14:18
    Final solution...

    Chip: Implement it. Don't tell anyone (a few of us already know about it ;) ).
    Give me a pnut compiler version with the extra instruction(s) enabled, and I will pass it on to the others of us who want it.

    OBEX: Ban the use of the instruction(s).

    Problem solved, lets move on.
  • potatohead Posts: 10,261
    edited 2013-12-01 14:21
    Re: Why?

    Great question!

    Cluso, you said you would always push it, right? So you've got your advocacy out there. I encourage you strongly to continue it, as others will in like kind. The same goes for User Name, for whom this was actually written. I confused the two of you. Sorry for that.

    When it's all said and done, we have a decision, and that's your why right there. I wrote earlier, it's not about us, or you, or any of us specifically.

    Open means open. I'm gonna say my piece as will others. Closed is closed. Then we just live with what ships. Do it yourself is the last option, and that one is painful.
  • jmg Posts: 15,182
    edited 2013-12-01 14:23
    potatohead wrote: »
    It's all about a COG is a COG. That is the core reuse case right now. No matter what, a COG will run as expected.

    If we treat the hub cycles as a commons that COGS compete for, or one where COGS can exceed their allocation schedules, THEY WILL, which limits the reuse of the COGS. They won't run as expected.

    I'm not quite seeing the issue. I focus on Bandwidth and Chips, not COGs.

    There are 8 COGs in one chip, users are quite entitled to pair them, and have one run in moderate bandwidth and one in boosted bandwidth.
    The other 6 cogs can play as they like. Another pair can choose to have one run in moderate bandwidth and one in boosted bandwidth, or they can all run at average bandwidth.
    Or a user can have 3 COGS in moderate bandwidth mode, and give 3 more slots to the Boosted one, if it can use them.

    What one does want is deterministic levels of moderate, average, and boosted bandwidth.
    It is that which makes COG code portable.
    Users always buy 8 Cogs at a time, so they can Pair if they want.
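The moderate, average, and boosted levels can be tallied with exact fractions of fSYS. The model here is an assumption pieced together from this thread: a 9-clock rotation with one fixed slot per COG plus a round-robin 9th slot (fSYS/72 per COG). A donating COG keeps only its 9th-slot share, a boosted COG adds each donated fixed slot to its own, and in the three-donor example the remaining four COGs are assumed to run at average bandwidth. `hub_bandwidth` is a hypothetical helper.

```python
from fractions import Fraction

COGS = 8
P1_SHARE = Fraction(1, COGS)               # one slot per 8-clock rotation
FIXED = Fraction(1, COGS + 1)              # own fixed slot in a 9-clock rotation
NINTH = Fraction(1, (COGS + 1) * COGS)     # round-robin 9th slot: fSYS/72


def hub_bandwidth(donors_received: int, donated: bool) -> Fraction:
    """Guaranteed hub slots per system clock for one COG (hypothetical model)."""
    own = Fraction(0) if donated else FIXED        # a donor gives away its fixed slot
    return own + donors_received * FIXED + NINTH   # everyone keeps the 9th-slot share


moderate = hub_bandwidth(0, donated=True)    # donor: 9th slot only
average = hub_bandwidth(0, donated=False)    # neither donates nor receives
paired = hub_bandwidth(1, donated=False)     # boosted by one donor
boosted3 = hub_bandwidth(3, donated=False)   # boosted by three donors

# Three-donor example: 3 moderate COGs feed 1 boosted COG, the other 4 run average.
total = 3 * moderate + boosted3 + 4 * average
print(moderate, average, paired, boosted3, total)  # 1/72 1/8 17/72 11/24 1
print(FIXED / P1_SHARE)  # 8/9: the fixed slot alone is ~11.1% less bandwidth
```

Under this model a COG that neither donates nor receives still averages fSYS/8, the same share as on a strict 8-slot rotation, because the 9th slot returns the 1/72 lost from the fixed slot (1/8 - 1/9 = 1/72). What changes is that the fixed slot arrives every 9 clocks instead of 8. The shares in the three-donor example also add up to exactly one slot per clock, which is the kind of pencil-and-paper check jmg describes.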
  • potatohead Posts: 10,261
    edited 2013-12-01 14:24
    Re: Pairs

    I think that gets us to a 4 COG chip. We use multi-cog code now, and I think Chip's initial observation that it should be a private affair between cooperating objects speaks to this one.

    I'm on the fence about it frankly.