Propeller II update - BLOG - Page 154 — Parallax Forums


Comments

  • Cluso99 Posts: 18,066
    edited 2014-01-13 17:40
    cgracey wrote: »
    That's right.

    I just got it working!

    Whenever the hub cycle comes around and there is no hub instruction currently executing, and we are executing from the hub, and we are not multitasking, that unused hub cycle is used to pre-fetch the next cache line, relative to where we are currently executing. This means that straight-line code that doesn't do hub operations will run at 100% speed. When a branch occurs to a location that is out-of-cache, it takes 4..11 clocks to get the cache reloaded, before execution resumes at the new address. And this works entirely within the cog - it doesn't use other cogs' spare cycles, so code will always run at the same speed.

    There is a minor thing I want to enhance about the way it's working, but it's looking very good and is not complicated, after all. Whew!
    This is absolutely Brilliant Chip. Looking forward to trying the code.

    Might it be worth adding an instruction that could force a prefetch ??? We would use this a few instructions before a branch to a non-cached instruction. A 1 clock instruction to force a prefetch would at least save 3 clocks of the 4-11 and we may even be able to save a hub window as well using specifically tuned code.
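
    A sketch of how such a (hypothetical) instruction might be used - the PREFETCH mnemonic and its exact timing are my assumptions, nothing Chip has committed to:

        PREFETCH #routine      ' hypothetical: start filling the cache line holding 'routine'
        ins1                   ' a few unrelated instructions give the
        ins2                   ' cache fill time to complete
        ins3
        JMP #routine           ' ideally lands in an already-loaded cache line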
  • cgracey Posts: 14,133
    edited 2014-01-13 18:01
    Cluso99 wrote: »
    Might it be worth adding an instruction that could force a prefetch ??? We would use this a few instructions before a branch to a non-cached instruction. A 1 clock instruction to force a prefetch would at least save 3 clocks of the 4-11 and we may even be able to save a hub window as well using specifically tuned code.


    I'll have to think about how to do that, because it doesn't seem so simple at the moment. It should be possible. I agree that for timed code, you need it to run at the right speed even on the first pass, so it must be pre-cached.
  • jazzed Posts: 11,803
    edited 2014-01-13 18:02
    Congrats!
    cgracey wrote: »
    That's right.

    I just got it working!

    Whenever the hub cycle comes around and there is no hub instruction currently executing, and we are executing from the hub, and we are not multitasking, that unused hub cycle is used to pre-fetch the next cache line, relative to where we are currently executing. This means that straight-line code that doesn't do hub operations will run at 100% speed. When a branch occurs to a location that is out-of-cache, it takes 4..11 clocks to get the cache reloaded, before execution resumes at the new address. And this works entirely within the cog - it doesn't use other cogs' spare cycles, so code will always run at the same speed.

    There is a minor thing I want to enhance about the way it's working, but it's looking very good and is not complicated, after all. Whew!
  • David Betz Posts: 14,511
    edited 2014-01-13 18:25
    cgracey wrote: »
    That's right.

    I just got it working!

    Whenever the hub cycle comes around and there is no hub instruction currently executing, and we are executing from the hub, and we are not multitasking, that unused hub cycle is used to pre-fetch the next cache line, relative to where we are currently executing. This means that straight-line code that doesn't do hub operations will run at 100% speed. When a branch occurs to a location that is out-of-cache, it takes 4..11 clocks to get the cache reloaded, before execution resumes at the new address. And this works entirely within the cog - it doesn't use other cogs' spare cycles, so code will always run at the same speed.

    There is a minor thing I want to enhance about the way it's working, but it's looking very good and is not complicated, after all. Whew!

    Great news! Congratulations!!
  • Seairth Posts: 2,474
    edited 2014-01-13 18:37
    cgracey wrote: »
    I'll have to think about how to do that, because it doesn't seem so simple at the moment. It should be possible. I agree that for timed code, you need it to run at the right speed even on the first pass, so it must be pre-cached.

    If you were using a delayed jump, would you still encounter the 4-11 clock delay?
  • ozpropdev Posts: 2,791
    edited 2014-01-13 18:37
    Exciting times ahead! :)
    Well done Chip.
  • rogloh Posts: 5,122
    edited 2014-01-13 19:03
    This is all wonderful news about hub execution.

    As Cluso pointed out, it can be useful to be able to preload cache lines, and similarly it might at times be necessary to invalidate them from software. Think about the case where you have dynamically loaded programs into hub RAM from some external storage such as an SD card and now need to jump to a new instruction at a previously cached hub address in the COG which held old code. If there is no way to invalidate, you may execute the old code at this address instead of the new code. I guess we could restart the COG, which will hopefully start out with a clean cache, but that approach seems a bit brutal to me as you could lose important state. It would be rather nice to have a software instruction to invalidate (maybe it's just a variant of the preload instruction, or one preloaded with an invalid hub address, e.g. 0, which could never match because it is a COG address).

    Roger.
  • jmg Posts: 15,140
    edited 2014-01-13 19:16
    cgracey wrote: »
    ...

    Whenever the hub cycle comes around and there is no hub instruction currently executing, and we are executing from the hub, and we are not multitasking, that unused hub cycle is used to pre-fetch the next cache line, relative to where we are currently executing. This means that straight-line code that doesn't do hub operations will run at 100% speed. When a branch occurs to a location that is out-of-cache, it takes 4..11 clocks to get the cache reloaded, before execution resumes at the new address....

    This has me thinking again about Simulators, and this looks promising
    http://en.wikipedia.org/wiki/Verilator

    The more complicated things like cache handlers get, the harder full precision debug is going to be

    Has anyone used Verilator, to get a handle on what speed of simulator it can build?

    The large appeal of this pathway is that it derives from the core Verilog, and so avoids the creep issues of a separately-coded software simulator. With chips (eventually) able to run code at full speed, a slower but accurate simulator would still have a strong place in the tool-box & training materials.
  • cgracey Posts: 14,133
    edited 2014-01-13 19:42
    jmg wrote: »
    This has me thinking again about Simulators, and this looks promising
    http://en.wikipedia.org/wiki/Verilator

    The more complicated things like cache handlers get, the harder full precision debug is going to be

    Has anyone used Verilator, to get a handle on what speed of simulator it can build?

    The large appeal of this pathway is that it derives from the core Verilog, and so avoids the creep issues of a separately-coded software simulator. With chips (eventually) able to run code at full speed, a slower but accurate simulator would still have a strong place in the tool-box & training materials.


    I agree that such a simulator is the only sane way to make something faithful - especially when things keep changing.

    I've been using SETRACE to view what the cog is doing during caching. It's helped quite a bit. I added the hub cycle signal, as well, so I can see code in relation to hub timing.
  • cgracey Posts: 14,133
    edited 2014-01-13 19:44
    rogloh wrote: »
    This is all wonderful news about hub execution.

    As Cluso pointed out, it can be useful to be able to preload cache lines, and similarly it might at times be necessary to invalidate them from software. Think about the case where you have dynamically loaded programs into hub RAM from some external storage such as an SD card and now need to jump to a new instruction at a previously cached hub address in the COG which held old code. If there is no way to invalidate, you may execute the old code at this address instead of the new code. I guess we could restart the COG, which will hopefully start out with a clean cache, but that approach seems a bit brutal to me as you could lose important state. It would be rather nice to have a software instruction to invalidate (maybe it's just a variant of the preload instruction, or one preloaded with an invalid hub address, e.g. 0, which could never match because it is a COG address).

    Roger.


    I have an ICACHEX instruction planned that invalidates the instruction cache lines. It's really simple to implement. Cache preload seems a lot more difficult.
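
    For the dynamic-loading case, usage might look like this (ICACHEX per the plan above; the surrounding loader steps are just a sketch):

        ' ... copy new code from the SD buffer into hub RAM here ...
        ICACHEX                ' invalidate all instruction cache lines
        JMP #new_code          ' next fetch misses the cache and loads the fresh code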
  • cgracey Posts: 14,133
    edited 2014-01-13 19:46
    Seairth wrote: »
    If you were using a delayed jump, would you still encounter the 4-11 clock delay?


    Yes, because at that early stage of the pipeline, we don't know if instructions are going to be cancelled, so the caching mechanism must always deliver the three instructions trailing a branch. A delayed jump would improve efficiency and make every instruction execute before the 4..11 clocks spent on cache reload.
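
    A sketch of the difference, assuming JMPD-style delayed-branch semantics (illustrative only):

        JMP  #far_code         ' plain jump: the 3 trailing instructions are cancelled,
                               ' then 4..11 clocks of cache reload if out-of-cache
        JMPD #far_code         ' delayed jump: the next 3 instructions execute first,
        ins1                   ' so no pipeline work is thrown away before the
        ins2                   ' 4..11-clock reload begins
        ins3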
  • Bill Henning Posts: 6,445
    edited 2014-01-13 20:07
    Excellent news, Chip!

    A few points...

    1) No need for a separate pre-fetch instruction, it already exists

    ins0
    ins1
    ins2
    ins3
    JMPD #start_of_next_8_longs ' forces pre-fetch, in case we don't want to trust the pre-fetch (I WILL TRUST THE PREFETCH)
    ins5
    ins6
    ins7

    start_of_next_8_longs:

    2) Please Please Please

    Allow hubexec to use spare hub cycles from cogs not using their hub cycle!

    Maybe add a mode bit that needs to be set to allow it

    Why?

    It will make a huge speed difference to virtual machines, compiled code, and especially when multiple tasks in a cog use hubexec (or just hub memory instructions)

    I can understand that some people would not want to use the "hungry" mode, but those of us that could use it... would really, really like it. It would greatly benefit Spin and GCC. And all other VM's.

    cgracey wrote: »
    Yes, because at that early stage of the pipeline, we don't know if instructions are going to be cancelled, so the caching mechanism must always deliver the three instructions trailing a branch. A delayed jump would improve efficiency and make every instruction execute before the 4..11 clocks spent on cache reload.
  • ozpropdev Posts: 2,791
    edited 2014-01-13 20:21
    2) Please Please Please

    Allow hubexec to use spare hub cycles from cogs not using their hub cycle!

    Maybe add a mode bit that needs to be set to allow it

    Why?

    It will make a huge speed difference to virtual machines, compiled code, and especially when multiple tasks in a cog use hubexec (or just hub memory instructions)

    I can understand that some people would not want to use the "hungry" mode, but those of us that could use it... would really, really like it. It would greatly benefit Spin and GCC. And all other VM's.

    +1 :)
  • whicker Posts: 749
    edited 2014-01-13 20:24
    Bill and ozpropdev,

    At what point does this not all fit anymore? I'm getting really worried.
    The hub arbitration logic I fear worst case would be 64 times larger, or at least 7+6+5+4+3+2+1=28 times larger.
  • Cluso99 Posts: 18,066
    edited 2014-01-13 20:36
    Excellent news, Chip!

    A few points...

    1) No need for a separate pre-fetch instruction, it already exists

    ins0
    ins1
    ins2
    ins3
    JMPD #start_of_next_8_longs ' forces pre-fetch, in case we don't want to trust the pre-fetch (I WILL TRUST THE PREFETCH)
    ins5
    ins6
    ins7

    start_of_next_8_longs:
    Nice indeed.
    At first I thought: what if ins5, ins6, or ins7 were another JMP? But of course we would have placed the prefetch (ins4) deliberately, so there would be no jumps in ins5/6/7.

    Of course, it does not cater for the case where we know a jump may come shortly, so we would want to prefetch the start of the routine now, just in case.

    2) Please Please Please

    Allow hubexec to use spare hub cycles from cogs not using their hub cycle!

    Maybe add a mode bit that needs to be set to allow it

    Why?

    It will make a huge speed difference to virtual machines, compiled code, and especially when multiple tasks in a cog use hubexec (or just hub memory instructions)

    I can understand that some people would not want to use the "hungry" mode, but those of us that could use it... would really, really like it. It would greatly benefit Spin and GCC. And all other VM's.
    EMPHATICALLY +1 (even if it's on the quiet) ;)
  • Bill Henning Posts: 6,445
    edited 2014-01-13 20:36
    Sorry, I don't see how the hub arbitration logic would become 28..64 times larger, but Chip would know.

    Right now, as I understand it:

    - a cog essentially waits for its turn at the hub to come around, and when it does, it performs a read or write cycle

    To implement a hungry mode, each cog would need to detect when another cog did not use its assigned cycle, and use it instead.

    If it takes too much logic, of course it can be skipped.

    If the argument against it is philosophical, then it should still be implemented, even if only as an option - for the P2/P3/+++ to compete, we cannot afford to throw away (nearly free) performance improvements.

    whicker wrote: »
    Bill and ozpropdev,

    At what point does this not all fit anymore? I'm getting really worried.
    The hub arbitration logic I fear worst case would be 64 times larger, or at least 7+6+5+4+3+2+1=28 times larger.
  • whicker Posts: 749
    edited 2014-01-13 20:53
    Sorry, I don't see how the hub arbitration logic would become 28..64 times larger, but Chip would know.

    Right now, as I understand it:

    - a cog essentially waits for its turn at the hub to come around, and when it does, it performs a read or write cycle

    To implement a hungry mode, each cog would need to detect when another cog did not use its assigned cycle, and use it instead.

    If it takes too much logic, of course it can be skipped.

    If the argument against it is philosophical, then it should still be implemented, even if only as an option - for the P2/P3/+++ to compete, we cannot afford to throw away (nearly free) performance improvements.
    It's not philosophical. On a particular clock edge, up to 7 other places could be demanding the same resource that the local one does not want. How do you decide who gets it? Then you have to tell each of the requesters whether or not it got it. Priority by cog number would take the least logic... all the way up to some sort of fairness-distribution logic running on a pattern of at least modulo 64 to ensure nothing stalls.
  • ozpropdev Posts: 2,791
    edited 2014-01-13 21:14
    whicker wrote: »
    Bill and ozpropdev,

    At what point does this not all fit anymore? I'm getting really worried.
    The hub arbitration logic I fear worst case would be 64 times larger, or at least 7+6+5+4+3+2+1=28 times larger.

    I believe in many applications of the P2 not all the COGs will be used, so why waste these slots?

    Here's a typical code example that would benefit from "hungry mode".
    In a video example, consider the common function of setting pixels: a HUB read is required, then the data is modified,
    then the data is written back to the HUB. In plotting a line, this is repeated n times.
        RDLONG ax,bx
        OR ax,mask
        WRLONG ax,bx
    

    The cumulative loss of cycles grows rapidly when multiple pixels/lines are being plotted.
    Any extra slots that can speed such code would make significant performance gains in graphic applications.

    Like Bill stated, if it's too hard or space restrictive then leave it for the next evolution silicon.
  • rogloh Posts: 5,122
    edited 2014-01-13 21:23
    cgracey wrote: »
    I have an ICACHEX instruction planned that invalidates the instruction cache lines. It's really simple to implement.

    Great! I'm sure this will come in very handy for some applications.
  • Cluso99 Posts: 18,066
    edited 2014-01-13 21:34
    ozpropdev wrote: »
    I believe in many applications of the P2 not all the COGs will be used, so why waste these slots?

    Here's a typical code example that would benefit from "hungry mode".
    In a video example, consider the common function of setting pixels: a HUB read is required, then the data is modified,
    then the data is written back to the HUB. In plotting a line, this is repeated n times.
        RDLONG ax,bx
        OR ax,mask
        WRLONG ax,bx
    

    The cumulative loss of cycles grows rapidly when multiple pixels/lines are being plotted.
    Any extra slots that can speed such code would make significant performance gains in graphic applications.

    Like Bill stated, if it's too hard or space restrictive then leave it for the next evolution silicon.
    Another great use of free slots!

    I cannot fathom why some are so opposed to using "almost free" processing power that they will come up with ridiculous excuses against implementing it. Chip has already stated it's simple, and his method was way more complex than we asked for.

    Hub execution could benefit substantially by being able to (optionally) use free slots.
  • rogloh Posts: 5,122
    edited 2014-01-13 22:05
    ozpropdev wrote: »
    I believe in many applications of the P2 not all the COGs will be used, so why waste these slots?

    Here's a typical code example that would benefit from "hungry mode".
    In a video example, consider the common function of setting pixels: a HUB read is required, then the data is modified,
    then the data is written back to the HUB. In plotting a line, this is repeated n times.
        RDLONG ax,bx
        OR ax,mask
        WRLONG ax,bx
    

    The cumulative loss of cycles grows rapidly when multiple pixels/lines are being plotted.
    Any extra slots that can speed such code would make significant performance gains in graphic applications.

    Like Bill stated, if it's too hard or space restrictive then leave it for the next evolution silicon.

    Another example of the possible gain is something like an optimized memory block-copy loop (block copies and other memory transfers like this are not unusual in C, for example):
       REPD count, #2
       SETPTRA source
       SETPTRB dest
       NOP
       RDOCTAL temp, PTRA++  ' or whatever the new 8-long read instruction is called now
       WROCTAL temp, PTRB++
    

    On a lightly loaded system where other COGs aren't doing a lot of hub transfers themselves, this type of loop (once cached, if hub-executed) could theoretically copy each 32 bytes in 4 clocks instead of always taking 16 clocks - that's a 4x speedup in the best case. It would be nice if it were doable on P2, but I imagine it could get a bit complicated to marry this with hub-execution prefetching and other COGs' greedy requests. Something needs priority. Hopefully the silicon is available...
  • potatohead Posts: 10,253
    edited 2014-01-13 22:09
    Well, if you look at one really busy COG working with other COGS that aren't so busy, then sure. Take that same video example and couple it with a few COGS rendering sprites and suddenly there just aren't all that many slots...

    Or, let's say enough of the sprites fall out of the list not drawn creating free slots. Now there are slots, and now that COG can run faster, right?

    Personally, I'm not opposed to it, given it's entirely passive and modal by COG.

    There will be some use cases where a hungry COG will see an improvement. There will be other cases where it really won't; it all depends on how busy the other COGS are, and whether or not they too are written as hungry.

    In the case of the scan line buffer loop, that is something easily parallelized too. And I would argue it is best done that way as the difference between normal operation and hungry operation might not be that much where parallel processing would deliver very significant and known, consistent gains.

    Choice for the developer, assuming they aren't picking through a lot of hungry COGS. Whether or not that happens depends on whether or not people have the discipline to write ordinary COG code, and/or allow graceful failures, or tiers of functionality. Right now, if it runs on a COG, it works. That won't be true going forward, because people will write greedy code, not hungry code. But we shall see on all of that. Maybe those of us thinking along those lines are wrong about it.

    I do share whicker's concern in that something that isn't a decision / logic point right now will become one. How does that affect critical timing paths?

    Wouldn't it be a bummer to get hungry COGS at the expense of peak overall clock speed, for example? Chip will have to tell us this one.

    All in all, the "it goes really fast!" cases really do depend on the other COGS just not doing that.
  • dMajo Posts: 855
    edited 2014-01-14 00:24
    Cluso99 wrote: »
    Nice indeed.
    At first I thought: what if ins5, ins6, or ins7 were another JMP? But of course we would have placed the prefetch (ins4) deliberately, so there would be no jumps in ins5/6/7.

    Currently I know what the offset of every single line of my DAT PASM section will be once loaded into the cog's RAM, but I do not know (at design time) where it will be in hub RAM. There can be many DAT sections from many objects that, once compiled, can end up anywhere.
    In other words: how do you know (at design time) that your ins0 will be at offset 0 of an 8-long-aligned block (at runtime, after compiling it and loading it to the hub) so that you can place your jmp in ins4? Perhaps a beginner question ... sorry
  • cgracey Posts: 14,133
    edited 2014-01-14 00:32
    dMajo wrote: »
    Cluso99 wrote: »
    Nice indeed.
    At first I thought: what if ins5, ins6, or ins7 were another JMP? But of course we would have placed the prefetch (ins4) deliberately, so there would be no jumps in ins5/6/7.

    Currently I know what the offset of every single line of my DAT PASM section will be once loaded into the cog's RAM, but I do not know (at design time) where it will be in hub RAM. There can be many DAT sections from many objects that, once compiled, can end up anywhere.
    In other words: how do you know (at design time) that your ins0 will be at offset 0 of an 8-long-aligned block (at runtime, after compiling it and loading it to the hub) so that you can place your jmp in ins4? Perhaps a beginner question ... sorry


    There is a WIDEALIGN assembler directive that fills $00's up to the next 32nd-byte address in hub memory (%xx_xxxx_xxxx_xxx0_0000).
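
    A usage sketch (the directive name is per the post above; the surrounding code is illustrative):

        DAT
                    WIDEALIGN      ' pad with $00 up to the next 32-byte boundary
        routine     ins0           ' now starts on an 8-long cache-line boundary
                    ins1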
  • dMajo Posts: 855
    edited 2014-01-14 00:37
    potatohead wrote: »
    Well, if you look at one really busy COG working with other COGS that aren't so busy, then sure. Take that same video example and couple it with a few COGS rendering sprites and suddenly there just aren't all that many slots...

    Or, let's say enough of the sprites fall out of the list not drawn creating free slots. Now there are slots, and now that COG can run faster, right?

    Personally, I'm not opposed to it, given it's entirely passive and modal by COG.

    There will be some use cases where a hungry COG will see an improvement. There will be other cases where it really won't; it all depends on how busy the other COGS are, and whether or not they too are written as hungry.

    In the case of the scan line buffer loop, that is something easily parallelized too. And I would argue it is best done that way as the difference between normal operation and hungry operation might not be that much where parallel processing would deliver very significant and known, consistent gains.

    Choice for the developer, assuming they aren't picking through a lot of hungry COGS. Whether or not that happens depends on whether or not people have the discipline to write ordinary COG code, and/or allow graceful failures, or tiers of functionality. Right now, if it runs on a COG, it works. That won't be true going forward, because people will write greedy code, not hungry code. But we shall see on all of that. Maybe those of us thinking along those lines are wrong about it.

    I do share whicker's concern in that something that isn't a decision / logic point right now will become one. How does that affect critical timing paths?

    Wouldn't it be a bummer to get hungry COGS at the expense of peak overall clock speed, for example? Chip will have to tell us this one.

    All in all, the "it goes really fast!" cases really do depend on the other COGS just not doing that.

    I am with potatohead here.
    I am afraid that in the end the OBEX will start to fill with objects stating: look how great I am, how many things I can do, and how fast ... but if you want to use me, you need to keep at least two cogs switched off because I need their hub slots.

    At the beginning we thought that the minimum sample rate of a given input pin was 4 clock cycles, due to the cog's PASM instruction timing; then we learned that cogs can be interleaved, allowing sample rates down to a single clock cycle.
    Now, with hub execution, can't you do the same with the hub memory contents? What is the difference between using two cogs to accomplish the job, as opposed to using a single cog that requires its neighbor's hub slot?
  • dMajo Posts: 855
    edited 2014-01-14 00:41
    cgracey wrote: »
    There is a WIDEALIGN assembler directive that fills $00's up to the next 32nd-byte address in hub memory (%xx_xxxx_xxxx_xxx0_0000).

    Hi Chip ... thanks .... and congratulations for the job done.

    Does it mean that all the filling 00 will be memory holes (wasted/unusable memory)?
  • cgracey Posts: 14,133
    edited 2014-01-14 00:50
    dMajo wrote: »
    Does it mean that all the filling 00 will be memory holes (wasted/unusable memory)?

    I guess it would be useable if you kept track of where it was, but that may not be worth doing.
  • ozpropdev Posts: 2,791
    edited 2014-01-14 00:56
    There are pros and cons to a hub-slot "hungry mode" implementation.
    I can see valid points for both arguments. It would be sad :( if commercial users of
    P2 were limited by OBEX restrictions when their IP would never be posted there anyway.
    Not all code written for P1, for example, fits the OBEX guidelines of ease of integration.
    I know some of my own P1 code is far too interleaved (tangled) to ever be broken up into
    OBEX-compatible objects.
    Anyhow, P2 is going to be fantastic with or without hub-slot yielding - good times ahead! :)
  • Bill Henning Posts: 6,445
    edited 2014-01-14 01:02
    dMajo wrote: »
    I am with potatohead here.
    I am afraid that in the end the OBEX will start to fill with objects stating: look how great I am, how many things I can do, and how fast ... but if you want to use me, you need to keep at least two cogs switched off because I need their hub slots.

    At the beginning we thought that the minimum sample rate of a given input pin was 4 clock cycles, due to the cog's PASM instruction timing; then we learned that cogs can be interleaved, allowing sample rates down to a single clock cycle.
    Now, with hub execution, can't you do the same with the hub memory contents? What is the difference between using two cogs to accomplish the job, as opposed to using a single cog that requires its neighbor's hub slot?

    That is no different than two random objects needing more hub memory than is available.

    The difference is that the second cog could do something else.

    Frankly, I am very disappointed by all the "naysaying" against something with a great deal of potential - something that can greatly improve Spin/gcc/other VM and graphics code - when all the arguments against it boil down to "newbies might get confused" and "it won't be deterministic".

    Regarding newbies - they will get confused regardless, and they will learn, and then not be newbies. Parallax is NOT trying to position the P2 as an educational chip, but as a powerful microcontroller. If still worried about newbies, do not allow objects in the OBEX that set "hungry" mode.

    Regarding determinism - trying to achieve determinism in hubexec mode is far more complicated, and impossible for XLMM anyway. Determinism is great for cog PASM code that needs it; let's not cripple hubexec for the sake of a philosophy. If "hungry" mode needs an instruction to set that mode, no one will force anyone to use it, yet it will be available to those who can make good use of it.
  • Heater. Posts: 21,230
    edited 2014-01-14 03:54
    Bill,

    Naysayer here...
    That is no different than two random objects needing more hub memory than is available.
    Actually there is:

    A memory shortage will be reported to you by the compiler, or can even be known prior to compilation if objects were documented in detail. As far as I can tell, timing starvation due to the clash of greedy HUB use could cause failures that are only detectable at run time. Can you make a compiler that will fail with an error when greedy codes that cannot meet their timing requirements are built together?

    This is not about "newbies might get confused". After all, the entire world is a Prop II newbie at this point in time. The same applies to professional programmers and to those who don't like weird surprises causing their code to fail in mysterious and random ways.

    This is not about "it won't be deterministic" in the sense of counting cycles to get the timing right, as you might do with in-COG code. I don't think anyone expects to do that for HUB code. This is about the higher-level determinism of being able to mix and match objects/libraries/functionality without random failures cropping up. Well, you already know all that.
    If "hungry" mode needs an instruction to set that mode, no one will force anyone to use it, yet it will be available to those who can make good use of it.
    Except it will get used. Perhaps by some super duper video driver. Perhaps by some high speed communications protocol driver. Now I come along and want to use both of those in my application. BOOM, random failures, hours wasted head scratching, ultimate disappointment as I find I cannot use them together and my project is doomed.

    Sure, this is all a "purity" vs. "I want speed and damn the complexity" debate. Chip will have to make his own subjective value judgement of that.

    P.S. To answer my own question above, if the compilers would spit out a big red warning:

    "YOU HAVE BROKEN TIMING ISOLATION BETWEEN COG PROCESSES: RANDOM ERRORS MAY OCCUR AT RUNTIME"

    when that greedy mode is used in a program, I might be happier about it :)
    But not much. It sounds too much like the problems people have with clashing interrupts.