[SOLVED] Using SETINDx with REPS, a little gotcha. - Page 4 — Parallax Forums



Comments

  • jmgjmg Posts: 15,148
    edited 2014-01-30 14:59
    Dave Hein wrote: »
    It might be a good idea to include some form of conditional assembly in the assembler so that code can work when running stand-alone or with other tasks. Maybe something like the #define and #ifdef directives used with C. The NOPs would be added, or not depending on an assembler flag. So using the previous example, the code would look something like this.
    #define MULTI_TASK
    ...
        
    dummy   long  $FACE0000
            mov   myreg,dummy
            reps  #4,#4
    #ifndef MULTI_TASK
            nop
    #endif
            shr   myreg,#1
            shr   myreg,#1
            shr   myreg,#1
            shr   myreg,#1
    

    Close, but even that code is not user-tolerant; worse, it gives the illusion that it is coded to be tolerant.

    The code Ariba gave above IS user-tolerant, for the current silicon.
  • TorTor Posts: 2,010
    edited 2014-01-30 15:09
    jmg wrote: »
    Sigh. It is actually this simple : The assembler does nothing you do not ask it to do. Period.
    But then we are all in agreement, aren't we? I may have misunderstood. I thought this was about whether the assembler should supply spacer instructions where they are necessary for the correct operation of, e.g., REPS.
    If that is not the issue, then I admit to being confused about what is actually being discussed here and what the disagreement is really about. Maybe someone could provide a short description of exactly what the issues are. I for one would be grateful.

    -Tor
  • ctwardellctwardell Posts: 1,716
    edited 2014-01-30 15:12
    Dave Hein wrote: »
    It might be a good idea to include some form of conditional assembly in the assembler so that code can work when running stand-alone or with other tasks. Maybe something like the #define and #ifdef directives used with C. The NOPs would be added, or not depending on an assembler flag. So using the previous example, the code would look something like this.
    #define MULTI_TASK
    ...
        
    dummy   long  $FACE0000
            mov   myreg,dummy
            reps  #4,#4
    #ifndef MULTI_TASK
            nop
    #endif
            shr   myreg,#1
            shr   myreg,#1
            shr   myreg,#1
            shr   myreg,#1
    

    It isn't quite that simple because it depends on how the multitasking slots are allocated.

    I think this really is just something the user needs to handle on their own.

    C.W.
  • ctwardellctwardell Posts: 1,716
    edited 2014-01-30 15:15
    jmg wrote: »
    Close, but even that code is not user-tolerant; worse, it gives the illusion that it is coded to be tolerant.

    The code Ariba gave above IS user-tolerant, for the current silicon.

    The user needs to bring some intelligence to the table, trying to protect users from themselves is a losing battle.

    C.W.
  • ozpropdevozpropdev Posts: 2,791
    edited 2014-01-30 15:17
    Ariba wrote: »
    - Add a spacer NOP after REPS
    - Add an additional NOP at the end of the loop
    - Set the instruction count in REPS so that it includes the last NOP.

    I think Andy has the solution.

    If all REPS follow the conventions above, it covers both single-task and multi-task scenarios safely.
    Then the only warnings needed in the DOCS are about time-slot influences and a reminder that there is only one REPx circuit per cog.

    No assembler changes required. :)
    Brian
  • Dave HeinDave Hein Posts: 6,347
    edited 2014-01-30 15:21
    But doesn't that make the code inefficient if you are only repeating a single instruction? It basically runs at half-speed.
  • ozpropdevozpropdev Posts: 2,791
    edited 2014-01-30 15:41
    Dave Hein wrote: »
    But doesn't that make the code inefficient if you are only repeating a single instruction? It basically runs at half-speed.

    It all comes back to the DOCS again. Those coders who want to get the MAX out of REPS will still be able to tweak the code for performance.
  • jmgjmg Posts: 15,148
    edited 2014-01-30 16:01
    Dave Hein wrote: »
    But doesn't that make the code inefficient if you are only repeating a single instruction? It basically runs at half-speed.

    Yes, there is a run-time cost inside the loop (which I missed earlier); that makes it a context-safe, but less than optimal, solution.
    That run-time cost is highest on shortest loops.

    Chip may yet find a way to deliver both context safe and optimal (or at least no inner loop cost) - he's done quite well so far.
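    The in-loop cost is easy to put numbers on. A minimal sketch, assuming that under Ariba's convention exactly one NOP executes inside the repeated block on every iteration (the REPS count includes it), alongside the useful instructions; `padded_loop_efficiency` is a hypothetical helper for illustration, not part of any P2 tool.

```python
from fractions import Fraction


def padded_loop_efficiency(body_len):
    """Useful fraction of the slots spent inside a padded REPS loop.

    Assumption: exactly one NOP executes inside the repeated block on every
    iteration, alongside `body_len` useful instructions.
    """
    if body_len < 1:
        raise ValueError("the block needs at least one useful instruction")
    return Fraction(body_len, body_len + 1)


for n in (1, 2, 4, 8):
    print(f"{n} useful instruction(s): {padded_loop_efficiency(n)} of full speed")
```

    A one-instruction loop comes out at 1/2, which is Dave's "half speed", and the penalty shrinks as the block grows.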
  • TubularTubular Posts: 4,622
    edited 2014-01-30 16:05
    Am I the only one who finds a thread like this vaguely reassuring, with regard to what we're fussing over?

    I think essentially jmg's suggestion is a "user friendly" one and is really just a bit ahead of its time wrt where we are with prop2 right now. It's probably a whole lot easier to provide a kind of tool-tip that reminds the programmer of the potential trap, together with a clear and complete description in the manual.

    I wonder whether Chip doesn't look at a thread like this and think it would be easier to add the remaining three repeater blocks to at least remove that gotcha. I remember thinking about this in some detail and giving my response here.

    I still think it's fine to have to think about these resources and how they must be used carefully. It may help to prevent the onset of Alzheimer's.
  • ozpropdevozpropdev Posts: 2,791
    edited 2014-01-30 17:40
    So what have we learned here?
    * Spacer instructions ARE required in 1-task applications to allow the pipeline to prime
    before repeating can commence. If REPS is used by a task that uses no more than every 2nd
    time slot, NO spacers are needed. If REPD is used by a task that uses no more than every
    4th time slot, NO spacers are needed, as three intervening instructions will be provided
    by the other task(s).
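    One rough way to see the rule above: treat the settling requirement as a window of clock slots after REPx (1 for REPS, 3 for REPD, as in the summary) and count how many of the issuing task's own instructions fall inside it; each of those has to be a user spacer, while slots taken by other tasks fill the window for free. This is a simplified model, assuming the task owns exactly every `stride`-th slot; `spacers_for_stride` is a hypothetical helper and ignores real pipeline details.

```python
REPS, REPD = 1, 3  # settling slots needed after each instruction (per the summary)


def spacers_for_stride(stride, required):
    """User spacers needed when the issuing task owns every `stride`-th slot.

    Simplified model (assumption): after REPx the repeat hardware needs
    `required` clock slots before the block may start. The task's own
    instructions that fall inside that window (at stride, 2*stride, ...)
    must be spacers; slots taken by other tasks fill the rest for free.
    """
    if stride < 1 or required < 0:
        raise ValueError("stride must be >= 1 and required must be >= 0")
    return required // stride


for stride in (1, 2, 3, 4):
    print(f"every {stride} slot(s): REPS needs {spacers_for_stride(stride, REPS)}, "
          f"REPD needs {spacers_for_stride(stride, REPD)}")
```

    The model reproduces the three stated cases (single task: 1 and 3 spacers; every 2nd slot: none for REPS; every 4th slot: none for REPD) and also shows the in-between case of REPD with a task on every 2nd or 3rd slot still needing one spacer.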

    "Spacers CANNOT be used in REPx loops in multi-tasking"

    Brian :)
  • mindrobotsmindrobots Posts: 6,506
    edited 2014-01-30 17:44
    Wow! If it's that simple, it makes all this brilliant discussion seem a bit silly! :0)
  • jmgjmg Posts: 15,148
    edited 2014-01-30 19:02
    ozpropdev wrote: »
    So what have we learned here?

    "Spacers CANNOT be used in REPx loops in multi-tasking"

    Almost, but not quite: your own typo-test and Ariba's code show there is another middle case where, if you multi-task and fail to meet the greater-than setting, you do need the user spacer.
    In your case, you got two different unstable results on that setting.

    I think Ariba's code also works safely in that middle case, but at the cost of some added loop overhead.

    Also, REPD may need one or two or three user-provided spacers, depending on the slot settings.
    The REPD has multiple middle cases, and they may prove harder to avoid - eg someone may want to tune the slots a little.

    You can avoid user-spacers in Multi-tasking, provided you always keep above a certain slot space, but fall below that, and your code will become erratic.

    Ariba's solution (REPS) gives two user spacers, of which one is always used within the loop (sometimes leading, sometimes trailing), for 100% predictable outcomes (any slot map tolerated, even a mixed or dynamic one).

    A similar redundant REPD solution will exist, but at the cost of more code-padding, and more wasted cycles in the loop.

    Simple? Not really. Dangerous? Yes.
  • jmgjmg Posts: 15,148
    edited 2014-01-30 19:08
    Tubular wrote: »
    I think essentially jmg's suggestion is a "user friendly" one and is really just a bit ahead of its time wrt where we are with prop2 right now.

    Correct, it is looking ahead to how users might/will stumble, and how the Obex can be made more robust.
  • ozpropdevozpropdev Posts: 2,791
    edited 2014-01-30 20:26
    jmg wrote: »
    Almost, but not quite: your own typo-test and Ariba's code show there is another middle case where, if you multi-task and fail to meet the greater-than setting, you do need the user spacer.
    In your case, you got two different unstable results on that setting.

    I think Ariba's code also works safely in that middle case, but at the cost of some added loop overhead.

    Also, REPD may need one or two or three user-provided spacers, depending on the slot settings.
    The REPD has multiple middle cases, and they may prove harder to avoid - eg someone may want to tune the slots a little.

    You can avoid user-spacers in Multi-tasking, provided you always keep above a certain slot space, but fall below that, and your code will become erratic.

    Ariba's solution (REPS) gives two user spacers, of which one is always used within the loop (sometimes leading, sometimes trailing), for 100% predictable outcomes (any slot map tolerated, even a mixed or dynamic one).

    A similar redundant REPD solution will exist, but at the cost of more code-padding, and more wasted cycles in the loop.

    Simple? No. Dangerous? Yes.

    Running my same test with a different schedule produced erratic behaviour with a spacer NOP and without.
    schedule		long	%%2010_1010_1010_1010
    changed to
    schedule		long	%%2111_1111_0000_0000
    

    It seems there is no guarantee of where in the schedule the REPS block starts. In this schedule, most of the time the
    REPS instruction will need a spacer, except in one case where it doesn't. A bit of a lottery.
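    The "lottery" can be reproduced with the same kind of window model. A sketch, assuming the 16 two-bit fields of the schedule long are stepped through cyclically, most-significant digit first (the real slot order is an assumption here), and that REPS was issued by task 1 (the post doesn't say which task was tested). `decode_schedule`, `spacers_needed` and `spacer_profile` are hypothetical helpers for illustration only.

```python
def decode_schedule(literal):
    """Turn a quaternary literal such as '%%2111_1111_0000_0000' into task numbers.

    Assumption: slots are listed most-significant digit first and are
    stepped through cyclically.
    """
    digits = literal.lstrip("%").replace("_", "")
    if not digits or any(d not in "0123" for d in digits):
        raise ValueError(f"not a %% task schedule: {literal!r}")
    return [int(d) for d in digits]


def spacers_needed(schedule, slot, required):
    """Own-task slots inside the `required` slots that follow a REPx at `slot`."""
    task, n = schedule[slot], len(schedule)
    return sum(schedule[(slot + k) % n] == task for k in range(1, required + 1))


def spacer_profile(schedule, task, required):
    """Spacers needed for each slot in which `task` might issue the REPx."""
    return [spacers_needed(schedule, s, required)
            for s, t in enumerate(schedule) if t == task]


sched = decode_schedule("%%2111_1111_0000_0000")
reps_profile = spacer_profile(sched, task=1, required=1)  # REPS from task 1
repd_profile = spacer_profile(sched, task=1, required=3)  # REPD from task 1
print(reps_profile)  # [1, 1, 1, 1, 1, 1, 0]
print(repd_profile)  # [3, 3, 3, 3, 2, 1, 0]
if len(set(reps_profile)) > 1:
    print("REPS spacer requirement depends on where it lands in the schedule")
```

    Under this model task 1 needs a REPS spacer in six of its seven slots and none in the last, and a REPD needs anywhere from zero to three depending on position, while the original %%2010_1010_1010_1010 schedule gives task 0 a flat profile of zeros.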

    Based on the results I have revised my statement.

    "Spacers CANNOT be used reliably in REPx loops in multi-tasking"

    Brian :)
  • jmgjmg Posts: 15,148
    edited 2014-01-30 20:48
    ozpropdev wrote: »

    It seems there is no guarantee of where in the schedule the REPS block starts. In this schedule, most of the time the
    REPS instruction will need a spacer, except in one case where it doesn't. A bit of a lottery.

    Lotteries are what worries me.

    Did you test that with Ariba's code (#71), i.e. with a NOP at each end and a plus 1 on the block size?
    The hope is that 'covers all timing bases', and so will be stable. Be nice to confirm Y/N on his suggestion.
  • ozpropdevozpropdev Posts: 2,791
    edited 2014-01-30 21:26
    jmg wrote: »
    Lotteries are what worries me.

    Did you test that with Ariba's code (#71), i.e. with a NOP at each end and a plus 1 on the block size?
    The hope is that 'covers all timing bases', and so will be stable. Be nice to confirm Y/N on his suggestion.

    Sorry I didn't make that clear (doing too many things at once at the moment).
    In this test I was talking about a task getting time slots more often than every 2nd slot.
    I tried the original way as well as Ariba's way.
    Yes, Ariba's suggested convention works perfectly in all the scenarios I tested.

    Brian :)
  • jmgjmg Posts: 15,148
    edited 2014-01-30 21:41
    ozpropdev wrote: »
    Yes, Ariba's suggested convention works perfectly in all the scenarios I tested.

    Good, so there is a 'high tolerance' structure for tasks (amongst the many minefields); it's just a pity it has a loop penalty as well.
    The code-overhead side is almost tolerable, if the speed hit could be avoided.
    Something for Chip to think about ?
  • cgraceycgracey Posts: 14,133
    edited 2014-01-30 22:11
    This is a hard thing to overcome in hardware. The pipeline needs spacers to implement REPS and REPD, and I don't think I can magically insert them into the pipeline, as there's too much going on.

    In 4-way multitasking, a jmp (or loop) takes only one clock - same as a NOP. So, in 4-way multitasking, REPS/REPD offers no speed or code size advantage, anyway.

    Maybe the documentation should just state that REPS/REPD are for single-task programs, only. That would save a lot of headaches.

    This spacer thing really nailed me once on the Spin2 interpreter. I turned on multitasking and everything blew up. I had actually forgotten that spacers weren't needed in REPS cases where a task was getting half or less of the clock cycles. I just recoded everything to use DJNZ's. Problem solved. Sanity maintained.

    The thing that I cannot figure out how to solve is the spacer issue. If that were solved, things would simplify and there would be strong reason to add 3 more instances of the REPS/REPD hardware so that every task could have one to its advantage.
  • SapiehaSapieha Posts: 2,964
    edited 2014-01-30 22:31
    Hi Chip.

    Is it not possible to stall the pipeline to let REPS settle before it starts executing?
    It would cost some speed, but I think it would solve the problems.

  • cgraceycgracey Posts: 14,133
    edited 2014-01-30 22:34
    Sapieha wrote: »
    Hi Chip.

    Is it not possible to stall the pipeline to let REPS settle before it starts executing?
    It would cost some speed, but I think it would solve the problems.


    It's not a matter of stalling the pipeline, but of stepping things through it.
  • SapiehaSapieha Posts: 2,964
    edited 2014-01-30 22:44
    Hi Chip.

    Sorry, I still don't fully grasp pipelining in the P2,
    so my questions may be dumb.

    What if REPS discarded the pipeline of the previous instructions,
    and then loaded with the pipeline inactive so it could settle?

    cgracey wrote: »
    It's not a matter of stalling the pipeline, but of stepping things through it.
  • cgraceycgracey Posts: 14,133
    edited 2014-01-30 23:25
    Sapieha wrote: »
    What if REPS discarded the pipeline of the previous instructions,
    and then loaded with the pipeline inactive so it could settle?


    Since your post, I've been thinking about how to get around this problem by doing what you mentioned above.

    I could cancel the pipeline by doing a JMP to PC+1. That would start REPx with a clean pipe. The only problem is that GETPIX requires 3 clocks in its two prior pipeline stages, so we wouldn't be able to repeat GETPIX, which is important to be able to do.

    The alternative is to add a state to the repeat circuit where we don't repeat (JMP PC-n) until one (REPS) or three (REPD) instructions pass from the same task. That wouldn't require any code modification and would be fastest for single-task programs, as the pipeline wouldn't need to be cleared. I'm going to try to do it that way. If it works, I'll add three more circuits so each task can have one.
  • SapiehaSapieha Posts: 2,964
    edited 2014-01-30 23:29
    Hi Chip.

    Thanks.

    Any solution that skips spacers will be GOOD.

    It will remove all confusion for users.


  • cgraceycgracey Posts: 14,133
    edited 2014-01-30 23:37
    Sapieha wrote: »
    Hi Chip.

    Thanks.

    Any solution that skips spacers will be GOOD.

    It will remove all confusion for users.


    I could skip the spacers by clearing the pipeline, but this would waste three clocks in single-task mode. I've also found that it's sometimes very handy to have an instruction or three to do some pin output with, just before the repeating block executes.

    Code would be easier to write without spacers, though. You just couldn't do any output right before the repeating block executes.
  • cgraceycgracey Posts: 14,133
    edited 2014-01-30 23:41
    Would you guys like to see REPS/REPD work so that no spacers are ever required?

    This would make code easy to write, but would waste three clocks in single-task mode and would not allow you to abut some pin output instruction(s) right up against the start of the repeating block.

    I could stall the pipeline for the two instructions before GETPIX, without needing two discrete 'WAIT #2' instructions, so GETPIX would still work.
  • potatoheadpotatohead Posts: 10,254
    edited 2014-01-30 23:43
    I like the spacers for that reason (setup), rather than just burn the clocks. If it's not easily sorted in tasking mode, so be it. We've got the DJNZ option handy for that case. Deffo want to keep GETPIX optimized.

    I don't understand what this means:
    The alternative is to add a state to the repeat circuit where we don't repeat (JMP PC-n) until one (REPS) or three (REPD) instructions pass from the same task.

    Is this essentially burning the cycles?
  • SapiehaSapieha Posts: 2,964
    edited 2014-01-30 23:44
    Hi Chip.

    If it wastes 3 clocks once for the entire function and is still deterministic, that is NOT a BIG problem.

    BUT if it wastes 3 clocks every round, that is a BIG problem.

  • cgraceycgracey Posts: 14,133
    edited 2014-01-30 23:46
    potatohead wrote: »
    I like the spacers for that reason (setup), rather than just burn the clocks. If it's not easily sorted in tasking mode, so be it. We've got the DJNZ option handy for that case.

    I don't understand what this means:

    Is this essentially burning the cycles?


    I should have said, instead of "(REPS)", "(in the case of REPS)". Same for REPD. I meant that if REPS had executed, I would ensure that one instruction from that task passed before repeating. I would wait for three instructions in the case of REPD.

    I'm talking about making coding consistent in all task modes.
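    Read literally, the proposed rule is a small countdown in the repeat circuit: arm a count of 1 after REPS or 3 after REPD, decrement it only when an instruction from the issuing task passes, and hold off the loop-back jump until it reaches zero. The toy model below only illustrates that gating rule as stated; the `RepeatGate` class is hypothetical and does not model the actual P2 pipeline or its timing.

```python
class RepeatGate:
    """Toy model of the proposed REPS/REPD gating rule (hypothetical, not P2 RTL)."""

    REQUIRED = {"REPS": 1, "REPD": 3}  # own-task instructions to wait for

    def __init__(self, kind, owner_task):
        self.owner_task = owner_task
        self.pending = self.REQUIRED[kind]  # KeyError for anything but REPS/REPD

    def instruction_passed(self, task):
        # Only instructions from the task that issued REPx count down.
        if task == self.owner_task and self.pending > 0:
            self.pending -= 1

    def may_repeat(self):
        # The loop-back jump (JMP PC-n) is held off until the count is exhausted.
        return self.pending == 0


gate = RepeatGate("REPS", owner_task=0)
gate.instruction_passed(1)   # another task's instruction: no effect
print(gate.may_repeat())     # False
gate.instruction_passed(0)   # first instruction from task 0 has passed
print(gate.may_repeat())     # True
```

    Because other tasks' instructions never touch the count, the same source would behave the same way in any task mode, which is the consistency Chip describes.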
  • cgraceycgracey Posts: 14,133
    edited 2014-01-30 23:48
    Sapieha wrote: »
    Hi Chip.

    If it wastes 3 clocks once for the entire function and is still deterministic, that is NOT a BIG problem.

    BUT if it wastes 3 clocks every round, that is a BIG problem.


    It would only waste three clocks initially. The loops would be zero-overhead.
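    Sapieha's once-versus-every-round distinction, in rough numbers: a clock count for a single task, assuming one clock per instruction, comparing a fixed 3-clock start-up (as Chip describes) with the padding convention's leading spacer plus one NOP on every iteration. Both helpers are hypothetical, and the one-clock-per-instruction figure is a simplifying assumption.

```python
def clocks_pipeline_clear(body_len, reps, clear_cost=3):
    """Clear-the-pipe option: one fixed start-up cost, then zero-overhead loops."""
    return clear_cost + reps * body_len


def clocks_padded(body_len, reps):
    """Padding convention: one leading spacer, plus one NOP in every iteration."""
    return 1 + reps * (body_len + 1)


for reps in (1, 10, 100):
    print(reps, clocks_pipeline_clear(1, reps), clocks_padded(1, reps))
```

    For a one-instruction body the fixed cost only loses on a single pass, ties at two, and wins from three passes on; at 100 repeats it is 103 clocks versus 201.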
  • SapiehaSapieha Posts: 2,964
    edited 2014-01-30 23:50
    Hi Chip.

    That sounds VERY good

    cgracey wrote: »
    It would only waste three clocks initially. The loops would be zero-overhead.