C: What Is The Closest C Statement To A SPIN REPEAT — Parallax Forums

idbruce Posts: 6,197
edited 2015-04-12 23:40 in Propeller 1
I realize there are several ways to do this, but I am looking for the fastest possible execution.

For example, for a 10 iteration loop in SPIN, we simply say:
REPEAT 10

Now let's say that we assign 10 to a variable called STEPS, our code then becomes:
REPEAT STEPS

During the iterations, the STEPS variable remains unchanged, but you do not get to see what is going on in the background. Now I would assume that in the background of the REPEAT execution, there is a counter variable to which a value of STEPS would be assigned, and that variable would be decremented or incremented for each iteration.

With that in mind, I would assume that something like the following would probably be the fastest:
int i = STEPS;

while(i != 0) 
{
    SOME CODE;

    i--;
}

It would be nice if I could just use STEPS without creating a variable and assigning a value, and without the value of STEPS being altered. Just looking for ideas and opinions. However, like I said, I am looking for the fastest possible execution. I would imagine someone might know the answer without testing :)
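
A minimal sketch of one way to leave STEPS untouched, assuming the loop can live in its own function: C passes arguments by value, so the function decrements a private copy. The names `repeat`, `some_code`, and `body_calls` are made up for illustration, and whether the compiler keeps this as a real call or inlines it is something to check in the generated assembly.

```c
/* Hypothetical stand-in for the loop body ("SOME CODE" above). */
int body_calls = 0;
void some_code(void) { body_calls++; }

/* Run some_code() `count` times. `count` is a by-value copy, so the
   caller's variable is never modified. A count of zero runs nothing. */
void repeat(unsigned count)
{
    while (count != 0) {
        some_code();
        count--;
    }
}
```

Calling `repeat(steps)` with `steps` equal to 10 runs the body ten times and leaves `steps` at 10.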

Comments

  • David Betz Posts: 14,516
    edited 2015-04-11 14:04
    I think what you did is probably about the best you'll do in C. In fact, it probably isn't much different from what Spin is doing except that the variable is explicit instead of being hidden like in Spin.
  • abecedarian Posts: 312
    edited 2015-04-11 14:14
    int STEPS = 10;
    
    // loop to iterate "steps" number of times:
    for (int i = 1; i <= STEPS; i++) {
      // do something
      // variable "i" can be used to indicate which "step" you're currently in
      // and is only available for use within the scope of this "for" loop
    }
    
  • idbruce Posts: 6,197
    edited 2015-04-11 14:15
    David
    I think what you did is probably about the best you'll do in C. In fact, it probably isn't much different from what Spin is doing except that the variable is explicit instead of being hidden like in Spin.

    Yea, I would assume SPIN is very similar.

    Actually, I just thought of something......

    In my case, STEPS actually represents both the ramp-up and ramp-down steps for a motor driver. If I separate these two and have a variable for each of them, I could then eliminate the assignment within the driver, for both the ramp-up and ramp-down iterations. So then I could decrement the variable itself and not worry about it being altered.

    Hmmm... I do believe that is what I am going to do.
  • jmg Posts: 15,173
    edited 2015-04-11 14:16
    idbruce wrote: »
    Just looking for ideas and opinions. However, like I said, I am looking for the fastest possible execution. I would imagine someone might know the answer without testing :)

    Mostly people look for smallest, and fast comes along too... :)
    There is a DJNZ opcode, so ideally, you want to aim for that.

    For that, see this code example from David Betz (#48 & #45)
    http://forums.parallax.com/showthread.php/160731-What-are-the-limits-of-in-COG-with-C-or-BASIC?p=1325462&viewfull=1#post1325462

    In that example
    for (count = 8; --count >= 0; ) {
    
    compiles to this loop control code 
    
    	mov	r6, #9    
    	jmp	#.L9      
    .L10    
    ' Loop stuff here                   
    .L9                       
    	djnz	r6,#.L10  
    
    Which has the smallest, and fastest, Looping.
    It takes 3 lines to 'get going' and has the iterating value of 8..1, but as you see, there is nothing faster for looping speed.
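
    A small sketch to make the counting concrete, using hypothetical names `run_loop`, `seen`, and `n_seen`. At the C level the body of that `for` sees `count` go from 7 down to 0, which is 8 passes; the register in the listing above runs 8 down to 1, which is the same number of passes.

    ```c
    /* Records the values `count` takes inside the loop body. */
    int seen[16];
    int n_seen = 0;

    void run_loop(void)
    {
        int count;
        n_seen = 0;
        /* The decrement happens in the condition, before each pass. */
        for (count = 8; --count >= 0; ) {
            seen[n_seen++] = count;   /* count is 7, 6, ..., 0 here */
        }
    }
    ```

    After `run_loop()` returns, `n_seen` is 8, `seen[0]` is 7, and `seen[7]` is 0.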
  • idbruce Posts: 6,197
    edited 2015-04-11 14:39
    @jmg
    but as you see, there is nothing faster for looping speed.

    I will just have to take your word for it, because I am ASM illiterate :)
  • Heater. Posts: 21,230
    edited 2015-04-11 14:46
    Here are some loops:
    void loop1() {
        int i = steps;
        while(i != 0)
        {
            someCode();
            i--;
        }
    }
    
    void loop2() {
        int i = steps;
        while(i-- != 0)
        {
            someCode();
        }
    }
    
    void loop3() {
        int i = steps;
    loop:
        someCode();
        if (--i) goto loop;
    }
    
    void loop4() {
        int i = steps;
        for (int i = 0; i < 10; i++)
        {
            someCode();
        }
    }
    
    Here is what they compile to:
    loop1:
    _loop1
    	mov	__TMP0,#(2<<4)+14
    	call	#__LMM_PUSHM
    	mvi	r7,#_steps
    	rdlong	r14, r7
    	brs	#.L3
    .L4
    	lcall	#_someCode
    	sub	r14, #1
    .L3
    	cmps	r14, #0 wz,wc
    	IF_NE	brs	#.L4
    	mov	__TMP0,#(2<<4)+15
    	call	#__LMM_POPRET
    
    
    _loop2
    	mov	__TMP0,#(2<<4)+14
    	call	#__LMM_PUSHM
    	mvi	r7,#_steps
    	rdlong	r14, r7
    	brs	#.L6
    .L7
    	lcall	#_someCode
    	sub	r14, #1
    .L6
    	cmps	r14, #0 wz,wc
    	IF_NE	brs	#.L7
    	mov	__TMP0,#(2<<4)+15
    	call	#__LMM_POPRET
    
    
    _loop3
    	mov	__TMP0,#(2<<4)+14
    	call	#__LMM_PUSHM
    	mvi	r7,#_steps
    	rdlong	r14, r7
    .L9
    	lcall	#_someCode
    	djnz	r14,#__LMM_JMP
    	long	.L9
    	mov	__TMP0,#(2<<4)+15
    	call	#__LMM_POPRET
    
    
    _loop4
    	mov	__TMP0,#(2<<4)+14
    	call	#__LMM_PUSHM
    	mov	r14, #10
    .L12
    	lcall	#_someCode
    	djnz	r14,#__LMM_JMP
    	long	.L12
    	mov	__TMP0,#(2<<4)+15
    	call	#__LMM_POPRET
    
    Here is a summary of the instruction counts (loop body vs. whole function):
            Loop   func
    loop1 -    4     11
    loop2 -    4     11
    loop3 -    3      9
    loop4 -    3      8
    
    Just use "for".
  • idbruce Posts: 6,197
    edited 2015-04-11 14:59
    Heater

    Very interesting....
    In my case, STEPS actually represents both the ramp-up and ramp-down steps for a motor driver. If I separate these two and have a variable for each of them, I could then eliminate the assignment within the driver, for both the ramp-up and ramp-down iterations. So then I could decrement the variable itself and not worry about it being altered.

    In the case of a while loop, what happens if the assignment is made outside the function? I realize that it is extra instructions elsewhere, but the initialization would not be in a time critical area?
  • Heater. Posts: 21,230
    edited 2015-04-11 15:03
    Hmm...anyone have any idea why the code generated to read the steps variable is different in loop4() above?

    loop4() uses just:
    	mov	r14, #10
    
    but the other three all do:
    	mvi	r7,#_steps
    	rdlong	r14, r7
    
  • abecedarian Posts: 312
    edited 2015-04-11 15:10
    Probably because you assign i the value of steps before entering the for loop, then in the for loop you re-define it to equal 0.
    The compiler will see the redefinition without anything happening between the two and discard the first.
    void loop4() {
        int i = steps;                // <- gets optimized away
        for (int i = 0; i < 10; i++)  // because i is redefined here
        {
            someCode();
        }
    }
    
    // could be re-written:
    void loop4() {
        for (int i = 0; i < steps; i++)
        {
            someCode();
        }
    }
    // or:
    void loop4() {
        for (int i = steps; i > 0; i--)
        {
            someCode();
        }
    }
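
    To illustrate the point, here is a sketch with made-up names (`shadow_demo`, `body_runs`, `outer_after`). Strictly speaking the `for` does not reassign the first `i`; it declares a second `i` that shadows it for the duration of the loop. Since the outer `i` is never read in the original loop4(), the compiler is free to drop the load of `steps`. GCC's `-Wshadow` option should flag this pattern.

    ```c
    int body_runs = 0;
    int outer_after = 0;

    void shadow_demo(int steps)
    {
        int i = steps;                   /* outer i */
        for (int i = 0; i < 10; i++) {   /* inner i shadows the outer one */
            body_runs++;
        }
        outer_after = i;                 /* outer i again, still == steps */
    }
    ```

    Calling `shadow_demo(3)` runs the body 10 times, not 3, and the outer `i` is still 3 afterwards.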
    
  • Heater. Posts: 21,230
    edited 2015-04-11 15:14
    You mean like:
    int i;
    
    void loop1() {
        while(i != 0)
        {
            someCode();
            i--;
        }
    }
    
    int main()
    {
        i = 10;
        loop1();
    }
    
    That would work.

    Don't do that though. Having global variables lying around will get you into a mess.

    Anyway, skip the premature optimization, get the code running correctly first.
  • jmg Posts: 15,173
    edited 2015-04-11 15:15
    Heater. wrote: »
    Hmm...anyone have any idea why the code generated to read the steps variable is different in loop4() above?
    I'd say a for loop is by far the most common, so gets the most focus, and thus the best attention to size/speed efforts.
    Note the for structure example I gave above, that compiles to a single DJNZ looping instruction.
  • jmg Posts: 15,173
    edited 2015-04-11 15:26
    idbruce wrote: »
    @jmg
    I will just have to take your word for it, because I am ASM illiterate :)

    You do not need to be very ASM literate, just enough to be able to see the looping structure.
    Note also the optimizer can do things you may not expect. (so checking ASM is always a good idea)

    As well as removing the preload, look carefully at the count direction in Loop4.Post#7
    The code says i++, but the ASM says DJNZ ?!
    That happens because the for loop body did not reference i, so the optimizer flips to the more compact DJNZ. Of course, if you actually use i in the loop, the generated structure will change.
    I.e., use care when doing small benchmarks.
  • Heater. Posts: 21,230
    edited 2015-04-11 15:35
    jmg,

    My code also compiles to djnz (oops you may have read the wrong asm, I posted the wrong file initially, it's correct now)

    Your example is for in-COG code, if I'm not mistaken.

    Which starts me thinking....this is LMM code, so in the "while" loop examples we see things like this:
    .L4
            lcall   #_someCode
            sub     r14, #1
    .L3
            cmps    r14, #0 wz,wc
            IF_NE   brs     #.L4
    
    And the "for" loop looks more efficient like so:
    .L12
            lcall   #_someCode
            djnz    r14,#__LMM_JMP
            long    .L12
    

    But wait...they are both doing more work than it appears. The "for" loop case is jumping off to a kernel routine __LMM_JMP to do the LMM jump. The "while" loop is using "brs" which is not a Propeller instruction!

    Until we know how long those things take to run we still don't know which loop is faster.
  • Heater. Posts: 21,230
    edited 2015-04-11 15:37
    abecedarian,

    Oops, yes, well spotted. I forgot to remove that redundant declaration. Which was half the point of showing the code!
  • Heater. Posts: 21,230
    edited 2015-04-11 15:55
    @abecedarian,

    Unfortunately, correcting my code, well actually using yours, makes our for loop bigger:
    void loop4a() {
        for (int i = 0; i < steps; i++)
        {
            someCode();
        }
    }
    
    void loop4b() {
        for (int i = steps; i > 0; i--)
        {
            someCode();
        }
    }
    
    Becomes:
    _loop4a
    	mov	__TMP0,#(3<<4)+13
    	call	#__LMM_PUSHM
    	mov	r14, #0
    	mvi	r13,#_steps
    	brs	#.L12
    .L13
    	lcall	#_someCode
    	add	r14, #1
    .L12
    	rdlong	r7, r13
    	cmps	r14, r7 wz,wc
    	IF_B 	brs	#.L13
    	mov	__TMP0,#(3<<4)+15
    	call	#__LMM_POPRET
    
    
    _loop4b
    	mov	__TMP0,#(2<<4)+14
    	call	#__LMM_PUSHM
    	mvi	r7,#_steps
    	rdlong	r14, r7
    	brs	#.L15
    .L16
    	lcall	#_someCode
    	sub	r14, #1
    .L15
    	cmps	r14, #0 wz,wc
    	IF_A 	brs	#.L16
    	mov	__TMP0,#(2<<4)+15
    	call	#__LMM_POPRET
    

    The loops are now 5 and 4 instructions long. Making the while loops the winners!


    jmg is so right about this micro-optimization problem.
  • idbruce Posts: 6,197
    edited 2015-04-11 17:14
    Anyway, skip the premature optimization, get the code running correctly first.

    I hear ya :) but thanks for the testing... I thought it was fairly interesting
  • davidsaunders Posts: 1,559
    edited 2015-04-11 17:32
    idbruce wrote: »
    I hear ya :) but thanks for the testing... I thought it was fairly interesting
    The easiest implementation would be:
    {
    long tmp;
     for (tmp = STEPS; tmp; tmp--)
     {
       /*code in loop*/
     }
    }
    

    And that is just the way to do things in C.
  • ersmith Posts: 6,053
    edited 2015-04-11 17:54
    If you're running in LMM mode, and your loop body has function calls in it, it's probably not going to matter -- the LMM jump overhead is going to dominate the instruction counting.

    For COG mode, or for LMM code with no (non-native) functions in the loop, you can force a djnz with:
    int i = steps;
    do {
       // stuff
    } while(--i != 0);
    
    which, if you think about it, is exactly what djnz does (note that the loop will always be executed at least once though).

    Usually though one writes a loop like this as:
    for (i = 0; i < steps; i++) {
      // stuff
    }
    
    This is a common enough pattern that the compiler will usually optimize it very well, especially if it knows that steps is non-negative (e.g. if it is declared unsigned).

    In LMM mode, try very hard to keep the loop small ( < 1K code) and simple (no function calls). This will allow the compiler to put the loop into FCACHE, which will speed it up enormously (4X or so).
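
    A sketch of the do/while shape with the "at least once" caveat handled, assuming a zero count should run nothing. The names `repeat_djnz_style` and `body_runs` are hypothetical, and whether the extra test still leaves a djnz at the bottom of the loop is something to confirm in the listing rather than assume.

    ```c
    unsigned body_runs = 0;

    /* Run the body `steps` times using the do/while(--i) shape. */
    void repeat_djnz_style(unsigned steps)
    {
        unsigned i = steps;

        /* Guard the zero case: do/while always runs the body once, and an
           unsigned zero would then wrap around instead of stopping. */
        if (i == 0)
            return;

        do {
            body_runs++;        /* loop body goes here */
        } while (--i != 0);
    }
    ```

    If the caller can guarantee `steps` is never zero, the guard can be dropped and the loop is just the three lines of the do/while.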
  • jmg Posts: 15,173
    edited 2015-04-11 18:22
    Heater. wrote: »
    Anyway, skip the premature optimization, get the code running correctly first.

    This type of analysis I call less "premature optimization" and more 'getting one's head in sync with how the compiler thinks', which is certainly a good idea for any embedded developer looking for the best results.
    The lowest looping overhead occurs when the Compiler uses DJNZ, but that does decrement the variable, and the last loop value is 1, not 0, which some may expect.

    I guess that shows just how old the DJNZ opcode is, and a smarter opcode would have been DJNU (dec & Jump if not underflow), which would have allowed 0 based indexing and cleaner nesting of DJNU opcodes.
    Do we blame Intel for that oversight?
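
    A sketch of the 1-based countdown, with hypothetical names (`countdown`, `visited`, `first_i`, `last_i`). The counter runs n down to 1, so a 0-based index has to be derived from it, here as `n - i`. Note that actually using the counter in the body may change what the compiler generates, as mentioned earlier in the thread.

    ```c
    int visited[8];
    int first_i = 0;
    int last_i = 0;

    /* Count down n..1 but touch indices 0..n-1. Assumes 1 <= n <= 8. */
    void countdown(int n)
    {
        int i = n;
        first_i = i;
        do {
            visited[n - i] = 1;   /* i = n..1 maps to index 0..n-1 */
            last_i = i;           /* ends up as 1, never 0 */
        } while (--i != 0);
    }
    ```

    After `countdown(8)`, `first_i` is 8, `last_i` is 1, and all eight entries of `visited` are set.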
  • Heater. Posts: 21,230
    edited 2015-04-11 18:49
    jmg,

    I'm all in favour of getting to know what the compiler does. The question is: is it worth worrying about micro-optimizations like this if the cost might be writing less than blindingly obvious code?

    Currently I can't be sure that using DJNZ in LMM loops is the lowest overhead. That code does a jump to some LMM kernel routine in the COG at __LMM_JMP. So far I have no idea what that routine looks like.

    I like the DJNU idea.
  • abecedarian Posts: 312
    edited 2015-04-11 19:12
    Heater. wrote: »
    @abecedarian,
    >snip<
    Unfortunately correcting my code, well actually using yours, makes our for loop bigger.
    the loops are now 5 and 4 instructions long. Making the while loops the winners!


    jmg is so right about this micro-optimization problem.
    Probably the re-cast of i from int to long is causing part of that. I keep forgetting everything is a long in Propeller land.
    I wonder how it'd add up if "i" was declared long?
  • Heater. Posts: 21,230
    edited 2015-04-11 19:22
    Makes no difference. A long and an int are both 32-bit signed in prop-gcc.
  • Genetix Posts: 1,754
    edited 2015-04-12 23:40
    Bruce, for what it's worth, REPEAT is the Spin equivalent of FOR...NEXT in PBASIC.

    Also, FOR is usually used for counting from or to a particular value, but WHILE is used either to wait for something to happen or to do something while a condition holds.

    For example, I used this to wait for C to be pressed in a QBASIC program. INKEY$ grabs whatever is sitting in the Keyboard buffer even if it's nothing.
    DO : LOOP UNTIL INKEY$ = "C"
    