C: What Is The Closest C Statement To A SPIN REPEAT
idbruce
Posts: 6,197
I realize there are several ways to do this, but I am looking for the fastest possible execution.
For example, for a 10 iteration loop in SPIN, we simply say:
REPEAT 10
Now let's say that we assign 10 to a variable called STEPS, our code then becomes:
REPEAT STEPS
During the iterations, the STEPS variable remains unchanged, but you do not get to see what is going on in the background. Now I would assume that in the background of the REPEAT execution, there is a counter variable to which a value of STEPS would be assigned, and that variable would be decremented or incremented for each iteration.
With that in mind, I would assume that something like the following would probably be the fastest:
int i = STEPS;
while (i != 0) {
    SOME CODE;
    i--;
}
It would be nice if I could just use STEPS without creating a variable and assigning a value, and without the value of STEPS being altered. Just looking for ideas and opinions. However, like I said, I am looking for the fastest possible execution. I would imagine someone might know the answer without testing.
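One way to get REPEAT STEPS behaviour in C without altering STEPS is to pass the count by value, so the loop counts down a private copy. This is only a sketch: `repeat_steps`, `body` and `body_runs` are hypothetical names, `body` stands in for SOME CODE, and nothing here says which form is fastest on the Propeller.

```c
#include <stdint.h>

/* Hypothetical stand-in for SOME CODE: counts how often it ran. */
static uint32_t body_runs = 0;

static void body(void)
{
    body_runs++;
}

/* steps is passed by value, so the caller's variable is never modified;
   the loop counts its own copy down to zero. */
static void repeat_steps(uint32_t steps)
{
    while (steps != 0) {
        body();
        steps--;
    }
}
```

Calling `repeat_steps(STEPS)` runs the body STEPS times and leaves the caller's STEPS as it was.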
Comments
Yea, I would assume SPIN is very similar.
Actually, I just thought of something......
In my case, STEPS actually represents both the ramp-up and ramp-down steps for a motor driver. If I separate these two and have a variable for each of them, I could then eliminate the assignment within the driver, for both the ramp-up and ramp-down iterations. So then I could decrement the variable itself and not worry about it being altered.
Hmmm... I do believe that is what I am going to do.
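A rough sketch of that idea, assuming a hypothetical `ramp_plan` struct and `run_ramps` function (neither is from the actual driver): each phase owns its counter and decrements it in place, so there is no temporary inside the driver.

```c
#include <stdint.h>

/* Hypothetical plan: one counter per phase instead of a shared STEPS. */
typedef struct {
    uint32_t ramp_up_steps;
    uint32_t ramp_down_steps;
} ramp_plan;

/* Consumes both counters in place and returns the number of steps issued.
   The increments stand in for the real per-step motor code. */
static uint32_t run_ramps(ramp_plan *plan)
{
    uint32_t issued = 0;

    while (plan->ramp_up_steps != 0) {
        issued++;                 /* one accelerating step */
        plan->ramp_up_steps--;
    }
    while (plan->ramp_down_steps != 0) {
        issued++;                 /* one decelerating step */
        plan->ramp_down_steps--;
    }
    return issued;
}
```

The trade-off is that both counters are zero afterwards, so the caller has to reload them before the next move.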
Mostly people look for smallest, and fast comes along too...
There is a DJNZ opcode, so ideally, you want to aim for that.
For that, see this code example from David Betz (#48 & #45)
http://forums.parallax.com/showthread.php/160731-What-are-the-limits-of-in-COG-with-C-or-BASIC?p=1325462&viewfull=1#post1325462
That example has the smallest, and fastest, looping.
It takes 3 lines to 'get going' and the iterating value runs 8..1, but as you can see, there is nothing faster for looping speed.
I will just have to take your word for it, because I am ASM illiterate.
Very interesting....
In the case of a while loop, what happens if the assignment is made outside the function? I realize that it means extra instructions elsewhere, but then the initialization would not be in a time-critical area.
loop4() uses just:
but the other three all do:
The compiler will see the redefinition without anything happening between the two and discard the first.
Don't do that though. Having global variables lying around will get you into a mess.
Anyway, skip the premature optimization, get the code running correctly first.
Note the for structure example I gave above, that compiles to a single DJNZ looping instruction.
You do not need to be very ASM literate, just enough to be able to see the looping structure.
Note also the optimizer can do things you may not expect. (so checking ASM is always a good idea)
As well as removing the preload, look carefully at the count direction in Loop4.Post#7
The code says i++, but the ASM says DJNZ ?!
That happens because the for loop did not reference i, so the optimizer flips it to use the more compact DJNZ. Of course, if you actually use i in the loop, the compiled for structure will change.
i.e. use care when doing small benchmarks.
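To see the effect for yourself, a pair of test loops like the following can be compiled and the assembly compared. This is only a sketch with hypothetical names (`sink`, `toggle_n_times`, `write_indices`); what a particular compiler emits for each has to be checked in its own output.

```c
#include <stdint.h>

/* volatile so the compiler cannot delete the loop bodies. */
static volatile uint32_t sink = 0;

/* i is never read in the body, so only the iteration count matters
   and an optimizer is free to count down instead of up. */
static void toggle_n_times(uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        sink ^= 1u;
    }
}

/* i is read in the body, so the values 0..n-1 must actually be
   produced in that order and the loop cannot simply be reversed. */
static void write_indices(uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        sink = i;
    }
}
```

Both loops are written with i++, but only the second one forces the compiler to keep an up-counting index.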
My code also compiles to djnz (oops you may have read the wrong asm, I posted the wrong file initially, it's correct now)
Your example is for in-COG code, if I'm not mistaken.
Which starts me thinking....this is LMM code, so in the "while" loop examples we see things like this: And the "for" loop looks more efficient like so:
But wait...they are both doing more work than it appears. The "for" loop case is jumping off to a kernel routine __LMM_JMP to do the LMM jump. The "while" loop is using "brs" which is not a Propeller instruction!
Until we know how long those things take to run we still don't know which loop is faster.
Oops, yes, well spotted. I forgot to remove that redundant declaration. Which was half the point of showing the code!
Unfortunately correcting my code, well actually using yours, makes our for loop bigger. It becomes:
The loops are now 5 and 4 instructions long, making the while loops the winners!
jmg is so right about this micro-optimization problem.
I hear ya but thanks for the testing... I thought it was fairly interesting
And that is just the way to do things in C.
For COG mode, or for LMM code with no (non-native) functions in the loop, you can force a djnz with a loop that decrements its counter and tests it against zero at the bottom, which, if you think about it, is exactly what djnz does (note that the loop will always be executed at least once though).
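The loop shape described here, decrement then test against zero at the bottom, would look something like the sketch below. The names `repeat_at_least_once` and `body_runs` are hypothetical, and whether a given build really emits djnz for it should be confirmed in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical counter standing in for the loop body's real work. */
static uint32_t body_runs = 0;

/* Bottom-tested loop: decrement, then continue while the result is
   non-zero. The body always runs at least once, so steps must not be 0
   (a zero count would wrap around and run 2^32 times). */
static void repeat_at_least_once(uint32_t steps)
{
    uint32_t i = steps;

    do {
        body_runs++;
    } while (--i != 0);
}
```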
Usually though one writes a loop like this as: This is a common enough pattern that the compiler will usually optimize it very well, especially if it knows that steps is non-negative (e.g. if it is declared unsigned).
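The usual form is presumably the plain counted for loop; the sketch below is a guess at that pattern, not the code from the original post, and `run_steps` is a hypothetical name. Declaring the count unsigned tells the compiler it can never be negative.

```c
/* Conventional counted loop. Returns how many times the body ran so
   the behaviour is easy to check. */
static unsigned run_steps(unsigned steps)
{
    unsigned runs = 0;

    for (unsigned i = 0; i < steps; i++) {
        runs++;   /* stand-in for the real loop body */
    }
    return runs;
}
```

Unlike the bottom-tested form, this one runs zero times when steps is 0.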
In LMM mode, try very hard to keep the loop small ( < 1K code) and simple (no function calls). This will allow the compiler to put the loop into FCACHE, which will speed it up enormously (4X or so).
This type of analysis I call less "premature optimization" and more 'getting one's head in sync with how the compiler thinks', which is certainly a good idea for any embedded developer looking for the best results.
The lowest looping overhead occurs when the Compiler uses DJNZ, but that does decrement the variable, and the last loop value is 1, not 0, which some may expect.
I guess that shows just how old the DJNZ opcode is, and a smarter opcode would have been DJNU (dec & Jump if not underflow), which would have allowed 0 based indexing and cleaner nesting of DJNU opcodes.
Do we blame Intel for that oversight?
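The difference can be modelled in plain C. In this sketch `djnz` mimics decrement-and-jump-if-not-zero, while `djnu` is the hypothetical decrement-and-jump-if-not-underflow; the trace helpers are illustrative names and record the counter value the loop body sees on each pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* DJNZ model: decrement, then jump if the result is not zero. */
static bool djnz(uint32_t *reg)
{
    *reg -= 1;
    return *reg != 0;
}

/* Hypothetical DJNU: decrement, then jump unless the decrement
   underflowed (the register was already 0 before it). */
static bool djnu(uint32_t *reg)
{
    bool underflow = (*reg == 0);
    *reg -= 1;
    return !underflow;
}

/* Record the counter values seen by the body; returns the pass count. */
static int trace_djnz(uint32_t start, uint32_t *seen, int cap)
{
    uint32_t r = start;
    int n = 0;
    do {
        if (n < cap) seen[n] = r;
        n++;
    } while (djnz(&r));
    return n;
}

static int trace_djnu(uint32_t start, uint32_t *seen, int cap)
{
    uint32_t r = start;
    int n = 0;
    do {
        if (n < cap) seen[n] = r;
        n++;
    } while (djnu(&r));
    return n;
}
```

For eight passes, the DJNZ model starts at 8 and the body sees 8..1, while the DJNU model starts at 7 and the body sees 7..0, which is the 0-based indexing being asked for.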
I'm all in favour of getting to know what the compiler does. The question is: Is it worth worrying about micro-optimizations like this if the cost might be writing less than blindingly obvious code?
Currently I can't be sure that using DJNZ in LMM loops is the lowest overhead. That code does a jump to some LMM kernel routine in the COG at __LMM_JMP. So far I have no idea what that routine looks like.
I like the DJNU idea.
I wonder how it'd add up if "i" was declared long?
Also, FOR is usually used for counting from or to a particular value, but WHILE is used to either wait for something to happen or do something while it is happening.
For example, I used this to wait for C to be pressed in a QBASIC program. INKEY$ grabs whatever is sitting in the Keyboard buffer even if it's nothing.
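The same wait-for-a-key idea in C might look like the sketch below. Since standard C has no INKEY$, a simulated key buffer stands in for the keyboard; `key_buffer`, `poll_key` and `wait_for_key` are all hypothetical, with 0 meaning "nothing waiting".

```c
#include <stddef.h>

/* Simulated keyboard: 0 means no key was waiting on that poll. */
static const char key_buffer[] = { 0, 0, 'a', 0, 'C' };
static size_t key_pos = 0;

/* Returns the next buffered key, or 0 if nothing is waiting. */
static int poll_key(void)
{
    if (key_pos < sizeof key_buffer) {
        return key_buffer[key_pos++];
    }
    return 0;
}

/* Polls until the wanted key shows up and returns how many polls it
   took. Like the original idea, this never returns if the key never
   arrives. */
static unsigned wait_for_key(char wanted)
{
    unsigned polls = 0;
    int key;

    do {
        key = poll_key();
        polls++;
    } while (key != wanted);
    return polls;
}
```

There is no fixed iteration count here, which is why a while-style loop fits better than a counted for loop.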