bsr or (int)log2 in Prop C / PASM help too (Optimization Problem)

I need a bsr type function for integers in Prop C.
In the past I've used (int)log2(arg) because it works quickly on systems with FPUs. I assume this uses the processor's log tables to perform log(arg)/log(2).
Since the propeller has no FPU I am concerned about how long this takes, so I'm trying to test it.
Here is my code so far:
and the results:
I get the expected results including a negative value for undefined: log2(0).
There are a few issues:
No log2 in this gcc?
How do I read the PC so I can test the speed in clocks? Is this a valid method of timing?
Is there a better(faster) way than either of these? I read through the propeller manual and I did not find anything on log2 or bsr.
In the past I've used (int)log2(arg) because it works quickly on systems with FPUs. I assume this uses the processor's log tables to perform log(arg)/log(2).
Since the propeller has no FPU I am concerned about how long this takes, so I'm trying to test it.
Here is my code so far:
#include "simpletools.h" // Include simple tools
//#include <math.h>
int main() { // Entry point
int index; // location within the binary integer
int pow2a; // Characteristic sans mantissa, first test
int pow2b; // Characteristic sans mantissa, second test
int num_in; // log2 argument
unsigned int startA;
unsigned int endA;
unsigned int startB;
unsigned int endB;
while(1) { // Primary loop
print("Provide an integer: "); // Prompt user
scan("%d\n", &num_in); // Collect input
//startA = __current_pc();
index = 32; // length of int in bits
pow2a = -1; // flag value
while((pow2a == -1) & (index-- > 0)) { // While no characteristic and bits remain, decrement position
if(num_in & (1 << index)) { // if the current bit is on
pow2a = index; // record its location as the log2 characteristic
break; // and exit
}
}
//endA = __current_pc();
//startB = __current_pc();
// pow2b = (int)log2(num_in); // use log tables? log(num_in)/log(2)? cast to integer
pow2b = (int)(log(num_in)/log(2)); // guess that answers ^
//endB = __current_pc();
//print("Result A: %d time: %d Result B: %d time: %d \n", pow2a, (endA-startA), pow2b, (endB-startB)); // display the results
print("Result A: %d Result B: %d \n", pow2a, pow2b); // display the results
}
}
and the results:
Provide an integer: 8675309
Result A: 23 Result B: 23
Provide an integer: 0
Result A: -1 Result B: -2147483648
Provide an integer: 1
Result A: 0 Result B: 0
Provide an integer: 24601
Result A: 14 Result B: 14
Provide an integer: 2
Result A: 1 Result B: 1
Provide an integer: 3
Result A: 1 Result B: 1
Provide an integer: 4
Result A: 2 Result B: 2
Provide an integer: 90210
Result A: 16 Result B: 16
I get the expected results including a negative value for undefined: log2(0).
There are a few issues:
No log2 in this gcc?
How do I read the PC so I can test the speed in clocks? Is this a valid method of timing?
Is there a better(faster) way than either of these? I read through the propeller manual and I did not find anything on log2 or bsr.
Comments
The propeller doesn't have a BSR instruction. Finding the most significant 1 bit has to be done in software. The prop does have a log lookup table, but I think it's only accurate to 12 bits. So if that accuracy is OK you could use the PASM code that is in the Prop manual to implement an integer version of log2.
The propeller system counter is accessible through the CNT variable. Include propeller.h to get the definition for it.
#include "simpletools.h" // scan(), print() #include "propeller.h" // CNT int main() { // Entry point int index; // location within the binary integer int pow2a; // Characteristic sans mantissa, first test int pow2b; // Characteristic sans mantissa, second test int num_in; // log2 argument unsigned int startA; unsigned int endA; unsigned int startB; unsigned int endB; while(1) { // Primary loop print("Provide an integer: "); // Prompt user scan("%d\n", &num_in); // Collect input startA = CNT; index = 32; // length of int in bits pow2a = -1; // flag value while((pow2a == -1) & (index-- > 0)) { // While no characteristic and bits remain, decrement position if(num_in & (1 << index)) { // if the current bit is on pow2a = index; // record its location as the log2 characteristic break; // and exit } } endA = CNT; startB = CNT; // pow2b = (int)log2(num_in); // use log tables? log(num_in)/log(2)? cast to integer pow2b = (int)(log(num_in)/log(2)); // guess that answers ^ endB = CNT; print("Result A: %d time: %d Result B: %d time: %d \n", pow2a, (endA-startA), pow2b, (endB-startB)); // display the results //print("Result A: %d Result B: %d \n", pow2a, pow2b); // display the results } }
Oh so slow:
Provide an integer: 8675309 Result A: 23 time: 9312 Result B: 23 time: 319840 Provide an integer: 24601 Result A: 14 time: 15648 Result B: 14 time: 327136 Provide an integer: 2 Result A: 1 time: 24800 Result B: 1 time: 55072 Provide an integer: 1 Result A: 0 time: 25504 Result B: 0 time: 16032 Provide an integer: 2000000000 Result A: 30 time: 7392 Result B: 30 time: 326752
Does CNT give an accurate reading here? If so, What a difference!
Would this be faster in PASM? I think the answer is almost certainly yes. If so I can figure it out and use this guide to inject the PASM directly.
Thanks for your help. Any pitfalls that you can see or recommended reading, please?
INLINE__ int32_t BitEncode__(uint32_t a) { int r=0; while (a != 0) { a = a>>1; r++; } return r; }
from:
Which is pretty much the same as mine.
I think bitwise encoding -1 in PASM is likely to be the fastest way.
int pow2c; int num_in; __asm__( /* TODO: bitwise encode num_in subtract 1 store in pow2c */ : /* output out = pow2c, write only */ [out] "=r" (pow2c) : /* input in = num_in */ [in] "r" (num_in) );
Now I just have to figure out how to fill in the TODO.You don't mention the scope of your calculation... that is how many potential populations are you interested in. I found that for 8 bit images... at least for measuring information... it is possible to do the frequency based calculations once and store the results... multiply them out to remove the decimals and then put the results in a file, which is read in at run time. That reduces the problem to a sum over the population... about a 8-12 clocks per calculation before parsing the result into a final float.
I have many uses for this in my float-free math library that I'm trying to cram into a propeller.
One specific use is converting integers to array indexes for my partial predictive factoring method.
In this the highest power of two points to 1 of 32 columns in a 2D array of ints, where the row is selected by the int's hamming weight mod(4), which is also a simple bitwise opertion.
I'm not opposed to a file. I'm looking for the smallest fastest option; could be code, could be a table.
Assuming a 32bit word, with bits numbered 1-32 I think this might work.
Add the operand to ff:ff:00:00. Update C but not Z
If it sets carry, then the first nonzero bit is at location 17 or more so add 16 to your result (do not update c or z)
If c was not set then shift the operand left by 16 places. (And update z)
Add ff:00:00:00 to the already shifted operand and update c but not z
If it sets carry, add 8 to result (don't update c or z)
If not, rotate left by 8
Add f0:00:00:00 and update c but not z
If c, then add 4 to result
If not shift left 4
Add c0:00:00:00 and update c but not z
If c then add two to result
If not, shift left by two
Add 80:00:00:00 and update c but not z
If c then add 1 to result
If not then if z is not set (from the first shift, that's why we don't update it) then add one more.
Unless I've made some error in logic, that should get you the index of the first nonzero bit, or zero if they are all zero and should take something like 3*log2(n) instructions on a propeller. (32 bits is 2^5, so I takes about 5 rounds or 15 instructions to find it.
https://graphics.stanford.edu/~seander/bithacks.html#IntegerLog
Looks like they could be done in PASM pretty neatly. The first one could do with some loop unrolling.
Better than brute force shifting through 32 bits.
@Heater
Thanks a bunch, that's the stuff I was looking for. Now I've got plenty to work on over Christmas :-D
Results to follow
'Find msb of x; answer in y (1 - 32, or zero if no "one" bits). mov y,#0 test x,_ffff0000 wz if_nz add y,#16 test x,_ff00ff00 wz if_nz add y,#8 test x,_f0f0f0f0 wz if_nz add y,#4 test x,_cccccccc wz if_nz add y,#2 test x,_aaaaaaaa wz if_nz add y,#1 test x,x wz if_nz add y,#1 _ffff0000 long $ffff0000 _ff00ff00 long $ff00ff00 _f0f0f0f0 long $f0f0f0f0 _cccccccc long $cccccccc _aaaaaaaa long $aaaaaaaa
-Phil
But don't we have to AND the masks with x rather than TEST?
TEST will leave the x unchanged. Which means any lower bits it has will still be there for later TESTs. Which will then be non-zero when they should not be.
Sadly I'm way out in the Finnish forest far away from any Propellers to test this with.
Thank you very much @Phil
I got it to work inline and it's nice and fast, here are the results:
Provide an integer: 8675309 bsr) Arg: 8675309 Characteristic: 23 time: 6448 log/log) Arg: 8675309 Characteristic: 23 time: 363072 inline PASM) Arg: 8675309 Characteristic: 32 time: 3072 Provide an integer: 2000000000 bsr) Arg: 2000000000 Characteristic: 30 time: 1520 log/log) Arg: 2000000000 Characteristic: 30 time: 375248 inline PASM) Arg: 2000000000 Characteristic: 32 time: 3072 Provide an integer: 0 bsr) Arg: 0 Characteristic: -1 time: 22960 log/log) Arg: 0 Characteristic: -2147483648 time: 18800 inline PASM) Arg: 0 Characteristic: 0 time: 2384 Provide an integer: 1 bsr) Arg: 1 Characteristic: 0 time: 22640 log/log) Arg: 1 Characteristic: 0 time: 26544 inline PASM) Arg: 1 Characteristic: 1 time: 2496 Provide an integer: 36 bsr) Arg: 36 Characteristic: 5 time: 19120 log/log) Arg: 36 Characteristic: 5 time: 368000 inline PASM) Arg: 36 Characteristic: 8 time: 2832
Results are a bit off, though
and here is the code:
#include "simpletools.h" // scan(), print() #include "propeller.h" // CNT int main() { // Entry point int index; // location within the binary integer int pow2; // Characteristic sans mantissa, first test int num_in; // log2 argument unsigned int start; unsigned int end; long _ffff0000 = 0xffff0000; long _ff00ff00 = 0xff00ff00; long _f0f0f0f0 = 0xf0f0f0f0; long _cccccccc = 0xcccccccc; long _aaaaaaaa = 0xaaaaaaaa; while(1) { // Primary loop print("Provide an integer: "); // Prompt user scan("%d\n", &num_in); // Collect input start = CNT; index = 32; // length of int in bits pow2 = -1; // flag value while((pow2 == -1) & (index-- > 0)) { // While no characteristic and bits remain, decrement position if(num_in & (1 << index)) { // if the current bit is on pow2 = index; // record its location as the log2 characteristic break; // and exit } } end = CNT; print("bsr) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results start = CNT; pow2 = (int)(log(num_in)/log(2)); // guess that answers ^ end = CNT; print("log/log) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results start = CNT; __asm__ volatile ( "mov %[y],#0\n\t" "test %[x],%[_ffff0000] wz\n\t" "if_nz add %[y],#16\n\t" "test %[x],%[_ff00ff00] wz\n\t" "if_nz add %[y],#8\n\t" "test %[x],%[_f0f0f0f0] wz\n\t" "if_nz add %[y],#4\n\t" "test %[x],%[_cccccccc] wz\n\t" "if_nz add %[y],#2\n\t" "test %[x],%[_aaaaaaaa] wz\n\t" "if_nz add %[y],#1\n\t" "test %[x],%[x] wz\n\t" "if_nz add %[y],#1\n\t" : /*outputs (+inputs) */ [y] "+r" (pow2) : /*inputs */ [x] "r" (num_in), [_ffff0000] "r" (_ffff0000), [_ff00ff00] "r" (_ff00ff00), [_f0f0f0f0] "r" (_f0f0f0f0), [_cccccccc] "r" (_cccccccc), [_aaaaaaaa] "r" (_aaaaaaaa) ); end = CNT; print("inline PASM) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results } }
In my inexperience I couldn't get "_ffff0000 long $ffff0000" and the rest to work, compiler said unknown command, but I adapted. Does that impact the performance?
Either way it's a lot better than mine :cool:
Thanks again
With a functioning example I think I can move forward.
Could you try changing those "test" instructions to "and".
That will overwrite y so you will need to save that first. Or copy it to a temporary "t" and work on that.
Move forward as in fixing it :thumb:
I'll make a table tomorrow and apply your suggestions, thank you.
I should have interpreted "functioning example" as "compiles and running but malfunctioning example"
Hmm, yes quite
x = $00018000
The correct answer is 16. But that second one bit screws things up, causing an additional 8, 4, 2, 1, and 1 to be added to y. Dang!
Try this instead:
'Find msb of x; answer in y (1 - 32, or zero if no "one" bits). mov y,#32 test x,_ffff0000 wz if_z sub y,#16 test x,_ff00ff00 wz if_z sub y,#8 test x,_f0f0f0f0 wz if_z sub y,#4 test x,_cccccccc wz if_z sub y,#2 test x,_aaaaaaaa wz if_z sub y,#1 test x,x wz if_z sub y,#1 _ffff0000 long $ffff0000 _ff00ff00 long $ff00ff00 _f0f0f0f0 long $f0f0f0f0 _cccccccc long $cccccccc _aaaaaaaa long $aaaaaaaa
This is actually the program I started with. But I thought it would be less tricky-looking to use add and if_nz. I neglected to consider that z and nz are not symmetrical opposites.
-Phil
Provide an integer: 2000000000 bsr) Arg: 2000000000 Characteristic: 30 time: 1616 log/log) Arg: 2000000000 Characteristic: 30 time: 375824 inline PASM) Arg: 2000000000 Characteristic: 32 time: 2416 Provide an integer: 20000000 bsr) Arg: 20000000 Characteristic: 24 time: 5840 log/log) Arg: 20000000 Characteristic: 24 time: 369776 inline PASM) Arg: 20000000 Characteristic: 32 time: 2416 Provide an integer: 1984 bsr) Arg: 1984 Characteristic: 10 time: 15696 log/log) Arg: 1984 Characteristic: 10 time: 367808 inline PASM) Arg: 1984 Characteristic: 16 time: 2544 Provide an integer: 1 bsr) Arg: 1 Characteristic: 0 time: 22736 log/log) Arg: 1 Characteristic: 0 time: 27120 inline PASM) Arg: 1 Characteristic: 1 time: 2992 Provide an integer: 2 bsr) Arg: 2 Characteristic: 1 time: 22032 log/log) Arg: 2 Characteristic: 1 time: 97632 inline PASM) Arg: 2 Characteristic: 2 time: 2880 Provide an integer: 4 bsr) Arg: 4 Characteristic: 2 time: 21328 log/log) Arg: 4 Characteristic: 2 time: 97632 inline PASM) Arg: 4 Characteristic: 3 time: 2880 Provide an integer: 0 bsr) Arg: 0 Characteristic: -1 time: 23280 log/log) Arg: 0 Characteristic: -2147483648 time: 19376 inline PASM) Arg: 0 Characteristic: 0 time: 3104 Provide an integer: 2147483647 bsr) Arg: 2147483647 Characteristic: 30 time: 1616 log/log) Arg: 2147483647 Characteristic: 31 time: 107968 inline PASM) Arg: 2147483647 Characteristic: 32 time: 2416
even found a rounding error in the log/log method :-)__asm__ volatile ( "mov %[y],#32\n\t" "test %[x],%[_ffff0000] wz\n\t" "if_z sub %[y],#16\n\t" "test %[x],%[_ff00ff00] wz\n\t" "if_z sub %[y],#8\n\t" "test %[x],%[_f0f0f0f0] wz\n\t" "if_z sub %[y],#4\n\t" "test %[x],%[_cccccccc] wz\n\t" "if_z sub %[y],#2\n\t" "test %[x],%[_aaaaaaaa] wz\n\t" "if_z sub %[y],#1\n\t" "test %[x],%[x] wz\n\t" "if_z sub %[y],#1\n\t" : /*outputs (+inputs) */ [y] "+r" (pow2) : /*inputs */ [x] "r" (num_in), [_ffff0000] "r" (_ffff0000), [_ff00ff00] "r" (_ff00ff00), [_f0f0f0f0] "r" (_f0f0f0f0), [_cccccccc] "r" (_cccccccc), [_aaaaaaaa] "r" (_aaaaaaaa) );
__attribute__((fcache)) int topOne(unsigned int a) { int r=32; if(a==0) return 0; if(!(a & 0xFFFF0000)) {r -= 16; a <<= 16;} if(!(a & 0xFF000000)) {r -= 8; a <<= 8;} if(!(a & 0xF0000000)) {r -= 4; a <<= 4;} if(!(a & 0xC0000000)) {r -= 2; a <<= 2;} if(!(a & 0x80000000)) r -= 1; return r; }
It may be a bit faster if translated to PASM by hand.Andy
Thanks a bunch, Andy;
This taught me a good bit more about C
I put it in directly, it's fast and accurate:
Provide an integer: 2000000000 bsr) Arg: 2000000000 Characteristic: 30 time: 1888 log/log) Arg: 2000000000 Characteristic: 30 time: 375136 inline PASM) Arg: 2000000000 Characteristic: 32 time: 2416 Ariba) Arg: 2000000000 Characteristic: 30 time: 2528 Provide an integer: 200000000 bsr) Arg: 200000000 Characteristic: 27 time: 4000 log/log) Arg: 200000000 Characteristic: 27 time: 375232 inline PASM) Arg: 200000000 Characteristic: 32 time: 2416 Ariba) Arg: 200000000 Characteristic: 27 time: 2752 Provide an integer: 8192 bsr) Arg: 8192 Characteristic: 13 time: 13856 log/log) Arg: 8192 Characteristic: 12 time: 102608 inline PASM) Arg: 8192 Characteristic: 14 time: 2656 Ariba) Arg: 8192 Characteristic: 13 time: 2880 Provide an integer: 0 bsr) Arg: 0 Characteristic: -1 time: 23552 log/log) Arg: 0 Characteristic: -2147483648 time: 19088 inline PASM) Arg: 0 Characteristic: 0 time: 3104 Ariba) Arg: 0 Characteristic: -1 time: 512 Provide an integer: 1 bsr) Arg: 1 Characteristic: 0 time: 23008 log/log) Arg: 1 Characteristic: 0 time: 26464 inline PASM) Arg: 1 Characteristic: 1 time: 2992 Ariba) Arg: 1 Characteristic: 0 time: 3328 Provide an integer: 2 bsr) Arg: 2 Characteristic: 1 time: 22304 log/log) Arg: 2 Characteristic: 1 time: 96976 inline PASM) Arg: 2 Characteristic: 2 time: 2880 Ariba) Arg: 2 Characteristic: 1 time: 3328 Provide an integer: 4 bsr) Arg: 4 Characteristic: 2 time: 21600 log/log) Arg: 4 Characteristic: 2 time: 96976 inline PASM) Arg: 4 Characteristic: 3 time: 2880 Ariba) Arg: 4 Characteristic: 2 time: 3104 Provide an integer: 8 bsr) Arg: 8 Characteristic: 3 time: 20896 log/log) Arg: 8 Characteristic: 3 time: 98656 inline PASM) Arg: 8 Characteristic: 4 time: 2768 Ariba) Arg: 8 Characteristic: 3 time: 3104
Case 8192 has me curious.
Here's the updated code:
#include "simpletools.h" // scan(), print() #include "propeller.h" // CNT int main() { // Entry point int index; // location within the binary integer int pow2; // Characteristic sans mantissa, first test unsigned int num_in; // log2 argument unsigned int start; unsigned int end; long _ffff0000 = 0xffff0000; long _ff00ff00 = 0xff00ff00; long _f0f0f0f0 = 0xf0f0f0f0; long _cccccccc = 0xcccccccc; long _aaaaaaaa = 0xaaaaaaaa; while(1) { // Primary loop print("Provide an integer: "); // Prompt user scan("%d\n", &num_in); // Collect input start = CNT; index = 32; // length of int in bits pow2 = -1; // flag value while((pow2 == -1) & (index-- > 0)) { // While no characteristic and bits remain, decrement position if(num_in & (1 << index)) { // if the current bit is on pow2 = index; // record its location as the log2 characteristic break; // and exit } } end = CNT; print("bsr) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results start = CNT; pow2 = (int)(log(num_in)/log(2)); end = CNT; print("log/log) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results start = CNT; __asm__ volatile ( "mov %[y],#32\n\t" "test %[x],%[_ffff0000] wz\n\t" "if_z sub %[y],#16\n\t" "test %[x],%[_ff00ff00] wz\n\t" "if_z sub %[y],#8\n\t" "test %[x],%[_f0f0f0f0] wz\n\t" "if_z sub %[y],#4\n\t" "test %[x],%[_cccccccc] wz\n\t" "if_z sub %[y],#2\n\t" "test %[x],%[_aaaaaaaa] wz\n\t" "if_z sub %[y],#1\n\t" "test %[x],%[x] wz\n\t" "if_z sub %[y],#1\n\t" : /*outputs (+inputs) */ [y] "+r" (pow2) : /*inputs */ [x] "r" (num_in), [_ffff0000] "r" (_ffff0000), [_ff00ff00] "r" (_ff00ff00), [_f0f0f0f0] "r" (_f0f0f0f0), [_cccccccc] "r" (_cccccccc), [_aaaaaaaa] "r" (_aaaaaaaa) ); end = CNT; print("inline PASM) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results unsigned int a = num_in; start = CNT; pow2 = 32; if(a == 0) {pow2 = 0;} else { if(!(a & 0xFFFF0000)) {pow2 -= 16; a <<= 16;} if(!(a & 0xFF000000)) {pow2 -= 8; a <<= 8;} if(!(a & 0xF0000000)) {pow2 -= 4; a <<= 4;} if(!(a & 0xC0000000)) {pow2 -= 2; a <<= 2;} if(!(a & 0x80000000)) pow2 -= 1; } pow2--; end = CNT; print("Ariba) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results } }
Challenge acceptedSwitch to LMM will cause the compiler to generate hardware assembly and run much faster but use more memory.
Mike
Provide an integer: 2000000000 bsr) Arg: 2000000000 Characteristic: 30 time: 352 log/log) Arg: 2000000000 Characteristic: 30 time: 71104 inline PASM) Arg: 2000000000 Characteristic: 32 time: 480 Ariba) Arg: 2000000000 Characteristic: 30 time: 512 Provide an integer: 1 bsr) Arg: 1 Characteristic: 0 time: 4144 log/log) Arg: 1 Characteristic: 0 time: 5776 inline PASM) Arg: 1 Characteristic: 1 time: 480 Ariba) Arg: 1 Characteristic: 0 time: 512 Provide an integer: 0 bsr) Arg: 0 Characteristic: -1 time: 4240 log/log) Arg: 0 Characteristic: -2147483648 time: 4640 inline PASM) Arg: 0 Characteristic: 0 time: 480 Ariba) Arg: 0 Characteristic: -1 time: 128 Provide an integer: 8675309 bsr) Arg: 8675309 Characteristic: 23 time: 1264 log/log) Arg: 8675309 Characteristic: 23 time: 69200 inline PASM) Arg: 8675309 Characteristic: 32 time: 480 Ariba) Arg: 8675309 Characteristic: 23 time: 512 Provide an integer: 8192 bsr) Arg: 8192 Characteristic: 13 time: 2512 log/log) Arg: 8192 Characteristic: 12 time: 16496 inline PASM) Arg: 8192 Characteristic: 14 time: 480 Ariba) Arg: 8192 Characteristic: 13 time: 512
Thank you.
Reading more on the subject here. Is there any more reading that you'd recommend? Specifically a sample of C that compiles into the COG memory mode and any theory.
CON _clkmode = xtal1 + pll16x _xinfreq = 5_000_000 VAR long new, value, log OBJ sio : "Parallax Serial Terminal" PUB start sio.start(115200) cognew(@log2, @new) repeat value := sio.hexin 'Enter a number in hexadecimal new~~ repeat while new sio.hex(value, 8) 'Print out original number. sio.char(" ") sio.dec(log) 'Print bit position. sio.char(13) DAT 'Find msb "one" bit of x; answer in y (0 - 31, or -1 if x == 0). log2 mov xaddr,par add xaddr,#4 mov yaddr,par add yaddr,#8 :loop rdlong x,par wz if_z jmp #:loop rdlong x,xaddr wz 'Begin computation (15 instructions)... rev x,#0 'Reverse the bits in x. neg y,x 'Isolate least-significant "one" bit. and x,y mov y,#31 if_z sub y,#1 'Locate that "one" bit, if any. test x,_0000ffff wz if_z sub y,#16 test x,_00ff00ff wz if_z sub y,#8 test x,_0f0f0f0f wz if_z sub y,#4 test x,_33333333 wz if_z sub y,#2 test x,_55555555 wz if_z sub y,#1 '... End computation. wrlong y,yaddr mov x,#0 wrlong x,par jmp #:loop _0000ffff long $0000ffff _00ff00ff long $00ff00ff _0f0f0f0f long $0f0f0f0f _33333333 long $33333333 _55555555 long $55555555 x res 1 y res 1 xaddr res 1 yaddr res 1
-Phil
Excellent, thank you
I look forward to testing it.
I'm also making a binary search model for the same goal just to see what I can see
I'll keep posting code and results
Merry Christmas
-Phil
'Begin computation (15 instructions)... rev x,#0 'Reverse the bits in x. neg y,x 'Isolate least-significant "one" bit. and x,y wz '<======= Add the wz here.
That will isolate the computation from the parameter acquisition, should they become more separated.
-Phil
I made the BST method fully unrolled to begin with and tucked the '0' case way down in and I think the nesting of logic has a major speed impact: (This would need to be condensed to a loop in order to be size and speed effective)
start = CNT; if(num_in < 0x00010000) { // 0-15 if(num_in < 0x00000100) { // 0-7 if(num_in < 0x00000010) { // 0-3 if(num_in < 0x0000004) { // 0-1 if(num_in < 0x00000002) { if(num_in == 0) { pow2 = -1; } else { pow2 = 0; } } else { pow2 = 1; } } else { // 2-3 if(num_in < 0x00000008) { pow2 = 2; } else { pow2 = 3; } } } else { // 4-7 if(num_in < 0x00000040) { // 4-5 if(num_in < 0x00000020) { pow2 = 4; } else { pow2 = 5; } } else { // 6-7 if(num_in < 0x00000080) { pow2 = 6; } else { pow2 = 7; } } } } else { // 8-15 if(num_in < 0x00001000) { // 8-11 if(num_in < 0x00000400) { // 8-9 if(num_in < 0x00000200) { pow2 = 8; } else { pow2 = 9; } } else { // 10-11 if(num_in < 0x00000800) { pow2 = 10; } else { pow2 = 11; } } } else { // 12-15 if(num_in < 0x00004000) { // 12-13 if(num_in < 0x00002000) { pow2 = 12; } else { pow2 = 13; } } else { // 14-15 if(num_in < 0x00008000) { pow2 = 14; } else { pow2 = 15; } } } } } else { // 16-31 if(num_in < 0x01000000) { // 16-23 if(num_in < 0x00100000) { // 16-19 if(num_in < 0x0040000) { // 16-17 if(num_in < 0x00020000) { pow2 = 16; } else { pow2 = 17; } } else { // 18-19 if(num_in < 0x00080000) { pow2 = 18; } else { pow2 = 19; } } } else { // 20-23 if(num_in < 0x00400000) { // 20-21 if(num_in < 0x00200000) { pow2 = 20; } else { pow2 = 21; } } else { // 22-23 if(num_in < 0x00800000) { pow2 = 22; } else { pow2 = 23; } } } } else { // 24-31 if(num_in < 0x10000000) { // 24-27 if(num_in < 0x04000000) { // 24-25 if(num_in < 0x02000000) { pow2 = 24; } else { pow2 = 25; } } else { // 26-27 if(num_in < 0x08000000) { pow2 = 26; } else { pow2 = 27; } } } else { // 28-31 if(num_in < 0x40000000) { // 28-29 if(num_in < 0x20000000) { pow2 = 28; } else { pow2 = 29; } } else { // 30-31 if(num_in < 0x80000000) { pow2 = 30; } else { pow2 = 31; } } } } } end = CNT;
It's the fastest that I had made up to that point. Then I looked at @Ariba code again and removed the nested logic and it sped up by a factor of 10 and turned it into a big theta, I like big theta.
unsigned int a = num_in; start = CNT; pow2 = 31; // if(a == 0) {pow2 = 0;} // else { if(!(a & 0xFFFF0000)) {pow2 -= 16; a <<= 16;} if(!(a & 0xFF000000)) {pow2 -= 8; a <<= 8;} if(!(a & 0xF0000000)) {pow2 -= 4; a <<= 4;} if(!(a & 0xC0000000)) {pow2 -= 2; a <<= 2;} if(!(a & 0x80000000)) pow2 -= 1; if(a == 0) pow2 = -1; // } end = CNT;
So far the modified Ariba is the winner.Provide an integer: 0 bsr) Arg: 0 Characteristic: -1 time: 160 log/log) Arg: 0 Characteristic: -2147483648 time: 4528 Ariba) Arg: 0 Characteristic: -1 time: 48 BST) Arg: 0 Characteristic: -1 time: 368 Provide an integer: 1 bsr) Arg: 1 Characteristic: 0 time: 1264 log/log) Arg: 1 Characteristic: 0 time: 5664 Ariba) Arg: 1 Characteristic: 0 time: 48 BST) Arg: 1 Characteristic: 0 time: 368 Provide an integer: 2 bsr) Arg: 2 Characteristic: 1 time: 1136 log/log) Arg: 2 Characteristic: 1 time: 16080 Ariba) Arg: 2 Characteristic: 1 time: 48 BST) Arg: 2 Characteristic: 1 time: 304 Provide an integer: 4 bsr) Arg: 4 Characteristic: 2 time: 1008 log/log) Arg: 4 Characteristic: 2 time: 16080 Ariba) Arg: 4 Characteristic: 2 time: 48 BST) Arg: 4 Characteristic: 2 time: 320 Provide an integer: 8 bsr) Arg: 8 Characteristic: 3 time: 896 log/log) Arg: 8 Characteristic: 3 time: 16240 Ariba) Arg: 8 Characteristic: 3 time: 48 BST) Arg: 8 Characteristic: 3 time: 320 Provide an integer: bsr) Arg: 8 Characteristic: 3 time: 880 log/log) Arg: 8 Characteristic: 3 time: 16240 Ariba) Arg: 8 Characteristic: 3 time: 48 BST) Arg: 8 Characteristic: 3 time: 320 Provide an integer: 16 bsr) Arg: 16 Characteristic: 4 time: 768 log/log) Arg: 16 Characteristic: 4 time: 16080 Ariba) Arg: 16 Characteristic: 4 time: 48 BST) Arg: 16 Characteristic: 4 time: 320 Provide an integer: 32 bsr) Arg: 32 Characteristic: 5 time: 640 log/log) Arg: 32 Characteristic: 5 time: 16080 Ariba) Arg: 32 Characteristic: 5 time: 48 BST) Arg: 32 Characteristic: 5 time: 320 Provide an integer: 64 bsr) Arg: 64 Characteristic: 6 time: 512 log/log) Arg: 64 Characteristic: 6 time: 16240 Ariba) Arg: 64 Characteristic: 6 time: 48 BST) Arg: 64 Characteristic: 6 time: 320 Provide an integer: 128 bsr) Arg: 128 Characteristic: 7 time: 368 log/log) Arg: 128 Characteristic: 7 time: 16352 Ariba) Arg: 128 Characteristic: 7 time: 48 BST) Arg: 128 Characteristic: 7 time: 320 Provide an integer: 256 bsr) Arg: 256 Characteristic: 8 time: 1248 log/log) Arg: 256 Characteristic: 8 time: 16080 Ariba) Arg: 256 Characteristic: 8 time: 48 BST) Arg: 256 Characteristic: 8 time: 368 Provide an integer: 512 bsr) Arg: 512 Characteristic: 9 time: 1120 log/log) Arg: 512 Characteristic: 9 time: 16080 Ariba) Arg: 512 Characteristic: 9 time: 48 BST) Arg: 512 Characteristic: 9 time: 368 Provide an integer: 1024 bsr) Arg: 1024 Characteristic: 10 time: 992 log/log) Arg: 1024 Characteristic: 10 time: 16080 Ariba) Arg: 1024 Characteristic: 10 time: 48 BST) Arg: 1024 Characteristic: 10 time: 400 Provide an integer: 2000000000 bsr) Arg: 2000000000 Characteristic: 30 time: 528 log/log) Arg: 2000000000 Characteristic: 30 time: 70992 Ariba) Arg: 2000000000 Characteristic: 30 time: 48 BST) Arg: 2000000000 Characteristic: 30 time: 384 Provide an integer: 66000 bsr) Arg: 66000 Characteristic: 16 time: 1264 log/log) Arg: 66000 Characteristic: 16 time: 68848 Ariba) Arg: 66000 Characteristic: 16 time: 48 BST) Arg: 66000 Characteristic: 16 time: 432
My next step is to test the code that you were kind enough to provide. I will share my results-Phil
rev x,#0 'Reverse the bits in x. neg y,x 'Isolate least-significant "one" bit. and x,y wz mov y,#31 if_z sub y,#1 'Locate that "one" bit, if any. test x,_0000ffff wz if_z sub y,#16 test x,_00ff00ff wz if_z sub y,#8 test x,_0f0f0f0f wz if_z sub y,#4 test x,_33333333 wz if_z sub y,#2 test x,_55555555 wz if_z sub y,#1