bsr or (int)log2 in Prop C / PASM help too (Optimization Problem)

mikeologist · 2017-12-20 21:06

I need a bsr type function for integers in Prop C.
In the past I've used (int)log2(arg) because it works quickly on systems with FPUs. I assume this uses the processor's log tables to perform log(arg)/log(2).

Since the propeller has no FPU I am concerned about how long this takes, so I'm trying to test it.
Here is my code so far:

#include "simpletools.h"                      // Include simple tools
//#include <math.h>

int main() {                                  // Entry point
  int index;                                  // location within the binary integer
  int pow2a;                                  // Characteristic sans mantissa, first test
  int pow2b;                                  // Characteristic sans mantissa, second test 
  int num_in;                                 // log2 argument
  unsigned int startA;
  unsigned int endA;
  unsigned int startB;
  unsigned int endB;
   
  while(1) {                                  // Primary loop
    print("Provide an integer: ");            // Prompt user
    scan("%d\n", &num_in);                    // Collect input
    
    //startA = __current_pc();
    index = 32;                               // length of int in bits
    pow2a = -1;                               // flag value
    while((pow2a == -1) & (index-- > 0)) {    // While no characteristic and bits remain, decrement position
      if(num_in & (1 << index)) {             // if the current bit is on
        pow2a = index;                        // record its location as the log2 characteristic
        break;                                // and exit
      }
    }
    //endA = __current_pc();
    
    //startB = __current_pc();
    // pow2b = (int)log2(num_in);              // use log tables? log(num_in)/log(2)? cast to integer
    pow2b = (int)(log(num_in)/log(2));         // guess that answers ^
    //endB = __current_pc();
    
    //print("Result A: %d time: %d Result B: %d time: %d \n", pow2a, (endA-startA), pow2b, (endB-startB)); // display the results
    print("Result A: %d Result B: %d \n", pow2a, pow2b); // display the results
  }      
}

and the results:

Provide an integer: 8675309
Result A: 23 Result B: 23 
Provide an integer: 0
Result A: -1 Result B: -2147483648 
Provide an integer: 1
Result A: 0 Result B: 0 
Provide an integer: 24601
Result A: 14 Result B: 14 
Provide an integer: 2
Result A: 1 Result B: 1 
Provide an integer: 3
Result A: 1 Result B: 1 
Provide an integer: 4
Result A: 2 Result B: 2 
Provide an integer: 90210
Result A: 16 Result B: 16

I get the expected results including a negative value for undefined: log2(0).

There are a few issues:
No log2 in this gcc?
How do I read the PC so I can test the speed in clocks? Is this a valid method of timing?
Is there a better(faster) way than either of these? I read through the propeller manual and I did not find anything on log2 or bsr.

Dave Hein · 2017-12-20 22:14

I looked in the library source code, and there is code for a log2. However, there doesn't seem to be an object file for log2 in the pre-built libraries, so log(x)/log(2) would be the way to do it with the current library. The problem with this is that it uses floating point, so it's going to be slow.

The propeller doesn't have a BSR instruction. Finding the most significant 1 bit has to be done in software. The prop does have a log lookup table, but I think it's only accurate to 12 bits. So if that accuracy is OK you could use the PASM code that is in the Prop manual to implement an integer version of log2.

The propeller system counter is accessible through the CNT variable. Include propeller.h to get the definition for it.

mikeologist · 2017-12-20 23:14

Updated code:

#include "simpletools.h"                      // scan(), print()
#include "propeller.h"                        // CNT

int main() {                                  // Entry point
  int index;                                  // location within the binary integer
  int pow2a;                                  // Characteristic sans mantissa, first test
  int pow2b;                                  // Characteristic sans mantissa, second test 
  int num_in;                                 // log2 argument
  unsigned int startA;
  unsigned int endA;
  unsigned int startB;
  unsigned int endB;
   
  while(1) {                                  // Primary loop
    print("Provide an integer: ");            // Prompt user
    scan("%d\n", &num_in);                    // Collect input
    
    startA = CNT;
    index = 32;                               // length of int in bits
    pow2a = -1;                               // flag value
    while((pow2a == -1) & (index-- > 0)) {    // While no characteristic and bits remain, decrement position
      if(num_in & (1 << index)) {             // if the current bit is on
        pow2a = index;                        // record its location as the log2 characteristic
        break;                                // and exit
      }
    }
    endA = CNT;
    
    startB = CNT;
    // pow2b = (int)log2(num_in);              // use log tables? log(num_in)/log(2)? cast to integer
    pow2b = (int)(log(num_in)/log(2));         // guess that answers ^
    endB = CNT;
    
    print("Result A: %d time: %d Result B: %d time: %d \n", pow2a, (endA-startA), pow2b, (endB-startB)); // display the results
    //print("Result A: %d Result B: %d \n", pow2a, pow2b); // display the results
  }      
}

Dave Hein wrote: »

I looked in the library source code, and there is code for a log2. However, there doesn't seem to be an object file for log2 in the pre-built libraries, so log(x)/log(2) would be the way to do it with the current library. The problem with this is that it uses floating point, so it's going to be slow.

Oh so slow:

Provide an integer: 8675309
Result A: 23 time: 9312 Result B: 23 time: 319840 
Provide an integer: 24601
Result A: 14 time: 15648 Result B: 14 time: 327136 
Provide an integer: 2
Result A: 1 time: 24800 Result B: 1 time: 55072 
Provide an integer: 1
Result A: 0 time: 25504 Result B: 0 time: 16032 
Provide an integer: 2000000000
Result A: 30 time: 7392 Result B: 30 time: 326752

The propeller doesn't have a BSR instruction. Finding the most significant 1 bit has to be done in software. The prop does have a log lookup table, but I think it's only accurate to 12 bits. So if that accuracy is OK you could use the PASM code that is in the Prop manual to implement an integer version of log2.

The propeller system counter is accessible through the CNT variable. Include propeller.h to get the definition for it.

Does CNT give an accurate reading here? If so, What a difference!
Would this be faster in PASM? I think the answer is almost certainly yes. If so I can figure it out and use this guide to inject the PASM directly.

Thanks for your help. Any pitfalls that you can see or recommended reading, please?

mikeologist · 2017-12-21 05:18

I used spin2cpp to generate:

INLINE__ int32_t BitEncode__(uint32_t a) {
  int r=0;
  while (a != 0) { a = a>>1; r++; }
  return r;
}

from:

pow2c := >|num_in

Which is pretty much the same as mine.

I think bitwise encoding -1 in PASM is likely to be the fastest way.

  int pow2c;
  int num_in;

     __asm__(
      /* TODO: 
               bitwise encode num_in
               subtract 1
               store in pow2c */

    : /* output out = pow2c, write only */
      [out]   "=r"  (pow2c)
    : /* input in = num_in              */
      [in]    "r"   (num_in)
    );

Now I just have to figure out how to fill in the TODO.

rjo__ · 2017-12-21 05:29

I never tried this on a P1 but it does work on the P2. My interest was in measuring the information of various image areas.

You don't mention the scope of your calculation... that is how many potential populations are you interested in. I found that for 8 bit images... at least for measuring information... it is possible to do the frequency based calculations once and store the results... multiply them out to remove the decimals and then put the results in a file, which is read in at run time. That reduces the problem to a sum over the population... about a 8-12 clocks per calculation before parsing the result into a final float.

mikeologist · 2017-12-21 05:51

rjo__ wrote: »

I never tried this on a P1 but it does work on the P2. My interest was in measuring the information of various image areas.

You don't mention the scope of your calculation... that is how many potential populations are you interested in. I found that for 8 bit images... at least for measuring information... it is possible to do the frequency based calculations once and store the results... multiply them out to remove the decimals and then put the results in a file, which is read in at run time. That reduces the problem to a sum over the population... about a 8-12 clocks per calculation before parsing the result into a final float.

I have many uses for this in my float-free math library that I'm trying to cram into a propeller.
One specific use is converting integers to array indexes for my partial predictive factoring method.
In this the highest power of two points to 1 of 32 columns in a 2D array of ints, where the row is selected by the int's hamming weight mod(4), which is also a simple bitwise opertion.

I'm not opposed to a file. I'm looking for the smallest fastest option; could be code, could be a table.

MIchael_Michalski · 2017-12-21 09:42

The brute force way is to just shift left until you find the leading 1 bit. But right away it's obvious how to half the work.

Assuming a 32bit word, with bits numbered 1-32 I think this might work.

Add the operand to ff:ff:00:00. Update C but not Z
If it sets carry, then the first nonzero bit is at location 17 or more so add 16 to your result (do not update c or z)

If c was not set then shift the operand left by 16 places. (And update z)

Add ff:00:00:00 to the already shifted operand and update c but not z
If it sets carry, add 8 to result (don't update c or z)
If not, rotate left by 8

Add f0:00:00:00 and update c but not z
If c, then add 4 to result
If not shift left 4

Add c0:00:00:00 and update c but not z
If c then add two to result
If not, shift left by two

Add 80:00:00:00 and update c but not z
If c then add 1 to result
If not then if z is not set (from the first shift, that's why we don't update it) then add one more.

Unless I've made some error in logic, that should get you the index of the first nonzero bit, or zero if they are all zero and should take something like 3*log2(n) instructions on a propeller. (32 bits is 2^5, so I takes about 5 rounds or 15 instructions to find it.

Heater. · 2017-12-21 10:39

How about one of these algorithms to find the log base 2 of an N-bit integer in O(lg(N)) operations:

https://graphics.stanford.edu/~seander/bithacks.html#IntegerLog

Looks like they could be done in PASM pretty neatly. The first one could do with some loop unrolling.

Better than brute force shifting through 32 bits.

mikeologist · 2017-12-21 13:41

@MIchael_Michalski
@Heater

Thanks a bunch, that's the stuff I was looking for. Now I've got plenty to work on over Christmas :-D
Results to follow

Phil Pilgrim (PhiPi) · 2017-12-21 20:39

Here's a PASM routine (untested) for finding the most significant "one" bit of a long that executes in 13 instructions:

'Find msb of x; answer in y (1 - 32, or zero if no "one" bits).

              mov       y,#0
              test      x,_ffff0000 wz
        if_nz add       y,#16
              test      x,_ff00ff00 wz
        if_nz add       y,#8
              test      x,_f0f0f0f0 wz
        if_nz add       y,#4
              test      x,_cccccccc wz
        if_nz add       y,#2
              test      x,_aaaaaaaa wz
        if_nz add       y,#1
              test      x,x wz
        if_nz add       y,#1

_ffff0000     long      $ffff0000
_ff00ff00     long      $ff00ff00
_f0f0f0f0     long      $f0f0f0f0
_cccccccc     long      $cccccccc
_aaaaaaaa     long      $aaaaaaaa

-Phil

Heater. · 2017-12-21 20:58

Excellent, Phil, I just started day dreaming about such a binary search technique in PASM after looking at the C versions I posted above. You beat me to it.

But don't we have to AND the masks with x rather than TEST?

TEST will leave the x unchanged. Which means any lower bits it has will still be there for later TESTs. Which will then be non-zero when they should not be.

Sadly I'm way out in the Finnish forest far away from any Propellers to test this with.

mikeologist · 2017-12-21 23:14

Phil Pilgrim (PhiPi) wrote: »

Here's a PASM routine (untested) for finding the most significant "one" bit of a long that executes in 13 instructions:

'Find msb of x; answer in y (1 - 32, or zero if no "one" bits).

              mov       y,#0
              test      x,_ffff0000 wz
        if_nz add       y,#16
              test      x,_ff00ff00 wz
        if_nz add       y,#8
              test      x,_f0f0f0f0 wz
        if_nz add       y,#4
              test      x,_cccccccc wz
        if_nz add       y,#2
              test      x,_aaaaaaaa wz
        if_nz add       y,#1
              test      x,x wz
        if_nz add       y,#1

_ffff0000     long      $ffff0000
_ff00ff00     long      $ff00ff00
_f0f0f0f0     long      $f0f0f0f0
_cccccccc     long      $cccccccc
_aaaaaaaa     long      $aaaaaaaa

-Phil

Thank you very much @Phil

I got it to work inline and it's nice and fast, here are the results:

Provide an integer: 8675309
bsr)         Arg: 8675309 Characteristic: 23 time: 6448
log/log)     Arg: 8675309 Characteristic: 23 time: 363072
inline PASM) Arg: 8675309 Characteristic: 32 time: 3072
Provide an integer: 2000000000
bsr)         Arg: 2000000000 Characteristic: 30 time: 1520
log/log)     Arg: 2000000000 Characteristic: 30 time: 375248
inline PASM) Arg: 2000000000 Characteristic: 32 time: 3072
Provide an integer: 0
bsr)         Arg: 0 Characteristic: -1 time: 22960
log/log)     Arg: 0 Characteristic: -2147483648 time: 18800
inline PASM) Arg: 0 Characteristic: 0 time: 2384
Provide an integer: 1
bsr)         Arg: 1 Characteristic: 0 time: 22640
log/log)     Arg: 1 Characteristic: 0 time: 26544
inline PASM) Arg: 1 Characteristic: 1 time: 2496
Provide an integer: 36
bsr)         Arg: 36 Characteristic: 5 time: 19120
log/log)     Arg: 36 Characteristic: 5 time: 368000
inline PASM) Arg: 36 Characteristic: 8 time: 2832

Results are a bit off, though

and here is the code:

#include "simpletools.h"                      // scan(), print()
#include "propeller.h"                        // CNT

int main() {                                  // Entry point
  int index;                                  // location within the binary integer
  int pow2;                                   // Characteristic sans mantissa, first test
  int num_in;                                 // log2 argument
  unsigned int start;
  unsigned int end;
  long _ffff0000 = 0xffff0000;
  long _ff00ff00 = 0xff00ff00;
  long _f0f0f0f0 = 0xf0f0f0f0;
  long _cccccccc = 0xcccccccc;
  long _aaaaaaaa = 0xaaaaaaaa;
   
  while(1) {                                  // Primary loop
    print("Provide an integer: ");            // Prompt user
    scan("%d\n", &num_in);                    // Collect input
    
    start = CNT;
    index = 32;                               // length of int in bits
    pow2 = -1;                               // flag value
    while((pow2 == -1) & (index-- > 0)) {    // While no characteristic and bits remain, decrement position
      if(num_in & (1 << index)) {             // if the current bit is on
        pow2 = index;                        // record its location as the log2 characteristic
        break;                                // and exit
      }
    }
    end = CNT;
    
    print("bsr)         Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
    
    start = CNT;
    pow2 = (int)(log(num_in)/log(2));         // guess that answers ^
    end = CNT; 
    
    print("log/log)     Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
   
    start = CNT;
    __asm__ volatile (
              "mov       %[y],#0\n\t"
              "test      %[x],%[_ffff0000] wz\n\t"
        "if_nz add       %[y],#16\n\t"
              "test      %[x],%[_ff00ff00] wz\n\t"
        "if_nz add       %[y],#8\n\t"
              "test      %[x],%[_f0f0f0f0] wz\n\t"
        "if_nz add       %[y],#4\n\t"
              "test      %[x],%[_cccccccc] wz\n\t"
        "if_nz add       %[y],#2\n\t"
              "test      %[x],%[_aaaaaaaa] wz\n\t"
        "if_nz add       %[y],#1\n\t"
              "test      %[x],%[x] wz\n\t"
        "if_nz add       %[y],#1\n\t"

      
    : /*outputs (+inputs) */
      [y]         "+r" (pow2)
    : /*inputs */
      [x]         "r"  (num_in),
      [_ffff0000] "r"  (_ffff0000),
      [_ff00ff00] "r"  (_ff00ff00),
      [_f0f0f0f0] "r"  (_f0f0f0f0),
      [_cccccccc] "r"  (_cccccccc),
      [_aaaaaaaa] "r"  (_aaaaaaaa)
      
    );
    end = CNT;
  
    print("inline PASM) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
  }      
}

In my inexperience I couldn't get "_ffff0000 long $ffff0000" and the rest to work, compiler said unknown command, but I adapted. Does that impact the performance?

Either way it's a lot better than mine :cool:
Thanks again

mikeologist · 2017-12-21 23:28

I'll have to play around with it
With a functioning example I think I can move forward.

Heater. · 2017-12-22 04:49

Don't move forward. It's not working yet. As you say, results are a bit off.

Could you try changing those "test" instructions to "and".

That will overwrite y so you will need to save that first. Or copy it to a temporary "t" and work on that.

mikeologist · 2017-12-22 06:03

Heater. wrote: »

Don't move forward. It's not working yet. As you say, results are a bit off.

Could you try changing those "test" instructions to "and".

That will overwrite y so you will need to save that first. Or copy it to a temporary "t" and work on that.

Move forward as in fixing it :thumb:
I'll make a table tomorrow and apply your suggestions, thank you.

Heater. · 2017-12-22 06:06

Ah yes, OK.

I should have interpreted "functioning example" as "compiles and running but malfunctioning example"

mikeologist · 2017-12-22 06:08

Heater. wrote: »

Ah yes, OK.

I should have interpreted "functioning example" as "compiles and running but malfunctioning example"

Hmm, yes quite

Phil Pilgrim (PhiPi) · 2017-12-22 06:29

'Sorry for the red herring, guys. Here's a quick counter-example where my algo fails:

x = $00018000

The correct answer is 16. But that second one bit screws things up, causing an additional 8, 4, 2, 1, and 1 to be added to y. Dang!

Try this instead:

'Find msb of x; answer in y (1 - 32, or zero if no "one" bits).

              mov       y,#32
              test      x,_ffff0000 wz
        if_z  sub       y,#16
              test      x,_ff00ff00 wz
        if_z  sub       y,#8
              test      x,_f0f0f0f0 wz
        if_z  sub       y,#4
              test      x,_cccccccc wz
        if_z  sub       y,#2
              test      x,_aaaaaaaa wz
        if_z  sub       y,#1
              test      x,x wz
        if_z  sub       y,#1

_ffff0000     long      $ffff0000
_ff00ff00     long      $ff00ff00
_f0f0f0f0     long      $f0f0f0f0
_cccccccc     long      $cccccccc
_aaaaaaaa     long      $aaaaaaaa

This is actually the program I started with. But I thought it would be less tricky-looking to use add and if_nz. I neglected to consider that z and nz are not symmetrical opposites.

-Phil

mikeologist · 2017-12-22 07:11

I'm still having some issues:

Provide an integer: 2000000000
bsr)         Arg: 2000000000 Characteristic: 30 time: 1616
log/log)     Arg: 2000000000 Characteristic: 30 time: 375824
inline PASM) Arg: 2000000000 Characteristic: 32 time: 2416
Provide an integer: 20000000
bsr)         Arg: 20000000 Characteristic: 24 time: 5840
log/log)     Arg: 20000000 Characteristic: 24 time: 369776
inline PASM) Arg: 20000000 Characteristic: 32 time: 2416
Provide an integer: 1984
bsr)         Arg: 1984 Characteristic: 10 time: 15696
log/log)     Arg: 1984 Characteristic: 10 time: 367808
inline PASM) Arg: 1984 Characteristic: 16 time: 2544
Provide an integer: 1
bsr)         Arg: 1 Characteristic: 0 time: 22736
log/log)     Arg: 1 Characteristic: 0 time: 27120
inline PASM) Arg: 1 Characteristic: 1 time: 2992
Provide an integer: 2
bsr)         Arg: 2 Characteristic: 1 time: 22032
log/log)     Arg: 2 Characteristic: 1 time: 97632
inline PASM) Arg: 2 Characteristic: 2 time: 2880
Provide an integer: 4
bsr)         Arg: 4 Characteristic: 2 time: 21328
log/log)     Arg: 4 Characteristic: 2 time: 97632
inline PASM) Arg: 4 Characteristic: 3 time: 2880
Provide an integer: 0
bsr)         Arg: 0 Characteristic: -1 time: 23280
log/log)     Arg: 0 Characteristic: -2147483648 time: 19376
inline PASM) Arg: 0 Characteristic: 0 time: 3104
Provide an integer: 2147483647
bsr)         Arg: 2147483647 Characteristic: 30 time: 1616
log/log)     Arg: 2147483647 Characteristic: 31 time: 107968
inline PASM) Arg: 2147483647 Characteristic: 32 time: 2416

even found a rounding error in the log/log method :-)

    __asm__ volatile (
              "mov       %[y],#32\n\t"
              "test      %[x],%[_ffff0000] wz\n\t"
        "if_z  sub       %[y],#16\n\t"
              "test      %[x],%[_ff00ff00] wz\n\t"
        "if_z  sub       %[y],#8\n\t"
              "test      %[x],%[_f0f0f0f0] wz\n\t"
        "if_z  sub       %[y],#4\n\t"
              "test      %[x],%[_cccccccc] wz\n\t"
        "if_z  sub       %[y],#2\n\t"
              "test      %[x],%[_aaaaaaaa] wz\n\t"
        "if_z  sub       %[y],#1\n\t"
              "test      %[x],%[x] wz\n\t"
        "if_z  sub       %[y],#1\n\t"
      
    : /*outputs (+inputs) */
      [y]         "+r" (pow2)
    : /*inputs */
      [x]         "r"  (num_in),
      [_ffff0000] "r"  (_ffff0000),
      [_ff00ff00] "r"  (_ff00ff00),
      [_f0f0f0f0] "r"  (_f0f0f0f0),
      [_cccccccc] "r"  (_cccccccc),
      [_aaaaaaaa] "r"  (_aaaaaaaa)
      
    );

mikeologist · 2017-12-22 07:12

thanks for the continued help :-D

Ariba · 2017-12-22 09:17

This should be quite fast, it compiles to native Assembly hold in the cogram cache:

__attribute__((fcache))
int topOne(unsigned int a)
{
  int r=32;
  if(a==0) return 0;
  if(!(a & 0xFFFF0000)) {r -= 16; a <<= 16;}
  if(!(a & 0xFF000000)) {r -= 8; a <<= 8;}
  if(!(a & 0xF0000000)) {r -= 4; a <<= 4;}
  if(!(a & 0xC0000000)) {r -= 2; a <<= 2;}
  if(!(a & 0x80000000)) r -= 1;
  return r;
}

It may be a bit faster if translated to PASM by hand.

Andy

mikeologist · 2017-12-22 09:56

Ariba wrote: »
This should be quite fast, it compiles to native Assembly hold in the cogram cache:
__attribute__((fcache))
int topOne(unsigned int a)
{
  int r=32;
  if(a==0) return 0;
  if(!(a & 0xFFFF0000)) {r -= 16; a <<= 16;}
  if(!(a & 0xFF000000)) {r -= 8; a <<= 8;}
  if(!(a & 0xF0000000)) {r -= 4; a <<= 4;}
  if(!(a & 0xC0000000)) {r -= 2; a <<= 2;}
  if(!(a & 0x80000000)) r -= 1;
  return r;
}
It may be a bit faster if translated to PASM by hand.

Andy

Thanks a bunch, Andy;
This taught me a good bit more about C

I put it in directly, it's fast and accurate:

Provide an integer: 2000000000
bsr)         Arg: 2000000000 Characteristic: 30 time: 1888
log/log)     Arg: 2000000000 Characteristic: 30 time: 375136
inline PASM) Arg: 2000000000 Characteristic: 32 time: 2416
Ariba)       Arg: 2000000000 Characteristic: 30 time: 2528

Provide an integer: 200000000
bsr)         Arg: 200000000 Characteristic: 27 time: 4000
log/log)     Arg: 200000000 Characteristic: 27 time: 375232
inline PASM) Arg: 200000000 Characteristic: 32 time: 2416
Ariba)       Arg: 200000000 Characteristic: 27 time: 2752

Provide an integer: 8192
bsr)         Arg: 8192 Characteristic: 13 time: 13856
log/log)     Arg: 8192 Characteristic: 12 time: 102608
inline PASM) Arg: 8192 Characteristic: 14 time: 2656
Ariba)       Arg: 8192 Characteristic: 13 time: 2880

Provide an integer: 0
bsr)         Arg: 0 Characteristic: -1 time: 23552
log/log)     Arg: 0 Characteristic: -2147483648 time: 19088
inline PASM) Arg: 0 Characteristic: 0 time: 3104
Ariba)       Arg: 0 Characteristic: -1 time: 512

Provide an integer: 1
bsr)         Arg: 1 Characteristic: 0 time: 23008
log/log)     Arg: 1 Characteristic: 0 time: 26464
inline PASM) Arg: 1 Characteristic: 1 time: 2992
Ariba)       Arg: 1 Characteristic: 0 time: 3328

Provide an integer: 2
bsr)         Arg: 2 Characteristic: 1 time: 22304
log/log)     Arg: 2 Characteristic: 1 time: 96976
inline PASM) Arg: 2 Characteristic: 2 time: 2880
Ariba)       Arg: 2 Characteristic: 1 time: 3328

Provide an integer: 4
bsr)         Arg: 4 Characteristic: 2 time: 21600
log/log)     Arg: 4 Characteristic: 2 time: 96976
inline PASM) Arg: 4 Characteristic: 3 time: 2880
Ariba)       Arg: 4 Characteristic: 2 time: 3104

Provide an integer: 8
bsr)         Arg: 8 Characteristic: 3 time: 20896
log/log)     Arg: 8 Characteristic: 3 time: 98656
inline PASM) Arg: 8 Characteristic: 4 time: 2768
Ariba)       Arg: 8 Characteristic: 3 time: 3104

Case 8192 has me curious.

Here's the updated code:

#include "simpletools.h"                      // scan(), print()
#include "propeller.h"                        // CNT

int main() {                                  // Entry point
  int index;                                  // location within the binary integer
  int pow2;                                   // Characteristic sans mantissa, first test
  unsigned int num_in;                 // log2 argument
  unsigned int start;
  unsigned int end;
  long _ffff0000 = 0xffff0000;
  long _ff00ff00 = 0xff00ff00;
  long _f0f0f0f0 = 0xf0f0f0f0;
  long _cccccccc = 0xcccccccc;
  long _aaaaaaaa = 0xaaaaaaaa;
   
  while(1) {                                  // Primary loop
    print("Provide an integer: ");            // Prompt user
    scan("%d\n", &num_in);                    // Collect input
    
    start = CNT;
    index = 32;                               // length of int in bits
    pow2 = -1;                                // flag value
    while((pow2 == -1) & (index-- > 0)) {     // While no characteristic and bits remain, decrement position
      if(num_in & (1 << index)) {             // if the current bit is on
        pow2 = index;                         // record its location as the log2 characteristic
        break;                                // and exit
      }
    }
    end = CNT;
    
    print("bsr)         Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
    
    start = CNT;
    pow2 = (int)(log(num_in)/log(2));
    end = CNT; 
    
    print("log/log)     Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
   
    start = CNT;
    __asm__ volatile (
              "mov       %[y],#32\n\t"
              "test      %[x],%[_ffff0000] wz\n\t"
        "if_z  sub       %[y],#16\n\t"
              "test      %[x],%[_ff00ff00] wz\n\t"
        "if_z  sub       %[y],#8\n\t"
              "test      %[x],%[_f0f0f0f0] wz\n\t"
        "if_z  sub       %[y],#4\n\t"
              "test      %[x],%[_cccccccc] wz\n\t"
        "if_z  sub       %[y],#2\n\t"
              "test      %[x],%[_aaaaaaaa] wz\n\t"
        "if_z  sub       %[y],#1\n\t"
              "test      %[x],%[x] wz\n\t"
        "if_z  sub       %[y],#1\n\t"
      
    : /*outputs (+inputs) */
      [y]         "+r" (pow2)
    : /*inputs */
      [x]         "r"  (num_in),
      [_ffff0000] "r"  (_ffff0000),
      [_ff00ff00] "r"  (_ff00ff00),
      [_f0f0f0f0] "r"  (_f0f0f0f0),
      [_cccccccc] "r"  (_cccccccc),
      [_aaaaaaaa] "r"  (_aaaaaaaa)
      
    );
    end = CNT;
  
    print("inline PASM) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
    
    unsigned int a = num_in;
    start = CNT;
    pow2 = 32;
    if(a == 0) {pow2 = 0;} 
    else {
      if(!(a & 0xFFFF0000)) {pow2 -= 16; a <<= 16;}
      if(!(a & 0xFF000000)) {pow2 -= 8; a <<= 8;}
      if(!(a & 0xF0000000)) {pow2 -= 4; a <<= 4;}
      if(!(a & 0xC0000000)) {pow2 -= 2; a <<= 2;}
      if(!(a & 0x80000000)) pow2 -= 1;
    }
    pow2--;
    end = CNT;
    
    print("Ariba)       Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results      
  }         
}

It may be a bit faster if translated to PASM by hand.

Challenge accepted

iseries · 2017-12-22 17:32

Please note that if the compiler is set to CMM it will generate interpretive assembly.

Switch to LMM will cause the compiler to generate hardware assembly and run much faster but use more memory.

Mike

mikeologist · 2017-12-22 18:24

Woah

Provide an integer: 2000000000
bsr)         Arg: 2000000000 Characteristic: 30 time: 352
log/log)     Arg: 2000000000 Characteristic: 30 time: 71104
inline PASM) Arg: 2000000000 Characteristic: 32 time: 480
Ariba)       Arg: 2000000000 Characteristic: 30 time: 512

Provide an integer: 1
bsr)         Arg: 1 Characteristic: 0 time: 4144
log/log)     Arg: 1 Characteristic: 0 time: 5776
inline PASM) Arg: 1 Characteristic: 1 time: 480
Ariba)       Arg: 1 Characteristic: 0 time: 512

Provide an integer: 0
bsr)         Arg: 0 Characteristic: -1 time: 4240
log/log)     Arg: 0 Characteristic: -2147483648 time: 4640
inline PASM) Arg: 0 Characteristic: 0 time: 480
Ariba)       Arg: 0 Characteristic: -1 time: 128

Provide an integer: 8675309
bsr)         Arg: 8675309 Characteristic: 23 time: 1264
log/log)     Arg: 8675309 Characteristic: 23 time: 69200
inline PASM) Arg: 8675309 Characteristic: 32 time: 480
Ariba)       Arg: 8675309 Characteristic: 23 time: 512

Provide an integer: 8192
bsr)         Arg: 8192 Characteristic: 13 time: 2512
log/log)     Arg: 8192 Characteristic: 12 time: 16496
inline PASM) Arg: 8192 Characteristic: 14 time: 480
Ariba)       Arg: 8192 Characteristic: 13 time: 512

Thank you.

Reading more on the subject here. Is there any more reading that you'd recommend? Specifically a sample of C that compiles into the COG memory mode and any theory.

Phil Pilgrim (PhiPi) · 2017-12-25 20:48

Third time's a charm. (I've actually tested this one.) The key is to reverse the bits in x, since isolating the former most-significant is quicker when it's the least-significant one. Then it's a matter of using the parallel method of locating it. Here's the code (derived, in part, from the site that Heater cited above):

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

VAR

  long new, value, log

OBJ

  sio   : "Parallax Serial Terminal"

PUB start

  sio.start(115200)
  cognew(@log2, @new)
  repeat
    value := sio.hexin   'Enter a number in hexadecimal
    new~~
    repeat while new
    sio.hex(value, 8)    'Print out original number.
    sio.char(" ")
    sio.dec(log)         'Print bit position.
    sio.char(13)

DAT

'Find msb "one" bit of x; answer in y (0 - 31, or -1 if x == 0).

log2          mov       xaddr,par
              add       xaddr,#4
              mov       yaddr,par
              add       yaddr,#8
              
:loop         rdlong    x,par wz
        if_z  jmp       #:loop

              rdlong    x,xaddr wz
              
'Begin computation (15 instructions)...

              rev       x,#0                    'Reverse the bits in x.
              neg       y,x                     'Isolate least-significant "one" bit.
              and       x,y
              mov       y,#31
        if_z  sub       y,#1                    'Locate that "one" bit, if any.
              test      x,_0000ffff wz
        if_z  sub       y,#16
              test      x,_00ff00ff wz
        if_z  sub       y,#8
              test      x,_0f0f0f0f wz
        if_z  sub       y,#4
              test      x,_33333333 wz
        if_z  sub       y,#2
              test      x,_55555555 wz
        if_z  sub       y,#1
        
'... End computation.
 
              wrlong    y,yaddr
              mov       x,#0
              wrlong    x,par
              jmp       #:loop

_0000ffff     long      $0000ffff
_00ff00ff     long      $00ff00ff
_0f0f0f0f     long      $0f0f0f0f
_33333333     long      $33333333
_55555555     long      $55555555

x             res       1
y             res       1
xaddr         res       1
yaddr         res       1

-Phil

mikeologist · 2017-12-25 21:31

Phil Pilgrim (PhiPi) wrote: »

Third time's a charm. (I've actually tested this one.) The key is to reverse the bits in x, since isolating the former most-significant is quicker when it's the least-significant one. Then it's a matter of using the parallel method of locating it. Here's the code (derived, in part, from the site that Heater cited above):

~

-Phil

Excellent, thank you
I look forward to testing it.
I'm also making a binary search model for the same goal just to see what I can see

I'll keep posting code and results

Merry Christmas

Phil Pilgrim (PhiPi) · 2017-12-25 21:56

mikeologist wrote:

... I'm also making a binary search model for the same goal just to see what I can see ...

That should work. PASM's sumxx operators will come in handy for that.

-Phil

Phil Pilgrim (PhiPi) · 2017-12-25 22:06

One minor change: I relied upon writing the zero flag in the rdlong x,xaddr instruction, but I didn't have to. Do this instead:

'Begin computation (15 instructions)...

              rev       x,#0                    'Reverse the bits in x.
              neg       y,x                     'Isolate least-significant "one" bit.
              and       x,y wz '<======= Add the wz here.

That will isolate the computation from the parameter acquisition, should they become more separated.

-Phil

mikeologist · 2017-12-28 00:19

Phil Pilgrim (PhiPi) wrote: »

mikeologist wrote:

... I'm also making a binary search model for the same goal just to see what I can see ...

That should work. PASM's sumxx operators will come in handy for that.

-Phil

I made the BST method fully unrolled to begin with and tucked the '0' case way down in and I think the nesting of logic has a major speed impact: (This would need to be condensed to a loop in order to be size and speed effective)

start = CNT;
    if(num_in < 0x00010000) {             // 0-15
      if(num_in < 0x00000100) {           // 0-7
        if(num_in < 0x00000010) {         // 0-3
          if(num_in < 0x0000004) {        // 0-1
            if(num_in < 0x00000002) { 
                if(num_in == 0) { pow2 = -1; }
                else { pow2 = 0; } 
              }                  
              else { pow2 = 1; }                                
          } else {                        // 2-3
            if(num_in < 0x00000008) { pow2 = 2; }
              else { pow2 = 3; }                                
          }                            
        } else {                          // 4-7
          if(num_in < 0x00000040) {       // 4-5
            if(num_in < 0x00000020) { pow2 = 4; } 
              else { pow2 = 5; }                                
          } else {                        // 6-7
            if(num_in < 0x00000080) { pow2 = 6; } 
              else { pow2 = 7; }                                
          }       
        }                        
      } else {                            // 8-15
        if(num_in < 0x00001000) {         // 8-11
          if(num_in < 0x00000400) {       // 8-9
            if(num_in < 0x00000200) { pow2 = 8; } 
              else { pow2 = 9; }                                
          } else {                        // 10-11
            if(num_in < 0x00000800) { pow2 = 10; } 
              else { pow2 = 11; }
          }                            
        } else {                          // 12-15
          if(num_in < 0x00004000) {       // 12-13
            if(num_in < 0x00002000) { pow2 = 12; } 
              else { pow2 = 13; }                                
          } else {                        // 14-15
            if(num_in < 0x00008000) { pow2 = 14; } 
              else { pow2 = 15; }
          } 
        }                        
      }                    
    } else {                              // 16-31
      if(num_in < 0x01000000) {           // 16-23
        if(num_in < 0x00100000) {         // 16-19
          if(num_in < 0x0040000) {        // 16-17
            if(num_in < 0x00020000) { pow2 = 16; } 
              else { pow2 = 17; }                                
          } else {                        // 18-19
            if(num_in < 0x00080000) { pow2 = 18; } 
              else { pow2 = 19; }                                
          }                            
        } else {                          // 20-23
          if(num_in < 0x00400000) {       // 20-21
            if(num_in < 0x00200000) { pow2 = 20; } 
              else { pow2 = 21; }                                
          } else {                        // 22-23
            if(num_in < 0x00800000) { pow2 = 22; } 
              else { pow2 = 23; }                                
          }       
        }                        
      } else {                            // 24-31
        if(num_in < 0x10000000) {         // 24-27
          if(num_in < 0x04000000) {       // 24-25
            if(num_in < 0x02000000) { pow2 = 24; } 
              else { pow2 = 25; }                                
          } else {                        // 26-27
            if(num_in < 0x08000000) { pow2 = 26; } 
              else { pow2 = 27; }
          }                            
        } else {                          // 28-31
          if(num_in < 0x40000000) {       // 28-29
            if(num_in < 0x20000000) { pow2 = 28; } 
              else { pow2 = 29; }                                
          } else {                        // 30-31
            if(num_in < 0x80000000) { pow2 = 30; } 
              else { pow2 = 31; }
          } 
        }                        
      }
    }                           
    end = CNT;

It's the fastest that I had made up to that point. Then I looked at @Ariba code again and removed the nested logic and it sped up by a factor of 10 and turned it into a big theta, I like big theta.

unsigned int a = num_in;
    start = CNT;
    pow2 = 31;
//    if(a == 0) {pow2 = 0;} 
//    else {
      if(!(a & 0xFFFF0000)) {pow2 -= 16; a <<= 16;}
      if(!(a & 0xFF000000)) {pow2 -= 8; a <<= 8;}
      if(!(a & 0xF0000000)) {pow2 -= 4; a <<= 4;}
      if(!(a & 0xC0000000)) {pow2 -= 2; a <<= 2;}
      if(!(a & 0x80000000)) pow2 -= 1;
      if(a == 0) pow2 = -1;
//    }
    end = CNT;

So far the modified Ariba is the winner.

Provide an integer: 0
bsr)         Arg: 0 Characteristic: -1 time: 160
log/log)     Arg: 0 Characteristic: -2147483648 time: 4528
Ariba)       Arg: 0 Characteristic: -1 time: 48
BST)         Arg: 0 Characteristic: -1 time: 368
Provide an integer: 1
bsr)         Arg: 1 Characteristic: 0 time: 1264
log/log)     Arg: 1 Characteristic: 0 time: 5664
Ariba)       Arg: 1 Characteristic: 0 time: 48
BST)         Arg: 1 Characteristic: 0 time: 368
Provide an integer: 2
bsr)         Arg: 2 Characteristic: 1 time: 1136
log/log)     Arg: 2 Characteristic: 1 time: 16080
Ariba)       Arg: 2 Characteristic: 1 time: 48
BST)         Arg: 2 Characteristic: 1 time: 304
Provide an integer: 4
bsr)         Arg: 4 Characteristic: 2 time: 1008
log/log)     Arg: 4 Characteristic: 2 time: 16080
Ariba)       Arg: 4 Characteristic: 2 time: 48
BST)         Arg: 4 Characteristic: 2 time: 320
Provide an integer: 8
bsr)         Arg: 8 Characteristic: 3 time: 896
log/log)     Arg: 8 Characteristic: 3 time: 16240
Ariba)       Arg: 8 Characteristic: 3 time: 48
BST)         Arg: 8 Characteristic: 3 time: 320
Provide an integer: 
bsr)         Arg: 8 Characteristic: 3 time: 880
log/log)     Arg: 8 Characteristic: 3 time: 16240
Ariba)       Arg: 8 Characteristic: 3 time: 48
BST)         Arg: 8 Characteristic: 3 time: 320
Provide an integer: 16
bsr)         Arg: 16 Characteristic: 4 time: 768
log/log)     Arg: 16 Characteristic: 4 time: 16080
Ariba)       Arg: 16 Characteristic: 4 time: 48
BST)         Arg: 16 Characteristic: 4 time: 320
Provide an integer: 32
bsr)         Arg: 32 Characteristic: 5 time: 640
log/log)     Arg: 32 Characteristic: 5 time: 16080
Ariba)       Arg: 32 Characteristic: 5 time: 48
BST)         Arg: 32 Characteristic: 5 time: 320
Provide an integer: 64
bsr)         Arg: 64 Characteristic: 6 time: 512
log/log)     Arg: 64 Characteristic: 6 time: 16240
Ariba)       Arg: 64 Characteristic: 6 time: 48
BST)         Arg: 64 Characteristic: 6 time: 320
Provide an integer: 128
bsr)         Arg: 128 Characteristic: 7 time: 368
log/log)     Arg: 128 Characteristic: 7 time: 16352
Ariba)       Arg: 128 Characteristic: 7 time: 48
BST)         Arg: 128 Characteristic: 7 time: 320
Provide an integer: 256
bsr)         Arg: 256 Characteristic: 8 time: 1248
log/log)     Arg: 256 Characteristic: 8 time: 16080
Ariba)       Arg: 256 Characteristic: 8 time: 48
BST)         Arg: 256 Characteristic: 8 time: 368
Provide an integer: 512
bsr)         Arg: 512 Characteristic: 9 time: 1120
log/log)     Arg: 512 Characteristic: 9 time: 16080
Ariba)       Arg: 512 Characteristic: 9 time: 48
BST)         Arg: 512 Characteristic: 9 time: 368
Provide an integer: 1024
bsr)         Arg: 1024 Characteristic: 10 time: 992
log/log)     Arg: 1024 Characteristic: 10 time: 16080
Ariba)       Arg: 1024 Characteristic: 10 time: 48
BST)         Arg: 1024 Characteristic: 10 time: 400
Provide an integer: 2000000000
bsr)         Arg: 2000000000 Characteristic: 30 time: 528
log/log)     Arg: 2000000000 Characteristic: 30 time: 70992
Ariba)       Arg: 2000000000 Characteristic: 30 time: 48
BST)         Arg: 2000000000 Characteristic: 30 time: 384
Provide an integer: 66000
bsr)         Arg: 66000 Characteristic: 16 time: 1264
log/log)     Arg: 66000 Characteristic: 16 time: 68848
Ariba)       Arg: 66000 Characteristic: 16 time: 48
BST)         Arg: 66000 Characteristic: 16 time: 432

My next step is to test the code that you were kind enough to provide. I will share my results

Phil Pilgrim (PhiPi) · 2017-12-28 00:26

If you convert my PASM to C, I'm not sure how you will be able to express the rev operator.

-Phil

mikeologist · 2017-12-28 00:31

Phil Pilgrim (PhiPi) wrote: »

If you convert my PASM to C, I'm not sure how you will be able to express the rev operator.

-Phil

Since the PASM is so nice I was just planning on putting this portion inline:

              rev       x,#0                    'Reverse the bits in x.
              neg       y,x                     'Isolate least-significant "one" bit.
              and       x,y wz
              mov       y,#31
        if_z  sub       y,#1                    'Locate that "one" bit, if any.
              test      x,_0000ffff wz
        if_z  sub       y,#16
              test      x,_00ff00ff wz
        if_z  sub       y,#8
              test      x,_0f0f0f0f wz
        if_z  sub       y,#4
              test      x,_33333333 wz
        if_z  sub       y,#2
              test      x,_55555555 wz
        if_z  sub       y,#1

bsr or (int)log2 in Prop C / PASM help too (Optimization Problem)

Comments