Shop OBEX P1 Docs P2 Docs Learn Events
bsr or (int)log2 in Prop C / PASM help too (Optimization Problem) — Parallax Forums

bsr or (int)log2 in Prop C / PASM help too (Optimization Problem)

mikeologistmikeologist Posts: 337
edited 2017-12-22 10:45 in Propeller 1
I need a bsr type function for integers in Prop C.
In the past I've used (int)log2(arg) because it works quickly on systems with FPUs. I assume this uses the processor's log tables to perform log(arg)/log(2).

Since the propeller has no FPU I am concerned about how long this takes, so I'm trying to test it.
Here is my code so far:
#include "simpletools.h"                      // Include simple tools
//#include <math.h>

int main() {                                  // Entry point
  int index;                                  // location within the binary integer
  int pow2a;                                  // Characteristic sans mantissa, first test
  int pow2b;                                  // Characteristic sans mantissa, second test 
  int num_in;                                 // log2 argument
  unsigned int startA;
  unsigned int endA;
  unsigned int startB;
  unsigned int endB;
   
  while(1) {                                  // Primary loop
    print("Provide an integer: ");            // Prompt user
    scan("%d\n", &num_in);                    // Collect input
    
    //startA = __current_pc();
    index = 32;                               // length of int in bits
    pow2a = -1;                               // flag value
    while((pow2a == -1) & (index-- > 0)) {    // While no characteristic and bits remain, decrement position
      if(num_in & (1 << index)) {             // if the current bit is on
        pow2a = index;                        // record its location as the log2 characteristic
        break;                                // and exit
      }
    }
    //endA = __current_pc();
    
    //startB = __current_pc();
    // pow2b = (int)log2(num_in);              // use log tables? log(num_in)/log(2)? cast to integer
    pow2b = (int)(log(num_in)/log(2));         // guess that answers ^
    //endB = __current_pc();
    
    //print("Result A: %d time: %d Result B: %d time: %d \n", pow2a, (endA-startA), pow2b, (endB-startB)); // display the results
    print("Result A: %d Result B: %d \n", pow2a, pow2b); // display the results
  }      
}

and the results:
Provide an integer: 8675309
Result A: 23 Result B: 23 
Provide an integer: 0
Result A: -1 Result B: -2147483648 
Provide an integer: 1
Result A: 0 Result B: 0 
Provide an integer: 24601
Result A: 14 Result B: 14 
Provide an integer: 2
Result A: 1 Result B: 1 
Provide an integer: 3
Result A: 1 Result B: 1 
Provide an integer: 4
Result A: 2 Result B: 2 
Provide an integer: 90210
Result A: 16 Result B: 16

I get the expected results including a negative value for undefined: log2(0).

There are a few issues:
No log2 in this gcc?
How do I read the PC so I can test the speed in clocks? Is this a valid method of timing?
Is there a better(faster) way than either of these? I read through the propeller manual and I did not find anything on log2 or bsr.
«1

Comments

  • I looked in the library source code, and there is code for a log2. However, there doesn't seem to be an object file for log2 in the pre-built libraries, so log(x)/log(2) would be the way to do it with the current library. The problem with this is that it uses floating point, so it's going to be slow.

    The propeller doesn't have a BSR instruction. Finding the most significant 1 bit has to be done in software. The prop does have a log lookup table, but I think it's only accurate to 12 bits. So if that accuracy is OK you could use the PASM code that is in the Prop manual to implement an integer version of log2.

    The propeller system counter is accessible through the CNT variable. Include propeller.h to get the definition for it.
  • Updated code:
    #include "simpletools.h"                      // scan(), print()
    #include "propeller.h"                        // CNT
    
    int main() {                                  // Entry point
      int index;                                  // location within the binary integer
      int pow2a;                                  // Characteristic sans mantissa, first test
      int pow2b;                                  // Characteristic sans mantissa, second test 
      int num_in;                                 // log2 argument
      unsigned int startA;
      unsigned int endA;
      unsigned int startB;
      unsigned int endB;
       
      while(1) {                                  // Primary loop
        print("Provide an integer: ");            // Prompt user
        scan("%d\n", &num_in);                    // Collect input
        
        startA = CNT;
        index = 32;                               // length of int in bits
        pow2a = -1;                               // flag value
        while((pow2a == -1) & (index-- > 0)) {    // While no characteristic and bits remain, decrement position
          if(num_in & (1 << index)) {             // if the current bit is on
            pow2a = index;                        // record its location as the log2 characteristic
            break;                                // and exit
          }
        }
        endA = CNT;
        
        startB = CNT;
        // pow2b = (int)log2(num_in);              // use log tables? log(num_in)/log(2)? cast to integer
        pow2b = (int)(log(num_in)/log(2));         // guess that answers ^
        endB = CNT;
        
        print("Result A: %d time: %d Result B: %d time: %d \n", pow2a, (endA-startA), pow2b, (endB-startB)); // display the results
        //print("Result A: %d Result B: %d \n", pow2a, pow2b); // display the results
      }      
    }
    
    Dave Hein wrote: »
    I looked in the library source code, and there is code for a log2. However, there doesn't seem to be an object file for log2 in the pre-built libraries, so log(x)/log(2) would be the way to do it with the current library. The problem with this is that it uses floating point, so it's going to be slow.

    Oh so slow:
    Provide an integer: 8675309
    Result A: 23 time: 9312 Result B: 23 time: 319840 
    Provide an integer: 24601
    Result A: 14 time: 15648 Result B: 14 time: 327136 
    Provide an integer: 2
    Result A: 1 time: 24800 Result B: 1 time: 55072 
    Provide an integer: 1
    Result A: 0 time: 25504 Result B: 0 time: 16032 
    Provide an integer: 2000000000
    Result A: 30 time: 7392 Result B: 30 time: 326752
    
    The propeller doesn't have a BSR instruction. Finding the most significant 1 bit has to be done in software. The prop does have a log lookup table, but I think it's only accurate to 12 bits. So if that accuracy is OK you could use the PASM code that is in the Prop manual to implement an integer version of log2.

    The propeller system counter is accessible through the CNT variable. Include propeller.h to get the definition for it.

    Does CNT give an accurate reading here? If so, What a difference!
    Would this be faster in PASM? I think the answer is almost certainly yes. If so I can figure it out and use this guide to inject the PASM directly.

    Thanks for your help. Any pitfalls that you can see or recommended reading, please?
  • I used spin2cpp to generate:
    INLINE__ int32_t BitEncode__(uint32_t a) {
      int r=0;
      while (a != 0) { a = a>>1; r++; }
      return r;
    }
    

    from:
    pow2c := >|num_in
    

    Which is pretty much the same as mine.

    I think bitwise encoding -1 in PASM is likely to be the fastest way.
      int pow2c;
      int num_in;
    
         __asm__(
          /* TODO: 
                   bitwise encode num_in
                   subtract 1
                   store in pow2c */
    
        : /* output out = pow2c, write only */
          [out]   "=r"  (pow2c)
        : /* input in = num_in              */
          [in]    "r"   (num_in)
        );
    
    Now I just have to figure out how to fill in the TODO.
  • rjo__rjo__ Posts: 2,114
    I never tried this on a P1 but it does work on the P2. My interest was in measuring the information of various image areas.

    You don't mention the scope of your calculation... that is how many potential populations are you interested in. I found that for 8 bit images... at least for measuring information... it is possible to do the frequency based calculations once and store the results... multiply them out to remove the decimals and then put the results in a file, which is read in at run time. That reduces the problem to a sum over the population... about a 8-12 clocks per calculation before parsing the result into a final float.



  • rjo__ wrote: »
    I never tried this on a P1 but it does work on the P2. My interest was in measuring the information of various image areas.

    You don't mention the scope of your calculation... that is how many potential populations are you interested in. I found that for 8 bit images... at least for measuring information... it is possible to do the frequency based calculations once and store the results... multiply them out to remove the decimals and then put the results in a file, which is read in at run time. That reduces the problem to a sum over the population... about a 8-12 clocks per calculation before parsing the result into a final float.

    I have many uses for this in my float-free math library that I'm trying to cram into a propeller.
    One specific use is converting integers to array indexes for my partial predictive factoring method.
    In this the highest power of two points to 1 of 32 columns in a 2D array of ints, where the row is selected by the int's hamming weight mod(4), which is also a simple bitwise opertion.

    I'm not opposed to a file. I'm looking for the smallest fastest option; could be code, could be a table.
  • The brute force way is to just shift left until you find the leading 1 bit. But right away it's obvious how to half the work.

    Assuming a 32bit word, with bits numbered 1-32 I think this might work.

    Add the operand to ff:ff:00:00. Update C but not Z
    If it sets carry, then the first nonzero bit is at location 17 or more so add 16 to your result (do not update c or z)

    If c was not set then shift the operand left by 16 places. (And update z)

    Add ff:00:00:00 to the already shifted operand and update c but not z
    If it sets carry, add 8 to result (don't update c or z)
    If not, rotate left by 8

    Add f0:00:00:00 and update c but not z
    If c, then add 4 to result
    If not shift left 4

    Add c0:00:00:00 and update c but not z
    If c then add two to result
    If not, shift left by two

    Add 80:00:00:00 and update c but not z
    If c then add 1 to result
    If not then if z is not set (from the first shift, that's why we don't update it) then add one more.

    Unless I've made some error in logic, that should get you the index of the first nonzero bit, or zero if they are all zero and should take something like 3*log2(n) instructions on a propeller. (32 bits is 2^5, so I takes about 5 rounds or 15 instructions to find it.
  • Heater.Heater. Posts: 21,230
    How about one of these algorithms to find the log base 2 of an N-bit integer in O(lg(N)) operations:

    https://graphics.stanford.edu/~seander/bithacks.html#IntegerLog

    Looks like they could be done in PASM pretty neatly. The first one could do with some loop unrolling.

    Better than brute force shifting through 32 bits.
  • mikeologistmikeologist Posts: 337
    edited 2017-12-21 13:41
    @MIchael_Michalski
    @Heater

    Thanks a bunch, that's the stuff I was looking for. Now I've got plenty to work on over Christmas :-D
    Results to follow
  • Here's a PASM routine (untested) for finding the most significant "one" bit of a long that executes in 13 instructions:
    'Find msb of x; answer in y (1 - 32, or zero if no "one" bits).
    
                  mov       y,#0
                  test      x,_ffff0000 wz
            if_nz add       y,#16
                  test      x,_ff00ff00 wz
            if_nz add       y,#8
                  test      x,_f0f0f0f0 wz
            if_nz add       y,#4
                  test      x,_cccccccc wz
            if_nz add       y,#2
                  test      x,_aaaaaaaa wz
            if_nz add       y,#1
                  test      x,x wz
            if_nz add       y,#1
    
    _ffff0000     long      $ffff0000
    _ff00ff00     long      $ff00ff00
    _f0f0f0f0     long      $f0f0f0f0
    _cccccccc     long      $cccccccc
    _aaaaaaaa     long      $aaaaaaaa
    

    -Phil
  • Heater.Heater. Posts: 21,230
    edited 2017-12-21 21:10
    Excellent, Phil, I just started day dreaming about such a binary search technique in PASM after looking at the C versions I posted above. You beat me to it.

    But don't we have to AND the masks with x rather than TEST?

    TEST will leave the x unchanged. Which means any lower bits it has will still be there for later TESTs. Which will then be non-zero when they should not be.

    Sadly I'm way out in the Finnish forest far away from any Propellers to test this with.

  • mikeologistmikeologist Posts: 337
    edited 2017-12-21 23:22
    Here's a PASM routine (untested) for finding the most significant "one" bit of a long that executes in 13 instructions:
    'Find msb of x; answer in y (1 - 32, or zero if no "one" bits).
    
                  mov       y,#0
                  test      x,_ffff0000 wz
            if_nz add       y,#16
                  test      x,_ff00ff00 wz
            if_nz add       y,#8
                  test      x,_f0f0f0f0 wz
            if_nz add       y,#4
                  test      x,_cccccccc wz
            if_nz add       y,#2
                  test      x,_aaaaaaaa wz
            if_nz add       y,#1
                  test      x,x wz
            if_nz add       y,#1
    
    _ffff0000     long      $ffff0000
    _ff00ff00     long      $ff00ff00
    _f0f0f0f0     long      $f0f0f0f0
    _cccccccc     long      $cccccccc
    _aaaaaaaa     long      $aaaaaaaa
    

    -Phil

    Thank you very much @Phil

    I got it to work inline and it's nice and fast, here are the results:
    Provide an integer: 8675309
    bsr)         Arg: 8675309 Characteristic: 23 time: 6448
    log/log)     Arg: 8675309 Characteristic: 23 time: 363072
    inline PASM) Arg: 8675309 Characteristic: 32 time: 3072
    Provide an integer: 2000000000
    bsr)         Arg: 2000000000 Characteristic: 30 time: 1520
    log/log)     Arg: 2000000000 Characteristic: 30 time: 375248
    inline PASM) Arg: 2000000000 Characteristic: 32 time: 3072
    Provide an integer: 0
    bsr)         Arg: 0 Characteristic: -1 time: 22960
    log/log)     Arg: 0 Characteristic: -2147483648 time: 18800
    inline PASM) Arg: 0 Characteristic: 0 time: 2384
    Provide an integer: 1
    bsr)         Arg: 1 Characteristic: 0 time: 22640
    log/log)     Arg: 1 Characteristic: 0 time: 26544
    inline PASM) Arg: 1 Characteristic: 1 time: 2496
    Provide an integer: 36
    bsr)         Arg: 36 Characteristic: 5 time: 19120
    log/log)     Arg: 36 Characteristic: 5 time: 368000
    inline PASM) Arg: 36 Characteristic: 8 time: 2832
    

    Results are a bit off, though

    and here is the code:
    #include "simpletools.h"                      // scan(), print()
    #include "propeller.h"                        // CNT
    
    int main() {                                  // Entry point
      int index;                                  // location within the binary integer
      int pow2;                                   // Characteristic sans mantissa, first test
      int num_in;                                 // log2 argument
      unsigned int start;
      unsigned int end;
      long _ffff0000 = 0xffff0000;
      long _ff00ff00 = 0xff00ff00;
      long _f0f0f0f0 = 0xf0f0f0f0;
      long _cccccccc = 0xcccccccc;
      long _aaaaaaaa = 0xaaaaaaaa;
       
      while(1) {                                  // Primary loop
        print("Provide an integer: ");            // Prompt user
        scan("%d\n", &num_in);                    // Collect input
        
        start = CNT;
        index = 32;                               // length of int in bits
        pow2 = -1;                               // flag value
        while((pow2 == -1) & (index-- > 0)) {    // While no characteristic and bits remain, decrement position
          if(num_in & (1 << index)) {             // if the current bit is on
            pow2 = index;                        // record its location as the log2 characteristic
            break;                                // and exit
          }
        }
        end = CNT;
        
        print("bsr)         Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
        
        start = CNT;
        pow2 = (int)(log(num_in)/log(2));         // guess that answers ^
        end = CNT; 
        
        print("log/log)     Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
       
        start = CNT;
        __asm__ volatile (
                  "mov       %[y],#0\n\t"
                  "test      %[x],%[_ffff0000] wz\n\t"
            "if_nz add       %[y],#16\n\t"
                  "test      %[x],%[_ff00ff00] wz\n\t"
            "if_nz add       %[y],#8\n\t"
                  "test      %[x],%[_f0f0f0f0] wz\n\t"
            "if_nz add       %[y],#4\n\t"
                  "test      %[x],%[_cccccccc] wz\n\t"
            "if_nz add       %[y],#2\n\t"
                  "test      %[x],%[_aaaaaaaa] wz\n\t"
            "if_nz add       %[y],#1\n\t"
                  "test      %[x],%[x] wz\n\t"
            "if_nz add       %[y],#1\n\t"
    
          
        : /*outputs (+inputs) */
          [y]         "+r" (pow2)
        : /*inputs */
          [x]         "r"  (num_in),
          [_ffff0000] "r"  (_ffff0000),
          [_ff00ff00] "r"  (_ff00ff00),
          [_f0f0f0f0] "r"  (_f0f0f0f0),
          [_cccccccc] "r"  (_cccccccc),
          [_aaaaaaaa] "r"  (_aaaaaaaa)
          
        );
        end = CNT;
      
        print("inline PASM) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
      }      
    }
    

    In my inexperience I couldn't get "_ffff0000 long $ffff0000" and the rest to work, compiler said unknown command, but I adapted. Does that impact the performance?

    Either way it's a lot better than mine :cool:
    Thanks again
  • I'll have to play around with it
    With a functioning example I think I can move forward.
  • Heater.Heater. Posts: 21,230
    Don't move forward. It's not working yet. As you say, results are a bit off.

    Could you try changing those "test" instructions to "and".

    That will overwrite y so you will need to save that first. Or copy it to a temporary "t" and work on that.
  • Heater. wrote: »
    Don't move forward. It's not working yet. As you say, results are a bit off.

    Could you try changing those "test" instructions to "and".

    That will overwrite y so you will need to save that first. Or copy it to a temporary "t" and work on that.

    Move forward as in fixing it :thumb:
    I'll make a table tomorrow and apply your suggestions, thank you.
  • Heater.Heater. Posts: 21,230
    Ah yes, OK.

    I should have interpreted "functioning example" as "compiles and running but malfunctioning example" :)

  • Heater. wrote: »
    Ah yes, OK.

    I should have interpreted "functioning example" as "compiles and running but malfunctioning example" :)

    Hmm, yes quite :D
  • 'Sorry for the red herring, guys. Here's a quick counter-example where my algo fails:

    x = $00018000

    The correct answer is 16. But that second one bit screws things up, causing an additional 8, 4, 2, 1, and 1 to be added to y. Dang!

    Try this instead:
    'Find msb of x; answer in y (1 - 32, or zero if no "one" bits).
    
                  mov       y,#32
                  test      x,_ffff0000 wz
            if_z  sub       y,#16
                  test      x,_ff00ff00 wz
            if_z  sub       y,#8
                  test      x,_f0f0f0f0 wz
            if_z  sub       y,#4
                  test      x,_cccccccc wz
            if_z  sub       y,#2
                  test      x,_aaaaaaaa wz
            if_z  sub       y,#1
                  test      x,x wz
            if_z  sub       y,#1
    
    _ffff0000     long      $ffff0000
    _ff00ff00     long      $ff00ff00
    _f0f0f0f0     long      $f0f0f0f0
    _cccccccc     long      $cccccccc
    _aaaaaaaa     long      $aaaaaaaa
    

    This is actually the program I started with. But I thought it would be less tricky-looking to use add and if_nz. I neglected to consider that z and nz are not symmetrical opposites.

    -Phil
  • I'm still having some issues:
    Provide an integer: 2000000000
    bsr)         Arg: 2000000000 Characteristic: 30 time: 1616
    log/log)     Arg: 2000000000 Characteristic: 30 time: 375824
    inline PASM) Arg: 2000000000 Characteristic: 32 time: 2416
    Provide an integer: 20000000
    bsr)         Arg: 20000000 Characteristic: 24 time: 5840
    log/log)     Arg: 20000000 Characteristic: 24 time: 369776
    inline PASM) Arg: 20000000 Characteristic: 32 time: 2416
    Provide an integer: 1984
    bsr)         Arg: 1984 Characteristic: 10 time: 15696
    log/log)     Arg: 1984 Characteristic: 10 time: 367808
    inline PASM) Arg: 1984 Characteristic: 16 time: 2544
    Provide an integer: 1
    bsr)         Arg: 1 Characteristic: 0 time: 22736
    log/log)     Arg: 1 Characteristic: 0 time: 27120
    inline PASM) Arg: 1 Characteristic: 1 time: 2992
    Provide an integer: 2
    bsr)         Arg: 2 Characteristic: 1 time: 22032
    log/log)     Arg: 2 Characteristic: 1 time: 97632
    inline PASM) Arg: 2 Characteristic: 2 time: 2880
    Provide an integer: 4
    bsr)         Arg: 4 Characteristic: 2 time: 21328
    log/log)     Arg: 4 Characteristic: 2 time: 97632
    inline PASM) Arg: 4 Characteristic: 3 time: 2880
    Provide an integer: 0
    bsr)         Arg: 0 Characteristic: -1 time: 23280
    log/log)     Arg: 0 Characteristic: -2147483648 time: 19376
    inline PASM) Arg: 0 Characteristic: 0 time: 3104
    Provide an integer: 2147483647
    bsr)         Arg: 2147483647 Characteristic: 30 time: 1616
    log/log)     Arg: 2147483647 Characteristic: 31 time: 107968
    inline PASM) Arg: 2147483647 Characteristic: 32 time: 2416
    
    even found a rounding error in the log/log method :-)
        __asm__ volatile (
                  "mov       %[y],#32\n\t"
                  "test      %[x],%[_ffff0000] wz\n\t"
            "if_z  sub       %[y],#16\n\t"
                  "test      %[x],%[_ff00ff00] wz\n\t"
            "if_z  sub       %[y],#8\n\t"
                  "test      %[x],%[_f0f0f0f0] wz\n\t"
            "if_z  sub       %[y],#4\n\t"
                  "test      %[x],%[_cccccccc] wz\n\t"
            "if_z  sub       %[y],#2\n\t"
                  "test      %[x],%[_aaaaaaaa] wz\n\t"
            "if_z  sub       %[y],#1\n\t"
                  "test      %[x],%[x] wz\n\t"
            "if_z  sub       %[y],#1\n\t"
          
        : /*outputs (+inputs) */
          [y]         "+r" (pow2)
        : /*inputs */
          [x]         "r"  (num_in),
          [_ffff0000] "r"  (_ffff0000),
          [_ff00ff00] "r"  (_ff00ff00),
          [_f0f0f0f0] "r"  (_f0f0f0f0),
          [_cccccccc] "r"  (_cccccccc),
          [_aaaaaaaa] "r"  (_aaaaaaaa)
          
        );
    
  • thanks for the continued help :-D
  • AribaAriba Posts: 2,690
    This should be quite fast, it compiles to native Assembly hold in the cogram cache:
    __attribute__((fcache))
    int topOne(unsigned int a)
    {
      int r=32;
      if(a==0) return 0;
      if(!(a & 0xFFFF0000)) {r -= 16; a <<= 16;}
      if(!(a & 0xFF000000)) {r -= 8; a <<= 8;}
      if(!(a & 0xF0000000)) {r -= 4; a <<= 4;}
      if(!(a & 0xC0000000)) {r -= 2; a <<= 2;}
      if(!(a & 0x80000000)) r -= 1;
      return r;
    }
    
    It may be a bit faster if translated to PASM by hand.

    Andy
  • mikeologistmikeologist Posts: 337
    edited 2017-12-22 10:02
    Ariba wrote: »
    This should be quite fast, it compiles to native Assembly hold in the cogram cache:
    __attribute__((fcache))
    int topOne(unsigned int a)
    {
      int r=32;
      if(a==0) return 0;
      if(!(a & 0xFFFF0000)) {r -= 16; a <<= 16;}
      if(!(a & 0xFF000000)) {r -= 8; a <<= 8;}
      if(!(a & 0xF0000000)) {r -= 4; a <<= 4;}
      if(!(a & 0xC0000000)) {r -= 2; a <<= 2;}
      if(!(a & 0x80000000)) r -= 1;
      return r;
    }
    
    It may be a bit faster if translated to PASM by hand.

    Andy

    Thanks a bunch, Andy;
    This taught me a good bit more about C :smile:

    I put it in directly, it's fast and accurate:
    Provide an integer: 2000000000
    bsr)         Arg: 2000000000 Characteristic: 30 time: 1888
    log/log)     Arg: 2000000000 Characteristic: 30 time: 375136
    inline PASM) Arg: 2000000000 Characteristic: 32 time: 2416
    Ariba)       Arg: 2000000000 Characteristic: 30 time: 2528
    
    Provide an integer: 200000000
    bsr)         Arg: 200000000 Characteristic: 27 time: 4000
    log/log)     Arg: 200000000 Characteristic: 27 time: 375232
    inline PASM) Arg: 200000000 Characteristic: 32 time: 2416
    Ariba)       Arg: 200000000 Characteristic: 27 time: 2752
    
    Provide an integer: 8192
    bsr)         Arg: 8192 Characteristic: 13 time: 13856
    log/log)     Arg: 8192 Characteristic: 12 time: 102608
    inline PASM) Arg: 8192 Characteristic: 14 time: 2656
    Ariba)       Arg: 8192 Characteristic: 13 time: 2880
    
    Provide an integer: 0
    bsr)         Arg: 0 Characteristic: -1 time: 23552
    log/log)     Arg: 0 Characteristic: -2147483648 time: 19088
    inline PASM) Arg: 0 Characteristic: 0 time: 3104
    Ariba)       Arg: 0 Characteristic: -1 time: 512
    
    Provide an integer: 1
    bsr)         Arg: 1 Characteristic: 0 time: 23008
    log/log)     Arg: 1 Characteristic: 0 time: 26464
    inline PASM) Arg: 1 Characteristic: 1 time: 2992
    Ariba)       Arg: 1 Characteristic: 0 time: 3328
    
    Provide an integer: 2
    bsr)         Arg: 2 Characteristic: 1 time: 22304
    log/log)     Arg: 2 Characteristic: 1 time: 96976
    inline PASM) Arg: 2 Characteristic: 2 time: 2880
    Ariba)       Arg: 2 Characteristic: 1 time: 3328
    
    Provide an integer: 4
    bsr)         Arg: 4 Characteristic: 2 time: 21600
    log/log)     Arg: 4 Characteristic: 2 time: 96976
    inline PASM) Arg: 4 Characteristic: 3 time: 2880
    Ariba)       Arg: 4 Characteristic: 2 time: 3104
    
    Provide an integer: 8
    bsr)         Arg: 8 Characteristic: 3 time: 20896
    log/log)     Arg: 8 Characteristic: 3 time: 98656
    inline PASM) Arg: 8 Characteristic: 4 time: 2768
    Ariba)       Arg: 8 Characteristic: 3 time: 3104
    

    Case 8192 has me curious.

    Here's the updated code:
    #include "simpletools.h"                      // scan(), print()
    #include "propeller.h"                        // CNT
    
    int main() {                                  // Entry point
      int index;                                  // location within the binary integer
      int pow2;                                   // Characteristic sans mantissa, first test
      unsigned int num_in;                 // log2 argument
      unsigned int start;
      unsigned int end;
      long _ffff0000 = 0xffff0000;
      long _ff00ff00 = 0xff00ff00;
      long _f0f0f0f0 = 0xf0f0f0f0;
      long _cccccccc = 0xcccccccc;
      long _aaaaaaaa = 0xaaaaaaaa;
       
      while(1) {                                  // Primary loop
        print("Provide an integer: ");            // Prompt user
        scan("%d\n", &num_in);                    // Collect input
        
        start = CNT;
        index = 32;                               // length of int in bits
        pow2 = -1;                                // flag value
        while((pow2 == -1) & (index-- > 0)) {     // While no characteristic and bits remain, decrement position
          if(num_in & (1 << index)) {             // if the current bit is on
            pow2 = index;                         // record its location as the log2 characteristic
            break;                                // and exit
          }
        }
        end = CNT;
        
        print("bsr)         Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
        
        start = CNT;
        pow2 = (int)(log(num_in)/log(2));
        end = CNT; 
        
        print("log/log)     Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
       
        start = CNT;
        __asm__ volatile (
                  "mov       %[y],#32\n\t"
                  "test      %[x],%[_ffff0000] wz\n\t"
            "if_z  sub       %[y],#16\n\t"
                  "test      %[x],%[_ff00ff00] wz\n\t"
            "if_z  sub       %[y],#8\n\t"
                  "test      %[x],%[_f0f0f0f0] wz\n\t"
            "if_z  sub       %[y],#4\n\t"
                  "test      %[x],%[_cccccccc] wz\n\t"
            "if_z  sub       %[y],#2\n\t"
                  "test      %[x],%[_aaaaaaaa] wz\n\t"
            "if_z  sub       %[y],#1\n\t"
                  "test      %[x],%[x] wz\n\t"
            "if_z  sub       %[y],#1\n\t"
          
        : /*outputs (+inputs) */
          [y]         "+r" (pow2)
        : /*inputs */
          [x]         "r"  (num_in),
          [_ffff0000] "r"  (_ffff0000),
          [_ff00ff00] "r"  (_ff00ff00),
          [_f0f0f0f0] "r"  (_f0f0f0f0),
          [_cccccccc] "r"  (_cccccccc),
          [_aaaaaaaa] "r"  (_aaaaaaaa)
          
        );
        end = CNT;
      
        print("inline PASM) Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results
        
        unsigned int a = num_in;
        start = CNT;
        pow2 = 32;
        if(a == 0) {pow2 = 0;} 
        else {
          if(!(a & 0xFFFF0000)) {pow2 -= 16; a <<= 16;}
          if(!(a & 0xFF000000)) {pow2 -= 8; a <<= 8;}
          if(!(a & 0xF0000000)) {pow2 -= 4; a <<= 4;}
          if(!(a & 0xC0000000)) {pow2 -= 2; a <<= 2;}
          if(!(a & 0x80000000)) pow2 -= 1;
        }
        pow2--;
        end = CNT;
        
        print("Ariba)       Arg: %d Characteristic: %d time: %d\n", num_in, pow2, (end-start)); // display the results      
      }         
    }
    
    It may be a bit faster if translated to PASM by hand.
    Challenge accepted
  • Please note that if the compiler is set to CMM it will generate interpretive assembly.

    Switch to LMM will cause the compiler to generate hardware assembly and run much faster but use more memory.

    Mike

  • Woah
    Provide an integer: 2000000000
    bsr)         Arg: 2000000000 Characteristic: 30 time: 352
    log/log)     Arg: 2000000000 Characteristic: 30 time: 71104
    inline PASM) Arg: 2000000000 Characteristic: 32 time: 480
    Ariba)       Arg: 2000000000 Characteristic: 30 time: 512
    
    Provide an integer: 1
    bsr)         Arg: 1 Characteristic: 0 time: 4144
    log/log)     Arg: 1 Characteristic: 0 time: 5776
    inline PASM) Arg: 1 Characteristic: 1 time: 480
    Ariba)       Arg: 1 Characteristic: 0 time: 512
    
    Provide an integer: 0
    bsr)         Arg: 0 Characteristic: -1 time: 4240
    log/log)     Arg: 0 Characteristic: -2147483648 time: 4640
    inline PASM) Arg: 0 Characteristic: 0 time: 480
    Ariba)       Arg: 0 Characteristic: -1 time: 128
    
    Provide an integer: 8675309
    bsr)         Arg: 8675309 Characteristic: 23 time: 1264
    log/log)     Arg: 8675309 Characteristic: 23 time: 69200
    inline PASM) Arg: 8675309 Characteristic: 32 time: 480
    Ariba)       Arg: 8675309 Characteristic: 23 time: 512
    
    Provide an integer: 8192
    bsr)         Arg: 8192 Characteristic: 13 time: 2512
    log/log)     Arg: 8192 Characteristic: 12 time: 16496
    inline PASM) Arg: 8192 Characteristic: 14 time: 480
    Ariba)       Arg: 8192 Characteristic: 13 time: 512
    

    Thank you.

    Reading more on the subject here. Is there any more reading that you'd recommend? Specifically a sample of C that compiles into the COG memory mode and any theory.
  • Third time's a charm. (I've actually tested this one.) The key is to reverse the bits in x, since isolating the former most-significant is quicker when it's the least-significant one. Then it's a matter of using the parallel method of locating it. Here's the code (derived, in part, from the site that Heater cited above):
    CON
    
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 5_000_000
    
    VAR
    
      long new, value, log
    
    OBJ
    
      sio   : "Parallax Serial Terminal"
    
    PUB start
    
      sio.start(115200)
      cognew(@log2, @new)
      repeat
        value := sio.hexin   'Enter a number in hexadecimal
        new~~
        repeat while new
        sio.hex(value, 8)    'Print out original number.
        sio.char(" ")
        sio.dec(log)         'Print bit position.
        sio.char(13)
    
    DAT
    
    'Find msb "one" bit of x; answer in y (0 - 31, or -1 if x == 0).
    
    log2          mov       xaddr,par
                  add       xaddr,#4
                  mov       yaddr,par
                  add       yaddr,#8
                  
    :loop         rdlong    x,par wz
            if_z  jmp       #:loop
    
                  rdlong    x,xaddr wz
                  
    'Begin computation (15 instructions)...
    
                  rev       x,#0                    'Reverse the bits in x.
                  neg       y,x                     'Isolate least-significant "one" bit.
                  and       x,y
                  mov       y,#31
            if_z  sub       y,#1                    'Locate that "one" bit, if any.
                  test      x,_0000ffff wz
            if_z  sub       y,#16
                  test      x,_00ff00ff wz
            if_z  sub       y,#8
                  test      x,_0f0f0f0f wz
            if_z  sub       y,#4
                  test      x,_33333333 wz
            if_z  sub       y,#2
                  test      x,_55555555 wz
            if_z  sub       y,#1
            
    '... End computation.
     
                  wrlong    y,yaddr
                  mov       x,#0
                  wrlong    x,par
                  jmp       #:loop
    
    _0000ffff     long      $0000ffff
    _00ff00ff     long      $00ff00ff
    _0f0f0f0f     long      $0f0f0f0f
    _33333333     long      $33333333
    _55555555     long      $55555555
    
    x             res       1
    y             res       1
    xaddr         res       1
    yaddr         res       1
    

    -Phil
  • Third time's a charm. (I've actually tested this one.) The key is to reverse the bits in x, since isolating the former most-significant is quicker when it's the least-significant one. Then it's a matter of using the parallel method of locating it. Here's the code (derived, in part, from the site that Heater cited above):

    ~

    -Phil

    Excellent, thank you
    I look forward to testing it.
    I'm also making a binary search model for the same goal just to see what I can see :smile:
    I'll keep posting code and results

    Merry Christmas
  • ... I'm also making a binary search model for the same goal just to see what I can see ...
    That should work. PASM's sumxx operators will come in handy for that.

    -Phil
  • One minor change: I relied upon writing the zero flag in the rdlong x,xaddr instruction, but I didn't have to. Do this instead:
    'Begin computation (15 instructions)...
    
                  rev       x,#0                    'Reverse the bits in x.
                  neg       y,x                     'Isolate least-significant "one" bit.
                  and       x,y wz '<======= Add the wz here.
    

    That will isolate the computation from the parameter acquisition, should they become more separated.

    -Phil
  • ... I'm also making a binary search model for the same goal just to see what I can see ...
    That should work. PASM's sumxx operators will come in handy for that.

    -Phil

    I made the BST method fully unrolled to begin with and tucked the '0' case way down in and I think the nesting of logic has a major speed impact: (This would need to be condensed to a loop in order to be size and speed effective)
    start = CNT;
        if(num_in < 0x00010000) {             // 0-15
          if(num_in < 0x00000100) {           // 0-7
            if(num_in < 0x00000010) {         // 0-3
              if(num_in < 0x0000004) {        // 0-1
                if(num_in < 0x00000002) { 
                    if(num_in == 0) { pow2 = -1; }
                    else { pow2 = 0; } 
                  }                  
                  else { pow2 = 1; }                                
              } else {                        // 2-3
                if(num_in < 0x00000008) { pow2 = 2; }
                  else { pow2 = 3; }                                
              }                            
            } else {                          // 4-7
              if(num_in < 0x00000040) {       // 4-5
                if(num_in < 0x00000020) { pow2 = 4; } 
                  else { pow2 = 5; }                                
              } else {                        // 6-7
                if(num_in < 0x00000080) { pow2 = 6; } 
                  else { pow2 = 7; }                                
              }       
            }                        
          } else {                            // 8-15
            if(num_in < 0x00001000) {         // 8-11
              if(num_in < 0x00000400) {       // 8-9
                if(num_in < 0x00000200) { pow2 = 8; } 
                  else { pow2 = 9; }                                
              } else {                        // 10-11
                if(num_in < 0x00000800) { pow2 = 10; } 
                  else { pow2 = 11; }
              }                            
            } else {                          // 12-15
              if(num_in < 0x00004000) {       // 12-13
                if(num_in < 0x00002000) { pow2 = 12; } 
                  else { pow2 = 13; }                                
              } else {                        // 14-15
                if(num_in < 0x00008000) { pow2 = 14; } 
                  else { pow2 = 15; }
              } 
            }                        
          }                    
        } else {                              // 16-31
          if(num_in < 0x01000000) {           // 16-23
            if(num_in < 0x00100000) {         // 16-19
              if(num_in < 0x0040000) {        // 16-17
                if(num_in < 0x00020000) { pow2 = 16; } 
                  else { pow2 = 17; }                                
              } else {                        // 18-19
                if(num_in < 0x00080000) { pow2 = 18; } 
                  else { pow2 = 19; }                                
              }                            
            } else {                          // 20-23
              if(num_in < 0x00400000) {       // 20-21
                if(num_in < 0x00200000) { pow2 = 20; } 
                  else { pow2 = 21; }                                
              } else {                        // 22-23
                if(num_in < 0x00800000) { pow2 = 22; } 
                  else { pow2 = 23; }                                
              }       
            }                        
          } else {                            // 24-31
            if(num_in < 0x10000000) {         // 24-27
              if(num_in < 0x04000000) {       // 24-25
                if(num_in < 0x02000000) { pow2 = 24; } 
                  else { pow2 = 25; }                                
              } else {                        // 26-27
                if(num_in < 0x08000000) { pow2 = 26; } 
                  else { pow2 = 27; }
              }                            
            } else {                          // 28-31
              if(num_in < 0x40000000) {       // 28-29
                if(num_in < 0x20000000) { pow2 = 28; } 
                  else { pow2 = 29; }                                
              } else {                        // 30-31
                if(num_in < 0x80000000) { pow2 = 30; } 
                  else { pow2 = 31; }
              } 
            }                        
          }
        }                           
        end = CNT;
    

    It's the fastest that I had made up to that point. Then I looked at @Ariba code again and removed the nested logic and it sped up by a factor of 10 and turned it into a big theta, I like big theta.
    unsigned int a = num_in;
        start = CNT;
        pow2 = 31;
    //    if(a == 0) {pow2 = 0;} 
    //    else {
          if(!(a & 0xFFFF0000)) {pow2 -= 16; a <<= 16;}
          if(!(a & 0xFF000000)) {pow2 -= 8; a <<= 8;}
          if(!(a & 0xF0000000)) {pow2 -= 4; a <<= 4;}
          if(!(a & 0xC0000000)) {pow2 -= 2; a <<= 2;}
          if(!(a & 0x80000000)) pow2 -= 1;
          if(a == 0) pow2 = -1;
    //    }
        end = CNT;
    
    So far the modified Ariba is the winner.
    Provide an integer: 0
    bsr)         Arg: 0 Characteristic: -1 time: 160
    log/log)     Arg: 0 Characteristic: -2147483648 time: 4528
    Ariba)       Arg: 0 Characteristic: -1 time: 48
    BST)         Arg: 0 Characteristic: -1 time: 368
    Provide an integer: 1
    bsr)         Arg: 1 Characteristic: 0 time: 1264
    log/log)     Arg: 1 Characteristic: 0 time: 5664
    Ariba)       Arg: 1 Characteristic: 0 time: 48
    BST)         Arg: 1 Characteristic: 0 time: 368
    Provide an integer: 2
    bsr)         Arg: 2 Characteristic: 1 time: 1136
    log/log)     Arg: 2 Characteristic: 1 time: 16080
    Ariba)       Arg: 2 Characteristic: 1 time: 48
    BST)         Arg: 2 Characteristic: 1 time: 304
    Provide an integer: 4
    bsr)         Arg: 4 Characteristic: 2 time: 1008
    log/log)     Arg: 4 Characteristic: 2 time: 16080
    Ariba)       Arg: 4 Characteristic: 2 time: 48
    BST)         Arg: 4 Characteristic: 2 time: 320
    Provide an integer: 8
    bsr)         Arg: 8 Characteristic: 3 time: 896
    log/log)     Arg: 8 Characteristic: 3 time: 16240
    Ariba)       Arg: 8 Characteristic: 3 time: 48
    BST)         Arg: 8 Characteristic: 3 time: 320
    Provide an integer: 
    bsr)         Arg: 8 Characteristic: 3 time: 880
    log/log)     Arg: 8 Characteristic: 3 time: 16240
    Ariba)       Arg: 8 Characteristic: 3 time: 48
    BST)         Arg: 8 Characteristic: 3 time: 320
    Provide an integer: 16
    bsr)         Arg: 16 Characteristic: 4 time: 768
    log/log)     Arg: 16 Characteristic: 4 time: 16080
    Ariba)       Arg: 16 Characteristic: 4 time: 48
    BST)         Arg: 16 Characteristic: 4 time: 320
    Provide an integer: 32
    bsr)         Arg: 32 Characteristic: 5 time: 640
    log/log)     Arg: 32 Characteristic: 5 time: 16080
    Ariba)       Arg: 32 Characteristic: 5 time: 48
    BST)         Arg: 32 Characteristic: 5 time: 320
    Provide an integer: 64
    bsr)         Arg: 64 Characteristic: 6 time: 512
    log/log)     Arg: 64 Characteristic: 6 time: 16240
    Ariba)       Arg: 64 Characteristic: 6 time: 48
    BST)         Arg: 64 Characteristic: 6 time: 320
    Provide an integer: 128
    bsr)         Arg: 128 Characteristic: 7 time: 368
    log/log)     Arg: 128 Characteristic: 7 time: 16352
    Ariba)       Arg: 128 Characteristic: 7 time: 48
    BST)         Arg: 128 Characteristic: 7 time: 320
    Provide an integer: 256
    bsr)         Arg: 256 Characteristic: 8 time: 1248
    log/log)     Arg: 256 Characteristic: 8 time: 16080
    Ariba)       Arg: 256 Characteristic: 8 time: 48
    BST)         Arg: 256 Characteristic: 8 time: 368
    Provide an integer: 512
    bsr)         Arg: 512 Characteristic: 9 time: 1120
    log/log)     Arg: 512 Characteristic: 9 time: 16080
    Ariba)       Arg: 512 Characteristic: 9 time: 48
    BST)         Arg: 512 Characteristic: 9 time: 368
    Provide an integer: 1024
    bsr)         Arg: 1024 Characteristic: 10 time: 992
    log/log)     Arg: 1024 Characteristic: 10 time: 16080
    Ariba)       Arg: 1024 Characteristic: 10 time: 48
    BST)         Arg: 1024 Characteristic: 10 time: 400
    Provide an integer: 2000000000
    bsr)         Arg: 2000000000 Characteristic: 30 time: 528
    log/log)     Arg: 2000000000 Characteristic: 30 time: 70992
    Ariba)       Arg: 2000000000 Characteristic: 30 time: 48
    BST)         Arg: 2000000000 Characteristic: 30 time: 384
    Provide an integer: 66000
    bsr)         Arg: 66000 Characteristic: 16 time: 1264
    log/log)     Arg: 66000 Characteristic: 16 time: 68848
    Ariba)       Arg: 66000 Characteristic: 16 time: 48
    BST)         Arg: 66000 Characteristic: 16 time: 432
    
    My next step is to test the code that you were kind enough to provide. I will share my results :smile:
  • If you convert my PASM to C, I'm not sure how you will be able to express the rev operator.

    -Phil
  • If you convert my PASM to C, I'm not sure how you will be able to express the rev operator.

    -Phil
    Since the PASM is so nice I was just planning on putting this portion inline:
                  rev       x,#0                    'Reverse the bits in x.
                  neg       y,x                     'Isolate least-significant "one" bit.
                  and       x,y wz
                  mov       y,#31
            if_z  sub       y,#1                    'Locate that "one" bit, if any.
                  test      x,_0000ffff wz
            if_z  sub       y,#16
                  test      x,_00ff00ff wz
            if_z  sub       y,#8
                  test      x,_0f0f0f0f wz
            if_z  sub       y,#4
                  test      x,_33333333 wz
            if_z  sub       y,#2
                  test      x,_55555555 wz
            if_z  sub       y,#1
    
Sign In or Register to comment.