Shop OBEX P1 Docs P2 Docs Learn Events
The quickest way of computing 2/3 of an arbitrary number — Parallax Forums

The quickest way of computing 2/3 of an arbitrary number

Paul BakerPaul Baker Posts: 6,351
edited 2004-11-13 07:36 in General Discussion
I am faced with the dilemma of trying to compute 2/3*x in the fastest (and most importantly code efficient) manner. Computing 3/2*x is child's play: rr((rl x)+x). However dividing by 3 is another matter.·Playing with the numbers, I stumbled upon the power series: x/2 + x/4 - x/8 + x/16 - ... = 2/3*x,·the series is computed for n iterations where·x/(2^(n+1))=0 (ie 2/3 of any 8 bit number can be computed in 7 or fewer iterations). Is this the fastest method for computing 2/3*x? Also in preliminary calculations I am finding it difficult to tease out the remainder from the calculation. Additionally the computation doesn't seem to always work (this maybe due to incorrect binary calculations since its been more than a decade since I was doing it in college). For example 2/3*21 is properly calculated (2/3*10101 = 1110) but 2/3*22 seems to not be correct (2/3*10110 = 1101). Does anyone care to lend a hand in helping me figure this puzzle out?

TIA,
Paul

Comments

  • BrianWBrianW Posts: 7
    edited 2004-11-04 20:47
    Hi,

    Many years ago when working with Z80 assembler I used a subtraction method.
    To divide by 3 a simple example if x is an unsigned 8 bit value then max it could be is 255

    so if result = r

    subtract 30 if no underflow then add 10 to r and repeat
    otherwise add 30 back to x
    subtract 3 if no underflow then add 1 to r and repeat
    otherwise add 3 and remainder is what is left

    on the SX the carry flag is cleared for underflow

    for larger variables ie. 16 bit start with subtract 30000 then 3000, 300, 30, 3

    as a maximum each loop would be 3 itterations

    This method can be coded to divide by any value, sorry not had time to look at coding it in SX asm but it's time I was out for a beer or two.

    Regards
    Brian
  • JamesCJamesC Posts: 26
    edited 2004-11-04 21:05
    I too had the fortune (or misfortune) of also having to do this back in the Z-80 days (trash-80 model 1), and the bit-shift-add multiply and bit-shift-subtract divide were generally regarded as the best simple methods for general purpose multiplication and division. they are also very simple to implement. Don't know if there is a particular short cut that would be more efficient for 1/3rds, but I cant think of one.

    I was going to type an explanation of the algorithm, but a quick google search revealed this site which already has an excellent tutorial. should be a snap to implement on the sx.

    http://cobb.host.sk/z80guide/part4.htm

    Cheers
  • James NewtonJames Newton Posts: 329
    edited 2004-11-04 21:34
    http://www.sxlist.com/cgi-bin/constdivmul.exe?Acc=fr37p&Bits=8&endian=little&Const=0.333333333333333&ConstErr=0.5&Temp=TEMP&cpu=sx28

    http://www.sxlist.com/techref/expeval2.asp?exp=2+*+X+%2F+3



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    ---
    James Newton, Host of SXList.com
    james@sxlist.com 1-619-652-0593 fax:1-208-279-8767
    SX FAQ / Code / Tutorials / Documentation:
    http://www.sxlist.com Pick faster!



  • James NewtonJames Newton Posts: 329
    edited 2004-11-04 21:37
    Actually

    http://www.sxlist.com/cgi-bin/constdivmul.exe?Acc=ACC&Bits=8&endian=little&Const=0.66666666666666666666666666666667&ConstErr=0.5&Temp=TEMP&cpu=sx28

    And you may wish to play with the error, width of the variable, etc... at
    http://www.sxlist.com/cgi-bin/constdivmul.exe

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    ---
    James Newton, Host of SXList.com
    james@sxlist.com 1-619-652-0593 fax:1-208-279-8767
    SX FAQ / Code / Tutorials / Documentation:
    http://www.sxlist.com Pick faster!



  • dkemppaidkemppai Posts: 315
    edited 2004-11-04 21:38
    On the SX list site, there is a code generator for multiplication by a constant.·
    http://www.sxlist.com/cgi-bin/constdivmul.exe?cpu=sx

    If you want the remainder, multiply the result
    by 3 and subtract from the number you started with...

    -Dan

    ·
  • BrianWBrianW Posts: 7
    edited 2004-11-05 11:05
    Just thought I would see how the super free basic compiler does it :-

    ; x = x / 3
    MOV __PARAM1,x
    MOV __PARAM2,#3
    CLR __PARAM3
    TEST __PARAM2
    JZ $+23
    CLR __PARAM4
    CLC
    SKIP
    RL __PARAM2
    INC __PARAM4
    JNB __PARAM2.7,$-2
    JMP $+4
    CLC
    RL __PARAM3
    RR __PARAM2
    INC __PARAM3
    SUB __PARAM1,__PARAM2
    JC $+5
    DEC __PARAM3
    ADD __PARAM1,__PARAM2
    DJNZ __PARAM4,$-11
    MOV x,__PARAM3
  • James NewtonJames Newton Posts: 329
    edited 2004-11-05 19:08
    Well, the code generator at sxlist.com beats that all to heck, but the SX/B routine is a more general purpose solution, so if you were doing a lot of different divides it would end up being more code efficient although slower.

    ; fr37p = fr37p * 0.333333
    ; Temp = TEMP
    ; fr37p size = 8 bits
    ; Error = 0.5 %
    ; Bytes order = little endian
    ; Round = no

    ; ALGORITHM:
    ; Clear accumulator
    ; Add input / 4 to accumulator
    ; Add input / 16 to accumulator
    ; Add input / 64 to accumulator
    ; Add input / 256 to accumulator
    ; Move accumulator to result
    ;
    ; Approximated constant: 0.332031, Error: 0.390625 %

    ; Input: fr37p0, 8 bits
    ; Output: fr37p0, 7 bits
    ; Code size: 17 instructions

    fr37p0 DS 1

    ;copy accumulator to temporary
    mov w, fr37p0


    ;shift accumulator right 2 times
    clc
    rr fr37p0
    clc
    rr fr37p0

    ;add temporary to accumulator
    add fr37p0, w

    ;shift accumulator right 2 times
    rr fr37p0
    clc
    rr fr37p0

    ;add temporary to accumulator
    add fr37p0, w

    ;shift accumulator right 2 times
    rr fr37p0
    clc
    rr fr37p0

    ;add temporary to accumulator
    add fr37p0, w

    ;shift accumulator right 2 times
    rr fr37p0
    clc
    rr fr37p0


    Actually, Marks 8/8 divide beats the SX/B code as well: 16 vs 22 words (and that 16 includes the return so it is actually 15 vs 22) and I believe the cycles will be about the same.

    http://www.markworld.com/Div8x8.txt
    ; Divide W by MpyDiv1
    ; ret
    ; ret
    ;
    Div8x8 mov MpyDiv4, W ;numerator
    mov W, #8
    mov Count, W
    clr MpyDiv2 ;quotient
    clr MpyDiv3 ;remainder
    div1 clrb C
    rl MpyDiv4
    rl MpyDiv3
    mov W, MpyDiv1
    mov W, MpyDiv3-w
    snb C
    mov MpyDiv3, W
    rl MpyDiv2
    decsz Count
    jmp div1
    retw 0
    ;

    And his code generator
    http://www.markworld.com/evaluate8.html
    is better than the sxlist.com one, but it is for the PIC 16C54 so you have to translate the result with
    http://www.sxlist.com/cgi-bin/mpasm2sasm2.exe

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    ---
    James Newton, Host of SXList.com
    james@sxlist.com 1-619-652-0593 fax:1-208-279-8767
    SX FAQ / Code / Tutorials / Documentation:
    http://www.sxlist.com Pick faster!



  • JamesCJamesC Posts: 26
    edited 2004-11-06 16:46
    It looks like the general purpose routine could be optimized even further...

    Consider what it does (I've re-written it here with more legible labels)

    ; Divide W by denominator
    ; Return quotient in quotient 
    ; Return remainder in remainder
    ;
    
    Div8x8  
        MOV numerator, W      ; store numerator
        MOV W, #8
        MOV Count, W          ; loop 8 times
        CLR quotient          ; quotient = 0
        CLR remainder         ; remainder = 0
    
    div1    
        CLRB    C             ; clear carry flag
        RL  numerator         ; rotate left C <- numerator <-C
        RL  remainder         ; rotate left C <- remainder <-C
        MOV W, denominator    ; 
        MOV W, remainder-w    ;   is denom > remainder ?    
        SNB C                 ; skip next if underflow 
        MOV remainder, W      ; save remainder - denominator
        RL  quotient          ; rotate left C <-quotient <- C (result from mov W,remainder-w)
        
        DECSZ   Count         ; loop
        JMP div1              ; 
        
        RETW    0
    
    
    




    First thing I noticed is that the CLRB C instruction appears redundant,
    it only serves to ensure that the numerator variable is 0 when the
    routine completes, and this is pre-set at the start to 0 anyway.

    That said, it looks like its possible to combine the numerator and
    quotient variables. It may look a little confusing at first, but essentially,
    the variable will start out holding the numerator, which will get shifted
    out while the quotient is shifted in.

    
    Div8x8  
        MOV quotient, W       ; store numerator
        MOV W, #8
        MOV Count, W          ; loop 8 times
        CLR remainder         ; remainder = 0
    
    div1    
        RL  quotient          ; rotate left C <- quotient <-C (numerator out/quotient in from last loop)
        RL  remainder         ; rotate left C <- remainder <-C
        MOV W, denominator    ; 
        MOV W, remainder-w    ;   is denom > remainder ?    
        SNB C                 ; skip next if underflow 
        MOV remainder, W      ; save remainder - denominator
                              ; C flag contains bit to shift into quotient
        
        DECSZ   Count         ; loop
        JMP div1              ; 
        
        RL  quotient          ; needed to shift final bit into quotient... 
        
        RETW    0
    
    
    




    The upshoot is that 2 code words are saved, but more importantly
    less work is done inside the loop (two less instructions per pass = 16 less cycles),
    and the extra temp variable for the numerator is no longer needed.

    BTW, none of these routines work if CARRYX is turned on. I also
    had problems getting the correct result with SXSIM, but a quick preliminary
    test with an actual chip seems to verify the result.

    I should point out here that I am new to the SX chip, so I may be making
    an assumption here that does not always hold. I had a lot of initial confusion about
    the way the carry flag is handled on this processor. I also have not exhaustively
    tested my optimization as of yet.

    Also, note that while the general purpose solution is still slower than the dedicated
    *.6666 or *.3333, it is generally more accurate: The *.3333 will not return the
    correct result of, say, 6/3, due to the rounding. This would be a problem with any
    repeating decimal multiplication (is it more correct to say repeating binary?).
    The general purpose routine returns correct integer quotients in this case, and a remainder if
    the result of the division is not an integer. Depending on the application,
    this may or may not be important.
  • Paul BakerPaul Baker Posts: 6,351
    edited 2004-11-08 16:19
    Wow, I want to start off by thanking everyone so much for their assistance, the plethora of responses helped me attack the problem from several different angles and perspectives. After viewing the code produced by the constmult utility, I realized that my power series is the same if I were to combine the subtraction couplets. After careful consideration of all the solutions provided, I hope not to disappoint any of you by stating that after repeated code-ups and numerical evaluations of each of the proposed solutions, I have decided to choose option E) None of the above. By giving a brief background to the underlying problem that generated this perceived need and a brief walkthrough some of the proposed solutions, I hope you will concur that I chose correctly.

    I am creating a token driven system where each token is a multiple of 8 bits, the vast majority of tokens are 8 bits with control tokens occupying 16 bits (goto, conditional goto etc, the 4 msb of the first 8bits is the type identifier, the 4 lsb of the first 8 bits and the second 8 bits are combined to produce a 12 bit address). I have chosen the virtual PC counter to coincide with the actual address of the token in program memory, therefore at the onset I believed that a jump to a target which straddles two adjacent program words·would necessitate the translation of the jump address to a real address, hence the need for computing·2/3 of an arbitrary number.·However, the 2/3 computation must be exact.

    Computing 2/3 of·an 8 bit number using the constmult utility introduces a rounding error, but by my computations the error should within 3 (less than 3 bits) which can be corrected in computing the modulo portion of the computation. But computing 2/3 of a 12 bit number can produce a rounding error·greater than 2 bits, ie a modulo calculation cannot easily recover the correct result. This is even taking into account the extra terms required for 2/3 of a 12 bit number. Further analysis showed that by accounting for extra fractional bits (the shifted lsb is kept in an additional register rather than thrown away and is added·in fixed point style)·the correct result can be produced by rounding up to the next number at the end of calculation, but this greatly complicates the required code and is no longer the elegant solution I was in search of. So nix on the constmult solution.

    The SX/B code (I know this was proffered only as a "I wonder" solution) is too generalized and not efficient enough for my liking.

    I was impressed with the straight-forwardness of BrianW's solution, especially since it inherently produces the modulo as a byproduct and is always exact. But code-up show that this solution is too slow for a 12 bit number (1·x 3000, 9 x 300,·9 x 30, 9 x 3 for a possibility of 22 total conditional subtractions). Even using a temp register so that the addback is not necessary and·pop-out of each level·when the overflow occurs still·requires an average of 11 subtractions (the average for·in-system calculations is higher since my tokens are placed in higher program memory). Having to perform this whenever a branch occurs is too much of a slow-down on the system performance.

    So here is my simple solution: byte alignment, a target address cannot straddle two adjacent code words. The token generator·program I will be producing will understand this and place a special $00 nop token in the straddled token so that the target will always occupy the 8 lsb of a program word. Therefore the actual address will be the "branch to" address.

    I will likely incorporate JamesC's optimized routines if I also conclude that it works in all circumstances. I will have generalized mul and div routines within my code for 8 and 16 bit operands, but these routines (along with all data operation routines) will only be included if the user program uses them (after all if a user program never needs to divide two 16 bit numbers, why incorporate the code?).

    Once again I want to thank everyone for their input. Although the code-ups and testing was somewhat arduous, it produced a fun weekend.

    Thanks,
    Paul

    Post Edited (Paul Baker) : 11/8/2004 4:21:33 PM GMT
  • Tracy AllenTracy Allen Posts: 6,666
    edited 2004-11-13 07:36
    I'm a little late tuning in on this thread, and I don't really have anything to add for Paul's special application.

    However, I just wanted to bring in the perspective of the Stamps' */ and ** operators, vis a vis the issue of multiplying by a fractional constant.

    The */ operator would approximate 2/3 as 170/256, that is, multiply by 170 followed by shift right 8. (or, 171/256 is a little closer on the high side of 2/3). In binary, 171 is 10101011, so, with an input 8-bit argument abcdefgh, the answer is czzzzzzz in the following:
                            abcdefgh
                         * 10101011
                         ----------------
                            abcdefgh
                          abcdefgh
                        abcdefgh
                      abcdefgh
                    abcdefgh
                    ----------------------------
                   czzzzzzzrrrrrrrh
    
    


    The remainder rrrrrrrh is discarded by the division by 256 (shift right, middle byte of a 24 bit accumulator). There might be a carry (up to binary 100) out of the the discarded term. I don't know that that is going to line up right in the box, but you get the idea.

    Now, compare that with the code generator method. It is similar, but it uses shift rights, the following addition:

                    000000ab
                    0000abcd
                    00abcdef
                    abcdefgh
                    ----------
                   cxxxxxxxr
    
    



    followed by one more shift right to discard the bit r and bring in the carry c. That answer cxxxxxx differs from the above czzzzzzz by whatever carry might occur out of the discarded terms. Of course the full multiply requires a higher precision accumulator, which the code generator avoids by working in from the left.

    If more accuracy is needed, then the same logic applies when using 2730/4096, that is, multiply by binary 101010101010 followed by shift right 12. Or, as in the Stamp ** operator, approximate 2/3 as 43690/65536, multiply by 1010101010101010 followed by shift right 16. The error can be reduced, but there is always a possible one bit error, as the following shows:
        6** 43690 = 3,       not 4, a one bit error
        6**43691=4
        60000**43690 = 39999   still only a one bit error
        60000**43691 = 40000
    
      6 */ 170 = 3       not 4, again a one bit error
      6 */ 171 = 4
      60000 */ 170 = 39843       worse approximation, /256, bigger error   
      60000 */ 171 = 40078
    
    

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
Sign In or Register to comment.