The quickest way of computing 2/3 of an arbitrary number
Paul Baker
Posts: 6,351
I am faced with the dilemma of trying to compute 2/3*x in the fastest (and most importantly code efficient) manner. Computing 3/2*x is child's play: rr((rl x)+x). However dividing by 3 is another matter.·Playing with the numbers, I stumbled upon the power series: x/2 + x/4 - x/8 + x/16 - ... = 2/3*x,·the series is computed for n iterations where·x/(2^(n+1))=0 (ie 2/3 of any 8 bit number can be computed in 7 or fewer iterations). Is this the fastest method for computing 2/3*x? Also in preliminary calculations I am finding it difficult to tease out the remainder from the calculation. Additionally the computation doesn't seem to always work (this maybe due to incorrect binary calculations since its been more than a decade since I was doing it in college). For example 2/3*21 is properly calculated (2/3*10101 = 1110) but 2/3*22 seems to not be correct (2/3*10110 = 1101). Does anyone care to lend a hand in helping me figure this puzzle out?
TIA,
Paul
TIA,
Paul
Comments
Many years ago when working with Z80 assembler I used a subtraction method.
To divide by 3 a simple example if x is an unsigned 8 bit value then max it could be is 255
so if result = r
subtract 30 if no underflow then add 10 to r and repeat
otherwise add 30 back to x
subtract 3 if no underflow then add 1 to r and repeat
otherwise add 3 and remainder is what is left
on the SX the carry flag is cleared for underflow
for larger variables ie. 16 bit start with subtract 30000 then 3000, 300, 30, 3
as a maximum each loop would be 3 itterations
This method can be coded to divide by any value, sorry not had time to look at coding it in SX asm but it's time I was out for a beer or two.
Regards
Brian
I was going to type an explanation of the algorithm, but a quick google search revealed this site which already has an excellent tutorial. should be a snap to implement on the sx.
http://cobb.host.sk/z80guide/part4.htm
Cheers
http://www.sxlist.com/techref/expeval2.asp?exp=2+*+X+%2F+3
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
---
James Newton, Host of SXList.com
james@sxlist.com 1-619-652-0593 fax:1-208-279-8767
SX FAQ / Code / Tutorials / Documentation:
http://www.sxlist.com Pick faster!
http://www.sxlist.com/cgi-bin/constdivmul.exe?Acc=ACC&Bits=8&endian=little&Const=0.66666666666666666666666666666667&ConstErr=0.5&Temp=TEMP&cpu=sx28
And you may wish to play with the error, width of the variable, etc... at
http://www.sxlist.com/cgi-bin/constdivmul.exe
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
---
James Newton, Host of SXList.com
james@sxlist.com 1-619-652-0593 fax:1-208-279-8767
SX FAQ / Code / Tutorials / Documentation:
http://www.sxlist.com Pick faster!
http://www.sxlist.com/cgi-bin/constdivmul.exe?cpu=sx
If you want the remainder, multiply the result
by 3 and subtract from the number you started with...
-Dan
·
; x = x / 3
MOV __PARAM1,x
MOV __PARAM2,#3
CLR __PARAM3
TEST __PARAM2
JZ $+23
CLR __PARAM4
CLC
SKIP
RL __PARAM2
INC __PARAM4
JNB __PARAM2.7,$-2
JMP $+4
CLC
RL __PARAM3
RR __PARAM2
INC __PARAM3
SUB __PARAM1,__PARAM2
JC $+5
DEC __PARAM3
ADD __PARAM1,__PARAM2
DJNZ __PARAM4,$-11
MOV x,__PARAM3
; fr37p = fr37p * 0.333333
; Temp = TEMP
; fr37p size = 8 bits
; Error = 0.5 %
; Bytes order = little endian
; Round = no
; ALGORITHM:
; Clear accumulator
; Add input / 4 to accumulator
; Add input / 16 to accumulator
; Add input / 64 to accumulator
; Add input / 256 to accumulator
; Move accumulator to result
;
; Approximated constant: 0.332031, Error: 0.390625 %
; Input: fr37p0, 8 bits
; Output: fr37p0, 7 bits
; Code size: 17 instructions
fr37p0 DS 1
;copy accumulator to temporary
mov w, fr37p0
;shift accumulator right 2 times
clc
rr fr37p0
clc
rr fr37p0
;add temporary to accumulator
add fr37p0, w
;shift accumulator right 2 times
rr fr37p0
clc
rr fr37p0
;add temporary to accumulator
add fr37p0, w
;shift accumulator right 2 times
rr fr37p0
clc
rr fr37p0
;add temporary to accumulator
add fr37p0, w
;shift accumulator right 2 times
rr fr37p0
clc
rr fr37p0
Actually, Marks 8/8 divide beats the SX/B code as well: 16 vs 22 words (and that 16 includes the return so it is actually 15 vs 22) and I believe the cycles will be about the same.
http://www.markworld.com/Div8x8.txt
; Divide W by MpyDiv1
; ret
; ret
;
Div8x8 mov MpyDiv4, W ;numerator
mov W, #8
mov Count, W
clr MpyDiv2 ;quotient
clr MpyDiv3 ;remainder
div1 clrb C
rl MpyDiv4
rl MpyDiv3
mov W, MpyDiv1
mov W, MpyDiv3-w
snb C
mov MpyDiv3, W
rl MpyDiv2
decsz Count
jmp div1
retw 0
;
And his code generator
http://www.markworld.com/evaluate8.html
is better than the sxlist.com one, but it is for the PIC 16C54 so you have to translate the result with
http://www.sxlist.com/cgi-bin/mpasm2sasm2.exe
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
---
James Newton, Host of SXList.com
james@sxlist.com 1-619-652-0593 fax:1-208-279-8767
SX FAQ / Code / Tutorials / Documentation:
http://www.sxlist.com Pick faster!
Consider what it does (I've re-written it here with more legible labels)
First thing I noticed is that the CLRB C instruction appears redundant,
it only serves to ensure that the numerator variable is 0 when the
routine completes, and this is pre-set at the start to 0 anyway.
That said, it looks like its possible to combine the numerator and
quotient variables. It may look a little confusing at first, but essentially,
the variable will start out holding the numerator, which will get shifted
out while the quotient is shifted in.
The upshoot is that 2 code words are saved, but more importantly
less work is done inside the loop (two less instructions per pass = 16 less cycles),
and the extra temp variable for the numerator is no longer needed.
BTW, none of these routines work if CARRYX is turned on. I also
had problems getting the correct result with SXSIM, but a quick preliminary
test with an actual chip seems to verify the result.
I should point out here that I am new to the SX chip, so I may be making
an assumption here that does not always hold. I had a lot of initial confusion about
the way the carry flag is handled on this processor. I also have not exhaustively
tested my optimization as of yet.
Also, note that while the general purpose solution is still slower than the dedicated
*.6666 or *.3333, it is generally more accurate: The *.3333 will not return the
correct result of, say, 6/3, due to the rounding. This would be a problem with any
repeating decimal multiplication (is it more correct to say repeating binary?).
The general purpose routine returns correct integer quotients in this case, and a remainder if
the result of the division is not an integer. Depending on the application,
this may or may not be important.
I am creating a token driven system where each token is a multiple of 8 bits, the vast majority of tokens are 8 bits with control tokens occupying 16 bits (goto, conditional goto etc, the 4 msb of the first 8bits is the type identifier, the 4 lsb of the first 8 bits and the second 8 bits are combined to produce a 12 bit address). I have chosen the virtual PC counter to coincide with the actual address of the token in program memory, therefore at the onset I believed that a jump to a target which straddles two adjacent program words·would necessitate the translation of the jump address to a real address, hence the need for computing·2/3 of an arbitrary number.·However, the 2/3 computation must be exact.
Computing 2/3 of·an 8 bit number using the constmult utility introduces a rounding error, but by my computations the error should within 3 (less than 3 bits) which can be corrected in computing the modulo portion of the computation. But computing 2/3 of a 12 bit number can produce a rounding error·greater than 2 bits, ie a modulo calculation cannot easily recover the correct result. This is even taking into account the extra terms required for 2/3 of a 12 bit number. Further analysis showed that by accounting for extra fractional bits (the shifted lsb is kept in an additional register rather than thrown away and is added·in fixed point style)·the correct result can be produced by rounding up to the next number at the end of calculation, but this greatly complicates the required code and is no longer the elegant solution I was in search of. So nix on the constmult solution.
The SX/B code (I know this was proffered only as a "I wonder" solution) is too generalized and not efficient enough for my liking.
I was impressed with the straight-forwardness of BrianW's solution, especially since it inherently produces the modulo as a byproduct and is always exact. But code-up show that this solution is too slow for a 12 bit number (1·x 3000, 9 x 300,·9 x 30, 9 x 3 for a possibility of 22 total conditional subtractions). Even using a temp register so that the addback is not necessary and·pop-out of each level·when the overflow occurs still·requires an average of 11 subtractions (the average for·in-system calculations is higher since my tokens are placed in higher program memory). Having to perform this whenever a branch occurs is too much of a slow-down on the system performance.
So here is my simple solution: byte alignment, a target address cannot straddle two adjacent code words. The token generator·program I will be producing will understand this and place a special $00 nop token in the straddled token so that the target will always occupy the 8 lsb of a program word. Therefore the actual address will be the "branch to" address.
I will likely incorporate JamesC's optimized routines if I also conclude that it works in all circumstances. I will have generalized mul and div routines within my code for 8 and 16 bit operands, but these routines (along with all data operation routines) will only be included if the user program uses them (after all if a user program never needs to divide two 16 bit numbers, why incorporate the code?).
Once again I want to thank everyone for their input. Although the code-ups and testing was somewhat arduous, it produced a fun weekend.
Thanks,
Paul
Post Edited (Paul Baker) : 11/8/2004 4:21:33 PM GMT
However, I just wanted to bring in the perspective of the Stamps' */ and ** operators, vis a vis the issue of multiplying by a fractional constant.
The */ operator would approximate 2/3 as 170/256, that is, multiply by 170 followed by shift right 8. (or, 171/256 is a little closer on the high side of 2/3). In binary, 171 is 10101011, so, with an input 8-bit argument abcdefgh, the answer is czzzzzzz in the following:
The remainder rrrrrrrh is discarded by the division by 256 (shift right, middle byte of a 24 bit accumulator). There might be a carry (up to binary 100) out of the the discarded term. I don't know that that is going to line up right in the box, but you get the idea.
Now, compare that with the code generator method. It is similar, but it uses shift rights, the following addition:
followed by one more shift right to discard the bit r and bring in the carry c. That answer cxxxxxx differs from the above czzzzzzz by whatever carry might occur out of the discarded terms. Of course the full multiply requires a higher precision accumulator, which the code generator avoids by working in from the left.
If more accuracy is needed, then the same logic applies when using 2730/4096, that is, multiply by binary 101010101010 followed by shift right 12. Or, as in the Stamp ** operator, approximate 2/3 as 43690/65536, multiply by 1010101010101010 followed by shift right 16. The error can be reduced, but there is always a possible one bit error, as the following shows:
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com