Fast Division by 100 ?
Ale
Posts: 2,363
Before I describe an "idea" I had yesterday, let me present the problem:
I like calculators, (handhelds), so I decided to build my own calc, sort of the ultimate model, something like a HP-41 of sorts. (www.hpmuseum.org)
Well, of course the problem is to perform the actual calculations, no ?. Binary floating point is AFAIK never used. BCD is used instead.
After plenty of research, I decided more or less how I'm going to implement all the math functions, but some low level algorithms are still in the works,
Wenn all this started, I decided for a HC11, but later ended up with an AVR. Easier to work with, much faster, and similar advantages as the PICs and derivatives.
When I discovered the propeller (Thanks to Elekt*r), I thought that could be a good candidate for a better, improved, more functional, usw,. calculator.
In the course of "the ultimate BCD package" (TUBP) I come across finally to the description of how to do "high-performance" BCD calculations (Have a look at the HP-42 simulator for the PC), or the un*x utility bc, included in your OS.
This TUBP uses shorts to hold 4 digits together, but not as packed BCD, but as a binary number. And instead of using BASE 10 is using BASE 10000 !. I wanted to port this to the propeller, but so far so good, the propeller does not bring any speed to the problem This algorithm relays on multiplication and ... division !, for both bcd multiplication and bcd division. As everyone knows, division is something that is normally avoided for known reasons (speed).
I decided meanwhile, to test a simplified (using bytes instead of shorts) version for the AVR. The problem of multiplication is easily solved, because it has an 8 bit mul instruction... but division ?, that is horribly slow !
I needed a fast, and that means FAST, division by 100. The problem is how ? And unrolled 16 bit by 8 bit division needs ~100 instructions
The division by 10 used to unpack a binary into two digits is done by a table (hehe), can a table be also useful ?
The number I have to divide is ever hardly above 1100, and I am only going to divide it by 100, so...
Being:
A the dividend
R the remainder
C the hundreds (Quotient in a by 100 division)
A = R + C*100
Great, but how I get C ?
A = R1 + E*128
Dividing A by 128 is straight-forward...
A table can give me C from E, when E is used as index....
table: .byte 0,1,2,3,5,6,7,8,10,11...
Now we get R:
R = A - C*100
R still can be bigger than 100, so a simple compare with 100, and subsequent subtraction, and increment of C, provides both Quotient, and Remainder
This can be implemented in few bytes (plus the Table), and is going to be Fast !. (A division by 256 instead of 128, is I think not better in terms of performance, but shorter).
I know I did not present rocket science, but something I think worth shearing.
Gru
I like calculators, (handhelds), so I decided to build my own calc, sort of the ultimate model, something like a HP-41 of sorts. (www.hpmuseum.org)
Well, of course the problem is to perform the actual calculations, no ?. Binary floating point is AFAIK never used. BCD is used instead.
After plenty of research, I decided more or less how I'm going to implement all the math functions, but some low level algorithms are still in the works,
Wenn all this started, I decided for a HC11, but later ended up with an AVR. Easier to work with, much faster, and similar advantages as the PICs and derivatives.
When I discovered the propeller (Thanks to Elekt*r), I thought that could be a good candidate for a better, improved, more functional, usw,. calculator.
In the course of "the ultimate BCD package" (TUBP) I come across finally to the description of how to do "high-performance" BCD calculations (Have a look at the HP-42 simulator for the PC), or the un*x utility bc, included in your OS.
This TUBP uses shorts to hold 4 digits together, but not as packed BCD, but as a binary number. And instead of using BASE 10 is using BASE 10000 !. I wanted to port this to the propeller, but so far so good, the propeller does not bring any speed to the problem This algorithm relays on multiplication and ... division !, for both bcd multiplication and bcd division. As everyone knows, division is something that is normally avoided for known reasons (speed).
I decided meanwhile, to test a simplified (using bytes instead of shorts) version for the AVR. The problem of multiplication is easily solved, because it has an 8 bit mul instruction... but division ?, that is horribly slow !
I needed a fast, and that means FAST, division by 100. The problem is how ? And unrolled 16 bit by 8 bit division needs ~100 instructions
The division by 10 used to unpack a binary into two digits is done by a table (hehe), can a table be also useful ?
The number I have to divide is ever hardly above 1100, and I am only going to divide it by 100, so...
Being:
A the dividend
R the remainder
C the hundreds (Quotient in a by 100 division)
A = R + C*100
Great, but how I get C ?
A = R1 + E*128
Dividing A by 128 is straight-forward...
A table can give me C from E, when E is used as index....
table: .byte 0,1,2,3,5,6,7,8,10,11...
Now we get R:
R = A - C*100
R still can be bigger than 100, so a simple compare with 100, and subsequent subtraction, and increment of C, provides both Quotient, and Remainder
This can be implemented in few bytes (plus the Table), and is going to be Fast !. (A division by 256 instead of 128, is I think not better in terms of performance, but shorter).
I know I did not present rocket science, but something I think worth shearing.
Gru
Comments
please have a look at the webpage of Hacker's Delight. There is a chapter "More on Division by Constants" free for download, which would be very applicable for your problem.
http://www.hackersdelight.org/
Thomas
Graham
Thanks for the reference to 'Algorithms for programmers'. Pretty fat, 0ver 900 pages!!!
That will take some time to pour over. Something to do this winter.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
you made some quite questionable remarks in your posting, but I think reading through the given links will be extremely helpful. Also using base 10,000 will most like give you the fastest implementation of multiple precission arithmetik. When coming to fgloeting point it will be the simplest solution to just use Float32...
This is the standard divioson algirithm on the Prop
You can see that there are just 2 instructions per unrolled loop per bitposition of the divisor.
deSilva:
I had that algorithm in mind, for the implementation on the Propeller of the BASE 10000 BCD.
But, do not forget that I wanted to implement on the AVR, with bytes. AVR has no cmpsub or equivalent, nor has it a barrel shifter, and only 8 bit registers.
Maybe my post is a bit messy on the description side, but I think, and I may be wrong, the problem was described.
Please, will help me if you could point out what are the "questionable remarks" .
On the AVR: The algorithm I came out with takes as maximum 18 cycles:
Is it may be possible to shorten it using a table with steps of 256 instead of 128, but you may not gain too much.
Edit: I reviewed the algorithms in Hacker's delight, very interesting indeed. The propeller would probably benefit from some of them, especially the divide by 100 . Again the AVR lacks a barrel shifter, so does not benefit from these more advanced algorithms. I stay with mine, for this time. Nice books.
Post Edited (Ale) : 9/22/2007 3:56:40 PM GMT
May be they did not understand the difference between a "hacker" and a "cracker". Let's the flames begin
I do not think is the"ultimate" (but I do like its format).
I built 2 prototypes with that shape , they are slightly different (I have extra PCBs laying around, too)
A merge between the HP-11 and HP-16 is what the last prototype (when finished) should be like, by the way.
As ultimate I consider the HP-48. But replicating what they did... can take many many years, considering that most functions are actually in an intermediate language. And everything with a 2 (or 4) MHz processor.
I never ever saw a HP-16, just in HP's catalog from 1984. The HEX->BIN->DEC conversion is what I use most in the HP-48... a bit overkill, but I don't have something else. I hope I can finish mine. (You can get some info on geocities.com/hppacito/calcs.html if interested. The second prototype is not there though)
My 12C provokes my defenestration response every time I hit the E+ key thereby overwriting my saved values in registers 1, 2, 4 and 5. Grrrr. It is the calculator that really should have a bsp key, too.
Thanks for the link. And no, I don't the english for those words either. I like your yellow robot test object which suggests visions of a Bucyrus shovel for my proposed sugar cube board.
Todays HP-12 Platinum has a 6502 derivative (I heard without the Y register...), and not anymore a Nut derivative. I have to update the site with the new prototype and some soft for it .
The HP-1x series did not have that much memory... so some functions used the registers, like statistics. I really do not remember I had problems with the E+ Key... well my fingers were much smaller then (when I was 12 or so...some 20 years ago...)
I have to confess that for a decade and a half I used its program to balance my checkbook, savings and cash accounts. (enter the number, press r/s A, B, C, or D) C was for Checking of couse. D was for debt, ie credit card).
www.jacques-laporte.org/TheSecretOfTheAlgorithms.htm
Still use it,works in HEX
I'm OK up to 256 but 32bit I need the calc.
Regards Ian
:- When I log on to AOL is says " You got problems "
Some years ago I tried to understand how this worked.. but without knowing the algorithm used... was a bit futile...
??? I'll dump anything that says that I got any problems
Gr,
Mightor
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
| To know recursion, you must first know recursion.
| I reject your reality and substitute my own!
| - Adam Savage
It is possible you haven't grasped the gravity of the situation.