Fast 8x8 Multiply
Brian Fairchild
Posts: 549
I have a requirement for a routine to take two unsigned 8-bit values and multiply them to give a 16-bit product. Speed is important, repeatability is important, accuracy is not. The values 0x00 - 0xFF represent analogue values of 0.0 to 1.0
A looped shift-and-add routine in PASM is hardly going to be fast enough. So I thought of using the log and anti-log lookup tables. Now maybe I'm being a bit slow this evening but Appendix B is not helping me understand how to use the tables and my protoboard is not around to try. I'm comfortable with the concept of taking two log values, adding them and then looking up the anti-log of the result but I'm very confused how the tables are organised.
A looped shift-and-add routine in PASM is hardly going to be fast enough. So I thought of using the log and anti-log lookup tables. Now maybe I'm being a bit slow this evening but Appendix B is not helping me understand how to use the tables and my protoboard is not around to try. I'm comfortable with the concept of taking two log values, adding them and then looking up the anti-log of the result but I'm very confused how the tables are organised.
Comments
Implement this in ASM. I expect each case would take about 4 ASM statements, using the conditional execution of the prop:
if( a & 1 ) Result += (b << 0)
if( a & 2 ) Result += (b << 1)
if( a & 4 ) Result += (b << 2)
// repeat for all 8 input bits
Each of those lines would become:
mov temp, b
shl temp, 0 'or 1, 2, 3, ...
test a, #1 'or 2, 4, 8, ...
if_nz add Result, temp
Hmmm... You could do it in 3 ops per bit by simply shifting TEMP by 1 bit for every cycle, like this:
mov temp, b
test a, #1
if_nz add Result, temp
shl temp, 1
test a, #2
if_nz add Result, temp
shl temp, 1
test a, #4
if_nz add Result, temp
...
So 3 instructions per bit x 8 bits = 24 instructions. How fast does it NEED to be?
Jason
The "Propeller Guts.pdf" also has example rutines for divide, square root, and for using the log, anti-log, and sine tables.
Lawson
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Lunch cures all problems! have you had lunch?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
· -- Carl, nn5i@arrl.net
It looks like Mike's snippet is the way to go, 16 instructions plus set-up is probably about as good as it gets. If I've figured out correctly how the log tables work then I could use them for a similar execution time but with the potential loss of accuracy.
If you are multiplying by a known constant value with few bits actually set in its representation then you can remove shift and add steps in the middle adjusting the the shifts remaining to "skip" over the zeros in the constants.
If you have a small number of constants values to use at different times then create customized multiply sequences for each one.
Finally if your y values are mostly constant but change occasionally and you don't know what they will be, a user setting or calibration setting for examples, then you could have some code dynamically create the multiply sequences at run time.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Post Edited (heater) : 2/16/2009 12:58:50 PM GMT
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/·home of the PropGFX Lite
·
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/·home of the PropGFX Lite
·
-Phil