Fast 8x8 Multiply

Brian Fairchild · 2009-02-15 22:03

I have a requirement for a routine to take two unsigned 8-bit values and multiply them to give a 16-bit product. Speed is important, repeatability is important, accuracy is not. The values 0x00 - 0xFF represent analogue values of 0.0 to 1.0

A looped shift-and-add routine in PASM is hardly going to be fast enough. So I thought of using the log and anti-log lookup tables. Now maybe I'm being a bit slow this evening but Appendix B is not helping me understand how to use the tables and my protoboard is not around to try. I'm comfortable with the concept of taking two log values, adding them and then looking up the anti-log of the result but I'm very confused how the tables are organised.

JasonDorie · 2009-02-15 22:13

You don't need a loop, do you?

Implement this in ASM. I expect each case would take about 4 ASM statements, using the conditional execution of the prop:

if( a & 1 ) Result += (b << 0)
if( a & 2 ) Result += (b << 1)
if( a & 4 ) Result += (b << 2)
// repeat for all 8 input bits

Each of those lines would become:

mov temp, b
shl temp, 0 'or 1, 2, 3, ...
test a, #1 'or 2, 4, 8, ...
if_nz add Result, temp

Hmmm... You could do it in 3 ops per bit by simply shifting TEMP by 1 bit for every cycle, like this:

mov temp, b
test a, #1
if_nz add Result, temp

shl temp, 1
test a, #2
if_nz add Result, temp

shl temp, 1
test a, #4
if_nz add Result, temp
...

So 3 instructions per bit x 8 bits = 24 instructions. How fast does it NEED to be?

Jason

Lawson · 2009-02-16 00:53

This is copied out of the "propeller guts.pdf" that's floating around on the board somewhere. Unsigned 16-bit multiply. (should be easy to cut back to 8-bits)

'
' Multiply x[noparse][[/noparse]15..0] by y[noparse][[/noparse]15..0] (y[noparse][[/noparse]31..16] must be 0)
' on exit, product in y[noparse][[/noparse]31..0]
'
multiply     shl x,#16     'get multiplicand into x[noparse][[/noparse]31..16]
             mov t,#16     'ready for 16 multiplier bits
             shr y,#1 wc   'get initial multiplier bit into c
:loop  if_c  add y,x wc    'if c set, add multiplicand into product
             rcr y,#1 wc   'get next multiplier bit into c, shift product
             djnz t,#:loop 'loop until done
multiply_ret ret           'return with product in y[noparse][[/noparse]31..0]

The "Propeller Guts.pdf" also has example rutines for divide, square root, and for using the log, anti-log, and sine tables.

Lawson

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Lunch cures all problems! have you had lunch?

Mike Green · 2009-02-16 01:16

Notice that you can "unroll" the loop with 2 instructions per bit (plus setup) as in

multiply  shl   x,#16
        shr   y,#1   wc
 if_c  add  y,x     wc
        rcr   y,#1   wc
 if_c  add  y,x     wc
        rcr   y,#1   wc
 if_c  add  y,x     wc
        rcr   y,#1   wc
 if_c  add  y,x     wc
        rcr   y,#1   wc
 if_c  add  y,x     wc
        rcr   y,#1   wc
 if_c  add  y,x     wc
        rcr   y,#1   wc
 if_c  add  y,x     wc
        rcr   y,#1   wc
 if_c  add  y,x     wc
 multiply_ret   ret

Carl Hayes · 2009-02-16 05:43

That's elegant, Mike.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
· -- Carl, nn5i@arrl.net

Brian Fairchild · 2009-02-16 12:01

Thanks Guys

It looks like Mike's snippet is the way to go, 16 instructions plus set-up is probably about as good as it gets. If I've figured out correctly how the log tables work then I could use them for a similar execution time but with the potential loss of accuracy.

heater · 2009-02-16 12:49

This may not be relevant to your application but don't forget if speed is of the essence and accuracy is not then you can skip one, two or more shift and add steps depending on the number of bits in y. Example: If y is only 6 bits bits (0 to 5) then skip two shift and add. At the beginning or the end depending if you want to confine y to 0 to 64 or have y as 0 to 255 but with lower two bits always zero.

If you are multiplying by a known constant value with few bits actually set in its representation then you can remove shift and add steps in the middle adjusting the the shifts remaining to "skip" over the zeros in the constants.

If you have a small number of constants values to use at different times then create customized multiply sequences for each one.

Finally if your y values are mostly constant but change occasionally and you don't know what they will be, a user setting or calibration setting for examples, then you could have some code dynamically create the multiply sequences at run time.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Post Edited (heater) : 2/16/2009 12:58:50 PM GMT

Leon · 2009-02-16 13:33

A lookup table is the fastest way to do it, but needs a lot of memory.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle

Baggers · 2009-02-16 17:51

Leon, Prop hasn't got 64KB spare, which is what is needed for an 8bitx8bit multiply table, otherwise I'm sure he would have used that [noparse]:D[/noparse]

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/·home of the PropGFX Lite

·

Phil Pilgrim (PhiPi) · 2009-02-16 18:12

Well, actually, since multiplication is commutative, you need "only" 32K words for a * b (a < b), plus another 256 words for the squares. Of course, that entails an additional compare with a possible swap to set up the lookup address.

-Phil

Baggers · 2009-02-16 22:51

so all in all it's probably just as quick with 16 instructions [noparse]:D[/noparse] and saves shed loads of ram [noparse]:D[/noparse]

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/·home of the PropGFX Lite

·

Phil Pilgrim (PhiPi) · 2009-02-17 10:47

Yup.

-Phil

Fast 8x8 Multiply

Comments