Shop OBEX P1 Docs P2 Docs Learn Events
Fast 8x8 Multiply — Parallax Forums

Fast 8x8 Multiply

Brian FairchildBrian Fairchild Posts: 549
edited 2009-02-17 10:47 in Propeller 1
I have a requirement for a routine to take two unsigned 8-bit values and multiply them to give a 16-bit product. Speed is important, repeatability is important, accuracy is not. The values 0x00 - 0xFF represent analogue values of 0.0 to 1.0

A looped shift-and-add routine in PASM is hardly going to be fast enough. So I thought of using the log and anti-log lookup tables. Now maybe I'm being a bit slow this evening but Appendix B is not helping me understand how to use the tables and my protoboard is not around to try. I'm comfortable with the concept of taking two log values, adding them and then looking up the anti-log of the result but I'm very confused how the tables are organised.

Comments

  • JasonDorieJasonDorie Posts: 1,930
    edited 2009-02-15 22:13
    You don't need a loop, do you?

    Implement this in ASM. I expect each case would take about 4 ASM statements, using the conditional execution of the prop:

    if( a & 1 ) Result += (b << 0)
    if( a & 2 ) Result += (b << 1)
    if( a & 4 ) Result += (b << 2)
    // repeat for all 8 input bits

    Each of those lines would become:

    mov temp, b
    shl temp, 0 'or 1, 2, 3, ...
    test a, #1 'or 2, 4, 8, ...
    if_nz add Result, temp

    Hmmm... You could do it in 3 ops per bit by simply shifting TEMP by 1 bit for every cycle, like this:

    mov temp, b
    test a, #1
    if_nz add Result, temp

    shl temp, 1
    test a, #2
    if_nz add Result, temp

    shl temp, 1
    test a, #4
    if_nz add Result, temp
    ...

    So 3 instructions per bit x 8 bits = 24 instructions. How fast does it NEED to be?

    Jason
  • LawsonLawson Posts: 870
    edited 2009-02-16 00:53
    This is copied out of the "propeller guts.pdf" that's floating around on the board somewhere. Unsigned 16-bit multiply. (should be easy to cut back to 8-bits)

    '
    ' Multiply x[noparse][[/noparse]15..0] by y[noparse][[/noparse]15..0] (y[noparse][[/noparse]31..16] must be 0)
    ' on exit, product in y[noparse][[/noparse]31..0]
    '
    multiply     shl x,#16     'get multiplicand into x[noparse][[/noparse]31..16]
                 mov t,#16     'ready for 16 multiplier bits
                 shr y,#1 wc   'get initial multiplier bit into c
    :loop  if_c  add y,x wc    'if c set, add multiplicand into product
                 rcr y,#1 wc   'get next multiplier bit into c, shift product
                 djnz t,#:loop 'loop until done
    multiply_ret ret           'return with product in y[noparse][[/noparse]31..0]
    



    The "Propeller Guts.pdf" also has example rutines for divide, square root, and for using the log, anti-log, and sine tables.

    Lawson

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Lunch cures all problems! have you had lunch?
  • Mike GreenMike Green Posts: 23,101
    edited 2009-02-16 01:16
    Notice that you can "unroll" the loop with 2 instructions per bit (plus setup) as in
    multiply  shl   x,#16
            shr   y,#1   wc
     if_c  add  y,x     wc
            rcr   y,#1   wc
     if_c  add  y,x     wc
            rcr   y,#1   wc
     if_c  add  y,x     wc
            rcr   y,#1   wc
     if_c  add  y,x     wc
            rcr   y,#1   wc
     if_c  add  y,x     wc
            rcr   y,#1   wc
     if_c  add  y,x     wc
            rcr   y,#1   wc
     if_c  add  y,x     wc
            rcr   y,#1   wc
     if_c  add  y,x     wc
     multiply_ret   ret
    
  • Carl HayesCarl Hayes Posts: 841
    edited 2009-02-16 05:43
    That's elegant, Mike.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    · -- Carl, nn5i@arrl.net
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2009-02-16 12:01
    Thanks Guys

    It looks like Mike's snippet is the way to go, 16 instructions plus set-up is probably about as good as it gets. If I've figured out correctly how the log tables work then I could use them for a similar execution time but with the potential loss of accuracy.
  • heaterheater Posts: 3,370
    edited 2009-02-16 12:49
    This may not be relevant to your application but don't forget if speed is of the essence and accuracy is not then you can skip one, two or more shift and add steps depending on the number of bits in y. Example: If y is only 6 bits bits (0 to 5) then skip two shift and add. At the beginning or the end depending if you want to confine y to 0 to 64 or have y as 0 to 255 but with lower two bits always zero.

    If you are multiplying by a known constant value with few bits actually set in its representation then you can remove shift and add steps in the middle adjusting the the shifts remaining to "skip" over the zeros in the constants.

    If you have a small number of constants values to use at different times then create customized multiply sequences for each one.

    Finally if your y values are mostly constant but change occasionally and you don't know what they will be, a user setting or calibration setting for examples, then you could have some code dynamically create the multiply sequences at run time.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.

    Post Edited (heater) : 2/16/2009 12:58:50 PM GMT
  • LeonLeon Posts: 7,620
    edited 2009-02-16 13:33
    A lookup table is the fastest way to do it, but needs a lot of memory.

    Leon

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Amateur radio callsign: G1HSM
    Suzuki SV1000S motorcycle
  • BaggersBaggers Posts: 3,019
    edited 2009-02-16 17:51
    Leon, Prop hasn't got 64KB spare, which is what is needed for an 8bitx8bit multiply table, otherwise I'm sure he would have used that [noparse]:D[/noparse]

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    http://www.propgfx.co.uk/forum/·home of the PropGFX Lite

    ·
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-02-16 18:12
    Well, actually, since multiplication is commutative, you need "only" 32K words for a * b (a < b), plus another 256 words for the squares. Of course, that entails an additional compare with a possible swap to set up the lookup address.

    -Phil
  • BaggersBaggers Posts: 3,019
    edited 2009-02-16 22:51
    so all in all it's probably just as quick with 16 instructions [noparse]:D[/noparse] and saves shed loads of ram [noparse]:D[/noparse]

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    http://www.propgfx.co.uk/forum/·home of the PropGFX Lite

    ·
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-02-17 10:47
    Yup.

    -Phil
Sign In or Register to comment.