Shop OBEX P1 Docs P2 Docs Learn Events
Fastest 16X16 multiply and divide — Parallax Forums

Fastest 16X16 multiply and divide

electric550electric550 Posts: 122
edited 2009-07-16 13:09 in Propeller 1
Is it correct to assume that the fastest 16x16 multiplication of two unkown variables, is the "school multiplication" method from desilvas assmbly tutorial? and similarly is the "school division" after that the fastest method for division?

Comments

  • LeonLeon Posts: 7,620
    edited 2009-07-16 08:49
    Try Booth's algorithm.

    Leon

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Amateur radio callsign: G1HSM
    Suzuki SV1000S motorcycle
  • heaterheater Posts: 3,370
    edited 2009-07-16 08:57
    Does anyone have a signed 16 (or 32) bit multiply using Booths algorithm in PASM? Last time I thought about it it seemed rather long winded in PASM.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.
  • LeonLeon Posts: 7,620
    edited 2009-07-16 10:10
    I found this MIPS code for it:

    .text 0x400000
    main:
        li $v0, 5
        syscall
        move $s0, $v0
        li $v0, 5
        syscall
        andi $s1, $v0, 0x0000ffff
        sll $s0, $s0, 16                    # into Upper-Halfword (for addition)
        li $s4,0                            # saving the last bit
        li $s5,16                           # counter
    loop:
        andi $s3, $s1, 1                    # $s3 = LSB($s1)
        beq $s3, $s4, endloop               # 00 or 11 -> cont
        beq $s3, $zero, runend              # 01 -> runend
        sub $s1, $s1, $s0                   # beginning of a run
        j endloop
    runend:
        add $s1, $s1, $s0
    endloop:
        sra $s1, $s1, 1                     # arithmetic right shift
        addi $s5, -1
        move $s4, $s3
        bne $s5, $zero, loop 
        
        li $v0, 1
        move $a0, $s1
        syscall   
    end:
        li $v0, 10
        syscall
        
    
    
    



    I could test it with the PIC32 simulator. It should be quite easy to convert it into PASM.

    Leon

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Amateur radio callsign: G1HSM
    Suzuki SV1000S motorcycle
  • ericballericball Posts: 774
    edited 2009-07-16 12:49
    Unsigned multiply is pretty simple:
    umultiply      MOV     mulout, #0
    :mulloop       SHR     mulin1, #1      WC,WZ
            IF_C   ADD     mulout, mulin2
                   SHL     mulin2, #1
            IF_NZ  JMP     #:mulloop
    umultiply_ret  RET
    
    

    It requires 4+16*# bits·in mulin1 (+8 for the CALL/RET) cycles.· Add 16 more cycles to automatically swap mulin1 & mulin2.

    I've looked at Booth's multiplication algorithm·and I'm not sure it's any more efficient than a brute force method.· The inner loop is still done the same number of times.· The algorithm also requires a test against the 2 LSBs of a value, which doesn't map nicely to a single PASM instruction.· It also assumes 33 bit registers (for a 16x16 multiply), although it may be possible to cheat using the Carry bit.

    One thing to pay attention to with all algorithms is the assumptions they are based on - "Booth used desk calculators that were faster at shifting than adding and created the algorithm to increase their speed".· Since PASM doesn't have that constraint, there may be better algorithms.·



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Composite NTSC sprite driver: Forum
    NTSC & PAL driver templates: Forum
    OnePinTVText driver: ObEx Forum
  • LeonLeon Posts: 7,620
    edited 2009-07-16 12:56
    It tends to be used more for hardware multipliers than ones implemented in software, probably for that reason. A single-cycle shift would probably help; the Propeller takes four clocks, IIRC.

    Leon

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Amateur radio callsign: G1HSM
    Suzuki SV1000S motorcycle

    Post Edited (Leon) : 7/16/2009 1:13:07 PM GMT
  • ericballericball Posts: 774
    edited 2009-07-16 13:09
    Leon said...
    It tends to be used more for hardware multipliers than ones implemented in software, probably for that reason.
    Yeah, I just saw this in the Wikipedia description - "Booth's algorithm performs fewer additions and subtractions than the normal multiplication algorithm."· But on a Prop, performing the ADD/SUB takes no more time than skipping it.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Composite NTSC sprite driver: Forum
    NTSC & PAL driver templates: Forum
    OnePinTVText driver: ObEx Forum
Sign In or Register to comment.