View Full Version : Fastest 16X16 multiply and divide

electric550

07-16-2009, 02:59 PM

Is it correct to assume that the fastest 16x16 multiplication of two unkown variables, is the "school multiplication" method from desilvas assmbly tutorial? and similarly is the "school division" after that the fastest method for division?

Try Booth's algorithm.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Amateur radio callsign: G1HSM

Suzuki SV1000S motorcycle

heater

07-16-2009, 04:57 PM

Does anyone have a signed 16 (or 32) bit multiply using Booths algorithm in PASM? Last time I thought about it it seemed rather long winded in PASM.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

For me, the past is not over yet.

I found this MIPS code for it:

.text 0x400000

main:

li $v0, 5

syscall

move $s0, $v0

li $v0, 5

syscall

andi $s1, $v0, 0x0000ffff

sll $s0, $s0, 16 # into Upper-Halfword (for addition)

li $s4,0 # saving the last bit

li $s5,16 # counter

loop:

andi $s3, $s1, 1 # $s3 = LSB($s1)

beq $s3, $s4, endloop # 00 or 11 -> cont

beq $s3, $zero, runend # 01 -> runend

sub $s1, $s1, $s0 # beginning of a run

j endloop

runend:

add $s1, $s1, $s0

endloop:

sra $s1, $s1, 1 # arithmetic right shift

addi $s5, -1

move $s4, $s3

bne $s5, $zero, loop

li $v0, 1

move $a0, $s1

syscall

end:

li $v0, 10

syscall

I could test it with the PIC32 simulator. It should be quite easy to convert it into PASM.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Amateur radio callsign: G1HSM

Suzuki SV1000S motorcycle

ericball

07-16-2009, 08:49 PM

Unsigned multiply is pretty simple:

umultiply MOV mulout, #0

:mulloop SHR mulin1, #1 WC,WZ

IF_C ADD mulout, mulin2

SHL mulin2, #1

IF_NZ JMP #:mulloop

umultiply_ret RET

It requires 4+16*# bits·in mulin1 (+8 for the CALL/RET) cycles.· Add 16 more cycles to automatically swap mulin1 & mulin2.

I've looked at Booth's multiplication algorithm (http://en.wikipedia.org/wiki/Booth%27s_multiplication_algorithm)·and I'm not sure it's any more efficient than a brute force method.· The inner loop is still done the same number of times.· The algorithm also requires a test against the 2 LSBs of a value, which doesn't map nicely to a single PASM instruction.· It also assumes 33 bit registers (for a 16x16 multiply), although it may be possible to cheat using the Carry bit.

One thing to pay attention to with all algorithms is the assumptions they are based on - "Booth used desk calculators that were faster at shifting than adding and created the algorithm to increase their speed".· Since PASM doesn't have that constraint, there may be better algorithms.·

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Composite NTSC sprite driver: Forum (http://forums.parallax.com/showthread.php?p=800114)

NTSC & PAL driver templates: Forum (http://forums.parallax.com/showthread.php?p=803904)

OnePinTVText driver: ObEx (http://obex.parallax.com/objects/480/) Forum (http://forums.parallax.com/showthread.php?p=822453)

It tends to be used more for hardware multipliers than ones implemented in software, probably for that reason. A single-cycle shift would probably help; the Propeller takes four clocks, IIRC.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Amateur radio callsign: G1HSM

Suzuki SV1000S motorcycle

Post Edited (Leon) : 7/16/2009 1:13:07 PM GMT

ericball

07-16-2009, 09:09 PM

Leon said...

It tends to be used more for hardware multipliers than ones implemented in software, probably for that reason.

Yeah, I just saw this in the Wikipedia description - "Booth's algorithm performs fewer additions and subtractions than the normal multiplication algorithm."· But on a Prop, performing the ADD/SUB takes no more time than skipping it.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Composite NTSC sprite driver: Forum (http://forums.parallax.com/showthread.php?p=800114)

NTSC & PAL driver templates: Forum (http://forums.parallax.com/showthread.php?p=803904)

OnePinTVText driver: ObEx (http://obex.parallax.com/objects/480/) Forum (http://forums.parallax.com/showthread.php?p=822453)