PDA

View Full Version : Fastest 16X16 multiply and divide



electric550
07-16-2009, 01:59 PM
Is it correct to assume that the fastest 16x16 multiplication of two unkown variables, is the "school multiplication" method from desilvas assmbly tutorial? and similarly is the "school division" after that the fastest method for division?

Leon
07-16-2009, 03:49 PM
Try Booth's algorithm.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle

heater
07-16-2009, 03:57 PM
Does anyone have a signed 16 (or 32) bit multiply using Booths algorithm in PASM? Last time I thought about it it seemed rather long winded in PASM.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Leon
07-16-2009, 05:10 PM
I found this MIPS code for it:




.text 0x400000
main:
li $v0, 5
syscall
move $s0, $v0
li $v0, 5
syscall
andi $s1, $v0, 0x0000ffff
sll $s0, $s0, 16 # into Upper-Halfword (for addition)
li $s4,0 # saving the last bit
li $s5,16 # counter
loop:
andi $s3, $s1, 1 # $s3 = LSB($s1)
beq $s3, $s4, endloop # 00 or 11 -> cont
beq $s3, $zero, runend # 01 -> runend
sub $s1, $s1, $s0 # beginning of a run
j endloop
runend:
add $s1, $s1, $s0
endloop:
sra $s1, $s1, 1 # arithmetic right shift
addi $s5, -1
move $s4, $s3
bne $s5, $zero, loop

li $v0, 1
move $a0, $s1
syscall
end:
li $v0, 10
syscall






I could test it with the PIC32 simulator. It should be quite easy to convert it into PASM.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle

ericball
07-16-2009, 07:49 PM
Unsigned multiply is pretty simple:


umultiply MOV mulout, #0
:mulloop SHR mulin1, #1 WC,WZ
IF_C ADD mulout, mulin2
SHL mulin2, #1
IF_NZ JMP #:mulloop
umultiply_ret RET


It requires 4+16*# bits·in mulin1 (+8 for the CALL/RET) cycles.· Add 16 more cycles to automatically swap mulin1 & mulin2.

I've looked at Booth's multiplication algorithm (http://en.wikipedia.org/wiki/Booth%27s_multiplication_algorithm)·and I'm not sure it's any more efficient than a brute force method.· The inner loop is still done the same number of times.· The algorithm also requires a test against the 2 LSBs of a value, which doesn't map nicely to a single PASM instruction.· It also assumes 33 bit registers (for a 16x16 multiply), although it may be possible to cheat using the Carry bit.

One thing to pay attention to with all algorithms is the assumptions they are based on - "Booth used desk calculators that were faster at shifting than adding and created the algorithm to increase their speed".· Since PASM doesn't have that constraint, there may be better algorithms.·



▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Composite NTSC sprite driver: Forum (http://forums.parallax.com/showthread.php?p=800114)
NTSC & PAL driver templates: Forum (http://forums.parallax.com/showthread.php?p=803904)
OnePinTVText driver: ObEx (http://obex.parallax.com/objects/480/) Forum (http://forums.parallax.com/showthread.php?p=822453)

Leon
07-16-2009, 07:56 PM
It tends to be used more for hardware multipliers than ones implemented in software, probably for that reason. A single-cycle shift would probably help; the Propeller takes four clocks, IIRC.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle

Post Edited (Leon) : 7/16/2009 1:13:07 PM GMT

ericball
07-16-2009, 08:09 PM
Leon said...
It tends to be used more for hardware multipliers than ones implemented in software, probably for that reason.

Yeah, I just saw this in the Wikipedia description - "Booth's algorithm performs fewer additions and subtractions than the normal multiplication algorithm."· But on a Prop, performing the ADD/SUB takes no more time than skipping it.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Composite NTSC sprite driver: Forum (http://forums.parallax.com/showthread.php?p=800114)
NTSC & PAL driver templates: Forum (http://forums.parallax.com/showthread.php?p=803904)
OnePinTVText driver: ObEx (http://obex.parallax.com/objects/480/) Forum (http://forums.parallax.com/showthread.php?p=822453)