Fast, counter-augmented fixed multiply
Phil Pilgrim (PhiPi)
Posts: 23,514
Inspired by this thread, I decided to write a general fixed multiply routine using ctra to do some of the work. Basically, multiplication is just a combination of shifts and adds. In this implementation, the PASM program does the shifts and the counter does the adds in parallel, cutting the program's execution time by nearly half. Because the counter does four adds for every shift, the result has to be corrected at the end.
Here's a demo program that multiplies by 12345:
Going one step further, I decided to write a Spin program that generates the PASM code for any muliplier:
It produces optimized code that eliminates the final correction if the last two bits of the multiplier are zeroes. If only the last bit is zero, it reduces the amount of the correction by one bit. This is done to provide the greatest amount of headroom without producing an overflow. The user still needs to make sure that the operands are in a range that will not overflow, however.
Anyway, I've only cursorily tested the code. If you find any errors, please post the issue here. Thanks.
-Phil
Here's a demo program that multiplies by 12345:
CON _clkmode = xtal1 + pll16x _xinfreq = 5_000_000 OBJ sio : "FullDuplexSerial" VAR long cmd, multiplicand, multiplier PUB start cognew(@multiply, @cmd) sio.start(31, 30, 0, 115200) multiplier := 12345 repeat multiplicand := get_dec sio.dec(multiplicand) sio.str(string(" * ")) sio.dec(multiplier) sio.str(string(" == ")) sio.dec(multiplicand * multiplier) cmd~~ sio.str(string(" == ")) sio.dec(multiplicand) sio.tx(13) PRI get_dec | char repeat char := sio.rx case char "0" .. "9" : result := result * 10 + char - "0" 13 : return result DAT org 0 multiply mov ctra,ctra0 mov mpptr,par add mpptr,#4 cmd_lp rdlong mpcnd,par wz if_z jmp #cmd_lp rdlong mpcnd,mpptr '-------[Replace with generated code]-------------------------------------------------------------- 'Multiply by 12345 == $3039 == %0011_0000_0011_1001. mov frqa,#0 'Zero the multiplicand. mov phsa,#0 'Zero the product. mov frqa,mpcnd 'Bit 0. shl frqa,#3 'Bit 3. shl frqa,#1 'Bit 4. shl frqa,#1 'Bit 5. shl frqa,#7 'Bit 12. shl frqa,#1 'Bit 13. mov frqa,#0 'Let Bit 13 finish. mov product,phsa 'Read product. shr product,#2 'Correct by four counts per instruction. '-------------------------------------------------------------------------------------------------- wrlong product,mpptr wrlong zero,par jmp #cmd_lp ctra0 long %00100 << 26 zero long 0 product mpcnd res 1 mpptr res 1
Going one step further, I decided to write a Spin program that generates the PASM code for any muliplier:
CON _clkmode = xtal1 + pll16x _xinfreq = 5_000_000 OBJ sio : "FullDuplexSerial" VAR long multiplier PUB start | shrgt, shlft, zeroes sio.start(31, 30, 0, 115200) repeat multiplier := get_dec sio.str(@comment) sio.dec(multiplier) sio.str(string(13, 13)) multiplier ><= 32 shlft := 32 - >| multiplier multiplier <<= shlft shrgt := (2 - shlft) #> 0 shlft := shlft - (2 - shrgt) sio.str(@zero_frqa) sio.str(@zero_phsa) if (shlft) sio.str(@shl_mpcnd) sio.dec(shlft) sio.tx(13) sio.str(@mov_frqa) repeat while (multiplier) if (multiplier &= $7fff_ffff) shlft := 32 - >| multiplier sio.str(@shl_frqa) sio.dec(shlft) sio.tx(13) multiplier <<= shlft sio.str(@zero_frqa) sio.str(@mov_product) if (shrgt) sio.str(@shr_product) sio.dec(shrgt) sio.tx(13) PRI get_dec | char repeat char := sio.rx case char "0" .. "9" : result := result * 10 + char - "0" 13 : return result DAT comment byte 13, "'Multiply by ", 0 zero_frqa byte " mov frqa,#0", 13, 0 zero_phsa byte " mov phsa,#0",13, 0 shl_mpcnd byte " shl mpcnd,#", 0 mov_frqa byte " mov frqa,mpcnd", 13, 0 shl_frqa byte " shl frqa,#", 0 mov_product byte " mov product,phsa", 13, 0 shr_product byte " shr product,#", 0
It produces optimized code that eliminates the final correction if the last two bits of the multiplier are zeroes. If only the last bit is zero, it reduces the amount of the correction by one bit. This is done to provide the greatest amount of headroom without producing an overflow. The user still needs to make sure that the operands are in a range that will not overflow, however.
Anyway, I've only cursorily tested the code. If you find any errors, please post the issue here. Thanks.
-Phil
Comments
And if someone really what to save 4 cycles by getting rid of shr , setup CTRB to 20mhz to feed CTRA on the same pin(?) instead of always, but waste 1 pin
Over on msp430 forum I was showed what I learned to create the fastest 8bit x 8bit ->16bit multiplication ever made for the msp430
a jnc take two cycles so instead I do +1 or +2 to PC
For the last years most msp's have a 0 fixed for bit0 of PC so +1 does noting but +2 skips to the next opcode.
msp430 also have counters, but no frqa so always +1 but maybe I can still do some tricks with it.
@PhiPi: Wouldn't a LOGIC.always be a more logical choice (movi ...)? Not relevant for your example but if it becomes part of something bigger then you have to keep an eye on it.