Fast PASM multiply by semi-static multipliers
Phil Pilgrim (PhiPi)
Posts: 23,514
This first post is merely peripheral to the title subject but lays the groundwork for covering that topic in the second post.
In my Hilbert Transform and [url=http://forums.parallax.com/showthread.php?133173[/url]FIR2PASM[/url] threads, I mentioned and made considerable use of a number representation called Canonical Signed Digits (CSD). This representation is used to minimize the number of steps required to multiply a variable by a constant. For example, to multiply a number by 15 (%01111) requires four adds and three shifts. But you can get the same result from multiplying by 16 (%10000) and subtracting one (%00001), which requires one add, one subtract and one shift. By breaking a number into its positive and negative CSD components (%10000 and %00001 in the previous example), it is possible to minimize the number of steps required to multiply by any constant value by an average of 33%. In fact, the worst case is reduced by half.
Every binary number has a unique CSD representation, such that the total number of "1" bits is minimized, and the bitwise OR of the positive and negative components has no runs of two or more sequential "1" bits. This means that the total number of "1" bits in a 32-bit CSD representation cannot exceed 16. Once a multiplier's CSD representation is found, it's simply a matter of writing an unrolled "loop" of shifts, adds, and subtracts to get a product. For example, x * 15 reduces to:
In the FIR2PASM example, the PASM code to perform the coefficient multiplications is generated by the Perl program that resides on the server. This is possible, since the coefficients never change. So the Perl program can do the CSD conversions and spit out the code.
But what happens if you have constant multipliers that aren't so constant? For example, suppose you have a whole bunch of points in 2D space that need to rotate about the origin by the same angle -- but an angle which can change. Or, similarly, you need to compute the burst-relative IQ color components of a pixel for which you have absolute color burst IQ components and absolute pixel IQ components. In the latter case the color burst components (burst(i) and burst(q) in the following formula) change only once per line, but the pixel components change for every pixel:
So, if you have multipliers that change infrequently compared to the number of times they're used, but change nonetheless, is there a way to produce the optimum multiplication code on the fly in the Propeller? Yes, of course there is! But first, you have to compute the CSD coefficients of the new multiplier.
Attached is a program that does the CSD conversion in both Spin and PASM for comparison. Both are based on the following subroutine that I used in the Perl FIR2PASM code (for all you Perl-heads out there):
In the next post (tomorrow, hopefully), I'll show how a PASM program can write it's own multiplication subroutine on the fly.
-Phil
In my Hilbert Transform and [url=http://forums.parallax.com/showthread.php?133173[/url]FIR2PASM[/url] threads, I mentioned and made considerable use of a number representation called Canonical Signed Digits (CSD). This representation is used to minimize the number of steps required to multiply a variable by a constant. For example, to multiply a number by 15 (%01111) requires four adds and three shifts. But you can get the same result from multiplying by 16 (%10000) and subtracting one (%00001), which requires one add, one subtract and one shift. By breaking a number into its positive and negative CSD components (%10000 and %00001 in the previous example), it is possible to minimize the number of steps required to multiply by any constant value by an average of 33%. In fact, the worst case is reduced by half.
Every binary number has a unique CSD representation, such that the total number of "1" bits is minimized, and the bitwise OR of the positive and negative components has no runs of two or more sequential "1" bits. This means that the total number of "1" bits in a 32-bit CSD representation cannot exceed 16. Once a multiplier's CSD representation is found, it's simply a matter of writing an unrolled "loop" of shifts, adds, and subtracts to get a product. For example, x * 15 reduces to:
neg product,x shl x,#4 add product,x
In the FIR2PASM example, the PASM code to perform the coefficient multiplications is generated by the Perl program that resides on the server. This is possible, since the coefficients never change. So the Perl program can do the CSD conversions and spit out the code.
But what happens if you have constant multipliers that aren't so constant? For example, suppose you have a whole bunch of points in 2D space that need to rotate about the origin by the same angle -- but an angle which can change. Or, similarly, you need to compute the burst-relative IQ color components of a pixel for which you have absolute color burst IQ components and absolute pixel IQ components. In the latter case the color burst components (burst(i) and burst(q) in the following formula) change only once per line, but the pixel components change for every pixel:
I = k ( pixel(i) burst(q) - pixel(q) burst(i) )
Q = -k ( pixel(q) burst(q) + pixel(i) burst(i) )
Q = -k ( pixel(q) burst(q) + pixel(i) burst(i) )
So, if you have multipliers that change infrequently compared to the number of times they're used, but change nonetheless, is there a way to produce the optimum multiplication code on the fly in the Propeller? Yes, of course there is! But first, you have to compute the CSD coefficients of the new multiplier.
Attached is a program that does the CSD conversion in both Spin and PASM for comparison. Both are based on the following subroutine that I used in the Perl FIR2PASM code (for all you Perl-heads out there):
sub bin2csd { my $b = shift; my $orig = $b; my ($p, $c) = (0, 0); foreach my $nh (0 .. 31) { my $i = 31 - $nh; my $twobits = $b & 3; if ($twobits == 0) { if ($c) { $p |= 0x8000_0000; $c = 0; } } elsif ($twobits == 1) { $p |= 0x8000_0000 unless $c } elsif ($twobits == 3) { $c = 1 } if ($i) { $p >>= 1; $b >>= 1; } } return ($p, $p - $orig) }
In the next post (tomorrow, hopefully), I'll show how a PASM program can write it's own multiplication subroutine on the fly.
-Phil
spin
5K
Comments
Here's sample output from one iteration:
-Phil
But in reality, I have never thought about doing it in a micro. This has some really interesting prospects for speeding up programs, particularly the sort you are working on
Very interesting, thanks for posting.
Cheers,
Peter (pjv)