Shop OBEX P1 Docs P2 Docs Learn Events
Derivation of Algorithms for Multiplication by a Constant Fraction — Parallax Forums

Derivation of Algorithms for Multiplication by a Constant Fraction

Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
edited 2006-10-25 08:41 in Propeller 1
In a recent thread on the CORDIC algorithm, Chip demonstrated his derivation of a quick algorithm for multiplication of any multiplicand by the constant multiplier 0.607252935, rounded up to the nearest 16-bits, or $9B75. By recognizing a repeating bit pattern in the multiplier, he was able to accomplish the feat in only twelve assembly instructions. I thought it might be possible to automate, for any 16-bit constant fraction, the generation of fairly optimal code to perform multiplication by that fraction. This is because the search space, for a certain class of solutions at least, is not terribly large and can be scanned exhaustively for the best solution in a matter of minutes.

The class of solutions I considered encompasses what I call one- and two-tier algorithms. A one-tier algorithm is one in which the multiplicand is shifted repeatedly and either added to or subtracted from the product at each step. A two-tier algorithm is one in which a partial product is formed from the multiplier and multiplicand, as in the one-tier case, after which this partial product is shifted and added to or subtracted from the product. Chip's algorithm, by contrast, is three-tiered: a second partial product was formed by shifting and adding the first one before being used to operate on the final product. My feeling was that, by trying to encompass three-tiered algorithms, the search space would simply be too large to scan exhaustively.

The attached Spin program takes a single variable: a sixteen-bit fraction for which the multiply algorithm is to be derived. It uses TV_wtext to display its progress. At the end, it doesn't actually display any assembler code, but rather a pattern and a mask from which the desired code can be derived. A typical result will look something like this:

DC85 1101110010000101 <- Target
8000 ++0+++00+0000+0+ 16
8000 0-0+++00+0000+0+ 15
8000 00-0-+00+0000+0+ 13
8000 00-00-00+0000+0+ 11




The first line represents the hex and binary expression of the chosen fraction. Each line after that displays a successively better solution to the problem, and consists of:
  • A pattern in hex.
  • An add/subtract mask.
  • The number of assembly instructions required to implement the solution.
The assembly program represented by the pattern and mask will first create a subproduct based on the pattern (if the pattern contains more than a single "one"-bit), then shift the subproduct to each successive non-zero position in the mask and perform the indicated operation. For example, the best solution to the above problem would be:

[b]mov[/b]   t,x
[b]sar[/b]   t,#3
[b]sub[/b]   x,t
[b]sar[/b]   t,#3
[b]sub[/b]   x,t
[b]sar[/b]   t,#3
[b]add[/b]   x,t
[b]sar[/b]   t,#5
[b]add[/b]   x,t
[b]sar[/b]   t,#2
[b]add[/b]   x,t




Likewise, the best two-tier solution for Chip's constant, would look like this (along with the implied code):

9B75 1001101101110101 <- Target
8800 +00+00+00+0+0000 13

[b]sar[/b]   x,#1  'x := x * %0.10000
[b]mov[/b]   t,x   'Copy x to t.
[b]sar[/b]   t,#4  'Shift it right by four
[b]add[/b]   x,t   'Add back to x, yielding 0.10001 of x
[b]mov[/b]   t,x   'Copy .10001 x to t.
[b]sar[/b]   t,#3  'Shift and add per the derived mask.  
[b]add[/b]   x,t
[b]sar[/b]   t,#3
[b]add[/b]   x,t
[b]sar[/b]   t,#3
[b]add[/b]   x,t
[b]sar[/b]   t,#2
[b]add[/b]   x,t



As you can see, Chip's three-tier solution beats the best two-tier solution by one instruction.

In gauging how many instructions each solution requires, several code templates are pulled into play. The chosen template is determined by how many "one"-bits there are in the pattern, how many non-zero operations are in the mask, and the sign of the first non-zero operation. These templates are all listed in the source code.

Certainly, a lot more could be done with this. No attempt at any heuristic improvement was made. It just generates all the combinations and tests them. But, in a pinch, even that may be useful.

Hopefully, I've haven't made any blunders here, but I haven't given it a whole lot of testing either. So use at your own risk, and test your results! Also, if anyone can improve on the efficiency fo the templates, please let me know, and I'll include them in a new version.

Cheers!
Phil

Update (2006.10.22a): Replaced the uncommented Spin file with an archive containing the same file with comments.
Update (2006.10.22b): Fixed a bug that prevented the algortihm from terminating properly.

Post Edited (Phil Pilgrim (PhiPi)) : 10/23/2006 5:08:23 AM GMT

Comments

  • cgraceycgracey Posts: 14,206
    edited 2006-10-22 06:34
    Phil,

    That's a very useful tool you made for coming up with constant multiplier recipes. It really has universal application.·I'm glad you worked subtraction in, as well as addition, since it's harder to visualize.

    I recently had to make a divide-by-3 algorithm (or multiply by 1/3), and I found perhaps the ultimate example of geometric exploitation·in a constant multiplier. Here is what 1/3 is in 32-bit binary:

    %0.01010101010101010101010101010101

    To see this number, put your Windows Calculator in Scientific mode and type the following:

    Dec C 1 / 3 = * 2 x^y 32 = Bin

    Here is some code which divides a signed 32-bit number by 3 in just 13 instructions:

    ' Divide x by 3

    ···· sar···· x,#2······ 'get %0.01

    ···· mov···· t,x······· 'add %0.0001
    ···· sar···· t,#2······
    ···· add···· x,t······· 'sum %0.0101

    ···· mov···· t,x······· 'add %0.00000101
    ···· sar···· t,#4
    ···· add···· x,t······· 'sum %0.01010101

    ···· mov···· t,x······· 'add %0.0000000001010101
    ···· sar···· t,#8
    ···· add···· x,t······· 'sum %0.0101010101010101

    ···· mov···· t,x······· 'add %0.00000000000000000101010101010101
    ···· sar···· t,#16
    ···· add···· x,t······· 'sum %0.01010101010101010101010101010101

    I·found that many simple reciprocals have very exploitable·geometric patterns. For example, 1/5:

    %0.00110011001100110011001100110011...

    1/7:

    %0.00100100100100100100100100100100...

    1/9:

    %0.00011100011100011100011100011100...

    Even fractions like 3/5·are simple:

    %0.10011001100110011001100110011001... (hmmm... compare to 1/5)

    Implementing a hard-coded geometric multiplier for any of these would beat the pants off the alternatives.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 10/22/2006 7:09:53 AM GMT
  • Cliff L. BiffleCliff L. Biffle Posts: 206
    edited 2006-10-22 08:09
    Nicely done. If you don't have one already, pick up a copy of Henry S. Warren's "Hacker's Delight." You'll appreciate it. smile.gif
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2006-10-22 09:42
    The attached graphics illustrate another way look at the decompositions. I guess it's a tutorial for myself at least, interpreting Doctor PhiPi!

    The graphic for Phil's first example is drawn from the standpoint of a polynomial in powers of two, back to the basics of the number place system. It is to be understood that each term is multiplied times the variable, x.

    attachment.php?attachmentid=43751

    In theis example, the top line under Phil's notation is the straightforward multiplication by a binary fraction. The variable x is shifted by some number of positions and whenever there is a 1 in the corresponding term of the multiplier, that shift of x is added into the accumulator. In general, multiplicaton there are 16 shifts and 16 conditional adds, so the general purpose routine will take 32 operations. But when the multiplier is known in advance and there are zeros, the Prop can in one instruction shift over the "zero" bits to the next "one" bit. That procedure brings this particular example down to 16 operations total. The optimization possible is a straight function of the number of zeros in the multiplier.

    Phil's algorithm optimizes further by discovering the amazing fact that (x/2)+(x/4) = x-(x/4). So starting with x, it subracts (x/4) and thus reduces two operations to one. Carrying that further, it reduces the fractional computation to x-(x/8)-(x/32). You can verify that agrees with the previous algorithm to get 27x/32 at that point. It then further it reduces to x-(x/8)-(x/64), which is 55x/64. The subtractions, replacing all the additions, reduces the whole algorithm from 16 to 11 instructions.

    The "pattern" as Phil terms it is shown as $8000. That represents in its leftmost bit the factor of "1" in each numerator of the polynomial.

    In the example of the Cordic gain, the pattern is $8800, or %10001.... in binary, which comes out as decimal 17 in the polynomial. The diagam shows the layout, with the factor 17/32 in red. That factor is used repeatedly for the calculation.

    attachment.php?attachmentid=43752

    Again it is to be understood that the term 17x/32 is computed first, and the further calculations are based on that. The procedure takes 13 instructions. That can be compared with the 20 instructions it would take with the first tier zero-skipping optimization, the first polynomial with all 1s in the numerators.

    It is possible to roll the three successive divisions by 8 into a loop. In that approach the multiply takes only 11 program locations, however, it takes 18 clock cycles due to setting up and jumping in the loop.

    Chip's "third tier" algorithm further discovered that the red super-pattern in the last examle repeats twice. That insight allows doing the multiply in 12 instructions, 12 clock cycles.

    The immediate question is to characterize the set of numbers that can be written written in the form of a polynomial with a number like 17 in _all_ the numerators. I would hazard a guess that that exact class of numbers is sparse. On the other hand, the class of numbers that can be written with both a number like 17 and also 1 as coefficients of the polynomial would contain all numbers, because 1 is the universal basis. One could ask about numbers that can be written as polynomials with three coefficients, say 13, 17 and 1. Like Phil was saying, the combinatorics become daunting,. It could be worthwhile for longer word sizes.

    The optimization by finding the subtractions is yet another thing. There the coeffient is allowed t be either (+) or (-), which as Phil has shown can help significantly. Looking at the actual search code in the program, Phil, I'm kind of lost in the details.

    Below is the polynomial for multiplier =1/3 and the regrouping, the "geometric" folding in Chip's algorithm. The inner red is the first step, then that is repeated and added to itself divided by two, and then the whole thing is added to itself divided by 4, and then that whole thing added to itself divided by 8, and that whole thing (shown compressed) added to itself divided by 16.

    attachment.php?attachmentid=43753

    I guess this underscores the benefit of more tears, arrrrh, tiers, when dealing with small rational fractions.

    As an aside, if one has a 64 bit accumulator, the computation of 1/3 can be done as a multiply times 1431655765 (same number, decimal pont on the right instead of on the left, as in the following steps
    x
    x + (x*4)
    x + (x*4) + (x + (x*4) * 16)
    x + (x*4) + (x + (x*4) * 16) + (x + (x*4) + (x + (x*4) * 16))*256)
    x + (x*4) + (x + (x*4) * 16) + (x + (x*4) + (x + (x*4) * 16))*256) + ...
    . . . . . . . (x + (x*4) + (x + (x*4) * 16) + (x + (x*4) + (x + (x*4) * 16))*256))*65536

    and then shift right 32 bits to complete the computation of 1431655765x/(2^32) = x/3. Managing a 64 bit accumulator would take a lot of overhead. But this multiplicative approach would be practical when dealing with 16 bit words in a 32 bit accumulator. I just bring this up to illustrate how these methods are related and can be understood through the basis polynomial.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com

    Post Edited (Tracy Allen) : 10/22/2006 10:18:49 AM GMT
    551 x 349 - 9K
    590 x 94 - 8K
    832 x 134 - 15K
  • Graham StablerGraham Stabler Posts: 2,510
    edited 2006-10-22 13:54
    Great stuff Phil, strangely when you talked before about searching for a solution I was expecting the search to be carried out on a PC. To be perfectly honest I still need to sit down with a piece of paper and learn how Chip's origional method worked but its nice to know this tool is ready to be used.

    It also rminds me that the propeller itself is a very useful tool for teaching number concepts because it is so easy to display binary or hex as needed, it would be possible to program "animations" showing how various binary tricks work without too much work. I had considered doing a graphical demo of the cordic actually showing it moving to the solution using the graphics, that would be neat but I've got some other fish frying.

    Graham

    p.s. Phil's text object is here
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2006-10-22 21:07
    Phil,

    Is is correct to say that the factor that is extracted in your algorithm is always going to be one of the multiplicative factors of the target number? For example, the CORDIC gain is 0.60725..., but as a binary fraction to 16 bits that is approximated as 39797/65536. And 17 is a factor of that,
    39797 = 17 * 2341.
    It happens that 2341 is prime, but that is not so important.

    So, factor out the 17, and that is the first step in the algorithm, the computation of 17x/32. The computation is then
    17x/32 * (2341/2048).

    The first heuristic for the algorithm should then be to find all the factors of the target number and evaluate those. Does that make sense, or maybe that is what the algoritm is doing already? A number that has lots of prime factors would allow many possible algorithms but the inclusion of only the multipllicative factors narrows it down considerably.

    Suppose the number, x does not have a convenient multiplicative factor. Say, it is itself a large prime. It would be possible to partition the original number by addition.
    z = x + y
    where x has a convenient multiplicative factor and y is small and easy to compute. That would be the same as allowing 1 as a coefficient, in addition to +/-k and zero, and would allow a wide expansion of numbers that would be amenable to short algorithms. From the algorithm standpoint, allowing a coefficient of +/- 1 in addition to +/-k and zero does not have much cost, because the initial value of x can be stored in a register in addition to the precomputed k/2^? factor and shifted in where needed to make the up the small number y.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2006-10-22 22:53
    Cliff L. Biffle said...
    Nicely done. If you don't have one already, pick up a copy of Henry S. Warren's "Hacker's Delight." You'll appreciate it. smile.gif

    Cliff, that sounds like a great book. I notice that one of the reviewers on Amazon said the book is no longer relevant because computers no longer have barrel shifters. Well SAR ry,#9 to them!

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2006-10-22 23:51
    'Just got back up from a major power outage here in town. A line blew up. Must've been halftime, with everyone melting cheese on their nachos at once.

    I've replaced the Spin file in the original post with an archive that includes the TV_wtext object. Graham, thanks for covering for me on that. Tracy, I've added comments, too, which I should've done from the get-go. I hope it's a little less opaque now.

    Chip, the repeating "binamals" (well, they're not decimals -- not sure what to call them) are truly fascinating. I'm wondering how to treat them at the endpoint if the repeat length isn't a power of two. Do you just keep shifting the pattern off the end in that case? I guess that would make the most sense. What's also interesting is that fractions like 1/7 that take 6 digits to repeat in decimal do so after only three in binary. But 7 is 23 - 1. Consider 999, which is 103 - 1. The decimal expansion for 1/999 is 0.001001001001... Hmmm.

    Cliff, I really need to get that book, if the chapter that's online is any indication of how good the rest of it is. (With a title like that, it's probably on a Homeland Security watchlist somewhere.)

    Tracy, I need to cogitate on the prime factor stuff a bit. But it looks like you may be onto something! If it works out, it will embolden me to extend to 32 bits.

    -Phil

    P.S. Graham, actually I was planning to do it on the PC at first. But that would've been in Perl, I'd've had to upload a 1M .exe just for PC users (only) to try it out, and no one would be able to view the internals — unless they were into Perl, that is. Doing it on the Prop gives everyone a chance to see what's going on and modify it, if they want. Plus it may even be faster, since the guts are done in assembly. (Unlike Chip, I don't touch assembly on the PC! smile.gif )

    Post Edited (Phil Pilgrim (PhiPi)) : 10/23/2006 12:14:30 AM GMT
  • Cliff L. BiffleCliff L. Biffle Posts: 206
    edited 2006-10-23 01:47
    Tracy Allen said...
    Cliff, that sounds like a great book.

    I dig it, though I've sworn to murder any of my coworkers if they ever use its trickery in production code. smile.gif

    I tore through it on the Caltrain when I was working JavaOne this year. ...which probably tells you something about how fantastically square I am.
    Tracy Allen said...
    I notice that one of the reviewers on Amazon said the book is no longer relevant because computers no longer have barrel shifters. Well SAR ry,#9 to them!

    That's precisely the sort of bitter thing I would have said, but it only really applies to the P4. PowerPCs, Pentiums before the P4, the Intel Core chips, and most embedded processors all have single-cycle shift operations.

    Personally, I'm doing horrible things with MUXC, RCL/RCR, and REV. The Propeller is great for this sort of thing.
  • cgraceycgracey Posts: 14,206
    edited 2006-10-23 03:46
    Phil Pilgrim (PhiPi) said...


    Chip, the repeating "binamals" (well, they're not decimals -- not sure what to call them) are truly fascinating. I'm wondering how to treat them at the endpoint if the repeat length isn't a power of two. Do you just keep shifting the pattern off the end in that case? I guess that would make the most sense. What's also interesting is that fractions like 1/7 that take 6 digits to repeat in decimal do so after only three in binary. But 7 is 23 - 1. Consider 999, which is 103 - 1. The decimal expansion for 1/999 is 0.001001001001... Hmmm.
    Yeah, I would just keep doing the geometric shifts and adds, even if a big chunk of the last one ran below the sum's lsb. It would·finish the job·and only cost 3 instructions.

    All this math talk·lately has been·getting new gears turning in my head and I love it! I'm dreaming at night (between 3am and 8am) about sines, arctangents, e-to-a-power, logarithms, antilogs, and this thing called the Zeta function that Tracy was talking about the other day (even though I don't really know WHAT to think about it, other than it's fascinating and hints at many other likelihoods and possibilities). There is just so much to learn and I feel like we are on the cusps of all kinds of neat little (perhaps re-)discoveries that could make very fun and interesting things possible with only a smidgeon of hardware and software. I mean, what can be done with just shifts and adds is amazing, but apply those techniques to log-arized numbers and you are now dealing in the exponential domain. I am really excited by·the x-ray perspective·many of you·share on computing. It's so much more stimulating to·ponder the·whole gamut, rather·than only·the high-level, and then dismissively suppose that our foundation is already optimal and now we are "smart" because of it.

    I used to have a book about the PC's insides, and each chapter started with some notable quote. One of the chapters started with this: "Human invention has long since reached its potential and I see no hope for further developments. - Julias Fontinus, Rome, 10 A.D." Imagine that!·If we think we know everything, we are hopelessly dumb.
    ·


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 10/23/2006 3:53:34 AM GMT
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2006-10-23 06:08
    Tracy,

    I tried my search algorithm on some prime numbers for the target (f), restricting solutions to those that had a pattern of two 1-bits. It still managed to find solutions. For example, the number 50929 = $C6F1 decomposed to "C000 00-0-0+00000-0-0". However, in each case the first operation was a subtract. What this means is that if k is the kernel (i.e. the pattern shifted right until the lsb == 1), then a two-tiered solution will still exist if

    ····(216 - f) = 0 (mod k)

    In the above example (65536 - 50929) = 14607, which is divisible by three.

    So f doesn't have to be a multiple of k, so long as 216 - f is. This expands the search space somewhat over just considering factors of f; but it's still a lot smaller than it was without considering them at all. Of course, now the computational burden shifts from combinatorics to factorization...

    -Phil
  • parskoparsko Posts: 501
    edited 2006-10-23 08:14
    Gentlemen,

    I've following both of these two threads, somewhat superficially. I am very interested in these math topics. Would anyone care to explain what the whole point is and explain a brief example?

    I understand you are trying to determine a way to do fast math, but is that simply it?

    The 4 or 5 of you prime contributors of these two threads are certainly more gifted nerds than I, so I don't expect to be able to contribute anything significant. But I am confused what exactly you are trying to do. I think it might also benefit the rest of the community too.

    Thanks,

    -Parsko

    PS- I know I'm going to have a need for some PASSY math for "conditioning" analog signals into usable/loggable 8 bit values, hence my interest...
  • Graham StablerGraham Stabler Posts: 2,510
    edited 2006-10-23 10:12
    Parkso,

    In my Cordic program I supply a vector length and angle and the algorithm provides R.cos(theta) and R.Sin(theta), before the algorithm comences you need to multiply the vector length by 0.607252935 which is a bit of a bummer really. However Chip looked at the binary for this number once converted to a binary fraction and noticed a pattern in the bits, using that pattern he came up with a program to do the multiplication in only 12 instructions.

    The question then became could this looking for the pattern be automated so that when you need to do a multiplication by a certain fractional constant you could run a search program and generate some code to stick in your program.

    Graham
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2006-10-23 17:52
    Parsko,

    Practically it does come down to speed and of code space (512 longs in a cog). Chip is workng on his voice synthesizer, which hinges on high speed adddition of rotating vectors. A lot of problems in signal processing and generation are like that and involve lots of repeated multiplications and fundamental math functions like sine/cosine/arctangent/power law/log/exponential. This thread has to do with how to multiply by a fixed constant as fast as possible using the smallest number of longs. Myself, I've long been fascinated by the properties of numbers and number representations, a pure math thing. It's fun to find a synergy of interests here on this forum.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2006-10-23 18:09
    Phil,

    I think that is an example of what I meant by decomposing the original number by a partition, x = y + z, where x is prime, but y has a convenient factor and z is easy to compute. In this case, (65536 - 50929) = 14607, is the same as
    50929 = 65536 - 14607
    where 14607 has convenient factors and the subtraction from 65536 is easy.

    There are many possible partitions. One might also try (as part of the combinatorics)
    50929 = 50928 + 1
    where the number 50928 has candidate prime factors 2*2*2*2*3*1061. All those factors of 2 should speed up the algorithm considerably. The final +1 represents a final addition of x/65536. That can be done first with one barrel shift, or keep x in a register and do it at the end. Any partiion of the form N = M +/- K where K is a power of two would be especially easy to compute, as it entails only one additional machine instruction.

    Here is another example, 50929 = 50925 + 4, which is 3*5*5*7*97 + 4. That offers nice possibilities. Again, the final term involving x*4/65536 is one barrel shift.


    Looking at the solution you mentioned,
    C000 00-0-0+00000-0-0
    That is for 14607, based on the factor $C=12, right? Then post hoc it has to find 50929 = 65536 - 14607?

    The point is that there are two fundamentally different kinds of subtraction going on. When your core algorithm posts a subtraction, that means it has found neighboring "1" bits so it can combine terms like x/2+x/4+x/8+x/16 into a much shorter term likex-(x/16). In that subtraction, both x and all of the terms involving x/(2^i) share the common factor. That is different from the subtraction in 50929 = 65536 - 14607, precisely because the terms in the partitian do _not_ share the same factors.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2006-10-23 20:15
    Tracy,

    Yes, if the first operation is a subtraction, that subtraction is different from all the rest — not only mathematically, but algorithmically too, since it requires special treatment in generating the code.

    Your partitioning idea does open up some huge possibilities — as well as enlarging the search space once again. smile.gif However, it also carries some costs: namely, the additional code required to save the original x and to do the final shift and add, which has to be factored into the solution's "score". But if doing so eliminates a leading subtract, that alone may offset the cost of that subtract.

    Interestingly, the leading subtract is really nothing more than a special case of the more general partitioning that you're suggesting, namely:

    ····f = ±p0p1...pn ± 2m

    Incidentally, I compared the optimal solutions (within my limited solution space) for 50929 with 50928 and 50925 and found the following:

    50929 (prime)        = C6F1: 8000 0-00+00-000-000+ (11 instructions)
    50928 (2*2*2*3*1061) = C6F0: 84A0 ++00000000000000 (11 instructions)
    50925 (3*5*5*7*97)   = C6ED: 8000 0-00+00-000-0-0+ (13 instructions)
    
    
    


    Two of these are one-tier solutions with a leading subtract. The best two-tier solutions for 50929 and 50925 required 15 and 13 instructions, respectively. (The kernel for the 50925 solution is 525 = 3*5*5*7.) One thing this illustrates is the incremental cost of adding a second or even third tier. I'm guessing that optimal three-tier solutions will be extremely rare.

    -Phil
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2006-10-25 08:41
    Chip Gracey (Parallax) said...

    I found that many simple reciprocals have very exploitable geometric patterns. For example, 1/5:
    %0.00110011001100110011001100110011...

    1/7:
    %0.00100100100100100100100100100100...

    1/9:
    %0.00011100011100011100011100011100...

    Even fractions like 3/5 are simple:
    %0.10011001100110011001100110011001... (hmmm... compare to 1/5)

    Implementing a hard-coded geometric multiplier for any of these would beat the pants off the alternatives.

    Back to that. I wanted to bone up on the theory. All rational numbers have a repeating pattern of digits . Of course, with only 32 bits only short patterns will be evident. Math talks about a pattern of bits that extends to infinity to the right of the decimal point. Anyway, here are the rules.

    The rational number is p/q, a fraction which has been reduced to lowest terms. E.g., 15/25 is reduced to 3/5, or 18/24 is reduced to 3/4.

    * The binary fraction terminates only if the denominator is a power of two, 2^u, and it terminates after u binary digits. E.g. 1/8 = %0.001 or 3/4 = %0.11
    For base 10 the rule is a little more complicated. The decimal representation terminates only if the denominator has exclusively factors of 2 or 5, that is q=2^j * 5^k, after the number of digits of whichever exponent is greater, j or k.

    * If the denominator is not an even number (no factors of two) then the binary fraction has a repeating pattern, , and the pattern starts right after the decimal point. The pattern is a cycle of a length v that is determined by the smallest solution of the equation 2^v mod q = 1. That is a modular equation in a power of a prime number, and a theorem by Fermat proves that there is always a solution. For example
    v equation
    1 2 mod 5 = 2
    2 4 mod 5 = 4
    3 8 mod 5 = 3 (8-5=3)
    4 16 mod 5 = 1 (16-15=1)
    That is, the first solution occurs at v=4 and as you can see all binary patterns with 5 in the denominator repeat in a cycle of 4.
    Other denominators and the cycle length:
    q=3, v=2
    q=5, v=4
    q=7, v=3
    q=9, v=6
    q=11, v=10
    q=13, v=12
    q=15, v=4
    q=17, v=8
    q=19, v=18
    q=21, v=6
    q=23, v=11
    q=25, v=20
    q=27, v=18
    q=29, v=28
    q=31, v=5

    * If the denominator factors into powers of two as well as odd numbers, then there will be a non-repeating preamble for the first u binary digits to the right of the decimal point, followed by a repeating cycle iof length v. The factors u and v are the same as above. The length of the repeating cycle is determined by the odd factor, while the length of the non-repeating lead-in is determined by the number of factors of two. For example,
    13/20 = %0.10100110011001....
    The factorization of 20 is 2*2*5, so it has a repeating pattern of length 5, (%1001), and a lead-in of length 2 (%10). Note that if you multiply 13/20 times 4, you get 4+3/5, and 3/5 is the pure repeating binary pattern.

    * irrational numbers (like square root 2, or pi) do not repeat. Conversely, any binary fraction that does not repeat is irrational (cannot be represented exactly as a ratio of two integers). Of course that is a moot point in a computer with finite word size.


    Thinking about Phil's algorithm, I was wondering what characterizes the numbers (known in advance as constants) that have the longest minimum algorithms for multiplication. We know that the minimum number of instructions is one. A number that has one bit in its binary fraction is a pure inverse power of two and the multiplication is simply a single barrel roll right. Similarly, a number that has one zero in a sea of 1s is relatively easy, as it takes one barrel roll and one subtraction. And as we are seeing, a number that has a strong pattern can be folded or otherwise massaged or tricked to take it down. What number then among all the 65536 is the toughest nut?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
Sign In or Register to comment.