Multiplication on 24bits

PHX · 2007-01-08 04:26

Hi,

I'm trying to modify the multiplication asm function but unsuccessfully. Could someone help me better understand how it's working ?
I'm implementing a digital filter with numbers in the [noparse][[/noparse]0,1] range (only decimal part) that are coded on 24bits pseudo-real , hence [noparse][[/noparse]0, 0xFFFFFF].
The multiplication result of 2 of those pseudo-real number is normally also in [noparse][[/noparse]0,1] range (so also in [noparse][[/noparse]0, xFFFFFF]).

Ex:
0x00FFFFFF x 0x007FFFFF should give 0x007FFFFF
0x007FFFFF x 0x007FFFFF should give 0x003FFFFF
etc

Thanks,

Richard

IanM · 2007-01-08 07:21

Hi Richard, because you're dealing with fractional numbers the results will be in the high order bits. So you'll need a 24x24 bit multiplier giving a 48 bit result of which you need to keep to the high order 24 bits so you need to shift the result right 24 bits.

It might be advantageous for MACS to use a 64 bit accumulator and then do the shift at the end to reduce rounding errors.

Cheers, Ian

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Ian Mitchell
www.research.utas.edu.au

Tracy Allen · 2007-01-08 09:15

Richard,

As Ian said, it is a fractional multiplication, which means you have to account for an implied denominator of D=2^24. Two input fractions are x/D and y/D. You need to find 24 bit integer z such that z/D = x/D * y/D. That simplifies to z = x * y / D. One way to do that is to form the 48 bit product and then divide by D, that is, shft right by 24 bits, or, the same thing, take the upper 24 bits as the answer.

Another approach is to treat one of the numbers, say y, as a binary fraction with the decimal point on the left of the 24 bit field, y/D. The algorithm will successively divide the other number, x by successive powers of 2 and add them into an accumulator based on the bits of y, starting from the decimal point and working back, 24 steps. The advantage of this is that it does not require a big accumulator. (This algorithm came up in an extended discussion of multiplication by the Cordic gain not long ago).

Here is the idea, unrolled to start. It's late here, so caveat emptor.

      shl y,#8       ' left justify 24 bits in 32 bit long
      mov z,#0        ' zero accumulator
      shr x,#1        ' x*1/2
      shl y,#1  wc    ' is 1/2 a factor?
if_c  add z,x         ' add if yes
      shr x, #1       ' x*1/4
      shl y, #1  wc   ' is is 1/4 a factor?
if_c  add z,x         ' add if yes
      shr x,#1
      shl y,#1  wc
if_c  add z,x
      ... etc to 24 steps

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Post Edited (Tracy Allen) : 1/8/2007 8:35:42 PM GMT

PHX · 2007-01-08 21:27

Hi Tracy,

Thanks a lot for your explanation. I guess this is a variation of the Peasant multiplication, right ? It's working great but with a small imprecision.
I think I found something somehow better, based on yours, without really understanding what I did, but it looks like it has better precision.
See the shr r,#8 every 8 additions:

              mov       r, #0
              shr       y, #1   wc
              mov       t, x
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
              shr       r, #8
              mov       x, t
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
              shr       r, #8
              mov       x, t
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
        if_c  add       r, x
              shl       x, #1
              shr       y, #1   wc
              shr       r, #8

Tks anyway,

Richard

Tracy Allen · 2007-01-08 21:47

I edited in one small correction, a mov z,#0 where I had mistyped mov z,0, which would misinitialize z to the value that happened to be in register 0.

I think I see what your algorithm is doing. It works from low bit to high bit as in normal multiplication. But accomplishes the division by 2^24 in three steps of division by 2^8, and that keeps the answer within bounds of the 32 bit long without having to manage a 48 bit accumulator. Nice. I don't know if it is Peasant or pleasant multiplication, but it is always nice to see different ways of doing things side by side. Always educational. I'm puzzled by the difference in precision though.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Ym2413a · 2007-01-08 22:53

en.wikipedia.org/wiki/Ancient_Egyptian_multiplication
I think it's called Egyptian multiplication, but I may be wrong.
Peasant and Egyptian multiplication are almost the same from what Wiki has told me.

Anyway, I have used this before in programming and in my head since it works directly with binary numbers. (Me and the CPU's I program think better this way)
Also unlike some other ways of multiplication... You don't get stuck running a long clock eating DJNZ loop.

Tracy Allen · 2007-01-09 07:58

Richard,

I see what you mean about the imprecision of the descent algorithm. That is due to the bits that "fall off the edge" as x is shifted right. Individually each remainder is less than 1/D, where D=(2^24). But the sum of those remainders can be significant, making the result of the algorithm as much as 23 less than the true value for some large intial value of x and y.

Here is an alternative, which differs by having an initial shift of the 24 bit variable x to the left of the 32 bit field, so that the subsequent right shifts do not lose any bits or remainders until they drop off at a significance level of (1/D)/256. That gives a much tighter bound on the precision, I think no more than one count. At the end of the algorithm the result is divided by 256 to compensate for the initial *256.

' fractional multiplication, looped form
' multiplicands x and y, where x<D and y<D, with D=2^24 
' representing proper fractions x/D and y/D
' result product z representing z/D = x/D * y/D
frac24            shl y,#8       ' left justify 24 bits into 32 bit long, will pick off bits, msb first
                    shl x,#8       '  this *256 will be rescaled in the last step
                    mov z,#0        ' zero accumulator
                    mov n,#24     ' index
:loop              shr x,#1        ' x*1/2
                    shl y, #1  wc   ' is it a factor?
           if_c    add z,x         ' add iff yes
                    djnz n, #:loop
                    shr z,#8         ' remove *256 scale factor, z is result

I looked at that wiki page too. Interesting stuff, that so long ago they knew that yes/no decisions are simpler than multiple choice!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Ym2413a · 2007-01-09 18:07

I like your code Tracy.
I see you rolled it into a nice little space saving loop.

Tracy Allen · 2007-01-09 19:43

An email question asked how to do that in Spin. Easy, because Spin has the built in operator **, which returns the high word, upper 32 bits, of a multiplication. That is the same thing as multiplying x times y and dividing the result by 2^32. Like the Stamp ** operator, but with more bits.

If x, y and z represent the numerators of implied proper fractions, {x/D, y/D and z/D, with D=2^32}, then to find the solution for z in the implied equation,
z/D = x/D * y/D which is the rearranged as z = x * y / D
it becomes in Spin,
z := x ** y
If the implied denominator is to be d=2^24 instead of D=2^32, then the formula becomes
z/d = x/d * y/d or z= x * y / d or x * 256 * y / D
and in Spin,
z := (x << 8) ** y ' conditional on x<2^24

The ** operator has several interpretations in double precision and fractional multiplication.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

PHX · 2007-01-16 22:26

Hi Tracy,

Obviously I should have anticipated this....I need the division now....same rules (24bits unsigned integer representing [noparse][[/noparse]0,1] ). Could you help me there or point me to a reference I could use as a starting point. I'm still working on my low-pass resonant filter...just need to perform one division argh...to be precise I have to compute q/(1.0 - f) with q and f in ]0,1[noparse][[/noparse]
Your help would be greatly appreciated

Tks,
Richard

Tracy Allen · 2007-01-17 05:56

Hi Richard,

If the number f is in [noparse][[/noparse]0,1] then q/(1 - f) is going to be stiff as f approaches 1. How do you want that result to be scaled--no longer 24 bits--or is there some other constraint? It could go to floating point quantity to reflect the wider scale. Or, it could be done in integer math if there are some relatively narrow limits of range and precision for the result.

"Speaking" of resonant, were you following Chip's development of the formant filters? He used a CORDIC rotator.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

PHX · 2007-01-18 02:57

Hi Tracy,

My current sample rate is 48100hz. The filter (low pass) is hence almost useless above 22khz, so basically f will never be greater than 0.5. Does this make 24bits integer math an acceptable choice ?

I'll be very happy to look at the work Chip did ? Could you point me to the right thread ?
Thanks a lot for your help. I'll try to post a mp3 once the filter works :-D.

See below the algorithm:

//set feedback amount given f and q between 0 and 1
fb = q + q/(1.0 - f);

//for each sample...
buf0 = buf0 + f * (in - buf0 + fb * (buf0 - buf1));
buf1 = buf1 + f * (buf0 - buf1);
out = buf1;

Tracy Allen · 2007-01-18 04:01

The Cordic filter came up in the thread Chip started called "Singing Monk Squad",
http://forums.parallax.com/showthread.php?p=612545
Look at the vocal tract object posted there for some interesting filter ideas.

Division with f less than 0.5 makes the integer division feasible. You have, q in [noparse][[/noparse]0,1), and f in [noparse][[/noparse]0,0.5), which means the result of the division is in [noparse][[/noparse]0,2) and your variable fb will be in [noparse][[/noparse]1,3). The implied radix point is to the left of the 24th bit. It will be best to do the fb calculation so that it comes out in a form that can segue right into the multiplication with (buf0 - buf1).

I have to run right now, but I'll see what I can come up with if you don't get an answer sooner.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Tracy Allen · 2007-01-19 19:27

Hi Richard,

I think the following would come close to what you need.

A more general problem is to caluculate z = y / x, where the answer z is a long that will have the binary radix point implied to the left of bit 24. This number may have an integer part from 0 to 255 in the upper 8 bits, and a 24 bit fractional part. It is a question of renormalization, to find the number z that satisfies z/(2^24) = y/x.

Your particular case is a little simpler, because the integer part of the result will be either 0 or 1. That is, y=q represents a proper fraction in [noparse][[/noparse]0,1), and x=(1-f) represents a proper fraction in [noparse][[/noparse]0.5, 1), so the result of the division z = y/x will be an integer that represents an improper fraction in [noparse][[/noparse]0,2).

' renormalization of fraction z = y/x
' z/D = y/x with D = 2^24 implied, binary radix to the left of bit 24
' in this case, the resulting integer part is 0 or 1.
' enter with x and y, conditions:  x > y/2,   0<= y < 2^31
' exit with z: z/(2^24) = y/x best approximation


frac124    mov n,#25    ' number of bits (1 to left of radix + 24 to the right)
         mov z,#0    ' result variable
:loop       cmpsub y, x   wc    ' if y>=x then y:=y-x, carry:=1, else a unchanged, carry:=0
              rcl z,#1        ' shift in carry, double the estimate of z, appoximation z/(2^n) =~ y/x
              shl y,#1        ' double y, bounded to < x*2 due to the subtractions.
              djnz n,#:loop    ' do all bits
frac124_ret    ret

(note, edited the above to include wc effect with cmpsub

The cmpsub is done first, because the initial value of y may be as much as twice x, but once the subtraction is done, y will be less than x. After 25 shifts, the first resulting carry will end up as the integer part just to the left of the binary radix point.

Subequently in your filter, add q to that. q is already in the form with the radix to the left of bit 24. The resulting value fb will be in [noparse][[/noparse]0,3). Then it can be multiplied times the value (buf0 - buf1) using a method similar to the one from earlier in this thread.

What is the range of buf0 and buf1? Are those time-shifted data input values?

PHX said...
Hi Tracy,

My current sample rate is 48100hz. The filter (low pass) is hence almost useless above 22khz, so basically f will never be greater than 0.5. Does this make 24bits integer math an acceptable choice ?

I'll be very happy to look at the work Chip did ? Could you point me to the right thread ?
Thanks a lot for your help. I'll try to post a mp3 once the filter works :-D.

See below the algorithm:
//set feedback amount given f and q between 0 and 1
fb = q + q/(1.0 - f);

//for each sample...
buf0 = buf0 + f * (in - buf0 + fb * (buf0 - buf1));
buf1 = buf1 + f * (buf0 - buf1);
out = buf1;

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Post Edited (Tracy Allen) : 1/22/2007 8:16:53 PM GMT

PHX · 2007-01-20 20:54

Hi Tracy,
Thank you so much for spending so much time on this.
Although the division you provided works great, I think I made a conceptual mistake and hence the filter is currently not working.
'In' is the result of my waveform generator and, as you remember, I decided to use unsigned 24-bits integer to represent [noparse][[/noparse]0,1] .
The problem although is that due to the substraction involved in the algortihm, I may have to deal with negative numbers. The division then is fine, but not anymore the mutiplication. What would you recommend ?

buf0 and buf1 are initialised at 0 for the first iteration. The only real input is the 'in' variable which is currently a 24-bits integer representing [noparse][[/noparse]0,1].
I did look to the cordic algorithm and to be honest, I didn't understand much unfortunately. Well I may have to go back to school to reactivate some old neurons.
Thanks for your help,

Richard

Tracy Allen · 2007-01-21 17:41

Richard,

I think the routines can be fixed to work with signed twos complement. At most they need a wrapper to strip off the signs and to restore them properly at the end. Internal to the multiply routine, the shr instruction(shift right from carry) can be replaced with (shift arithmetic right), in order to extend the sign. Alternatively, a straightforward 32x32 muliply should work fine with twos complement, taking the top 32 bits for the fractional multiply.

In the other routine, the cmpsub instruction (compare and subtract conditional) operates on unsigned values, so that one would have to be modified to work with twos complement. But I don't think it is giving you problems in this application, because your values of q an f are indeed both in [noparse][[/noparse]0.1).

It looks like q and f are fixed filter parameters, is that right? Do you have a reference that describes the filter algorithm? I'd like to learn a bit more about what it is doing. Why 24 bit, and not 16 or 20?

You're welcome for the help with this. I need these algorithms for my own projects too.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

PHX · 2007-01-21 20:27

Hi Tracy,

The algorithm I'm trying to implement is coming from http://www.musicdsp.org/archive.php?classid=3. More especially I'm working on http://www.musicdsp.org/archive.php?classid=3#29 since it looked as one of the simplest resonant filter...

You're right those value are fixed as per the filter, and I'd like to modulate them. However, I think I see your point, there's no real need to implement them in assembly for the moment.

I decided to use a 24bits phase-accumulator to generate the oscillator. It's pretty straightforward then to generate a sawtooth (basically taking the whole register) which is one of the most interesting waveform for audio. I thought that 24bits was good enough to maintain some good precision in the filter computation, avoiding to go with float numbers.

I'll try to manage the signed multiplication.
Tks a lot,

Richard

Tracy Allen · 2007-01-22 20:14

I just realized that I left the wc effect off of the cmpsub in the division algorithm I posted above. That is necessary for it to work correctly. I'll edit that in the listing.

Also, I left off the wr effect, but that is default. The v1.01 manual errata sheet notes an error in the manual, where nr is given incorrectly as the default.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

PHX · 2007-01-31 00:32

Hi Tracy,

Please don't laught

but I can't make it work. I changed my whole implementation to signed 32bits now and everything works fine....except the mutiplication. That must be my aging brain again...Since I need now a signed 32bits multiplication, do you think the log/antilog conversion could be a good alternative or I should stick to the straight 32 times addition ?
Tks,

Richard

Tracy Allen · 2007-01-31 05:51

Not even chuckling!
Is the following still the core of the algorithm? It looks like there are three multiplications. The parameters are positive numbers, f is the lowpass fraction in [noparse][[/noparse]0,1) and fb the positive feedback in [noparse][[/noparse]0,3). The variable "in" is the input in (-1,+1).

//for each sample...
buf0 = buf0 + f * (in - buf0 + fb * (buf0 - buf1));
buf1 = buf1 + f * (buf0 - buf1);
out = buf1

Each multiplication involves a definitely positive number (f or fb) times a number that can be either + or -. When doing the multiplication, always use the definitely positive number as the multiplier and the one that might be negative as the multiplicand. The multiplier is the one that has its bits tested and the multiplicand is the one that is conditionally added to the accumulator. {It is possible to do it the other way around, with the negative as the multiplier, but in that case the final value conditional on the sign bit has to be subtracted instead of added.}

I think I understand what you mean by signed 32 bits, but where is the binary radix point? Is represented at 2^31? It looks like there is a chance that with the feedback, the results in buf0 and buf1 can peak outside the limit of 31 bits plus sign. The success of the algorithm should not depend on a particular bit length.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

PHX · 2007-02-01 04:15

Hi Tracy,

I agree that bit length should not be part of algorithm success, although I feel more comfortable thinking 32bits now.
I managed to implement a 32bits signed (so 31 + 1) multiplication of ]-1,+1[noparse][[/noparse] numbers. See below:

Multiply31s_31s     mov  ResultMul, #0              ' zero accumulator
                    mov  Temp2, Multiplicant        '
                    xor  Temp2, Multiplier          ' bit 32 = 1 => negative result
                    cmps Multiplicant, #0 wc        ' Is it negative
        if_c        neg  Multiplicant, Multiplicant ' Make it positive       
                    cmps Multiplier, #0 wc          ' Is it negative
        if_c        neg  Multiplier, Multiplier     ' Make it positive
                    mov  Temp, #31                  ' index / No need for bit 32 (=0)
:loop               shr  Multiplier, #1 wc          ' is it a factor?
        if_c        add  ResultMul,Multiplicant     ' add iff yes
                    shr  ResultMul, #1              ' average 
                    djnz Temp, #:loop               ' loop back
                    test Temp2, TestBit32 wc        ' Test bit 32 ?
        if_c        neg ResultMul, ResultMul        ' Make it negative
Multiply31s_31s_ret ret

I'm not sure if this could be better optimized, but that's the best I could come up with.
I'll post the new result of my filter implementation.

Tks,

Richard

Post Edited (PHX) : 2/1/2007 4:21:22 AM GMT

Tracy Allen · 2007-02-01 21:38

Hi Richard,

The logic looks right. There will be an accumulation of error due to the shift right, due to dropping the remainder.
shr ResultMul, #1 ' average

That is similar to the imprecesion you noted in an earlier algorithm. In a full implementation, that remainder would be accumulated in another long and the carries from the accumulation would come over into ResultMult. Without that, the upper bound of error can be calculated at around 32 counts. That is to say, the final ResultMult could be up to 32 counts too low. That is the same issue we ran into with the 24 bit multiplication before left justifying it. The point with left justifying the 24 bits was to allow remainders to accumulate in the 8 lsbs, so the error could be reduced to 1 count without having to take the speed hit required by double precision.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

PHX · 2007-02-09 19:10

Hi Tracy,

Yep, the problem was indeed an overflow. As promised, you can listen to an audio demo of the filter here:
http://www.digital-art.rd-logic.com/Projects/ProperSound/news.html

I'm pretty happy with the sound quality especially with the sound dynamics.
I still have plenty of work but I'm confident I'll make it.

I wouldn't have been able to reach this step without your help. Truly appreciated.
Tks again,

Richard

yerpa58 · 2007-02-10 23:49

Hey PHX,

Nice sounding filter! Thanks for the sound sample. It could easily end up surpassing the capabilities of the the SID filter.

Multiplication on 24bits

Comments