Looking for signed 16-bit multiply and divide in PASM
ManAtWork
Hello,
I'm sure this has been discussed before, but I haven't found anything with the forum search. I'm looking for a signed 16×16 multiply and a 32:16 division routine in assembler. Actually I only need the upper 16 bits of the multiplication result, and a 16:16-bit divide would also do if I always get 16 significant result bits. What I'd like to do is a simplified but fast floating point library.
If necessary I could write it on my own, but I'm sure somebody has already optimized it better.
Any links or hints welcome.
Comments
The existing floating point library is pretty fast. Why not use that?
OK, so the sign wrapper is the best method? Good to know; I thought there was a more elegant way.
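Something like this, I suppose — just a sketch with placeholder names, assuming the 16-bit operands arrive sign-extended in 32-bit registers and that umult16 is an unsigned 16×16 multiply that leaves the 32-bit product in y:

' Signed 16x16 multiply as a sign wrapper around an unsigned multiply
smult16 mov   sign, x
        xor   sign, y           ' bit 31 of sign = 1 if the operand signs differ
        abs   x, x              ' take the magnitudes
        abs   y, y
        call  #umult16          ' unsigned 16x16 multiply, 32-bit product in y
        shl   sign, #1 wc       ' combined sign bit into C
 if_c   neg   y, y              ' signs differed: negate the product
smult16_ret ret

sign    long  0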
Do you mean the routines in float32.spin? How fast is pretty fast? I have to calculate a PID control loop once every 20 µs, possibly with feed-forward, so that means <5 µs for one multiplication and one add. I'd like to use floating point format because I think it would actually be faster than fixed point. The reason is that it's hard to foresee the exact range of the input and coefficients. These are "user inputs", and I would have to do a lot of careful shifts and clipping to avoid overflows. And I would have to calculate more bits to achieve the same accuracy when the scale was not set optimally. With floating point I don't need to care.
Ok, forget about division. I only need that to precalculate the coefficients; that's not real-time, and I can use the Spin routines there. For the time-critical code I think I will take the source from float32 and modify it to fit my needs. I didn't look at it earlier because I had an aversion to IEEE math. I once saw a floating point emulation library for the 68k/ColdFire processor and it looked awfully complex. There was a lot of code to handle special cases like denormalized numbers, not-a-number and quasi-infinity. Nonsense, nobody needs that except maybe for scientific applications. The Propeller code seems to be quite lean and straightforward.
Thanks
BTW, when converting back to integer... what happens if I shift by more than 31 bits? Is the shift count taken modulo 32 (so that a shift by 33, say, behaves like shr x, #1), or does it actually clear x? The manual is unclear about that. It would be nice if it were the latter; otherwise I'd have to check for underflows. However, FTrunc checks for exponent < -32, so I fear I'll have to anyway.
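If the shift count really is only taken modulo 32, I suppose I'd need a guard along these lines (just a sketch with placeholder names):

' Guard for shift counts of 32 or more
        cmp   cnt, #32 wc       ' C set if the shift count is below 32
 if_nc  mov   x, #0             ' count >= 32: the result underflows to zero
 if_c   shr   x, cnt            ' count < 32: shift normally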
The mantissas in floating point operations don't have to be normalized. The normalization is just to maintain the greatest significance. Without the normalization, you've got fixed point arithmetic.
Post Edited (Mike Green) : 2/8/2010 2:56:43 PM GMT
Thanks for the quick response. You are right, I don't necessarily have to normalize the P and D inputs because they have less than 16 bits of significance; left-justifying them doesn't improve precision. However, the integral part can have more than 16 bits. I'll do at least the first 3 checks (num4 to num2) to improve the worst case. The coefficients are constant and are already stored normalized, of course.
Now I think I have everything together to write and test the code.
Thanks again for your help.
You may be able to use a method of block normalization. From your requirements, it sounds like you need to use 16-bit multiplications to achieve your loop-time constraint. I believe the unsigned 16-bit multiply routine in the Propeller Manual uses 208 cycles ((3*16 + 4) * 4), which is 2.6 µs at 80 MHz. That gives you 2.4 µs for overhead and other miscellaneous stuff.
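For reference, the manual's routine is a plain shift-and-add loop, roughly like this (the product ends up in y; y[31..16] must be zero on entry):

' Unsigned 16x16 multiply, shift-and-add (roughly the Propeller Manual routine)
umult16 shl   x, #16            ' move the multiplicand into x[31..16]
        mov   t, #16            ' 16 multiplier bits to process
        shr   y, #1 wc          ' first multiplier bit into C
:mloop
 if_c   add   y, x wc           ' bit was set: add the multiplicand into the product
        rcr   y, #1 wc          ' shift the product, next multiplier bit into C
        djnz  t, #:mloop        ' repeat for all 16 bits
umult16_ret ret

t       long  0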
You would want to treat your coefficients and data as normalized values between -1.0 and 1.0. An integer value of $7fff would represent the maximum positive number, 0.99997, and the most negative number could be $8000, which would be -1.0. However, you would probably need to limit it to $8001, or -0.99997, to prevent problems in the unsigned 16-bit multiply routine.
It sounds like you are summing the products of 4 coefficients with 4 samples of data. You will have to normalize your coefficients so the sum of the absolute values is less than 1.0 to prevent overflows. The block of 4 samples should be normalized so the maximum absolute value is less than 1.0. The normalization factor should be a power of 2 so you can use shifts to implement it.
So the steps are as follows:
1. Get the coefficients and pre-normalize the sum of absolute values to less than 1.0
2. Get four samples and compute the maximum absolute value
3. Normalize the four samples so the maximum absolute value is less than 1.0
4. Compute the sum of products
5. Shift the result down by the sum of the coefficient and data normalization shift values
You can use the code from the first part of the numexp routine to determine the normalization shift value.
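Not having the numexp source in front of me, the shift-value search is roughly this kind of loop (placeholder names; it assumes the sample has already been limited to the range -$7FFF..$7FFF as described above):

' Find the block-normalization shift for one 16-bit sample
        mov   shift, #0
        abs   tmp, val          ' work with the magnitude
        tjz   tmp, #:ndone      ' zero needs no normalization
:nloop  cmp   tmp, h4000 wc     ' magnitude still below $4000?
 if_c   shl   tmp, #1           ' yes: double it
 if_c   add   shift, #1         '      and count one more shift
 if_c   jmp   #:nloop
:ndone

h4000   long  $4000

For a block of four samples you would run this on the largest magnitude and then apply that single shift to all four values.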
Dave
If you accumulate the 32-bit products into a 32-bit accumulator you would not need to normalize the data. However, if you accumulated only the upper 16 bits you would need to normalize. The easiest and most accurate method would be to use the 32-bit products and accumulator.
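In other words, something like this (placeholder names, reusing the smult16 sketch from earlier in the thread):

' Sum of products with 32-bit accumulation
        mov   x, coef0
        mov   y, samp0
        call  #smult16          ' signed 16x16, 32-bit product in y
        mov   acc, y            ' start the accumulator
        mov   x, coef1
        mov   y, samp1
        call  #smult16
        add   acc, y            ' plain ADD is fine while the sum fits in 32 bits
        ' ...repeat for the remaining coefficient/sample pairs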
Dave