Fixed point math on P2
Rayman
Wish I had looked more closely at this earlier; I might have asked for a CORDIC option...
Looks like Q16.16 is going to make the most sense for a lot of applications.
I think this is how you do multiplication on P2:
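qmul  f2,f1      'unsigned 32x32 -> 64-bit multiply
getqx f2         'low 32 bits of the product
getqy f1         'high 32 bits of the product
shr   f2,#16     'keep bits 16..31 (the fraction half)
shl   f1,#16     'bits 32..47 become the integer half
add   f1,f2      'f1 = 16.16 result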
Might have been nicer if there were a QMUL2 that did the SHR, SHL and ADD. Would that have added a lot?
It is nice to have the 32x32 multiply... Puts it close to an ARM Cortex M3, apparently...
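It's looking to me like both have to be positive for this to work right, since QMUL is an unsigned multiply. Having the same sign isn't enough in general: negative operands have to be turned into magnitudes first and the sign fixed up afterwards...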
I found "libfixmath", which does some clever stuff with the leftover bits... The upper bits can be used for overflow detection and the lower bits for rounding:
/* 64-bit implementation for fix16_mul. Fastest version for e.g. ARM Cortex M3.
 * Performs a 32*32 -> 64bit multiplication. The middle 32 bits are the result,
 * bottom 16 bits are used for rounding, and upper 16 bits are used for overflow
 * detection.
 */
#if !defined(FIXMATH_NO_64BIT) && !defined(FIXMATH_OPTIMIZE_8BIT)
fix16_t fix16_mul(fix16_t inArg0, fix16_t inArg1)
{
    int64_t product = (int64_t)inArg0 * inArg1;

    #ifndef FIXMATH_NO_OVERFLOW
    // The upper 17 bits should all be the same (the sign).
    uint32_t upper = (product >> 47);
    #endif

    if (product < 0)
    {
        #ifndef FIXMATH_NO_OVERFLOW
        if (~upper)
            return fix16_overflow;
        #endif

        #ifndef FIXMATH_NO_ROUNDING
        // This adjustment is required in order to round -1/2 correctly
        product--;
        #endif
    }
    else
    {
        #ifndef FIXMATH_NO_OVERFLOW
        if (upper)
            return fix16_overflow;
        #endif
    }

    #ifdef FIXMATH_NO_ROUNDING
    return product >> 16;
    #else
    fix16_t result = product >> 16;
    result += (product & 0x8000) >> 15;

    return result;
    #endif
}
#endif
Comments
Yes, although I think I may change it to a routine that uses a series of 16x16 MUL instructions. I did a test a while back, and doing 3 MULs plus the appropriate shifts and adds is slightly faster than QMUL.
Still, the CORDIC is pipelined, so several operations can be in flight at once, which might speed things up for me; we'll see...
IIRC the CORDIC is kind of slow for multiplies (maybe 16 cycles? plus 2 cycles to issue the QMUL and 4 cycles to fetch the results), but it's been a while since I ran the tests.
No, and that would potentially improve performance at the cost of a more complicated compiler (and assuming there are useful things to do between the QMUL and GETQX).
The only time this fails is when it's a simple recursion that has to be processed serially, where each step needs the previous result.