Fast, Faster, Fastest Code: The smallest vs. the fastes SIN table read (in SPIN)
cessnapilot
Posts: 182
Hi,
For the 1.21 version of SPIN_TrigPack I have begun to streamline the code to gain more speed. The routines reading the SIN ROM table are called from many other procedures. In applications where sin/cos values are used frequently, FFT, frequency analysis or some 6DOF IMU navigation for example, the speed of these table reads is critical because of the likely high rate of access to the table. First I speeded up the SIN_ROM 20-25% with·a new code
although the result was not that I wanted. I obtained maybe the smalles SIN ROM table read procedure (34 bytes), which is probably faster than other published code up till now, but I wanted the fastest code. (Another mistake, think of Real Random SPIN code, that busts a common misbelief about the repeatability·of SPIN code on the Propeller.)· Can·we make·SIN_ROM faster? Yes, we can. Check out the unrolled version
I measured ellapsed clockticks by the snippet
The fastes SIN_ROM·takes about 40 usec to get a sine. FPU does it with 90-100 usec (with twice as many accurate digits, though). You might know, that I am practical enough to use the FPU, if that is simpler, faster or more precise in some situations than the Prop only design, but this performance of SPIN may be enough for some simple and fast frequency analysis. I know that a 8K FFT,·5 times per second, is possible with PASM (although some COGs will get hot) and now I guess that a 512( or 1K?) point FFT / sec is achieveable with SPIN only.
If you have downloaded SPIN_TrigPack v1.2, copy paste the fastes SIN_ROM procedure to speed up the object.·Similar coding·can be done for the COS, to have COS_ROM the same speed as SIN_ROM, but this is left for you as a practice.
Cheers,
Istvan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Intentionally Left Blank
For the 1.21 version of SPIN_TrigPack I have begun to streamline the code to gain more speed. The routines reading the SIN ROM table are called from many other procedures. In applications where sin/cos values are used frequently, FFT, frequency analysis or some 6DOF IMU navigation for example, the speed of these table reads is critical because of the likely high rate of access to the table. First I speeded up the SIN_ROM 20-25% with·a new code
PUB SIN_ROM(iA) : qV | q '------------------------------------------------------------------------- '--------------------------------┌─────────┐------------------------------ '--------------------------------│ SIN_ROM │------------------------------ '--------------------------------└─────────┘------------------------------ '------------------------------------------------------------------------- ' Action: - Gets quadrant ' - Folds iAngle into 2K address space [noparse][[/noparse]0, Pi/2] ' - Reads value from SIN Table from (folded) address ' - Sets sign of SIN value ' Parameters: Angle in iAngle (Index of Angle) units ' Result: Sine value for Angle in Qs15_16 Qvalue format '+Reads/Uses: - _BASE_SINTABLE (= $E000) ' - qValue from ROM SIN Table ' +Writes: None ' Calls: None ' Note: - Code size 34 bytes (Maybe the smalles SPIN version) ' - Average execution time 4352 cnt (55 usec) '------------------------------------------------------------------------- q := iA & %1_1000_0000_0000 'Get quadrant bits of address iA IF (q & %0_1000_0000_0000) 'Index runs reversed in 2nd and 4th. quad. -iA 'For the bits it is simply 2's complement qV := WORD[noparse][[/noparse]_BASE_SINTABLE][noparse][[/noparse]iA & $7FF] IF (q > %0_1000_0000_0000) 'In 3rd an 4th quadrant negate SIN value -qV RETURN qV '-------------------------------------------------------------------------
although the result was not that I wanted. I obtained maybe the smalles SIN ROM table read procedure (34 bytes), which is probably faster than other published code up till now, but I wanted the fastest code. (Another mistake, think of Real Random SPIN code, that busts a common misbelief about the repeatability·of SPIN code on the Propeller.)· Can·we make·SIN_ROM faster? Yes, we can. Check out the unrolled version
PUB SIN_ROM(iA) '------------------------------------------------------------------------- '--------------------------------┌─────────┐------------------------------ '--------------------------------│ SIN_ROM │------------------------------ '--------------------------------└─────────┘------------------------------ '------------------------------------------------------------------------- ' Action: - Reads value from SIN Table according to iAngle address ' Parameters: Angle in iAngle (Index of Angle) units ' Result: Sine value for Angle in Qs15_16 Qvalue format '+Reads/Uses: - _BASE_SINTABLE (= $E000) ' - qValue from ROM SIN Table ' +Writes: None ' Calls: None ' Note: - SIN table contains 2K 16-bit word data for the 1st ' quadrant in [noparse][[/noparse]$E000-$F000] 4KB locations ' - Word index goes up and down and up and down in quadrants ' [noparse][[/noparse]0, 90] [noparse][[/noparse]90, 180] [noparse][[/noparse]180, 270] [noparse][[/noparse]270, 380] ' up down up down ' ' quadrant: 1 2 3 4 ' angle: $0000...$07FF $0800...$0FFF $1000...$17FF $1800...$1FFF ' w.index: $0000...$07FF $0800...$0001 $0000...$07FF $0800...$0001 ' (The above 3 lines were taken after the Propeller Manual v1.1) ' ' - Code size 70 bytes ' - Average exec. time is 3192 cnt (40 usec at 80 MHz) '------------------------------------------------------------------------- CASE (iA & %1_1000_0000_0000) %0_0000_0000_0000: '1st quadrant [noparse][[/noparse]0, 90] RETURN WORD[noparse][[/noparse]_BASE_SINTABLE][noparse][[/noparse]iA & $7FF] %0_1000_0000_0000: '2nd quadrant [noparse][[/noparse]90, 180] RETURN WORD[noparse][[/noparse]_BASE_SINTABLE][noparse][[/noparse]-iA & $7FF] %1_0000_0000_0000: '3rd quadrant [noparse][[/noparse]180, 270] RETURN -WORD[noparse][[/noparse]_BASE_SINTABLE][noparse][[/noparse]iA & $7FF] %1_1000_0000_0000: '4th quadrant [noparse][[/noparse]270, 380] RETURN -WORD[noparse][[/noparse]_BASE_SINTABLE][noparse][[/noparse]-iA & $7FF] '-------------------------------------------------------------------------
I measured ellapsed clockticks by the snippet
a := CNT REPEAT c FROM 0 TO 4095 'To cover [noparse][[/noparse]0-360] 'd := c 'For base time d := Q.SIN(c) 'For total time b:= CNT PST.Str(STRING(" T = ")) PST.Dec(b-a) PST.Char(PST#NL)
The fastes SIN_ROM·takes about 40 usec to get a sine. FPU does it with 90-100 usec (with twice as many accurate digits, though). You might know, that I am practical enough to use the FPU, if that is simpler, faster or more precise in some situations than the Prop only design, but this performance of SPIN may be enough for some simple and fast frequency analysis. I know that a 8K FFT,·5 times per second, is possible with PASM (although some COGs will get hot) and now I guess that a 512( or 1K?) point FFT / sec is achieveable with SPIN only.
If you have downloaded SPIN_TrigPack v1.2, copy paste the fastes SIN_ROM procedure to speed up the object.·Similar coding·can be done for the COS, to have COS_ROM the same speed as SIN_ROM, but this is left for you as a practice.
Cheers,
Istvan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Intentionally Left Blank
Comments
1. QSqr change the qdiv 1.414... to qmul of 0.707106
2. atan2 I have a lot of cases where the inputs are ints i.e. the fractional part is 0, you can optimize that case to remove the 64bit maths