slow CRC computation
Larry Martin
Hi, all -
I have a CRC computation that ran in tens of microseconds in SX assembly language, but is taking around a millisecond per byte in SPIN.
```spin
CON
  TM_MSG_CRC_INIT       = $FFFF
  TM_MSG_CCITT_CRC_POLY = $1021
  pLLIOMarkOut          = 5
  pLLIOOverrunOut       = 6

PUB TM_CalculateCRC(p_src_p, p_len) : crc | mask, bitctr, bytectr, ch
  crc := TM_MSG_CRC_INIT
  if p_len < 1                            ' guard: "repeat from 0 to -1" would count DOWN and run twice
    return crc
  repeat bytectr from 0 to (p_len - 1)
    'Align test bit with leftmost bit of the message byte.
    mask := $80
    ch := BYTE[p_src_p + bytectr]
    outa[pLLIOOverrunOut]~~               'deleteme
    repeat bitctr from 0 to 7             ' times at 80 MHz (5 MHz crystal X 16 PLL)
      outa[pLLIOMarkOut]~~                'deleteme
      crc <<= 1                           ' 15 uS
      outa[pLLIOMarkOut]~                 'deleteme
      if ch & mask
        crc |= 1                          ' 16 uS
      outa[pLLIOMarkOut]~~                'deleteme
      if crc & $10000
        crc ^= TM_MSG_CCITT_CRC_POLY      ' 25 uS
      outa[pLLIOMarkOut]~                 'deleteme
      crc &= $FFFF                        ' 15 uS
      outa[pLLIOMarkOut]~~                'deleteme
      mask >>= 1                          ' 15 uS =~ 1200 clock cycles
      outa[pLLIOMarkOut]~                 'deleteme
    outa[pLLIOOverrunOut]~                'deleteme
  RESULT := crc & $FFFF
```
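Before porting, it helps to have a PC-side reference model of exactly this loop, so a faster version can be checked bit for bit. The following is a minimal C transcription. It assumes the same polynomial ($1021), MSB-first bit order, an initial value passed in as `init`, and no zero-byte augmentation. `tm_crc16_bitwise` and `CCITT_POLY` are hypothetical names. This shift-in ("augmented") form only reproduces the common table-driven CRC-16 check values if two zero bytes are appended to the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; same polynomial as TM_MSG_CCITT_CRC_POLY. */
#define CCITT_POLY 0x1021u

/* Bit-at-a-time CRC mirroring the Spin loop: shift the register left,
   shift the next message bit (MSB first) into bit 0, XOR in the
   polynomial when a 1 reaches bit 16, then mask back to 16 bits.
   No zero bytes are appended (matches the original Spin code). */
uint16_t tm_crc16_bitwise(const uint8_t *src, size_t len, uint16_t init)
{
    uint32_t crc = init;                   /* needs 17 bits while shifting */
    for (size_t i = 0; i < len; i++) {
        uint8_t ch = src[i];
        for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
            crc <<= 1;                     /* crc <<= 1            */
            if (ch & mask)
                crc |= 1u;                 /* if ch & mask: crc |= 1 */
            if (crc & 0x10000u)
                crc ^= CCITT_POLY;         /* if crc & $10000: xor poly */
            crc &= 0xFFFFu;                /* crc &= $FFFF         */
        }
    }
    return (uint16_t)crc;
}
```

With two zero bytes appended to "123456789", this model gives 0x31C3 (the CRC-16/XMODEM check value) for `init = 0x0000` and 0xE5CC for `init = 0xFFFF`. That makes it easy to confirm that a PASM port produces the same answers.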
I have counted the resulting output pulses to verify that I am not mysteriously spinning through extra iterations. It takes 15 µs to shift a long left; at 80 MHz that's 1200 clock cycles, right? I'm pretty sure the SX48 did the whole computation in that time.
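Taking the timings in the code comments at face value, the arithmetic works out as follows. The clock is 80 MHz, from the 5 MHz crystal and 16x PLL:

$$
15\ \mu\text{s} \times 80\ \tfrac{\text{clocks}}{\mu\text{s}} = 1200\ \text{clocks}
$$

$$
8\ \tfrac{\text{bits}}{\text{byte}} \times (15 + 16 + 25 + 15 + 15)\ \mu\text{s} = 8 \times 86\ \mu\text{s} = 688\ \mu\text{s per byte}
$$

That figure excludes the inner `repeat` bookkeeping, which falls outside the timed intervals. Roughly a millisecond per byte is therefore about what the per-statement numbers predict. For scale, most cog (PASM) instructions take 4 clocks, so the 1200 clocks spent on one Spin shift would cover about 300 assembly instructions.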
Questions:
1. Is there a way to speed this up by precompiling the SPIN code?
2. If I put in the time to learn PASM and recode the function, will I get performance closer to SX48?
3. Is there an example of assembly language computation, preferably CRC16, in the Object Exchange? Googling "crc pasm site: propeller.com" comes up dry.
[rant]
The context is, I'm nearly done porting a real-time control application from the SX48 (a single 8-bit core at 50 MHz) to the Propeller (multiple 32-bit cores at 80 MHz). Imagine my surprise to find that my program is now too slow in some ways. After a couple of weeks of Propeller cavitation, I get that Spin is interpreted, etc., but it still seems kinda funny.
[/rant]
Thanks,
Larry
Comments
```spin
CON
  _CLKMODE = XTAL1 + PLL16X
  _XINFREQ = 5_000_000
```
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle
1) You can't "precompile" the Spin code. It's never compiled to machine instructions, only to interpreted bytecodes.
2) You'll certainly get times comparable to the SX48, particularly when you make use of the additional functionality of the Propeller's instructions.
3) I don't think so, but maybe someone else might know of one.
Spin is certainly slower than assembly language, perhaps by a factor of 20. That's pretty typical for interpreters. On the other hand, the interpretive code is very compact and there are features not available directly in assembly (like multiplication and division, subscripting, subroutine calls with parameters).
And yes, at 80 MHz (5 MHz × 16 = 80 MHz) you should get speed comparable to the SX48 running at 50 MIPS.
I'll probably have another look at ImageCraft next week.
Thanks again,
Larry
Chuck - Good idea, but even a table-driven implementation has a fair number of instructions, and I'd have to find or write a table that matches this device's CRC. Even "standard" CRC implementations differ in subtle ways. In 10 years of fairly intense integration of devices with CRC-protected proprietary communication protocols, I have rarely been able to reuse a CRC function between devices.
An SX48 @ 50 MHz will beat one Propeller cog @ 80 MHz in ASM.
SX48 uses a multi-stage pipeline to execute 1 instruction per clock tick.
Propeller executes 1 instruction every 4 clock ticks minimum (most instructions use 4).
Someone please correct me if I misunderstand this.
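In numbers, using the one-instruction-per-clock figure above for the SX48 and the typical 4 clocks per cog instruction:

$$
\text{SX48: } 50\ \text{MHz} \times 1\ \tfrac{\text{instruction}}{\text{clock}} = 50\ \text{MIPS}
\qquad
\text{one cog: } \frac{80\ \text{MHz}}{4\ \text{clocks/instruction}} = 20\ \text{MIPS}
$$

A single cog therefore issues instructions about 2.5 times more slowly than the SX48. It can close that gap only where one 32-bit instruction does the work of several 8-bit ones, for example when the 16-bit CRC register fits in one long instead of two bytes, or by splitting work across cogs.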
Now, if you apply multiple cogs to the problem, there is potential for Propeller to outperform.
I have not seen any examples of a "generic" solution though.
All CRC tables are built using the shift/XOR algorithm with a given polynomial.
The starting value and any final inversion also come into play. Tables use memory, of course.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (jazzed) : 6/27/2008 4:13:44 PM GMT
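Generating the table from the same shift/XOR step sidesteps the problem of finding a published table that matches a particular device. Whatever polynomial the device uses, the table is computed from it. The following is a minimal C sketch. It assumes the MSB-first, non-augmented loop from the original Spin code with polynomial $1021; the function names are hypothetical. Each table entry is the result of eight shift/XOR steps on a byte that falls off the top of the register, so one lookup replaces the inner eight-iteration loop.

```c
#include <stddef.h>
#include <stdint.h>

#define CCITT_POLY 0x1021u   /* same polynomial as the Spin code */

/* Hypothetical helper: table[t] = (t * x^16) mod P, i.e. eight shift/XOR
   steps applied to byte t sitting in the top of the 16-bit register. */
void crc16_build_table(uint16_t table[256])
{
    for (unsigned t = 0; t < 256; t++) {
        uint16_t r = (uint16_t)(t << 8);
        for (int bit = 0; bit < 8; bit++)
            r = (r & 0x8000u) ? (uint16_t)((r << 1) ^ CCITT_POLY)
                              : (uint16_t)(r << 1);
        table[t] = r;
    }
}

/* Byte-at-a-time equivalent of the bitwise Spin loop (no augmentation):
   the top byte leaves the register, the message byte enters at the
   bottom, and the departing byte's contribution is XORed back in. */
uint16_t crc16_table(const uint16_t table[256], const uint8_t *src,
                     size_t len, uint16_t init)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        uint8_t top = (uint8_t)(crc >> 8);
        crc = (uint16_t)((crc << 8) | src[i]);
        crc ^= table[top];
    }
    return crc;
}
```

The table costs 256 × 2 = 512 bytes. The starting value stays a parameter. If the device also inverts the final CRC, that would be applied to the returned value.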
My SX48 was loafing in every respect except for the soft UARTs. It ran two independent UARTs at 57600 baud, and controlled a machine, but I came to need a third data channel and could not make that work without water-cooling the chip.
I am adding the third serial channel to my Propeller now. That will tell whether I made the right choice. The rest is a mere matter of software.
Thanks for help, all. If questions arise about the PASM CRC, I will start a new thread.
Larry
The SPIN level sets up data and then sets the CC16ParamsReady flag. It then waits for CC16ResultValid and returns the result data. There is also an escape if something goes wrong (like you forget to Start the object :)).
Sample call:

```spin
crc := CompuCog.TM_CalcCRC16(@TargetTxBuf, lTTBCommandLength - 3)   ' leave out STX and CRC bytes
```
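The handshake described above can be modeled off-chip to see how the flags interact. This is a C/pthreads stand-in, not Propeller code. A worker thread plays the CRC cog, and two atomic flags play the roles of CC16ParamsReady and CC16ResultValid. The struct, the function names, and the poll-count escape (`max_polls`) are hypothetical, chosen for illustration. On the Propeller, the flags would be hub variables shared between the Spin method and the PASM cog.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox standing in for shared hub variables. */
typedef struct {
    const uint8_t *src;         /* message address (like @TargetTxBuf) */
    size_t len;                 /* number of bytes to process */
    uint16_t result;            /* CRC written by the worker */
    atomic_int params_ready;    /* role of CC16ParamsReady */
    atomic_int result_valid;    /* role of CC16ResultValid */
    atomic_int stop;            /* lets the host shut the worker down */
} crc_mailbox;

void crc_mailbox_init(crc_mailbox *mb)
{
    mb->src = NULL;
    mb->len = 0;
    mb->result = 0;
    atomic_init(&mb->params_ready, 0);
    atomic_init(&mb->result_valid, 0);
    atomic_init(&mb->stop, 0);
}

/* Same MSB-first shift/XOR loop as the Spin code (poly $1021, init $FFFF). */
static uint16_t mailbox_crc16(const uint8_t *src, size_t len)
{
    uint32_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
            crc = (crc << 1) | ((src[i] & mask) ? 1u : 0u);
            if (crc & 0x10000u)
                crc ^= 0x1021u;
            crc &= 0xFFFFu;
        }
    }
    return (uint16_t)crc;
}

/* Worker thread ("CRC cog"): wait for parameters, compute, then write the
   result BEFORE raising result_valid so the caller never sees stale data. */
void *crc_worker(void *arg)
{
    crc_mailbox *mb = arg;
    while (!atomic_load(&mb->stop)) {
        if (atomic_exchange(&mb->params_ready, 0)) {
            mb->result = mailbox_crc16(mb->src, mb->len);
            atomic_store(&mb->result_valid, 1);
        }
    }
    return NULL;
}

/* Caller side: post parameters, raise params_ready, poll result_valid.
   The poll limit is the escape: if no worker is running (the "forgot to
   Start the object" case) it returns -1 instead of hanging. */
int32_t crc_request(crc_mailbox *mb, const uint8_t *src, size_t len,
                    long max_polls)
{
    atomic_store(&mb->result_valid, 0);
    mb->src = src;
    mb->len = len;
    atomic_store(&mb->params_ready, 1);
    for (long i = 0; i < max_polls; i++) {
        if (atomic_load(&mb->result_valid))
            return mb->result;
    }
    return -1;
}
```

After a timeout, the mailbox should not be reused until the worker is known to be idle, since a late worker could still be reading `src`.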
Like I said, this pattern reduced my CRC computation time by nearly two orders of magnitude, from 7 ms to 150 µs for the same 14 bytes.
Thanks to all for help, support and sample code.
It's Miller time.