On another note...CRC

pedward · 2013-05-06 10:52

I was thinking about CRC again, and how to whittle down clock cycles.

CRC can be implemented with table lookups to do all the arithmetic, however there is still some transitory data handling that is necessary.

In broad strokes, here's how it works:

repeat for number of bytes
XOR data byte onto current CRC accumulator
lookup CRC value at table offset indicated by lower 8 bits
shift accumulator right 8 bits and XOR table value onto accumulator

In a PASM implementation, this might look like (for a memory sensitive model):

repd bytes, #6
nop
nop

rdbytec byte, ptra++
xor byte, crc
setspa byte
popa crcb
shr crc, #8
xor crc, crcb

Then for CRC32, you would have a drop through instruction of "NOT crc".

That works out to a best case of 6 clocks per byte, once hub aligned. If the data isn't in cache, it's a 3 clock hit if it's hub aligned, otherwise it's a 10 clock hit.

This time is constant whether you are doing CRC16 or CRC32, since they both use the same accumulator compounding.

I'm open to anyone coming up with a way to do this faster, trim some clocks, etc.

Bill Henning · 2013-05-06 11:06

Looks good.

Best if the start of the buffer is aligned on a 16 byte boundary in the hub.

Kye · 2013-05-06 17:59

CRC can be done on the Prop1 in 1 bit per 3 instructions. It should be possible to fit the USB state machine into a few instructions.

pedward · 2013-05-06 18:02

Having indirect/array lookup makes it a lot faster.

I know there is a somewhat crude USB host and client implementations for the Prop 1, and then there is V-usb for the AVR, so I have no doubt it will work.

At least the example given takes 3/4 of a clock per bit, which is a sight better than 1 bit per 12 clocks.

On another note...CRC

Comments