On another note...CRC
pedward
Posts: 1,642
I was thinking about CRC again, and how to whittle down clock cycles.
CRC can be implemented with table lookups to do all the arithmetic, however there is still some transitory data handling that is necessary.
In broad strokes, here's how it works:
repeat for number of bytes
XOR data byte onto current CRC accumulator
lookup CRC value at table offset indicated by lower 8 bits
shift accumulator right 8 bits and XOR table value onto accumulator
In a PASM implementation, this might look like (for a memory sensitive model):
repd bytes, #6
nop
nop
rdbytec byte, ptra++
xor byte, crc
setspa byte
popa crcb
shr crc, #8
xor crc, crcb
Then for CRC32, you would have a drop through instruction of "NOT crc".
That works out to a best case of 6 clocks per byte, once hub aligned. If the data isn't in cache, it's a 3 clock hit if it's hub aligned, otherwise it's a 10 clock hit.
This time is constant whether you are doing CRC16 or CRC32, since they both use the same accumulator compounding.
I'm open to anyone coming up with a way to do this faster, trim some clocks, etc.
CRC can be implemented with table lookups to do all the arithmetic, however there is still some transitory data handling that is necessary.
In broad strokes, here's how it works:
repeat for number of bytes
XOR data byte onto current CRC accumulator
lookup CRC value at table offset indicated by lower 8 bits
shift accumulator right 8 bits and XOR table value onto accumulator
In a PASM implementation, this might look like (for a memory sensitive model):
repd bytes, #6
nop
nop
rdbytec byte, ptra++
xor byte, crc
setspa byte
popa crcb
shr crc, #8
xor crc, crcb
Then for CRC32, you would have a drop through instruction of "NOT crc".
That works out to a best case of 6 clocks per byte, once hub aligned. If the data isn't in cache, it's a 3 clock hit if it's hub aligned, otherwise it's a 10 clock hit.
This time is constant whether you are doing CRC16 or CRC32, since they both use the same accumulator compounding.
I'm open to anyone coming up with a way to do this faster, trim some clocks, etc.
Comments
Best if the start of the buffer is aligned on a 16 byte boundary in the hub.
I know there is a somewhat crude USB host and client implementations for the Prop 1, and then there is V-usb for the AVR, so I have no doubt it will work.
At least the example given takes 3/4 of a clock per bit, which is a sight better than 1 bit per 12 clocks.