How to use CRCBIT and/or CRCNIB for CRC7 (for MMC)?

Cluso99 · 2020-06-03 20:37

Andy,
I tried all sorts of reversals and inversions and just couldn't get CRCBIT or CRCNIB to work. I didn't think of shifting the poly.

Rayman · 2020-06-03 20:39

@Ariba it works! Thanks. I was almost there, but you saved me some time.

Maybe the code in the earlier posts would have worked too if I had done only 5 bytes instead of 5 and 7/8 bytes...

It's so bizarre now how that 5-7/8 byte code actually works... Those last seven zeros must clean up something not done right earlier...
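
One possible explanation for the 5-7/8 byte mystery, offered as a sketch rather than a diagnosis of the earlier code: the textbook long-division form of a CRC shifts the message into the register and then needs as many trailing zero bits as the CRC is wide (7 here) to flush it through. 40 message bits plus 7 zeros is exactly 5-7/8 bytes. The form that XORs each incoming bit against the top of the register needs no padding. The Python below is a host-side model, not P2 code, and the function names are made up for illustration.

```python
POLY = 0x09  # x^7 + x^3 + 1, with the x^7 term implied


def crc7_direct(data: bytes) -> int:
    """Direct form: each message bit is XORed with the register's top bit."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if top ^ bit:
                crc ^= POLY
    return crc


def crc7_augmented(data: bytes) -> int:
    """Long-division form: message bits enter at the bottom, then 7 zero bits."""
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    bits += [0] * 7  # the "7/8 of a byte" of padding
    reg = 0
    for bit in bits:
        top = (reg >> 6) & 1
        reg = ((reg << 1) | bit) & 0x7F
        if top:
            reg ^= POLY
    return reg


cmd0 = bytes([0x40, 0x00, 0x00, 0x00, 0x00])
print(hex(crc7_direct(cmd0)), hex(crc7_augmented(cmd0)))  # 0x4a 0x4a
```

Both forms give $4A for CMD0, which becomes the familiar $95 once shifted left one place and ORed with the end bit.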

Rayman · 2020-06-03 20:53

I also found a table method here that works with 5 bytes:
http://chibios.sourceforge.net/docs3/hal/hal__mmc__spi_8c_source.html
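
For anyone who wants to try the table idea on a PC first, here is a minimal sketch in Python. It is not the ChibiOS table: it generates its own 256-entry table and keeps the CRC left-aligned in bits 7..1 of a byte (polynomial $09 shifted left to $12), which is one common way of doing it. The names `make_crc7_table` and `crc7_table_byte` are illustrative only.

```python
def make_crc7_table():
    """Build a 256-entry table holding the CRC7 left-aligned (crc7 << 1)."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x12) & 0xFF  # $12 = $09 << 1
            else:
                crc = (crc << 1) & 0xFF
        table.append(crc)
    return table


CRC7_TABLE = make_crc7_table()


def crc7_table_byte(data: bytes) -> int:
    """Return the final command byte: CRC7 in bits 7..1, end bit in bit 0."""
    crc = 0
    for byte in data:
        crc = CRC7_TABLE[crc ^ byte]  # one lookup per message byte
    return crc | 1


print(hex(crc7_table_byte(bytes([0x40, 0, 0, 0, 0]))))  # 0x95
```

Because the register is already left-aligned, the only fix-up after the 5 command bytes is ORing in the end bit.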

Rayman · 2020-06-03 21:00

This is great. I can now push the whole sector read code into assembly and hopefully get better performance.

BTW: This CRC code helped speed up the eMMC Spin code a little. But, I'm still seeing a series of single block reads as being much slower than a big multiblock read. Maybe that's just the way it is? I'm not seeing anything in my code that would account for this...

Rayman · 2020-06-03 22:36

Looks like I’ll need crc16 to write to eMMC...

The crcnib will be a huge help there I hope

rogloh · 2020-06-03 23:04

Rayman wrote:

Looks like I’ll need crc16 to write to eMMC...

The crcnib will be a huge help there I hope

If not, you could use a table driven approach from LUTRAM perhaps. Should still be quite fast.
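
A rough host-side sketch of what a nibble-at-a-time CRC16 looks like, assuming the usual SD/MMC data CRC (polynomial x^16 + x^12 + x^5 + 1, i.e. $1021, initial value 0, MSB first). The 16-entry table is small enough for LUT RAM, and the two lookups per byte mirror what a pair of CRCNIBs would do. The function names are made up for illustration, and this is Python for checking values on a PC, not P2 code. In the multi-bit bus modes each data line carries its own CRC16, so the bit stream fed in would have to be per line; that de-interleaving is not shown.

```python
def make_nibble_table():
    """16-entry table: effect of clocking one nibble through the CRC16."""
    table = []
    for nibble in range(16):
        crc = nibble << 12
        for _ in range(4):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
        table.append(crc)
    return table


CRC16_TABLE = make_nibble_table()


def crc16_ccitt(data: bytes) -> int:
    """CRC16, poly $1021, init 0, processed high nibble then low nibble."""
    crc = 0
    for byte in data:
        crc = ((crc << 4) & 0xFFFF) ^ CRC16_TABLE[(crc >> 12) ^ (byte >> 4)]
        crc = ((crc << 4) & 0xFFFF) ^ CRC16_TABLE[(crc >> 12) ^ (byte & 0x0F)]
    return crc


print(hex(crc16_ccitt(b"123456789")))  # 0x31c3, the standard check value
```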

Cluso99 · 2020-06-03 23:23

Rayman wrote:

This is great. I can now push the whole sector read code into assembly and hopefully get better performance.

BTW: This CRC code helped speed up the eMMC Spin code a little. But, I'm still seeing a series of single block reads as being much slower than a big multiblock read. Maybe that's just the way it is? I'm not seeing anything in my code that would account for this...

@Rayman
I suggest you look at my SD code I've just released if you want to get it running fast. It works in COG or LUT.
It should drop in to your code easily

For the CRC routine, I believe that if you supply the extra last byte (the one that gets replaced by the CRC) as $80 instead of $00, you can run the CRC routine without the last statement, the one that fiddles the routine to use 7 bits and then does another fix. I haven't had time to try it yet.

evanh · 2024-05-05 21:01

I've optimised Andy's code a little:

PUB crc7(buf, len) : crc | val
    org    ' Reference code courtesy of Ariba
cr7lp
        rdword  val, buf
        add     buf, #2
        movbyts val, #%00_01_10_11
        setq    val
        crcnib  crc, #$48    ' CCITT 7-bit polynomial is x^7 + x^3 + 1
        crcnib  crc, #$48    ' $09 reversed and shifted for CRCNIB
        sub     len, #1  wcz
    if_ne   crcnib  crc, #$48
    if_ne   crcnib  crc, #$48
    if_ne   djnz    len, #cr7lp

        rev     crc         ' correct the bit order to match standard
        shr     crc, #24
        or      crc, #1
    end

@Ariba said:
or with the outer loop also in PASM:

PUB crc7(buf, len) : crc | val
    org
cr7lp
        rdbyte  val,buf
        add     buf,#1
        shl     val,#24
        setq    val
        crcnib  crc,#$90>>1
        crcnib  crc,#$90>>1
        djnz    len,#cr7lp
    end
    return (crc rev 6) << 1 | 1

Wuerfel_21 · 2024-05-05 21:03

You can't call it optimized if there's an RDxxxx in the hot loop

evanh · 2024-05-05 21:09

Oi, it's doubled the speed!

Wuerfel_21 · 2024-05-05 21:13

Also, instead of doing sub len,#1 and then testing the condition code, you can use the handy DJZ instruction to exit the loop early

evanh · 2024-05-05 21:14

Actually, if you have an efficient way to block read variables into Flexspin's Fcache I'm all ears.

EDIT: I guess allocating the space is where I need the help, rather than the block copy.

Wuerfel_21 · 2024-05-05 21:17

You can declare an array and block-read into that (only works in Spin source). Or just use RDFAST and friends for something like this (only works in ORG/END mode, not ORGH/END or ASM/ENDASM)

evanh · 2024-05-05 21:25

@Wuerfel_21 said:
You can declare an array and block-read into that (only works in Spin source). Or just use RDFAST and friends for something like this (only works in ORG/END mode, not ORGH/END or ASM/ENDASM)

I thought all arrays went to hubRAM/stack. There would need to be a notably small size limit for cogRAM-based ones.

evanh · 2024-05-05 21:27

@Wuerfel_21 said:
Also, instead of doing sub len,#1 and then testing the condition code, you can use the handy DJZ instruction to exit the loop early

Huh, that now allows use of RDLONG ...

PUB crc7(buf, len) : crc | val
    org    ' Reference code courtesy of Ariba
crc7lp
        rdlong  val, buf
        add     buf, #4
        movbyts val, #%00_01_10_11
        setq    val
        crcnib  crc, #$48    ' CCITT 7-bit polynomial is x^7 + x^3 + 1
        crcnib  crc, #$48    ' $09 reversed and shifted for CRCNIB
        djz     len, #crc7done
        crcnib  crc, #$48
        crcnib  crc, #$48
        djz     len, #crc7done
        crcnib  crc, #$48
        crcnib  crc, #$48
        djz     len, #crc7done
        crcnib  crc, #$48
        crcnib  crc, #$48
        djnz    len, #crc7lp
crc7done
        rev     crc         ' correct the bit order to match standard
        shr     crc, #24
        or      crc, #1
    end

Wuerfel_21 · 2024-05-05 21:39

@evanh said:

@Wuerfel_21 said:
You can declare an array and block-read into that (only works in Spin source). Or just use RDFAST and friends for something like this (only works in ORG/END mode, not ORGH/END or ASM/ENDASM)

I thought all arrays went to hubRAM/stack. There would need to be a notably small size limit for cogRAM-based ones.

Yes, something like that. In C, I think it always goes on the stack. In Spin it can go to cogRAM, where it can be used for inline ASM block reads. Also, pr0 through pr7 always exist and can be used for this purpose too.

TonyB_ · 2024-05-06 17:49

@evanh said:

@Wuerfel_21 said:
Also, instead of doing sub len,#1 and then testing the condition code, you can use the handy DJZ instruction to exit the loop early

Huh, that now allows use of RDLONG ...

PUB crc7(buf, len) : crc | val
    org    ' Reference code courtesy of Ariba
crc7lp
      rdlong  val, buf
      add     buf, #4
      movbyts val, #%00_01_10_11
      setq    val
      crcnib  crc, #$48    ' CCITT 7-bit polynomial is x^7 + x^3 + 1
      crcnib  crc, #$48    ' $09 reversed and shifted for CRCNIB
      djz     len, #crc7done
      crcnib  crc, #$48
      crcnib  crc, #$48
      djz     len, #crc7done
      crcnib  crc, #$48
      crcnib  crc, #$48
      djz     len, #crc7done
      crcnib  crc, #$48
      crcnib  crc, #$48
      djnz    len, #crc7lp
crc7done
      rev     crc         ' correct the bit order to match standard
      shr     crc, #24
      or      crc, #1
    end

Assuming code in cog or LUT RAM, no interrupts and no long crossings, all the RDLONGs after the first take 9 cycles, the minimum possible.

evanh · 2024-05-06 18:36

Neat. I see how that works now. The other instructions add up to exactly 32 ticks, 4 whole hub rotations. With the hub slice that RDLONG needs advancing by one each pass, every read after the first arrives just as its slice comes around, so it executes in the minimum of 9 ticks.
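
The arithmetic can be played with in a toy Python model. This is a deliberately simplified sketch with several assumptions: 8 hub slices selected by address bits 4..2, a 9-tick RDLONG when the slice is already lined up plus one tick per slice it has to wait for, 2 ticks for each ordinary instruction and each DJZ that falls through, and 4 ticks for the taken DJNZ. The phase of the hub rotation is arbitrary here, and `rdlong_ticks` and `simulate` are names invented for the example.

```python
HUB_SLICES = 8
RDLONG_MIN = 9
# add, movbyts, setq, 8 x crcnib, 3 x djz (not taken), djnz (taken)
OTHER_TICKS = 2 + 2 + 2 + 8 * 2 + 3 * 2 + 4


def rdlong_ticks(start_tick: int, address: int) -> int:
    """Toy model: 9 ticks plus however long the wanted slice is away."""
    wanted = (address >> 2) % HUB_SLICES
    wait = (wanted - start_tick) % HUB_SLICES
    return RDLONG_MIN + wait


def simulate(first_address: int, passes: int, start_tick: int = 0):
    """Return the RDLONG cost of each pass through the loop."""
    tick = start_tick
    costs = []
    for i in range(passes):
        cost = rdlong_ticks(tick, first_address + 4 * i)
        costs.append(cost)
        tick += cost + OTHER_TICKS
    return costs


print(OTHER_TICKS)             # 32
print(simulate(0x1000C, 5))    # [12, 9, 9, 9, 9]
```

Each pass takes 32 + 9 = 41 ticks, which is one more than a whole number of rotations, and the next long lives one slice further on, so once the first read has synchronised the loop the wait stays at zero.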
