How to use CRCBIT and/or CRCNIB for CRC7 (for MMC)? - Page 2 — Parallax Forums



Comments

  • Cluso99 Posts: 18,069
    Andy,
    I tried all sorts of reversals and inversions and just couldn't get crcbit or crcnib to work. Didn't think of shifting the poly :(
  • Rayman Posts: 14,867
    @Ariba it works! Thanks. I was almost there, but you saved me some time.

    Maybe the code in the earlier posts would have worked too if I had done only 5 bytes instead of 5 and 7/8 bytes...

    It's so bizarre now how that 5-7/8 byte code actually works... Those last seven zeros must clean up something not done right earlier...
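
A possible explanation for the seven trailing zeros, sketched in C rather than PASM: a CRC can be computed in a "direct" form, where each message bit is XORed into the top of the register, or in an "augmented" form, where message bits are shifted in at the bottom and the register is then flushed with as many zero bits as the CRC is wide (7 here). With a zero initial value both forms leave the same remainder. The function names below are hypothetical, and whether the earlier 5-7/8 byte code really was the augmented form is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC7_POLY 0x09u  /* x^7 + x^3 + 1, with the x^7 term implicit */

/* Direct form: XOR each message bit (MSB first) into the top of the register. */
uint8_t crc7_direct(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            uint8_t in = (uint8_t)((buf[i] >> bit) & 1u);
            uint8_t fb = (uint8_t)(((crc >> 6) & 1u) ^ in);
            crc = (uint8_t)((crc << 1) & 0x7Fu);
            if (fb)
                crc ^= CRC7_POLY;
        }
    }
    return crc;
}

/* Augmented form: shift message bits in at the bottom, then 7 zero bits. */
uint8_t crc7_augmented(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0;
    size_t msg_bits = len * 8;
    for (size_t n = 0; n < msg_bits + 7; n++) {
        uint8_t in = 0;  /* the 7 trailing bits are zero */
        if (n < msg_bits)
            in = (uint8_t)((buf[n / 8] >> (7 - (n % 8))) & 1u);
        uint8_t top = (uint8_t)((crc >> 6) & 1u);
        crc = (uint8_t)(((crc << 1) | in) & 0x7Fu);
        if (top)
            crc ^= CRC7_POLY;
    }
    return crc;
}

/* Byte as sent on the wire: the 7-bit CRC followed by the end bit. */
uint8_t crc7_frame_byte(uint8_t crc7)
{
    return (uint8_t)((crc7 << 1) | 1u);
}

/* Self-check against the well-known CMD0 frame 40 00 00 00 00 95. */
void crc7_selfcheck(void)
{
    static const uint8_t cmd0[5] = { 0x40, 0x00, 0x00, 0x00, 0x00 };
    assert(crc7_direct(cmd0, 5) == crc7_augmented(cmd0, 5));
    assert(crc7_frame_byte(crc7_direct(cmd0, 5)) == 0x95);
}
```

The direct form is the one that stops after exactly 5 bytes; the augmented form needs the extra 7 bits to arrive at the same value.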
  • Rayman Posts: 14,867
    edited 2020-06-03 20:53
    I also found a table method here that works with 5 bytes:
    http://chibios.sourceforge.net/docs3/hal/hal__mmc__spi_8c_source.html
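
For readers who want to try a table method without copying a table from elsewhere, here is a minimal C sketch. It is not the ChibiOS code: the table is generated at startup from the same polynomial (x^7 + x^3 + 1), the CRC is kept left-aligned in bits 7..1, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t crc7_table[256];

/* Entry i is the CRC7 of the single byte i, left-aligned (bit 0 is always 0). */
void crc7_table_init(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t r = (uint8_t)i;
        for (int bit = 0; bit < 8; bit++) {
            /* 0x12 is the polynomial 0x09 shifted left by one */
            r = (r & 0x80u) ? (uint8_t)((r << 1) ^ 0x12u) : (uint8_t)(r << 1);
        }
        crc7_table[i] = r;
    }
}

/* One table lookup per byte; returns the CRC left-aligned in bits 7..1. */
uint8_t crc7_table_calc(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc7_table[crc ^ buf[i]];
    return crc;
}

/* Self-check against the well-known CMD0 frame 40 00 00 00 00 95. */
void crc7_table_selfcheck(void)
{
    static const uint8_t cmd0[5] = { 0x40, 0x00, 0x00, 0x00, 0x00 };
    crc7_table_init();
    assert((crc7_table_calc(cmd0, 5) | 1u) == 0x95);
}
```

Because the result is already left-aligned, the byte that goes on the wire is just the return value ORed with 1.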

  • Rayman Posts: 14,867
    edited 2020-06-03 21:01
    This is great. I can now push the whole sector read code into assembly and hopefully get better performance.

    BTW: This CRC code helped speed up the eMMC Spin code a little. But, I'm still seeing a series of single block reads as being much slower than a big multiblock read. Maybe that's just the way it is? I'm not seeing anything in my code that would account for this...
  • Rayman Posts: 14,867
    Looks like I’ll need crc16 to write to eMMC...

    The crcnib will be a huge help there I hope
  • Rayman wrote: »
    Looks like I’ll need crc16 to write to eMMC...

    The crcnib will be a huge help there I hope

    If not, you could use a table driven approach from LUTRAM perhaps. Should still be quite fast.
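
A table-driven CRC16 could look roughly like the C sketch below. It assumes the data-block CRC is the CCITT polynomial x^16 + x^12 + x^5 + 1 ($1021) with a zero initial value, processed MSB first. On the P2 the 256-entry table would sit in LUT RAM; here it is an ordinary array, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t crc16_table[256];

/* Entry i is the CRC16 of the single byte i. */
void crc16_table_init(void)
{
    for (int i = 0; i < 256; i++) {
        uint16_t r = (uint16_t)(i << 8);
        for (int bit = 0; bit < 8; bit++)
            r = (r & 0x8000u) ? (uint16_t)((r << 1) ^ 0x1021u) : (uint16_t)(r << 1);
        crc16_table[i] = r;
    }
}

/* One table lookup per byte, most significant bit first, initial value 0. */
uint16_t crc16_calc(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^ crc16_table[(uint8_t)(crc >> 8) ^ buf[i]]);
    return crc;
}

/* Self-check with the standard "123456789" check string for this CRC variant. */
void crc16_selfcheck(void)
{
    static const uint8_t check[9] = { '1', '2', '3', '4', '5', '6', '7', '8', '9' };
    crc16_table_init();
    assert(crc16_calc(check, 9) == 0x31C3);
}
```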
  • Cluso99 Posts: 18,069
    Rayman wrote: »
    This is great. I can now push the whole sector read code into assembly and hopefully get better performance.

    BTW: This CRC code helped speed up the eMMC Spin code a little. But, I'm still seeing a series of single block reads as being much slower than a big multiblock read. Maybe that's just the way it is? I'm not seeing anything in my code that would account for this...

    @Rayman
    I suggest you look at my SD code I've just released if you want to get it running fast. It works in COG or LUT.
    It should drop in to your code easily :sunglasses:

    For the CRC routine, I believe that if you supply the extra final byte (the one that gets replaced by the CRC) as $80 instead of $00, you can run the CRC routine without the last statement that fiddles the routine to use 7 bits and then applies another fix. I haven't had time to try it yet.
  • evanh Posts: 16,134

    I've optimised Andy's code a little:

    PUB crc7(buf, len) : crc | val
        org    ' Reference code courtesy of Ariba
    cr7lp
            rdword  val, buf            ' fetch two bytes per pass
            add     buf, #2
            movbyts val, #%00_01_10_11  ' byte-reverse so the first byte is in bits 31..24
            setq    val                 ' CRCNIB takes its data from Q, top nibble first
            crcnib  crc, #$48           ' CCITT 7-bit polynomial is x^7 + x^3 + 1
            crcnib  crc, #$48           ' $09 reversed and shifted for CRCNIB
            sub     len, #1  wcz
        if_ne   crcnib  crc, #$48       ' second byte, only if any bytes remain
        if_ne   crcnib  crc, #$48
        if_ne   djnz    len, #cr7lp
    
            rev     crc                 ' correct the bit order to match standard
            shr     crc, #24
            or      crc, #1             ' append the end bit
        end
    

    @Ariba said:
    or with the outer loop also in PASM:

    PUB crc7(buf, len) : crc | val
        org
    cr7lp
            rdbyte  val,buf
            add     buf,#1
            shl     val,#24
            setq    val
            crcnib  crc,#$90>>1
            crcnib  crc,#$90>>1
            djnz    len,#cr7lp
        end
        return (crc rev 6) << 1 | 1
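
To see why the polynomial becomes $90>>1 and why the result needs `rev`, it can help to model the loop off-chip. The C sketch below assumes CRCNIB behaves as four CRCBIT steps: each step takes the top bit of Q, XORs it with bit 0 of the CRC register, shifts the register right, and XORs in the polynomial if the result was 1. That reading of the instruction, and the function names, are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed model of CRCNIB D,S: four CRCBIT steps fed from Q[31:28]. */
static void crcnib_model(uint32_t *d, uint32_t *q, uint32_t poly)
{
    for (int i = 0; i < 4; i++) {
        uint32_t fb = ((*q >> 31) ^ *d) & 1u;  /* data bit XOR register bit 0 */
        *q <<= 1;
        *d >>= 1;
        if (fb)
            *d ^= poly;
    }
}

/* Mirrors the PASM loop: the register runs bit-reversed, so the polynomial is too. */
uint8_t crc7_crcnib_style(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t q = (uint32_t)buf[i] << 24;    /* shl val,#24 then setq val */
        crcnib_model(&crc, &q, 0x90u >> 1);     /* high nibble */
        crcnib_model(&crc, &q, 0x90u >> 1);     /* low nibble */
    }
    /* (crc rev 6) << 1 | 1: reverse bits 0..6, then append the end bit */
    uint32_t rev = 0;
    for (int b = 0; b < 7; b++)
        if (crc & (1u << b))
            rev |= 1u << (6 - b);
    return (uint8_t)((rev << 1) | 1u);
}

/* Self-check against the well-known CMD0 frame 40 00 00 00 00 95. */
void crc7_crcnib_selfcheck(void)
{
    static const uint8_t cmd0[5] = { 0x40, 0x00, 0x00, 0x00, 0x00 };
    assert(crc7_crcnib_style(cmd0, 5) == 0x95);
}
```

The standard polynomial $09 reversed within 7 bits is $48, which is the same value as $90>>1; the final 7-bit reversal undoes the mirrored register.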
    
  • You can't call it optimized if there's an RDxxxx in the hot loop :)

  • evanh Posts: 16,134

    Oi, it's doubled the speed!

  • Also, instead of doing sub len,#1 and then testing the condition code, you can use the handy DJZ instruction to exit the loop early

  • evanh Posts: 16,134
    edited 2024-05-05 21:17

    Actually, if you have an efficient way to block read variables into Flexspin's Fcache I'm all ears.

    EDIT: I guess allocating the space is where I need the help. Rather than the block copy.

  • You can declare an array and block-read into that (only works in Spin source). Or just use RDFAST and friends for something like this (only works in ORG/END mode, not ORGH/END or ASM/ENDASM)

  • evanh Posts: 16,134

    @Wuerfel_21 said:
    You can declare an array and block-read into that (only works in Spin source). Or just use RDFAST and friends for something like this (only works in ORG/END mode, not ORGH/END or ASM/ENDASM)

    I thought all arrays went to hubRAM/Stack. There would need to be a notably small size limit for cogRAM-based ones.

  • evanh Posts: 16,134
    edited 2024-05-05 21:58

    @Wuerfel_21 said:
    Also, instead of doing sub len,#1 and then testing the condition code, you can use the handy DJZ instruction to exit the loop early

    Huh, that now allows use of RDLONG ...

    PUB crc7(buf, len) : crc | val
        org    ' Reference code courtesy of Ariba
    crc7lp
            rdlong  val, buf            ' four bytes per pass; may read up to 3 bytes past the buffer
            add     buf, #4
            movbyts val, #%00_01_10_11  ' byte-reverse so the first byte is in bits 31..24
            setq    val                 ' CRCNIB takes its data from Q, top nibble first
            crcnib  crc, #$48           ' CCITT 7-bit polynomial is x^7 + x^3 + 1
            crcnib  crc, #$48           ' $09 reversed and shifted for CRCNIB
            djz     len, #crc7done      ' leave as soon as the byte count runs out
            crcnib  crc, #$48
            crcnib  crc, #$48
            djz     len, #crc7done
            crcnib  crc, #$48
            crcnib  crc, #$48
            djz     len, #crc7done
            crcnib  crc, #$48
            crcnib  crc, #$48
            djnz    len, #crc7lp
    crc7done
            rev     crc                 ' correct the bit order to match standard
            shr     crc, #24
            or      crc, #1             ' append the end bit
        end
    
  • @evanh said:

    @Wuerfel_21 said:
    You can declare an array and block-read into that (only works in Spin source). Or just use RDFAST and friends for something like this (only works in ORG/END mode, not ORGH/END or ASM/ENDASM)

    I thought all arrays went to hubRAM/Stack. There would need to be a notably small size limit for cogRAM-based ones.

    Yes, something like that. In C I think it always goes on the stack. In Spin it can go to cogRAM, where it can be used for inline ASM block reads. Also, pr0 through pr7 always exist and can be used for the same purpose.

  • @evanh said:

    @Wuerfel_21 said:
    Also, instead of doing sub len,#1 and then testing the condition code, you can use the handy DJZ instruction to exit the loop early

    Huh, that now allows use of RDLONG ...

    PUB crc7(buf, len) : crc | val
        org    ' Reference code courtesy of Ariba
    crc7lp
          rdlong  val, buf            ' four bytes per pass; may read up to 3 bytes past the buffer
          add     buf, #4
          movbyts val, #%00_01_10_11  ' byte-reverse so the first byte is in bits 31..24
          setq    val                 ' CRCNIB takes its data from Q, top nibble first
          crcnib  crc, #$48           ' CCITT 7-bit polynomial is x^7 + x^3 + 1
          crcnib  crc, #$48           ' $09 reversed and shifted for CRCNIB
          djz     len, #crc7done      ' leave as soon as the byte count runs out
          crcnib  crc, #$48
          crcnib  crc, #$48
          djz     len, #crc7done
          crcnib  crc, #$48
          crcnib  crc, #$48
          djz     len, #crc7done
          crcnib  crc, #$48
          crcnib  crc, #$48
          djnz    len, #crc7lp
    crc7done
          rev     crc                 ' correct the bit order to match standard
          shr     crc, #24
          or      crc, #1             ' append the end bit
        end
    

    Assuming code in cog or LUT RAM, no interrupts and no long crossings, all the RDLONGs after the first take 9 cycles, the minimum possible.

  • evanh Posts: 16,134
    edited 2024-05-06 18:39

    Neat. I see how that works now. The other instructions add up to an even 32 ticks, 4 whole hub rotations. And with the RDLONG slice address incrementing each pass, that makes it fit the minimum of 9 ticks to execute.
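
The tick count can be checked with a few lines of C. The cycle figures are assumptions based on the usual P2 timings for code running from cog or LUT RAM (2 clocks for ordinary instructions, 2 for a DJZ/DJNZ that falls through, 4 for one that branches, 9 for a best-case RDLONG, 8 clocks per hub rotation); the names are hypothetical.

```c
#include <assert.h>

/* Assumed P2 timings for code running from cog or LUT RAM. */
enum {
    CYC_SIMPLE       = 2,  /* ADD, MOVBYTS, SETQ, CRCNIB */
    CYC_BRANCH_FALL  = 2,  /* DJZ/DJNZ that does not branch */
    CYC_BRANCH_TAKEN = 4,  /* DJZ/DJNZ that branches */
    CYC_RDLONG_MIN   = 9,  /* best-case RDLONG */
    HUB_SLICES       = 8   /* clocks per hub rotation */
};

/* Clocks spent per full four-byte pass on everything except the RDLONG. */
int crc7_pass_non_hub_cycles(void)
{
    return 3 * CYC_SIMPLE          /* add, movbyts, setq */
         + 8 * CYC_SIMPLE          /* eight crcnib */
         + 3 * CYC_BRANCH_FALL     /* three djz that fall through */
         + 1 * CYC_BRANCH_TAKEN;   /* djnz back to the top */
}

/* Hub slices the cog's window has moved on by after one pass at minimum RDLONG time. */
int crc7_pass_slice_advance(void)
{
    return (crc7_pass_non_hub_cycles() + CYC_RDLONG_MIN) % HUB_SLICES;
}

void crc7_timing_selfcheck(void)
{
    assert(crc7_pass_non_hub_cycles() == 32);
    assert(crc7_pass_slice_advance() == 1);
}
```

Under these assumptions a full pass is 41 clocks, one more than five hub rotations, so the window has advanced by exactly one slice, which is where `add buf, #4` has moved the address.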
