New SD mode P2 accessory board - Page 5 — Parallax Forums

New SD mode P2 accessory board


Comments

  • rogloh Posts: 5,173
    edited 2023-07-07 12:33

    CRCs can be a PITA, but thankfully the P2 can do a good job of them once you figure it all out, set up the poly, and feed the bits or nibbles in the right order.

  • evanh Posts: 15,198

    I'm rather pleased with my efficient command tx code. It easily computes the CRC in the middle of shifting out the command bits, so effectively zero overhead. :)

        _wrpin( PIN_CMD, P_SYNC_TX | P_OE | (PIN_CLK - PIN_CMD & 7)<<24 );    // setup CMD output smartpin mode
        _wxpin( PIN_CMD, 31 );    // 32-bit shifter, continuous mode
        _wypin( PIN_CMD, _rev( (0x40 | cmd)<<24 | arg>>8 ) );    // first 32 bits into tx shifter, continuous mode
        _wypin( PIN_CLK, 6 * 8 );    // begin SD clocks, tx smartpin won't see clocks for about 8 sysclock ticks
        _pinl( PIN_CMD );    // start tx shifter, in continuous mode
    
        __asm {    // SD spec 4.5
            mov pa, cmd
            or  pa, #0x40
            shl pa, #24
            setq    pa
            mov crc, #0
            crcnib  crc, #0x48    // CCITT polynomial is 1 + x^3 + x^7 (0x09 reversed for CRCNIB)
            crcnib  crc, #0x48
            setq    arg
            crcnib  crc, #0x48
            crcnib  crc, #0x48
            crcnib  crc, #0x48
            crcnib  crc, #0x48
            crcnib  crc, #0x48
            crcnib  crc, #0x48
            crcnib  crc, #0x48
            crcnib  crc, #0x48
            rev crc
            shr crc, #24
            or  crc, #1    // stop-bit at end of command packet
        }
        _wypin( PIN_CMD, _rev( arg<<24 | crc<<16 ) );    // final 16 bits into tx buffer
    
        while( !_pinr( PIN_CLK ) );    // wait for tx completion
        _pinf( PIN_CMD );
        _wrpin( PIN_CMD, 0 );    // release CMD pin, ready for response
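
For checking the CRCNIB result away from the P2, a plain bitwise CRC7 is handy. The following is a minimal portable C sketch, not the code from the post above: the names `sd_crc7` and `sd_cmd_crc_byte` are made up for illustration. It assumes the same polynomial (x^7 + x^3 + 1, i.e. 0x09), feeds bits MSB-first, and appends the stop bit the same way the `or crc, #1` step does.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC7 over a byte buffer, MSB first, polynomial x^7 + x^3 + 1 (0x09). */
static uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t n = 0; n < len; n++) {
        uint8_t b = data[n];
        for (int i = 0; i < 8; i++) {
            crc <<= 1;                 /* old CRC bit 6 moves up into bit 7 */
            if ((b ^ crc) & 0x80)      /* incoming data bit XOR the CRC bit shifted out */
                crc ^= 0x09;
            b <<= 1;
        }
    }
    return crc & 0x7f;                 /* only the low 7 bits are the CRC */
}

/* Last byte of a 48-bit command packet: CRC7 in bits 7..1, stop bit in bit 0.
   (Hypothetical helper, named for this example only.) */
static uint8_t sd_cmd_crc_byte(uint8_t cmd, uint32_t arg)
{
    uint8_t pkt[5] = {
        (uint8_t)(0x40 | (cmd & 0x3f)),
        (uint8_t)(arg >> 24), (uint8_t)(arg >> 16),
        (uint8_t)(arg >> 8),  (uint8_t)arg
    };
    return (uint8_t)((sd_crc7(pkt, sizeof pkt) << 1) | 1);
}
```

`sd_cmd_crc_byte(0, 0)` gives 0x95 and `sd_cmd_crc_byte(17, 0)` gives 0x55, which agrees with the CMD0 and CMD17 lines in the logs posted in this thread.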
    
  • evanh Posts: 15,198
    edited 2023-07-07 13:38

    Eventually, I want to make the command-response transition gapless clocking as well. But that will depend on whether it's possible to align the response start-bit at speed.

    At the moment, I test one bit at a time, re-arming the rx shifter for each inter-command-response clock. It might actually be faster to bit-bash the response entirely.

        i = 64;    // max of 64 clocks for response starting timeout, SD spec 4.12.4
        do  {    // look for start-bit, one clock at a time
            _pinf( PIN_CMD );    // reset rx shifter
            _wypin( PIN_CLK, 1 );    // one SD clock
            _pinl( PIN_CMD );
            _waitx( CLK_DIV * 2 + 8 );    // clock smartpin cycle allowance + I/O latency compensation
        } while( !_rdpin( PIN_LED ) && --i );
    
  • I was about to suggest using async RX mode, but the response is too long.

    The real problem I think is that CMD and DAT activity can overlap. So you need to do this sort of waiting for the start bit on both CMD and DAT0 simultaneously. I think, anyways.


    Interesting discovery I just made while looking at the SD spec in Okular (trying to find the TOC before remembering that it doesn't have one...): the watermark is on a separate layer that you can just turn off. Amazing. Does this work on your full version, too?

  • I was concerned about this overlap too although I never saw any overlap in my testing, but in theory based on that diagram I guess it could happen. If you have to cancel a multiple-block read operation where it might overlap at the final block, I expect you can always ignore the final data block on the DAT lines while you are doing the CMD & RSP stuff (so just cancel after you have read enough blocks and ignore any extra block which might be arriving at that time). For writes you are in control of the DAT timing yourself.

  • evanh Posts: 15,198

    @Wuerfel_21 said:
    ... Does this work on your full version, too?

    Huh, no, the option is not there with the pillaged PDF of full spec. The "layers" pane and widget itself disappears.

  • Maybe that's a side effect of running it through some online PDF unlock tool (which isn't needed in Okular because it can bypass PDF DRM itself - quality software)

  • evanh Posts: 15,198
    edited 2023-07-07 23:09

    Looking at how Simon and Nicolas did gapless clocking with the Ethernet sync'ing I see they're adjusting the rx shifter width on the fly in the middle of valid data reception! I'm suitably impressed that even works without data loss. Kudos to Simon for thinking to try it - https://forums.parallax.com/discussion/comment/1537291/#Comment_1537291

  • evanh Posts: 15,198
    edited 2023-07-09 05:05

    @rogloh said:
    I was concerned about this overlap too although I never saw any overlap in my testing, ...

    The block start delay on a single block read with my SanDisk is a surprise. It's something like 460 clocks after the command (CMD17), which is around 400 after the response ended.

    CMD17   51 00000000 55  >  1999   ff fc 44 00 00 24 01 9f ff ff ff ff ff ff ff ff
       offset = 14    >  11 00 00 09 00 67 ff ff ff ff ff ff ff ff ff ff  CRC = 67
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff f0 33 c0 fa 8e d8 8e d0 bc 00 7c 89 e6 06 57 8e c0 fb fc bf 00 06 b9 00 01 f3 a5 ea 1f
     06 00 00 52 52 b4 41 bb aa 55 31 c9 30 f6 f9 cd 13 72 13 81 fb 55 aa 75 0d d1 e9 73 09 66 c7 06
     8d 06 b4 42 eb 15 5a b4 08 cd 13 83 e1 3f 51 0f b6 c6 40 f7 e1 52 50 66 31 c0 66 99 e8 66 00 e8
     35 01 4d 69 73 73 69 6e 67 20 6f 70 65 72 61 74 69 6e 67 20 73 79 73 74 65 6d 2e 0d 0a 66 60 66
     31 d2 bb 00 7c 66 52 66 50 06 53 6a 01 6a 10 89 e6 66 f7 36 f4 7b c0 e4 06 88 e1 88 c5 92 f6 36
     f8 7b 88 c6 08 e1 41 b8 01 02 8a 16 fa 7b cd 13 8d 64 10 66 61 c3 e8 c4 ff be be 7d bf be 07 b9
     20 00 f3 a5 c3 66 60 89 e5 bb be 07 b9 04 00 31 c0 53 51 f6 07 80 74 03 40 89 de 83 c3 10 e2 f3
     48 74 5b 79 39 59 5b 8a 47 04 3c 0f 74 06 24 7f 3c 05 75 22 66 8b 47 08 66 8b 56 14 66 01 d0 66
     21 d2 75 03 66 89 c2 e8 ac ff 72 03 e8 b6 ff 66 8b 46 1c e8 a0 ff 83 c3 10 e2 cc 66 61 c3 e8 76
     00 4d 75 6c 74 69 70 6c 65 20 61 63 74 69 76 65 20 70 61 72 74 69 74 69 6f 6e 73 2e 0d 0a 66 8b
     44 08 66 03 46 1c 66 89 44 08 e8 30 ff 72 27 66 81 3e 00 7c 58 46 53 42 75 09 66 83 c0 04 e8 1c
     ff 72 13 81 3e fe 7d 55 aa 0f 85 f2 fe bc fa 7b 5a 5f 07 fa ff e4 e8 1e 00 4f 70 65 72 61 74 69
     6e 67 20 73 79 73 74 65 6d 20 6c 6f 61 64 20 65 72 72 6f 72 2e 0d 0a 5e ac b4 0e 8a 3e 62 04 b3
     07 cd 10 3c 0a 75 f1 cd 18 f4 eb fd 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1 ee 0f 6a
     00 00 80 04 01 04 0b fe c2 ff 00 08 00 00 00 18 b7 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 55 aa 7d 49 f8 37 e1 3e 1e 4e ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    
  • Yeah in reality the cards would seem to need some time to process their requests. Not sure if there is a lower bound that you can always rely on though. Be good if there was. Maybe over time this delay would shrink as the cards become faster to respond, so they didn't want to nominate some minimum value for it.

  • rogloh Posts: 5,173
    edited 2023-07-09 05:56

    Perhaps if you monitored for a start bit in the DAT lines while waiting for the CMD response you could in theory temporarily slow the clock down in the case there was some overlap, to be able to do things to keep up with the incoming data before the response fully arrived? I found you can slow/pause the clock in the middle of these transfers and it seems to work.

    Also once you see the start bit you could setup the streamer for the expected DAT transfer size for reads and enable a fast clock output. For writes you are in full control and can wait for the CMD response to complete before you begin sending any data. No overlap needed there.

  • evanh Posts: 15,198

    Hmm, been mulling those ideas too. The above dump is from both a smartpin for command response and streamer for DAT pins sampling. So they are both collecting data concurrently with clock going at steady speed. Hence the post processing to find the response bits and realign them for running the CRC.

    I haven't done the same post processing of the data block as yet and aren't really contemplating it either. The code for post processing the response bits is a proper horror!

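        // find the start bit: first 0 bit, scanning MSB-first through the raw bytes
        for( i = 0; i < 8*8; i++ )  {
            if( !(resp[i>>3] & (1<<(7-(i&7)))) )
                break;
        }
        offset = i;
    
        // move the response down by 'offset' bits, one bit at a time, in place
        for( i = 0; i < 8*8; i++ )  {
            bit = (resp[(i+offset)>>3]>>(7-((i+offset)&7))) & 1;    // isolate one source bit
            resp[i>>3] = (resp[i>>3] & ~(1<<(7-(i&7)))) | (bit<<(7-(i&7)));    // replace only bit i
        }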
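
The same find-and-realign step can be tried on a PC against the raw capture. This is a hedged, portable C sketch rather than the code used on the P2: `find_start_bit` and `realign_bits` are illustrative names, and it writes to a separate output buffer instead of working in place, which sidesteps any read-after-write hazard when the offset is small.

```c
#include <stdint.h>
#include <stddef.h>

/* Index of the first 0 bit (the start bit), scanning MSB-first.
   Returns nbits if no 0 bit is found. */
static size_t find_start_bit(const uint8_t *raw, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++)
        if (!((raw[i >> 3] >> (7 - (i & 7))) & 1))
            return i;
    return nbits;
}

/* Copy out_bits bits from raw, starting at bit 'offset', into out (MSB-first).
   The caller must make sure raw holds at least offset + out_bits bits. */
static void realign_bits(const uint8_t *raw, size_t offset,
                         uint8_t *out, size_t out_bits)
{
    for (size_t i = 0; i < out_bits; i++) {
        size_t src = i + offset;
        uint8_t bit = (raw[src >> 3] >> (7 - (src & 7))) & 1;
        uint8_t mask = (uint8_t)(1u << (7 - (i & 7)));
        out[i >> 3] = (uint8_t)((out[i >> 3] & ~mask) | (bit ? mask : 0));
    }
}
```

Fed the first captured bytes from the CMD17 dump earlier in the thread (`ff fc 44 00 00 24 01 9f`), `find_start_bit` returns 14 and the realigned 48 bits come out as `11 00 00 09 00 67`, the same offset and response shown in that dump.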
    
  • rogloh Posts: 5,173
    edited 2023-07-09 06:39

    Nah, post processing is not ideal. Best to have the data already aligned when read in. That is why the clock should be stopped after the start bit to set things up.

  • evanh Posts: 15,198
    edited 2023-07-09 09:05

    Now that I've verified good data block retrieval and have a feel for nominal behaviour, I'll have a shot at dynamically adjusting the shifter width on the fly similar to the Ethernet driver code. If that is successful then I'll go ahead with full gapless clocking.

    EDIT: Grr, already having second thoughts. It'll probably still need post realigned. The Ethernet driver had a fixed length preamble to work with that isn't present here.

    EDIT2: Yeah, without any preamble to throw away, the first bits will always need aligned anyway. The code to do that may as well be applied to the whole response in real time - Which I've now done.

    Loop time is 26 sysclocks (plus extra for the hub write, so 27..34 ticks) to process 16 response bits. Actually, it can do 32 bits in the same time but that drags out more waiting for the serial shifter to fill up. Oops, it's now hubexec, so add another 12 or so ticks for the FIFO refill. It's good for sysclock/4.

  • evanh Posts: 15,198
    edited 2023-07-09 12:04

    Ouch! Forgot about the over-reached clock smartpin inputs. That's not an issue when using the streamer. Almost enough incentive to do post-processing block copying right there.

    The remapping of smartpins needs careful documenting to keep it straight. Not to mention the extra instructions ... and CMD has to be repurposed in mid flight ... and it doesn't work for tx either, so streamer only for DAT tx pins. All very messy.

    EDIT: And block copying approach can happen concurrently with next streamer action too. So not necessarily slowing things down.

  • evanh Posts: 15,198
    edited 2023-07-10 11:01

    Hmm, the real-time align and word merging code for the rx block data is too big to execute within sysclock/4. It's four times everything, so that's basically doubling the time required over what 32-bit shifter word size provides. Sysclock/8 should be doable with smartpins. Here's just the word merger and writing to hubRAM:

            rolbyte pb, pr3, #3
            rolbyte pb, pr2, #3
            rolbyte pb, pr1, #3
            rolbyte pb, pr0, #3
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
            rolbyte pb, pr3, #2
            rolbyte pb, pr2, #2
            rolbyte pb, pr1, #2
            rolbyte pb, pr0, #2
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
            rolbyte pb, pr3, #1
            rolbyte pb, pr2, #1
            rolbyte pb, pr1, #1
            rolbyte pb, pr0, #1
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
            rolbyte pb, pr3, #0
            rolbyte pb, pr2, #0
            rolbyte pb, pr1, #0
            rolbyte pb, pr0, #0
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
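
For anyone following along without a P2, here is a rough C model of what one ROLBYTE x4 + MERGEB + MOVBYTS + WRLONG group appears to do, as I read the instruction descriptions. It is only a sketch: `merge_lanes` and `store_be32` are hypothetical names, and it assumes each lane byte holds eight consecutive samples with the earliest sample in bit 7.

```c
#include <stdint.h>

/* Interleave one byte from each of the four DAT lanes into a 32-bit word.
   Each output nibble is {DAT3, DAT2, DAT1, DAT0} for one SD clock,
   with the earliest clock in the top nibble. */
static uint32_t merge_lanes(uint8_t dat3, uint8_t dat2, uint8_t dat1, uint8_t dat0)
{
    uint32_t out = 0;
    for (int bit = 7; bit >= 0; bit--) {
        uint32_t nib = (uint32_t)((((dat3 >> bit) & 1) << 3) |
                                  (((dat2 >> bit) & 1) << 2) |
                                  (((dat1 >> bit) & 1) << 1) |
                                   ((dat0 >> bit) & 1));
        out = (out << 4) | nib;
    }
    return out;
}

/* Store the merged word so the earliest byte lands at the lowest address,
   which is the job the byte swap does ahead of the little-endian hub write. */
static void store_be32(uint8_t *dst, uint32_t w)
{
    dst[0] = (uint8_t)(w >> 24);
    dst[1] = (uint8_t)(w >> 16);
    dst[2] = (uint8_t)(w >> 8);
    dst[3] = (uint8_t)w;
}
```

Eight clocks of 4-bit data become four bytes per call, so a 32-bit word per lane needs four calls, matching the four repeated groups in the PASM.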
    
  • evanh Posts: 15,198

    haha, sexy!

    CMD17   51 00000000 55  >  11 00 00 09 00 67  CRC = 67
     datablock:  shl=29  shr=3
     33 c0 fa 8e d8 8e d0 bc 00 7c 89 e6 06 57 8e c0 fb fc bf 00 06 b9 00 01 f3 a5 ea 1f 06 00 00 52
     52 b4 41 bb aa 55 31 c9 30 f6 f9 cd 13 72 13 81 fb 55 aa 75 0d d1 e9 73 09 66 c7 06 8d 06 b4 42
     eb 15 5a b4 08 cd 13 83 e1 3f 51 0f b6 c6 40 f7 e1 52 50 66 31 c0 66 99 e8 66 00 e8 35 01 4d 69
     73 73 69 6e 67 20 6f 70 65 72 61 74 69 6e 67 20 73 79 73 74 65 6d 2e 0d 0a 66 60 66 31 d2 bb 00
     7c 66 52 66 50 06 53 6a 01 6a 10 89 e6 66 f7 36 f4 7b c0 e4 06 88 e1 88 c5 92 f6 36 f8 7b 88 c6
     08 e1 41 b8 01 02 8a 16 fa 7b cd 13 8d 64 10 66 61 c3 e8 c4 ff be be 7d bf be 07 b9 20 00 f3 a5
     c3 66 60 89 e5 bb be 07 b9 04 00 31 c0 53 51 f6 07 80 74 03 40 89 de 83 c3 10 e2 f3 48 74 5b 79
     39 59 5b 8a 47 04 3c 0f 74 06 24 7f 3c 05 75 22 66 8b 47 08 66 8b 56 14 66 01 d0 66 21 d2 75 03
     66 89 c2 e8 ac ff 72 03 e8 b6 ff 66 8b 46 1c e8 a0 ff 83 c3 10 e2 cc 66 61 c3 e8 76 00 4d 75 6c
     74 69 70 6c 65 20 61 63 74 69 76 65 20 70 61 72 74 69 74 69 6f 6e 73 2e 0d 0a 66 8b 44 08 66 03
     46 1c 66 89 44 08 e8 30 ff 72 27 66 81 3e 00 7c 58 46 53 42 75 09 66 83 c0 04 e8 1c ff 72 13 81
     3e fe 7d 55 aa 0f 85 f2 fe bc fa 7b 5a 5f 07 fa ff e4 e8 1e 00 4f 70 65 72 61 74 69 6e 67 20 73
     79 73 74 65 6d 20 6c 6f 61 64 20 65 72 72 6f 72 2e 0d 0a 5e ac b4 0e 8a 3e 62 04 b3 07 cd 10 3c
     0a 75 f1 cd 18 f4 eb fd 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1 ee 0f 6a 00 00 80 04
     01 04 0b fe c2 ff 00 08 00 00 00 18 b7 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa
    
  • evanh Posts: 15,198

    Here's the post-response assembly:

    // DATA BLOCK RECEIVE
            dirl    #PIN_CLK
            dirl    #PIN_CMD
            wrpin   ##P_SYNC_RX | (PIN_DAT2 - PIN_CMD & 7)<<28 | (PIN_CLK - PIN_CMD & 7)<<24, #PIN_CMD  // DAT2 pin
            wxpin   #31, #PIN_CMD    // DAT2 pin
            dirh    #PIN_DAT2 | 4<<6    // enable shifters for DAT0..DAT3 pins, and CLK pin
            wypin   ##2000, #PIN_CLK    // kick it in the guts
    .loop3
            waitse1    // trigger on PIN_CMD (DAT2 pin)
            rdpin   pr0, #PIN_DAT2    // DAT0 pin
            rdpin   pr1, #PIN_DAT3    // DAT1 pin
            rdpin   pr2, #PIN_CMD     // DAT2 pin
            rdpin   pr3, #PIN_LED     // DAT3 pin
            rev pr2
            not shiftr, pr2   wz
        if_z    jmp #.loop3
    
            encod   shiftr
            mov shiftl, shiftr
            subr    shiftl, #32
    
            rev pr0
            rev pr1
            rev pr3
            mov i, #32
            mov ptrb, buf
    .loop4
            waitse1
            rdpin   pr4, #PIN_DAT2    // DAT0 pin
            rdpin   pr5, #PIN_DAT3    // DAT1 pin
            rdpin   pr6, #PIN_CMD     // DAT2 pin
            rdpin   pr7, #PIN_LED     // DAT3 pin
            rev pr4
            rev pr5
            rev pr6
            rev pr7
    
            mov pb, pr4
            shr pb, shiftr
            shl pr0, shiftl
            or  pr0, pb
            mov pb, pr5
            shr pb, shiftr
            shl pr1, shiftl
            or  pr1, pb
            mov pb, pr6
            shr pb, shiftr
            shl pr2, shiftl
            or  pr2, pb
            mov pb, pr7
            shr pb, shiftr
            shl pr3, shiftl
            or  pr3, pb
    
            rolbyte pb, pr3, #3
            rolbyte pb, pr2, #3
            rolbyte pb, pr1, #3
            rolbyte pb, pr0, #3
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
            rolbyte pb, pr3, #2
            rolbyte pb, pr2, #2
            rolbyte pb, pr1, #2
            rolbyte pb, pr0, #2
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
            rolbyte pb, pr3, #1
            rolbyte pb, pr2, #1
            rolbyte pb, pr1, #1
            rolbyte pb, pr0, #1
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
            rolbyte pb, pr3, #0
            rolbyte pb, pr2, #0
            rolbyte pb, pr1, #0
            rolbyte pb, pr0, #0
            mergeb  pb
            movbyts pb, #0b00_01_10_11
            wrlong  pb, ptrb++
            mov pr0, pr4
            mov pr1, pr5
            mov pr2, pr6
            mov pr3, pr7
            djnz    i, #.loop4
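
The shift pair in that loop is the classic "join the tail of one word to the head of the next" trick. Below is a small C sketch of just that step, under the assumption that each word has its earliest-received bit in the MSB (which is what the REVs arrange). `start_bit_pos` and `align_word` are names invented for this example.

```c
#include <stdint.h>

/* Bit position (31..0) of the start bit: the highest 0 bit in a word whose
   earliest-received bit is the MSB. Returns -1 if the word is all ones. */
static int start_bit_pos(uint32_t w)
{
    for (int pos = 31; pos >= 0; pos--)
        if (!((w >> pos) & 1))
            return pos;
    return -1;
}

/* Join the data bits below the start bit in 'prev' with the top of 'next'.
   shr is the start-bit position (0..31) and the left shift is 32 - shr,
   the same relationship as the "shl=29 shr=3" pair printed in the log. */
static uint32_t align_word(uint32_t prev, uint32_t next, int shr)
{
    if (shr == 0)          /* start bit was the last bit of prev: next is already aligned */
        return next;
    return (prev << (32 - shr)) | (next >> shr);
}
```

In C a shift by 32 is undefined, so the `shr == 0` case is handled separately here. It may be worth checking how the PASM version behaves in that same corner case.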
    
  • evanh Posts: 15,198

    Hmm, darn, not looking hopeful for even sysclock/8 without changing to cogexec. :(

  • rogloh Posts: 5,173
    edited 2023-07-10 11:54

    Why do all this shifting stuff? Just setup the transfer after the start bit and it will be aligned. Sysclk/8 is only 40M nibbles per second or 20MB/s at 320MHz. I was able to get it clocking much faster than that (sysclk/2). You just give up the gapless CMD idea and it'll be fine. For performance you mainly want to optimize DAT reads, not so much the CMD+RESP timing stuff.
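
The throughput figures quoted here are easy to reproduce. A tiny sketch, assuming a 4-bit bus that moves one nibble per SD clock and ignoring start bits, CRC and inter-block gaps (`sd4_bytes_per_sec` is just an illustrative name):

```c
#include <stdint.h>

/* Raw bytes per second on a 4-bit SD bus: one nibble per SD clock,
   two nibbles per byte. Protocol overhead is not included. */
static uint32_t sd4_bytes_per_sec(uint32_t sysclk_hz, uint32_t clk_div)
{
    return sysclk_hz / clk_div / 2;
}
```

At a 320 MHz sysclock this gives 20 MB/s for sysclk/8, 40 MB/s for sysclk/4 and 80 MB/s for sysclk/2.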

  • evanh Posts: 15,198
    edited 2023-07-10 12:09

    Because the clock is going at speed. I've long ditched the bit-bashing start-bit search.
    There's definitely gains to be made by not doing a long pulse by pulse search to find the start of the block. There's 400 odd clocks to churn through checking on each block.

  • evanh Posts: 15,198
    edited 2023-07-10 12:20

    BTW: I was hoping for better than sysclock/4 originally. Therefore, agreed, the above approach is rather dead in the water now. The word merge kills it before even getting to the shifts.

  • rogloh Posts: 5,173
    edited 2023-07-10 12:25

    Yeah, I reckon the time saved by transferring a 512-byte sector at sysclk/2 will swamp the gains from not having to bit-bang the clock to search for the start bit and then transferring at sysclk/8. This is even more true once multiple blocks are being read. Although admittedly if you want to strictly meet the 50MB/s max speed for normal SD cards, then the gain is lessened slightly.

    For polling the start bit with a bit bang clock before you start the streamer and speed up the clock I'd expect you should be able to do it pretty fast in PASM (guessing a couple of microseconds or so per request).

  • evanh Posts: 15,198

    Next approach is streamer + block copying.

    The streamer hardware takes care of ordering so only the shifts need sorted ... and since the streamer sampling has to be tuned then it can also account for nibble alignment in the one parameter. Meaning it will be possible to tune the block copying so only an edge check has to be done to verify which byte to start at with no actual shifts required.

  • evanh Posts: 15,198
    edited 2023-07-10 12:45

    All this coding is partly the science, the learning, and partly an "I've been there" experience. To see the required code to get it working each way. More than some guessing.

    And on that note, time to post what I've got:

  • evanh Posts: 15,198

    Okay, change of tack, this is too good to leave alone ... My main aversion to the bit-bashed search of the start bit was the slow rate of pulse then examine - Waiting for all the latencies to clear, one cycle at a time. Well, I remembered the discussion about whether the PWM_SMPS smartpin mode could be used for a fast response trigger of some sort .... I thought, maybe, I could turn it into a clock gen that can abruptly shut off when an input goes low (The start bit arriving) .... ta-da!

        _pinstart( PIN_CLK, P_PWM_SMPS | P_OE | P_INVERT_OUTPUT |    // SD clock gen smartpin
                    (PIN_CMD - PIN_CLK & 7)<<28 | P_INVERT_A |    // smartA input select
                    (PIN_CMD - PIN_CLK & 7)<<24 | P_INVERT_B,    // smartB input select
                    1 | 8<<16, 4 );
    
        while( _pinr( PIN_CMD ) );    // wait for CMD low - found the start-bit
    
        _pinstart( PIN_CLK, P_PULSE | P_OE | P_INVERT_OUTPUT,    // reconfig back to regular clock gen
                    CLK_DIV | (CLK_DIV / 2)<<16, 0 );
    
  • Well that's a funny way to do it :)

    But does the latency period really care about the clock speed while waiting for the start bit? I'd think this is limited by the internal speed of the SD controller, but I never verified that assumption.

  • evanh Posts: 15,198

    Here it is working with the command response. I'll apply it to the block read code tomorrow.

  • rogloh Posts: 5,173
    edited 2023-07-12 07:56

    @evanh said:
    Okay, change of tack, this is too good to leave alone ... My main aversion to the bit-bashed search of the start bit was the slow rate of pulse then examine - Waiting for all the latencies to clear, one cycle at a time. Well, I remembered the discussion about whether the PWM_SMPS smartpin mode could be used for a fast response trigger of some sort .... I thought, maybe, I could turn it into a clock gen that can abruptly shut off when an input goes low (The start bit arriving) .... ta-da!

        _pinstart( PIN_CLK, P_PWM_SMPS | P_OE | P_INVERT_OUTPUT |    // SD clock gen smartpin
                    (PIN_CMD - PIN_CLK & 7)<<28 | P_INVERT_A |    // smartA input select
                    (PIN_CMD - PIN_CLK & 7)<<24 | P_INVERT_B,    // smartB input select
                    1 | 8<<16, 4 );
    
        while( _pinr( PIN_CMD ) );    // wait for CMD low - found the start-bit
    
        _pinstart( PIN_CLK, P_PULSE | P_OE | P_INVERT_OUTPUT,    // reconfig back to regular clock gen
                    CLK_DIV | (CLK_DIV / 2)<<16, 0 );
    

    This looks handy. If it works I can see it could potentially speed things up a fair bit and should let you clock faster while waiting for a start bit. Should then be no excuse to not work with fully aligned data after that! :smile:

    EDIT: looks like you have it working in latest post. Cool. I'll have to give it a try here at some point soon.

  • Just ran your test code with my board @evanh. Looks okay so far but the first data sector is still offset. I guess that is next for you to sort out.
    This was the output I received from a 16G SanDisk card:

    loadp2 -b 230400 -t sdmode_framing.binary 
    ( Entering terminal mode.  Press Ctrl-] or Ctrl-Z to exit. )
       clkfreq = 4000000   clkmode = 0x10005eb
     Card detected ... 40 ms power cycle of SD card ... 
     DONE!
    CMD0   40 00000000 95  >  TIMED-OUT!
    CMD8   48 0000015a 9b  >  08 00 00 01 5a 0f  CRC = 0f
      R7 0000015a - v2.0+ SD Card
    CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
    CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
    CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
    CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
    CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
    CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
    CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
    CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
    CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
    CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
    CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
    CMD41   69 40100000 cd  >  3f c0 ff 80 00 ff  CRC = 63
      OCR c0ff8000
    CMD2   42 00000000 4d  >  3f 03 53 44 53 4c 31 36 47 00 20 b9 34 bb 01 16 f7  CRC = f7
    CMD3   43 00000000 21  >  03 aa aa 05 20 d1  CRC = d1
      RCA aaaa0000
    CMD7   47 aaaa0000 cd  >  07 00 00 07 00 75  CRC = 75
    CMD55   77 aaaa0000 2b  >  37 00 00 09 20 33  CRC = 33
    CMD6   46 00000002 cb  >  06 00 00 09 20 b9  CRC = b9
     Card init complete
    Test RDPIN  9d049000
    CMD17   51 00000000 55  >  11 00 00 09 00 67   CRC = 67
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff f0 fa b8 00 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e
     c0 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07
     75 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 01 8b 4c 02 cd 13 ea 00 7c 00 00 eb fe 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      CRC = 01
    All finished  :)
    