New SD mode P2 accessory board

rogloh · 2023-07-07 12:32

CRCs can be a PITA, but thankfully the P2 can do a good job of them once you figure it all out, set up the poly, and feed the bits or nibbles in the right order.

evanh · 2023-07-07 13:23

I'm rather pleased with my efficient command tx code. It easily computes the CRC in the middle of shifting out the command bits, so effectively zero overhead.

    _wrpin( PIN_CMD, P_SYNC_TX | P_OE | (PIN_CLK - PIN_CMD & 7)<<24 );    // setup CMD output smartpin mode
    _wxpin( PIN_CMD, 31 );    // 32-bit shifter, continuous mode
    _wypin( PIN_CMD, _rev( (0x40 | cmd)<<24 | arg>>8 ) );    // first 32 bits into tx shifter, continuous mode
    _wypin( PIN_CLK, 6 * 8 );    // begin SD clocks, tx smartpin won't see clocks for about 8 sysclock ticks
    _pinl( PIN_CMD );    // start tx shifter, in continuous mode

    __asm {    // SD spec 4.5
        mov pa, cmd
        or  pa, #0x40
        shl pa, #24
        setq    pa
        mov crc, #0
        crcnib  crc, #0x48    // CCITT polynomial is 1 + x^3 + x^7 (0x09 reversed for CRCNIB)
        crcnib  crc, #0x48
        setq    arg
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        rev crc
        shr crc, #24
        or  crc, #1    // stop-bit at end of command packet
    }
    _wypin( PIN_CMD, _rev( arg<<24 | crc<<16 ) );    // final 16 bits into tx buffer

    while( !_pinr( PIN_CLK ) );    // wait for tx completion
    _pinf( PIN_CMD );
    _wrpin( PIN_CMD, 0 );    // release CMD pin, ready for response
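
For cross-checking the CRCNIB result on a PC, the same CRC-7 can be computed one bit at a time in plain C. This is only a minimal host-side sketch, not the P2 code above: the names `sd_crc7` and `sd_cmd_last_byte` are made up for illustration, and it assumes the usual SD command framing of 0x40 | cmd followed by the 32-bit argument, MSB first.

```c
#include <stdint.h>

/* CRC-7 with polynomial x^7 + x^3 + 1 (0x09), initial value 0, MSB first. */
static uint8_t sd_crc7(const uint8_t *data, int len)
{
    uint8_t crc = 0;                      /* 7-bit register in bits 6..0 */
    for (int i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            uint8_t in = (data[i] >> b) & 1;
            uint8_t fb = ((crc >> 6) & 1) ^ in;   /* feedback bit */
            crc = (uint8_t)((crc << 1) & 0x7f);
            if (fb)
                crc ^= 0x09;
        }
    }
    return crc;
}

/* Last byte of a 48-bit command packet: CRC-7 in bits 7..1, stop-bit in bit 0. */
static uint8_t sd_cmd_last_byte(uint8_t cmd, uint32_t arg)
{
    uint8_t pkt[5] = {
        (uint8_t)(0x40 | cmd),
        (uint8_t)(arg >> 24), (uint8_t)(arg >> 16),
        (uint8_t)(arg >> 8),  (uint8_t)arg
    };
    return (uint8_t)((sd_crc7(pkt, 5) << 1) | 1);
}
```

With this, `sd_cmd_last_byte(0, 0)` gives 0x95 and `sd_cmd_last_byte(17, 0)` gives 0x55, which match the CMD0 and CMD17 packets logged in this thread.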

evanh · 2023-07-07 13:25

Eventually, I want to make the command-response transition gapless clocking as well. But that will depend on whether it's possible to align the response start-bit at speed.

At the moment, I test one bit at a time, re-arming the rx shifter for each inter-command-response clock. It might actually be faster to bit-bash the response entirely.

    i = 64;    // max of 64 clocks for response starting timeout, SD spec 4.12.4
    do  {    // look for start-bit, one clock at a time
        _pinf( PIN_CMD );    // reset rx shifter
        _wypin( PIN_CLK, 1 );    // one SD clock
        _pinl( PIN_CMD );
        _waitx( CLK_DIV * 2 + 8 );    // clock smartpin cycle allowance + I/O latency compensation
    } while( !_rdpin( PIN_LED ) && --i );

Wuerfel_21 · 2023-07-07 14:09

I was about to suggest using async RX mode, but the response is too long.

The real problem I think is that CMD and DAT activity can overlap. So you need to do this sort of waiting for the start bit on both CMD and DAT0 simultaneously. I think, anyways.

Interesting discovery I just made while looking at the SD spec in Okular (trying to find the TOC before remembering that it doesn't have one...): the watermark is on a separate layer that you can just turn off. Amazing. Does this work on your full version, too?

rogloh · 2023-07-07 14:50

I was concerned about this overlap too although I never saw any overlap in my testing, but in theory, based on that diagram, I guess it could happen. If you have to cancel a read multiple operation where it might overlap at the final block, I expect that you can always ignore the final data block on the DAT lines while you are doing the CMD & RSP stuff (so just cancel after you have read enough blocks and ignore any extra block which might be arriving at that time). For writes you are in control of the DAT timing yourself.

evanh · 2023-07-07 21:16

@Wuerfel_21 said:
... Does this work on your full version, too?

Huh, no, the option is not there with the pillaged PDF of full spec. The "layers" pane and widget itself disappears.

Wuerfel_21 · 2023-07-07 21:21

Maybe that's a side effect of running it through some online PDF unlock tool (which isn't needed in Okular because it can bypass PDF DRM itself - quality software)

evanh · 2023-07-07 23:06

Looking at how Simon and Nicolas did gapless clocking with the Ethernet sync'ing, I see they're adjusting the rx shifter width on the fly in the middle of valid data reception! I'm suitably impressed that even works without data loss. Kudos to Simon for thinking to try it - https://forums.parallax.com/discussion/comment/1537291/#Comment_1537291

evanh · 2023-07-09 03:46

@rogloh said:
I was concerned about this overlap too although I never saw any overlap in my testing, ...

The block start delay on a single block read with my SanDisk is a surprise. It's something like 460 clocks after the command (CMD17), which is around 400 after the response ended.

CMD17   51 00000000 55  >  1999   ff fc 44 00 00 24 01 9f ff ff ff ff ff ff ff ff
   offset = 14    >  11 00 00 09 00 67 ff ff ff ff ff ff ff ff ff ff  CRC = 67
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff f0 33 c0 fa 8e d8 8e d0 bc 00 7c 89 e6 06 57 8e c0 fb fc bf 00 06 b9 00 01 f3 a5 ea 1f
 06 00 00 52 52 b4 41 bb aa 55 31 c9 30 f6 f9 cd 13 72 13 81 fb 55 aa 75 0d d1 e9 73 09 66 c7 06
 8d 06 b4 42 eb 15 5a b4 08 cd 13 83 e1 3f 51 0f b6 c6 40 f7 e1 52 50 66 31 c0 66 99 e8 66 00 e8
 35 01 4d 69 73 73 69 6e 67 20 6f 70 65 72 61 74 69 6e 67 20 73 79 73 74 65 6d 2e 0d 0a 66 60 66
 31 d2 bb 00 7c 66 52 66 50 06 53 6a 01 6a 10 89 e6 66 f7 36 f4 7b c0 e4 06 88 e1 88 c5 92 f6 36
 f8 7b 88 c6 08 e1 41 b8 01 02 8a 16 fa 7b cd 13 8d 64 10 66 61 c3 e8 c4 ff be be 7d bf be 07 b9
 20 00 f3 a5 c3 66 60 89 e5 bb be 07 b9 04 00 31 c0 53 51 f6 07 80 74 03 40 89 de 83 c3 10 e2 f3
 48 74 5b 79 39 59 5b 8a 47 04 3c 0f 74 06 24 7f 3c 05 75 22 66 8b 47 08 66 8b 56 14 66 01 d0 66
 21 d2 75 03 66 89 c2 e8 ac ff 72 03 e8 b6 ff 66 8b 46 1c e8 a0 ff 83 c3 10 e2 cc 66 61 c3 e8 76
 00 4d 75 6c 74 69 70 6c 65 20 61 63 74 69 76 65 20 70 61 72 74 69 74 69 6f 6e 73 2e 0d 0a 66 8b
 44 08 66 03 46 1c 66 89 44 08 e8 30 ff 72 27 66 81 3e 00 7c 58 46 53 42 75 09 66 83 c0 04 e8 1c
 ff 72 13 81 3e fe 7d 55 aa 0f 85 f2 fe bc fa 7b 5a 5f 07 fa ff e4 e8 1e 00 4f 70 65 72 61 74 69
 6e 67 20 73 79 73 74 65 6d 20 6c 6f 61 64 20 65 72 72 6f 72 2e 0d 0a 5e ac b4 0e 8a 3e 62 04 b3
 07 cd 10 3c 0a 75 f1 cd 18 f4 eb fd 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1 ee 0f 6a
 00 00 80 04 01 04 0b fe c2 ff 00 08 00 00 00 18 b7 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 55 aa 7d 49 f8 37 e1 3e 1e 4e ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

rogloh · 2023-07-09 05:49

Yeah in reality the cards would seem to need some time to process their requests. Not sure if there is a lower bound that you can always rely on though. Be good if there was. Maybe over time this delay would shrink as the cards become faster to respond, so they didn't want to nominate some minimum value for it.

rogloh · 2023-07-09 05:51

Perhaps if you monitored for a start bit in the DAT lines while waiting for the CMD response you could in theory temporarily slow the clock down in the case there was some overlap, to be able to do things to keep up with the incoming data before the response fully arrived? I found you can slow/pause the clock in the middle of these transfers and it seems to work.

Also once you see the start bit you could set up the streamer for the expected DAT transfer size for reads and enable a fast clock output. For writes you are in full control and can wait for the CMD response to complete before you begin sending any data. No overlap needed there.

evanh · 2023-07-09 06:11

Hmm, been mulling those ideas too. The above dump is from both a smartpin for command response and streamer for DAT pins sampling. So they are both collecting data concurrently with clock going at steady speed. Hence the post processing to find the response bits and realign them for running the CRC.

I haven't done the same post processing of the data block as yet and am not really contemplating it either. The code for post processing the response bits is a proper horror!

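    // find the start-bit: first 0 bit in the raw capture, MSB first
    for( i = 0; i < 8*8; i++ )  {
        if( !(resp[i>>3] & (1<<(7-i&7))) )
            break;
    }
    offset = i;

    // move the response down by offset bits, in place, so the start-bit lands in bit 7 of resp[0]
    for( i = 0; i < 8*8; i++ )  {
        bit = resp[(i+offset)>>3]>>(7-(i+offset)&7);    // wanted bit in bit 0, earlier bits above it
        resp[i>>3] = resp[i>>3] & ~1<<(7-i&7) | bit<<(7-i&7);    // ~1<<n clears bit n and everything below it
    }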
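
The same idea reads more easily with single-bit helpers and a separate output buffer, so that the source is never overwritten while it is still being read. The following is a minimal sketch under those assumptions; `get_bit`, `find_start_bit` and `realign_bits` are hypothetical names, not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one bit from an MSB-first bit stream. */
static int get_bit(const uint8_t *buf, size_t bitpos)
{
    return (buf[bitpos >> 3] >> (7 - (bitpos & 7))) & 1;
}

/* Index of the first 0 bit (the start-bit) within max_bits, or -1 if none. */
static int find_start_bit(const uint8_t *raw, size_t max_bits)
{
    for (size_t i = 0; i < max_bits; i++)
        if (!get_bit(raw, i))
            return (int)i;
    return -1;
}

/*
 * Copy nbits from raw, beginning at bit offset, into out so that the
 * start-bit lands in bit 7 of out[0]. out must not alias raw, and raw
 * must hold at least offset + nbits bits.
 */
static void realign_bits(const uint8_t *raw, size_t offset,
                         uint8_t *out, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        uint8_t mask = (uint8_t)(0x80u >> (i & 7));
        if (get_bit(raw, offset + i))
            out[i >> 3] |= mask;
        else
            out[i >> 3] &= (uint8_t)~mask;
    }
}
```

Run on the raw CMD17 capture shown earlier (ff fc 44 00 00 24 01 9f ...), `find_start_bit` returns 14 and `realign_bits` over 48 bits yields 11 00 00 09 00 67.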

rogloh · 2023-07-09 06:38

Nah, post processing is not ideal. Best to have the data already aligned when read in. That is why the clock should be stopped after the start bit to set things up.

evanh · 2023-07-09 06:49

Now that I've verified good data block retrieval and have a feel for nominal behaviour, I'll have a shot at dynamically adjusting the shifter width on the fly similar to the Ethernet driver code. If that is successful then I'll go ahead with full gapless clocking.

EDIT: Grr, already having second thoughts. It'll probably still need realigning afterwards. The Ethernet driver had a fixed length preamble to work with that isn't present here.

EDIT2: Yeah, without any preamble to throw away, the first bits will always need aligned anyway. The code to do that may as well be applied to the whole response in real time - Which I've now done.

Loop time is 26 sysclocks (plus extra for the hub write, so 27..34 ticks) to process 16 response bits. Actually, it can do 32 bits in the same time but that drags out more waiting for the serial shifter to fill up. Oops, it's now hubexec, so add another 12 or so ticks for the FIFO refill. It's good for sysclock/4.
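
The sysclock/4 figure follows from a simple budget: with a clock divider of N, a 16-bit word takes 16 x N sysclock ticks to arrive, and the loop has to finish within that. A rough sketch of the check, taking the worst case above as 34 + 12 = 46 ticks (the helper name `loop_fits` is made up for illustration):

```c
#include <stdbool.h>

/* True if a loop of loop_ticks sysclocks keeps up with bits_per_pass
   bits arriving at one bit per clk_div sysclocks. */
static bool loop_fits(unsigned loop_ticks, unsigned bits_per_pass,
                      unsigned clk_div)
{
    return loop_ticks <= bits_per_pass * clk_div;
}
```

`loop_fits(46, 16, 4)` is true (64 ticks available per pass), while `loop_fits(46, 16, 2)` is false (only 32).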

evanh · 2023-07-09 11:50

Ouch! Forgot about the over-reached clock smartpin inputs. That's not an issue when using the streamer. Almost enough incentive to do post-processing block copying right there.

The remapping of smartpins needs careful documenting to keep it straight. Not to mention the extra instructions ... and CMD has to be repurposed in mid flight ... and it doesn't work for tx either, so streamer only for DAT tx pins. All very messy.

EDIT: And block copying approach can happen concurrently with next streamer action too. So not necessarily slowing things down.

evanh · 2023-07-10 10:54

Hmm, the real-time align and word merging code for the rx block data is too big to execute within sysclock/4. It's four times everything, so that's basically doubling the time required over what 32-bit shifter word size provides. Sysclock/8 should be doable with smartpins. Here's just the word merger and writing to hubRAM:

        rolbyte pb, pr3, #3
        rolbyte pb, pr2, #3
        rolbyte pb, pr1, #3
        rolbyte pb, pr0, #3
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
        rolbyte pb, pr3, #2
        rolbyte pb, pr2, #2
        rolbyte pb, pr1, #2
        rolbyte pb, pr0, #2
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
        rolbyte pb, pr3, #1
        rolbyte pb, pr2, #1
        rolbyte pb, pr1, #1
        rolbyte pb, pr0, #1
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
        rolbyte pb, pr3, #0
        rolbyte pb, pr2, #0
        rolbyte pb, pr1, #0
        rolbyte pb, pr0, #0
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
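
In effect, each group of seven instructions turns one byte from each of the four DAT shifters into four data bytes in hub order. A plain-C model of that net effect is sketched below. It assumes DAT0 carries bit 0 of each nibble and DAT3 bit 3, with the earliest clock in bit 7 of each lane byte; `merge_lanes` is a hypothetical name and this is not an instruction-level model of ROLBYTE/MERGEB/MOVBYTS.

```c
#include <stdint.h>

/*
 * d0..d3: one byte from each DAT line, earliest clock in bit 7.
 * out:    four data bytes in the order they came off the card.
 */
static void merge_lanes(uint8_t d0, uint8_t d1, uint8_t d2, uint8_t d3,
                        uint8_t out[4])
{
    for (int k = 0; k < 8; k++) {          /* k = SD clock number */
        int sh = 7 - k;
        uint8_t nib = (uint8_t)((((d3 >> sh) & 1) << 3) |
                                (((d2 >> sh) & 1) << 2) |
                                (((d1 >> sh) & 1) << 1) |
                                 ((d0 >> sh) & 1));
        if (k & 1)
            out[k >> 1] |= nib;                    /* low nibble second */
        else
            out[k >> 1] = (uint8_t)(nib << 4);     /* high nibble first */
    }
}
```

Feeding in lane bytes 0xC8, 0xCD, 0x29, 0x2F (DAT0..DAT3) gives 33 c0 fa 8e, the first four bytes of the sector dumped in this thread.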

evanh · 2023-07-10 11:12

haha, sexy!

CMD17   51 00000000 55  >  11 00 00 09 00 67  CRC = 67
 datablock:  shl=29  shr=3
 33 c0 fa 8e d8 8e d0 bc 00 7c 89 e6 06 57 8e c0 fb fc bf 00 06 b9 00 01 f3 a5 ea 1f 06 00 00 52
 52 b4 41 bb aa 55 31 c9 30 f6 f9 cd 13 72 13 81 fb 55 aa 75 0d d1 e9 73 09 66 c7 06 8d 06 b4 42
 eb 15 5a b4 08 cd 13 83 e1 3f 51 0f b6 c6 40 f7 e1 52 50 66 31 c0 66 99 e8 66 00 e8 35 01 4d 69
 73 73 69 6e 67 20 6f 70 65 72 61 74 69 6e 67 20 73 79 73 74 65 6d 2e 0d 0a 66 60 66 31 d2 bb 00
 7c 66 52 66 50 06 53 6a 01 6a 10 89 e6 66 f7 36 f4 7b c0 e4 06 88 e1 88 c5 92 f6 36 f8 7b 88 c6
 08 e1 41 b8 01 02 8a 16 fa 7b cd 13 8d 64 10 66 61 c3 e8 c4 ff be be 7d bf be 07 b9 20 00 f3 a5
 c3 66 60 89 e5 bb be 07 b9 04 00 31 c0 53 51 f6 07 80 74 03 40 89 de 83 c3 10 e2 f3 48 74 5b 79
 39 59 5b 8a 47 04 3c 0f 74 06 24 7f 3c 05 75 22 66 8b 47 08 66 8b 56 14 66 01 d0 66 21 d2 75 03
 66 89 c2 e8 ac ff 72 03 e8 b6 ff 66 8b 46 1c e8 a0 ff 83 c3 10 e2 cc 66 61 c3 e8 76 00 4d 75 6c
 74 69 70 6c 65 20 61 63 74 69 76 65 20 70 61 72 74 69 74 69 6f 6e 73 2e 0d 0a 66 8b 44 08 66 03
 46 1c 66 89 44 08 e8 30 ff 72 27 66 81 3e 00 7c 58 46 53 42 75 09 66 83 c0 04 e8 1c ff 72 13 81
 3e fe 7d 55 aa 0f 85 f2 fe bc fa 7b 5a 5f 07 fa ff e4 e8 1e 00 4f 70 65 72 61 74 69 6e 67 20 73
 79 73 74 65 6d 20 6c 6f 61 64 20 65 72 72 6f 72 2e 0d 0a 5e ac b4 0e 8a 3e 62 04 b3 07 cd 10 3c
 0a 75 f1 cd 18 f4 eb fd 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1 ee 0f 6a 00 00 80 04
 01 04 0b fe c2 ff 00 08 00 00 00 18 b7 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa

evanh · 2023-07-10 11:15

Here's the post-response assembly:

// DATA BLOCK RECEIVE
        dirl    #PIN_CLK
        dirl    #PIN_CMD
        wrpin   ##P_SYNC_RX | (PIN_DAT2 - PIN_CMD & 7)<<28 | (PIN_CLK - PIN_CMD & 7)<<24, #PIN_CMD  // DAT2 pin
        wxpin   #31, #PIN_CMD    // DAT2 pin
        dirh    #PIN_DAT2 | 4<<6    // enable shifters for DAT0..DAT3 pins, and CLK pin
        wypin   ##2000, #PIN_CLK    // kick it in the guts
.loop3
        waitse1    // trigger on PIN_CMD (DAT2 pin)
        rdpin   pr0, #PIN_DAT2    // DAT0 pin
        rdpin   pr1, #PIN_DAT3    // DAT1 pin
        rdpin   pr2, #PIN_CMD     // DAT2 pin
        rdpin   pr3, #PIN_LED     // DAT3 pin
        rev pr2
        not shiftr, pr2   wz
    if_z    jmp #.loop3

        encod   shiftr
        mov shiftl, shiftr
        subr    shiftl, #32

        rev pr0
        rev pr1
        rev pr3
        mov i, #32
        mov ptrb, buf
.loop4
        waitse1
        rdpin   pr4, #PIN_DAT2    // DAT0 pin
        rdpin   pr5, #PIN_DAT3    // DAT1 pin
        rdpin   pr6, #PIN_CMD     // DAT2 pin
        rdpin   pr7, #PIN_LED     // DAT3 pin
        rev pr4
        rev pr5
        rev pr6
        rev pr7

        mov pb, pr4
        shr pb, shiftr
        shl pr0, shiftl
        or  pr0, pb
        mov pb, pr5
        shr pb, shiftr
        shl pr1, shiftl
        or  pr1, pb
        mov pb, pr6
        shr pb, shiftr
        shl pr2, shiftl
        or  pr2, pb
        mov pb, pr7
        shr pb, shiftr
        shl pr3, shiftl
        or  pr3, pb

        rolbyte pb, pr3, #3
        rolbyte pb, pr2, #3
        rolbyte pb, pr1, #3
        rolbyte pb, pr0, #3
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
        rolbyte pb, pr3, #2
        rolbyte pb, pr2, #2
        rolbyte pb, pr1, #2
        rolbyte pb, pr0, #2
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
        rolbyte pb, pr3, #1
        rolbyte pb, pr2, #1
        rolbyte pb, pr1, #1
        rolbyte pb, pr0, #1
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
        rolbyte pb, pr3, #0
        rolbyte pb, pr2, #0
        rolbyte pb, pr1, #0
        rolbyte pb, pr0, #0
        mergeb  pb
        movbyts pb, #0b00_01_10_11
        wrlong  pb, ptrb++
        mov pr0, pr4
        mov pr1, pr5
        mov pr2, pr6
        mov pr3, pr7
        djnz    i, #.loop4
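
The alignment step in the loop (SHL the previous word, SHR the new one, OR them together) can be modelled in C as below. This is a sketch only, with made-up names `start_bit_pos` and `join_words`; the `shr == 0` case is handled separately because a shift by 32 is undefined in C.

```c
#include <stdint.h>

/* Position (31..0) of the highest 0 bit in w, or -1 if w is all ones.
   Plays the role of NOT followed by ENCOD on the reversed sample. */
static int start_bit_pos(uint32_t w)
{
    uint32_t inv = ~w;
    if (inv == 0)
        return -1;
    int pos = 31;
    while (!(inv >> pos))
        pos--;
    return pos;
}

/* Join the bits below the start-bit in prev with the top of next. */
static uint32_t join_words(uint32_t prev, uint32_t next, int shr)
{
    if (shr == 0)
        return next;            /* nothing left over from prev */
    return (prev << (32 - shr)) | (next >> shr);
}
```

For example, a first word of 0xFFFFFFF5 has its start-bit at position 3, so shr = 3 and shl = 29, and joining it with 0xABCDEF12 gives 0xB579BDE2.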

evanh · 2023-07-10 11:21

Hmm, darn, not looking hopeful even for sysclock/8 without changing to cogexec.

rogloh · 2023-07-10 11:47

Why do all this shifting stuff? Just set up the transfer after the start bit and it will be aligned. Sysclk/8 is only 40M nibbles per second or 20MB/s at 320MHz. I was able to get it clocking much faster than that (sysclk/2). You just give up the gapless CMD idea and it'll be fine. For performance you mainly want to optimize DAT reads, not so much the CMD+RESP timing stuff.
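
The rate arithmetic is easy to redo for other dividers. A small sketch, assuming a 4-bit bus that moves one nibble per SD clock (function names are illustrative only):

```c
#include <stdint.h>

/* One nibble per SD clock on a 4-bit bus. */
static uint64_t nibbles_per_sec(uint64_t sysclk_hz, unsigned clk_div)
{
    return sysclk_hz / clk_div;
}

/* Two nibbles per byte. */
static uint64_t bytes_per_sec(uint64_t sysclk_hz, unsigned clk_div)
{
    return nibbles_per_sec(sysclk_hz, clk_div) / 2;
}
```

At 320MHz, a divider of 8 gives 40M nibbles/s and 20MB/s as stated, and a divider of 2 gives 80MB/s.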

evanh · 2023-07-10 12:05

Because the clock is going at speed. I've long ditched the bit-bashing start-bit search.
There's definitely gains to be made by not doing a long pulse by pulse search to find the start of the block. There's 400 odd clocks to churn through checking on each block.

evanh · 2023-07-10 12:17

BTW: I was hoping for better than sysclock/4 originally. Therefore, agreed, the above approach is rather dead in the water now. The word merge kills it before even getting to the shifts.

rogloh · 2023-07-10 12:24

Yeah I reckon the time saved transferring a 512 byte sector at sysclk/2 will swamp the gains from not having to bit bang the clock to search for the start bit and then transferring at sysclk/8. This is even more true once multiple blocks are being read. Although admittedly if you want to strictly meet the 50MB/s max speed for normal SD cards, then the gain is lessened slightly.

For polling the start bit with a bit bang clock before you start the streamer and speed up the clock I'd expect you should be able to do it pretty fast in PASM (guessing a couple of microseconds or so per request).
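
To put rough numbers on that trade-off, the sketch below counts sysclock ticks for the two approaches. It assumes the card's delay is counted in SD clocks (about 400 of them, as seen above), ignores the CRC and framing bits, and uses a purely hypothetical cost of 16 ticks per polled clock; the function names are made up.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Ticks to clock one sector's payload over the 4-bit bus. */
static uint32_t sector_ticks(unsigned clk_div)
{
    return SECTOR_BYTES * 2u * clk_div;
}

/* Ticks spent waiting for the start bit: wait_clocks SD clocks,
   each costing ticks_per_clock sysclock ticks. */
static uint32_t search_ticks(unsigned wait_clocks, unsigned ticks_per_clock)
{
    return wait_clocks * ticks_per_clock;
}
```

Under those assumptions, gapless clocking at sysclk/8 costs `search_ticks(400, 8) + sector_ticks(8)` = 11392 ticks per block, while polling at 16 ticks per clock and then streaming at sysclk/2 costs `search_ticks(400, 16) + sector_ticks(2)` = 8448 ticks. The result depends heavily on the assumed polling cost.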

evanh · 2023-07-10 12:29

Next approach is streamer + block copying.

The streamer hardware takes care of ordering so only the shifts need sorted ... and since the streamer sampling has to be tuned then it can also account for nibble alignment in the one parameter. Meaning it will be possible to tune the block copying so only an edge check has to be done to verify which byte to start at with no actual shifts required.

evanh · 2023-07-10 12:41

All this coding is partly the science, the learning, and partly an "I've been there" experience. To see the required code to get it working each way. More than some guessing.

And on that note, time to post what I've got:

evanh · 2023-07-11 13:06

Okay, change of tack, this is too good to leave alone ... My main aversion to the bit-bashed search of the start bit was the slow rate of pulse then examine - Waiting for all the latencies to clear, one cycle at a time. Well, I remembered the discussion about whether the PWM_SMPS smartpin mode could be used for a fast response trigger of some sort .... I thought, maybe, I could turn it into a clock gen that can abruptly shut off when an input goes low (The start bit arriving) .... ta-da!

    _pinstart( PIN_CLK, P_PWM_SMPS | P_OE | P_INVERT_OUTPUT |    // SD clock gen smartpin
                (PIN_CMD - PIN_CLK & 7)<<28 | P_INVERT_A |    // smartA input select
                (PIN_CMD - PIN_CLK & 7)<<24 | P_INVERT_B,    // smartB input select
                1 | 8<<16, 4 );

    while( _pinr( PIN_CMD ) );    // wait for CMD low - found the start-bit

    _pinstart( PIN_CLK, P_PULSE | P_OE | P_INVERT_OUTPUT,    // reconfig back to regular clock gen
                CLK_DIV | (CLK_DIV / 2)<<16, 0 );

Wuerfel_21 · 2023-07-11 13:09

Well that's a funny way to do it

But does the latency period really care about the clock speed while waiting for the start bit? I'd think this is limited by the internal speed of the SD controller, but I never verified that assumption.

evanh · 2023-07-11 13:13

Here it is working with the command response. I'll apply it to the block read code tomorrow.

rogloh · 2023-07-12 07:55

@evanh said:
Okay, change of tack, this is too good to leave alone ... My main aversion to the bit-bashed search of the start bit was the slow rate of pulse then examine - Waiting for all the latencies to clear, one cycle at a time. Well, I remembered the discussion about whether the PWM_SMPS smartpin mode could be used for a fast response trigger of some sort .... I thought, maybe, I could turn it into a clock gen that can abruptly shut off when an input goes low (The start bit arriving) .... ta-da!
    _pinstart( PIN_CLK, P_PWM_SMPS | P_OE | P_INVERT_OUTPUT |    // SD clock gen smartpin
                (PIN_CMD - PIN_CLK & 7)<<28 | P_INVERT_A |    // smartA input select
                (PIN_CMD - PIN_CLK & 7)<<24 | P_INVERT_B,    // smartB input select
                1 | 8<<16, 4 );

    while( _pinr( PIN_CMD ) );    // wait for CMD low - found the start-bit

    _pinstart( PIN_CLK, P_PULSE | P_OE | P_INVERT_OUTPUT,    // reconfig back to regular clock gen
                CLK_DIV | (CLK_DIV / 2)<<16, 0 );

This looks handy. If it works I can see it could potentially speed things up a fair bit and should let you clock faster while waiting for a start bit. Should then be no excuse to not work with fully aligned data after that!

EDIT: looks like you have it working in latest post. Cool. I'll have to give it a try here at some point soon.

rogloh · 2023-07-12 10:00

Just ran your test code with my board @evanh. Looks okay so far, but the first data sector is still offset. I guess that is next for you to sort out.
This was the output I received from a 16G SanDisk card:

loadp2 -b 230400 -t sdmode_framing.binary 
( Entering terminal mode.  Press Ctrl-] or Ctrl-Z to exit. )
   clkfreq = 4000000   clkmode = 0x10005eb
 Card detected ... 40 ms power cycle of SD card ... 
 DONE!
CMD0   40 00000000 95  >  TIMED-OUT!
CMD8   48 0000015a 9b  >  08 00 00 01 5a 0f  CRC = 0f
  R7 0000015a - v2.0+ SD Card
CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
CMD41   69 40100000 cd  >  3f 00 ff 80 00 ff  CRC = c7
CMD55   77 00000000 65  >  37 00 00 01 20 83  CRC = 83
CMD41   69 40100000 cd  >  3f c0 ff 80 00 ff  CRC = 63
  OCR c0ff8000
CMD2   42 00000000 4d  >  3f 03 53 44 53 4c 31 36 47 00 20 b9 34 bb 01 16 f7  CRC = f7
CMD3   43 00000000 21  >  03 aa aa 05 20 d1  CRC = d1
  RCA aaaa0000
CMD7   47 aaaa0000 cd  >  07 00 00 07 00 75  CRC = 75
CMD55   77 aaaa0000 2b  >  37 00 00 09 20 33  CRC = 33
CMD6   46 00000002 cb  >  06 00 00 09 20 b9  CRC = b9
 Card init complete
Test RDPIN  9d049000
CMD17   51 00000000 55  >  11 00 00 09 00 67   CRC = 67
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff f0 fa b8 00 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e
 c0 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07
 75 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 01 8b 4c 02 cd 13 ea 00 7c 00 00 eb fe 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  CRC = 01
All finished  :)
