Improving SD Card performance — Parallax Forums


iseries Posts: 1,274

It's been over a year since I looked at this, and @evanh came up with a test program that I thought I would try out. It didn't work for me at first: I quickly discovered I was running out of memory and needed to make some changes to the code.

It also uncovered a bug in my SD card file support functions. I had rewritten the flexspin file system functions to make some improvements and introduced a bug in the conversion process. It's one of those "if the function returns 0, is that true, and if it's not true, is it really true?" bugs.
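The trap is that the layers use opposite conventions. FatFs calls return an FRESULT where FR_OK (0) means success, while low-level helpers such as rcvr_datablock() (later in this thread) return 1 for OK and 0 for failure. Here is a minimal sketch of how that goes wrong. It uses hypothetical stand-in functions rather than the real FatFs API:

```c
/* Hypothetical stand-ins: read_file_sketch() follows the FatFs FRESULT
   convention (0 == success); read_block_sketch() follows the
   "1:OK, 0:Failed" convention of helpers such as rcvr_datablock(). */
enum { SKETCH_FR_OK = 0, SKETCH_FR_DISK_ERR = 1 };

static int read_file_sketch(int ok)  { return ok ? SKETCH_FR_OK : SKETCH_FR_DISK_ERR; }
static int read_block_sketch(int ok) { return ok ? 1 : 0; }

/* Buggy wrapper: treats the FatFs-style result as a boolean,
   so a successful call is reported as a failure. */
static int buggy_read_ok(int ok) { return read_file_sketch(ok) ? 1 : 0; }

/* Fixed wrapper: compare against the success code explicitly. */
static int fixed_read_ok(int ok) { return read_file_sketch(ok) == SKETCH_FR_OK; }

/* Boolean-style helpers can be tested directly, but only because
   their documented convention is 1 == OK. */
static int block_ok(int ok) { return read_block_sketch(ok) != 0; }
```

Comparing against the named success code, rather than relying on truthiness, keeps the two conventions from being mixed up when code moves between layers.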

It turns out that one of the nice things about the way flexspin implements SD card support is that you can completely rewrite the whole thing without touching any of the existing code. Kind of like a plug-and-play deal.

I also tried to add multi-volume (multiple drive) support. This adds a little overhead from passing the volume number around, but I know somebody is going to want to attach more than one SD card to the P2 and will want that feature.
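For illustration, here is a sketch that keeps one packed pin word per volume. It is modelled on the Pins[drive] value that the P2LLVM ReceiveSD() later in this post unpacks: CLK in bits 8..15, MOSI in 16..23 and MISO in 24..31. Putting SS in bits 0..7 is an assumption, and so are the helper names:

```c
#include <stdint.h>

/* One packed pin word per drive (volume). Layout follows the shifts in
   the P2LLVM ReceiveSD(); SS in bits 0..7 is an assumption. */
#define SKETCH_MAX_DRIVES 4

static uint32_t sketch_pins[SKETCH_MAX_DRIVES];

/* Hypothetical helper: record the pins for one drive. */
static int sketch_set_pins(int drive, int ss, int clk, int mosi, int miso)
{
    if (drive < 0 || drive >= SKETCH_MAX_DRIVES)
        return -1;                                   /* no such volume */
    sketch_pins[drive] = ((uint32_t)(ss   & 0xff))
                       | ((uint32_t)(clk  & 0xff) << 8)
                       | ((uint32_t)(mosi & 0xff) << 16)
                       | ((uint32_t)(miso & 0xff) << 24);
    return 0;
}

/* Accessors using the same shifts and masks as ReceiveSD(). */
static int sketch_ss(int drive)   { return  sketch_pins[drive]        & 0xff; }
static int sketch_clk(int drive)  { return (sketch_pins[drive] >> 8)  & 0xff; }
static int sketch_mosi(int drive) { return (sketch_pins[drive] >> 16) & 0xff; }
static int sketch_miso(int drive) { return (sketch_pins[drive] >> 24) & 0xff; }
```

The overhead mentioned above is essentially this. Every low-level call carries the drive number and pays for a load, a shift and a mask to recover each pin.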

I also moved this code over to P2LLVM so that it would have SD card support. The low-level code is about the same, but the upper-level code is not.

In addition, it looks like FatFs came out with a small update that removed a lot of custom memory code and went back to using the standard library functions. For P2LLVM it was a cut and paste; flexspin needed a few more tweaks.

SD Code

From the diagram: the filesystem layer consists of the high-level C library functions, which are hooked through virtual function pointers (except for the mount and directory functions). This layer then uses FatFs to convert those library calls into calls to the diskio code, which in turn uses the low-level sd_mmc code to actually read and write data on the SD card. So one could replace the sd_mmc code with, say, flash or USB drive code.

So all the performance-critical code lives in the low-level sd_mmc driver.
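Here is a minimal sketch of that layering, assuming a table of function pointers per volume. The names (block_driver, sketch_disk_read() and so on) are hypothetical and only loosely shaped like the FatFs disk_read()/disk_write() hooks. A RAM buffer stands in for the sd_mmc driver so the sketch runs anywhere:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTOR 512

/* Low-level driver interface; every function returns 0 on success. */
typedef struct {
    int (*init)(void);
    int (*read)(uint8_t *buf, uint32_t sector, unsigned count);
    int (*write)(const uint8_t *buf, uint32_t sector, unsigned count);
} block_driver;

/* A tiny RAM-backed driver standing in for sd_mmc, flash or USB code. */
static uint8_t ram_disk[8 * SKETCH_SECTOR];

static int ram_init(void) { memset(ram_disk, 0, sizeof ram_disk); return 0; }

static int ram_read(uint8_t *buf, uint32_t sector, unsigned count)
{
    if (sector + count > sizeof ram_disk / SKETCH_SECTOR) return -1;
    memcpy(buf, ram_disk + sector * SKETCH_SECTOR, count * SKETCH_SECTOR);
    return 0;
}

static int ram_write(const uint8_t *buf, uint32_t sector, unsigned count)
{
    if (sector + count > sizeof ram_disk / SKETCH_SECTOR) return -1;
    memcpy(ram_disk + sector * SKETCH_SECTOR, buf, count * SKETCH_SECTOR);
    return 0;
}

static const block_driver ram_driver = { ram_init, ram_read, ram_write };

/* One slot per volume; the diskio-style layer only sees this table. */
static const block_driver *drivers[2] = { &ram_driver, 0 };

static int sketch_disk_read(int pdrv, uint8_t *buf, uint32_t sector, unsigned count)
{
    if (pdrv < 0 || pdrv >= 2 || !drivers[pdrv]) return -1;   /* no driver */
    return drivers[pdrv]->read(buf, sector, count);
}

static int sketch_disk_write(int pdrv, const uint8_t *buf, uint32_t sector, unsigned count)
{
    if (pdrv < 0 || pdrv >= 2 || !drivers[pdrv]) return -1;
    return drivers[pdrv]->write(buf, sector, count);
}
```

Swapping sd_mmc for flash or USB code then means filling in a different table. Nothing above the diskio layer changes.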

Now back to running some performance tests to see how we did.
Here is a copy of the performance program that I used:

#include <stdio.h>
#include <propeller.h>
#include <sys/vfs.h>

uint32_t randfill(uint32_t *, size_t);
int  compare(uint32_t *, uint32_t *, size_t);

#define PIN_SS   23
#define PIN_MISO 20
#define PIN_CLK  21
#define PIN_MOSI 22

uint32_t  data1[25000];
uint32_t  data2[25000];


int main(int argc, char** argv)
{
    FILE  *fh;
    uint32_t  ticks;

    printf( " clkfreq = %d   clkmode = 0x%x\n", _clockfreq(), _clockmode() );
    printf( " Randfill ticks = %d\n", randfill( data1, sizeof(data1) ) );

    printf( " Mounting: " );
    mount( "/sd", _vfs_open_sdcardx(PIN_CLK, PIN_SS, PIN_MOSI, PIN_MISO) );

    if( fh = fopen( "/sd/speed2.bin", "w" ) )
    {
        ticks = _getms();
        fwrite( data1, 1, sizeof(data1), fh );
        fclose( fh );
        ticks = _getms() - ticks;
        printf( " Writing %u bytes at %u kB/s\n", sizeof(data1), (sizeof(data1) * 1000 / ticks + 512) >> 10 );
    } else  printf( " SD card write error!\n" );

    if( fh = fopen( "/sd/speed2.bin", "r" ) )
    {
        ticks = _getms();
        fread( data2, 1, sizeof(data2), fh );
        fclose( fh );
        ticks = _getms() - ticks;
        printf( " Reading %u bytes at %u kB/s\n", sizeof(data2), (sizeof(data2) * 1000 / ticks + 512) >> 10 );
        if( compare( data1, data2, sizeof(data2) ) )  printf( " Matches!  :)\n" );
        else    printf( " Mis-matches!  :(\n" );
    } else  printf( " SD card read error!\n" );

    while (1)
    {
        _waitms(500);
    }
}

uint32_t  randfill( uint32_t *addr, size_t size )
{
    uint32_t  ticks;

    size >>= 2;
    ticks = _cnt();
    do {
        *(addr++) = _rnd();
    } while( --size );

    return( _cnt() - ticks );
}


int  compare( uint32_t *addr1, uint32_t *addr2, size_t size )
{
    uint32_t  pass = 1;

    size >>= 2;
    do {
        if( *(addr1++) != *(addr2++) )  pass = 0;
    } while( --size );

    return( pass );
}
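The throughput lines convert a byte count and an elapsed time in milliseconds into kB/s. Here kB means 1024 bytes, and the +512 rounds to the nearest unit. The sketch below pulls that arithmetic out into a standalone helper (the name is ours). The 64-bit intermediate is a change from the original, whose 32-bit bytes * 1000 overflows for transfers larger than about 4.29 million bytes:

```c
#include <stdint.h>

/* Same rounding as the test program: bytes per second, plus 512,
   shifted right by 10, i.e. rounded to the nearest 1024-byte kB. */
static uint32_t kbytes_per_sec(uint32_t bytes, uint32_t ms)
{
    if (ms == 0)
        return 0;                          /* too fast to time in ms */
    uint64_t bps = (uint64_t)bytes * 1000u / ms;
    return (uint32_t)((bps + 512) >> 10);
}
```

For example, 100000 bytes in 100 ms is 1,000,000 bytes/s, which reports as 977 kB/s.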

Setup
For these tests I used a standard Parallax SD card Adapter and some loose wires.

First test with the standard flexspin driver:

Entering terminal mode.  Press Ctrl-] to exit. 
 clkfreq = 200000000   clkmode = 0x10009fb
 Randfill ticks = 225062
 Mounting:  Writing 100000 bytes at 136 kB/s
 Reading 100000 bytes at 314 kB/s
 Matches!  :)

Second test, using my SD card functions, which only required changing one line of code:

mount( "/sd", _vfs_open_sd(PIN_SS, PIN_CLK, PIN_MOSI, PIN_MISO) );

Entering terminal mode.  Press Ctrl-] to exit. 
 clkfreq = 200000000   clkmode = 0x10009fb
 Randfill ticks = 225070
 Mounting:  Writing 100000 bytes at 164 kB/s
 Reading 100000 bytes at 718 kB/s
 Matches!  :)

Third test using P2LLVM. This required a few more changes:

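#include <stdio.h>
#include <stdlib.h>     /* rand() */
#include <stdint.h>
/* P2LLVM propeller and SD card headers were not shown in the original post */

uint32_t randfill(uint32_t *, size_t);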
int  compare(uint32_t *, uint32_t *, size_t);

#define PIN_SS   23
#define PIN_MISO 20
#define PIN_CLK  21
#define PIN_MOSI 22

uint32_t  data1[25000];
uint32_t  data2[25000];


int main(int argc, char** argv)
{
    FILE  *fh;
    uint32_t  ticks;

    printf( " clkfreq = %d   clkmode = 0x%x\n", _clkfreq, _clkmode);
    printf( " Randfill ticks = %d\n", randfill( data1, sizeof(data1) ) );

    printf( " Mounting: " );
    sd_mount(0, PIN_SS, PIN_CLK, PIN_MOSI, PIN_MISO);

    if( (fh = fopen( "SD0:/speed2.bin", "w" )) != NULL )
    {
        ticks = getms();
        fwrite( data1, 1, sizeof(data1), fh );
        fclose( fh );
        ticks = getms() - ticks;
        printf( " Writing %u bytes at %u kB/s\n", sizeof(data1), (sizeof(data1) * 1000 / ticks + 512) >> 10 );
    } else  printf( " SD card write error!\n" );

    if( (fh = fopen( "SD0:/speed2.bin", "r" )) != NULL )
    {
        ticks = getms();
        fread( data2, 1, sizeof(data2), fh );
        fclose( fh );
        ticks = getms() - ticks;
        printf( " Reading %u bytes at %u kB/s\n", sizeof(data2), (sizeof(data2) * 1000 / ticks + 512) >> 10 );
        if( compare( data1, data2, sizeof(data2) ) )  printf( " Matches!  :)\n" );
        else    printf( " Mis-matches!  :(\n" );
    } else  printf( " SD card read error!\n" );

    while (1)
    {
        wait(500);
    }
}

uint32_t  randfill( uint32_t *addr, size_t size )
{
    uint32_t  ticks;

    size >>= 2;
    ticks = _cnt();
    do {
        *(addr++) = rand();
    } while( --size );

    return( _cnt() - ticks );
}


int  compare( uint32_t *addr1, uint32_t *addr2, size_t size )
{
    uint32_t  pass = 1;

    size >>= 2;
    do {
        if( *(addr1++) != *(addr2++) )  pass = 0;
    } while( --size );

    return( pass );
}

P2LLVM was set up for multi-volume support, so the 0 passed to sd_mount() is the volume number. This also slowed performance somewhat:

Entering terminal mode.  Press Ctrl-] to exit. 
 clkfreq = 200000000   clkmode = 0x14cc8fb
 Randfill ticks = 6574977
 Mounting:  Writing 100000 bytes at 149 kB/s
 Reading 100000 bytes at 794 kB/s
 Matches!  :)

Here is the C code with inline assembly used to read the SD card, followed by the assembly that flexspin generated from it:

void ReceiveSD(char *buff, unsigned int bc)
{
    char *b = buff;
    unsigned int c = bc;
    int m = Pins.MOSI;
    int n = Pins.MISO;
    int x = Pins.CLK;
    int i;
    int j;
    int v;

    __asm {
           drvh m
           waitx #8
           mov i, c
    loopi  mov v, #0
           mov j, #8
    loopj  testp n wc
           drvh x
           rcl v, #1
           drvl x
           waitx #6
           djnz j, #loopj
           wrbyte v, b
           add b, #1
           djnz i, #loopi
    }
}
/***********************/
_ReceiveSD
    add ptr__dat__, ##201926
    rdbyte  _var01, ptr__dat__
    add ptr__dat__, #1
    rdbyte  _var02, ptr__dat__
    sub ptr__dat__, #2
    rdbyte  _var03, ptr__dat__
    sub ptr__dat__, ##201925
    zerox   _var03, #7
    drvh    _var01
    waitx   #8
LR__0687
    mov _var04, #0
    loc pa, #(@LR__0690-@LR__0688)
    call    #FCACHE_LOAD_
LR__0688
    rep @LR__0691, #8
LR__0689
    testp   _var02 wc
    drvh    _var03
    rcl _var04, #1
    drvl    _var03
    waitx   #6
LR__0690
LR__0691
    wrbyte  _var04, arg01
    add arg01, #1
    djnz    arg02, #LR__0687
_ReceiveSD_ret
    ret

And this is the P2LLVM code for the same routine, followed by its disassembly:

void ReceiveSD(int drive, BYTE *buff, unsigned int bc)
{
    BYTE *b = buff;
    unsigned int c = bc;
    int m = (Pins[drive] >> 16) & 0xff;
    int n = (Pins[drive] >> 24) & 0xff;
    int x = (Pins[drive] >> 8) & 0xff;
    int i = 0;
    int j = 0;
    int v = 0;

    asm volatile ("drvh %[m]\n"
                  "waitx #8\n"
                  "mov %[i], %[c]\n"
                  ".li: mov %[v], #0\n"
                  "mov %[j], #8\n"
                  ".lj: testp %[n] wc\n"
                  "drvh %[x]\n"
                  "rcl %[v], #1\n"
                  "drvl %[x]\n"
                  "djnz %[j], #.lj\n"
                  "wrbyte %[v], %[b]\n"
                  "add %[b], #1\n"
                  "djnz %[i], #.li\n"
                  :[i]"+r"(i), [j]"+r"(j), [v]"+r"(v), [b]"+r"(b)
                  :[x]"r"(x), [n]"r"(n), [m]"r"(m), [c]"r"(c)
                  :"memory");   /* WRBYTE stores into buff */
}
/***********************/
00007468 <ReceiveSD>:
    7468: 28 02 64 fd            setq #1
    746c: 61 a1 67 fc            wrlong r0, ptra++
    7470: 28 08 64 fd            setq #4
    7474: 61 a7 67 fc            wrlong r3, ptra++
    7478: 02 a0 67 f0            shl r0, #2 
    747c: 4f 02 00 ff            augs #591
    7480: 34 a7 07 f6            mov r3, #308   
    7484: d0 a7 03 f1            add r3, r0 
    7488: d3 a1 03 fb            rdlong r0, r3  
    748c: d0 a7 03 f6            mov r3, r0 
    7490: 08 a6 47 f0            shr r3, #8 
    7494: ff a6 07 f5            and r3, #255   
    7498: d0 a9 03 f6            mov r4, r0 
    749c: 18 a8 47 f0            shr r4, #24    
    74a0: 10 a0 47 f0            shr r0, #16    
    74a4: ff a0 07 f5            and r0, #255   
    74a8: 00 aa 07 f6            mov r5, #0 
    74ac: d5 ad 03 f6            mov r6, r5 
    74b0: d5 af 03 f6            mov r7, r5 
    74b4: 59 a0 63 fd            drvh r0    
    74b8: 1f 10 64 fd            waitx #8   
    74bc: d2 ab 03 f6            mov r5, r2 

000074c0 <.li>:
    74c0: 00 ae 07 f6            mov r7, #0 
    74c4: 08 ac 07 f6            mov r6, #8 

000074c8 <.lj>:
    74c8: 40 a8 73 fd            testp r4   wc
    74cc: 59 a6 63 fd            drvh r3    
    74d0: 01 ae a7 f0            rcl r7, #1 
    74d4: 58 a6 63 fd            drvl r3    
    74d8: fb ad 6f fb            djnz r6, #-5
    74dc: d1 af 43 fc            wrbyte r7, r1
    74e0: 01 a2 07 f1            add r1, #1 
    74e4: f6 ab 6f fb            djnz r5, #-10
    74e8: 28 08 64 fd            setq #4
    74ec: 5f a7 07 fb            rdlong r3, --ptra  
    74f0: 28 02 64 fd            setq #1
    74f4: 5f a1 07 fb            rdlong r0, --ptra  
    74f8: 2e 00 64 fd            reta   

From the generated code, it looks like flexspin is using FCACHE whereas P2LLVM is not. Yet the read speed is higher for P2LLVM, which makes me question whether there is any performance gain from using it. Also, flexspin converted my inner DJNZ loop into a REP instruction.

Hopefully I didn't make too many mistakes in putting this together.

Mike


Comments

  • evanh Posts: 13,619

    Ha, didn't notice you posted this last night. Cool-looking Edge Card carrier, BTW. I want to make something similar.

    @iseries said:
    It looks like from the assembled code that flexspin is using FCACHE where as P2LLVM is not yet the speed number is higher for P2LLVM which questions is there is any performance gain by using it. Also my DJNZ loop was converted to a repeat instruction in flexspin.

    That WAITX #6 is basically negating the cogexec advantage. Have you tried my code again?
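    Here is some rough per-bit arithmetic for the inner loop from the first post. It assumes the usual P2 timings: 2 sysclocks each for TESTP, DRVH, RCL and DRVL, 2 + n sysclocks for WAITX #n, and no per-iteration cost once flexspin turns the loop into REP. These are assumptions for illustration, not measurements:

```c
/* Sysclock ticks per received bit: four 2-clock instructions
   (TESTP, DRVH, RCL, DRVL) plus an optional WAITX #n costing 2 + n.
   Assumes REP, so the loop itself adds nothing per bit. */
static unsigned ticks_per_bit(int has_waitx, unsigned waitx_arg)
{
    unsigned ticks = 4 * 2;
    if (has_waitx)
        ticks += 2 + waitx_arg;
    return ticks;
}

/* Upper bound on the raw transfer rate in bytes per second. */
static unsigned long max_bytes_per_sec(unsigned long sysclk_hz, unsigned ticks)
{
    return sysclk_hz / ticks / 8;
}
```

    On those assumptions, the WAITX #6 doubles the bit time from 8 to 16 ticks. That caps the loop at about 1.56 million bytes per second at 200 MHz, wherever it executes from.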

  • evanh Posts: 13,619
    edited 2022-04-01 07:03

    Your speeds are a little slow for the original FlexC code. I'm getting around 340 kB/s reading at 200 MHz sysclock.

    Try this sdmm.cc with its debug code turned on. It might tell us a little more about why.

  • iseries Posts: 1,274
    edited 2022-04-01 10:18

    @evanh ,

    Those tests were done using an old 16 MB SD card I had lying around. Kind of a worst-case test.

    Here it is with a new 32 GB SanDisk card rated at 130 MB/s:

    Entering terminal mode.  Press Ctrl-] to exit. 
     clkfreq = 200000000   clkmode = 0x10009fb
     Randfill ticks = 225062
     Mounting:  Writing 100000 bytes at 395 kB/s
     Reading 100000 bytes at 344 kB/s
     Matches!  :)
    

    And here is with the new code:

    Entering terminal mode.  Press Ctrl-] to exit. 
     clkfreq = 200000000   clkmode = 0x10009fb
     Randfill ticks = 225062
     Mounting:  Writing 100000 bytes at 880 kB/s
     Reading 100000 bytes at 781 kB/s
     Matches!  :)
    

    And here is with P2LLVM:

    Entering terminal mode.  Press Ctrl-] to exit. 
     clkfreq = 200000000   clkmode = 0x14cc8fb
     Randfill ticks = 6574977
     Mounting:  Writing 100000 bytes at 501 kB/s
     Reading 100000 bytes at 872 kB/s
     Matches!  :)
    

    Mike

    PS: most of the slowdown in FlexC comes from too long a delay before checking the ready state.

  • evanh Posts: 13,619
    edited 2022-04-01 12:51

    @iseries said:
    PS: most of the slow downs in flexc is because of too long a delay before checking ready state.

    Where is that in the source? Found it: wait_ready() in sdmm.cc.
    select() hits it a lot!

    EDIT: Testing that ... while there are some select() delays for block writes, for block reads select() returns on the first pass. 100% no pausing for wait_ready() through the whole file. And only one select() per file cluster too. So I don't see that as really impacting read performance.

  • evanh Posts: 13,619
    edited 2022-04-01 13:18

    Oh, now I see it. There's a duplicate of wait_ready()'s code inside of rcvr_datablock() ... which is used for every block read, one by one ... yep, that repeats ... at least one delay of 100 us preceding each block. More at cluster start.

  • evanh Posts: 13,619
    edited 2022-04-01 14:47

    Wow, you're totally right. Knocked the set 100 us delay down to 10 us which bumped me from 1560 kB/s up to 2250 kB/s, at 200 MHz sysclock.
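    A rough check on why a fixed per-block pause costs so much: at 1560 kB/s a 512-byte block takes about 320 us, so a 100 us pause is close to a third of it. The sketch below assumes exactly one pause per block and that everything else stays the same (the helper name is ours):

```c
/* Estimated read rate after changing a fixed per-block delay.
   Rates are in kB/s (kB = 1024 bytes), delays in microseconds.
   Assumes exactly one delay per 512-byte block. */
static double rate_after_delay_change(double rate_kBps, double old_us, double new_us)
{
    const double block_bytes = 512.0;
    double block_us = block_bytes / (rate_kBps * 1024.0) * 1e6;  /* time per block now */
    double new_block_us = block_us - old_us + new_us;           /* same work, shorter pause */
    return block_bytes / new_block_us * 1e6 / 1024.0;
}
```

    With those assumptions, going from 100 us to 10 us predicts roughly 2170 kB/s, in the same ballpark as the 2250 kB/s measured.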

    EDIT: And 1 us delays at 360 MHz: :D

     clkfreq = 360000000   clkmode = 0x10011fb
     Randfill ticks = 477062
     Mounting:  Writing 212000 bytes at 3185 kB/s
     Reading 212000 bytes at 4059 kB/s
     Matches!  :)
    

    Read averaging 10.8 sysclock ticks per bit. 45 MHz SPI clock.
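    Working that figure back from the 360 MHz result, and counting kB as 1024 bytes as the test program does: 4059 kB/s is about 33.3 Mbit/s, or about 10.8 sysclock ticks per bit. Inside a block the bit time is 8 ticks (360 MHz / 8 = 45 MHz SPI clock), so the remaining time presumably goes to commands, tokens, CRC bytes and filesystem work. As a small helper (the name is ours):

```c
/* Average sysclock ticks per transferred bit, from a rate in kB/s
   (kB = 1024 bytes, as printed by the speed test). */
static double avg_ticks_per_bit(double sysclk_hz, double rate_kBps)
{
    double bits_per_sec = rate_kBps * 1024.0 * 8.0;
    return sysclk_hz / bits_per_sec;
}
```

    The 200 MHz run at 2326 kB/s works out to about 10.5 ticks per bit.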

    And the 200 MHz report using 1 us delays:

     clkfreq = 200000000   clkmode = 0x10009fb
     Randfill ticks = 477070
     Mounting:  Writing 212000 bytes at 2010 kB/s
     Reading 212000 bytes at 2326 kB/s
     Matches!  :)
    
  • Wuerfel_21 Posts: 3,138
    edited 2022-04-01 14:21

    That delay is totally pointless, you can just read dummy bytes until you get a block start, no delay needed at all. Also, the only reason for a timeout IME is to handle sudden disconnect of the SD card or bung commands from higher up. A valid command will never time out.

  • evanh Posts: 13,619
    edited 2022-04-01 14:56

    Hmm, I hadn't really thought about why it was there. I see Eric's comments mention 500 ms and 100 ms timeouts, which are now wildly missed with the much smaller 1 us delays.

    Okay, reverted the delays back to 100 us for the old code ... but, in new code, replaced the ready loop with an actual timeout and no delays. eg:

        tmr = _getms();
        for(;;)
        {
            rcvr_mmc(d, 1);
            if (d[0] != 0xFF) break;
            if( _getms() - tmr >= 100 )  break; /* Wait for data packet in timeout of 100ms */
        }
    
  • iseries Posts: 1,274

    @evanh ,

    That's amazing. By using RDFAST and WRFAST you can double the input/output speed to the SD card. How can this be?

    It doesn't work with my old 16Meg memory card though:

    Entering terminal mode.  Press Ctrl-] to exit. 
     clkfreq = 200000000   clkmode = 0x10009fb
     Randfill ticks = 225070
     Mounting:  Writing 100000 bytes at 174 kB/s
     Reading 100000 bytes at 1480 kB/s
     Matches!  :)
    

    This can also mean that byte reads and writes are slow to hub memory.

    So if I switch to writing 32 bits at a time I will see a speed increase?

    Mike

  • evanh Posts: 13,619
    edited 2022-04-01 16:36

    @iseries said:
    @evanh ,

    That's amazing. By using RDFAST and WRFAST you can double the input/output speed to the SD card. How can this be.

    Not just using the FIFO. It's tricky to figure out the I/O pin latencies and work with them so as to remove that WAITX from the inner loop.

    There also needs to be enough lead time to allow for external latencies too. I make use of the fact that there is 8 sysclock ticks per bit to give me slack for this without needing to calculate a compensation based on sysclock frequency.

    It doesn't work with my old 16Meg memory card though:

    Entering terminal mode.  Press Ctrl-] to exit. 
     clkfreq = 200000000   clkmode = 0x10009fb
     Randfill ticks = 225070
     Mounting:  Writing 100000 bytes at 174 kB/s
     Reading 100000 bytes at 1480 kB/s
     Matches!  :)
    

    That's your slow card still working isn't it. That's normal. Slow SD cards are slow. EDIT: BTW, run it a second time and the write speed might be faster the second time.

    This can also mean that byte reads and writes are slow to hub memory.

    So if I switch to writing 32 bits at a time I will see a speed increase?

    I tried that at one stage but it failed badly. Didn't try to sort out why though.

  • @evanh said:

    This can also mean that byte reads and writes are slow to hub memory.

    So if I switch to writing 32 bits at a time I will see a speed increase?

    I tried that at one stage but it failed badly. Didn't try to sort out why though.

    You have to take care of the byte order. Just shifting bits into a long does not suffice: you'll end up with the first byte received in the last (most significant) byte of the long. A single MOVBYTS can fix this.
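    Here is a sketch of the problem and the fix in portable C. Shifting 32 bits MSB-first into one register (the TESTP/RCL pattern) leaves the first byte received in the most significant byte. A little-endian store, as to P2 hub RAM, then writes the bytes in reverse order. Reversing the bytes first (the job the MOVBYTS would do) restores the received order. The helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Shift 32 bits in MSB-first, one bit at a time, like TESTP + RCL. */
static uint32_t shift_in_msb_first(const uint8_t *bytes_on_wire)
{
    uint32_t v = 0;
    for (int i = 0; i < 32; i++) {
        int bit = (bytes_on_wire[i / 8] >> (7 - i % 8)) & 1;
        v = (v << 1) | (uint32_t)bit;
    }
    return v;
}

/* Reverse the four bytes of a word. */
static uint32_t reverse_bytes(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Little-endian store, independent of the host's byte order. */
static void store_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)v;
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);
    dst[3] = (uint8_t)(v >> 24);
}
```

    Receiving 11 22 33 44 yields the word 0x11223344. Stored directly, it lands in memory as 44 33 22 11; after reverse_bytes() it lands as 11 22 33 44.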

  • evanh Posts: 13,619
    edited 2022-04-01 17:53

    Cool. Should try it again I guess ... yep, that worked. :) Gained another 10% extra read speed at 200 MHz. Maybe 7% gain for write speed.

  • evanh Posts: 13,619
    edited 2022-04-02 07:35

    Ada,
    My quick hack there for 32-bit inner loop is split into two separate blobs of assembly with a simple logic decision as to which one gets used. I've now got a more robust logic solution that can use both paths but it also merges the two blobs into one. So I have my doubts as to its benefit because of the larger all-in-one blob size.

    Do you know if Fcache always reloads into cogRAM/lutRAM or is there some smarts to reuse an already loaded blob?

    EDIT: So far, testing isn't revealing any advantage either way. Larger blob doesn't seem to be any hindrance ...
    EDIT2: Here's what I've got now:

  • @evanh said:
    Ada,
    My quick hack there for 32-bit inner loop is split into two separate blobs of assembly with a simple logic decision as to which one gets used. I've now got a more robust logic solution that can use both paths but it also merges the two blobs into one. So I have my doubts as to its benefit because of the larger all-in-one blob size.

    Do you know if Fcache always reloads into cogRAM/lutRAM or is there some smarts to reuse an already loaded blob?

    As is, it always reloads, yes. But each additional instruction beyond the initial overhead just costs one cycle, and you're saving some by doing the branch inside the FCACHE, so yeah, the difference is minor.

    Also, to make it not explode with FCACHE disabled, I think an FCACHE_SIZE preprocessor symbol could be introduced...

  • evanh Posts: 13,619
    edited 2022-04-02 08:18

    Thanks, hehe, I'm on to smartpins now. If this works out then it won't even need Fcache at all. Maybe all done in C too.

  • evanh Posts: 13,619
    edited 2022-04-03 13:02

    Oh, wow, SD's SPI protocol is sensitive to, even unclocked, post-command trailing level of DI. Not very SPI compliant at all. Makes it tricky to keep the tx smartpin doing the right thing.

    The good news is it's now good enough to read and write blocks. Bad news is it corrupts the filesystem ...

    EDIT: Huh, I may have found a bug in the disk_write() function. The optimiser may be reordering the multiple functions within an if() condition. Namely this:

            if ((send_cmd(CMD24, sect) == 0)    /* WRITE_BLOCK */
                && xmit_datablock(buff, 0xFE))
    

    I split them up to work out which one was giving an error and it started working ... err, well, still not getting 100% valid data read back yet but it is writing a complete file now without destroying the filesystem ... Update: One of my three uSD cards is passing the speed test with matching data ...
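    One point worth noting: in C, && always evaluates its left operand first and skips the right operand when the left is false. A conforming optimiser should therefore not reorder send_cmd() and xmit_datablock() here. The split form is logically equivalent; its value is in making the failing step visible. Here is a sketch with hypothetical stubs that record the call order:

```c
#include <string.h>

/* Hypothetical stubs that log the order in which they are called. */
static char call_log[16];
static int  log_len;

static int send_cmd_stub(int result)       { call_log[log_len++] = 'C'; return result; }
static int xmit_datablock_stub(int result) { call_log[log_len++] = 'X'; return result; }

/* Combined form, as in disk_write(): 1 means the block was written. */
static int write_combined(int cmd_result, int xmit_ok)
{
    return (send_cmd_stub(cmd_result) == 0) && xmit_datablock_stub(xmit_ok);
}

/* Split form: same logic, but each step can be checked or logged. */
static int write_split(int cmd_result, int xmit_ok)
{
    if (send_cmd_stub(cmd_result) != 0)
        return 0;               /* command rejected: data block never sent */
    if (!xmit_datablock_stub(xmit_ok))
        return 0;               /* data block not accepted */
    return 1;
}

static void reset_log(void) { log_len = 0; memset(call_log, 0, sizeof call_log); }
```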

  • Wuerfel_21 Posts: 3,138
    edited 2022-04-04 00:38

    Well, I have created a PR to define __HAVE_FCACHE__ when FCACHE is available. That should solve the issue of making stuff work at -O0
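    Here is a sketch of how a driver might use that symbol to pick its fast path at compile time. The function and macro names are hypothetical. The FCACHE branch is plain C here because the real one would be flexspin inline assembly:

```c
#include <stdint.h>
#include <string.h>

/* Portable path: a plain C loop (here it just copies from a simulated
   receive buffer so the sketch can run on a PC). */
static void rcvr_block_portable(uint8_t *dst, const uint8_t *rx, unsigned n)
{
    memcpy(dst, rx, n);
}

#ifdef __HAVE_FCACHE__
/* FCACHE path: a real driver would put its FCACHE'd inline-assembly
   loop here; this placeholder defers to the portable version. */
static void rcvr_block_fcache(uint8_t *dst, const uint8_t *rx, unsigned n)
{
    rcvr_block_portable(dst, rx, n);
}
#define RCVR_BLOCK rcvr_block_fcache
#define RCVR_PATH  "fcache"
#else
#define RCVR_BLOCK rcvr_block_portable
#define RCVR_PATH  "portable"
#endif
```

    Built with -O0, or with any compiler that doesn't define the symbol, the code falls back to the portable loop instead of failing.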

  • evanh Posts: 13,619
    edited 2022-04-04 00:50

    Thanks, that was a little hairy.

    I think I've solved the smartpins implementation too. I changed it from SPI clock mode 0 to mode 3. The clock pin now idles high. All three cards are passing the speed test - only tested at 10 MHz sysclock ...

  • Aaaand it's on master. That was quick.

  • evanh Posts: 13,619
    edited 2022-04-04 13:25

    Got a problem detecting SPI mode on one card at high clock rate. It switches/detects and works first run, but rebooting the Prop2 with that SD card still powered in SPI mode then fails to detect. Amusingly it's the one that was happy in mode 0.

    EDIT: Oh, I guess it should be doing those steps at a low clock rate ...
    EDIT2: hmm, might be a latencies thing ... hmm, yeah, I guess I'm pushing the limits now ...
    EDIT3: Grrr, there's a missing pull-up on the Eval Board. DO is meant to have one according to SD specs. Important for idle detection. Good news is I can substitute with the rx pin's drive controls ... doesn't seem to make it any faster though.

    EDIT4: Oh, I wasn't expecting this:

       clkfreq = 200000000   clkmode = 0x10009fb
    data1 = 0xd6f4  data2 = 0x25d94   Randfill ticks = 225070
     Written 100000 of 100000 bytes at 3537 kB/s
     Read 100000 of 100000 bytes at 4767 kB/s  Matches!  :)
    

    EDIT5: Ah, here we go:

     Read 212000 of 212000 bytes at 4999 kB/s  Mis-match! :(  211968
    data1 = 0xd878  data2 = 0x41498   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3829 kB/s
     Read 212000 of 212000 bytes at 4954 kB/s  Matches!  :)
    data1 = 0xd878  data2 = 0x41498   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3814 kB/s
     Read 212000 of 212000 bytes at 4997 kB/s  Mis-match! :(  211968
    data1 = 0xd878  data2 = 0x41498   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3673 kB/s
     Read 212000 of 212000 bytes at 4937 kB/s  Mis-match! :(  211968
    data1 = 0xd878  data2 = 0x41498   Randfill ticks = 477070
     Written 212000 of 212000 bytes at 3697 kB/s
     Read 212000 of 212000 bytes at 4969 kB/s  Mis-match! :(  211968
    data1 = 0xd878  data2 = 0x12698   Randfill ticks = 45070
     Written 20000 of 20000 bytes at 2357 kB/s
     Read 20000 of 20000 bytes at 4308 kB/s  Mis-match! :(  19968
    data1 = 0xd878  data2 = 0x12698   Randfill ticks = 45070
     Written 20000 of 20000 bytes at 1839 kB/s
     Read 20000 of 20000 bytes at 4303 kB/s  Matches!  :)
    data1 = 0xd878  data2 = 0x12698   Randfill ticks = 45070
    

    PS: Those results are 200 MHz and sysclock/4, ie: 50 MHz SPI clock. I was surprised it worked at all at sysclock/4. The SPI clock config to prevent glitches on the DI pin means I can't setup the rx smartpin to change phase like I would with a regular SPI memory.

  • evanh Posts: 13,619
    edited 2022-04-04 16:45

    Soldered up some wires for powering and programming an Edge Card. Plugged it all together and slotted a uSD card to see if the tests were any better there ... Nope.

    First thing I managed to do was supply 24 Volts to the Edge Card. Didn't seem to hurt it luckily.

  • evanh Posts: 13,619
    edited 2022-04-05 14:24

    That's surprising. The Eval Board is faster than the Edge Card. Something is messing with data out from the SD card. Maybe reflections. Dunno, way too fast for the scope.

    Ouch! And it's bad enough to mess with the optimised bit-bashed code I posted earlier too. Bugger, 200 MHz sysclock is all it can do on the Edge. :(

    I'll be damned! The Sandisk Extreme is fine all the way up to max test frequency of 360 MHz! Drive strength clearly makes a difference.

    I better try reading the SD standard to see what queries can be made for speed limits ...

    PS: On the current state of the smartpins development: I've mostly solved the card detection issue on reboot. It just needed a small delay added after the first CMD0 to wait for switching out of transfer mode. It was one of those it-never-happened-with-debug-enabled situations, because the time required to report status was enough to make the SD card happy.

    There still is occasional issue after I've had massive errors due to bad timings and then need to power down to recover.
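    For reference on the CMD0 step: the card starts in SD mode, so CMD0 must carry a valid CRC7 (and CMD8 always checks it). That is why CMD0 is normally sent as 40 00 00 00 00 95. Here is a sketch of building the 6-byte command frame with the SD CRC7 polynomial x^7 + x^3 + 1; the function names are ours:

```c
#include <stdint.h>
#include <stddef.h>

/* CRC7 as used by SD commands: polynomial x^7 + x^3 + 1, MSB first. */
static uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;                    /* CRC MSB moves to bit 7 */
            if ((b ^ crc) & 0x80)         /* feedback = data bit ^ CRC MSB */
                crc ^= 0x09;
            b <<= 1;
        }
    }
    return crc & 0x7f;
}

/* 6-byte SPI-mode command: 01 + 6-bit index, 32-bit argument MSB first,
   then CRC7 followed by the end bit. */
static void sd_build_cmd(uint8_t frame[6], uint8_t index, uint32_t arg)
{
    frame[0] = (uint8_t)(0x40 | (index & 0x3f));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 1);
}
```

    CMD8 with argument 0x1AA comes out as 48 00 00 01 AA 87 the same way.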

  • evanh Posts: 13,619
    edited 2022-04-06 16:45

    Doing some more clean up today ... and after discovering the above bit-bashing issue I figured it'd also be worth bringing that method up to speed with the other fixes I'd done while developing the smartpins method. And voila, it works fully on the Edge Card. :) I had to adjust rx phase to suit clock idle high instead of low, but not much else other than remove the smartpin init code.

    So both methods are getting pretty robust. Oops, spoke too soon. Timings too tight in the new smartpins method ...

    Update: Here's the latest code. It's set to bit-bashed. I'd like to get others testing the revised bit-bashed method.

  • evanh Posts: 13,619
    edited 2022-04-07 09:24

    Eliminated the need for Fcache - without clock stretching within a block transfer. And in the process found out I'd wrongly assumed SD was sensitive to trailing DI level when it's not.

    So now there is a possibility of adjusting rx phase at faster clock rates and solidly hitting 50 MHz SPI clock. Or maybe not. It's way overkill for the size of RAM.

    EDIT: Scratching my head lots on this tight timings issue. It looks like the main factor is actually the smartB clock input for both tx and rx smartpins. It seems that registering just the clock pin irons out a lot of inconsistencies. It also ups the lag effect out to +5 sysclock ticks on the tx smartpin. I'd previously avoided even trying it because of the extra lag.

  • evanh Posts: 13,619
    edited 2022-04-09 02:25

    Okay, this time. Smartpin solution is enabled again.

    EDIT: Oops, left debug enabled. Fixed now.
    Updated sdmm.cc: Some minor tweaks and comments added.
    Updated speedtest.c: Removed the spin2 library code.

  • iseries Posts: 1,274

    @evanh ,

    Looking at the code, I think it would be better to have two sets of code. There is a block read and write section that could have its own block read/write functions.

    Leave the simple 1- and 2-byte send/receive functions simple, as not much speed can be gained there, and build block read/write functions into the block code.

    static
    int rcvr_datablock (    /* 1:OK, 0:Failed */
        BYTE *buff,         /* Data buffer to store received data */
        UINT btr            /* Byte count */
    )
    {
        BYTE d[2];
        UINT tmr, tmout;
    
        tmr = _cnt();
        tmout = _clockfreq() >> 3;  // 125 ms timeout
        for(;;) {
            rcvr_mmc( &d[0], 1 );
            if( d[0] != 0xFF )  break;
            if( _cnt() - tmr >= tmout )  break;
        }
        if (d[0] != 0xFE) return 0;     /* If not valid data token, return with error */
    
        rcvr_mmc_block(buff, btr);          /* Receive the data block into buffer */
        rcvr_mmc(d, 2);             /* Discard CRC */
    
        return 1;               /* Return with success */
    }
    

    In the above code, use the block function to read N blocks.
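    On the timeout in that loop: _cnt() - tmr >= tmout stays correct when the 32-bit system counter wraps, because unsigned subtraction is modulo 2^32. Also, _clockfreq() >> 3 is one eighth of a second, the 125 ms in the comment. Here is a host-side sketch with the counter values passed in explicitly (the helper names are hypothetical):

```c
#include <stdint.h>

/* True once at least `timeout` ticks have elapsed since `start`,
   even if the 32-bit counter wrapped in between. */
static int timed_out(uint32_t now, uint32_t start, uint32_t timeout)
{
    return (uint32_t)(now - start) >= timeout;
}

/* Ticks in a timeout of `ms` milliseconds at `clkfreq` Hz. */
static uint32_t ms_to_ticks(uint32_t clkfreq, uint32_t ms)
{
    return (uint32_t)((uint64_t)clkfreq * ms / 1000u);
}
```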

    Mike

  • iseries Posts: 1,274

    @ersmith ,

    I have made a lot of changes to your vfs_sdcard functions and moved 99% of the code you put into FatFs back out of it. They have come out with a newer version, and updating only required copying in the new files and changing two or three lines of code in the FatFs system.

    Along with Evan's changes, we could have very efficient SD card support.

    The only thing is that the vfs_sdcard parameters seem to be a sticking point. On the P1 we always had to provide them, and I don't see a problem with passing them in. I know the P2 has the SD card mounted and would use the same pins, but passing them in seems simple enough.

    I also added long file name support and date/time, file size, and entry type to the directory functions which aids in finding files on the SD card.

    Let me know what you think.

    Mike

  • evanh Posts: 13,619
    edited 2022-04-07 17:27

    I've not tried to redesign how Eric's done the SD protocol versus block level functionality. In fact, you're more than welcome to work on that side of the code.

    BTW, Eric supports SDv1/MMC variable length blocks.

    EDIT: On that note, I'm procrastinating on a project I started with the Prop1. Time to get back to it. I'm much more comfortable with the Prop2 but the Prop1 is more than capable for the job and the DIP40 package is surprisingly convenient for the layout.

  • ersmith Posts: 5,442

    @iseries said:
    I have made a lot of changes to your vfs_sdcard functions and move 99% of the code you put into FatFs out. They have come out with a newer version and to update it only required coping in the new files and changing two or three lines of code in the FatFs system.

    I'd certainly be interested to see what you've done. The SD card driver can definitely use some improvements (and I hope to incorporate @evanh 's changes when they're ready, as well).

    The only thing is the vfs_sdcard parameters seems to be a sticking point. On the P1 we always had to provide them and don't see a problem with passing them in.

    There's a vfs_sdcardx where the pin parameters are provided. Are there other parameters that we need? If so, we may also be able to provide them via mount() (I've recently changed the vfs code so the mount() parameter gets passed to an initialization routine in the VFS layer).

    I also added long file name support and date/time, file size, and entry type to the directory functions which aids in finding files on the SD card.

    I think we'll want to leave the long file name support as optional (I think it can be enabled by a compile time define). Extending the directory functions is probably a good idea in principle, but we'll need to make sure the other file systems can support it.

    Thanks,
    Eric

  • Wuerfel_21 Posts: 3,138
    edited 2022-04-07 23:13

    @ersmith said:

    @iseries said:
    I also added long file name support and date/time, file size, and entry type to the directory functions which aids in finding files on the SD card.

    I think we'll want to leave the long file name support as optional (I think it can be enabled by a compile time define). Extending the directory functions is probably a good idea in principle, but we'll need to make sure the other file systems can support it.

    Oh, that'd be useful indeed. readdir is a bit silly as-is because AFAICT you have to separately stat each file to figure out what the entry actually is.

    Also, there's a weird issue I had to work around to get scanning subdirectories to work in megayume: The SD root directory can only be read if a trailing slash is given, but subdirectories only if it is not given.
    Relatedly, I think the true root directory / can not be enumerated at all?
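    Here is a sketch of that workaround as a hypothetical helper that formats a path before opendir(): the mount root gets a trailing slash and subdirectories do not. The /sd mount point matches the test program at the top of the thread. The rule itself is as reported in this post, not checked against the flexspin source:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a directory path for opendir() under the
   reported rule that the SD root only enumerates with a trailing slash
   ("/sd/") while subdirectories only enumerate without one ("/sd/music").
   Returns 0 on success, -1 if the result does not fit in `out`. */
static int sd_dir_path(char *out, size_t outsz, const char *mount, const char *dir)
{
    size_t mlen = strlen(mount);
    while (mlen > 0 && mount[mlen - 1] == '/')
        mlen--;                                  /* "/sd/" -> "/sd" */

    while (*dir == '/')
        dir++;                                   /* drop leading slashes */
    size_t dlen = strlen(dir);
    while (dlen > 0 && dir[dlen - 1] == '/')
        dlen--;                                  /* drop trailing slashes */

    int n;
    if (dlen == 0)                               /* the mount root itself */
        n = snprintf(out, outsz, "%.*s/", (int)mlen, mount);
    else                                         /* a subdirectory */
        n = snprintf(out, outsz, "%.*s/%.*s", (int)mlen, mount, (int)dlen, dir);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```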
