Improving SD Card performance

pik33 · 2022-04-08 06:18

The SD root directory can only be read if a trailing slash is given, but subdirectories only if it is not given.
Relatedly, I think the true root directory / can not be enumerated at all?

I didn't notice this slash / no-slash difference in Basic, but I was unable (?) to list /, so I decided not to go there in the player (and forgot about it). There was also a problem with the missing .. (up-dir) entry in the listing; I had to add it to the directory list artificially. Getting the current directory also did not work as I expected. Maybe that is corrected now, but the player keeps track of the current directory itself.

I have LFN enabled in the compiler (-DFF_USE_LFN). This is the 21st century and LFNs are everywhere, so maybe this should be the default option, with -DFF_DISABLE_LFN for anyone who doesn't want it and needs the several KB of HUB instead. I even added LFN reading to the P1's KyeFAT for use in the P1 player several years ago.

What should also be available, and would be useful for file handling in an application like the multimedia player:

file length and date/time
file delete

iseries · 2022-04-08 21:29

@ersmith ,

Test program for SD card:

#include <stdio.h>
#include <stdint.h>     // uint32_t
#include <time.h>       // struct tm, mktime()
#include <propeller.h>
#include <sys/vfs.h>
#include <sys/time.h>


uint32_t randfill(uint32_t *, size_t);
int  compare(uint32_t *, uint32_t *, size_t);

#define PIN_SS   23
#define PIN_MISO 20
#define PIN_CLK  21
#define PIN_MOSI 22

uint32_t  data1[25000];
uint32_t  data2[25000];
vfs_t *fs;
struct tm tv;


int main(int argc, char** argv)
{
    FILE  *fh;
    uint32_t  ticks;
    time_t t;
    struct timeval x;

    tv.tm_year = 2022 - 1900;
    tv.tm_mon = 3;
    tv.tm_mday = 7;
    tv.tm_hour = 6;
    tv.tm_min = 0;
    tv.tm_sec = 0;
    t = mktime(&tv);
    x.tv_sec = t;
    x.tv_usec = 0;

    settimeofday(&x, 0);

    printf( " clkfreq = %d   clkmode = 0x%x\n", _clockfreq(), _clockmode() );
    printf( " Randfill ticks = %d\n", randfill( data1, sizeof(data1) ) );

    printf( " Mounting: " );
    fs = _vfs_open_sd(PIN_SS, PIN_CLK, PIN_MOSI, PIN_MISO);
    //fs = _vfs_open_sdm(0, PIN_SS, PIN_CLK, PIN_MOSI, PIN_MISO);
    mount( "/sd", fs);

    if( fh = fopen( "/sd/speed2.bin", "w" ) )
    {
        ticks = _getms();
        fwrite( data1, 1, sizeof(data1), fh );
        fclose( fh );
        ticks = _getms() - ticks;
        printf( " Writing %u bytes at %u kB/s\n", sizeof(data1), (sizeof(data1) * 1000 / ticks + 512) >> 10 );
    } else  printf( " SD card write error!\n" );

    if( fh = fopen( "/sd/speed2.bin", "r" ) )
    {
        ticks = _getms();
        fread( data2, 1, sizeof(data2), fh );
        fclose( fh );
        ticks = _getms() - ticks;
        printf( " Reading %u bytes at %u kB/s\n", sizeof(data2), (sizeof(data2) * 1000 / ticks + 512) >> 10 );
        if( compare( data1, data2, sizeof(data2) ) )  printf( " Matches!  :)\n" );
        else    printf( " Mis-matches!  :(\n" );
    } else  printf( " SD card read error!\n" );

    while (1)
    {
        _waitms(500);
    }
}

uint32_t  randfill( uint32_t *addr, size_t size )
{
    uint32_t  ticks;

    size >>= 2;
    ticks = _cnt();
    do {
        *(addr++) = _rnd();
    } while( --size );

    return( _cnt() - ticks );
}


int  compare( uint32_t *addr1, uint32_t *addr2, size_t size )
{
    uint32_t  pass = 1;

    size >>= 2;
    do {
        if( *(addr1++) != *(addr2++) )  pass = 0;
    } while( --size );

    return( pass );
}

Mike

evanh · 2022-04-09 02:27

I've updated the speed test program a little along the way.
- speed calculation is now much higher precision
- compare now reports where the mismatch occurred
https://forums.parallax.com/discussion/comment/1537616/#Comment_1537616
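
As a rough illustration of why tick-based timing helps with short runs, here is a minimal sketch. It is not the code from the linked post. It assumes the elapsed time is measured in sysclock ticks (for example the difference of two _cnt() readings) and that the clock frequency is known; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: transfer rate in bytes per second from an elapsed
 * tick count. The 64-bit intermediate avoids overflow of bytes * clkfreq. */
uint32_t bytes_per_second(uint32_t bytes, uint32_t ticks, uint32_t clkfreq)
{
    assert(ticks != 0);   /* a zero-length interval has no defined rate */
    return (uint32_t)(((uint64_t)bytes * clkfreq) / ticks);
}

/* Same rounding to kB/s (units of 1024 bytes) as the original test program. */
uint32_t kbytes_per_second(uint32_t bytes, uint32_t ticks, uint32_t clkfreq)
{
    return (bytes_per_second(bytes, ticks, clkfreq) + 512) >> 10;
}
```

For example, 2048 bytes in 850,000 ticks at 250 MHz (3.4 ms) works out to about 602,000 bytes/s by this formula, whereas a millisecond counter that reads 3 would report about 683,000 bytes/s.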

I'd like to see some more test results from others using sysclock/6. The source code needs to be changed to set that divider though ... at the end of disk_initialize() in the sdmm.cc file, replace the code with this:

// Performance option (Up to 50 MHz SPI clock)
    if( tmr <= 150_000_000 )  tmout = 0x0002_0004;  // sysclock/4
    else if( tmr <= 250_000_000 )  tmout = 0x0002_0005;  // sysclock/5
    else if( tmr <= 350_000_000 )  tmout = 0x0002_0006;  // sysclock/6
    else  tmout = 0x0003_0008;  // sysclock/8

The report of interest is for 350 MHz sysclock (58 MHz SPI clock).
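
To make the arithmetic behind that figure explicit, here is a small standalone sketch that mirrors the thresholds in the snippet above and derives the resulting SPI clock. The function names are hypothetical and not part of sdmm.cc, and it assumes the low 16 bits of the setting hold the divider (the clock period in sysclock ticks).

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the thresholds in the snippet above (hypothetical helper). */
uint32_t pick_clock_setting(uint32_t sysclock_hz)
{
    if (sysclock_hz <= 150000000u) return 0x00020004u;  /* sysclock/4 */
    if (sysclock_hz <= 250000000u) return 0x00020005u;  /* sysclock/5 */
    if (sysclock_hz <= 350000000u) return 0x00020006u;  /* sysclock/6 */
    return 0x00030008u;                                 /* sysclock/8 */
}

/* Assumption: the low 16 bits are the divider, so SPI clock = sysclock / divider. */
uint32_t spi_clock_hz(uint32_t sysclock_hz)
{
    uint32_t divider = pick_clock_setting(sysclock_hz) & 0xFFFFu;
    assert(divider != 0);
    return sysclock_hz / divider;
}
```

With these thresholds, 250 MHz gives 250/5 = 50 MHz and 350 MHz gives 350/6, roughly 58.3 MHz.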

iseries · 2022-04-09 10:09

@evanh ,

I wasn't using the speed test as a benchmark but rather a line in the sand to compare results of changes to the code.

Mike

iseries · 2022-04-09 10:12

Here is some code to show the directory functions and file operations:

#define PIN_MISO 20
#define PIN_CLK  21
#define PIN_MOSI 22
#define PIN_SS   23

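#include <sys/types.h>
#include <stdio.h>
#include <stdint.h>     // uint32_t
#include <time.h>       // struct tm, mktime(), asctime()
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>     // read(), write(), close()
#include <dirent.h>     // DIR, opendir(), readdir(), closedir()
#include <errno.h>      // errno
#include <string.h>
#include <propeller.h>
#include <sys/vfs.h>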

void dodir(void);

char Buffer[512];
struct tm tv;
vfs_t *fs;


int main(int argc, char** argv)
{
    uint32_t w;
    int i;
    int fd;
    time_t t;
    struct timeval x;

    tv.tm_year = 2022 - 1900;
    tv.tm_mon = 2;
    tv.tm_mday = 7;
    tv.tm_hour = 6;
    tv.tm_min = 0;
    tv.tm_sec = 0;
    t = mktime(&tv);
    x.tv_sec = t;
    x.tv_usec = 0;

    settimeofday(&x, 0);

    printf("Mounting...\n");
    fs = _vfs_open_sd(PIN_SS, PIN_CLK, PIN_MOSI, PIN_MISO);
    //fs = _vfs_open_sdm(0, PIN_SS, PIN_CLK, PIN_MOSI, PIN_MISO);
    i = mount("/sd", fs);

    if (i != 0)
    {
        printf("mount failed\n");
        while (1)
            _waitms(1000);
    }

    dodir();

    printf("Opening file OneCall.txt\n");

    fd = open("/sd/OneCall.txt", O_RDONLY, 0);
    if (fd < 0)
    {
        printf("File Not Found!\n");
        while (1)
            _waitms(1000);
    }

    i = read(fd, Buffer, 256);
    if (i < 0)
        i = 0;  // read failed: terminate an empty string instead of writing Buffer[-1]

    Buffer[i] = 0;
    printf("Buffer: %s\n", Buffer);

    close(fd);

    printf("Writing log file\n");

    fd = open("/sd/logfile.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);  // mode is octal
    if (fd < 0)
    {
        printf("File Creation Error!\n");
        while (1)
            _waitms(1000);
    }

    i = write(fd, "Test Data only y\n", 17);  // 17 characters; 18 would also write the NUL terminator

    printf("Wrote %d\n", i);

    close(fd);

    printf("Done\n");


    while (1)
    {
        _waitms(1000);
    }
}

void dodir()
{
    DIR *dir;
    struct dirent *ent;

    dir = opendir("/sd/");

    if (dir == NULL)
    {
        printf("Directory Failed!(%d)\n", errno);
        while (1)
            _waitms(1000);
    }

    printf("Mounted..\n");

    while (ent = readdir(dir))
    {
        tv.tm_year = (ent->d_date >> 9) + 80;
        tv.tm_mon = ((ent->d_date >> 5) & 0x0f) - 1;
        tv.tm_mday = (ent->d_date) & 0x01f;
        tv.tm_hour = (ent->d_time >> 11);
        tv.tm_min = (ent->d_time >> 5) & 0x3f;
        tv.tm_sec = ((ent->d_time) & 0x1f) * 2;  // FAT stores seconds in 2-second units

        if (ent->d_type & ATTR_DIRECTORY)
            printf("d  ");
        else
            printf("   ");

        printf("%s %d %s", ent->d_name, ent->d_size, asctime(&tv));
    }

    closedir(dir);

    printf("\n");
}
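
The bit fields unpacked in dodir() follow the FAT directory-entry timestamp layout. A minimal standalone sketch of that decoding, independent of the flexspin dirent fields (the function name fat_to_tm is hypothetical; the seconds field is stored in 2-second units, so it is doubled here):

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Decode FAT packed date/time words into a struct tm.
 * date: bits 15-9 years since 1980, bits 8-5 month (1-12), bits 4-0 day.
 * time: bits 15-11 hours, bits 10-5 minutes, bits 4-0 seconds / 2. */
void fat_to_tm(uint16_t date_word, uint16_t time_word, struct tm *out)
{
    struct tm zero = {0};

    assert(out != NULL);
    *out = zero;
    out->tm_year = (date_word >> 9) + 80;          /* struct tm counts from 1900 */
    out->tm_mon  = ((date_word >> 5) & 0x0f) - 1;  /* struct tm months are 0-11 */
    out->tm_mday = date_word & 0x1f;
    out->tm_hour = time_word >> 11;
    out->tm_min  = (time_word >> 5) & 0x3f;
    out->tm_sec  = (time_word & 0x1f) * 2;
    out->tm_isdst = -1;                            /* DST status unknown */
}
```

Note that tm_wday and tm_yday are left at zero; passing the result through mktime() would fill them in before calling asctime().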

Mike

evanh · 2022-04-09 10:14

Yes, same. But it wasn't very precise at small durations. Some tests were only a few milliseconds, so I fixed that anyway.

Wuerfel_21 · 2022-04-09 10:28

Just tried the speed test (at its default 250 MHz, current flexspin master, -O1 (-O2 chokes on something)).
Seems busted to me, on P2EDGE-32MB with a SanDisk Ultra 16GB.

$ loadp2 -p COM7 -t -b 230400 sdfat-speedtest.binary
( Entering terminal mode.  Press Ctrl-] to exit. )
   clkfreq = 250000000   clkmode = 0x10418fb
addr1 = 0xd278  addr2 = 0x25918   Randfill ticks = 225070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x25918   Randfill ticks = 225070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x25918   Randfill ticks = 225070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x25918   Randfill ticks = 225070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x40e98   Randfill ticks = 477070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x40e98   Randfill ticks = 477070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x40e98   Randfill ticks = 477070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x40e98   Randfill ticks = 477070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x40e98   Randfill ticks = 477070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x12098   Randfill ticks = 45070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x12098   Randfill ticks = 45070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x12098   Randfill ticks = 45070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x12098   Randfill ticks = 45070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0x12098   Randfill ticks = 45070

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xda48   Randfill ticks = 4574

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xda48   Randfill ticks = 4574

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xda48   Randfill ticks = 4574

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xda48   Randfill ticks = 4574

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xda48   Randfill ticks = 4574

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xd340   Randfill ticks = 518

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xd340   Randfill ticks = 518

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xd340   Randfill ticks = 518

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xd340   Randfill ticks = 518

 SD card write error!

 SD card read error!
addr1 = 0xd278  addr2 = 0xd340   Randfill ticks = 518

 SD card write error!

 SD card read error!

I've tried building megayume with it and it seems to work fine in there (at its 325-ish MHz) for some reason. Significant speedup there.

evanh · 2022-04-09 10:39

Thanks Ada ...

Wuerfel_21 · 2022-04-09 11:38

In particular, it seems that CSE chokes on P_SYNC_IO for some reason...

$ flexspin -2 -O1,cse sdfat-speedtest.c
Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
Version 5.9.10-beta-v5.9.9-155-g49b31c5d Compiled on: Apr  9 2022
sdfat-speedtest.c
fopen.c
fwrite.c
mount.c
fmt.c
posixio.c
isatty.c
fputs.c
fflush.c
bufio.c
errno.c
posixio.c
fatfs_vfs.c
|-ff.cc
bufio.c
strncmp.c
ioctl.c
fatfs_vfs.c
|-ff.cc
vfs.c
strcpy.c
strncpy.c
strncat.c
memset.c
sdmm.cc
stat.c
malloc.c
strcpy.c
memset.c
sdfat-speedtest.p2asm
C:/zeug/include/filesys/fatfs/sdmm.cc:628: error: unknown identifier P_SYNC_IO used in memory reference check
Done.

evanh · 2022-04-09 11:50

Compiler bug I presume.

Okay, I've identified an improvement for my code too: Remove the P_SYNC_IO from the tx pin mode at higher sysclocks for small dividers ...

I should have done this earlier. There is definitely an extra tick of lag being injected in the tx pin timing. Here's a capture of the prop2 output timings using the existing tx mode settings, running at 350 MHz sysclock with a clock divider of 6 (0x0002_0006):

Orange trace is SPI clock
Blue trace is tx data (DI)

So, the second falling edge of clock is 6 sysclocks from the first. The tx output should match this edge but is approx. 3 ns behind - one sysclock tick!

And here's the same test again but with P_SYNC_IO removed from tx pin mode:

I guess loading on the clock pin impacts this ... but shouldn't be more than +1 lag. None of these should be enough to create errors. There's a space of four sysclock ticks to the subsequent rising clock edge.

Worth further real tests anyway ...

evanh · 2022-04-09 12:50

Ah, but what about bounce/reflections crossing VIO/2 threshold incorrectly ... time for P_SCHMITT_A ...

Try this everyone:
Updated: A little looser timings. Run tests at 150 MHz, 200 MHz, 280 MHz, and 350 MHz.

evanh · 2022-04-09 13:30

Nothing's perfect I'm afraid. The propagation times of I/O signals synchronising with the internal sysclock are creating havoc. I'm definitely getting frustrated. I don't know what's the slowest area in the I/O. It might not be the I/O pins themselves.

That spreadsheet with the propagations listed had some really large values stated, like 8.0 to 10.0 ns. My best guess is those numbers were referring to the routes from each pin into the first stage I/O flops, with I/O registering turned off. But even given that scenario those are really big numbers for a 200 MHz rated part. I would have expected nothing larger than 5.0 ns.

End result is that more leeway is needed.

Please run the speed test at 250 MHz and at 350 MHz using the latest code above. Those should be the two worst cases.

evanh · 2022-04-09 16:30

Here's a demo of what's going on. This is 155 MHz sysclock, with SPI clock divider of 4, 0x0002_0004. The amount of lag is four sysclock ticks (Best a smartpin can do) at slower clock rates but as you can see it's jittering between four and five.

And here's same again but at 160 MHz sysclock. It's now cleanly at five lag.

evanh · 2022-04-09 16:43

Using P_SYNC_IO on the clock pin mode helps by raising the sensitive jitter threshold from the 150 MHz area to the 250 MHz area, but it doesn't eliminate the problem. And adding P_SCHMITT_A on the clock pin moves it down a little to 240 MHz.

Ha, it's just dawned on me it meets the 200 MHz rating when using P_SYNC_IO. Anything more is an overclock.

EDIT: Updated again - https://forums.parallax.com/discussion/comment/1537713/#Comment_1537713

evanh · 2022-04-11 12:29

And here's the other part of the puzzle:

Orange is SPI clock from the prop2.
Blue is SPI DO back to the prop2.

Ignore the filled blue section. The important factors are the phase lag from falling clock edge to data edge, and the curved slope of the data. Both these combine to add latency to the data arriving at the rx smartpin. This example still works reliably. But this example is only 100 MHz sysclock (With SPI clock divider of 4).

A sync serial smartpin is limited in its ability to handle such a phase shift. Namely, it samples the data pin either the sysclock tick after rising clock detection, X[5]=1, or it samples on the prior sysclock tick, X[5]=0.

Both those two options fail quite early in the 200-350 MHz frequency range. There is, however, one bonus tick that can be added to the sampling phase. The clock input, smartB, to the rx smartpin can also have a delay added to it, thereby giving the external clock a head start. I didn't work this out until having tested it. This is done by registering the clock pin.

Ooo, ah, I just realised I might be able to remove the effect that this registering also imposes on the tx smartpin ... bugger, nope. The routes exist but the mode select bits don't cater for it. I thought maybe the partnered pin's low level pinB could be directed to IN but the only way that can happen is via the comparator against pinA, which is in use as the SD card enable.

PS: With the setup in this example (clock pin registered, X[5]=1, and only 100 MHz sysclock), the rx smartpin sampling point will be the rising clock edge + 2 sysclock ticks. Which is the beginning of the following falling clock edge. One whole SPI clock cycle of grace. Perfect for a divider of 4.

Different story as the frequency climbs and the rising clock edge takes longer than one sysclock tick.

PPS: I guess it goes without saying, but the drive strength of the SD data out is quite weak. I presume this is by design to reduce noise while meeting the 25 MHz requirement of basic speed. That trace is from the Adata but the Sandisk Extreme's trace looks exactly the same. (EDIT: It actually looks more attenuated!) Attenuation is going to be bad at 50 MHz SPI clock rate. So, it kind of looks like the SD card has to be told to up its game if we want better performance.

evanh · 2022-04-13 05:04

Started investigating using CMD6 to switch to High-Speed mode - https://forums.parallax.com/discussion/comment/1537797/#Comment_1537797

Some experimenting later ... It's not terribly great news. Although CMD6 is mandatory for SPI interface, the parameters can all be locked to base defaults. Which most are on most cards. The Sandisk in particular has everything locked to default for SPI.

And those that do support switching to high-speed mode seem to just shift DO to the rising clock edge. The drive strength looks the same at first glance ...

Here's an example debug output, for the Apacer card: CMD6 GET status block: 00 64 80 01 80 01 80 01 80 01 80 01 80 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The 00 64 is 100 mA (or 360 mW). This value doubles when HS mode is set.
The five repeating 80 01 means base default support for each of those five parameters.
The final 80 03 means it supports both the base standard-speed and also high-speed.

The Sandisk's status is this: CMD6 GET status block: 00 64 80 01 80 01 80 01 80 01 c0 01 80 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
And attempts to tell it to use anything other than base settings result in error responses and no change of mode.

PS: The c0 01 means support for base and "Vendor Specific". That particular parameter is for changing the "command system". To me that says proprietary documentation is needed. And it probably has nothing to do with interface speed.
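
The decoding described above can be sketched in a few lines of C. This is only an illustration under the layout given in this post: bytes 0-1 are the maximum current in mA (big-endian), followed by six 16-bit support masks, of which the last (bytes 12-13) is the bus-speed one with bit 1 meaning high-speed. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First 14 bytes of the CMD6 status block, as described in the post. */
typedef struct {
    uint16_t max_current_ma;
    uint16_t support[6];      /* in the order they appear in the block */
} cmd6_status_t;

cmd6_status_t cmd6_parse(const uint8_t *block, size_t len)
{
    cmd6_status_t s;

    assert(block != NULL && len >= 14);
    s.max_current_ma = (uint16_t)((block[0] << 8) | block[1]);
    for (int i = 0; i < 6; i++)
        s.support[i] = (uint16_t)((block[2 + 2 * i] << 8) | block[3 + 2 * i]);
    return s;
}

/* Assumption: the last mask is the bus-speed one; bit 0 = default, bit 1 = high-speed. */
bool cmd6_supports_high_speed(const cmd6_status_t *s)
{
    return (s->support[5] & 0x0002u) != 0;
}
```

Fed the two dumps above, this reports 100 mA for both cards, high-speed support for the Apacer (80 03) and none for the Sandisk (80 01).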

evanh · 2022-04-14 01:32

Well, I've now got three profiles built. The new one runs up to 60 MHz SPI clock when high-speed mode is available.

But there is a caveat! It assumes SPI high-speed simply shifts the SD card's DO clock edge from falling to rising. It just so happens that is an advantage for smartpins. I can drop the use of registering the clock pin because rx smartpin now has more time to sample the data. This in turn frees up the tx smartpin to reduce its lag back to 4 ticks.

If the SD card's DO doesn't change its clocking edge for SPI high-speed mode then this'll immediately go splat. Possibly over the top of the FAT file system.

Everyone: Please test this again. It might make a mess, so do use a card that you won't mind having to reformat afterwards.
PS: Debug is turned on. I'll be interested in reported output from various SD cards.
[broken file removed]

rogloh · 2022-04-14 04:00

@evanh Sounds like the same fundamental input delay timing issue we had to solve for reading from external memory, possibly compounded a bit more by the weak drive strength of some SD cards, and timing tweaks will be needed at different frequency ranges. Unfortunately this is likely to plague any IO driver design when operating at high speed, and it's not going away.

evanh · 2022-04-14 06:31

Yep. Main difference is the latencies are relative to SPI clock input to the rx and tx smartpins rather than from software. Less stages. But on the flip side there is less opportunity to compensate.

The variation with edge switching is peculiar to SPI high-speed mode as empirically discovered. So, the problem here is that possibly not all SD cards do this for SPI high-speed mode.

The lack of true synchronous I/O registers in the propellers means that we rely on asynchronous over-sampling instead. To be fair, I have no idea how much trouble it is to implement direct clocking of an I/O register. I'm guessing some sort of minimal FIFO, maybe just double buffering, is needed to interface between internal clocking and arbitrary synchronous I/O.

evanh · 2022-04-14 09:46

Hmm, no, SPI High-Speed mode ain't gonna be a reliable approach at all. Not really implemented seriously by any card. Only way to get stronger pin drive is start using SD protocol instead of SPI. EDIT: And that ain't gonna happen any time soon - partly because it's just overkill for any Prop2 memory sizes.

I think it might be time to tidy up and trim back what I've got ...

EDIT: Okay, final smartpins edition: Please, everyone give this a thorough test to ensure timing is robust.
EDIT2: Already a fix

ersmith · 2022-04-20 15:41

Thanks @evanh and @iseries for your suggestions and code. I've merged Evan's smartpin code into the flexspin libraries, and following Mike's suggestions I've updated the FAT file system code to the newest FatFs. All of this is checked into the spin2cpp (and flexprop) github now. Thank you!

Wuerfel_21 · 2022-04-20 16:30

@ersmith said:
Thanks @evanh and @iseries for your suggestions and code. I've merged Evan's smartpin code into the flexspin libraries, and following Mike's suggestions I've updated the FAT file system code to the newest FatFs. All of this is checked into the spin2cpp (and flexprop) github now. Thank you!

With that added and all the PRs dealt with, it perhaps is time to bundle up a release version....

EDIT: Speaking of, with the new optimizations and the LLVM port being a thing, perhaps it is also time for some light-hearted benchmarking action.

evanh · 2022-04-20 22:50

Cool! Now I have to wait to get home. Silly I didn't take a board with me.

evanh · 2022-05-11 11:56

Although SD cards are missing drive strength adjustment in SPI mode, I've since learnt SD drive strength wasn't what is slowing the DO signal down. It's because of an inline 240 ohm protection resistor in case of collisions with the EEPROM - https://forums.parallax.com/discussion/169233/boot-issue-accessing-spi-flash-after-sd-access/p1

evanh · 2022-05-22 12:33

Started some careful probing of streamer ops tonight and found they aren't quite as I was expecting. I was expecting an equivalent period cycling to how the pulse mode smartpins work. Where, once started, the period metronomically cycles even when no pulses are presenting to the pin.

Well, with a streamer, when data runs out, the XFRQ period seems to halt ... resuming on the next XCONT/XZERO. What this means is there is no delay from instruction to first bit output. Whereas, the smartpin pulse modes wait until the current idling period is completed before a new pulse will be output.

I was hoping to phase align the two systems with each other on the same period. Eliminating the need to realign on every command. But, alas, not to be.

Wuerfel_21 · 2022-05-22 12:59

The Way(tm) (at least for HyperRAM and its sysclk/4 clock pin) looks a bit like this:

              drvl #HYPER_CLK ' Init clock pin with correct (?) alignment
              xinit mk_hyper_addr_cmd1,pa
              wypin mk_memtmp0,#HYPER_CLK ' setup clock periods

Not sure if it works like that for other dividers (maybe with an appropriate WAITX between DRVL and XINIT?).

evanh · 2022-05-22 21:40

Yes, the restarting of the clock smartpin is the critical element. So it needs also an earlier DIRL (or FLTL). I was hoping to find a way to not need that mechanism beyond initial setup. Alas, the streamer and smartpin internals are too different.

I've also used another, slightly more complicated, method in the past. Where the pulse period is adjusted to stretch the first clock, or a stretched initial blank streamer bit, to achieve fine control of the phase alignment. This method has no adjustment gaps in the full 360 degrees. Which makes it more useful if wanting to handle multiple sysclock ratios.

evanh · 2022-05-23 14:49

Hmm, the only fully flexible method I can see is this: EDIT2: Turns out the smaller WAITX method can handle variable ratios too.

// preambled bitstream
        setq    xfrq1
        xinit   pmode, #0      // phase delay at sysclock/1 (unbuffered command), length is dependent on spec'd rate
        setq    xfrq2
        xcont   txmode, haddr    // data out at spec'd rate (buffered command)
        dirh    #spi_ck        // restart clock period at spec'd rate
        wypin   clocks, #spi_ck    // produce clock pulses beginning from next period

EDIT: This works for both SPI clock mode 0 (CPOL=0, CPHA=0) and mode 3 (CPOL=1, CPHA=1)

For SPI mode = 0: pmode[15..0] for sysclock/2 to sysclock/20: 8,7,9, 8,9,9,11, 10,11,11,12, 12,13,13,15, 14,15,15,16
For SPI mode = 3: pmode[15..0] for sysclock/2 to sysclock/20: 9,9,11, 11,12,13,15, 15,16,17,18, 19,20,21,23, 23,24,25,26

There is a faster version just for SPI clock mode 3: pmode[15..0] for sysclock/2 to sysclock/20: 7,7,9, 7,10,6,9, 10,12,14,16, 18,6,7,9, 9,10,11,12

// preambled bitstream
        dirh    #spi_ck        // restart clock period at spec'd rate
        setq    xfrq1
        xinit   pmode, #0      // phase delay at sysclock/1 (unbuffered command), length is dependent on spec'd rate
        setq    xfrq2
        xcont   txmode, haddr    // data out at spec'd rate (buffered command)
        wypin   clocks, #spi_ck    // produce clock pulses beginning from next period

evanh · 2022-05-24 07:02

I can achieve a practical solution of your WAITX method by using data pin registration to introduce a one-sysclock lag on streamer output. This allows a way to avoid the timing limitation of that method. And as it turns out, the same +1 lag works fine at every sysclock ratio.

// delayed bitstream
        dirh    #spi_ck
        waitx   pdelay
        xinit   txmode, haddr
        wypin   clocks, #spi_ck

Here's the pdelay needed for sysclock/2 to sysclock/20: 0,0,0, 1,3,5,0, 0,1,2,3, 4,5,6,8, 8,9,10,11

PS: SPI clock mode = 3 (CPOL=1, CPHA=1) ... further experimenting ... looks like it's not possible to do this as mode 0, only the lucky sysclock/4 works out.

EDIT: Ha, lol, solved by removing the pin registration. So, SPI clock mode = 0 (CPOL=0, CPHA=0), pdelay for sysclock/2 to sysclock/20: 0,2,3, 4,1,2,5, 5,7,8,10, 11,13,14,1, 0,1,1,2
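
For anyone wanting to use these numbers from C, a simple lookup keyed on the divider is one way to carry them. This is only a sketch: the values are copied from the two lists in this post (mode 3 with data pin registration, mode 0 without it), and the table and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pdelay values for sysclock/2 .. sysclock/20, copied from the post above.
 * Mode 3 values assume the data pin is registered; mode 0 values assume it is not. */
static const uint8_t pdelay_mode3[19] = { 0,0,0, 1,3,5,0, 0,1,2,3, 4,5,6,8, 8,9,10,11 };
static const uint8_t pdelay_mode0[19] = { 0,2,3, 4,1,2,5, 5,7,8,10, 11,13,14,1, 0,1,1,2 };

/* Return the WAITX delay for a given SPI clock mode (0 or 3) and divider (2..20). */
uint8_t pdelay_for(int spi_mode, int divider)
{
    assert(spi_mode == 0 || spi_mode == 3);
    assert(divider >= 2 && divider <= 20);
    return (spi_mode == 3 ? pdelay_mode3 : pdelay_mode0)[divider - 2];
}
```

The table index is simply divider - 2, so sysclock/4 in mode 3 maps to a delay of 0 and sysclock/16 in mode 0 maps to a delay of 1.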

evanh · 2022-05-25 22:27

Some notes on findings in the above couple of posts:

SPI clock smartpin uses the pulse mode, therefore can have uneven high/low times:
- either P_PULSE | P_OE | P_INVERT_OUTPUT (CPOL=1)
- or P_PULSE | P_OE (CPOL=0)
When building the streamer pdelay tables I noticed sometimes the first bit output from the streamer was consistently extended by one sysclock tick. I have no explanation for this but it never changed for each timing. I mention it because I think it explains the quirkiness of the table values. Some entries are one sysclock tick smaller than otherwise expected ... not that I've confirmed this point. Oh, looking more carefully, they're all the non-powers-of-two. So only streamer clock dividers of 2,4,8,16,32,... have the ideal startup. That explains the consistency at least.
