The ones with +10 (16 block, 8 kB cluster size) are multi-block bursts. So clusters don't look to be grouped into a larger burst. Which means the larger buffer in the program isn't doing much good.
Looks like it's interleaving writes between two sector groups at the start. Maybe this part is updating the two FAT tables or something in preparation for the real write. Once the writes start after you print Buffer=256kB then it doesn't seem to touch the other sectors for a bit at least until the end, so this should be pretty fast. Are there gaps between each slowing things down I wonder. You probably want to log the timestamps in real time into a buffer for printing later. Then any gaps between calls will be noticed.
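A minimal sketch of the "log into a buffer, print later" idea, in plain C. The record layout, the names `trace_log` and `trace_dump`, and the buffer depth are all assumptions for illustration and are not taken from the driver. The timestamp can come from any free-running counter, such as the P2 system counter.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical trace record: one entry per driver read/write call. */
typedef struct {
    uint32_t ticks;   /* timestamp from any free-running counter */
    uint32_t block;   /* start block number */
    uint16_t count;   /* number of blocks in the transfer */
    char     op;      /* 'R' or 'W' */
} trace_entry_t;

#define TRACE_MAX 256u   /* once full, the oldest entries are overwritten */

static trace_entry_t trace_buf[TRACE_MAX];
static uint32_t trace_total;   /* total entries ever logged */

/* Cheap enough to call from inside the driver: no printing here. */
static void trace_log(uint32_t ticks, char op, uint32_t block, uint16_t count)
{
    trace_entry_t *e = &trace_buf[trace_total % TRACE_MAX];
    e->ticks = ticks;
    e->op = op;
    e->block = block;
    e->count = count;
    trace_total++;
}

/* Print the history afterwards, with the gap since the previous call.
   Returns the largest gap seen, in ticks. */
static uint32_t trace_dump(FILE *out)
{
    uint32_t n = trace_total < TRACE_MAX ? trace_total : TRACE_MAX;
    uint32_t first = trace_total - n;
    uint32_t max_gap = 0;

    for (uint32_t i = 0; i < n; i++) {
        const trace_entry_t *e = &trace_buf[(first + i) % TRACE_MAX];
        uint32_t gap = 0;
        if (i > 0) {
            const trace_entry_t *p = &trace_buf[(first + i - 1) % TRACE_MAX];
            gap = e->ticks - p->ticks;   /* unsigned subtraction survives counter wrap */
        }
        if (gap > max_gap)
            max_gap = gap;
        fprintf(out, "%c#%x+%x  +%u ticks\n", e->op,
                (unsigned)e->block, (unsigned)e->count, (unsigned)gap);
    }
    return max_gap;
}
```

Only relative gaps are printed, since absolute time is not the interesting part here.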
@rogloh said:
Looks like it's interleaving writes between two sector groups at the start. Maybe this part is updating the two FAT tables or something in preparation for the real write.
That'd be right. The fopen() is before it prints Buffer = 256 kB.
Once the writes start after you print Buffer=256kB then it doesn't seem to touch the other sectors for a bit at least until the end, so this should be pretty fast. Are there gaps between each slowing things down I wonder. You probably want to log the timestamps in real time into a buffer for printing later. Then any gaps between calls will be noticed.
Actually, funnily, although a long way short of the raw performance, it's quite fast at writing with 256 kB buffer. The slow ones are when using smaller buffer sizes.
Oh, oops, that block list is not 16 MB at all. It's only 512 kB, 2 x 256 kB.
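As a sanity check on that figure, here is a small sketch that totals up a write trace in the `wr#start+count` format used for the block list. The function and struct names are made up for illustration, and it assumes both fields are hexadecimal and that an entry without `+count` is a single-block write.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Totals for a trace made of "wr#<hex start>" and "wr#<hex start>+<hex count>" entries. */
typedef struct {
    uint32_t singles;       /* single-block writes */
    uint32_t bursts;        /* multi-block writes */
    uint32_t burst_blocks;  /* blocks moved by the multi-block writes */
} wr_summary_t;

static wr_summary_t summarize_writes(const char *trace)
{
    wr_summary_t s = {0, 0, 0};
    const char *p = trace;

    while ((p = strstr(p, "wr#")) != NULL) {
        char *end;
        p += 3;
        (void)strtoul(p, &end, 16);   /* start block, not needed for the totals */
        if (*end == '+') {
            s.bursts++;
            s.burst_blocks += (uint32_t)strtoul(end + 1, &end, 16);
        } else {
            s.singles++;
        }
        p = end;
    }
    return s;
}
```

The block list in this thread has 64 bursts of +10 (16 blocks each), so 64 x 16 x 512 bytes = 512 kB, which matches 2 x 256 kB.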
If the reads were not being printed, then they might also be happening between some of the write calls as well if the filesystem keeps checking the table (hopefully it's not dumb and repeatedly reading the same thing). You'd expect to see some reads happening when the file is initially opened.
For comparison, using the SPI driver with the Samsung EVO 128 GB card:
clkfreq = 360000000 clkmode = 0x10011fb
Filesystem = fatfs, Driver = sdcardx
mount: OK
Buffer = 2 kB, Written 2048 kB at 299 kB/s, Verified, Read 2048 kB at 2441 kB/s
Buffer = 2 kB, Written 2048 kB at 289 kB/s, Verified, Read 2048 kB at 2277 kB/s
Buffer = 2 kB, Written 2048 kB at 289 kB/s, Verified, Read 2048 kB at 2461 kB/s
Buffer = 2 kB, Written 2048 kB at 291 kB/s, Verified, Read 2048 kB at 2286 kB/s
Buffer = 4 kB, Written 2048 kB at 496 kB/s, Verified, Read 2048 kB at 2873 kB/s
Buffer = 4 kB, Written 2048 kB at 499 kB/s, Verified, Read 2048 kB at 2877 kB/s
Buffer = 4 kB, Written 2048 kB at 494 kB/s, Verified, Read 2048 kB at 2855 kB/s
Buffer = 4 kB, Written 2048 kB at 498 kB/s, Verified, Read 2048 kB at 2785 kB/s
Buffer = 8 kB, Written 4096 kB at 736 kB/s, Verified, Read 4096 kB at 3210 kB/s
Buffer = 8 kB, Written 4096 kB at 735 kB/s, Verified, Read 4096 kB at 3195 kB/s
Buffer = 8 kB, Written 4096 kB at 731 kB/s, Verified, Read 4096 kB at 3216 kB/s
Buffer = 8 kB, Written 4096 kB at 722 kB/s, Verified, Read 4096 kB at 3199 kB/s
Buffer = 16 kB, Written 4096 kB at 960 kB/s, Verified, Read 4096 kB at 3383 kB/s
Buffer = 16 kB, Written 4096 kB at 963 kB/s, Verified, Read 4096 kB at 3375 kB/s
Buffer = 16 kB, Written 4096 kB at 963 kB/s, Verified, Read 4096 kB at 3367 kB/s
Buffer = 16 kB, Written 4096 kB at 961 kB/s, Verified, Read 4096 kB at 3382 kB/s
Buffer = 32 kB, Written 8192 kB at 1140 kB/s, Verified, Read 8192 kB at 3460 kB/s
Buffer = 32 kB, Written 8192 kB at 1157 kB/s, Verified, Read 8192 kB at 3466 kB/s
Buffer = 32 kB, Written 8192 kB at 1187 kB/s, Verified, Read 8192 kB at 3468 kB/s
Buffer = 32 kB, Written 8192 kB at 1257 kB/s, Verified, Read 8192 kB at 3461 kB/s
Buffer = 64 kB, Written 8192 kB at 1653 kB/s, Verified, Read 8192 kB at 3468 kB/s
Buffer = 64 kB, Written 8192 kB at 1653 kB/s, Verified, Read 8192 kB at 3459 kB/s
Buffer = 64 kB, Written 8192 kB at 1606 kB/s, Verified, Read 8192 kB at 3466 kB/s
Buffer = 64 kB, Written 8192 kB at 1681 kB/s, Verified, Read 8192 kB at 3470 kB/s
Buffer = 128 kB, Written 16384 kB at 1980 kB/s, Verified, Read 16384 kB at 3469 kB/s
Buffer = 128 kB, Written 16384 kB at 2033 kB/s, Verified, Read 16384 kB at 3468 kB/s
Buffer = 128 kB, Written 16384 kB at 2029 kB/s, Verified, Read 16384 kB at 3469 kB/s
Buffer = 128 kB, Written 16384 kB at 2034 kB/s, Verified, Read 16384 kB at 3469 kB/s
Buffer = 256 kB, Written 16384 kB at 2270 kB/s, Verified, Read 16384 kB at 3470 kB/s
Buffer = 256 kB, Written 16384 kB at 2290 kB/s, Verified, Read 16384 kB at 3470 kB/s
Buffer = 256 kB, Written 16384 kB at 2267 kB/s, Verified, Read 16384 kB at 3470 kB/s
Buffer = 256 kB, Written 16384 kB at 2282 kB/s, Verified, Read 16384 kB at 3469 kB/s
Write speeds are maybe 50% up in the newer driver, double speed with large buffer. Quite dismal considering I was claiming 10x uplift from the raw mode testing. Read speeds are something like 8x so not all gloom.
Sandisk Extreme 32GB using SPI driver is a little better on the write speeds. Starts +50% but moves above 3x at larger buffer sizes. Reads are a solid 8x all round. The Sandisk will be fighting a smaller cluster size too.
clkfreq = 360000000 clkmode = 0x10011fb
Filesystem = fatfs, Driver = sdcardx
mount: OK
Buffer = 2 kB, Written 2048 kB at 645 kB/s, Verified, Read 2048 kB at 2410 kB/s
Buffer = 2 kB, Written 2048 kB at 612 kB/s, Verified, Read 2048 kB at 2450 kB/s
Buffer = 2 kB, Written 2048 kB at 655 kB/s, Verified, Read 2048 kB at 2447 kB/s
Buffer = 2 kB, Written 2048 kB at 651 kB/s, Verified, Read 2048 kB at 2443 kB/s
Buffer = 4 kB, Written 2048 kB at 919 kB/s, Verified, Read 2048 kB at 2912 kB/s
Buffer = 4 kB, Written 2048 kB at 997 kB/s, Verified, Read 2048 kB at 2834 kB/s
Buffer = 4 kB, Written 2048 kB at 999 kB/s, Verified, Read 2048 kB at 2908 kB/s
Buffer = 4 kB, Written 2048 kB at 990 kB/s, Verified, Read 2048 kB at 2911 kB/s
Buffer = 8 kB, Written 4096 kB at 1234 kB/s, Verified, Read 4096 kB at 3212 kB/s
Buffer = 8 kB, Written 4096 kB at 1298 kB/s, Verified, Read 4096 kB at 3130 kB/s
Buffer = 8 kB, Written 4096 kB at 1323 kB/s, Verified, Read 4096 kB at 3219 kB/s
Buffer = 8 kB, Written 4096 kB at 1230 kB/s, Verified, Read 4096 kB at 3214 kB/s
Buffer = 16 kB, Written 4096 kB at 1589 kB/s, Verified, Read 4096 kB at 3380 kB/s
Buffer = 16 kB, Written 4096 kB at 1581 kB/s, Verified, Read 4096 kB at 3383 kB/s
Buffer = 16 kB, Written 4096 kB at 1594 kB/s, Verified, Read 4096 kB at 3379 kB/s
Buffer = 16 kB, Written 4096 kB at 1478 kB/s, Verified, Read 4096 kB at 3378 kB/s
Buffer = 32 kB, Written 8192 kB at 2038 kB/s, Verified, Read 8192 kB at 3382 kB/s
Buffer = 32 kB, Written 8192 kB at 1677 kB/s, Verified, Read 8192 kB at 3378 kB/s
Buffer = 32 kB, Written 8192 kB at 2022 kB/s, Verified, Read 8192 kB at 3379 kB/s
Buffer = 32 kB, Written 8192 kB at 1937 kB/s, Verified, Read 8192 kB at 3379 kB/s
Buffer = 64 kB, Written 8192 kB at 2350 kB/s, Verified, Read 8192 kB at 3380 kB/s
Buffer = 64 kB, Written 8192 kB at 2350 kB/s, Verified, Read 8192 kB at 3378 kB/s
Buffer = 64 kB, Written 8192 kB at 2347 kB/s, Verified, Read 8192 kB at 3380 kB/s
Buffer = 64 kB, Written 8192 kB at 2345 kB/s, Verified, Read 8192 kB at 3378 kB/s
Buffer = 128 kB, Written 16384 kB at 2555 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 128 kB, Written 16384 kB at 2486 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 128 kB, Written 16384 kB at 2533 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 128 kB, Written 16384 kB at 2565 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 256 kB, Written 16384 kB at 2677 kB/s, Verified, Read 16384 kB at 3372 kB/s
Buffer = 256 kB, Written 16384 kB at 2676 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 256 kB, Written 16384 kB at 2679 kB/s, Verified, Read 16384 kB at 3372 kB/s
Buffer = 256 kB, Written 16384 kB at 2574 kB/s, Verified, Read 16384 kB at 3373 kB/s
@Rayman said:
@evanh impressive speeds. So, it's ready to test? Does one just drop those files you posted into Flexprop to use it?
Yep, and yes. But you will need the accessory board that has 4-bit bus width. Roger designed the board, so Roger and I are the only two that have it.
Since the SPI wired bootable SD slot has that EEPROM clash limiting resistor it can't be clocked at extreme rates, so I've not bothered to support 1-bit mode. Although it certainly could be done in the future. There might be some speed-up since the new driver is using streamer ops for everything, which can operate at sysclock/2 even at high sysclocks. At high sysclocks the SPI driver steps down to sysclock/8.
Does the driver support a power enable pin?
Yes. Here's a report from the development code:
Card detected ... power cycle of SD card
power-down threshold = 37 pin state = 1
power-down slope = 34914 us pin state = 0
power-up threshold = 209 pin state = 0
power-up slope = 1128 us pin state = 1
It's using the DAC-comparator pin mode to set each threshold (0.5 V low and 2.7 V high) and then measure how long it takes the power rail to transition. I take 32 samples spread across 1.0 ms, unanimous voting like the debounce circuit, to provide the spec'd hold time before state change. Voltage measuring is at the CLK pin. All I/O pins each have a 22 kR pull-up soldered on the accessory board.
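The unanimous voting step can be sketched in plain C like this. It is not the driver's code: the sampler callback and the names are hypothetical, and the delay that would spread the 32 samples across 1.0 ms is left out.

```c
#include <stdint.h>

#define VOTE_SAMPLES 32

/* Hypothetical sampler callback: returns the comparator output (0 or 1). */
typedef int (*sample_fn)(void *ctx);

/* Take VOTE_SAMPLES readings and report a state only if every one agrees.
   Returns 1 or 0 for a unanimous result, -1 if the samples disagreed.
   A real version would wait between samples to cover the hold time. */
static int unanimous_vote(sample_fn sample, void *ctx)
{
    uint32_t history = 0;

    for (int i = 0; i < VOTE_SAMPLES; i++)
        history = (history << 1) | (uint32_t)(sample(ctx) & 1);

    if (history == 0xFFFFFFFFu)
        return 1;
    if (history == 0u)
        return 0;
    return -1;
}

/* Stand-in sampler for trying this off-target:
   replays bits from a 32-bit pattern, LSB first. */
static int replay_bits(void *ctx)
{
    uint32_t *bits = ctx;
    int b = (int)(*bits & 1u);
    *bits >>= 1;
    return b;
}
```

A caller would keep polling until the vote comes back unanimous for the new state, and time how long that took to get the slope figure.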
I'm guessing Von is planning on making 4-bit versions for Parallax to sell.
Now that it's proven, I guess it wouldn't be such a huge effort to port the driver to Spin2 and place it in the Obex. I doubt I would have made it this far without printf() for debug. It's not just the ease of formatting the prints either, scrollable terminal history is a massive sanity saver.
Another thing I do now, not unlike how Chip works, is keep four separate editor windows open, tiled across the width of a single 43" 4k monitor. Each has multiple tabs that are mixtures of the files I'm examining. I can have all windows showing different parts of the one file if I like. None are full height. Multiple terminals open below, and a multi-tabbed file manager, are fixtures.
@Rayman said:
Just looked at it and see it's all in C with inline PASM2.
Thought it might be spin2 that was converted with spin2cpp, but guess not.
printf() was a critical part of development.
Yeah, interfacing to FSRW should be next step. Can you provide the newest edition of that? I know it had lots of contributions over the years and many were broken.
Guess needs a better maintainer...
Also see a note where @ke4pjw wanted SD power to cycle with reset. Think that would be good enough?
Or, needs a dedicated pin?
@Rayman said:
@evanh impressive speeds. So, it's ready to test? Does one just drop those files you posted into Flexprop to use it?
Yep, and yes. But you will need the accessory board that has 4-bit bus width. Roger designed the board, so Roger and I are the only two that have it.
It's an open design shared earlier in this thread, and with any luck Parallax may run with a design based on it, or at least try to share the same pin mapping/functionality. It makes good use of the 8 pins of a P2 accessory module and evanh's driver obviously uses this pin mapping now too - so it's the "standard" right now. But anyone else making their own larger P2 boards can just use the same type of circuit and evanh's code should work at speed if the layout is not too crazy. The little pFET I used was just what I had in my spare parts bin; it could be changed to something easier to solder if needed. It just needs a reasonably low Rds so it doesn't drop too much voltage at SD card currents. Or power could be controlled by a regulator from 5V down to 3.3V. The card detect trick is handy too: it shares the pin and ensures power gets shut off when a card is ejected, which is also handy for controlling regulator use in low-powered setups.
Ah, oh, I've found something - The filesystem is requesting a SYNC after each cluster is written. That's excessively pedantic behaviour, imho. I never imagined SYNC was even being used, to be honest. Once, upon file close, would have seemed a reasonable place. Calling SYNC makes no difference to the SD card. It'll store the sent data just the same either way.
If you return early by skipping that request does it speed it up much? Also with those extra interleaved single sector reads/writes between bursts of 256kB I wonder how much delay that contributes and how much of the bandwidth improvements are reduced.
Interestingly you can see it write to a sector "84b", then it reads it again between the multi-sector bursts. What has changed it in the meantime that it needs to read it again? Some buffer limitation in the filesystem storage? Ideally it could cache this. ... WR25940+40 WR84b WR7fcb RDf740 WRf740 WR801 RD84b WR25980+40 ...
@rogloh said:
If you return early by skipping that request does it speed it up much? Also with those extra interleaved single sector reads/writes between bursts of 256kB I wonder how much delay that contributes and how much of the bandwidth improvements are reduced.
Doesn't help at the moment. The problem is those singles, they kill any chance of performance because they break up the contiguous order. I just tried, via the driver, concatenating the separate but contiguous cluster writes into one CMD25. And got it working seemingly reliably too. It made a difference, but only for buffers larger than the cluster size. So kind of annoying.
Eg: Samsung, using 256 kB buffer, went from 5 MB/s to 9.5 MB/s writing a 16 MB file.
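The bookkeeping side of that concatenation can be sketched as below. This only tracks block ranges to decide when separate writes could share one CMD25; the data path is omitted, and the names `coalescer_t`, `coalesce_write` and `coalesce_flush` are invented for illustration rather than taken from the driver.

```c
#include <stdint.h>

/* Hypothetical back end: issues one multi-block write (CMD25) for count blocks. */
typedef void (*burst_fn)(uint32_t start, uint32_t count, void *ctx);

typedef struct {
    uint32_t start;   /* first block of the pending run */
    uint32_t count;   /* blocks in the pending run; 0 = nothing pending */
    burst_fn issue;
    void    *ctx;
} coalescer_t;

/* Send whatever is pending as a single burst.
   Needed before any read, any non-contiguous write, or a sync. */
static void coalesce_flush(coalescer_t *c)
{
    if (c->count) {
        c->issue(c->start, c->count, c->ctx);
        c->count = 0;
    }
}

/* Queue a write: extend the pending run if contiguous, else flush first. */
static void coalesce_write(coalescer_t *c, uint32_t start, uint32_t count)
{
    if (c->count && start == c->start + c->count) {
        c->count += count;
        return;
    }
    coalesce_flush(c);
    c->start = start;
    c->count = count;
}

/* Stand-in back end for off-target testing: counts bursts and blocks. */
typedef struct { uint32_t bursts, blocks; } burst_stats_t;

static void count_burst(uint32_t start, uint32_t count, void *ctx)
{
    burst_stats_t *s = ctx;
    (void)start;
    s->bursts++;
    s->blocks += count;
}
```

This also shows why the single-sector writes hurt: each one that lands between two contiguous cluster writes forces a flush, so the run never grows beyond one cluster.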
Interestingly you can see it write to a sector "84b", then it reads it again between the multi-sector bursts. What has changed it in the meantime that it needs to read it again? Some buffer limitation in the filesystem storage? Ideally it could cache this. ... WR25940+40 WR84b WR7fcb RDf740 WRf740 WR801 RD84b WR25980+40 ...
Yeah, that's the sort of thing that needs addressed for sure.
@Rayman said:
going to try it with this very soon...
Nice. Are there any series resistors underneath the board, or were they skipped? Will be interesting to see what performance can be gained if you did omit them.
@rogloh said:
If you return early by skipping that request does it speed it up much? Also with those extra interleaved single sector reads/writes between bursts of 256kB I wonder how much delay that contributes and how much of the bandwidth improvements are reduced.
Doesn't help at the moment. The problem is those singles, they kill any chance of performance because they break up the contiguous order. I just tried, via the driver, concatenating the separate but contiguous cluster writes into one CMD25. And got it working seemingly reliably too. It made a difference, but only for buffers larger than the cluster size. So kind of annoying.
Eg: Samsung, using 256 kB buffer, went from 5 MB/s to 9.5 MB/s writing a 16 MB file.
Interestingly you can see it write to a sector "84b", then it reads it again between the multi-sector bursts. What has changed it in the meantime that it needs to read it again? Some buffer limitation in the filesystem storage? Ideally it could cache this. ... WR25940+40 WR84b WR7fcb RDf740 WRf740 WR801 RD84b WR25980+40 ...
Yeah, that's the sort of thing that needs addressed for sure.
Where is the filesystem code that controls all this exactly? Is it part of flexspin source tree? I'd like to see the C code if it's published somewhere.
@rogloh said:
Where is the filesystem code that controls all this exactly? Is it part of flexspin source tree? I'd like to see the C code if it's published somewhere.
It'll be there in that same directory as the driver file. I copied it all into another directory because it needed the top level API altered to get the extra pins assigned. Presumably ff.c is the actual filesystem source since it's the big file.
If you wanted to have both the new and old drivers for two SD cards operating at once then there would presumably be two complete copies of the filesystem compiled in as well.
Another temporary hack by me really.
EDIT: Taking a peek, it's quite the beast I see. Supports long file names, TRIMming, and ExFAT too.
The Unicode stuff seems overkill. That's going to just be for filenames.
First thing to note is that while read performance is okay, write performance is way behind what the raw speed was doing in driver development. That's something I expect can be improved on as the filesystem code gets reworked.
Here's one of the better examples using the Sandisk Extreme 32 GB (2017). First the development program run (Using only an 8 kByte buffer size):
400 kHz SD clock divider = 900
CMD8 R7 0000015a - Valid v2.0+ SD Card
ACMD41 busy duration: 184 ms
ACMD41 OCR c0ff8000 - Valid SDHC/SDXC Card
CMD2/CMD3 - Data Transfer Mode entered - Published RCA aaaa0000
ACMD6 - 4-bit data interface engaged
CMD6 - High-Speed access mode engaged
CMD9 - CSD backed-up
CMD10 - CID backed-up
Full Speed clock divider = 3 (120.0 MHz)
rxlag=7 selected Lowest=6 Highest=8
CID decode: ManID=03 OEMID=SD Name=SE32G Ver=8.0 Serial=6018369B Date=2017-12
CSD decode: Ver 2
Unformatted User Capacity = 29.22 GiBytes
Selected Bus Speed = 50.00 MHz
Strength Adjust = no
Supported Command Classes = basic blockread blockwrite erase lock appspec switch
Unsupported Command Classes = queuing reserved writeprotect i/o extension
Functions decode:
Current limit = 200 mA (averaged)
Function BitMap: G6=8001 G5=8001 G4=8001 G3=8001 G2=c001 G1=8003
Function Select: G6 = 0 G5 = 0 G4 = 0 G3 = 0 G2 = 0 G1 = 1
Init successful
Other compile options:
- Response handler #2
- Buffer size is 8 kiBytes
- Multi-block-read loop
- Read data CRC processed in parallel
- P_PULSE clock-gen
RX_SCHMITT=0 DAT_REG=0 CLK_REG=0 CLK_POL=0 CLK_DIV=3
Write blocks speed test (ACMD23 not used):
32768 blocks = 16384 kiB busy=37 rate = 42.4 MiB/s duration = 376912 us zero-overhead = 279620 us overheads = 25.8 %
16384 blocks = 8192 kiB busy=37 rate = 47.0 MiB/s duration = 170105 us zero-overhead = 139810 us overheads = 17.8 %
8192 blocks = 4096 kiB busy=37 rate = 47.2 MiB/s duration = 84577 us zero-overhead = 69905 us overheads = 17.3 %
4096 blocks = 2048 kiB busy=37 rate = 42.7 MiB/s duration = 46740 us zero-overhead = 34953 us overheads = 25.2 %
2048 blocks = 1024 kiB busy=37 rate = 46.5 MiB/s duration = 21497 us zero-overhead = 17476 us overheads = 18.7 %
1024 blocks = 512 kiB busy=37 rate = 45.5 MiB/s duration = 10987 us zero-overhead = 8738 us overheads = 20.4 %
512 blocks = 256 kiB busy=37 rate = 43.6 MiB/s duration = 5727 us zero-overhead = 4369 us overheads = 23.7 %
256 blocks = 128 kiB busy=37 rate = 40.3 MiB/s duration = 3098 us zero-overhead = 2185 us overheads = 29.4 %
128 blocks = 64 kiB busy=37 rate = 43.9 MiB/s duration = 1422 us zero-overhead = 1092 us overheads = 23.2 %
64 blocks = 32 kiB busy=37 rate = 40.8 MiB/s duration = 765 us zero-overhead = 546 us overheads = 28.6 %
32 blocks = 16 kiB busy=37 rate = 35.7 MiB/s duration = 437 us zero-overhead = 273 us overheads = 37.5 %
16 blocks = 8 kiB busy=37 rate = 28.6 MiB/s duration = 273 us zero-overhead = 137 us overheads = 49.8 %
8 blocks = 4 kiB busy=37 rate = 20.5 MiB/s duration = 190 us zero-overhead = 68 us overheads = 64.2 %
4 blocks = 2 kiB busy=37 rate = 13.1 MiB/s duration = 149 us zero-overhead = 34 us overheads = 77.1 %
2 blocks = 1 kiB busy=37 rate = 7.5 MiB/s duration = 129 us zero-overhead = 17 us overheads = 86.8 %
Read blocks speed test:
32768 blocks = 16384 kiB rate = 47.3 MiB/s duration = 337991 us zero-overhead = 279620 us overheads = 17.2 %
16384 blocks = 8192 kiB rate = 47.5 MiB/s duration = 168375 us zero-overhead = 139810 us overheads = 16.9 %
8192 blocks = 4096 kiB rate = 47.4 MiB/s duration = 84252 us zero-overhead = 69905 us overheads = 17.0 %
4096 blocks = 2048 kiB rate = 47.3 MiB/s duration = 42251 us zero-overhead = 34953 us overheads = 17.2 %
2048 blocks = 1024 kiB rate = 47.0 MiB/s duration = 21251 us zero-overhead = 17476 us overheads = 17.7 %
1024 blocks = 512 kiB rate = 46.5 MiB/s duration = 10751 us zero-overhead = 8738 us overheads = 18.7 %
512 blocks = 256 kiB rate = 45.4 MiB/s duration = 5500 us zero-overhead = 4369 us overheads = 20.5 %
256 blocks = 128 kiB rate = 43.4 MiB/s duration = 2875 us zero-overhead = 2185 us overheads = 24.0 %
128 blocks = 64 kiB rate = 39.9 MiB/s duration = 1563 us zero-overhead = 1092 us overheads = 30.1 %
64 blocks = 32 kiB rate = 34.4 MiB/s duration = 907 us zero-overhead = 546 us overheads = 39.8 %
32 blocks = 16 kiB rate = 27.0 MiB/s duration = 578 us zero-overhead = 273 us overheads = 52.7 %
16 blocks = 8 kiB rate = 21.6 MiB/s duration = 361 us zero-overhead = 137 us overheads = 62.0 %
8 blocks = 4 kiB rate = 16.7 MiB/s duration = 233 us zero-overhead = 68 us overheads = 70.8 %
4 blocks = 2 kiB rate = 10.0 MiB/s duration = 195 us zero-overhead = 34 us overheads = 82.5 %
2 blocks = 1 kiB rate = 5.5 MiB/s duration = 175 us zero-overhead = 17 us overheads = 90.2 %
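For anyone reading the speed test columns, the derived figures appear to follow from simple arithmetic on a 4-bit bus at 120 MHz. The sketch below is an assumption about how they relate, not the development program's actual code, and the function names are invented. It reproduces the first write line (16384 kiB in 376912 us) to within rounding.

```c
#include <stdint.h>

/* Time to move `bytes` over a `bus_bits`-wide bus clocked at `clk_hz`,
   with no command or busy overhead at all, in microseconds. */
static double zero_overhead_us(uint64_t bytes, double clk_hz, unsigned bus_bits)
{
    double bytes_per_sec = clk_hz * bus_bits / 8.0;
    return (double)bytes / bytes_per_sec * 1e6;
}

/* Share of the measured duration that was not raw data transfer, in percent. */
static double overheads_pct(double duration_us, double zero_us)
{
    return (duration_us - zero_us) / duration_us * 100.0;
}

/* Achieved rate in MiB/s for `bytes` moved in `duration_us`. */
static double rate_mib_s(uint64_t bytes, double duration_us)
{
    return ((double)bytes / (1024.0 * 1024.0)) / (duration_us / 1e6);
}
```

At 120 MHz and 4 bits the bus ceiling is 60 MB/s, so 16 MiB needs about 279620 us at best, which is the zero-overhead figure in the log.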
Then the filesystem tester run:
clkfreq = 360000000 clkmode = 0x10011fb
Clock divider for SD card is 3 (120 MHz)
mount: OK
Buffer = 2 kB, Written 2048 kB at 741 kB/s, Verified, Read 2048 kB at 6353 kB/s
Buffer = 2 kB, Written 2048 kB at 796 kB/s, Verified, Read 2048 kB at 6286 kB/s
Buffer = 2 kB, Written 2048 kB at 756 kB/s, Verified, Read 2048 kB at 6299 kB/s
Buffer = 2 kB, Written 2048 kB at 822 kB/s, Verified, Read 2048 kB at 6157 kB/s
Buffer = 4 kB, Written 2048 kB at 1494 kB/s, Verified, Read 2048 kB at 11038 kB/s
Buffer = 4 kB, Written 2048 kB at 1312 kB/s, Verified, Read 2048 kB at 11055 kB/s
Buffer = 4 kB, Written 2048 kB at 1490 kB/s, Verified, Read 2048 kB at 11231 kB/s
Buffer = 4 kB, Written 2048 kB at 1476 kB/s, Verified, Read 2048 kB at 10339 kB/s
Buffer = 8 kB, Written 4096 kB at 2172 kB/s, Verified, Read 4096 kB at 18582 kB/s
Buffer = 8 kB, Written 4096 kB at 2046 kB/s, Verified, Read 4096 kB at 16885 kB/s
Buffer = 8 kB, Written 4096 kB at 2304 kB/s, Verified, Read 4096 kB at 18455 kB/s
Buffer = 8 kB, Written 4096 kB at 2272 kB/s, Verified, Read 4096 kB at 18500 kB/s
Buffer = 16 kB, Written 4096 kB at 2988 kB/s, Verified, Read 4096 kB at 27761 kB/s
Buffer = 16 kB, Written 4096 kB at 2782 kB/s, Verified, Read 4096 kB at 25235 kB/s
Buffer = 16 kB, Written 4096 kB at 3174 kB/s, Verified, Read 4096 kB at 27669 kB/s
Buffer = 16 kB, Written 4096 kB at 3106 kB/s, Verified, Read 4096 kB at 27615 kB/s
Buffer = 32 kB, Written 8192 kB at 4975 kB/s, Verified, Read 8192 kB at 26591 kB/s
Buffer = 32 kB, Written 8192 kB at 3432 kB/s, Verified, Read 8192 kB at 27955 kB/s
Buffer = 32 kB, Written 8192 kB at 4651 kB/s, Verified, Read 8192 kB at 26414 kB/s
Buffer = 32 kB, Written 8192 kB at 4953 kB/s, Verified, Read 8192 kB at 27815 kB/s
Buffer = 64 kB, Written 8192 kB at 6420 kB/s, Verified, Read 8192 kB at 26215 kB/s
Buffer = 64 kB, Written 8192 kB at 5877 kB/s, Verified, Read 8192 kB at 27816 kB/s
Buffer = 64 kB, Written 8192 kB at 5684 kB/s, Verified, Read 8192 kB at 27824 kB/s
Buffer = 64 kB, Written 8192 kB at 6784 kB/s, Verified, Read 8192 kB at 27806 kB/s
Buffer = 128 kB, Written 16384 kB at 5012 kB/s, Verified, Read 16384 kB at 26996 kB/s
Buffer = 128 kB, Written 16384 kB at 8265 kB/s, Verified, Read 16384 kB at 26973 kB/s
Buffer = 128 kB, Written 16384 kB at 8212 kB/s, Verified, Read 16384 kB at 27853 kB/s
Buffer = 128 kB, Written 16384 kB at 7704 kB/s, Verified, Read 16384 kB at 26901 kB/s
Buffer = 256 kB, Written 16384 kB at 9425 kB/s, Verified, Read 16384 kB at 26920 kB/s
Buffer = 256 kB, Written 16384 kB at 9410 kB/s, Verified, Read 16384 kB at 26960 kB/s
Buffer = 256 kB, Written 16384 kB at 9467 kB/s, Verified, Read 16384 kB at 27021 kB/s
Buffer = 256 kB, Written 16384 kB at 9375 kB/s, Verified, Read 16384 kB at 27020 kB/s
And a lesser example using the Samsung EVO 128 GB (2023):
400 kHz SD clock divider = 900
CMD8 R7 0000015a - Valid v2.0+ SD Card
ACMD41 busy duration: 150 ms
ACMD41 OCR c0ff8000 - Valid SDHC/SDXC Card
CMD2/CMD3 - Data Transfer Mode entered - Published RCA 59b40000
ACMD6 - 4-bit data interface engaged
CMD6 - High-Speed access mode engaged
CMD9 - CSD backed-up
CMD10 - CID backed-up
Full Speed clock divider = 3 (120.0 MHz)
rxlag=7 selected Lowest=6 Highest=8
CID decode: ManID=1B OEMID=SM Name=ED2S5 Ver=3.0 Serial=49C16906 Date=2023-2
CSD decode: Ver 2
Unformatted User Capacity = 119.37 GiBytes
Selected Bus Speed = 50.00 MHz
Strength Adjust = no
Supported Command Classes = basic queuing blockread blockwrite erase lock appspec switch extension
Unsupported Command Classes = reserved writeprotect i/o
Functions decode:
Current limit = 200 mA (averaged)
Function BitMap: G6=8001 G5=8001 G4=8001 G3=8001 G2=c001 G1=8003
Function Select: G6 = 0 G5 = 0 G4 = 0 G3 = 0 G2 = 0 G1 = 1
Init successful
Other compile options:
- Response handler #2
- Buffer size is 8 kiBytes
- Multi-block-read loop
- Read data CRC processed in parallel
- P_PULSE clock-gen
RX_SCHMITT=0 DAT_REG=0 CLK_REG=0 CLK_POL=0 CLK_DIV=3
Write blocks speed test (ACMD23 not used):
32768 blocks = 16384 kiB busy=37 rate = 43.9 MiB/s duration = 363813 us zero-overhead = 279620 us overheads = 23.1 %
16384 blocks = 8192 kiB busy=37 rate = 46.0 MiB/s duration = 173777 us zero-overhead = 139810 us overheads = 19.5 %
8192 blocks = 4096 kiB busy=37 rate = 37.0 MiB/s duration = 107824 us zero-overhead = 69905 us overheads = 35.1 %
4096 blocks = 2048 kiB busy=37 rate = 43.2 MiB/s duration = 46268 us zero-overhead = 34953 us overheads = 24.4 %
2048 blocks = 1024 kiB busy=37 rate = 43.5 MiB/s duration = 22966 us zero-overhead = 17476 us overheads = 23.9 %
1024 blocks = 512 kiB busy=37 rate = 35.1 MiB/s duration = 14235 us zero-overhead = 8738 us overheads = 38.6 %
512 blocks = 256 kiB busy=37 rate = 35.9 MiB/s duration = 6950 us zero-overhead = 4369 us overheads = 37.1 %
256 blocks = 128 kiB busy=37 rate = 20.0 MiB/s duration = 6226 us zero-overhead = 2185 us overheads = 64.9 %
128 blocks = 64 kiB busy=37 rate = 20.5 MiB/s duration = 3036 us zero-overhead = 1092 us overheads = 64.0 %
64 blocks = 32 kiB busy=37 rate = 7.3 MiB/s duration = 4225 us zero-overhead = 546 us overheads = 87.0 %
32 blocks = 16 kiB busy=37 rate = 7.0 MiB/s duration = 2220 us zero-overhead = 273 us overheads = 87.7 %
16 blocks = 8 kiB busy=37 rate = 3.6 MiB/s duration = 2164 us zero-overhead = 137 us overheads = 93.6 %
8 blocks = 4 kiB busy=37 rate = 1.7 MiB/s duration = 2215 us zero-overhead = 68 us overheads = 96.9 %
4 blocks = 2 kiB busy=37 rate = 0.8 MiB/s duration = 2242 us zero-overhead = 34 us overheads = 98.4 %
2 blocks = 1 kiB busy=37 rate = 0.4 MiB/s duration = 2321 us zero-overhead = 17 us overheads = 99.2 %
Read blocks speed test:
32768 blocks = 16384 kiB rate = 35.1 MiB/s duration = 455120 us zero-overhead = 279620 us overheads = 38.5 %
16384 blocks = 8192 kiB rate = 32.4 MiB/s duration = 246691 us zero-overhead = 139810 us overheads = 43.3 %
8192 blocks = 4096 kiB rate = 29.9 MiB/s duration = 133429 us zero-overhead = 69905 us overheads = 47.6 %
4096 blocks = 2048 kiB rate = 28.1 MiB/s duration = 71126 us zero-overhead = 34953 us overheads = 50.8 %
2048 blocks = 1024 kiB rate = 27.2 MiB/s duration = 36760 us zero-overhead = 17476 us overheads = 52.4 %
1024 blocks = 512 kiB rate = 27.1 MiB/s duration = 18390 us zero-overhead = 8738 us overheads = 52.4 %
512 blocks = 256 kiB rate = 26.8 MiB/s duration = 9303 us zero-overhead = 4369 us overheads = 53.0 %
256 blocks = 128 kiB rate = 26.2 MiB/s duration = 4761 us zero-overhead = 2185 us overheads = 54.1 %
128 blocks = 64 kiB rate = 25.2 MiB/s duration = 2473 us zero-overhead = 1092 us overheads = 55.8 %
64 blocks = 32 kiB rate = 23.7 MiB/s duration = 1316 us zero-overhead = 546 us overheads = 58.5 %
32 blocks = 16 kiB rate = 20.4 MiB/s duration = 765 us zero-overhead = 273 us overheads = 64.3 %
16 blocks = 8 kiB rate = 16.9 MiB/s duration = 462 us zero-overhead = 137 us overheads = 70.3 %
8 blocks = 4 kiB rate = 12.8 MiB/s duration = 303 us zero-overhead = 68 us overheads = 77.5 %
4 blocks = 2 kiB rate = 7.3 MiB/s duration = 265 us zero-overhead = 34 us overheads = 87.1 %
2 blocks = 1 kiB rate = 3.9 MiB/s duration = 246 us zero-overhead = 17 us overheads = 93.0 %
clkfreq = 360000000 clkmode = 0x10011fb
Clock divider for SD card is 3 (120 MHz)
mount: OK
Buffer = 2 kB, Written 2048 kB at 312 kB/s, Verified, Read 2048 kB at 6154 kB/s
Buffer = 2 kB, Written 2048 kB at 313 kB/s, Verified, Read 2048 kB at 6193 kB/s
Buffer = 2 kB, Written 2048 kB at 318 kB/s, Verified, Read 2048 kB at 6193 kB/s
Buffer = 2 kB, Written 2048 kB at 331 kB/s, Verified, Read 2048 kB at 6421 kB/s
Buffer = 4 kB, Written 2048 kB at 568 kB/s, Verified, Read 2048 kB at 11115 kB/s
Buffer = 4 kB, Written 2048 kB at 570 kB/s, Verified, Read 2048 kB at 10954 kB/s
Buffer = 4 kB, Written 2048 kB at 572 kB/s, Verified, Read 2048 kB at 11304 kB/s
Buffer = 4 kB, Written 2048 kB at 574 kB/s, Verified, Read 2048 kB at 11379 kB/s
Buffer = 8 kB, Written 4096 kB at 902 kB/s, Verified, Read 4096 kB at 15562 kB/s
Buffer = 8 kB, Written 4096 kB at 910 kB/s, Verified, Read 4096 kB at 18040 kB/s
Buffer = 8 kB, Written 4096 kB at 924 kB/s, Verified, Read 4096 kB at 16157 kB/s
Buffer = 8 kB, Written 4096 kB at 916 kB/s, Verified, Read 4096 kB at 17806 kB/s
Buffer = 16 kB, Written 4096 kB at 1295 kB/s, Verified, Read 4096 kB at 21790 kB/s
Buffer = 16 kB, Written 4096 kB at 1288 kB/s, Verified, Read 4096 kB at 20542 kB/s
Buffer = 16 kB, Written 4096 kB at 1284 kB/s, Verified, Read 4096 kB at 24078 kB/s
Buffer = 16 kB, Written 4096 kB at 1289 kB/s, Verified, Read 4096 kB at 21756 kB/s
Buffer = 32 kB, Written 8192 kB at 1770 kB/s, Verified, Read 8192 kB at 31619 kB/s
Buffer = 32 kB, Written 8192 kB at 1690 kB/s, Verified, Read 8192 kB at 25854 kB/s
Buffer = 32 kB, Written 8192 kB at 1622 kB/s, Verified, Read 8192 kB at 28498 kB/s
Buffer = 32 kB, Written 8192 kB at 1675 kB/s, Verified, Read 8192 kB at 32336 kB/s
Buffer = 64 kB, Written 8192 kB at 2748 kB/s, Verified, Read 8192 kB at 26696 kB/s
Buffer = 64 kB, Written 8192 kB at 2893 kB/s, Verified, Read 8192 kB at 31585 kB/s
Buffer = 64 kB, Written 8192 kB at 2728 kB/s, Verified, Read 8192 kB at 25030 kB/s
Buffer = 64 kB, Written 8192 kB at 2757 kB/s, Verified, Read 8192 kB at 27363 kB/s
Buffer = 128 kB, Written 16384 kB at 3909 kB/s, Verified, Read 16384 kB at 30130 kB/s
Buffer = 128 kB, Written 16384 kB at 4107 kB/s, Verified, Read 16384 kB at 28206 kB/s
Buffer = 128 kB, Written 16384 kB at 3826 kB/s, Verified, Read 16384 kB at 27884 kB/s
Buffer = 128 kB, Written 16384 kB at 4128 kB/s, Verified, Read 16384 kB at 27955 kB/s
Buffer = 256 kB, Written 16384 kB at 5071 kB/s, Verified, Read 16384 kB at 27712 kB/s
Buffer = 256 kB, Written 16384 kB at 5299 kB/s, Verified, Read 16384 kB at 28316 kB/s
Buffer = 256 kB, Written 16384 kB at 5022 kB/s, Verified, Read 16384 kB at 28461 kB/s
Buffer = 256 kB, Written 16384 kB at 5261 kB/s, Verified, Read 16384 kB at 28816 kB/s
I just read this in your results:
So is this being tested at 360MHz? I'll be running around 297MHz or 270MHz I guess. Wonder what I can expect to see there.
Point is the filesystem code is showing its weaknesses now that we have a more performant driver.
I haven't tried looking into how it all works, so I'm not able to give any reason though.
Hmm, maybe it's choking on lots of random sector seeks for updating the file system's FAT tables etc, although a lot of that activity should mostly happen once at the start of a large file write if the disk isn't fragmented and uses large cluster size. The cluster chain needs to be updated, presumably done as the file is being written in case of sudden loss. It'd be interesting to see the list of sectors being accessed by the filesystem. A timestamped debug log history in your driver might be needed to see this.
I could do it real simple just by printing each start block number. There's no need for performance when mapping ...
Yeah then you'll know what the underlying file system is doing between all the sector transfers etc. Might be good to timestamp still, just for seeing relative gaps in calls etc. Absolute time is not so important.
Here's just the writes alone - for writing one 512 kB file, using a 256 kB buffer, to a 16 GB card:
wr#7e00 wr#84c wr#433c wr#84d wr#433d wr#84e wr#433e wr#84f wr#433f wr#850 wr#4340 wr#851 wr#4341 wr#852 wr#4342 wr#853 wr#4343 wr#854 wr#4344 wr#855 wr#4345 wr#856 wr#4346 wr#857 wr#4347 wr#858 wr#4348 wr#859 wr#4349 wr#85a wr#434a wr#85b wr#434b wr#85c wr#434c
Buffer = 256 kB, wr#7e00 wr#801 wr#1de10+10 wr#1de20+10 wr#1de30+10 wr#1de40+10 wr#1de50+10 wr#1de60+10 wr#1de70+10 wr#1de80+10 wr#1de90+10 wr#1dea0+10 wr#1deb0+10 wr#1dec0+10 wr#1ded0+10 wr#1dee0+10 wr#1def0+10 wr#1df00+10 wr#1df10+10 wr#1df20+10 wr#1df30+10 wr#1df40+10 wr#1df50+10 wr#1df60+10 wr#1df70+10 wr#1df80+10 wr#1df90+10 wr#1dfa0+10 wr#1dfb0+10 wr#1dfc0+10 wr#1dfd0+10 wr#1dfe0+10 wr#1dff0+10 wr#1e000+10 wr#84c wr#433c wr#7e00 wr#801 wr#1e010+10 wr#1e020+10 wr#1e030+10 wr#1e040+10 wr#1e050+10 wr#1e060+10 wr#1e070+10 wr#1e080+10 wr#1e090+10 wr#1e0a0+10 wr#1e0b0+10 wr#1e0c0+10 wr#1e0d0+10 wr#1e0e0+10 wr#1e0f0+10 wr#1e100+10 wr#1e110+10 wr#1e120+10 wr#1e130+10 wr#1e140+10 wr#1e150+10 wr#1e160+10 wr#1e170+10 wr#1e180+10 wr#1e190+10 wr#1e1a0+10 wr#1e1b0+10 wr#1e1c0+10 wr#1e1d0+10 wr#1e1e0+10 wr#1e1f0+10 wr#1e200+10 wr#84c wr#433c wr#7e00 wr#801 Written 512 kB at 2411 kB/s, Verified, Read 512 kB at 13244 kB/s
The ones with +10 (16 block, 8 kB cluster size) are multi-block bursts. So clusters don't look to be grouped into a larger burst. Which means the larger buffer in the program isn't doing much good.
Looks like it's interleaving writes between two sector groups at the start. Maybe this part is updating the two FAT tables or something in preparation for the real write. Once the writes start after you print Buffer=256kB then it doesn't seem to touch the other sectors for a bit at least until the end, so this should be pretty fast. Are there gaps between each slowing things down I wonder. You probably want to log the timestamps in real time into a buffer for printing later. Then any gaps between calls will be noticed.
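A minimal sketch of that kind of deferred log, assuming the driver can pass in a reading of a free-running 32-bit tick counter on each block read or write. The names log_op, log_gap and log_dump and the buffer size are made up for illustration and are not part of the actual driver:

```c
#include <stdint.h>
#include <stdio.h>

#define LOG_SIZE 256               /* entries kept; hypothetical size */

typedef struct {
    uint32_t ticks;                /* timestamp from a free-running counter */
    uint32_t block;                /* start block number of the transfer */
    uint16_t count;                /* number of blocks in the transfer */
    char     op;                   /* 'R' for read, 'W' for write */
} log_entry_t;

static log_entry_t log_buf[LOG_SIZE];
static unsigned    log_len;

/* Record one driver call. Nothing is printed here, so the cost is small. */
void log_op(char op, uint32_t block, uint16_t count, uint32_t ticks)
{
    if (log_len < LOG_SIZE) {      /* silently drop entries once full */
        log_entry_t *e = &log_buf[log_len++];
        e->ticks = ticks;
        e->block = block;
        e->count = count;
        e->op    = op;
    }
}

/* Ticks elapsed between entry i and the one before it (0 for the first). */
uint32_t log_gap(unsigned i)
{
    if (i == 0 || i >= log_len)
        return 0;
    /* Unsigned subtraction stays correct across one counter wrap. */
    return log_buf[i].ticks - log_buf[i - 1].ticks;
}

/* Print the whole history once the timed section has finished. */
void log_dump(void)
{
    for (unsigned i = 0; i < log_len; i++)
        printf("%c%x+%x dt=%u\n", log_buf[i].op,
               (unsigned)log_buf[i].block, (unsigned)log_buf[i].count,
               (unsigned)log_gap(i));
}
```

Only relative gaps matter here, so the absolute value of the counter and its wrap-around are not a concern as long as consecutive calls are less than one full counter period apart.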
That'd be right. The fopen() is before it prints Buffer = 256 kB.
Actually, funnily, although a long way short of the raw performance, it's quite fast at writing with 256 kB buffer. The slow ones are when using smaller buffer sizes.
Oh, oops, that block list is not 16 MB at all. It's only 512 kB, 2 x 256 kB.
If the reads were not being printed, then they might also be happening between some of the write calls as well if the filesystem keeps checking the table (hopefully it's not dumb and repeatedly reading the same thing). You'd expect to see some reads happening when the file is initially opened.
That card above was a different one. Here's the same test again using the Samsung 128 GB card. You can see the cluster size is 4x larger.
wr#f740 wr#84b wr#7fcb wr#84c wr#7fcc wr#84d wr#7fcd wr#84e wr#7fce wr#84f wr#7fcf
Buffer = 256 kB, wr#f740 wr#801 wr#25780+40 wr#257c0+40 wr#25800+40 wr#25840+40 wr#25880+40 wr#258c0+40 wr#25900+40 wr#25940+40 wr#84b wr#7fcb wr#f740 wr#801 wr#25980+40 wr#259c0+40 wr#25a00+40 wr#25a40+40 wr#25a80+40 wr#25ac0+40 wr#25b00+40 wr#25b40+40 wr#84b wr#7fcb wr#f740 wr#801 Written 512 kB at 3061 kB/s, Verified, Read 512 kB at 28504 kB/s
Another thing that becomes more obvious here is that there are four single-block writes between each 256 kB buffer written.
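As a worked check of the burst sizes in these two logs: the number after the + is a hex block count, and SD blocks are 512 bytes. A small sketch, with helper names made up for illustration:

```c
#include <stdint.h>

#define BLOCK_BYTES 512u           /* SD card block size */

/* Bytes moved by one logged burst, given the hex count after the '+'. */
uint32_t burst_bytes(uint32_t blocks)
{
    return blocks * BLOCK_BYTES;
}

/* Number of cluster-sized bursts needed to flush one program buffer. */
uint32_t bursts_per_buffer(uint32_t buffer_bytes, uint32_t blocks_per_cluster)
{
    return buffer_bytes / burst_bytes(blocks_per_cluster);
}
```

So +10 is 16 blocks (8 kB) and +40 is 64 blocks (32 kB), which is the 4x difference in cluster size. A 256 kB buffer then takes 32 bursts on the first card and 8 on the Samsung, matching the counts between the groups of single-block writes in the logs.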
EDIT: Same again but with reads now reported as well:
RD0 RD800 RD801 RDf740 WRf740 RD84b WR84b WR7fcb RDf740
Buffer = 256 kB, WRf740 WR801 RD84b WR25780+40 WR257c0+40 WR25800+40 WR25840+40 WR25880+40 WR258c0+40 WR25900+40 WR25940+40 WR84b WR7fcb RDf740 WRf740 WR801 RD84b WR25980+40 WR259c0+40 WR25a00+40 WR25a40+40 WR25a80+40 WR25ac0+40 WR25b00+40 WR25b40+40 WR84b WR7fcb RDf740 WRf740 WR801 Written 512 kB at 3458 kB/s, RDf740 RD25780+40 RD84b RD257c0+40 RD25800+40 RD25840+40 RD25880+40 RD258c0+40 RD25900+40 RD25940+40 RD25980+40 RD259c0+40 RD25a00+40 RD25a40+40 RD25a80+40 RD25ac0+40 RD25b00+40 RD25b40+40 Verified, RD25780+40 RD257c0+40 RD25800+40 RD25840+40 RD25880+40 RD258c0+40 RD25900+40 RD25940+40 RD25980+40 RD259c0+40 RD25a00+40 RD25a40+40 RD25a80+40 RD25ac0+40 RD25b00+40 RD25b40+40 Read 512 kB at 19893 kB/s
For comparison, using the SPI driver with the Samsung EVO 128 GB card:
clkfreq = 360000000 clkmode = 0x10011fb Filesystem = fatfs, Driver = sdcardx mount: OK Buffer = 2 kB, Written 2048 kB at 299 kB/s, Verified, Read 2048 kB at 2441 kB/s Buffer = 2 kB, Written 2048 kB at 289 kB/s, Verified, Read 2048 kB at 2277 kB/s Buffer = 2 kB, Written 2048 kB at 289 kB/s, Verified, Read 2048 kB at 2461 kB/s Buffer = 2 kB, Written 2048 kB at 291 kB/s, Verified, Read 2048 kB at 2286 kB/s Buffer = 4 kB, Written 2048 kB at 496 kB/s, Verified, Read 2048 kB at 2873 kB/s Buffer = 4 kB, Written 2048 kB at 499 kB/s, Verified, Read 2048 kB at 2877 kB/s Buffer = 4 kB, Written 2048 kB at 494 kB/s, Verified, Read 2048 kB at 2855 kB/s Buffer = 4 kB, Written 2048 kB at 498 kB/s, Verified, Read 2048 kB at 2785 kB/s Buffer = 8 kB, Written 4096 kB at 736 kB/s, Verified, Read 4096 kB at 3210 kB/s Buffer = 8 kB, Written 4096 kB at 735 kB/s, Verified, Read 4096 kB at 3195 kB/s Buffer = 8 kB, Written 4096 kB at 731 kB/s, Verified, Read 4096 kB at 3216 kB/s Buffer = 8 kB, Written 4096 kB at 722 kB/s, Verified, Read 4096 kB at 3199 kB/s Buffer = 16 kB, Written 4096 kB at 960 kB/s, Verified, Read 4096 kB at 3383 kB/s Buffer = 16 kB, Written 4096 kB at 963 kB/s, Verified, Read 4096 kB at 3375 kB/s Buffer = 16 kB, Written 4096 kB at 963 kB/s, Verified, Read 4096 kB at 3367 kB/s Buffer = 16 kB, Written 4096 kB at 961 kB/s, Verified, Read 4096 kB at 3382 kB/s Buffer = 32 kB, Written 8192 kB at 1140 kB/s, Verified, Read 8192 kB at 3460 kB/s Buffer = 32 kB, Written 8192 kB at 1157 kB/s, Verified, Read 8192 kB at 3466 kB/s Buffer = 32 kB, Written 8192 kB at 1187 kB/s, Verified, Read 8192 kB at 3468 kB/s Buffer = 32 kB, Written 8192 kB at 1257 kB/s, Verified, Read 8192 kB at 3461 kB/s Buffer = 64 kB, Written 8192 kB at 1653 kB/s, Verified, Read 8192 kB at 3468 kB/s Buffer = 64 kB, Written 8192 kB at 1653 kB/s, Verified, Read 8192 kB at 3459 kB/s Buffer = 64 kB, Written 8192 kB at 1606 kB/s, Verified, Read 8192 kB at 3466 kB/s Buffer = 64 kB, Written 8192 kB at 1681 kB/s, 
Verified, Read 8192 kB at 3470 kB/s Buffer = 128 kB, Written 16384 kB at 1980 kB/s, Verified, Read 16384 kB at 3469 kB/s Buffer = 128 kB, Written 16384 kB at 2033 kB/s, Verified, Read 16384 kB at 3468 kB/s Buffer = 128 kB, Written 16384 kB at 2029 kB/s, Verified, Read 16384 kB at 3469 kB/s Buffer = 128 kB, Written 16384 kB at 2034 kB/s, Verified, Read 16384 kB at 3469 kB/s Buffer = 256 kB, Written 16384 kB at 2270 kB/s, Verified, Read 16384 kB at 3470 kB/s Buffer = 256 kB, Written 16384 kB at 2290 kB/s, Verified, Read 16384 kB at 3470 kB/s Buffer = 256 kB, Written 16384 kB at 2267 kB/s, Verified, Read 16384 kB at 3470 kB/s Buffer = 256 kB, Written 16384 kB at 2282 kB/s, Verified, Read 16384 kB at 3469 kB/s
Write speeds are maybe 50% up in the newer driver, double speed with large buffer. Quite dismal considering I was claiming 10x uplift from the raw mode testing.
Read speeds are something like 8x so not all gloom. 
Sandisk Extreme 32GB using SPI driver is a little better on the write speeds. Starts +50% but moves above 3x at larger buffer sizes. Reads are a solid 8x all round. The Sandisk will be fighting a smaller cluster size too.
clkfreq = 360000000 clkmode = 0x10011fb Filesystem = fatfs, Driver = sdcardx mount: OK Buffer = 2 kB, Written 2048 kB at 645 kB/s, Verified, Read 2048 kB at 2410 kB/s Buffer = 2 kB, Written 2048 kB at 612 kB/s, Verified, Read 2048 kB at 2450 kB/s Buffer = 2 kB, Written 2048 kB at 655 kB/s, Verified, Read 2048 kB at 2447 kB/s Buffer = 2 kB, Written 2048 kB at 651 kB/s, Verified, Read 2048 kB at 2443 kB/s Buffer = 4 kB, Written 2048 kB at 919 kB/s, Verified, Read 2048 kB at 2912 kB/s Buffer = 4 kB, Written 2048 kB at 997 kB/s, Verified, Read 2048 kB at 2834 kB/s Buffer = 4 kB, Written 2048 kB at 999 kB/s, Verified, Read 2048 kB at 2908 kB/s Buffer = 4 kB, Written 2048 kB at 990 kB/s, Verified, Read 2048 kB at 2911 kB/s Buffer = 8 kB, Written 4096 kB at 1234 kB/s, Verified, Read 4096 kB at 3212 kB/s Buffer = 8 kB, Written 4096 kB at 1298 kB/s, Verified, Read 4096 kB at 3130 kB/s Buffer = 8 kB, Written 4096 kB at 1323 kB/s, Verified, Read 4096 kB at 3219 kB/s Buffer = 8 kB, Written 4096 kB at 1230 kB/s, Verified, Read 4096 kB at 3214 kB/s Buffer = 16 kB, Written 4096 kB at 1589 kB/s, Verified, Read 4096 kB at 3380 kB/s Buffer = 16 kB, Written 4096 kB at 1581 kB/s, Verified, Read 4096 kB at 3383 kB/s Buffer = 16 kB, Written 4096 kB at 1594 kB/s, Verified, Read 4096 kB at 3379 kB/s Buffer = 16 kB, Written 4096 kB at 1478 kB/s, Verified, Read 4096 kB at 3378 kB/s Buffer = 32 kB, Written 8192 kB at 2038 kB/s, Verified, Read 8192 kB at 3382 kB/s Buffer = 32 kB, Written 8192 kB at 1677 kB/s, Verified, Read 8192 kB at 3378 kB/s Buffer = 32 kB, Written 8192 kB at 2022 kB/s, Verified, Read 8192 kB at 3379 kB/s Buffer = 32 kB, Written 8192 kB at 1937 kB/s, Verified, Read 8192 kB at 3379 kB/s Buffer = 64 kB, Written 8192 kB at 2350 kB/s, Verified, Read 8192 kB at 3380 kB/s Buffer = 64 kB, Written 8192 kB at 2350 kB/s, Verified, Read 8192 kB at 3378 kB/s Buffer = 64 kB, Written 8192 kB at 2347 kB/s, Verified, Read 8192 kB at 3380 kB/s Buffer = 64 kB, Written 8192 kB at 2345 kB/s, 
Verified, Read 8192 kB at 3378 kB/s Buffer = 128 kB, Written 16384 kB at 2555 kB/s, Verified, Read 16384 kB at 3371 kB/s Buffer = 128 kB, Written 16384 kB at 2486 kB/s, Verified, Read 16384 kB at 3371 kB/s Buffer = 128 kB, Written 16384 kB at 2533 kB/s, Verified, Read 16384 kB at 3371 kB/s Buffer = 128 kB, Written 16384 kB at 2565 kB/s, Verified, Read 16384 kB at 3371 kB/s Buffer = 256 kB, Written 16384 kB at 2677 kB/s, Verified, Read 16384 kB at 3372 kB/s Buffer = 256 kB, Written 16384 kB at 2676 kB/s, Verified, Read 16384 kB at 3371 kB/s Buffer = 256 kB, Written 16384 kB at 2679 kB/s, Verified, Read 16384 kB at 3372 kB/s Buffer = 256 kB, Written 16384 kB at 2574 kB/s, Verified, Read 16384 kB at 3373 kB/s
@evanh impressive speeds. So, it's ready to test? Does one just drop those files you posted into Flexprop to use it?
Does the driver support a power enable pin?
As for 270 MHz, with the new driver. It barely affects write performance. Reads are down 15% maybe.
Sandisk Extreme 32 GB:
clkfreq = 270000000 clkmode = 0x1041afb Filesystem = fatfs, Driver = sdsdcard Clock divider for SD card is 3 (90 MHz) mount: OK Buffer = 2 kB, Written 2048 kB at 736 kB/s, Verified, Read 2048 kB at 5473 kB/s Buffer = 2 kB, Written 2048 kB at 814 kB/s, Verified, Read 2048 kB at 5954 kB/s Buffer = 2 kB, Written 2048 kB at 814 kB/s, Verified, Read 2048 kB at 5990 kB/s Buffer = 2 kB, Written 2048 kB at 753 kB/s, Verified, Read 2048 kB at 5996 kB/s Buffer = 4 kB, Written 2048 kB at 1473 kB/s, Verified, Read 2048 kB at 10225 kB/s Buffer = 4 kB, Written 2048 kB at 1456 kB/s, Verified, Read 2048 kB at 10375 kB/s Buffer = 4 kB, Written 2048 kB at 1257 kB/s, Verified, Read 2048 kB at 10277 kB/s Buffer = 4 kB, Written 2048 kB at 1425 kB/s, Verified, Read 2048 kB at 10149 kB/s Buffer = 8 kB, Written 4096 kB at 2237 kB/s, Verified, Read 4096 kB at 16406 kB/s Buffer = 8 kB, Written 4096 kB at 1262 kB/s, Verified, Read 4096 kB at 16523 kB/s Buffer = 8 kB, Written 4096 kB at 2237 kB/s, Verified, Read 4096 kB at 16251 kB/s Buffer = 8 kB, Written 4096 kB at 2224 kB/s, Verified, Read 4096 kB at 16375 kB/s Buffer = 16 kB, Written 4096 kB at 2659 kB/s, Verified, Read 4096 kB at 23517 kB/s Buffer = 16 kB, Written 4096 kB at 2963 kB/s, Verified, Read 4096 kB at 23622 kB/s Buffer = 16 kB, Written 4096 kB at 3049 kB/s, Verified, Read 4096 kB at 23307 kB/s Buffer = 16 kB, Written 4096 kB at 3024 kB/s, Verified, Read 4096 kB at 23495 kB/s Buffer = 32 kB, Written 8192 kB at 4205 kB/s, Verified, Read 8192 kB at 23583 kB/s Buffer = 32 kB, Written 8192 kB at 3033 kB/s, Verified, Read 8192 kB at 23411 kB/s Buffer = 32 kB, Written 8192 kB at 4730 kB/s, Verified, Read 8192 kB at 23561 kB/s Buffer = 32 kB, Written 8192 kB at 4287 kB/s, Verified, Read 8192 kB at 23471 kB/s Buffer = 64 kB, Written 8192 kB at 6326 kB/s, Verified, Read 8192 kB at 23566 kB/s Buffer = 64 kB, Written 8192 kB at 6382 kB/s, Verified, Read 8192 kB at 23476 kB/s Buffer = 64 kB, Written 8192 kB at 6268 kB/s, Verified, Read 8192 
kB at 23438 kB/s Buffer = 64 kB, Written 8192 kB at 6002 kB/s, Verified, Read 8192 kB at 23441 kB/s Buffer = 128 kB, Written 16384 kB at 7243 kB/s, Verified, Read 16384 kB at 23200 kB/s Buffer = 128 kB, Written 16384 kB at 7092 kB/s, Verified, Read 16384 kB at 23128 kB/s Buffer = 128 kB, Written 16384 kB at 7662 kB/s, Verified, Read 16384 kB at 23197 kB/s Buffer = 128 kB, Written 16384 kB at 7663 kB/s, Verified, Read 16384 kB at 23201 kB/s Buffer = 256 kB, Written 16384 kB at 8927 kB/s, Verified, Read 16384 kB at 23128 kB/s Buffer = 256 kB, Written 16384 kB at 8915 kB/s, Verified, Read 16384 kB at 23229 kB/s Buffer = 256 kB, Written 16384 kB at 8957 kB/s, Verified, Read 16384 kB at 23211 kB/s Buffer = 256 kB, Written 16384 kB at 8976 kB/s, Verified, Read 16384 kB at 23214 kB/s
Read speeds flatline from 16 kB buffer size. Which tells me the cluster size is 16 kB.
Removing block read CRC processing and using sysclock/2 - Same story with 16 kB clusters:
clkfreq = 270000000 clkmode = 0x1041afb Filesystem = fatfs, Driver = sdsdcard Clock divider for SD card is 2 (135 MHz) mount: OK Buffer = 2 kB, Written 2048 kB at 750 kB/s, Verified, Read 2048 kB at 6597 kB/s Buffer = 2 kB, Written 2048 kB at 814 kB/s, Verified, Read 2048 kB at 6796 kB/s Buffer = 2 kB, Written 2048 kB at 742 kB/s, Verified, Read 2048 kB at 6527 kB/s Buffer = 2 kB, Written 2048 kB at 808 kB/s, Verified, Read 2048 kB at 6514 kB/s Buffer = 4 kB, Written 2048 kB at 1473 kB/s, Verified, Read 2048 kB at 12196 kB/s Buffer = 4 kB, Written 2048 kB at 1278 kB/s, Verified, Read 2048 kB at 11430 kB/s Buffer = 4 kB, Written 2048 kB at 1475 kB/s, Verified, Read 2048 kB at 11683 kB/s Buffer = 4 kB, Written 2048 kB at 1456 kB/s, Verified, Read 2048 kB at 11661 kB/s Buffer = 8 kB, Written 4096 kB at 2250 kB/s, Verified, Read 4096 kB at 19782 kB/s Buffer = 8 kB, Written 4096 kB at 2001 kB/s, Verified, Read 4096 kB at 19990 kB/s Buffer = 8 kB, Written 4096 kB at 2207 kB/s, Verified, Read 4096 kB at 19963 kB/s Buffer = 8 kB, Written 4096 kB at 1998 kB/s, Verified, Read 4096 kB at 19991 kB/s Buffer = 16 kB, Written 4096 kB at 2987 kB/s, Verified, Read 4096 kB at 29832 kB/s Buffer = 16 kB, Written 4096 kB at 3069 kB/s, Verified, Read 4096 kB at 30003 kB/s Buffer = 16 kB, Written 4096 kB at 3120 kB/s, Verified, Read 4096 kB at 29910 kB/s Buffer = 16 kB, Written 4096 kB at 3053 kB/s, Verified, Read 4096 kB at 29982 kB/s Buffer = 32 kB, Written 8192 kB at 2089 kB/s, Verified, Read 8192 kB at 28451 kB/s Buffer = 32 kB, Written 8192 kB at 4682 kB/s, Verified, Read 8192 kB at 30135 kB/s Buffer = 32 kB, Written 8192 kB at 3951 kB/s, Verified, Read 8192 kB at 29978 kB/s Buffer = 32 kB, Written 8192 kB at 3963 kB/s, Verified, Read 8192 kB at 30280 kB/s Buffer = 64 kB, Written 8192 kB at 6363 kB/s, Verified, Read 8192 kB at 28573 kB/s Buffer = 64 kB, Written 8192 kB at 6290 kB/s, Verified, Read 8192 kB at 29972 kB/s Buffer = 64 kB, Written 8192 kB at 6343 kB/s, Verified, Read 
8192 kB at 28627 kB/s Buffer = 64 kB, Written 8192 kB at 5380 kB/s, Verified, Read 8192 kB at 30158 kB/s Buffer = 128 kB, Written 16384 kB at 7736 kB/s, Verified, Read 16384 kB at 29431 kB/s Buffer = 128 kB, Written 16384 kB at 7803 kB/s, Verified, Read 16384 kB at 29510 kB/s Buffer = 128 kB, Written 16384 kB at 7903 kB/s, Verified, Read 16384 kB at 29562 kB/s Buffer = 128 kB, Written 16384 kB at 7866 kB/s, Verified, Read 16384 kB at 29510 kB/s Buffer = 256 kB, Written 16384 kB at 8052 kB/s, Verified, Read 16384 kB at 29511 kB/s Buffer = 256 kB, Written 16384 kB at 7875 kB/s, Verified, Read 16384 kB at 29538 kB/s Buffer = 256 kB, Written 16384 kB at 8954 kB/s, Verified, Read 16384 kB at 29519 kB/s Buffer = 256 kB, Written 16384 kB at 8919 kB/s, Verified, Read 16384 kB at 29517 kB/s
Yep, and yes. But you will need the accessory board that has 4-bit bus width. Roger designed the board, so Roger and I are the only two that have it.
Since the SPI wired bootable SD slot has that EEPROM clash limiting resistor it can't be clocked at extreme rates, so I've not bothered to support 1-bit mode. Although it certainly could be done in the future. There might be some speed-up since the new driver is using streamer ops for everything, which can operate at sysclock/2 even at high sysclocks. At high sysclocks the SPI driver steps down to sysclock/8.
Yes. Here's a report from the development code:
Card detected ... power cycle of SD card
power-down threshold = 37 pin state = 1
power-down slope = 34914 us pin state = 0
power-up threshold = 209 pin state = 0
power-up slope = 1128 us pin state = 1
It's using the DAC-comparator pin mode to set each threshold (0.5 V low and 2.7 V high) and then measure how long it takes the power rail to transition. I take 32 samples spread across 1.0 ms, unanimous voting like the debounce circuit, to provide the spec'd hold time before state change. Voltage measuring is at the CLK pin. All I/O pins each have a 22 kR pull-up soldered on the accessory board.
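The unanimous-vote idea can be sketched as a 32-bit shift register of comparator readings. This is an illustrative sliding variant, not the driver's actual code: the vote_t type and vote_update name are hypothetical, and the caller is assumed to take one reading roughly every 1 ms / 32.

```c
#include <stdint.h>

typedef struct {
    uint32_t history;              /* last 32 comparator readings, newest in bit 0 */
    int      state;                /* debounced state: 0 or 1 */
} vote_t;

/*
 * Shift in one new reading. The reported state changes only when all
 * 32 readings in the history agree, so a single dissenting sample
 * restarts the hold time.
 */
int vote_update(vote_t *v, int reading)
{
    v->history = (v->history << 1) | (reading ? 1u : 0u);
    if (v->history == 0xFFFFFFFFu)
        v->state = 1;              /* unanimous high */
    else if (v->history == 0u)
        v->state = 0;              /* unanimous low */
    return v->state;
}
```

Timing the power-rail slope would then be a matter of counting the time from toggling the power switch until vote_update() reports the new state, once for each threshold.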
EDIT: Removed one repeated sentence for clarity.
I'm guessing Von is planning on making 4-bit versions for Parallax to sell.
Now that it's proven, I guess it wouldn't be such a huge effort to port the driver to Spin2 and place it in the Obex. I doubt I would have made it this far without printf() for debug. It's not just the ease of formatting the prints either, scrollable terminal history is a massive sanity saver.
Another thing I do now, not unlike how Chip works, is keep four separate editor windows open, tiled across the width of a single 43" 4k monitor. Each has multiple tabs that are mixtures of the files I'm examining. I can have all windows showing different parts of the one file if I like. None are full height. Multiple terminals open below, and a multi-tabbed file manager, are fixtures.
It's not a specific IDE but it looks like one.
@evanh Awesome! I have to try it out. See how sensitive it is to trace lengths and all...
Wonder how hard it would be to make this work with FSRW...
Maybe just replace the block driver with this one and it's done?
Just looked at it and see it's all in C with inline PASM2.
Thought it might be spin2 that was converted with spin2cpp, but guess not.
printf() was a critical part of development.
Yeah, interfacing to FSRW should be next step. Can you provide the newest edition of that? I know it had lots of contributions over the years and many were broken.
The latest might be here?
https://forums.parallax.com/discussion/173378/the-actually-functional-spin2-sd-driver-i-hope-kyefat-sdspi-with-audio/p2
Guess needs a better maintainer...
Also see a note where @ke4pjw wanted SD power to cycle with reset. Think that would be good enough?
Or, needs a dedicated pin?
It's an open design shared earlier in this thread, and with any luck Parallax may run with a design based on it, or at least try to share the same pin mapping/functionality. It makes good use of the 8 pins of a P2 accessory module and evanh's driver obviously uses this pin mapping now too - so it's the "standard" right now. But anyone else making their own larger P2 boards can just use the same type of circuit and evanh's code should work at speed if the layout is not too crazy. The little pFET I just had in my spare parts bin could be changed to something else easier to solder if needed; it just needs a reasonably low Rds so it doesn't drop too much voltage at SD card currents. Or it could be controlled by a regulator from 5V down to 3.3V. The card detect trick is handy too: it shares the pin and ensures power gets shut off when a card is ejected, which is also handy for controlling regulator use in low-power setups.
Ah, oh, I've found something - The filesystem is requesting a SYNC after each cluster is written. That's excessively pedantic behaviour, imho. I never imagined SYNC was even being used, to be honest. Maybe once upon file close seemed a reasonable place. Calling SYNC makes no diff to the SD card. It'll store the sent data just the same either way.
If you return early by skipping that request does it speed it up much? Also with those extra interleaved single sector reads/writes between bursts of 256kB I wonder how much delay that contributes and how much of the bandwidth improvements are reduced.
Interestingly you can see it write to a sector "84b", then it reads it again between the multi-sector bursts. What has changed it in the meantime that it needs to read it again? Some buffer limitation in the filesystem storage? Ideally it could cache this.
... WR25940+40 WR84b WR7fcb RDf740 WRf740 WR801 RD84b WR25980+40 ...
Doesn't help at the moment. The problem is those singles, they kill any chance of performance because they break up the contiguous order. I just tried, via the driver, concatenating the separate but contiguous cluster writes into one CMD25. And got it working seemingly reliably too. It made a difference, but only for buffers larger than the cluster size. So kind of annoying.
Eg: Samsung, using 256 kB buffer, went from 5 MB/s to 9.5 MB/s writing a 16 MB file.
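A rough sketch of the concatenation idea, under stated assumptions: the wr_state_t type, the function names and the counters are hypothetical, and the comments mark where a real driver would issue CMD25, stream data, and send CMD12 (stop transmission in SD bus mode). It only models the decision of when a new write can continue the burst already in progress:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     open;                 /* a multi-block write is in progress */
    uint32_t next_block;           /* block that would continue the open burst */
    unsigned start_count;          /* CMD25 starts issued (for illustration) */
    unsigned stop_count;           /* stop-transmission commands issued */
} wr_state_t;

/* Close any burst in progress. Must also run before any read or other command. */
void wr_flush(wr_state_t *s)
{
    if (s->open) {
        s->stop_count++;           /* real driver: CMD12, then wait for busy to clear */
        s->open = false;
    }
}

/* Write `count` blocks starting at `block`, extending the open burst if contiguous. */
void wr_blocks(wr_state_t *s, uint32_t block, uint32_t count)
{
    if (!s->open || block != s->next_block) {
        wr_flush(s);
        s->start_count++;          /* real driver: issue CMD25 with `block` */
        s->open = true;
    }
    /* real driver: stream `count` data blocks here */
    s->next_block = block + count;
}
```

With the Samsung log above, the eight WR...+40 cluster writes for one buffer collapse into a single start, and the following single-block write to 84b forces a stop and a fresh start. That is consistent with the gain only showing up when the buffer is larger than the cluster size.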
Yeah, that's the sort of thing that needs addressed for sure.
going to try it with this very soon...
Never seen atexit() before... Interesting...
Nice. Are there any series resistors underneath the board, or were they skipped? Will be interesting to see what performance can be gained if you did omit them.
Where is the filesystem code that controls all this exactly? Is it part of flexspin source tree? I'd like to see the C code if it's published somewhere.
It'll be there in that same directory as the driver file. I copied it all into another directory because it needed the top level API altered to get the extra pins assigned. Presumably ff.c is the actual filesystem source since it's the big file.
If you wanted to have both the new and old drivers for two SD cards operating at once then there would presumably be two complete copies of the filesystem compiled in as well.
Another temporary hack by me really.
EDIT: Taking a peek, it's quite the beast I see. Supports long file names, TRIMming, and ExFAT too.
The Unicode stuff seems overkill. That's going to just be for filenames.