The first thing to note is that while read performance is okay, write performance is well behind the raw speeds I was getting during driver development. I expect that can be improved on as the filesystem code gets reworked.
Here's one of the better examples, using the Sandisk Extreme 32 GB (2017). First, the development program run (using only an 8 kB buffer):
Hmm, maybe it's choking on lots of random sector seeks while updating the filesystem's FAT tables etc. Although a lot of that activity should mostly happen once, at the start of a large file write, if the disk isn't fragmented and uses a large cluster size. The cluster chain needs to be updated too, presumably as the file is being written in case of a sudden loss of power. It'd be interesting to see the list of sectors being accessed by the filesystem. A timestamped debug log in your driver might be needed to see this.
@evanh said:
I could do it real simple just by printing each start block number. There's no need for performance when mapping ...
Yeah, then you'll know what the underlying filesystem is doing between all the sector transfers etc. It might still be good to timestamp, just to see the relative gaps between calls. Absolute time is not so important.
The ones with +10 (hex, so 16 blocks, i.e. the 8 kB cluster size) are multi-block bursts. So clusters don't appear to be grouped into larger bursts, which means the larger buffer in the program isn't doing much good.
Looks like it's interleaving writes between two sector groups at the start. Maybe this part is updating the two FAT tables or something in preparation for the real write. Once the writes start, after you print Buffer = 256 kB, it doesn't seem to touch the other sectors for a while, at least until the end, so this should be pretty fast. I wonder whether there are gaps between each write slowing things down. You probably want to log the timestamps in real time into a buffer for printing later. Then any gaps between calls will be noticed.
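A minimal sketch of that suggestion, assuming the driver has a free-running tick counter it can read (on the P2 that would be the system counter; here the timestamp is simply passed in). Logging is just a store into a fixed ring buffer, so it doesn't disturb timing the way printing each request would, and `trace_dump()` prints everything afterwards with the gap since the previous request. All names here are hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One record per driver request (hypothetical layout). */
typedef struct {
    uint32_t ticks;  /* timestamp from a free-running counter */
    char     op;     /* 'W' write or 'R' read */
    uint32_t block;  /* starting block number */
    uint32_t count;  /* blocks in the request */
} trace_entry;

#define TRACE_LEN 256u  /* power of two, so the index wraps with a mask */

trace_entry trace_buf[TRACE_LEN];
uint32_t trace_head;  /* total number of entries ever logged */

/* Called from inside the driver: just a store, no printing. */
void trace_log(uint32_t ticks, char op, uint32_t block, uint32_t count)
{
    assert(op == 'W' || op == 'R');
    trace_entry *e = &trace_buf[trace_head & (TRACE_LEN - 1u)];
    e->ticks = ticks;
    e->op = op;
    e->block = block;
    e->count = count;
    trace_head++;
}

/* Ticks between entry i and the one before it (which must still be in the buffer).
   Unsigned subtraction copes with the counter wrapping. */
uint32_t trace_gap(uint32_t i)
{
    return trace_buf[i & (TRACE_LEN - 1u)].ticks
         - trace_buf[(i - 1u) & (TRACE_LEN - 1u)].ticks;
}

/* Called after the timed test, when printing no longer matters. */
void trace_dump(void)
{
    uint32_t n = trace_head < TRACE_LEN ? trace_head : TRACE_LEN;
    for (uint32_t i = trace_head - n; i < trace_head; i++) {
        const trace_entry *e = &trace_buf[i & (TRACE_LEN - 1u)];
        printf("%s%lx", e->op == 'W' ? "WR" : "RD", (unsigned long)e->block);
        if (e->count > 1u)
            printf("+%lx", (unsigned long)e->count);
        if (i > trace_head - n)
            printf("  (+%lu ticks)", (unsigned long)trace_gap(i));
        printf("\n");
    }
}
```

The output mimics the `WR25940+40` style of the block lists in this thread, with the tick gap appended so any stalls between calls stand out.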
@rogloh said:
Looks like it's interleaving writes between two sector groups at the start. Maybe this part is updating the two FAT tables or something in preparation for the real write.
That'd be right. The fopen() is before it prints Buffer = 256 kB.
Once the writes start, after you print Buffer = 256 kB, it doesn't seem to touch the other sectors for a while, at least until the end, so this should be pretty fast. I wonder whether there are gaps between each write slowing things down. You probably want to log the timestamps in real time into a buffer for printing later. Then any gaps between calls will be noticed.
Actually, funnily enough, although it's a long way short of the raw performance, it's quite fast at writing with a 256 kB buffer. The slow ones are with the smaller buffer sizes.
Oh, oops, that block list is not 16 MB at all. It's only 512 kB, 2 x 256 kB.
If the reads aren't being printed, they might also be happening between some of the write calls, if the filesystem keeps checking the table (hopefully it's not dumb enough to keep reading the same thing). You'd expect to see some reads when the file is first opened.
For comparison, using the SPI driver with the Samsung EVO 128 GB card:
clkfreq = 360000000 clkmode = 0x10011fb
Filesystem = fatfs, Driver = sdcardx
mount: OK
Buffer = 2 kB, Written 2048 kB at 299 kB/s, Verified, Read 2048 kB at 2441 kB/s
Buffer = 2 kB, Written 2048 kB at 289 kB/s, Verified, Read 2048 kB at 2277 kB/s
Buffer = 2 kB, Written 2048 kB at 289 kB/s, Verified, Read 2048 kB at 2461 kB/s
Buffer = 2 kB, Written 2048 kB at 291 kB/s, Verified, Read 2048 kB at 2286 kB/s
Buffer = 4 kB, Written 2048 kB at 496 kB/s, Verified, Read 2048 kB at 2873 kB/s
Buffer = 4 kB, Written 2048 kB at 499 kB/s, Verified, Read 2048 kB at 2877 kB/s
Buffer = 4 kB, Written 2048 kB at 494 kB/s, Verified, Read 2048 kB at 2855 kB/s
Buffer = 4 kB, Written 2048 kB at 498 kB/s, Verified, Read 2048 kB at 2785 kB/s
Buffer = 8 kB, Written 4096 kB at 736 kB/s, Verified, Read 4096 kB at 3210 kB/s
Buffer = 8 kB, Written 4096 kB at 735 kB/s, Verified, Read 4096 kB at 3195 kB/s
Buffer = 8 kB, Written 4096 kB at 731 kB/s, Verified, Read 4096 kB at 3216 kB/s
Buffer = 8 kB, Written 4096 kB at 722 kB/s, Verified, Read 4096 kB at 3199 kB/s
Buffer = 16 kB, Written 4096 kB at 960 kB/s, Verified, Read 4096 kB at 3383 kB/s
Buffer = 16 kB, Written 4096 kB at 963 kB/s, Verified, Read 4096 kB at 3375 kB/s
Buffer = 16 kB, Written 4096 kB at 963 kB/s, Verified, Read 4096 kB at 3367 kB/s
Buffer = 16 kB, Written 4096 kB at 961 kB/s, Verified, Read 4096 kB at 3382 kB/s
Buffer = 32 kB, Written 8192 kB at 1140 kB/s, Verified, Read 8192 kB at 3460 kB/s
Buffer = 32 kB, Written 8192 kB at 1157 kB/s, Verified, Read 8192 kB at 3466 kB/s
Buffer = 32 kB, Written 8192 kB at 1187 kB/s, Verified, Read 8192 kB at 3468 kB/s
Buffer = 32 kB, Written 8192 kB at 1257 kB/s, Verified, Read 8192 kB at 3461 kB/s
Buffer = 64 kB, Written 8192 kB at 1653 kB/s, Verified, Read 8192 kB at 3468 kB/s
Buffer = 64 kB, Written 8192 kB at 1653 kB/s, Verified, Read 8192 kB at 3459 kB/s
Buffer = 64 kB, Written 8192 kB at 1606 kB/s, Verified, Read 8192 kB at 3466 kB/s
Buffer = 64 kB, Written 8192 kB at 1681 kB/s, Verified, Read 8192 kB at 3470 kB/s
Buffer = 128 kB, Written 16384 kB at 1980 kB/s, Verified, Read 16384 kB at 3469 kB/s
Buffer = 128 kB, Written 16384 kB at 2033 kB/s, Verified, Read 16384 kB at 3468 kB/s
Buffer = 128 kB, Written 16384 kB at 2029 kB/s, Verified, Read 16384 kB at 3469 kB/s
Buffer = 128 kB, Written 16384 kB at 2034 kB/s, Verified, Read 16384 kB at 3469 kB/s
Buffer = 256 kB, Written 16384 kB at 2270 kB/s, Verified, Read 16384 kB at 3470 kB/s
Buffer = 256 kB, Written 16384 kB at 2290 kB/s, Verified, Read 16384 kB at 3470 kB/s
Buffer = 256 kB, Written 16384 kB at 2267 kB/s, Verified, Read 16384 kB at 3470 kB/s
Buffer = 256 kB, Written 16384 kB at 2282 kB/s, Verified, Read 16384 kB at 3469 kB/s
Compared with the SPI driver, write speeds with the new driver are maybe 50% higher, and double with a large buffer. Quite dismal, considering I was claiming a 10x uplift from the raw-mode testing. Read speeds are something like 8x, so it's not all gloom.
The Sandisk Extreme 32 GB does a little better on write speeds relative to the SPI driver: the gain starts at +50% but moves above 3x at larger buffer sizes. Reads are a solid 8x all round. The Sandisk will be fighting a smaller cluster size too.
clkfreq = 360000000 clkmode = 0x10011fb
Filesystem = fatfs, Driver = sdcardx
mount: OK
Buffer = 2 kB, Written 2048 kB at 645 kB/s, Verified, Read 2048 kB at 2410 kB/s
Buffer = 2 kB, Written 2048 kB at 612 kB/s, Verified, Read 2048 kB at 2450 kB/s
Buffer = 2 kB, Written 2048 kB at 655 kB/s, Verified, Read 2048 kB at 2447 kB/s
Buffer = 2 kB, Written 2048 kB at 651 kB/s, Verified, Read 2048 kB at 2443 kB/s
Buffer = 4 kB, Written 2048 kB at 919 kB/s, Verified, Read 2048 kB at 2912 kB/s
Buffer = 4 kB, Written 2048 kB at 997 kB/s, Verified, Read 2048 kB at 2834 kB/s
Buffer = 4 kB, Written 2048 kB at 999 kB/s, Verified, Read 2048 kB at 2908 kB/s
Buffer = 4 kB, Written 2048 kB at 990 kB/s, Verified, Read 2048 kB at 2911 kB/s
Buffer = 8 kB, Written 4096 kB at 1234 kB/s, Verified, Read 4096 kB at 3212 kB/s
Buffer = 8 kB, Written 4096 kB at 1298 kB/s, Verified, Read 4096 kB at 3130 kB/s
Buffer = 8 kB, Written 4096 kB at 1323 kB/s, Verified, Read 4096 kB at 3219 kB/s
Buffer = 8 kB, Written 4096 kB at 1230 kB/s, Verified, Read 4096 kB at 3214 kB/s
Buffer = 16 kB, Written 4096 kB at 1589 kB/s, Verified, Read 4096 kB at 3380 kB/s
Buffer = 16 kB, Written 4096 kB at 1581 kB/s, Verified, Read 4096 kB at 3383 kB/s
Buffer = 16 kB, Written 4096 kB at 1594 kB/s, Verified, Read 4096 kB at 3379 kB/s
Buffer = 16 kB, Written 4096 kB at 1478 kB/s, Verified, Read 4096 kB at 3378 kB/s
Buffer = 32 kB, Written 8192 kB at 2038 kB/s, Verified, Read 8192 kB at 3382 kB/s
Buffer = 32 kB, Written 8192 kB at 1677 kB/s, Verified, Read 8192 kB at 3378 kB/s
Buffer = 32 kB, Written 8192 kB at 2022 kB/s, Verified, Read 8192 kB at 3379 kB/s
Buffer = 32 kB, Written 8192 kB at 1937 kB/s, Verified, Read 8192 kB at 3379 kB/s
Buffer = 64 kB, Written 8192 kB at 2350 kB/s, Verified, Read 8192 kB at 3380 kB/s
Buffer = 64 kB, Written 8192 kB at 2350 kB/s, Verified, Read 8192 kB at 3378 kB/s
Buffer = 64 kB, Written 8192 kB at 2347 kB/s, Verified, Read 8192 kB at 3380 kB/s
Buffer = 64 kB, Written 8192 kB at 2345 kB/s, Verified, Read 8192 kB at 3378 kB/s
Buffer = 128 kB, Written 16384 kB at 2555 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 128 kB, Written 16384 kB at 2486 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 128 kB, Written 16384 kB at 2533 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 128 kB, Written 16384 kB at 2565 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 256 kB, Written 16384 kB at 2677 kB/s, Verified, Read 16384 kB at 3372 kB/s
Buffer = 256 kB, Written 16384 kB at 2676 kB/s, Verified, Read 16384 kB at 3371 kB/s
Buffer = 256 kB, Written 16384 kB at 2679 kB/s, Verified, Read 16384 kB at 3372 kB/s
Buffer = 256 kB, Written 16384 kB at 2574 kB/s, Verified, Read 16384 kB at 3373 kB/s
@Rayman said:
@evanh impressive speeds. So, it's ready to test? Does one just drop those files you posted into Flexprop to use it?
Yep, and yes. But you will need the accessory board with the 4-bit bus width. Roger designed the board, so Roger and I are the only two who have it.
Since the SPI-wired bootable SD slot has that EEPROM-clash limiting resistor, it can't be clocked at extreme rates, so I've not bothered to support 1-bit mode, although it certainly could be done in the future. There might be some speed-up, since the new driver uses streamer ops for everything, and those can operate at sysclock/2 even at high sysclocks. At high sysclocks the SPI driver steps down to sysclock/8.
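To put those two dividers in perspective, here is a quick worked estimate of the raw 1-bit ceiling as a C sketch. The helper name is made up for illustration, and it ignores command, CRC and card-busy overhead, so it is only an upper bound. Note that sysclock/2 at 360 MHz would mean a 180 MHz bus clock, which the resistor-limited bootable slot can't run at; the point is just the 4x ratio between the dividers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raw payload ceiling of a clocked serial bus, in bytes/s.
   bus clock = sysclock / divider, each clock moves bus_bits bits, 8 bits per byte.
   Command, CRC and busy time are ignored, so real transfers are always slower. */
uint32_t bus_ceiling_Bps(uint32_t sysclock_hz, uint32_t divider, uint32_t bus_bits)
{
    assert(divider != 0u);
    uint64_t bits_per_s = (uint64_t)(sysclock_hz / divider) * bus_bits;
    return (uint32_t)(bits_per_s / 8u);
}
```

At 360 MHz, `bus_ceiling_Bps(360000000, 8, 1)` gives 5,625,000 bytes/s for the SPI driver's sysclock/8, against 22,500,000 bytes/s at sysclock/2. The SPI read results above, around 3470 kB/s, sit under the first figure.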
Does the driver support a power enable pin?
Yes. Here's a report from the development code:
Card detected ... power cycle of SD card
power-down threshold = 37 pin state = 1
power-down slope = 34914 us pin state = 0
power-up threshold = 209 pin state = 0
power-up slope = 1128 us pin state = 1
It's using the DAC-comparator pin mode to set each threshold (0.5 V low and 2.7 V high) and then measures how long it takes the power rail to transition. I take 32 samples spread across 1.0 ms, with unanimous voting like a debounce circuit, to provide the specified hold time before a state change. The voltage is measured at the CLK pin. Each I/O pin has a 22 kR pull-up soldered on the accessory board.
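The unanimous-vote idea can be sketched in portable C. This is an illustration, not the driver's code: it assumes a hypothetical periodic call (every 1000/32 = 31.25 us, so 32 samples span 1.0 ms) fed with the comparator result, and it doesn't model the P2's DAC-comparator smart-pin setup at all. The debounced state only flips once all of the last 32 samples agree, so a change is reported one full window after the rail settles past the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Debounced comparator state using unanimous voting over the last 32 samples. */
typedef struct {
    uint32_t history;  /* last 32 samples, newest in bit 0 */
    bool     state;    /* debounced state; history must start all-equal to it */
} vote32;

/* Feed one comparator sample. Returns true only on the sample where
   the debounced state flips. */
bool vote32_sample(vote32 *v, bool sample)
{
    assert(v != NULL);
    v->history = (v->history << 1) | (sample ? 1u : 0u);
    if (!v->state && v->history == UINT32_MAX) {  /* 32 highs in a row */
        v->state = true;
        return true;
    }
    if (v->state && v->history == 0u) {           /* 32 lows in a row */
        v->state = false;
        return true;
    }
    return false;
}
```

A single disagreeing sample resets the count, which is what gives the guaranteed hold time before a state change is accepted.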
I'm guessing Von is planning on making 4-bit versions for Parallax to sell.
Now that it's proven, I guess it wouldn't be a huge effort to port the driver to Spin2 and place it in the Obex. I doubt I would have made it this far without printf() for debugging. It's not just the ease of formatting the prints either; scrollable terminal history is a massive sanity saver.
Another thing I do now, not unlike how Chip works, is have four separate editor windows open, tiled across the width of a single 43" 4K monitor. Each has multiple tabs holding a mixture of the files I'm examining. I can have all windows showing different parts of the one file if I like. None are full height. Multiple terminals open below, and a multi-tabbed file manager, are fixtures.
@Rayman said:
Just looked at it and see it's all in C with inline PASM2.
Thought it might be spin2 that was converted with spin2cpp, but guess not.
printf() was a critical part of development.
Yeah, interfacing to FSRW should be the next step. Can you provide the newest edition of it? I know it had lots of contributions over the years, and many were broken.
Guess needs a better maintainer...
Also, I see a note where @ke4pjw wanted SD power to cycle with reset. Think that would be good enough?
Or does it need a dedicated pin?
@Rayman said:
@evanh impressive speeds. So, it's ready to test? Does one just drop those files you posted into Flexprop to use it?
Yep, and yes. But you will need the accessory board with the 4-bit bus width. Roger designed the board, so Roger and I are the only two who have it.
It's an open design shared earlier in this thread, and with any luck Parallax may run with a design based on it, or at least share the same pin mapping and functionality. It makes good use of the 8 pins of a P2 accessory module, and evanh's driver now uses this pin mapping too, so it's the "standard" right now. Anyone making their own larger P2 boards can use the same type of circuit, and evanh's code should work at speed if the layout is not too crazy. The little pFET I happened to have in my spare parts bin could be swapped for something easier to solder if needed; it just needs a reasonably low Rds(on) so it doesn't drop too much voltage at SD card currents. Alternatively, the power could be controlled via a 5 V to 3.3 V regulator. The card-detect trick is handy too: it shares the pin and ensures power gets shut off when a card is ejected, which is also handy for controlling a regulator in low-power setups.
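For a rough sense of what "reasonably low Rds" means, a worked check: the drop across the switch is I × Rds(on), and the card must stay above its minimum supply voltage. The figures used below are assumptions for illustration only (a 3.3 V rail, 100 mA of card current, and the SD spec's 2.7 V lower limit); check the actual card and pFET datasheets. The helper names are hypothetical.

```c
#include <assert.h>

/* Voltage lost across a pass switch. Units work out as mA x ohm = mV. */
double switch_drop_mV(double load_mA, double rds_on_ohm)
{
    return load_mA * rds_on_ohm;
}

/* Largest Rds(on) that still keeps the card at or above min_card_mV. */
double max_rds_ohm(double supply_mV, double min_card_mV, double load_mA)
{
    assert(load_mA > 0.0);
    return (supply_mV - min_card_mV) / load_mA;
}
```

With those assumed numbers, a 0.5 ohm part loses 50 mV, and anything up to 6 ohm would technically stay in spec. In practice you'd want a big margin below that, since the rail has its own tolerance and card current may spike during writes.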
Ah, I've found something: the filesystem is requesting a SYNC after each cluster is written. That's excessively pedantic behaviour, imho. To be honest, I never imagined SYNC was even being used. Once, upon file close, seemed a reasonable place for it. Calling SYNC makes no difference to the SD card; it'll store the sent data just the same either way.
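Assuming the filesystem is ChaN's FatFs (the "fatfs" name in the test output suggests so), the SYNC reaches the driver as a `disk_ioctl()` call with `CTRL_SYNC`. A driver with no write-back cache can simply return success. The sketch below redefines the few `diskio.h` names it needs so it compiles on its own; the body is hypothetical, not the actual driver.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins mirroring FatFs's diskio.h so this compiles on its own;
   a real build would #include "diskio.h" instead. */
typedef uint8_t BYTE;
typedef enum { RES_OK = 0, RES_ERROR, RES_WRPRT, RES_NOTRDY, RES_PARERR } DRESULT;
#define CTRL_SYNC 0

/* Hypothetical driver hook. Assumes the write routine already waits for the
   card to release busy before returning, so nothing is pending here. */
DRESULT disk_ioctl(BYTE pdrv, BYTE cmd, void *buff)
{
    (void)buff;
    if (pdrv != 0)
        return RES_PARERR;  /* only one drive in this sketch */
    switch (cmd) {
    case CTRL_SYNC:
        return RES_OK;      /* no write-back cache: nothing to flush */
    default:
        return RES_PARERR;  /* other commands (sector count, TRIM, ...) omitted */
    }
}
```

The no-op only holds while the driver defers nothing. If a driver ever holds writes back (for example, leaving a multi-block write open to merge contiguous clusters), `CTRL_SYNC` is exactly where it would have to finish them.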
If you return early by skipping that request does it speed it up much? Also with those extra interleaved single sector reads/writes between bursts of 256kB I wonder how much delay that contributes and how much of the bandwidth improvements are reduced.
Interestingly you can see it write to a sector "84b", then it reads it again between the multi-sector bursts. What has changed it in the meantime that it needs to read it again? Some buffer limitation in the filesystem storage? Ideally it could cache this. ... WR25940+40 WR84b WR7fcb RDf740 WRf740 WR801 RD84b WR25980+40 ...
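The trace tokens are easy to check mechanically. A small C sketch (token format taken from the log above; `parse_op` and `follows` are hypothetical helpers) that parses entries such as `WR25940+40` as hex block numbers and counts, with a missing `+n` meaning a single block, and tests whether one request starts exactly where another ended:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { bool is_write; uint32_t block; uint32_t count; } trace_op;

/* Parse "WR25940+40" or "RD84b". Numbers are hex; no "+n" means one block. */
bool parse_op(const char *tok, trace_op *out)
{
    assert(tok && out);
    if (strncmp(tok, "WR", 2) == 0)      out->is_write = true;
    else if (strncmp(tok, "RD", 2) == 0) out->is_write = false;
    else return false;

    char *end;
    out->block = (uint32_t)strtoul(tok + 2, &end, 16);
    if (end == tok + 2) return false;   /* no block number */
    out->count = 1u;
    if (*end == '+') {
        char *end2;
        out->count = (uint32_t)strtoul(end + 1, &end2, 16);
        if (end2 == end + 1) return false;
        end = end2;
    }
    return *end == '\0';
}

/* True if `next` starts on the block right after `prev` finishes. */
bool follows(const trace_op *prev, const trace_op *next)
{
    return next->block == prev->block + prev->count;
}
```

Here 0x25940 + 0x40 = 0x25980, so the two bursts either side of the single-block accesses are contiguous and could have gone out as one multi-block write. The +40 count is 64 blocks, i.e. a 32 kB cluster of 512-byte blocks.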
@rogloh said:
If you return early by skipping that request does it speed it up much? Also with those extra interleaved single sector reads/writes between bursts of 256kB I wonder how much delay that contributes and how much of the bandwidth improvements are reduced.
Doesn't help at the moment. The problem is those singles, they kill any chance of performance because they break up the contiguous order. I just tried, via the driver, concatenating the separate but contiguous cluster writes into one CMD25. And got it working seemingly reliably too. It made a difference, but only for buffers larger than the cluster size. So kind of annoying.
E.g. the Samsung, using a 256 kB buffer, went from 5 MB/s to 9.5 MB/s writing a 16 MB file.
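The merging trick can be modelled without any hardware. This hypothetical sketch (names invented, no real SD I/O) keeps a CMD25 (WRITE_MULTIPLE_BLOCK) open between calls and only ends it with CMD12 (STOP_TRANSMISSION) when the next request isn't contiguous. For simplicity every write, even a single block, is counted as a new CMD25, and reads are assumed to call `co_flush()` first. It shows why the interleaved single-block accesses hurt: each one forces the open transfer to end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: counts the SD commands a driver would send if it kept a
   CMD25 open between calls. No real I/O is done. */
typedef struct {
    bool     open;         /* a CMD25 is in progress */
    uint32_t next_block;   /* block the open CMD25 would write next */
    uint32_t cmd25_count;  /* multi-block writes started */
    uint32_t cmd12_count;  /* STOP_TRANSMISSION commands sent */
} coalescer;

static void stop_if_open(coalescer *c)
{
    if (c->open) {
        c->cmd12_count++;  /* CMD12 ends the open transfer */
        c->open = false;
    }
}

/* Write `count` blocks at `block`, continuing the open CMD25 when contiguous. */
void co_write(coalescer *c, uint32_t block, uint32_t count)
{
    assert(count > 0u);
    if (!c->open || block != c->next_block) {
        stop_if_open(c);
        c->cmd25_count++;  /* new CMD25 starting at `block` */
        c->open = true;
    }
    /* ...the data blocks would be streamed to the card here... */
    c->next_block = block + count;
}

/* Must be called before any read, on SYNC, and on file close. */
void co_flush(coalescer *c)
{
    stop_if_open(c);
}
```

Using the block numbers from the earlier trace, WR25940+40 followed directly by WR25980+40 costs one CMD25, but with a single WR84b in between it costs three, which matches the observation that merging only pays off once the buffer spans more than one cluster.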
Interestingly you can see it write to a sector "84b", then it reads it again between the multi-sector bursts. What has changed it in the meantime that it needs to read it again? Some buffer limitation in the filesystem storage? Ideally it could cache this. ... WR25940+40 WR84b WR7fcb RDf740 WRf740 WR801 RD84b WR25980+40 ...
Yeah, that's the sort of thing that needs addressing for sure.
@Rayman said:
going to try it with this very soon...
Nice. Are there any series resistors underneath the board, or were they skipped? Will be interesting to see what performance can be gained if you did omit them.
Where is the filesystem code that controls all this exactly? Is it part of flexspin source tree? I'd like to see the C code if it's published somewhere.
@rogloh said:
Where is the filesystem code that controls all this exactly? Is it part of flexspin source tree? I'd like to see the C code if it's published somewhere.
It'll be there in that same directory as the driver file. I copied it all into another directory because it needed the top level API altered to get the extra pins assigned. Presumably ff.c is the actual filesystem source since it's the big file.
If you wanted to have both the new and old drivers for two SD cards operating at once then there would presumably be two complete copies of the filesystem compiled in as well.
Another temporary hack by me really.
EDIT: Taking a peek, it's quite the beast I see. Supports long file names, TRIMming, and ExFAT too.
The Unicode stuff seems overkill. That's going to just be for filenames.
Comments
Then the filesystem tester run:
And a lesser example using the Samsung EVO 128 GB (2023):
I just read this in your results:
So is this being tested at 360 MHz? I'll be running at around 297 MHz or 270 MHz, I guess. I wonder what I can expect to see there.
The point is, the filesystem code is showing its weaknesses now that we have a more performant driver.
I haven't tried looking into how it all works, so I'm not able to give any reason yet.
Here's just the writes alone - for writing one 512 kB file, using a 256 kB buffer, to a 16 GB card:
That card above was a different one. Here's the same test again using the Samsung 128 GB card. You can see the cluster size is 4x larger.
Another thing that becomes more obvious here is that there are four single-block writes between each 256 kB buffer written.
EDIT: Same again but with reads now reported as well:
As for 270 MHz with the new driver: it barely affects write performance. Reads are down maybe 15%.
Sandisk Extreme 32 GB:
Read speeds flatline from a 16 kB buffer size onward, which tells me the cluster size is 16 kB.
Removing block-read CRC processing and using sysclock/2: same story with 16 kB clusters:
The tiled editor setup isn't a specific IDE, but it looks like one.
@evanh Awesome! I have to try it out. See how sensitive it is to trace lengths and all...
Wonder how hard it would be to make this work with FSRW...
Maybe just replace the block driver with this one and it's done?
The latest FSRW might be here?
https://forums.parallax.com/discussion/173378/the-actually-functional-spin2-sd-driver-i-hope-kyefat-sdspi-with-audio/p2
Never seen atexit() before... Interesting...