@evanh said:
Rayman,
Turn on SD_DEBUG_PERFORMANCE so we can see if any blocks are successful. It looks like nothing is working because it can't even read the MBR.
While you're at it, uncomment the three ACMD13 lines below in the driver:
// Support for caching is not yet implemented, but sounds a promising approach
send_acmd(13, 0, resp);
rx_datablocks(buff, 1, timeout, resp); // data length is 64 bytes, CRC will fail
// __builtin_printf(" ACMD13 - ");
// for( tmr = 0; tmr <= 63; tmr++ )
//     __builtin_printf(" %02x", buff[tmr]);
__builtin_printf("Cache (A2 extension) supported = ");
That'll give us a little peek at read data content.
Comments
Here's an example of that output:
ACMD13 - 80 00 00 00 08 00 00 00 04 04 90 00 08 0a 19 0a 00 08 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
With the Mac-formatted drive, it seems it can read from a file named speed1.bin that I created on the disk:
clkfreq = 300000000 clkmode = 0x1000efb
Filesystem = fatfs, Driver = sdsdcard
Clock divider for SD card is 3 (100 MHz)
mount sd: OK
Mis-match! Read 1816 kB at 8137 kB/s
Mis-match! Read 1816 kB at 8118 kB/s
Mis-match! Read 1816 kB at 8085 kB/s
Mis-match! Read 1816 kB at 8098 kB/s
So, appears it can read a file, just not write...
Or you could write a program to dump the content of some known data blocks like the MBR.
Ok, here's with the mac formatted disk:
OCR register c0ff8000 - SDHC/SDXC Card
Data Transfer Mode entered - Published RCA aaaa0000
4-bit data interface engaged
ACMD13 - 80 00 00 00 03 00 00 00 04 00 90 00 14 05 1a 0a 00 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00Cache (A2 extension) supported = no
High-Speed access mode engaged
CID register backed up
SD clock divider set to sysclock/3. 'rxlag' compensation is 0
. CMD10 error! . CMD10 error! . CMD10 error! . CMD10 error! . CMD10 error!
................................................................................
................................................................................
. CMD10 error!
rxlag=6 selected Lowest=6 Highest=7
CID decode: ManID=03 OEMID=SD Name=SA08G Ver=8.0 Serial=17A32AA1 Date=2024-2
SD Card Init Successful
RD0 424816 RD2000 427783 RD2001 428575
mount sd: OK
RD9646 430056 WR9646 430903 !
Here's with Windows formatted disk:
clkfreq = 300000000 clkmode = 0x1000efb
Filesystem = fatfs, Driver = sdsdcard
Clock divider for SD card is 3 (100 MHz)
Set pins: CLK_PIN=44 CMD_PIN=45 DAT_PIN=40 POW_PIN=39 LED_PIN=46
Card detected ... power cycle of SD card
power-down threshold = 37 pin state = 1
power-down slope = 13380 us pin state = 0
power-up threshold = 209 pin state = 0
power-up slope = 1032 us pin state = 1
SD clock divider set to sysclock/750. 'rxlag' compensation is 0
Card idle OK
OCR register c0ff8000 - SDHC/SDXC Card
Data Transfer Mode entered - Published RCA aaaa0000
4-bit data interface engaged
ACMD13 - 80 00 00 00 03 00 00 00 04 00 90 00 14 05 1a 0a 00 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00Cache (A2 extension) supported = no
High-Speed access mode engaged
CID register backed up
SD clock divider set to sysclock/3. 'rxlag' compensation is 0
. CMD10 error! . CMD10 error! . . . CMD10 error!
................................................................................
................................................................................
. CMD10 error!
rxlag=6 selected Lowest=6 Highest=7
CID decode: ManID=03 OEMID=SD Name=SA08G Ver=8.0 Serial=16D32AA3 Date=2024-2
SD Card Init Successful
RD0 422166 RD2000 425130 RD2001 426253
mount sd: OK
RD4000 427737 RD4001 428565 RD4002 429611 RD4003 430442
RD4004 431269 RD4005 432704 RD4006 433749 RD4007 434592
RD4008 435420 RD4009 436253 RD400a 437083 RD400b 437917
RD400c 438763 RD400d 440202 RD400e 441247 RD400f 442074
RD4000 442886 RD4001 443716 RD4002 444534 RD4003 445354
RD4004 446173 RD4005 447604
Buffer = 2 kB, tmr=449015 i=0, bytes=2048, fh=70980
WR4005 451064 !
What are your drives formatted with? Some kind of Linux?
Maybe this formatter will help?
https://www.sdcard.org/downloads/formatter/
That didn't work, but does give a different error message now:
Filesystem = fatfs, Driver = sdsdcard
Clock divider for SD card is 4 (75 MHz)
Set pins: CLK_PIN=44 CMD_PIN=45 DAT_PIN=40 POW_PIN=39 LED_PIN=46
Card detected ... power cycle of SD card
power-down threshold = 37 pin state = 1
power-down slope = 11704 us pin state = 0
power-up threshold = 209 pin state = 0
power-up slope = 1032 us pin state = 1
SD clock divider set to sysclock/750. 'rxlag' compensation is 0
Card idle OK
OCR register c0ff8000 - SDHC/SDXC Card
Data Transfer Mode entered - Published RCA aaaa0000
4-bit data interface engaged
ACMD13 - 80 00 00 00 03 00 00 00 04 00 90 00 14 05 1a 0a 00 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00Cache (A2 extension) supported = no
High-Speed access mode engaged
CID register backed up
SD clock divider set to sysclock/4. 'rxlag' compensation is 0
. CMD10 error! . CMD10 error! . CMD10 error! . CMD10 error!
................................................................................
................................................................................
................................................................................
. CMD10 error!
rxlag=6 selected Lowest=5 Highest=7
CID decode: ManID=03 OEMID=SD Name=SA08G Ver=8.0 Serial=17A32AA1 Date=2024-2
SD Card Init Successful
RD0 743334 RD2000 746311 RD2001 747108
mount sd: OK
fopen() for writing failed! errno = 7: Not enough memory
Clear pins: 44 45 40 39 46
If we're suspicious of filesystem mismanagement then it's easy enough to swap to the old SPI driver and associated filesystem while still using the same card socket.
In the tester program, edit the mountsd() function. Comment out the three lines pertaining to _vfs_open_sdsdcard() and uncomment the two lines pertaining to _vfs_open_sdcardx().
You probably also need to edit the pin enums at the top for PIN_DI, PIN_DO and PIN_CS.
CS is same pin as DAT3
DO is same pin as DAT0
DI is same pin as CMD
@evanh any way you could format a card with the sdcard.org program and see if it works?
It works for me now. My cards all work with both drivers. Reformatting isn't going to make it not work.
Give the SPI driver a run. It should work for you. EDIT: You could probably comment out all the speed tests except one. The SPI driver will be a lot slower. Wow, no, at 300 MHz sysclock, it's close at writes until the buffer size gets large.
Replace the enums with the following:
PIN_DO = PIN_DAT0,
PIN_DI = PIN_CMD,
PIN_CS = PIN_DAT3,
In the past there has been a difference between the compiler on Windoze and the compiler on Linux. The bug was found by comparing the user-compiled binaries and .lst files, i.e. doing a binary compare of sdfat-speedtest.binary from both mine and yours.
We'd need to align our source files first though.
I'll just set up a Linux box to reproduce your result. Think the flavor matters? Are you on Ubuntu?
No, test the SPI driver. That's easy to do.
If the SPI driver doesn't help, okay, set up Ubuntu then. I'm on Kubuntu 24.04, btw.
A few months back I moved from Kubuntu 20.04. Interestingly, I installed as a minimal desktop, so lots of stuff doesn't get pre-installed then. A little bit surprisingly that includes no compilers. It wasn't any big deal though. Just had to add GCC and Make/Bison for building Flexspin. Don't remember having to add Git though. Maybe I added that earlier and forgot.
Hmm, well, no luck with CMD48. The A2 cards just aren't responding to it at all.
CMD48 issued while in Transfer State. Card isn't busy. Moving on to another command afterward is no problem.
EDIT: Err, it does mess up the subsequent command ...
EDIT2: Oh, now that's weird, I am getting something as long as I ignore the timeout on the command-response packet. I have no idea what it is yet:
CID decode: ManID=03 OEMID=SD Name=SN64G Ver=8.0 Serial=8AB989E1 Date=2021-2 Ext Func0:Page0: 00 00 70 00 02 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50 4d 46 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 01 00 00 00 04 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50 45 46 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 08 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
EDIT3: That's the Sandisk. The second A2 card, the Samsung, is responding but isn't giving anything meaningful. Other cards timeout on the data block as well as the command-response.
CID decode: ManID=1B OEMID=SM Name=ED2S5 Ver=3.0 Serial=49C16906 Date=2023-2 Ext Func0:Page0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Doh! There was a valid R1 response all along. I'd just bugged the check logic and didn't bother to verify it. I got the scope out this morning and did exactly that and only then realised my mistake. Was too tired again I guess.
Rayman,
This program should be preconfigured to use the SPI driver with your card slot config.
Okay, I think I'm getting it slowly. The entirety of Extension Function #0 looks to be just a description of what is contained in the subsequent functions.
Assuming Functions 1 and 2 are always going to be the same predetermined Power Management Function (PMF) and Performance Enhancement Function (PEF) structures respectively, Function 0 can probably be ignored. Which could explain why the Samsung card hasn't filled out its Function 0.
All the extra pages per function look to be just that, spare storage for that Extension Function should it desire it.
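As a rough illustration of reading Function 0 as a table of contents, here is a minimal C sketch that walks the entries in the first page. The field offsets (entry count at byte 4, first entry at byte 16, name at entry +24, next pointer at +40, register address at +44, function number in bits 18..21 of that address) are assumptions chosen because they line up with the Sandisk dump above, where the names "PMF" and "PEF" sit at bytes 40 and 88. The type and function names are hypothetical and not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded entry from the Function 0 page (layout is an assumption). */
typedef struct {
    uint16_t sfc;       /* function code: 1 and 2 in the Sandisk dump */
    char     name[17];  /* function name, NUL terminated ("PMF", "PEF") */
    uint16_t next;      /* byte offset of the next entry, 0 = last one */
    uint8_t  fno;       /* function number taken from the register address */
} ext_entry_t;

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the entry list; returns how many entries were stored in out[]. */
static int parse_ext_func0(const uint8_t *page, size_t len,
                           ext_entry_t *out, int max)
{
    if (len < 16)
        return 0;
    int count = page[4];   /* number of extensions described */
    size_t off = 16;       /* first entry follows the 16-byte header */
    int n = 0;
    while (n < count && n < max && off + 48 <= len) {
        ext_entry_t *e = &out[n++];
        e->sfc = le16(page + off);
        memcpy(e->name, page + off + 24, 16);
        e->name[16] = '\0';
        e->next = le16(page + off + 40);
        e->fno = (uint8_t)((le32(page + off + 44) >> 18) & 0xF);
        if (e->next <= off)   /* 0 ends the list; also guards against loops */
            break;
        off = e->next;
    }
    return n;
}
```

Fed the Sandisk page, this should report two entries, PMF on function 1 and PEF on function 2. An all-zero page like the Samsung's yields no entries, which fits the idea that Function 0 can be skipped when the function numbers are fixed.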
Got it going I think. Am able to set it without error now. And I get a performance change from the Samsung card. Sadly, that change is for the worse.
The Sandisk is unaffected performance wise.
Samsung EVO 128 GB without the cache extension:
clkfreq = 360000000 clkmode = 0x10011fb
Filesystem = fatfs, Driver = sdsdcard
mount sd: OK
Buffer = 2 kB, Written 2048 kB at 693 kB/s, Verified, Read 2048 kB at 6370 kB/s
Buffer = 2 kB, Written 2048 kB at 808 kB/s, Verified, Read 2048 kB at 6848 kB/s
Buffer = 2 kB, Written 2048 kB at 733 kB/s, Verified, Read 2048 kB at 6655 kB/s
Buffer = 2 kB, Written 2048 kB at 718 kB/s, Verified, Read 2048 kB at 6429 kB/s
Buffer = 4 kB, Written 2048 kB at 1476 kB/s, Verified, Read 2048 kB at 11091 kB/s
Buffer = 4 kB, Written 2048 kB at 1481 kB/s, Verified, Read 2048 kB at 10820 kB/s
Buffer = 4 kB, Written 2048 kB at 1485 kB/s, Verified, Read 2048 kB at 10606 kB/s
Buffer = 4 kB, Written 2048 kB at 1483 kB/s, Verified, Read 2048 kB at 10256 kB/s
Buffer = 8 kB, Written 4096 kB at 2825 kB/s, Verified, Read 4096 kB at 16941 kB/s
Buffer = 8 kB, Written 4096 kB at 2523 kB/s, Verified, Read 4096 kB at 14640 kB/s
Buffer = 8 kB, Written 4096 kB at 2896 kB/s, Verified, Read 4096 kB at 15173 kB/s
Buffer = 8 kB, Written 4096 kB at 2591 kB/s, Verified, Read 4096 kB at 12146 kB/s
Buffer = 16 kB, Written 4096 kB at 5116 kB/s, Verified, Read 4096 kB at 25527 kB/s
Buffer = 16 kB, Written 4096 kB at 4422 kB/s, Verified, Read 4096 kB at 20703 kB/s
Buffer = 16 kB, Written 4096 kB at 4490 kB/s, Verified, Read 4096 kB at 20652 kB/s
Buffer = 16 kB, Written 4096 kB at 4425 kB/s, Verified, Read 4096 kB at 18552 kB/s
Samsung EVO 128 GB with the cache extension enabled:
clkfreq = 360000000 clkmode = 0x10011fb
Filesystem = fatfs, Driver = sdsdcard
mount sd: OK
Buffer = 2 kB, Written 2048 kB at 649 kB/s, Verified, Read 2048 kB at 6508 kB/s
Buffer = 2 kB, Written 2048 kB at 666 kB/s, Verified, Read 2048 kB at 6606 kB/s
Buffer = 2 kB, Written 2048 kB at 666 kB/s, Verified, Read 2048 kB at 6439 kB/s
Buffer = 2 kB, Written 2048 kB at 660 kB/s, Verified, Read 2048 kB at 6099 kB/s
Buffer = 4 kB, Written 2048 kB at 1268 kB/s, Verified, Read 2048 kB at 10637 kB/s
Buffer = 4 kB, Written 2048 kB at 1264 kB/s, Verified, Read 2048 kB at 10302 kB/s
Buffer = 4 kB, Written 2048 kB at 1263 kB/s, Verified, Read 2048 kB at 11865 kB/s
Buffer = 4 kB, Written 2048 kB at 1269 kB/s, Verified, Read 2048 kB at 11616 kB/s
Buffer = 8 kB, Written 4096 kB at 2466 kB/s, Verified, Read 4096 kB at 16237 kB/s
Buffer = 8 kB, Written 4096 kB at 2243 kB/s, Verified, Read 4096 kB at 13647 kB/s
Buffer = 8 kB, Written 4096 kB at 2455 kB/s, Verified, Read 4096 kB at 14132 kB/s
Buffer = 8 kB, Written 4096 kB at 2232 kB/s, Verified, Read 4096 kB at 17138 kB/s
Buffer = 16 kB, Written 4096 kB at 3926 kB/s, Verified, Read 4096 kB at 24097 kB/s
Buffer = 16 kB, Written 4096 kB at 3498 kB/s, Verified, Read 4096 kB at 19588 kB/s
Buffer = 16 kB, Written 4096 kB at 3529 kB/s, Verified, Read 4096 kB at 19268 kB/s
Buffer = 16 kB, Written 4096 kB at 3478 kB/s, Verified, Read 4096 kB at 17290 kB/s
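To put a number on "for the worse", a small C sketch can average the four 16 kB write results from each run and compare them. The sample values are copied from the two listings; the helper names are made up for illustration.

```c
#include <stddef.h>

/* Arithmetic mean of n throughput samples in kB/s. */
static double mean_kbps(const int *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / (double)n : 0.0;
}

/* Percentage change going from base to other; negative means slower. */
static double percent_change(double base, double other)
{
    return (other - base) * 100.0 / base;
}

/* 16 kB write speeds copied from the two Samsung runs. */
static const int write16_cache_off[4] = { 5116, 4422, 4490, 4425 };
static const int write16_cache_on[4]  = { 3926, 3498, 3529, 3478 };
```

Calling percent_change(mean_kbps(write16_cache_off, 4), mean_kbps(write16_cache_on, 4)) gives the relative change in 16 kB write speed, which for these samples is about 22% slower with the cache enabled. The read figures move far less between the two runs.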
Is this with any manual cache flushing? That might really make it worse for small sizes.
Huh, just found a bug in the DAT0 Busy waiting routine. Fixing this has removed the performance difference on the Samsung card. I'm not sure why it wasn't more of a problem generally, to be honest.
The bug came from my recent removal of the CMD7 SELECT that used to be embedded in that routine: I didn't add a replacement source of continuous clocks during the waiting. It was in effect relying on whatever trailing clocks came off prior activity.
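The shape of that fix can be sketched in C as a wait loop where every poll of DAT0 generates its own clock, so the loop never depends on clocks left over from an earlier command. This is only an illustration: clock_dat0_fn, wait_dat0_ready and the fake card used to exercise it are hypothetical names, not the driver's actual routine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hook: emit one SD clock pulse and return the
   DAT0 level sampled on it (true = released, card no longer busy). */
typedef bool (*clock_dat0_fn)(void *ctx);

/* Clock the card until DAT0 goes high.
   Returns the number of clocks used, or -1 if max_clocks ran out. */
static int32_t wait_dat0_ready(clock_dat0_fn clock_dat0, void *ctx,
                               int32_t max_clocks)
{
    for (int32_t n = 1; n <= max_clocks; n++) {
        if (clock_dat0(ctx))   /* each poll supplies its own clock */
            return n;
    }
    return -1;
}

/* Stand-in for a card that holds DAT0 low for busy_clocks clocks. */
typedef struct {
    int busy_clocks;
    int clocks_seen;
} fake_card_t;

static bool fake_clock_dat0(void *ctx)
{
    fake_card_t *card = ctx;
    card->clocks_seen++;
    return card->clocks_seen > card->busy_clocks;
}
```

With a fake card that stays busy for 5 clocks, wait_dat0_ready(fake_clock_dat0, &card, 100) returns on the sixth clock. The point of the sketch is that the clock count is bounded by the loop itself rather than by whatever the previous transfer happened to leave behind.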
So both cards are unaffected in the end. They are both responding to the CMD48/CMD49 packets though. They appear to be engaging the cache feature. It just doesn't help with the way I'm using them.
I've verified, with the modified fwrite()/fread(), that fflush() is only called once at the fclose(). And the ioctl(SYNC)'ing is the only place where I have the card's cache flushed.
Well that's a wash. Some performance enhancement that is. Though you found a bug, so that's good. But that implies that something did change. Maybe the first few sectors are accelerated and then it gets slower towards the end?
There's an extra step in the ioctl(SYNC) routine where it has to wait on the Busy both before the CMD49 and again after to ensure the flush is complete. Only one wait is needed without the CMD49.
With the bug there, the waiting was somehow slower but not stalled. Whereas without the CMD49 it wasn't slowed at all.
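The ordering described for ioctl(SYNC) can be sketched in C as below: one Busy wait when the cache is off, and wait, flush, wait when it is on. The sd_sync_ops_t hooks and the trace helpers are hypothetical stand-ins so the sequence can be checked without hardware; flush_cache represents the CMD49 write that requests the flush.

```c
#include <stdbool.h>

/* Hypothetical driver hooks; each returns 0 on success. */
typedef struct {
    int (*wait_not_busy)(void *ctx);  /* wait for DAT0 Busy to clear */
    int (*flush_cache)(void *ctx);    /* stands in for the CMD49 flush */
    void *ctx;
} sd_sync_ops_t;

/* SYNC: always wait for Busy; with the cache on, flush and wait again. */
static int sd_sync(const sd_sync_ops_t *ops, bool cache_enabled)
{
    int rc = ops->wait_not_busy(ops->ctx);
    if (rc != 0 || !cache_enabled)
        return rc;
    rc = ops->flush_cache(ops->ctx);
    if (rc != 0)
        return rc;
    return ops->wait_not_busy(ops->ctx);  /* flush must finish too */
}

/* Trace recorder used to observe the call order. */
typedef struct {
    char log[8];
    int n;
} trace_t;

static void trace_add(trace_t *t, char c)
{
    if (t->n < (int)sizeof t->log - 1)
        t->log[t->n++] = c;
}

static int trace_wait(void *ctx)  { trace_add(ctx, 'W'); return 0; }
static int trace_flush(void *ctx) { trace_add(ctx, 'F'); return 0; }
```

Running sd_sync() with the trace hooks records "WFW" with the cache enabled and just "W" without it, which is the extra wait mentioned above.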
@evanh Tried with regular uSD driver and it doesn't work. Investigating as to why...
It has to be FAT32. That formatter may default to exFAT.
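One way to check what a formatter actually produced is to dump block 0 and look at the first MBR partition entry. The C sketch below classifies the partition type byte, assuming a conventional MBR layout (first entry at offset 446, type at +4, start LBA at +8, signature 55 AA at 510). The function and enum names are hypothetical, and a card formatted without a partition table would need a different check.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    PART_NONE,            /* type 0x00: entry unused */
    PART_FAT32,           /* type 0x0B or 0x0C */
    PART_EXFAT_OR_NTFS,   /* type 0x07 is shared by exFAT and NTFS */
    PART_OTHER,
    PART_BAD_SIGNATURE    /* no 55 AA at the end of the sector */
} part_kind_t;

/* Classify the first partition entry of a 512-byte MBR sector.
   If start_lba is not NULL it receives the partition's first block. */
static part_kind_t mbr_first_partition_kind(const uint8_t sector[512],
                                            uint32_t *start_lba)
{
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return PART_BAD_SIGNATURE;

    const uint8_t *entry = sector + 446;
    if (start_lba != NULL) {
        *start_lba = (uint32_t)entry[8] | ((uint32_t)entry[9] << 8) |
                     ((uint32_t)entry[10] << 16) | ((uint32_t)entry[11] << 24);
    }
    switch (entry[4]) {
    case 0x00: return PART_NONE;
    case 0x0B:
    case 0x0C: return PART_FAT32;
    case 0x07: return PART_EXFAT_OR_NTFS;
    default:   return PART_OTHER;
    }
}
```

A result of PART_EXFAT_OR_NTFS on a freshly formatted card would point at the formatter having chosen exFAT rather than at the driver.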
So @evanh have you managed to determine the source of all these various inter-cluster sector overheads when you timestamped them and which might be candidates for removal/optimization?
That FATFS stuff we found earlier related to avoiding cluster allocation during writes still has no effect? Was that because these APIs can't easily be accessed by your test application or some other reason? Unfortunately I'm only partially following this thread right now so don't have a lot of time to consider it all.
Heh, no, I stopped looking at that when Ada gave me hope for ignoring it.
I'm guessing you may have to revisit this eventually if we want to get rid of those single sector accesses which seem to be killing streaming performance.
The idea had been that the caching would make it all faster by eliminating the long Busy states. The small singles would be so fast they wouldn't matter much. Alas, that didn't pan out.
Plus those sorts of extra features are probably somewhat card dependent anyway. Caching may not help streaming writes much once the buffer fills up and you are still writing.