Comments
Actually, the Single-block loop is even bigger now. Looking at it, I'm a little surprised the Multi-block path is working as well as it does. Almost all of the logic is done in the assembly. The only part still in C is the decision, and loop, on whether the block count fits the buffer.
So with your overheads and the gaps you've seen before with fast cards, what sort of sustained transfer rates do you expect will be achievable on reads (no CRC check) and writes (with CRC)? Can we get 28 MiB/s non-stop running on a 270 MHz P2? That would allow 30 fps video at 640x480x24 bits in pure RGB (no 4:2:2 subsampling) with audio. 30 MiB/s would allow 24 fps widescreen 858x480p or thereabouts.
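As a rough check on those targets, here is a minimal sketch of the raw data rates involved. The audio format (48 kHz, 16-bit stereo PCM) is an assumption made for illustration; the question above doesn't specify one.

```python
# Raw (uncompressed) data rates for the two video formats mentioned above.
MIB = 1024 * 1024

def video_rate_mib(width, height, fps, bits_per_pixel=24):
    """Return the raw pixel data rate in MiB/s."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps / MIB

def audio_rate_mib(sample_rate=48_000, channels=2, bits_per_sample=16):
    """Return the PCM audio data rate in MiB/s (the format is an assumption)."""
    return sample_rate * channels * bits_per_sample // 8 / MIB

vga = video_rate_mib(640, 480, 30)
wide = video_rate_mib(858, 480, 24)
audio = audio_rate_mib()

print(f"640x480 @ 30 fps: {vga:.2f} MiB/s video, {vga + audio:.2f} MiB/s with audio")
print(f"858x480 @ 24 fps: {wide:.2f} MiB/s video, {wide + audio:.2f} MiB/s with audio")
```

Under those assumptions both cases land a little below the 28 MiB/s and 30 MiB/s figures, which leaves some margin for filesystem overhead.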
At 270 MHz sysclock, with a clock divider of 3 and CRC processing enabled, it can read data at 36 MiB/s. EDIT: Ah, well, still need to add filesystem overheads to that. Time to get back to integrating into the driver ...
EDIT2: Oh, the 36 MB/s was with 8 kB buffer size, btw. A 64 kB buffer moves that up to 38 MB/s. And 16 kB gives a solid 37 MB/s.
EDIT3: Disabling the CRC processing and using sysclock/2 divider takes that to 55 MB/s. Or even a little more, 58 MB/s, with the Sandisk cards.
EDIT4: 270 MHz with sysclock/2 (135 MHz SD clock) is of course massively overclocking the SD bus for the 3.3 Volt High Speed interface. It does, however, fit within the upper limit of the 1.8 Volt UHS-I interface (which can operate in spec up to an insane 208 MHz SD clock). So I guess that's why newer cards just accept it and keep up.
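To put the measured figures next to the bus limits, here is a small sketch of the raw data-line ceiling for each divider. It assumes the usual 4-bit SD bus with one bit per data line per clock, and it ignores command, CRC and inter-block overheads, so real throughput has to come in below these numbers.

```python
# Raw SD data-line ceiling for a given sysclock and divider.
# Assumption: 4-bit SD bus, one bit per data line per SD clock.
MB = 1_000_000
MIB = 1024 * 1024

HIGH_SPEED_MAX_HZ = 50e6   # 3.3 V High Speed mode limit (SD spec)
UHS_I_MAX_HZ = 208e6       # 1.8 V UHS-I (SDR104) limit, as quoted above

def sd_bus_ceiling(sysclock_hz, divider, bus_width_bits=4):
    """Return (SD clock in Hz, raw ceiling in bytes per second)."""
    sd_clock_hz = sysclock_hz / divider
    return sd_clock_hz, sd_clock_hz * bus_width_bits / 8

clk3, raw3 = sd_bus_ceiling(270e6, 3)
clk2, raw2 = sd_bus_ceiling(270e6, 2)

for name, clk, raw in (("sysclock/3", clk3, raw3), ("sysclock/2", clk2, raw2)):
    print(f"{name}: {clk / 1e6:.0f} MHz SD clock, "
          f"ceiling {raw / MB:.1f} MB/s ({raw / MIB:.1f} MiB/s), "
          f"over High Speed limit: {clk > HIGH_SPEED_MAX_HZ}, "
          f"within UHS-I limit: {clk <= UHS_I_MAX_HZ}")
```

So the 36 to 38 MB/s results sit under a 45 MB/s ceiling at sysclock/3, and the 55 to 58 MB/s results sit under 67.5 MB/s at sysclock/2.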
The Samsung EVO card has the strangest behaviour. I've hesitated to mention it before but it does seem to be persisting. On the first run of the testing it performs exceptionally poorly, at about half the expected speeds, even on repeats of reading the same blocks over and over. One whole test run rereads the same sequential block list 15 times, with each loop halving the total number of blocks to read.
After the first run it's fine. It's like the card needs a few seconds to warm up.
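For anyone following along, the read schedule of one test run can be sketched like this. The 16 MiB starting size and 512-byte blocks are assumptions for illustration, not the actual test parameters.

```python
# One test run: 15 passes over the same sequential blocks, halving each time.
# Assumptions: the first pass covers 16 MiB and blocks are 512 bytes.
BLOCK_SIZE = 512

def halving_schedule(start_blocks, passes):
    """Return the block count for each pass, halving on every pass."""
    return [start_blocks >> i for i in range(passes)]

start_blocks = 16 * 1024 * 1024 // BLOCK_SIZE
schedule = halving_schedule(start_blocks, 15)

for number, blocks in enumerate(schedule, start=1):
    print(f"pass {number:2d}: {blocks:5d} blocks ({blocks * BLOCK_SIZE // 1024} KiB)")
```

Every pass after the first only rereads a prefix of what the first pass already covered, so nothing after pass 1 touches new data.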
The poor results of first time run:
Then the very next run is this:
Maybe it fits in some sort of internal cache? 16MB is not beyond the realm of fitting into a cache. If you try a much larger transfer range test that has no chance of fitting then maybe you won't see such a difference between runs 1 and 2.
It would be sorted after the first loop then. The second line of the first test run should show a dramatic up-tick in performance but it doesn't.
And power cycling doesn't revert the performance either. It's still fine after swapping cards for a while and coming back to the Samsung. The problem only seems to show up after hours or days of no power.
I'm guessing if I made a test that ran for say 30 seconds and graphed progress every 0.1 second I'd see it rise suddenly a few seconds into the run. But only when the card has been cold.
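A minimal sketch of how that kind of progress graph could be built: bin read-completion timestamps into 0.1 second intervals and look for the step. The event list here is synthetic, with made-up rates and a made-up 3 second warm-up, purely to exercise the binning. A real test would record a timestamp after each completed read instead.

```python
# Bin (timestamp, bytes) read-completion events into fixed intervals.
# Timestamps are integer microseconds to avoid float rounding at bin edges.
INTERVAL_US = 100_000  # 0.1 second

def throughput_series(events, interval_us=INTERVAL_US):
    """Return MB/s for each interval, given (timestamp_us, nbytes) events."""
    if not events:
        return []
    totals = [0] * (events[-1][0] // interval_us + 1)
    for timestamp_us, nbytes in events:
        totals[timestamp_us // interval_us] += nbytes
    # bytes per microsecond is numerically the same as MB/s
    return [total / interval_us for total in totals]

def first_step_up(series, factor=1.5):
    """Return the index of the first interval well above the initial rate."""
    for index, rate in enumerate(series):
        if rate >= factor * series[0]:
            return index
    return None

# Synthetic input only: 20 MB/s for the first 3 s, then 40 MB/s (made up).
events = []
for t_us in range(0, 30_000_000, 10_000):
    rate = 20_000_000 if t_us < 3_000_000 else 40_000_000
    events.append((t_us, rate // 100))  # bytes moved in each 10 ms slice

series = throughput_series(events)
step = first_step_up(series)
print(f"{len(series)} intervals, step up at {step * INTERVAL_US / 1e6:.1f} s")
```

If the card really does wake up a few seconds in, the step index would show where. A cache effect would more likely show up tied to the amount of data read than to elapsed time.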
What if you start with a warm card to begin with? Sit it next to a heat source for a bit then test it.
Wow, I'm impressed. Even the older cards are performing at 135 MHz SD clock. Here's my oldest card, the Adata Silver (2013), at sysclock/2 without CRC processing. Note the first line has a repeatable latency spike:
The Apacer (2018) is clean though:
Yeah, no, that was a euphemistic use of cold. But, taking the hint, I've now tested it as an actual thermally cold card and it's still behaving perfectly fine first try. So cold in this case only seems to be when unpowered for days.
A 10 hour gap isn't enough. The Samsung EVO worked first try. Although, the first line does indicate a minor latency extension there:
Which vanishes again on subsequent runs:
Seems weird. Charge leakage?
Maybe. And I may have done damage now. I put it in an oven, possibly over 100 degC, for 5 hours.
First run after:
Seventh run after (5 minutes later):
Eleventh run after (10 minutes):
30 minutes in the freezer:
Run 1:
Run 12:
I won't throw it out, but clearly it's not looking a happy SD card any longer. I'm gonna file it under it-was-already-faulty and I just sped it to the grave.
EDIT: Huh, that latest pattern above, where the performance was consistently ok-poor-ok through the test sizes, I do now remember one of the cards did that before. Yeah, I'm concluding the Samsung card has always been sick.
EDIT2: Reminds me of the days when the full-spec'd Samsung 840 EVO SSD needed a firmware update because read speeds got excessively slow as the data aged. And even then it wasn't a perfect fix. https://www.anandtech.com/show/8617/samsung-releases-firmware-update-to-fix-the-ssd-840-evo-read-performance-bug
EDIT3: Ha! Yep, writing fresh data fixes it.
Yeah, the cell charge, a cell level calibration thingy. QLC Flash will be the worst for this. Back in the 840 EVO days it was still TLC.
Oddly, it has always seemed to be a Samsung exclusive issue though.