@rogloh said:
Same driver, different SD cards on different IO cmd/clk/data pins, right? Not actually sharing the same SD data bus (which I believe is theoretically possible in SD transfer mode), as that case seems like too much complexity.
Yes, two separate SD pin sets for two separate cards. That's how it is right now too. I have your Eval add-on uSD board at basepin 16 and a hand-wired full-sized SD slot I made at basepin 40.
The driver has no support for sharing the SD bus. It never deselects the card. EDIT: Err, it has to deselect to perform a CMD10 (SEND_CID) when it runs a rxlag calibration cycle.
On that note, I'd very much like to come up with a solution for using block reads instead of CMD10 to do the calibration with. The problem is there's no way to be sure the data blocks being read aren't just all zeros or all ones. I think I'd need to write data to the card storage.
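For what it's worth, candidate blocks could at least be screened to reject the degenerate all-zeros/all-ones cases before relying on them for calibration. A minimal C sketch - `block_usable_for_cal` and its 25%..75% threshold are my own illustration, not anything in the driver:

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits across a block and reject blocks whose bit mix is
   too lopsided to exercise the data bus during calibration.  The
   25%..75% acceptance window is an arbitrary illustration. */
static int block_usable_for_cal(const uint8_t *blk, size_t len)
{
    size_t ones = 0;
    for (size_t i = 0; i < len; i++)
        for (uint8_t b = blk[i]; b; b &= (uint8_t)(b - 1))
            ones++;                     /* Kernighan popcount */
    size_t bits = len * 8;
    return ones > bits / 4 && ones < bits - bits / 4;
}
```

An all-zeros or all-ones 512-byte block fails the check, while anything with a reasonable bit mix passes; the write-a-known-pattern idea above would guarantee a passing block.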
@evanh said:
On that note, I'd very much like to come up with a solution for using block reads instead of CMD10 to do the calibration with. The problem is there's no way to be sure the data blocks being read aren't just all zeros or all ones. I think I'd need to write data to the card storage.
Oh, that's right, I did have an idea to attempt engaging the 1.8 Volt UHS interface and see how the card handles the Prop2 staying at 3.3 Volts ... if that somehow works then I can use UHS's dedicated CMD19 (SEND_TUNING_BLOCK), which uses the DAT pins.
EDIT: Nah, it'll be a bust. The Vdd supply stays at 3.3 V after UHS switchover. Which means, at the very least, the card's signalling will be too low for the Prop2 inputs at speed.
Yeah, pity. That tuning block command looked useful otherwise. From my memory of old discussions, the pin comparator is slower than the streamer, so it may not work at high speed. Still might be worth a quick look though, to help measure latency somehow. There may still be some residual correlation between optimal read delay and the response time of the comparator to a known tuning pattern, even if its initial delayed response is large due to its own bandwidth limitations. Perhaps try it anyway?
That'd be a dog's breakfast. It can't operate at full speed because of the comparator's speed limit, so I'd just be projecting from something slower, and then it requires power cycling and reiniting after any calibration cycle is done. UHS mode, like SPI mode, can't be switched out of without a power cycle.
And I doubt reliability of any projected method anyway.
PS: UHS would require bitDAC pin config for outputs as well. Otherwise the Prop2's 3.3 V outputs will likely lift the card's 1.8 V regulator voltage and cause a fault there.
All those differences between calibrating and full-speed operation would need a lot of careful behaviour mapping to make a projection from. The death knell being that different boards with different track lengths will redefine the mappings. And possibly differences in SD cards will impact it too.
There are a couple of such fixed structures but they're both basically empty. CMD10's CID structure was the best I found, with a decent mix of 1's and 0's.
PS: The calibrator routine performs 12 x CMD10 for each dot, 80 dots per rxlag setting, and 24 rxlag settings are tested. So a possible 24 x 80 x 12 = 23040 issuings of CMD10. In reality a lot less: each per-dot group of 12 is checked for errors which, when one occurs, aborts that whole rxlag setting, flagging it unsuitable and moving on to the next setting.
PPS: CMD10 has a 136-bit (17-byte) response, including the CRC and framing.
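The sweep described in the PS can be sketched in C. Everything here is illustrative: `cmd10_ok` stands in for "issue CMD10 and verify the CID response at this rxlag", the hard-coded 11..15 working window mirrors the calibration log further down, and picking the midpoint of the window is my reading of the "rxlag=13 selected Lowest=11 Highest=15" report, not confirmed driver behaviour:

```c
enum { RXLAG_SETTINGS = 24, DOTS = 80, CMD10_PER_DOT = 12 };

/* Stand-in for issuing CMD10 and checking the response at this rxlag;
   a real run would talk to the card.  Window 11..15 is hard-coded. */
static int cmd10_ok(int rxlag) { return rxlag >= 11 && rxlag <= 15; }

/* Sketch of the sweep: 12 CMD10s per dot, 80 dots per setting, 24
   settings tested; the first error aborts the whole setting.  Returns
   the midpoint of the working window, or -1 if none worked. */
static int calibrate_rxlag(int *issued)
{
    int lo = -1, hi = -1;
    *issued = 0;
    for (int rxlag = 0; rxlag < RXLAG_SETTINGS; rxlag++) {
        int good = 1;
        for (int dot = 0; dot < DOTS && good; dot++)
            for (int n = 0; n < CMD10_PER_DOT && good; n++) {
                ++*issued;
                if (!cmd10_ok(rxlag))
                    good = 0;        /* setting unsuitable, move on */
            }
        if (good) {
            if (lo < 0) lo = rxlag;  /* lowest working setting */
            hi = rxlag;              /* highest working setting */
        }
    }
    return lo < 0 ? -1 : (lo + hi) / 2;
}
```

Because a failing setting aborts on its first CMD10, the total issued stays far below the 23040 worst case.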
Comparing different numbers of files shows a growing overhead in handling the FAT filesystem as the directory fills. The verify stage is most affected.
200 MHz sysclock, Sandisk Extreme 64 GB card, sdsd.cc v1.4 driver, 512 bytes file size
Yeah that's interesting. The more sectors the FAT filesystem extends itself into with file allocations, the more FAT sectors need to be navigated through for file verification purposes. It's certainly noticeable in those results.
The other operations are slowed down too, but not as much, probably because they are more individual operations, rather than the repeated, alternating accesses caused by the different locations on the disk of the two files being verified.
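A back-of-envelope figure for that verify overhead, assuming FAT32 (4-byte entries), 512-byte sectors, and one 32 kB cluster per 512-byte test file as in the log below - all assumptions, since the tester's exact on-disk layout isn't shown:

```c
#include <stdint.h>

#define SECTOR_BYTES      512u
#define FAT32_ENTRY_BYTES 4u   /* one 32-bit FAT entry per cluster */

/* How many FAT sectors hold the entries for `files` one-cluster files.
   Each 512-byte test file still occupies a whole 32 kB cluster, so one
   FAT entry per file; more files means more FAT sectors to walk when
   verifying. */
static uint32_t fat_sectors_touched(uint32_t files)
{
    uint32_t per_sector = SECTOR_BYTES / FAT32_ENTRY_BYTES;  /* 128 */
    return (files + per_sector - 1) / per_sector;            /* ceil */
}
```

So the first 128 files all live in one FAT sector, while a thousand files spread across eight, which lines up with the verify stage degrading as the directory fills.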
I'm looking into "discard" erase mode - hoping it might help with consistency of write performance. Discarding would be the best way to implement block TRIMming. TRIM is the name used with SATA drives to inform a flash-based block device of what can be erased in the background at its leisure, and it's an optionally supported feature in Flexspin's filesystem handler.
The first surprise I got is that it's simply an uncommon feature for SD cards to have. Or at least in my collection: only one card indicates support for discard - the Sandisk Extreme 64 GB card, made in 2021. The even newer cards in my collection don't support discarding, including the Samsung Evo, which otherwise has a lot of modern SD features.
I think it makes a difference. Consistency appeared to improve after first performing a full FULE erase, then setting up a fresh partition table and FAT32 volume on the SD card. Of note: Flexspin's filesystem handler issues one TRIM for each file overwrite.
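At the SD command level a discard rides on the standard erase sequence: CMD32/CMD33 set the block range, then CMD38 with argument 0x00000001 requests a discard rather than a normal erase (0x00000002 is FULE), per the SD Physical Layer spec. A sketch - `sd_discard` is a hypothetical helper that just records the sequence; a real driver would transmit each command and wait for its R1 response:

```c
#include <stdint.h>

/* SD spec erase commands and CMD38 argument values.  Card support for
   discard/FULE is advertised in the SD Status register. */
enum { CMD32_ERASE_WR_BLK_START = 32,
       CMD33_ERASE_WR_BLK_END   = 33,
       CMD38_ERASE              = 38 };
#define ERASE_ARG_NORMAL  0x00000000u
#define ERASE_ARG_DISCARD 0x00000001u
#define ERASE_ARG_FULE    0x00000002u

typedef struct { uint8_t cmd; uint32_t arg; } sd_cmd_t;

/* Record the three-command discard sequence for an inclusive block
   range into seq[3] (hypothetical helper, transmits nothing). */
static void sd_discard(uint32_t first_blk, uint32_t last_blk, sd_cmd_t seq[3])
{
    seq[0] = (sd_cmd_t){ CMD32_ERASE_WR_BLK_START, first_blk };
    seq[1] = (sd_cmd_t){ CMD33_ERASE_WR_BLK_END,   last_blk  };
    seq[2] = (sd_cmd_t){ CMD38_ERASE,              ERASE_ARG_DISCARD };
}
```

The block ranges in the "TRIM 7f80..8f7f" lines below would map straight onto the CMD32/CMD33 arguments.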
clkfreq = 330000000 clkmode = 0x10420fb
Compiled with FlexC v7.2.1
Speed Class = C10 UHS Grade = U3 Video Class = V30 App Class = A2
TRIM = 1 FULE = 1
Card User Capacity = 60906 MiB
CID decode: ManID=03 OEMID=SD Name=SN64G
Ver=8.0 Serial=8ab989e1 Date=2021-2
SD clock-divider set to sysclock/3 (110.0 MHz)
rxlag=13 selected Lowest=11 Highest=15
cluster size = 32768 bytes
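As an aside, the CID fields in that log decode straightforwardly from the 16-byte CMD10 payload. The field layout here follows the SD Physical Layer spec; the struct and function names are my own:

```c
#include <stdint.h>
#include <string.h>

/* Decoded fields of the 128-bit CID register (CMD10 response payload). */
typedef struct {
    uint8_t  mid;                  /* manufacturer ID   */
    char     oid[3];               /* OEM/application ID */
    char     pnm[6];               /* product name       */
    uint8_t  prv_major, prv_minor; /* product revision   */
    uint32_t psn;                  /* serial number      */
    int      year, month;          /* manufacturing date */
} sd_cid_t;

/* Decode a 16-byte big-endian CID; the last byte holds CRC7 + stop bit
   and is ignored here. */
static void cid_decode(const uint8_t c[16], sd_cid_t *out)
{
    out->mid = c[0];
    memcpy(out->oid, &c[1], 2);  out->oid[2] = '\0';
    memcpy(out->pnm, &c[3], 5);  out->pnm[5] = '\0';
    out->prv_major = c[8] >> 4;
    out->prv_minor = c[8] & 0x0F;
    out->psn = (uint32_t)c[9] << 24 | (uint32_t)c[10] << 16
             | (uint32_t)c[11] << 8 | c[12];
    out->year  = 2000 + (((c[13] & 0x0F) << 4) | (c[14] >> 4));
    out->month = c[14] & 0x0F;
}
```

Running it over a CID rebuilt from the logged values reproduces ManID=03, OEMID=SD, Name=SN64G, Ver=8.0, Serial=8ab989e1, Date=2021-2.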
It's not conclusive. It was previously much the same as below, but with quite a lot of 21 MB/s and 24 MB/s write rates popping up as well. Multiple runs now come out pretty close to this every time:
TRIM 7f80..8f7f Buffer = 2 kB, Written 2048 kB at 27868 kB/s, Verified, Read 2048 kB at 31688 kB/s
TRIM 7f80..8f7f Buffer = 2 kB, Written 2048 kB at 27520 kB/s, Verified, Read 2048 kB at 31688 kB/s
TRIM 7f80..8f7f Buffer = 2 kB, Written 2048 kB at 29822 kB/s, Verified, Read 2048 kB at 31683 kB/s
TRIM 8f80..9f7f Buffer = 4 kB, Written 2048 kB at 28691 kB/s, Verified, Read 2048 kB at 37202 kB/s
TRIM 8f80..9f7f Buffer = 4 kB, Written 2048 kB at 29923 kB/s, Verified, Read 2048 kB at 37182 kB/s
TRIM 8f80..9f7f Buffer = 4 kB, Written 2048 kB at 25504 kB/s, Verified, Read 2048 kB at 36628 kB/s
TRIM 9f80..bf7f Buffer = 8 kB, Written 4096 kB at 33469 kB/s, Verified, Read 4096 kB at 41727 kB/s
TRIM 9f80..bf7f Buffer = 8 kB, Written 4096 kB at 31246 kB/s, Verified, Read 4096 kB at 41720 kB/s
TRIM 9f80..bf7f Buffer = 8 kB, Written 4096 kB at 32071 kB/s, Verified, Read 4096 kB at 41727 kB/s
TRIM bf80..df7f Buffer = 16 kB, Written 4096 kB at 31549 kB/s, Verified, Read 4096 kB at 44086 kB/s
TRIM bf80..df7f Buffer = 16 kB, Written 4096 kB at 29758 kB/s, Verified, Read 4096 kB at 44108 kB/s
TRIM bf80..df7f Buffer = 16 kB, Written 4096 kB at 29624 kB/s, Verified, Read 4096 kB at 44119 kB/s
TRIM df80..11f7f Buffer = 32 kB, Written 8192 kB at 35723 kB/s, Verified, Read 8192 kB at 45609 kB/s
TRIM df80..11f7f Buffer = 32 kB, Written 8192 kB at 36177 kB/s, Verified, Read 8192 kB at 45589 kB/s
TRIM df80..11f7f Buffer = 32 kB, Written 8192 kB at 35787 kB/s, Verified, Read 8192 kB at 45611 kB/s
TRIM 11f80..15f7f Buffer = 64 kB, Written 8192 kB at 34712 kB/s, Verified, Read 8192 kB at 45727 kB/s
TRIM 11f80..15f7f Buffer = 64 kB, Written 8192 kB at 35837 kB/s, Verified, Read 8192 kB at 45726 kB/s
TRIM 11f80..15f7f Buffer = 64 kB, Written 8192 kB at 35803 kB/s, Verified, Read 8192 kB at 45746 kB/s
TRIM 15f80..19f7f Buffer = 128 kB, Written 8192 kB at 36617 kB/s, Verified, Read 8192 kB at 45738 kB/s
TRIM 15f80..19f7f Buffer = 128 kB, Written 8192 kB at 34870 kB/s, Verified, Read 8192 kB at 45702 kB/s
TRIM 15f80..19f7f Buffer = 128 kB, Written 8192 kB at 34535 kB/s, Verified, Read 8192 kB at 45734 kB/s
Hmm, okay, nope, no difference after all. It was the FULE erase that made a small difference. Disabling/enabling TRIM makes no measurable difference, at least not in the short term.
The speed tester sequence has the filesystem handler always overwriting the same blocks for the same files, so there's no fragmentation occurring. That makes it easy for the card to work out what can be erased without any assistive prompting.
The additional code adds 384 bytes to the binary size. I'll leave it in the driver source, but it won't be compiled in without the right #define switch - which is in Flexspin's include/filesys/fatfs/ffconf.h
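If Flexspin's FatFs port follows stock FatFs naming, the switch in question would be FF_USE_TRIM - that name is an assumption on my part, so check the bundled ffconf.h for the exact spelling:

```c
/* include/filesys/fatfs/ffconf.h - assuming the stock FatFs option name */
#define FF_USE_TRIM 1   /* 1: pass deallocated block ranges down to the
                           driver as TRIMs; 0: compile the support out */
```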
Ok. Sounds dubious then at best. There's no fixed/known JEDEC-like structure on the SD somewhere that can be read via block transfers?
I've made another tester program, this time for file create and delete testing ...

All testing was at 200 MHz sysclock. Results were taken for:
- Sandisk Extreme 64 GB card: sdsd.cc v1.4, sdsd.cc v1.2, the plug-in version of sdmm.cc, and its sister sdmm_bashed.cc
- Samsung EVO 128 GB card: sdsd.cc v1.4, sdsd.cc v1.2
- Adata Orange 64 GB card: sdsd.cc v1.4
- Kingston Select Plus 64 GB card: sdsd.cc v1.4
- Apacer 16 GB card: sdsd.cc v1.4
- Adata 2 GB (Camera) card: sdsd.cc v1.4