@MXX said:
@evanh, very impressive speed test results!
Are there any advantages/drawbacks in using your hand-wired pin sequence vs. the rloh add-on board sequence? Are they 100% equivalent?
My hand-wired version has no power switch and no indicator LED. Otherwise they function the same, yes. The LED is unimportant, but I would advise having the power switch.
Placing the DAT pins first or last doesn't matter, no. I was just proving that as part of doing the hand wiring.
The LED can be used for activity, so you can tell when the card is being read or written, or even for other debugging. If you have the pins free (e.g. in the group of 8 IO pins on a P2 breakout header), it's useful; that's why I added it to my circuit. If you don't have spare IO, or you want to allocate pins more freely yourself, you could certainly omit it in a pinch.
The power switch is useful to avoid lock-ups that require manual intervention (card removal, refit, etc.). The pin that controls it does double duty and can read back the presence of the card without affecting the power-switching capability. It also lets you start from a clean, known state every time if you choose to power-cycle the card at software startup.
Agreed to both. I have had to reseat the card in the hand-wired slot repeatedly when struggling with coding changes that sequence the card badly, so the power switch has made a difference.
@Rayman said:
Would be nice to update the "shell" example in flexprop to include this 4-bit mode uSD...
Pretty much there I think. The Spin edition has helped a little. Eric revamping the filesystem layer definitely alleviated a file writing performance issue. I didn't do a todo list ...
Know if your new 1-bit and 4-bit drivers for Spin2 can have multiple files open at same time?
Sure, the drivers only operate at the block level. File limits are at the filesystem layer above.
@Rayman said:
@evanh
How does one select the best divider?
Just lower it until it breaks?
Or, is there some top divided clock speed that will work for most cards?
It's precautionary for the moment. 3 is optimal, but a poorly built hardware interface may be better off with a larger value.
All values from 2 upward work. 2 is only effective when block-read CRC checking is disabled: the software can't process the CRC any faster than sysclock/3 for either reading or writing, but reading has the option of ignoring the CRC, i.e. not running the check at all.
PS: Oh, that's a to-do item I'd forgotten about, in fact: making the optional block-read CRC a runtime-switchable feature. It only affects a single function, albeit in a major way. I'd not proceeded because ioctl() wasn't working; Eric got that fixed maybe a week back.
That's right, I'd also planned to move the adjustable clock divider into ioctl().
PPS: Regarding the clock divider, once I get the ioctl() stuff implemented: operating the SD clock above 50 MHz is technically out of spec for 3.3 V High Speed access mode, so I might default the clock divider to sysclock/4 rather than sysclock/3. Sysclock/4 also provides the spec'd symmetrical clock pulse. People wanting to push it can then issue ioctl()s to suit.
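To make the divider arithmetic concrete, here is a small C sketch. It assumes the SD clock is simply sysclock divided by an integer divider. It applies the 50 MHz High Speed limit and the software floors stated above (3 with block-read CRC checking, 2 without). The names `sd_clock_hz()` and `pick_divider()` are hypothetical and are not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 3.3 V High Speed access mode tops out at a 50 MHz SD clock. */
#define SD_HS_MAX_HZ 50000000u

/* SD clock produced by an integer sysclock divider. */
static uint32_t sd_clock_hz(uint32_t sysclk_hz, uint32_t divider) {
    assert(divider != 0u);
    return sysclk_hz / divider;
}

/* Hypothetical helper: smallest divider that keeps the SD clock at or below
   max_hz, but never below the software floor
   (3 when block-read CRC is checked, 2 when it is skipped). */
static uint32_t pick_divider(uint32_t sysclk_hz, uint32_t max_hz, bool read_crc) {
    assert(max_hz != 0u);
    uint32_t floor_div = read_crc ? 3u : 2u;
    uint32_t div = (sysclk_hz + max_hz - 1u) / max_hz;  /* ceiling division */
    return div < floor_div ? floor_div : div;
}
```

For example, `pick_divider(200000000u, SD_HS_MAX_HZ, true)` gives 4 (50 MHz), and a 340 MHz sysclock needs 7 to stay in spec. Ignoring the spec, `pick_divider(340000000u, 200000000u, false)` gives 2, a 170 MHz SD clock. Writes always need the CRC, so for writing the floor is effectively 3 regardless of the flag.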
@Rayman said:
Know if your new 1-bit and 4-bit drivers for Spin2 can have multiple files open at same time?
Sure, the drivers only operate at the block level. File limits are at the filesystem layer above.
This has me wondering whether there is some way to allow more than one COG to access the filesystem (at least for multiple reader situations).
If a lock could be taken during writes until they close the file, then maybe you could share the file system somehow? Although I recall ersmith mentioning the file system code wasn't actually re-entrant (might use some globals somewhere). Perhaps there's some future hope to separate those by COGID.
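Here is a minimal sketch of that locking idea in portable C11 rather than flexspin-specific code. Because the filesystem code may not be re-entrant, one global lock serializes every read call. A writer holds the same lock from open until close. All function names here are hypothetical, and `atomic_flag` stands in for a P2 hardware lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* One global lock guarding the (possibly non-re-entrant) filesystem layer. */
static atomic_flag fs_lock = ATOMIC_FLAG_INIT;

static void fs_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&fs_lock, memory_order_acquire)) {
        /* spin: another cog/thread is inside the filesystem code */
    }
}

static void fs_release(void) {
    atomic_flag_clear_explicit(&fs_lock, memory_order_release);
}

/* Reader: holds the lock only for one call, so several readers interleave
   at call granularity instead of running filesystem code at the same time. */
static size_t shared_fread(void *buf, size_t size, size_t count, FILE *f) {
    assert(buf != NULL && f != NULL);
    fs_acquire();
    size_t got = fread(buf, size, count, f);
    fs_release();
    return got;
}

/* Writer: takes the lock at open and keeps it until close, so no reader
   touches the filesystem while a file is being written. The writer uses
   plain fwrite() in between; calling shared_fread() would deadlock. */
static FILE *shared_open_for_write(const char *path) {
    assert(path != NULL);
    fs_acquire();
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        fs_release();   /* don't keep the lock if the open failed */
    return f;
}

static int shared_close_write(FILE *f) {
    assert(f != NULL);
    int rc = fclose(f);
    fs_release();
    return rc;
}
```

Readers' own open and close calls would need the same wrapping in a real system. Holding the lock for a whole write also stalls every reader until the writer closes the file.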
I think Kye's FAT32 driver does have sufficient locking that you can have multiple instances working on the same card simultaneously. Works for reading, anyways.
OK, but that is separate from flexspin, right? I haven't looked into what is built in vs. external.
I wrote a wrapper that allows you to read multiple files at the same time. It's what I used in my "Rick Roll" demo for April Fools, which allowed 8 people to telnet to the P2 concurrently to watch the ASCII Rick Roll. The Rick Roll text file was larger than RAM.
I've got ioctl() able to set the clock divider now. It defaults to 4. As a side effect, every time it is set, the driver also recalibrates rxlag, even if the divider is unchanged, so it can be used as a programmatic recalibration.
Yep, yep, and yep. No guarantees of a long life of course.
Bear in mind the earlier measurements with the read-block CRC enabled were often at 200 MHz sysclock. So, clock for clock, it's not actually doubled throughput.
EDIT: Here's the 200 MHz sysclock (100 MHz SD clock) run with CRC disabled.
Buffer = 64 kB, Written 8192 kB at 14219 kB/s, Verified, Read 8192 kB at 29097 kB/s
Buffer = 64 kB, Written 8192 kB at 14360 kB/s, Verified, Read 8192 kB at 28909 kB/s
Buffer = 64 kB, Written 8192 kB at 14329 kB/s, Verified, Read 8192 kB at 29211 kB/s
Buffer = 64 kB, Written 8192 kB at 13776 kB/s, Verified, Read 8192 kB at 29273 kB/s
And the now default sysclock/4 (50 MHz SD clock) with CRC enabled.
Buffer = 64 kB, Written 8192 kB at 12543 kB/s, Verified, Read 8192 kB at 18132 kB/s
Buffer = 64 kB, Written 8192 kB at 12432 kB/s, Verified, Read 8192 kB at 18144 kB/s
Buffer = 64 kB, Written 8192 kB at 12446 kB/s, Verified, Read 8192 kB at 17949 kB/s
Buffer = 64 kB, Written 8192 kB at 12202 kB/s, Verified, Read 8192 kB at 18113 kB/s
The optimal sysclock/3 (66.7 MHz SD clock) with CRC enabled.
Buffer = 64 kB, Written 8192 kB at 14281 kB/s, Verified, Read 8192 kB at 21440 kB/s
Buffer = 64 kB, Written 8192 kB at 14241 kB/s, Verified, Read 8192 kB at 21439 kB/s
Buffer = 64 kB, Written 8192 kB at 14267 kB/s, Verified, Read 8192 kB at 21435 kB/s
Buffer = 64 kB, Written 8192 kB at 13996 kB/s, Verified, Read 8192 kB at 21388 kB/s
You can see that the sysclock/3 write speed is as good as sysclock/2. SD mode block writes require a CRC to be calculated, and at sysclock/2 the streamer goes idle well before the CRC calculation is done, so the faster bus clock buys nothing for writes.
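As a rough way to read those logs, the sketch below compares each measured rate with the raw 4-bit bus rate of 4 bits per SD clock. It assumes the log's "kB" means 1024 bytes; using 1000 shifts the results by about 2%. The helper names are hypothetical.

```c
#include <assert.h>

/* Raw 4-bit SD bus rate in bytes/s: 4 bits per SD clock = half a byte. */
static double raw_4bit_rate(double sd_clk_hz) {
    assert(sd_clk_hz > 0.0);
    return sd_clk_hz * 4.0 / 8.0;
}

/* Measured throughput as a fraction of the raw bus rate.
   ASSUMPTION: "kB" in the speed logs means 1024 bytes. */
static double bus_efficiency(double measured_kB_per_s, double sd_clk_hz) {
    return measured_kB_per_s * 1024.0 / raw_4bit_rate(sd_clk_hz);
}
```

Under that assumption, the CRC-disabled read of 29097 kB/s at a 100 MHz SD clock is about 60% of the raw 50 MB/s. The 18132 kB/s read at 50 MHz is about 74%, and the 21440 kB/s read at 66.7 MHz is about 66%. Writes stay near 14,000 kB/s at both sysclock/3 and sysclock/2, which is consistent with the CRC bottleneck described above.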
So is this now a sustainable read speed with CRCs disabled? That's about enough for uncompressed SDTV video playback if it is. Perhaps a 270 MHz clock would add some more margin and make it realizable, albeit probably card dependent. If we can get write speeds to this same level, that would be awesome and could enable long SD recordings of captured SDTV HDMI sources.
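Here is a quick check on the "about enough for uncompressed SDTV" estimate. It assumes 720×576 at 25 fps or 720×480 at 29.97 fps, with 2 bytes per pixel (4:2:2 sampling). Other pixel formats change the numbers, and the function name is hypothetical.

```c
#include <assert.h>

/* Bytes per second for uncompressed video.
   ASSUMPTION: 4:2:2 sampling at 2 bytes per pixel (e.g. YUYV). */
static double video_bytes_per_s(unsigned width, unsigned height,
                                double bytes_per_pixel, double fps) {
    assert(width > 0 && height > 0 && fps > 0.0);
    return (double)width * height * bytes_per_pixel * fps;
}
```

PAL works out to exactly 20,736,000 B/s, and NTSC at 30000/1001 fps to about 20.7 MB/s. The CRC-disabled reads of roughly 29,000 kB/s (about 29.8 MB/s if kB = 1024 bytes) leave roughly 40% headroom, before any card-dependent stalls.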
@Rayman said:
High speed datalogging might also be interesting...
Yeah, if writes can get to similar speeds it'd be handy for a logic-analyzer style application, capturing raw byte data at, say, 25 MHz to the SD card for offline analysis (or wider data at lower rates). Of course, you could potentially build the whole thing into a P2 itself with external PSRAM and operate much faster, but the storage would still be limited there compared with logging to a large SD card.
Some kind of burst mode also possible. Record a frame at very high speed to PSRAM and then write it to uSD.
The faster writes can increase the possible frame rate...
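The difference between sustained logging and burst capture comes down to two small formulas, sketched below. The helper names are hypothetical. The write rate is an assumption taken from the speed logs above: about 14,000 kB/s at sysclock/3 or faster, with kB taken as 1024 bytes. The burst formula also assumes that capture into PSRAM can overlap with the SD flush, so the SD write rate is the only limit.

```c
#include <assert.h>
#include <stdbool.h>

/* True if a continuous capture fits within the sustained SD write rate. */
static bool can_stream(double sample_hz, double bytes_per_sample,
                       double sd_write_bytes_per_s) {
    assert(sample_hz > 0.0 && bytes_per_sample > 0.0);
    return sample_hz * bytes_per_sample <= sd_write_bytes_per_s;
}

/* Burst mode: capture a frame into PSRAM at full speed, then flush it to SD.
   Upper bound on frame rate, assuming capture overlaps the flush. */
static double max_burst_frame_rate(double frame_bytes,
                                   double sd_write_bytes_per_s) {
    assert(frame_bytes > 0.0);
    return sd_write_bytes_per_s / frame_bytes;
}
```

With about 14.3 MB/s of write bandwidth, 25 MHz byte-wide sampling cannot be streamed continuously, but 10 MHz can. A 1 MB burst frame could be flushed roughly 14 times per second, so faster writes raise the frame rate proportionally.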
The existing inner loop consists of 4 load/prep instructions and 8 CRCNIBs to process 32 bits per loop. It might be quite hard even for a lookup table to beat that. The alternative is to throw two extra cogs at the job of just CRC processing, offloading the SD driver cog completely.
Lookup table can't possibly beat CRCNIB. You could have a LUT for processing 8 bits at a time, but you'd need at least 2 instructions to do that (more likely: 3 or 4 - don't remember the exact algorithm for LUT CRC), so no good.
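For context on what that inner loop has to compute: in 4-bit SD mode each DAT line carries its own CRC16 over the bits on that line. The CRC uses the CCITT polynomial 0x1021 with an initial value of 0. The portable C sketch below computes the four CRCs bit by bit as a reference model for checking results. It does not model CRCNIB or the driver's actual loop, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift one data bit into a CRC16-CCITT register
   (x^16 + x^12 + x^5 + 1, i.e. 0x1021, initial value 0). */
static uint16_t crc16_push_bit(uint16_t crc, unsigned bit) {
    unsigned feedback = ((crc >> 15) ^ bit) & 1u;
    crc = (uint16_t)(crc << 1);
    return feedback ? (uint16_t)(crc ^ 0x1021u) : crc;
}

/* CRC16 of a byte stream sent MSB first on a single line (1-bit mode). */
static uint16_t crc16_bytes(const uint8_t *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 7; b >= 0; b--)
            crc = crc16_push_bit(crc, (buf[i] >> b) & 1u);
    return crc;
}

/* 4-bit mode: each byte goes out as two nibbles, high nibble first, with
   nibble bit n on DATn. Every DAT line accumulates its own CRC16. */
static void crc16_4bit(const uint8_t *buf, size_t len, uint16_t crc[4]) {
    assert(buf != NULL || len == 0);
    for (int n = 0; n < 4; n++)
        crc[n] = 0;
    for (size_t i = 0; i < len; i++) {
        for (int shift = 4; shift >= 0; shift -= 4) {
            unsigned nib = (buf[i] >> shift) & 0xFu;
            for (int n = 0; n < 4; n++)
                crc[n] = crc16_push_bit(crc[n], (nib >> n) & 1u);
        }
    }
}
```

A 512-byte block puts 1024 bits on each DAT line, so four CRC16 registers must be advanced for every nibble on the bus.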
@Rayman said:
BTW: Almost afraid to ask this, but have you all ever heard of any legal issue with doing 4-bit mode without being a SD member?
I'm thinking the only real issue is if you put the SD logo on your device without paying them for the privilege.
Trademarks are trademarks; I doubt there could be any claim on just "SD" as two letters without any styling.
I've not touched any copyright since I've not used or seen any reference source code for operating in SD mode. Much of which is likely bound to SD controller chips anyway. What did get referenced, as a starting point, is existing SPI mode drivers. Namely Flexspin's driver. And obviously I've worked from the freely available specifications, v6.
I also made use of a copy of non-free v3 spec that I found. It provided details on command timings that aren't in the free edition. Section 4.12 shows the framing interrelationships of various command sequences, including bus turnaround and command-response timeout. The sort of stuff a physical controller would do for you.
Almost there. Feature set is finally where I wanted it.
The auto-calibration has a niggle, at 360 MHz only, that I'm not sure I can do much about. It's card dependent and might be an impedance-matching issue. Maybe it's worth trying different-value inline resistors...
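One common way to make this kind of receive-lag calibration more robust, not necessarily what the driver does, works as follows. Sweep every candidate lag, record which settings read a known block correctly, and pick the middle of the widest passing window. The C sketch below shows only the selection step, and `pick_rxlag()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Given pass/fail results per candidate rx lag (index = lag), return the
   centre of the longest run of passing settings, or -1 if none passed.
   The centre leaves the most timing margin on either side. */
static int pick_rxlag(const bool *pass, size_t n) {
    assert(pass != NULL || n == 0);
    int best_start = -1, best_len = 0;
    int run_start = -1, run_len = 0;
    for (size_t i = 0; i < n; i++) {
        if (pass[i]) {
            if (run_len == 0)
                run_start = (int)i;
            run_len++;
            if (run_len > best_len) {
                best_len = run_len;
                best_start = run_start;
            }
        } else {
            run_len = 0;
        }
    }
    return best_len ? best_start + (best_len - 1) / 2 : -1;
}
```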
I got pinged to re-order the Parallax microSD breakout (#64009)
Feels like if I'm doing that, I should add 4-bit support and the power control FET?
Anything else preferable?
Any preferred pinout?
And because I don't appreciate the 240R resistor... what if we had dual P2 headers. Gemini SD!
Plug one way around for boot SD (with the 240R), and the other way for data SD (with regular and 4-bit mode).
Either way-- if it keeps a single header, I'll be looking to resolve that 240R speed-bump!
Just the four DAT pins have to be in order as either the high or low four pins.
I note the existing #64009 pinout is such it can be used as the primary boot when slotted into accessory header P56..P63. And hence the 240R in case there is also a boot EEPROM in place. So the 240R kind of has to stay.
At any rate, irrespective of the resistor value, this particular pinout precludes a 4-bit pinout on the same header.
Might the power switch be needed to change from 4-bit mode to 1-bit mode?
Switching into SPI is one-way; it's documented as such. Issuing a CMD0 with DAT3 held low is all it takes. From then on it's exclusively SPI until the next power cycle.
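For reference, the CMD0 that triggers that switch is an ordinary 48-bit command frame. Its CRC7 must be valid, because the card is still in SD mode when it receives the command. The C sketch below builds such frames. Holding DAT3 low while sending is a pin-level detail outside it, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CRC7 (x^7 + x^3 + 1) over the first 5 bytes of an SD command, MSB first. */
static uint8_t crc7(const uint8_t *data, int len) {
    uint8_t crc = 0;
    for (int i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            unsigned bit = (data[i] >> b) & 1u;
            unsigned fb = ((crc >> 6) ^ bit) & 1u;
            crc = (uint8_t)((crc << 1) & 0x7Fu);
            if (fb)
                crc ^= 0x09u;
        }
    }
    return crc;
}

/* Build the 6-byte frame: start bit 0, transmission bit 1, 6-bit index,
   32-bit argument (MSB first), CRC7, end bit 1. */
static void sd_command_frame(uint8_t index, uint32_t arg, uint8_t frame[6]) {
    assert(index < 64);
    frame[0] = (uint8_t)(0x40u | index);
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((crc7(frame, 5) << 1) | 1u);
}
```

CMD0 with a zero argument comes out as `40 00 00 00 00 95`, the familiar fixed frame used by SPI-mode init code.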
Yes, Kye's FAT32 driver is the Spin2 one, separate from flexspin's built-in filesystem.
Concurrent SD card access wrapper (used for the April Fools "Rick Roll" demo): https://forums.parallax.com/discussion/175780/concurrent-sd-card-access#latest
The runtime switch for block-read CRC checking is working too.
@evanh Are you really clocking the uSD at 170 MHz?
Are you saying that when you turn CRC checking off, you can basically double throughput to 40 MB/s reads?
That's off the charts...
So this would be with a 340 MHz P2 clock, I suppose...
If I recall, Linux already proved you can use 4-bit mode without a license, right?
@evanh Is there any update on getting the driver into FlexProp's GitHub?
Sorry if it's been mentioned already...