Thanks. Good job. I'd moved that code out of the ioctl() for exactly this sort of treatment.
Question for Ada: The brackets around the referenced array element, *(uint32_t *)(&csd[6]), are they important? Or is that just a habit of yours? I'd have thought this would be sufficient: *(uint32_t *)&csd[6]
Maybe. C syntax like that always confuses me, so I end up with extra brackets to make sure I get what I want.
Okay, good. Same here.
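For what it's worth, C's precedence rules back you both up. Postfix [] binds tighter than the unary operators, and unary & and the cast apply right to left. So *(uint32_t *)&csd[6] and *(uint32_t *)(&csd[6]) parse identically. A separate question is whether reading a byte array through a uint32_t pointer at offset 6 is safe at all. That depends on the target tolerating an unaligned 32-bit load and on the compiler's aliasing rules. Below is a minimal sketch of two alternatives that avoid the question. csd_word_native() and csd_word_be() are hypothetical helper names. The big-endian variant only fits if the 16 CSD bytes are stored in the order the card sends them (MSB first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as *(uint32_t *)&csd[offset] (native byte order), but with no
   alignment or strict-aliasing assumptions. */
uint32_t csd_word_native(const uint8_t *csd, size_t offset)
{
    uint32_t w;
    assert(csd != NULL);
    memcpy(&w, csd + offset, sizeof w);
    return w;
}

/* Big-endian read: csd[offset] becomes the most significant byte, matching
   MSB-first transmission if the bytes were stored in arrival order. */
uint32_t csd_word_be(const uint8_t *csd, size_t offset)
{
    return ((uint32_t)csd[offset] << 24) | ((uint32_t)csd[offset + 1] << 16) |
           ((uint32_t)csd[offset + 2] << 8) | (uint32_t)csd[offset + 3];
}
```

Compilers generally turn the memcpy() into a plain load where the target allows it, so there is usually no speed penalty.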
Oh, bother. I belatedly worked out that my quickly wired-up full-sized SD slot isn't so great. The hack doesn't have any pull-ups, of course. I was aware things weren't all rosy for a while but hadn't examined the issue until now. Maybe what's most surprising is that a couple of cards, including the old 2 GB one, actually behave perfectly without any external pull-ups. The rest of them show all kinds of demented behaviour: most actually get to the end of init without issue but go splat on the first busy check.
I can only presume those two cards have their own internal pull-ups permanently engaged.
It's not straightforward to modify the driver to handle it either. Every place where there is a DIR change on the CMD or DAT pins, I'd also have to add a mode change of said pin(s).
LOL, you need a reverse full size SD to micro SD adapter. Apparently they do exist, although a short cable one would be preferable.
https://www.catch.com.au/product/tf-male-to-microsd-female-card-reader-extender-test-tools-for-car-gps-phone-black-26758787/
I'll dig up some through-hole resistors. Annoyingly, I've left my kit at work and I wasn't planning on going on site tomorrow.
OT question: do you know typical total P2 power for two or three cogs @ 252 MHz or 300 MHz?
I don't have any formula. Maybe 0.5 W but it depends quite a lot on what is happening, it could be notably below or above. Heavy hubRAM use is known to draw extra.
EDIT: 0.5 W should be a high estimate I think.
New feature: The driver now detects if the pull-ups are missing.
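The thread doesn't say how the driver does this detection. One plausible approach is sketched here, purely as an assumption. Drive each CMD/DAT line low, release it, wait, then sample it. A pulled-up line comes back high, while an unterminated line tends to stay where it was left. A floating line could still drift high by chance, so a "high" reading isn't conclusive. The pin_ops interface, check_pullups() and the simulated bus are all hypothetical, included so the logic can be tried off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pin-access interface; on hardware these would wrap the driver's own pin operations. */
typedef struct {
    void (*drive_low)(void *ctx, int pin);
    void (*release)(void *ctx, int pin);      /* stop driving: input / high-Z */
    bool (*read)(void *ctx, int pin);
    void (*delay_us)(void *ctx, unsigned us);
    void *ctx;
} pin_ops;

/* Returns a bitmask (bit i = pins[i]) of lines that did not come back high after release. */
unsigned check_pullups(const pin_ops *ops, const int *pins, size_t npins, unsigned settle_us)
{
    unsigned missing = 0;
    assert(npins <= sizeof missing * 8);
    for (size_t i = 0; i < npins; i++) {
        ops->drive_low(ops->ctx, pins[i]);    /* discharge the line first */
        ops->delay_us(ops->ctx, 1);
        ops->release(ops->ctx, pins[i]);
        ops->delay_us(ops->ctx, settle_us);   /* give a pull-up time to charge it */
        if (!ops->read(ops->ctx, pins[i]))
            missing |= 1u << i;
    }
    return missing;
}

/* Simulated bus for off-target testing: a line without a pull-up keeps its last level. */
typedef struct { bool level[8]; bool pullup[8]; } sim_bus;

static void sim_drive_low(void *ctx, int pin) { ((sim_bus *)ctx)->level[pin] = false; }
static void sim_release(void *ctx, int pin)
{
    sim_bus *b = ctx;
    if (b->pullup[pin])
        b->level[pin] = true;
}
static bool sim_read(void *ctx, int pin) { return ((sim_bus *)ctx)->level[pin]; }
static void sim_delay(void *ctx, unsigned us) { (void)ctx; (void)us; }
```

A non-zero mask would then map onto whatever error the driver reports.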
Good, that should be helpful. Presumably there is some error return value that can be checked for this.
Just the generic "Unknown error". The rest is debug printf()s internal to the driver.
Better interfacing is something I'm leaving alone for the moment. Eric mentioned he is reworking everything for providing block level access that bypasses the filesystem layers. I'll see what comes of that before hassling about error codes and ioctl() features.
@rogloh If you guys aren't going to sell it, maybe you won't care if I do?
Was hoping somebody would jump on this, but not seeing it yet...
I guess, for development, it can stay as an add-on. It uses all eight I/O pins of a header, so it represents reasonably effective utilisation when trying to work out what a finished production board would use. There's no real need to do any dev board designs with an embedded 4-bit SD slot.
Anyone can use the design idea. It's open / schematic published. I'd just recommend retaining the same pinout/capabilities so the software will work with it. If you mess with that then any driver code will then need to fragment and will be a PITA to support.
I've made a point of not locking down the pin mapping in the driver. The main requirement is that the four DAT pins must be in ascending order, least to most significant, and start on a 4-pin boundary, i.e. 0..3, 4..7, 8..11, and so on. The other four pins can be anywhere individually, and two of those are even optional.
The roughly wired full-sized slot I recently assembled has the pin mapping as: CMD = base+2, CLK = base+3, DAT = base+4 .. base+7.
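The pin rules above can be expressed as a quick validity check. This is a minimal sketch. The sd4_pins struct and sd4_pins_valid() are hypothetical names, not the driver's. The sketch only covers CLK, CMD and the DAT group, and leaves out the two optional pins mentioned above. It assumes the P2's 64 I/O pins (0..63).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define P2_PIN_COUNT 64

/* Hypothetical pin description; field names are illustrative only. */
typedef struct {
    int clk;
    int cmd;
    int dat_base;   /* DAT0; DAT1..DAT3 follow at dat_base+1 .. dat_base+3 */
} sd4_pins;

static bool pin_ok(int p) { return p >= 0 && p < P2_PIN_COUNT; }

static bool in_dat_group(const sd4_pins *c, int p)
{
    return p >= c->dat_base && p < c->dat_base + 4;
}

bool sd4_pins_valid(const sd4_pins *c)
{
    assert(c != NULL);
    /* DAT group must start on a 4-pin boundary; with base <= 60, DAT3 stays <= 63 */
    if (!pin_ok(c->dat_base) || c->dat_base % 4 != 0)
        return false;
    /* CLK and CMD may go anywhere, as long as they are distinct real pins ... */
    if (!pin_ok(c->clk) || !pin_ok(c->cmd) || c->clk == c->cmd)
        return false;
    /* ... outside the DAT group */
    if (in_dat_group(c, c->clk) || in_dat_group(c, c->cmd))
        return false;
    return true;
}
```

As an example, take the full-sized slot mapping above with a header base of 16. That gives CMD = 18, CLK = 19 and DAT = 20..23, which passes the check.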
Ok that's good then. I thought it was more restrictive based on SPI mode operation and distances between MISO/MOSI/CLK/CS etc.
The development code used smartpins for init at 400 kHz. Also I was measuring latencies with a smartpin. It was certainly handy having the init code path independent of the performance block read/write code path back then. When things broke I still had the evidence it worked at init. I didn't lose my hair quite so quickly.
None of that went into the final driver though. It's now a generic streamer solution that can switch to any clock divider.
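Here is a worked example of the divider arithmetic. It assumes the driver picks the smallest integer divider that keeps the card clock at or below the requested rate; the thread doesn't say how it actually rounds. At 252 MHz, the 400 kHz init clock needs a divider of exactly 630. A 50 MHz target gives 5.04, which rounds up to 6, so the card actually runs at 42 MHz. clk_divider() and clk_actual() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest integer divider that keeps the card clock at or below target_hz. */
uint32_t clk_divider(uint32_t sysclk_hz, uint32_t target_hz)
{
    assert(target_hz > 0);
    return sysclk_hz / target_hz + (sysclk_hz % target_hz != 0);   /* ceiling division */
}

/* Clock actually produced by a given divider (integer Hz, truncated). */
uint32_t clk_actual(uint32_t sysclk_hz, uint32_t divider)
{
    assert(divider > 0);
    return sysclk_hz / divider;
}
```

The ceiling is written as quotient plus remainder test rather than (a + b - 1) / b, so it cannot overflow 32 bits.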
At one point we had a problem where one of your regular SPI mode driver transfer optimizations had pin distance issues, and you created some sort of workaround for SPI mode where you needed an intermediate pin remapped to reach the neighboring pins.
This post mentions the optimization that got broken:
https://forums.parallax.com/discussion/comment/1545197/#Comment_1545197
Presumably that is no longer the case now in mainstream flexspin, right? That's the thing you removed?
The SD SPI mode driver is entirely separate from the SD mode driver: a different set of files in a different subdirectory. Both can be loaded at once for two different slots. EDIT: I haven't tried using both together, so I don't know what will happen.
As for that hack, it does look to still be intact in the shipped SD SPI mode driver as part of Flexspin.
The current duplication of supporting filesystem source files is another thing to iron out once Eric is done with the redesign.
The Samsung EVO SD card, as well as not publishing any extension data, also doesn't support Sequential CQ mode. I'm going to assume, therefore, that Windoze only checks for the A2 feature flag, and then neither checks what extensions a card supports nor uses anything other than Voluntary CQ mode.
The good news is command queuing works without engaging UHS.
Here's the first version of the board. Perhaps not as nicely laid out as the @rogloh one, but it should hopefully function the same electronically...
Also, got some 22k and 1.8k resistors, just match.
Roger cheated with double-siding the components.
Move the slot a little closer to the header and then the fill will go right round the top edge there.
Preliminary, and still buggy, testing with the CQ and Cache enabled. With CMD44/45/46/47, and queue depth = 1 as a direct pass through, I'm getting worse performance on small buffer sizes compared to regular CMD17/18/24/25.
That's a bit of a downer from my perspective. I was particularly expecting the single-block writes of FAT filesystem management to be handled without the long card-busy delays that bring small buffer sizes to a crawl.
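Counting commands per transfer is one way to see why queue depth 1 loses on small buffers. The sketch makes two assumptions. First, open-ended multi-block transfers are ended with CMD12 (no CMD23). Second, a depth-1 CQ transfer is CMD44 (task parameters), then CMD45 (start address), then CMD46 or CMD47 (execute read or write). Any queue-status polling between queuing and execution is left out, so the real CQ cost is, if anything, higher. plain_sequence() and cq_sequence() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional command set: fills out[] with the commands for one transfer, returns the count. */
size_t plain_sequence(int is_write, unsigned nblocks, int out[2])
{
    assert(nblocks > 0);
    if (nblocks == 1) {
        out[0] = is_write ? 24 : 17;   /* WRITE_BLOCK / READ_SINGLE_BLOCK */
        return 1;
    }
    out[0] = is_write ? 25 : 18;       /* WRITE_MULTIPLE_BLOCK / READ_MULTIPLE_BLOCK */
    out[1] = 12;                       /* STOP_TRANSMISSION ends the open-ended burst */
    return 2;
}

/* Command queue used as a depth-1 pass-through (status polling not modelled). */
size_t cq_sequence(int is_write, unsigned nblocks, int out[3])
{
    assert(nblocks > 0);
    (void)nblocks;                     /* block count travels in CMD44's argument */
    out[0] = 44;                       /* queue task parameters */
    out[1] = 45;                       /* queue task start address */
    out[2] = is_write ? 47 : 46;       /* execute the queued write / read task */
    return 3;
}
```

For a single-block write that is one command against three, each with its own response turnaround, before the card's busy time is even considered.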
I'll carry on debugging it anyway.
Cool. If you are going to make it available, I'd recommend selecting a decent-quality SD card mechanism so people can insert and remove cards many times before it breaks. The one I chose was a push-to-insert, push-to-release type with a spring. Unfortunately, it was only available locally; it wasn't on Mouser or Digikey when I looked for it.
https://www.altronics.com.au/p/p5717-oupiin-surface-mount-micro-sd-slot-type-memory-card-socket/
@rogloh one here is like that. Also think that’s better.
Uh-oh! This might be too much. I've worked out that, with CQ+Cache, these cards now sometimes begin sending data on the DAT pins during command-response. I'm in no way configured to handle that!
Aha, is this the first time we've seen this case? I recall that was one of the concerns in that these two buses could overlap but we'd never actually seen it.
Totally new for sure. I was scratching my head as to why I was getting bunches of read CRC errors ... until I got the scope probes attached again. Writes have no problem naturally.
The two A2 cards I have behave a little differently from each other, but I think the CRC errors have only occurred on certain multi-block bursts. Once I realised why, I stopped looking any closer.
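Some background on why the overlap surfaces as read CRC errors. In 4-bit mode, each DAT line carries its own CRC16 (polynomial 0x1021, initial value 0) over the bits sent on that line. Each byte goes out high nibble first, with bit k of each nibble on DATk. If the host presumably starts sampling a block at the wrong point because the card began early, the bits on each line no longer match their checksums. Below is a bit-serial sketch of that per-line CRC. crc16_bytes() and crc16_4bit() are hypothetical names, and a real driver would likely use a faster table or hardware-assisted method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift one bit into a CRC-16 (poly 0x1021, MSB first), as used on SD data lines. */
static uint16_t crc16_bit(uint16_t crc, unsigned bit)
{
    unsigned fb = ((crc >> 15) & 1u) ^ (bit & 1u);
    crc = (uint16_t)(crc << 1);
    if (fb)
        crc ^= 0x1021;
    return crc;
}

/* CRC over a byte stream sent serially on one line, MSB first, initial value 0. */
uint16_t crc16_bytes(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 7; b >= 0; b--)
            crc = crc16_bit(crc, (buf[i] >> b) & 1u);
    return crc;
}

/* 4-bit mode: one CRC per DAT line; high nibble first, bit k of each nibble on DATk. */
void crc16_4bit(const uint8_t *buf, size_t len, uint16_t crc[4])
{
    assert(buf != NULL || len == 0);
    for (int line = 0; line < 4; line++)
        crc[line] = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t nib[2] = { (uint8_t)(buf[i] >> 4), (uint8_t)(buf[i] & 0x0F) };
        for (int n = 0; n < 2; n++)
            for (int line = 0; line < 4; line++)
                crc[line] = crc16_bit(crc[line], (nib[n] >> line) & 1u);
    }
}
```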
I'm now of the opinion it'll just be too painful to implement a solution with CQ+Cache. There'll be an increasing overhead cost as well as more bloat. Adding a second cog would help but not hugely. That's not a path I ever wanted to go down. There needs to be big benefits for sacrificing a whole cog.
The next approach I've decided on is to go back to the original command set but add delayed single-block writes, using IOCTL's sync mechanism to write the updated filesystem metadata at that point. It'll use more RAM to hold those blocks, but maybe not much, since sync'ing is occurring quite often at the moment.
EDIT: I guess that's technically a cache, within the driver, of those single blocks. It'll need to identify matching block numbers, coming from the filesystem layers, for readback and overwrite.
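Here is a minimal sketch of the delayed single-block write idea, under a few assumptions. It uses a small fixed number of slots and flushes everything when the cache is full. A hypothetical write_fn backend stands in for the real CMD24 path. Repeat writes to the same block number overwrite the cached copy, reads check the cache first, and wb_sync() is what the ioctl() sync would call. None of these names come from the driver, and the slot count is a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  512
#define CACHE_SLOTS 4   /* hypothetical: enough for a few FAT/directory blocks */

/* Hypothetical backend that performs the real single-block write. */
typedef int (*write_fn)(void *ctx, uint32_t lba, const uint8_t *data);

typedef struct {
    bool     used[CACHE_SLOTS];
    uint32_t lba[CACHE_SLOTS];
    uint8_t  data[CACHE_SLOTS][BLOCK_SIZE];
    write_fn write;
    void    *ctx;
} wb_cache;

static int find_slot(const wb_cache *c, uint32_t lba)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->used[i] && c->lba[i] == lba)
            return i;
    return -1;
}

/* Write every pending block to the card; this is what a sync would call. */
int wb_sync(wb_cache *c)
{
    assert(c->write != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!c->used[i])
            continue;
        int err = c->write(c->ctx, c->lba[i], c->data[i]);
        if (err)
            return err;
        c->used[i] = false;
    }
    return 0;
}

/* Defer a single-block write; a repeat write to the same LBA just overwrites the copy. */
int wb_write(wb_cache *c, uint32_t lba, const uint8_t *data)
{
    int i = find_slot(c, lba);
    if (i < 0) {
        for (i = 0; i < CACHE_SLOTS && c->used[i]; i++) {}
        if (i == CACHE_SLOTS) {          /* full: flush everything, then reuse slot 0 */
            int err = wb_sync(c);
            if (err)
                return err;
            i = 0;
        }
        c->used[i] = true;
        c->lba[i] = lba;
    }
    memcpy(c->data[i], data, BLOCK_SIZE);
    return 0;
}

/* Serve a read from the cache if that block is pending; false means read the card. */
bool wb_read(const wb_cache *c, uint32_t lba, uint8_t *out)
{
    int i = find_slot(c, lba);
    if (i < 0)
        return false;
    memcpy(out, c->data[i], BLOCK_SIZE);
    return true;
}

/* Example backend for off-target testing: just counts writes. */
static int count_writes(void *ctx, uint32_t lba, const uint8_t *data)
{
    (void)lba;
    (void)data;
    ++*(unsigned *)ctx;
    return 0;
}
```

One thing this leaves out: a multi-block read or write that covers a cached block number would need to consult or flush the cache first. Otherwise the card and the cache can disagree.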