What is the current state of SD card interface driver implementations?
ags
Posts: 386
I've spent days researching fast SD card drivers discussed and/or available on this forum. I have enough information to be dangerous and/or confused. If there is another source that consolidates all the various implementations/threads into summary form I'd appreciate a pointer. If not, it may help not just me, but others to provide an overall landscape of what's been done and what is available. Here's what I've spliced together:
(One of) the original fast drivers, fsrw by Tomas Rokicki and later Jonathan Dummer (now v 2.6) has been updated to support FAT16/32. There are at least three PASM driver implementations underlying the SPIN high-level interface. I'm not sure how to select from them. One is "spi_safe". This driver uses single instruction writes and 2 instruction reads (byte and block reads). Does that mean the others are not, and if so, why not? There are also "mb_spi" and "mb_rawb". It looks like "mb_spi" was at one time using a 2 instruction per bit read loop, but that is now commented out and uses a single instruction per bit read loop like "mb_rawb" uses - but for single-byte reads "mb_spi" still uses a 2 instruction per bit loop while "mb_rawb" uses a single instruction per bit read loop. "mb_rawb" supports ReadAhead/WriteBehind. The README in the archive is out of date (not complaining, just sayin') and refers to driver files "sdspi", "sdspiasm", "sdspifasm" and "sdsipqasm" - which no longer seem to exist.
There is another implementation, and even an AppNote on Parallax Semiconductor, by Kye, which appears to be a full-featured FAT16/32 implementation: not as fast as fsrw, but a more complete FAT implementation.
There are threads on how to get maximum SPI read/write performance, and various implementations. kuroneko (self-admitted counter tinkerer "because I have a thing with counters...") has made contributions, along with others.
There are many more people that have contributed, and apologies if I've missed out on other significant contributors/contributions. This is not intended to be an exhaustive list, just trying to summarize what I've learned in the past week.
Questions:
Is there a recommended driver to use with fsrw?
Is there any further explanation available on the problems (and perhaps solutions) encountered with the single instruction per bit read methods, or ways to decide when "unsafe" situations occur and how to avoid them? (Of course, I still want the fastest reliable driver speed possible.) Are there other methods (presumably using counters) to implement fast SPI read and write?
Have I missed something (else/new/improved)?
Thanks.
Comments
safe_spi is called safe because of the slowed-down read. From what I could gather from the forums, the faster read routine works but causes problems with some pin combinations.
So the recommended driver for FSRW is safe_spi. All three drivers from @lonesock (Jonathan Dummer) support multi-block read and write and have one complete sector buffer in the cog to support read-ahead and write-behind.
sdspixx is a driver used in FemtoBasic and as far as I remember combines SPI and I2C in one pasm-cog but does not use the multi-block mode of the sd-cards.
Kye's FAT_Engine supports directories and FSRW does not.
Kye's FAT_Engine supports RTCs and correctly sets the creation date, last change, and last access in the directory entries. FSRW can be tricked into using an RTC but only sets the creation date.
There is a patch out for Kye's Fat_Engine (kyefat_lfn) that supports READING long filenames (as in iterating thru the directory entries) but not writing them.
The main problem with both Filesystems is that SPIN is way slower than the pasm-driver so all the speed gained by fast multi-block access is lost in handling the cluster and directory access in spin.
And both need a lot of HUB memory.
@MPARK'S spinx uses an older version of FSRW and 2 or 3 pasm-cogs to leave more HUB memory free. I need to look into that a bit more.
Did you miss something? YES! (shameless plug)
There is my RAISD-System, a redundant array of independent SD cards. As an add-on for the Quickstart, the RAISD QS-KIT provides you with 4 SD cards. Available at www.propellerpowered.com.
Enjoy!
Mike
But, Kye's version has more features, so if you don't need the fastest speed, that is worth looking at.
So the question becomes: what was/is the problem with the single instruction read that makes it (sometimes) "unsafe" - and has anyone spent much time trying to address the problem reliably?
The only board I got to fail was also a PCB with SMT components; traces were all fine, no excessive noise coupling or anything... it just would fail to read some of the time. It would work just fine if the PLL was set to 8x, but would fail at 16x. I don't remember what pins were chosen. The failures occurred on three of these boards, so it was probably not due to a bad prop chip.
The upshot was, if you were doing it for a specific board, test it and go. But since the FSRW block driver was supposed to work in all cases, it was safest to just use 2 instructions per bit read. The total throughput was still pretty high. [8^)
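As a rough back-of-the-envelope check on what the one- and two-instruction read loops cost, here is a small sketch. It assumes a 5 MHz crystal (so PLL 8x gives 40 MHz and 16x gives 80 MHz) and 4 clocks per instruction; neither figure is stated in this thread. It also ignores command, token, and CRC overhead, so real throughput will be lower. The function name is made up for illustration.

```python
def raw_spi_bits_per_second(clock_hz, instructions_per_bit, clocks_per_instruction=4):
    """Upper bound on the SPI bit rate when every bit costs a fixed number of instructions."""
    return clock_hz / (clocks_per_instruction * instructions_per_bit)

XTAL_HZ = 5_000_000  # assumed crystal frequency, not taken from the thread

for pll in (8, 16):
    clock_hz = XTAL_HZ * pll
    for instr_per_bit in (1, 2):
        rate = raw_spi_bits_per_second(clock_hz, instr_per_bit)
        print(f"PLL {pll}x, {instr_per_bit} instr/bit: {rate / 1e6:.0f} Mbit/s raw ceiling")
```

Under these assumptions the two-instruction loop at 16x has the same raw ceiling as the one-instruction loop at 8x, which is one way to see why the slower loop is still reasonably fast.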
Jonathan
Do you really need FAT32? (You don't necessarily need FAT32.) If you allow the SD card to be INTERNAL to the prop, you can rely on your prop application interface to handle reading and writing data to the SD. (This is how any device with memory soldered in treats it, like an iPad, etc.) If you can skip FAT32, the driver is much smaller and much faster. The SD can be treated similarly to the way EEPROM is used, as just a bunch of raw pages. In this model, the transfer speed of the SD is about as fast as the prop can handle. So there is no waiting, aside from however long it takes for the prop to do the transfer anyway. AND the application can deal with a single 512-byte page at a time, so the transfers are spread over the entire processing time.
Of course, if your primary requirement is physically removing the SD from the prop and placing it in the PC, you are stuck.
Yes, I'm stuck - sort of. I could always write my own low-level drivers to create a custom FS on the SD card from a PC. But I don't have that knowledge/experience at this point (never stopped me before, though).
If you feel you are stuck for lack of alternatives, maybe there are some.
In Propforth, we have a few simple commands that do SD access (the same exists for EEPROM access): create a file, read a file, list all the files. Reading and writing an individual block can be extracted and used in your application. The result is that it looks like you have files (because you list them with an "ls" command from the command line), but it is really just reading and writing individual blocks, no FAT32. Very fast. This method might be appropriate for a microcontroller with 32k RAM and 32 I/O lines total.
I'm comfortable with the low-level access to the SD card from the Propeller (not requiring any file system). My concern is how to get data on the SD card from a PC. If I use that method, I would need to write a desktop application that would be able to write to the SD card directly (no file system, just sector writes). I checked out the Propforth pages on google code, is that what you are referring to? I was looking at readme files under "Propforth-08 SD". That is running on the Propeller, not a desktop (correct)?
While I have some ideas on how to create the desktop app, it worries me if I were to distribute it. Seems that a mistake (user specifying the wrong drive (like a hard drive instead of the SD card)) would have disastrous consequences. From what I've seen so far, direct access to raw drive data requires admin permission, and if I overwrote the FAT... well, you know.
Did I understand your reply correctly?
Design isn't done, but I expect I'll need to read at about 5Mbps (and do some minimal processing and then send out to hub RAM)
On read, the standard FSRW will introduce some compute overhead because it has to compute sector and cluster addresses, so your best bet would be to call the SPI driver routines directly to avoid the FSRW overhead. You can simplify the sector addressing by ensuring that all the sectors are contiguous. You can do this by using a freshly formatted SD card when writing the data from a PC. On the Prop, you just need to open the file and get the address of the first sector of the file. You'll probably need to add a method to FSRW to return the sector address, but it's fairly easy to do.
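To make the sector arithmetic concrete, here is a minimal sketch of the two calculations involved: turning a FAT cluster number into a sector number, and finding the sector for a byte offset in a file that is known to be contiguous. The function and parameter names are illustrative and are not FSRW's; the data-region start and sectors-per-cluster values would come from the card's boot sector.

```python
SECTOR_SIZE = 512  # bytes per SD card sector

def cluster_to_sector(cluster, data_start_sector, sectors_per_cluster):
    """First sector of a FAT data cluster. Data clusters are numbered from 2."""
    return data_start_sector + (cluster - 2) * sectors_per_cluster

def sector_for_offset(first_sector, byte_offset):
    """Sector holding byte_offset, valid only if the file is contiguous on the card."""
    return first_sector + byte_offset // SECTOR_SIZE

# Hypothetical volume: data region starts at sector 8192, 64 sectors per cluster.
first = cluster_to_sector(5, data_start_sector=8192, sectors_per_cluster=64)
print(first, sector_for_offset(first, 1024))
```

With a contiguous file, only the first calculation needs the file system; after that, every read is a plain addition on the starting sector.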
I have been looking at ways to ensure contiguous files on the SD card as you mention (potentially having no file system, just a simple way of knowing the start/end cluster for a set of contiguous "files"). That poses some problems for the desktop user (the user creating the data and loading the SD card). Requiring that the SD card is reformatted and written with all necessary files every time there is a change may be a usability issue. Writing my own driver to support a custom "file system light" will be extra work (multiplied by the different OSs to support) and could be risky (imagine a user of my SD writing app specifying drive C: instead of G:...).
It is unfortunate that I need to read, not write. Murphy arranged for the fsrw write speed to be double the read speed...
I'm open to other ideas. Thanks for the reply.
If you want to look at reading/writing directly to/from the SD card, take a look at ZiCog. We create 8x 32MB blank files on the PC. They are contiguous files. ZiCog then reads the location of the start of each of the 8 files and uses that as the base for the 8MB disk files (yes, they are created as 32MB but currently we only use the first 8MB). ZiCog originally used the femto routines, and then later I changed it to use fsrw2.6. This works fine on the RamBlade hardware that I designed, and I overclock to 104MHz.
I also use Kye's FAT driver in my version of PropOS which was derived from KyeDos by Dracula. Both can be searched on the forums. Kye's FAT driver (modified) is also used in Catalina C.
Kye's version is more pedantic about the SD card being formatted correctly.
There have been various problems with some SD cards. We have never found out why. The best seem to be SanDisk.
If speed is an issue, I can think of a few things that I could do, one of which is to insist on contiguous files. A simple way to do that is to reformat the card then write all files in one session (on the desktop). That's more than I want to ask a user to do. Another way is to use low-level OS-specific drivers to access the SD card. That has several issues (e.g. portability and risk of damaging a HD if the user enters wrong information). I thought of caching the cluster chains of a non-contiguous file to avoid having to access the FAT, but I need to be able to accept up to a 100MB file, and worst case I couldn't fit all the clusters in memory.
BTW, does anyone know if contiguous files are guaranteed, even starting from a fresh format? Couldn't the wear leveling decide to use non-contiguous clusters, or is that mechanism a level below (hidden from the user) the LBA?
You could have a single cog dedicated to reading the data from the SD card and writing it into hub RAM. This cog could talk directly to the SPI driver cog, or you can even build the function into the SPI driver. You could have the function read from SD and write into a circular buffer in hub RAM. The routine could monitor the read index for the buffer to ensure that it doesn't overrun the cog that is consuming the data.
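The circular buffer idea can be sketched as below. This is only a model of the index bookkeeping, written in Python for readability; on the Propeller the buffer would live in hub RAM, the producer would be the SD reader cog, and the consumer would be another cog. The class and method names are made up for illustration.

```python
class RingBuffer:
    """Single-producer, single-consumer circular byte buffer."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.write_idx = 0  # advanced only by the producer (SD reader)
        self.read_idx = 0   # advanced only by the consumer

    def available(self):
        """Bytes waiting to be consumed."""
        return (self.write_idx - self.read_idx) % self.size

    def free_space(self):
        """Bytes the producer may write; one slot stays empty so full != empty."""
        return (self.read_idx - self.write_idx - 1) % self.size

    def put(self, data):
        """Producer side: refuse the write rather than overrun the consumer."""
        if len(data) > self.free_space():
            return False
        for b in data:
            self.buf[self.write_idx] = b
            self.write_idx = (self.write_idx + 1) % self.size
        return True

    def get(self, n):
        """Consumer side: return up to n bytes, fewer if the buffer runs dry."""
        n = min(n, self.available())
        out = bytearray()
        for _ in range(n):
            out.append(self.buf[self.read_idx])
            self.read_idx = (self.read_idx + 1) % self.size
        return bytes(out)

rb = RingBuffer(8)
print(rb.put(b"abcde"), rb.put(b"xyz"), rb.get(3), rb.put(b"xyz"), rb.get(10))
```

Because each index is written by only one side, the producer only ever needs to read the consumer's index to know how far it may run ahead, which matches the "monitor the read index" description above.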
The high-level requirements are that I need to support individual files as large as 100MB, and as small as just a few kB. I expect upwards of 16 GB total data on the SD card. I need to support the ability to add/delete individual files. I would prefer that if I want to change one file, I don't have to reformat and write all content on the SD card. Being able to update the SD card directly from a desktop, as well as LAN is also desired.
Do this:
Format the SD card and then save a BIG file on it that is sized to what you will need on the disk. Since FAT32 supports file sizes up to 2^32-1 bytes... make a file that big. Do this on your 16 GB SD card.
MAKE SURE YOU FORMAT THE CARD WITH THE LARGEST CLUSTER SIZE POSSIBLE! (Greatly increases the speed).
Now, any access to that file from FSRW is guaranteed to be contiguous because you created the file right after formatting the card. The large cluster size means that there will be very few FAT lookups (each of which slows things down).
Sequential reads and writes will be very fast if you do this.
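To put rough numbers on the cluster size advice, the sketch below counts how many clusters (and therefore how many potential FAT lookups) a file spans. The 100 MB file size comes from earlier in this thread; the 4 KiB and 32 KiB cluster sizes are only example values, and the 4 bytes per entry is the FAT32 entry size.

```python
def cluster_count(file_bytes, cluster_bytes):
    """Number of clusters a file occupies (ceiling division)."""
    return -(-file_bytes // cluster_bytes)

FILE_BYTES = 100 * 1024 * 1024  # a 100 MB file
FAT32_ENTRY_BYTES = 4           # each FAT32 entry is 32 bits

for cluster_kib in (4, 32):     # example cluster sizes only
    n = cluster_count(FILE_BYTES, cluster_kib * 1024)
    print(f"{cluster_kib} KiB clusters: {n} clusters, "
          f"{n * FAT32_ENTRY_BYTES} bytes of FAT entries")
```

Going from 4 KiB to 32 KiB clusters cuts the cluster count by a factor of eight, and with it the number of times a sequential read has to consult the FAT.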
....
Now, if you want to go faster, then you'll need to hack FSRW to return the block address of the beginning of the file on the SD card. Once you have that, you can pass that block address to the low level block driver functions bypassing the FAT. This will then allow you to hit the advertised speed FSRW quotes on the OBEX page. Note that this technique only works for contiguous files.
EDIT: I suggest you use Kye's suggestion first. If that works out to be fast enough then you won't have to write any special code. Otherwise, you would just need to write optimized versions of pread and its supporting functions until you get the speed you need.
The trick is to also skip the FAT lookup, which the file system normally does. This is why you have to go through the low-level block access functions for the best speed.
A close second would be to build an optimized cluster list in memory when first opening a file. I will have to see if even infrequent non-contiguous blocks would cause enough slowdown to cause buffer underflows. If that isn't a problem, the only downside is that with enough user modification to the files, fragmentation would eventually result in a slowdown and/or inability to store the entire optimized cluster list in cog memory. A check-on-opening operation could detect that, and a defrag operation would fix it.
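One way such an optimized cluster list could look is a list of runs: each run stores a starting cluster and a length, so a fully contiguous file collapses to a single entry. This is only a sketch of the idea, not anything FSRW does; the function name and the example chain are made up.

```python
def chain_to_extents(chain):
    """Compress a FAT cluster chain into (start_cluster, length) runs."""
    extents = []
    for cluster in chain:
        if extents and cluster == extents[-1][0] + extents[-1][1]:
            # Cluster directly follows the current run: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Gap in the chain: start a new run.
            extents.append((cluster, 1))
    return extents

# Hypothetical chain with two breaks in it.
extents = chain_to_extents([5, 6, 7, 20, 21, 9])
print(extents)
print("contiguous" if len(extents) == 1 else f"{len(extents)} fragments")
```

The same pass doubles as the check-on-opening test: if the run list has more entries than will fit in the memory set aside for it, the file needs defragmenting.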
Thanks again for the creative ideas. I have to build a prototype and test it now. Is there any reason that SD card reads would "stall" - as was mentioned for writes? (not slowdown due to non-contiguous blocks) I'm not sure why the writes can stall so can't even guess what might happen with reads.
It wasn't a stall in the sense of halting; it was just an extra long write delay. I'm assuming it had to do with having to pause and pre-erase some large block of sectors, but that's only a guess.
In all my hacking around, I have never seen that happen on reads; read times have always been consistent.
@ags - The reason for the large file is that you want FSRW to stay in multi-block mode. It can only stay in multi-block read mode as long as the next sector to fetch is equal to the current sector + 1. This means you have to have one large file and not use the file system to read through the file, but instead use the low-level FSRW block access functions.
Any type of system that doesn't just read the file continuously will be slower.
I would need to "restart" a new multi-block read to access a different "sub-file" contained within the large file. And if I ended up deleting every other "sub-file" in the "container" file, and then wanted to add new files just a little bit larger than the deleted files, I would end up running out of space on a half-full SD card.
Am I missing something?