What is the current state of SD card interface driver implementations?

ags · 2013-03-07 11:25

I've spent days researching fast SD card drivers discussed and/or available on this forum. I have enough information to be dangerous and/or confused. If there is another source that consolidates all the various implementations/threads into summary form I'd appreciate a pointer. If not, it may help not just me, but others to provide an overall landscape of what's been done and what is available. Here's what I've spliced together:

(One of) the original fast drivers, fsrw by Tomas Rokicki and later Jonathan Dummer (now v 2.6), has been updated to support FAT16/32. There are at least three PASM driver implementations underlying the SPIN high-level interface. I'm not sure how to select from them. One is "safe_spi". This driver uses single instruction writes and 2 instruction reads (byte and block reads). Does that mean the others are not safe, and if so, why not? There are also "mb_spi" and "mb_rawb". It looks like "mb_spi" was at one time using a 2 instruction per bit read loop, but that is now commented out and it uses a single instruction per bit read loop like "mb_rawb" does. For single-byte reads, however, "mb_spi" still uses a 2 instruction per bit loop while "mb_rawb" uses a single instruction per bit read loop. "mb_rawb" supports ReadAhead/WriteBehind. The README in the archive is out of date (not complaining, just sayin') and refers to driver files "sdspi", "sdspiasm", "sdspifasm" and "sdsipqasm" - which no longer seem to exist.

There is another implementation, and even an AppNote on ParallaxSemi, by Kye. It appears to be a full-featured FAT16/32 implementation: not as fast as fsrw, but more complete.

There are threads on how to get maximum SPI read/write performance, and various implementations. kuroneko (self-admitted counter tinkerer "because I have a thing with counters...") has made contributions, along with others.

There are many more people that have contributed, and apologies if I've missed out on other significant contributors/contributions. This is not intended to be an exhaustive list, just trying to summarize what I've learned in the past week.

Questions:

Is there a recommended driver to use with fsrw?

Is there any further explanation available on the problems (and perhaps solutions) encountered with the single instruction per bit read methods, or ways to decide when "unsafe" situations occur and how to avoid them? (Of course I still want the fastest reliable driver speed possible.) Are there other methods (presumably using counters) to implement fast SPI read and write?

Have I missed something (else/new/improved)?

Thanks.

Mike Green · 2013-03-07 12:35

sdspiFemto.spin is an older low level driver that's part of the FemtoBasic series of Basic interpreters. It combines a low level SD card driver with an I2C driver and can handle overlapped buffered sector I/O as well as overlapped buffered I2C I/O. Because of its generality, it's relatively slow compared to Kye's or Rokicki's low level drivers. FSRW 2.6 has been modified (fsrwFemto.spin) to work with sdspiFemto.spin.

msrobots · 2013-03-07 12:39

Having tinkered around with both filesystems for a while now for my RAISD-System, I can add some more pieces to the confusion.

safe_spi is called safe because of the slowed-down read. From what I could gather on the forums, the faster read routine works but causes problems with some pin combinations.

So the recommended driver for FSRW is safe_spi. All 3 drivers from @lonesock (Jonathan Dummer) support multi-block read and write and have one complete sector buffer in the cog to support read-ahead and write-behind.

sdspixx is a driver used in FemtoBasic and as far as I remember combines SPI and I2C in one pasm-cog but does not use the multi-block mode of the sd-cards.

Kye's FAT_Engine supports directories and FSRW does not.

Kye's FAT_Engine supports RTCs and correctly sets the creation date, last-change date and last-access date in the directory entries. FSRW can be tricked into using an RTC but sets only the creation date.

There is a patch out for Kye's Fat_Engine (kyefat_lfn) that supports READING long filenames (as in iterating thru the directory entries) but not writing them.

The main problem with both Filesystems is that SPIN is way slower than the pasm-driver so all the speed gained by fast multi-block access is lost in handling the cluster and directory access in spin.

And both need a lot of HUB memory.

@MPARK'S spinx uses an older version of FSRW and 2 or 3 pasm-cogs to leave more HUB memory free. I need to look into that a bit more.

Did you miss something? YES! (shameless plug)

There is my RAISD-System. A redundant array of independent SDcards. As addon for the Quickstart the RAISD QS-KIT provides you with 4 SDcards. Available at www.propellerpowered.com.

Enjoy!

Mike

Rayman · 2013-03-07 13:56

fsrw has been around a long time and is still the fastest implementation, last time I checked.

But, Kye's version has more features, so if you don't need the fastest speed, that is worth looking at.

jstjohnz · 2013-03-07 14:02

I asked the same question recently, re "safe-spi". The answer was that in testing the single-instruction-per-read code, it was found that it would work with most propeller chips, but would fail with some. I believe that testing was not with an sd card but with some other spi device. I personally have not tried it, but it would be nice to have that extra speed when writing. In my case it's communicating with a wiznet ethernet module.

ags · 2013-03-07 14:03

For my project (at least for now) I will focus on read-only, as fast as possible. I won't need much FS support so I will start with the fsrw implementation.

So the question becomes: what was/is the problem with the single instruction read that makes it (sometimes) "unsafe" - and has anyone spent much time trying to address the problem reliably?

ags · 2013-03-07 14:06

@jstjohnz: I think even the safe implementation uses single instruction writes. The challenge is with the single instruction reads. It would be very helpful to understand if the "sometimes" it didn't work was due to Propeller variation (and is it with 80 or 100 MHz operation?) or SD card variations. Your post suggests the former, which for me would be more tolerable than random failures based on which SD card is used.

lonesock · 2013-03-07 14:36

Writes were all fine at 1 instruction per bit. The read errors seemed to have to do with the cog running the block driver, coupled with the pins running to the SD card. I have had multiple boards work just fine with single-instruction reads, everything from a proto-board at 80 MHz with 8" long ribbon cable hanging off it, to small PCBs with SMT components running at 100 MHz clock.

The only board I got to fail was also a PCB with SMT components, traces were all fine, no excessive noise coupling or anything...it just would fail to read some of the time. It would work just fine if the PLL was set to 8x, but would fail at 16x. I don't remember what pins were chosen. The failures occurred on 3 of these boards, so it was probably not due to a bad prop chip.

The upshot was, if you were doing it for a specific board, test it and go. But since the FSRW block driver was supposed to work in all cases, it was safest to just use 2 instructions per bit read. The total throughput was still pretty high. [8^)

Jonathan
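For a rough sense of what the 1-versus-2-instruction trade-off costs, here is a back-of-envelope sketch of the raw SPI bit-rate ceiling. It assumes each instruction in the bit loop takes 4 system clocks and it ignores command, CRC and hub-transfer overhead, so real throughput will be lower. The function name is made up for illustration and is not part of FSRW.

```python
CLOCKS_PER_INSTRUCTION = 4  # assumption: ordinary Propeller 1 instruction timing


def raw_spi_rate(sys_clock_hz, instructions_per_bit):
    """Return (bits/s, bytes/s) ceiling for an SPI bit loop, ignoring all overhead."""
    bits_per_second = sys_clock_hz / (CLOCKS_PER_INSTRUCTION * instructions_per_bit)
    return bits_per_second, bits_per_second / 8


for clock in (80_000_000, 100_000_000):
    for n in (1, 2):
        bps, byps = raw_spi_rate(clock, n)
        print(f"{clock // 1_000_000} MHz, {n} instr/bit: "
              f"{bps / 1e6:.1f} Mbit/s, {byps / 1e6:.2f} MB/s")
```

Under these assumptions the 2-instruction read loop halves the ceiling rather than reducing it by some small margin, which is why the question of whether single-instruction reads can be made reliable matters for read-heavy projects.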

prof_braino · 2013-03-07 14:40

ags wrote: »

I've spent days researching fast SD card drivers discussed and/or available on this forum. I have enough information to be dangerous and/or confused. If there is another source that consolidates all the various implementations/threads into summary form I'd appreciate a pointer.

Do you really need FAT32? (You don't necessarily need it.) If you allow the SD card to be INTERNAL to the prop, you can rely on your prop application interface to handle reading and writing data to the SD. (This is how any device with soldered-in memory treats it, like an iPad, etc.) If you can skip the FAT32, the driver is much smaller and much faster. The SD can be treated much like an EEPROM: just a bunch of raw pages. In this model, the transfer speed of the SD is about as fast as the prop can handle. So no waiting, aside from however long it takes the prop to do the transfer anyway. AND the application can deal with a single 512 byte page at a time, so the transfers are spread over the entire processing time.

Of course, if your primary requirement is physically removing the SD from the prop and placing it in the PC, you are stuck.

ags · 2013-03-07 14:45

Thanks Jonathan, that's the info I was hoping someone (you) could share. It might sound trivial, but did anyone consider if the failure was due to the difference in loading between the CLK and DO (MISO) pins? If I understand the design intent and timing correctly, just moving the edges relative to one another might be an issue (but I have no idea exactly where in the single clock period the counters actually drive the A/BPIN, and when exactly the A/BPIN values are sampled and suspect that others have already thought of that).

ags · 2013-03-07 14:47

prof_braino wrote: »

Do you really need FAT32? ...
Of course, if your primary requirement is physically removing the SD from the prop and placing it in the PC, you are stuck.

Yes, I'm stuck - sort of. I could always write my own low-level drivers to create a custom FS on the SD card from a PC. But I don't have that knowledge/experience at this point (never stopped me before, though).

prof_braino · 2013-03-15 07:38

ags wrote: »

Yes, I'm stuck - sort of. I could always write my own low-level drivers to create a custom FS on the SD card from a PC. But I don't have that knowledge/experience at this point (never stopped me before, though).

If you feel you are stuck for lack of alternatives, maybe there are some.

In Propforth, we have a few simple commands that do SD access (the same exist for EEprom access). Create a file, read a file, list all the files. The routines that read and write an individual block can be extracted and used in your application. The result is that it looks like you have files (because you list them with an "ls" command from the command line), but it is really just reading and writing individual blocks, no FAT32. Very fast. This method might be appropriate for a microcontroller with 32k ram and 32 I/O lines total.

Dave Hein · 2013-03-15 09:33

ags wrote: »

Yes, I'm stuck - sort of. I could always write my own low-level drivers to create a custom FS on the SD card from a PC. But I don't have that knowledge/experience at this point (never stopped me before, though).

You never said what your speed requirement is. Is the safe_spi.spin driver not fast enough for your application?

ags · 2013-03-15 11:33

prof_braino wrote: »

If you feel you are stuck for lack of alternatives, maybe there are some.

In Propforth, we have a few simple commands that do SD access (the same exist for EEprom access). Create a file, read a file, list all the files. The routines that read and write an individual block can be extracted and used in your application. The result is that it looks like you have files (because you list them with an "ls" command from the command line), but it is really just reading and writing individual blocks, no FAT32. Very fast. This method might be appropriate for a microcontroller with 32k ram and 32 I/O lines total.

I'm comfortable with the low-level access to the SD card from the Propeller (not requiring any file system). My concern is how to get data on the SD card from a PC. If I use that method, I would need to write a desktop application that would be able to write to the SD card directly (no file system, just sector writes). I checked out the Propforth pages on Google Code, is that what you are referring to? I was looking at readme files under "Propforth-08 SD". That is running on the Propeller, not a desktop (correct)?

While I have some ideas on how to create the desktop app, it worries me if I were to distribute it. Seems that a mistake (user specifying the wrong drive (like a hard drive instead of the SD card)) would have disastrous consequences. From what I've seen so far, direct access to raw drive data requires admin permission, and if I overwrote the FAT... well, you know.

Did I understand your reply correctly?

ags · 2013-03-15 11:37

Dave Hein wrote: »

You never said what your speed requirement is. Is the safe_spi.spin driver not fast enough for your application?

Design isn't done, but I expect I'll need to read at about 5Mbps (and do some minimal processing and then send out to hub RAM)

Dave Hein · 2013-03-15 12:45

That's 5 Mega-bits/second, right? The description for FSRW 2.6 says it can read at 900 Kbytes/sec, which would be 7.2 Mbps, so it seems like it meets your requirement. It also says that it can do 1.8 Mbytes/sec on write. Of course, writes to an SD card can stall at times, so the 1.8 Mbytes/sec would be a peak speed.
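The unit conversion above is easy to get wrong by a factor of 8, so here it is spelled out as a small sketch. It assumes decimal units (1 Kbyte = 1000 bytes), and the helper name is made up for illustration.

```python
def kbytes_per_sec_to_mbps(kbytes_per_sec):
    """Convert kilobytes/second to megabits/second (decimal units assumed)."""
    return kbytes_per_sec * 8 / 1000


required_mbps = 5.0                             # the target stated earlier in the thread
fsrw_read_mbps = kbytes_per_sec_to_mbps(900)    # read rate quoted for FSRW 2.6
headroom_mbps = fsrw_read_mbps - required_mbps  # what is left for everything else
print(f"read {fsrw_read_mbps:.1f} Mbps, headroom {headroom_mbps:.1f} Mbps")
```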

On read, the standard FSRW will introduce some compute overhead because it has to compute sector and cluster addresses, so your best bet would be to call the SPI driver routines directly to avoid the FSRW overhead. You can simplify the sector addressing by ensuring that all the sectors are contiguous. You can do this by using a freshly formatted SD card when writing the data from a PC. On the Prop, you just need to open the file and get the address of the first sector of the file. You'll probably need to add a method to FSRW to return the sector address, but it's fairly easy to do.
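As a sketch of why contiguous sectors simplify the addressing: once the first sector of the file is known, any byte offset maps to a sector with one division. The cluster-to-sector formula is the usual FAT one (data clusters are numbered from 2). The function names and the example numbers are hypothetical and are not taken from FSRW.

```python
SECTOR_SIZE = 512  # bytes per SD sector


def cluster_to_sector(cluster, data_start_sector, sectors_per_cluster):
    """First sector of a FAT cluster; data clusters are numbered from 2."""
    return data_start_sector + (cluster - 2) * sectors_per_cluster


def sector_for_offset(first_sector, byte_offset):
    """Sector holding byte_offset of a file whose sectors are contiguous."""
    return first_sector + byte_offset // SECTOR_SIZE


# Hypothetical volume: data region starts at sector 8192, 64 sectors (32K) per cluster,
# and the file's directory entry says it starts at cluster 5.
first = cluster_to_sector(5, data_start_sector=8192, sectors_per_cluster=64)
print(first, sector_for_offset(first, 1_000_000))
```

With this, the only value the file system has to supply is the starting sector; everything after that can go straight to the block driver.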

ags · 2013-03-15 14:34

Dave, very good points. I won't know until I have a prototype (hw) and some code to test, but I think that speed will be an issue. I have to sustain overall data throughput rates of about 5 Mbps consistently (bursts are not enough), and that will entail reading data from the SD card (into the "master" cog RAM), evaluating, some simple manipulation/processing, and finally dispatching it out to helper cogs (through hub RAM). To accomplish that, I expect that I will have to be able to read in bursts at speeds > 5Mbps in order to have the slack to accomplish the rest.

I have been looking at ways to ensure contiguous files on the SD card as you mention (potentially having no file system, just a simple way of knowing the start/end cluster for a set of contiguous "files"). That poses some problems for the desktop user (the user creating the data and loading the SD card). Requiring that the SD card is reformatted and written with all necessary files every time there is a change may be a usability issue. Writing my own driver to support a custom "file system light" will be extra work (multiplied by different OSs to support) and could be risky (imagine a user of my SD writing app specifying drive C: instead of G:...).

It is unfortunate that I need to read, not write. Murphy arranged for the fsrw write speed to be double the read speed...

I'm open to other ideas. Thanks for the reply.

Cluso99 · 2013-03-16 02:30

ags:

If you want to look at reading/writing directly to/from the SD card, take a look at ZiCog. We create 8x 32MB blank files in the PC. They are contiguous files. ZiCog then reads the location of the start of each of the 8 files and uses that as the base for the 8MB disk files (yes, they are created as 32MB but currently we only use the first 8MB). ZiCog originally used the femto routines, and then later I changed it to use fsrw2.6. This works fine on the RamBlade hardware that I designed, and I overclock to 104MHz.

I also use Kye's FAT driver in my version of PropOS which was derived from KyeDos by Dracula. Both can be searched on the forums. Kye's FAT driver (modified) is also used in Catalina C.

Kye's version is more pedantic about the SD card being formatted correctly.

There have been various problems with some SD cards. We have never found out why. The best seem to be SanDisk.

ags · 2013-03-18 08:21

@Cluso99 - I understand the low-level formatting and use of SD cards on the Prop side. Followed the link in your sig for ZiCog, and what I found was .spin files (did I look in the right place?)

If speed is an issue, I can think of a few things that I could do, one of which is to insist on contiguous files. A simple way to do that is to reformat the card then write all files in one session (on the desktop). That's more than I want to ask a user to do. Another way is to use low level OS-specific drivers to access the SD card. That has several issues (e.g. portability and risk of damaging a HD if the user enters wrong information). I thought of caching the cluster chain of a non-contiguous file to avoid having to access the FAT, but I need to be able to accept up to a 100MB file, and worst case I couldn't fit all the clusters in memory.

BTW, does anyone know if contiguous files are guaranteed, even starting from a fresh format? Couldn't the wear leveling decide to use non-contiguous clusters, or is that mechanism a level below (hidden from the user) the LBA?

Dave Hein · 2013-03-18 09:06

The SD wear leveling code is hidden from the user. A file created on a freshly formatted card (or even one where all the files have been deleted) should always consist of contiguous sectors. I still don't fully understand your requirements, but it seems like you should be able to create a 100MB file on a clean SD card, and then do all your I/O within that file. I don't know if you need to create multiple files, since your PC app could just rewrite the contents of the 100MB file without creating a new file.

You could have a single cog dedicated to reading the data from the SD card and writing it into hub RAM. This cog could talk directly to the SPI driver cog, or you can even build the function into the SPI driver. You could have the function read from SD and write into a circular buffer in hub RAM. The routine could monitor the read index for the buffer to ensure that it doesn't overrun the cog that is consuming the data.
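The circular-buffer idea can be sketched in a few lines. This is only a model of the index bookkeeping, written in Python rather than Spin/PASM. The class name and slot layout are invented for illustration; on the Prop the two indices would be longs in hub RAM, each written by only one cog.

```python
class SectorRing:
    """Fixed-size circular buffer of sectors shared by one producer and one consumer."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.write_index = 0   # advanced only by the SD-reading side
        self.read_index = 0    # advanced only by the consuming side

    def free_slots(self):
        # One slot is kept empty so that "full" and "empty" are distinguishable.
        used = (self.write_index - self.read_index) % len(self.slots)
        return len(self.slots) - 1 - used

    def put(self, sector):
        """Store a sector; return False (no overwrite) if the consumer is behind."""
        if self.free_slots() == 0:
            return False
        self.slots[self.write_index] = sector
        self.write_index = (self.write_index + 1) % len(self.slots)
        return True

    def get(self):
        """Return the oldest sector, or None if the buffer is empty."""
        if self.read_index == self.write_index:
            return None
        sector = self.slots[self.read_index]
        self.read_index = (self.read_index + 1) % len(self.slots)
        return sector
```

The reader checks `free_slots()` (that is, it monitors the read index) before fetching the next sector, so it waits instead of overrunning the consumer.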

ags · 2013-03-19 16:27

You are correct to point out that requirements are not clear. I'm currently in the early design phase. I start with an "I want it all" goal, then prioritize and reduce down to an implementation that is realizable and optimizes the most important features. I'd like to be able to read full-speed from an SD card with no special file system/structure. That would allow any PC to write/copy the files. I'd prefer not to have a custom desktop application to write a special format. If I can sustain the required speed with FAT32, that would be great. I won't know that until I have a prototype, but rough calculations put it on the edge of being possible.

The high-level requirements are that I need to support individual files as large as 100MB, and as small as just a few kB. I expect upwards of 16 GB total data on the SD card. I need to support the ability to add/delete individual files. I would prefer that if I want to change one file, I don't have to reformat and write all content on the SD card. Being able to update the SD card directly from a desktop, as well as LAN is also desired.

Kye · 2013-03-19 17:46

Look, if you want the absolute fastest possible read speed, then use the FSRW driver just like it is in the OBEX.

Do this:

Format the SD card and then save a BIG file on it that is sized to what you will need on the disk. Since FAT32 supports file sizes up to 2^32-1 bytes... make a file that big. Do this on your 16 GB SD card.

MAKE SURE YOU FORMAT THE CARD WITH THE LARGEST CLUSTER SIZE POSSIBLE!

(Greatly increases the speed).

Now, any access to that file from FSRW is guaranteed to be contiguous because you created the file right after formatting the card. The large cluster size means that there will be very little FAT lookup (FAT lookups are what decrease speed).

Sequential reads and writes will be very fast if you do this.

....

Now, if you want to go faster, then you'll need to hack FSRW to return the block address of the beginning of the file on the SD card. Once you have that, you can pass that block address to the low level block driver functions bypassing the FAT. This will then allow you to hit the advertised speed FSRW quotes on the OBEX page. Note that this technique only works for contiguous files.

Dave Hein · 2013-03-19 17:56

If you don't want to impose a contiguous-sector requirement, you could pre-read the cluster addresses into memory. If the SD card is formatted with 32K clusters, a 100MB file will contain 3,200 clusters. If you use a 2Gig SD card (or restrict your use to the first 2Gig of a larger card) you only need 16 bits per cluster address. That would require a 6,400-byte cluster table for a 100MB file.

EDIT: I suggest you use Kye's suggestion first. If that works out to be fast enough then you won't have to write any special code. Otherwise, you would just need to write optimized versions of pread and its supporting functions until you get the speed you need.
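The cluster-table sizing above can be reproduced with a short sketch. It treats each table entry as a plain cluster index within the addressable region, ignoring FAT's reserved cluster numbers and the offset of the data region. It also assumes binary units (1 MB = 2^20 bytes) and rounds each entry up to whole bytes. The function name is made up for illustration.

```python
def cluster_table_size(file_bytes, cluster_bytes, addressable_bytes):
    """Return (clusters in file, bits per cluster index, table size in bytes)."""
    clusters_in_file = -(-file_bytes // cluster_bytes)       # ceiling division
    clusters_on_card = addressable_bytes // cluster_bytes
    bits = max(1, (clusters_on_card - 1).bit_length())       # bits to index any cluster
    bytes_per_entry = (bits + 7) // 8                        # round up to whole bytes
    return clusters_in_file, bits, clusters_in_file * bytes_per_entry


KB, MB, GB = 1024, 1024**2, 1024**3
print(cluster_table_size(100 * MB, 32 * KB, 2 * GB))    # the 2Gig case described above
print(cluster_table_size(100 * MB, 32 * KB, 16 * GB))   # same file on a full 16 GB card
```

Running the second case shows how the entry width grows once the whole of a larger card is addressable, which is the reason for restricting use to the first 2Gig.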

Cluso99 · 2013-03-19 17:59

ags: As Dave suggested, and as we do with ZiCog/CPM, you can create a 100MB file(s) and initially they will be contiguous. However, once you delete any files and create files, you run the risk of them not being contiguous. If it is speed you require, then you may just have to initialise the card to FAT32 and create a file the maximum size you require in total, and look after where the files are located. i.e. you totally control the SD card's format but just bury it within a FAT32 file. There is a limit on the file size but I cannot recall what it is.

Kye · 2013-03-20 06:50

Having the file be contiguous means that multiblock mode on the SD card never stops. This increases the read speed.

The trick is to also skip the FAT lookup, which the file system will normally do. This is why you have to go through the low level block access functions for the best speed.

ags · 2013-03-20 10:07

Thanks for the ideas. The one that seems to be the most flexible without requiring low-level (non-FAT32) access from a desktop application is building a file system on top of a file system (by creating large (i.e. 2^32-1 byte), empty contiguous files in the FAT32 structure, then managing the contents of each file with additional structure). It does mean only a custom app can manipulate the files inside the large "container" files. Normal drag-and-drop file operations supported by the OS will still work, but only data in the small files properly constructed by the custom app and placed in the large "container" files will be visible on the Prop board.

A close second would be to build an optimized cluster list in memory when first opening a file. I will have to see if even infrequent non-contiguous blocks would cause enough slowdown to cause buffer underflows. If that isn't a problem, the only downside is that with enough user modification to the files, fragmentation would eventually result in a slowdown and/or inability to store the entire optimized cluster list in cog memory. A check-on-opening operation could detect that, and a defrag operation would fix it.

Thanks again for the creative ideas. I have to build a prototype and test it now. Is there any reason that SD card reads would "stall" - as was mentioned for writes? (not slowdown due to non-contiguous blocks) I'm not sure why the writes can stall so can't even guess what might happen with reads.

David B · 2013-03-20 11:34

In my endless quest to better understand SD cards, I once ran a test of timing raw writes to sequential sectors, and saw a periodic extra long write time of something like tens of milliseconds. It didn't show an exact pattern, but it was almost exactly periodic, happening about every 512 sector writes.

It wasn't a stall in the sense of halting; it was just an extra long write delay. I'm assuming it had to do with having to pause and pre-erase some large block of sectors, but that's only a guess.

In all my hacking around, I have never seen that happen on reads; read times have always been consistent.

Kye · 2013-03-20 17:22

You are correct David, the SD card does have to perform the erase some time.

@ags - The reason for the large file is that you want FSRW to stay in multi-block mode. It can only stay in multi-block read mode as long as the next sector to get is equal to the current sector+1. This means you have to have one large file and not use the file system to read through the file but instead use the low-level FSRW block access functions.

Any type of system that doesn't just read the file continuously will be slower.
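To make the "current sector+1" rule concrete, here is a small sketch that splits a list of sector numbers into the runs a multi-block read could cover without restarting. It only models the rule as described above; it is not FSRW code, and the function name is invented.

```python
def multiblock_runs(sectors):
    """Split a sector sequence into runs where each sector is the previous one + 1."""
    runs = []
    for sector in sectors:
        if runs and sector == runs[-1][1] + 1:
            runs[-1][1] = sector           # extend the current run
        else:
            runs.append([sector, sector])  # a new multi-block read must start here
    return [(start, end) for start, end in runs]


# A contiguous file is a single run; a fragmented one needs a restart per fragment.
print(multiblock_runs(range(1000, 1064)))
print(multiblock_runs([100, 101, 102, 200, 201, 103]))
```

Each extra run costs a stop and a fresh read command, so the number of runs is a reasonable first measure of how far a file is from the best case.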

ags · 2013-03-20 18:28

@Kye - yes, understood. The purpose is to have an entire file readable as a consecutive set of blocks. So if I wanted to have 4 distinct files, I could come up with my own way of managing them inside the larger container file. That way, if I wanted to add/delete files, I could manage the process of block selection (within the context of the large "container" file) so that I didn't end up with fragmented (non-contiguous) files. The value here is that I could create a desktop app and not need low level drivers to access raw disk I/O. From that perspective it would all look like just writing a custom-content (huge) file.

I would need to "restart" a new multi-block read to access a different "sub-file" contained within the large file. And if I ended up deleting every other "sub-file" in the "container" file, and then wanted to add new files just a little bit larger than the deleted files, I would end up running out of space on a half-full SD card.

Am I missing something?
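The "running out of space on a half-full card" scenario can be shown with a toy first-fit allocator over the container file. The extent list, block counts and function name are all hypothetical. A real container format would also need a directory of sub-files, which is left out here.

```python
def first_fit(free_extents, length):
    """Allocate `length` contiguous blocks from the first free extent big enough.

    free_extents is a list of (start, length) tuples, updated in place.
    Returns the start block, or None if no single extent is large enough.
    """
    for i, (start, size) in enumerate(free_extents):
        if size >= length:
            if size == length:
                free_extents.pop(i)
            else:
                free_extents[i] = (start + length, size - length)
            return start
    return None


# Hypothetical container of 80 blocks that held eight 10-block sub-files.
# Deleting every other sub-file leaves four separate 10-block holes.
free = [(0, 10), (20, 10), (40, 10), (60, 10)]
total_free = sum(size for _, size in free)
print(total_free, first_fit(free, 11), first_fit(free, 10))
```

Half the container is free, yet an 11-block sub-file cannot be placed contiguously while a 10-block one can, which is the case where a defrag (compaction) pass inside the container would be needed.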

Kye · 2013-03-21 07:23

You got it all.
