Simple file system for flash
Rayman
Posts: 14,826
Here's my current thoughts on a simple file system for SPI or SQI flash.
In OBEX, you can find a file system by Mike Green that is brilliant, but
won't allow me to create large contiguous data files.
What I've tried to do here is take his ideas and combine them with my requirements...
The flash chips that I know about are easy to read from, but complicated to write to...
First, the data must be in the erased state in order to write. The smallest amout you can
erase is a 4 kB sector. And, the smallest amount you can write is 256 bytes.
I'm taking Mike's approach of not having a fat table, but instead writing file information to the
beginning of a 4 kB sector. The things I'm adding are contiguous files (he has file info
on each sector, I'm just putting it on the start sector of a file), long file names, and arbitrary file
information data. The idea here is that the header info could be very small if you want, or it can be very large if needed.
So, a sector that starts a file will have this header structure:
4 bytes = "~FS1" 'just a code for the system, to distinguish in most cases from real file data
4 bytes = filesize in bytes
2 bytes = #of sectors reserved for file (maybe you want to reserve extra space for this file)
2 bytes = offset to filename (allows arbitrary sized additional header info after this header)
2 bytes = offset to filename extension (allows quick access to the extension of the file)
2 bytes = offset to data (allows quick access to file data)
For wear levelling purposes, a file could start towards the end of memory and wrap around to the beginning.
I think this provides enough flexibility for use in most instances that I can think of.
If anybody has ideas, I'd love to hear them.
In OBEX, you can find a file system by Mike Green that is brilliant, but
won't allow me to create large contiguous data files.
What I've tried to do here is take his ideas and combine them with my requirements...
The flash chips that I know about are easy to read from, but complicated to write to...
First, the data must be in the erased state in order to write. The smallest amout you can
erase is a 4 kB sector. And, the smallest amount you can write is 256 bytes.
I'm taking Mike's approach of not having a fat table, but instead writing file information to the
beginning of a 4 kB sector. The things I'm adding are contiguous files (he has file info
on each sector, I'm just putting it on the start sector of a file), long file names, and arbitrary file
information data. The idea here is that the header info could be very small if you want, or it can be very large if needed.
So, a sector that starts a file will have this header structure:
4 bytes = "~FS1" 'just a code for the system, to distinguish in most cases from real file data
4 bytes = filesize in bytes
2 bytes = #of sectors reserved for file (maybe you want to reserve extra space for this file)
2 bytes = offset to filename (allows arbitrary sized additional header info after this header)
2 bytes = offset to filename extension (allows quick access to the extension of the file)
2 bytes = offset to data (allows quick access to file data)
For wear levelling purposes, a file could start towards the end of memory and wrap around to the beginning.
I think this provides enough flexibility for use in most instances that I can think of.
If anybody has ideas, I'd love to hear them.
Comments
I think that would work nicely, however I would get rid of the extension - if someone wants an extension, it can be part of the file name.
I would also like to see directories, however I don't think nested directories are absolutely necessary at this point.
Thanks for looking. I'm trying to code it up now and see how well it works...
The file extension is not a seperate entity, you just provide a pointer to where it starts within the filename string.
That saves time parsing the filename string so you can easily cycle through a given file type,
like pictures, fonts, etc.
Directories is one of the optional extras I've tried to include via the variable header size.
So, after the regular header, one can extra info such as directory, checksum, permissions, or anything else...
How about the searching of files? Does it mean, e.g. the program will need to scan the flash for header code "~FS1" to get a listing of files? Maybe allocating a 4k byte as uFAT table to store pointer to long filename & file location?
You are most welcome!
Ok that makes sense... so it just points to whatever is after the dot, if anything; and can be 0 (or other flag value) to indicate no extension.
A fat table would be another way to go. I suppose you could use the actual Windows FAT structure.
But, I'm trying to keep things simple here, while at the same time making it flexible and providing long filename support as well as guaranteed contiguous files...
Since the flash is relatively small (compared to SD), 2 MB or so, it doesn't take long to scan for files...
BTW, I'm changing the descriptor to "µFS1" because the mu character is outside regular ascii making it more unique...
Unless I missed it someplace:
file deletion? you'll need to release the 4K sectors and mark them deleted (or eligible for reuse) Reuse gets trickier you either need to track the empty chunks, do some sort of 'best-fit' search for new files or just lose the space until you compact the flash.
compaction? If you are going to be deleting files at some point you'll want a utility to compact the data to the lower part of the FLASH to open up a big block at the end for new files otherwise your flash is going to get fragmented and unusable.
Thanks for looking into this. I have an extra 2MB of flash on SpinSocket-Flash and GameBaby reserved for just this idea. The other 2MB are for user programs
I'll be happy to help you test. It may be some advantage to have a flash-driver plug-in interface similar to what David has specified for his basic. The cache_interface.spin is an API that can talk to the flash cog driver. It has an API for flash programming and data access.
I think the way it will work is that you scan the first 4 bytes of each sector starting at sector 0, looking for the "µFS1" descriptor.
When you find one, that's the start of a file and you read in how many sectors it has reserved for itself.
Then, you jump to the following sector and read the sector descriptor.
Flash in the erased state will read as $FFFF_FFFF, so that is a good indicator that the sector is ready for writing.
If it doesn't read $FFFF_FFFF and doesn't read "µFS1", then it is somehow corrupted and needs erasing before use.
It may be useful to have a "defrag" utility to compact the files. One idea is have the starting sector for the defragged data
be anywhere, not just the first sector, so as to achieve wear leveling.
jazzed, sounds great.
But, now I'm thinking about it again and here's the latest thoughts:
I'm contemplating putting a number for the offset to the start of the filename, instead of having the filename start at a fixed location. That may make it more flexible...
The main reason for doing my flash file system the way it was done is to avoid wearing out sectors. Allocation tables and directories are the worst offenders since they change with every file written and it's fairly hard to implement them so they're movable. If you keep a file length in the allocation tables, this has to be updated every time you add to the file (open, add, close) and that requires a read/rewrite of the whole segment. By keeping the file specific information in the file itself, it's easy to relocate the file or just pieces of it. This requires a search of the flash memory, but reads are quite fast and only a few bytes have to be read from each sector to see if this is the one desired. An algorithm is used to pick a starting point for searching based on the file name and this helps distribute files around the flash address space.
My only problem with it is that it didn't allow for large, contiguous data blocks.
The flash chips I'm using have a specified endurance of 100,000 cycles.
I really can't envision ever getting even close to that...
Actually, I mostly think of the flash being read from often and only occasionally written to...
I had really envisioned the file system for sequential data usage and decided that there was low enough overhead in having a 4K segment that was effectively only 4080 bytes long with 16 bytes of overhead. There's no big reason why you couldn't have an 8K segment with two flash memories in parallel with a segment effectively 8192 - 16 = 8176 bytes long.
A 100,000 cycle limit is only a problem if you use the same block over and over again. That potential for wear-out occurs any time you don't track usage and move blocks when necessary. My scheme doesn't compensate for the case where the same file name is created over and over again.