TAR for SD and flash
Rayman
Posts: 14,665
I've been thinking about how to implement a filesystem for flash and also how to use long filenames on SD...
It just occurred to me that perhaps a .tar archive would allow both of these...
Actually, what I was thinking about for a flash filesystem isn't much different than the tar system.
So, the entire flash contents would just be one .tar file...
This might also be useful with FSRW to allow long filenames and directories...
Comments
Other operating systems have long file names...
The file as I'd use it on SD would be called something like "app1.tar" and so would be in 8.3 format, and FSRW would read it just fine.
But, within the tar would be files with long filenames. I think there is no rights issue here...
And, the more I think about this, the more I like it. In particular, I'd like to bundle all the files required for an application into one file (things like icon, program, data, bitmaps, etc.). Tar provides an easy and standardized way to do this.
Plus, I'd be able to read these files on a PC using WinZip or WinTar.
Not only that, it gives me a common storage format for both SD and flash...
There was even one scheme I especially liked that was used on a ramdisk. It didn't have a FAT table per se, only a table with nothing more than two entries for each file: where it started, and how long it was. This way, no memory was wasted, only the exact amount of space that was used by the files copied onto the ramdisk. It could only be loaded once; if you wanted to load more files into it, you had to unload it and begin again. It obviously wouldn't work well for files that are going to be edited, data files and the like, but it was primarily for executable programs, especially those you ran a lot. By loading them into this ram drive, it reduced load times, and didn't use up any more memory than it would having the programs loaded for execution anyhow.

Something like this tar file system would be useful, and with the removal of 512-byte blocks, the same thing could easily be implemented here. As mentioned, it'd require some sort of compact utility, but that's nothing different from programs that defrag your HD anyhow, so I say go for it. I'd use something like this. It's always irritated me that, as smart as computer people seem to be, nobody could ever come up with a way not to waste disk space when saving files.
I mean, if I added up all the wasted space on my hd today, it would probably come to more than my entire 20MB hd I had in my first computer.
Just seems a waste to me.
Anyway, I'll be keeping an eye on this one, and if possible, I'll even contribute if I can manage to understand it well enough to do so.
The point in any program is that you have to find the right balance! For example, the usage of RAM or disk space versus runtime. If you reduce the granularity of file storage, you have to increase the complexity of the code! Hardware like HDs and SDs works with sectors, so if you don't want to waste a byte, you'd have to read a whole sector and then shift it, so that the beginning of the sector, which might belong to another file, is not part of the data you find in your load buffer.
The clustering used in FAT was also introduced to keep the management data size and runtime smaller, at the cost of wasted HD space. But even there they did something clever: you, as the user, can decide how bad the real-file-size to used-HD-space ratio is, because you can choose how big the clusters shall be. So, if you have a lot of small files, keep the cluster size low; if you have a lot of big files, increase the cluster size!
The problem with FAT32 nowadays is that media sizes are too big to allow small cluster sizes.
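The cluster-size tradeoff above is easy to put in numbers. As a rough sketch (plain C, not Propeller code): the slack for one file is the unused tail of its last cluster, so on average every file wastes about half a cluster.

```c
#include <stdint.h>

/* Bytes wasted ("slack") when a file of the given size is stored in
   fixed-size clusters: the unused tail of the file's last cluster. */
static uint32_t cluster_slack(uint32_t file_size, uint32_t cluster_size)
{
    uint32_t rem = file_size % cluster_size;
    return rem == 0 ? 0 : cluster_size - rem;
}
```

For example, a 900-byte file in 512-byte clusters wastes 124 bytes, while the same file in 32 KB FAT32 clusters wastes over 31 KB.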
This will help me with one goal of being able to use Sphinx to compile using the original .spin files...
I'll just use WinTar to put the whole library folder onto SD...
If you just want to support long file names you could maintain a file in each directory that maps long file names to short names. Your OS could use either the short names in the FAT directory or the long names in the mapping file.
http://gnuwin32.sourceforge.net/packages/gtar.htm
or, I see everywhere that 7-zip supports tar too...
Just tried 7-zip and was able to create a tar file of the Prop Library. It is ~ 3MB..
BTW: I think it was really WinRar that I remembered using. Just tried that one, but it doesn't let me create .tar files...
This means that you have to iterate through the whole file if you want to read the last file!
You need FAT code AND TAR code to make it happen (if the SD card should still be readable by a PC)! So, I think your time is much better spent adding directory support to the SD card driver (which should be easy, as a subdirectory has the same structure as the root directory) and living without long filenames.
I use a SWAP-file to avoid having tons of image files on the SD card. Of course this SWAP file has to be generated in a way that's not as easy as 'copy it to the SD', but after that it can be bundled with the software. For generating it I simply use small SPIN code which includes the data and writes it to the right location inside the SWAP-file. It would also be possible to create a tool for that - just like TAR ... that's also a tool.
Access to the SWAP-file is done by a handful of functions added to FSRW and does not add much overhead, as it is sector-based (no FAT involved).
First, because I only envision there being a handful of files. The headers tell me where the next file starts.
I'll just use the "seek" command to get the next header. It should only take milliseconds to scan the contents.
And, if this does turn out to be too slow for some app, one could just create their own table of offsets to files.
I think this may actually be faster than the regular way of switching between files with FSRW... If you want to open a new file now, you first have to "pclose" and then "popen" a new one. This has got to take some time. Inside a TAR, all you have to do is a "seek"...
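For what it's worth, the header walk can be sketched in a few lines of plain C. This version operates on an in-memory buffer to keep it self-contained; on the Propeller the buffer indexing would be replaced by FSRW seeks and reads. The names `tar_size` and `tar_find` are made up for illustration.

```c
#include <string.h>
#include <stdint.h>

/* Parse the NUL/space-padded octal size field of a tar header
   (offset 124, 12 bytes, in both the old and the POSIX layouts). */
static uint32_t tar_size(const uint8_t *hdr)
{
    uint32_t n = 0;
    for (int i = 124; i < 136; i++) {
        uint8_t c = hdr[i];
        if (c < '0' || c > '7')          /* stop at NUL or space padding */
            break;
        n = n * 8 + (c - '0');
    }
    return n;
}

/* Scan an in-memory archive for `name`; return the byte offset of the
   file's data, or -1 if not found.  Each header is one 512-byte block,
   and the data after it is rounded up to a whole number of blocks. */
static int32_t tar_find(const uint8_t *tar, uint32_t tar_len, const char *name)
{
    uint32_t off = 0;
    while (off + 512 <= tar_len && tar[off] != 0) {   /* zero block = end */
        uint32_t size = tar_size(tar + off);
        if (strncmp((const char *)tar + off, name, 100) == 0)
            return (int32_t)(off + 512);              /* data follows header */
        off += 512 + ((size + 511) & ~511u);          /* round up to blocks */
    }
    return -1;
}
```

Since each hop is just "read 512 bytes, add rounded size", scanning even dozens of headers should indeed only take milliseconds.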
It's not clear how I can do this without recreating the entire file...
What I'd like to try is erasing the file the way DOS and Windows do, by replacing the first letter of the filename with $E5
http://www.diydatarecovery.nl/kb_undelete_article.htm
Interesting thing to see is how 7-zip and Windows will react to this...
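The DOS-style delete could be sketched like this in plain C (hypothetical helpers, not anything from FSRW): overwrite the first byte of the header's name field with $E5 and teach the scanner to skip such entries, so the archive never has to be rewritten.

```c
#include <stdint.h>

/* Mark a tar entry deleted, DOS/CP/M-style: overwrite the first byte
   of the 100-byte name field with $E5.  Header and data stay in place;
   readers just learn to skip entries carrying this marker. */
static void tar_mark_deleted(uint8_t *hdr)
{
    hdr[0] = 0xE5;
}

static int tar_entry_is_deleted(const uint8_t *hdr)
{
    return hdr[0] == 0xE5;
}
```

Standard tar tools won't understand the marker, of course - they'll just see an entry with a strange first character - which is exactly the "how will 7-zip and Windows react" experiment above.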
Re this and also your nifty early spin editor. I've been playing around with this sort of thing as well. Fundamentally, there isn't enough room for a program and an operating system in the propeller (I see you are out of cogs on the VGA driver). But there are other solutions. I've built both Kyedos and the touchscreen operating system around one line of code that Kye included in his SD driver
With this, each program can chain the next one.
So, rather than having a whole lot of operating system routines that stay resident (and which invariably grow), every program includes the operating system object, or at least, the bits of the operating system objects that it needs. So you have the main startup screen, with either a flashing cursor (kyedos) or some icons (touchburger), and if you run a program, it chains the next program (eg your text editor). And then there is another routine which is a warm boot, which is a single line call to go back to the main operating system.
Each program then gets the full 32k of the propeller chip. And with external ram, data can be moved off into ram so most of the 32k can be filled with code.
Each program is compiled to a .bin (or .exe) file and stored on an SD card, and so you can grow an operating system piece by piece. So, rather than a directory listing program being part of the OS, that is a chain to another .bin program. That keeps the OS smaller.
I'm not sure how to solve the long filename problem.
A file system for flash sounds interesting. The CP/M file system may be worth looking at. It is simple enough that it would be possible to code it in spin. The N8VEM group recoded CP/M so it worked with a 512k ram chip and hence was a very fast ram disk. CP/M is also where the $E5 comes from.
Coding a text editor may require a decent amount of ram to contain all the text. It probably is the same problem as a cache, so the work done by the GCC group could be useful. Then the actual type of memory may not matter - you have the abstract concept of a 1 megabyte block of text and a small bit of it happens to be visible on the screen, and changing one character is a cache write, and scrolling the page is a cache read.
What would really take the propeller chip somewhere interesting would be both a text editor, and a spin compiler. These could well be separate programs, even written by different people. But the combination of both would mean you would not need a PC any more. So close http://code.google.com/p/open-source-spin-compiler/ Could that C code be compiled using GCC to run on the propeller itself?
Looking at the .tar format at the moment...
TarTest - Archive [Date 2013.02.02 Time 12.50].zip
I think this is going to work nicely... You can see here how easy it is to list the file contents...
Couple interesting things:
1. 7-Zip creates .tar files with what appears to be the old format, and not the newer POSIX (ustar) style.
This is a little disappointing, but not a big deal.
2. The filename includes the directory name. This could be a minor problem, since there are only 100 bytes
for the filename in this old style (the extra 155-byte prefix field is a POSIX addition). But, not too bad.
3. The end of the file has 1,024 bytes of unused space.
Turns out that's by design: the tar format marks the end of the archive with two consecutive 512-byte blocks filled with zeros.
That's also why the file type shows up as 0 in that "header" - everything in those blocks is zero, which is how a reader tells it has hit the end...
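The end-of-archive check is tiny. A sketch in plain C (hypothetical helper name, not FSRW code):

```c
#include <stdint.h>

/* A tar archive ends with two consecutive 512-byte blocks of zeros -
   that's the 1,024 bytes of "unused space" at the end of the file. */
static int tar_is_end(const uint8_t *block1, const uint8_t *block2)
{
    for (int i = 0; i < 512; i++)
        if (block1[i] != 0 || block2[i] != 0)
            return 0;
    return 1;
}
```

In practice, a reader can stop at the first all-zero header block, since a real header always starts with a filename.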
The .tar file I was using was a tarball of all the Propeller library files... (including the demos).
Here it is: Library.zip
Just found this thread where the 7zip creator says he uses the old style of tar format:
http://sourceforge.net/p/sevenzip/discussion/45797/thread/1815e053
One last thought: For application to flash, it seems one could put many .tar files, one after another, onto flash and it would just appear as one big .tar file.
Only wrinkle is the extra space at the end of my test .tar file. Have to figure out what that is all about... If it's always 1024 bytes, it's not a big deal.
But, I did read that all you have to do is append a new file with the same filename as the first one.
The first one will then either be ignored or overwritten by the second.
Only downside is that you'd have to search the whole contents for a file to open, not just stopping at the first instance...
You could search backwards maybe, but that's not very good for flash...
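That last-match-wins rule is simple to code once the headers have been scanned into a small directory table. A sketch in plain C; the `TarEntry` layout and names are made up for illustration:

```c
#include <string.h>

/* One entry per header found while scanning the archive, in file order. */
typedef struct {
    const char *name;
    long        data_offset;
} TarEntry;

/* Appended updates mean the same name can appear more than once; the
   LAST occurrence is the current version, so scan all the way to the
   end instead of stopping at the first hit.  Returns -1 if not found. */
static long find_latest(const TarEntry *dir, int count, const char *name)
{
    long found = -1;
    for (int i = 0; i < count; i++)
        if (strcmp(dir[i].name, name) == 0)
            found = dir[i].data_offset;   /* keep updating: last one wins */
    return found;
}
```

Building the table once at startup avoids both the full forward search per open and the backwards search that flash dislikes.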
We don't really need compression, just a way to pack files onto an SD card in whatever manner is easiest and lowest-memory for the Propeller.
Jeff
This standard is just an extra feature. Standards-based approaches are usually best when there are no other constraints...
Why bother with a standard tar format?
Modify your tar header (the first x bytes of the file) to contain the file name, where it begins, and how long it is. Allow as many bytes as you like for the filenames, thus no 8.3 restrictions.
And, there's no overhead for finding the files, because the header can be only 8k or 16k, or whatever, plenty to fit into the RAM of the Prop. Then just seek to the file location and begin reading, with a counter to keep track of how large the file is (read from the header) and how many bytes you've read.
As mentioned before, this approach won't work well if you're constantly rewriting files as in an editor or something, but it's perfectly serviceable for a file system that won't change much.
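The header-table idea might look something like this in plain C. The field sizes (32-byte names, fixed 40-byte slots, so an 8 KB table holds about 200 files) are assumptions for illustration, not anything agreed in the thread:

```c
#include <string.h>
#include <stdint.h>

/* One fixed-size slot per file in the table at the front of the image.
   Field sizes are illustrative: 32 name bytes + start + length = 40
   bytes per slot, so an 8 KB header table holds roughly 200 files. */
typedef struct {
    char     name[32];     /* NUL-terminated, no 8.3 restriction  */
    uint32_t start;        /* byte offset of the file's data      */
    uint32_t length;       /* exact size, not rounded to a sector */
} FileSlot;

/* Look a file up in the table; an empty name marks the first unused
   slot, so the scan can stop there.  Returns NULL if not found. */
static const FileSlot *table_find(const FileSlot *table, int slots,
                                  const char *name)
{
    for (int i = 0; i < slots && table[i].name[0] != '\0'; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return 0;
}
```

Since the whole table fits in hub RAM, lookup is one in-memory scan followed by a single seek, with no per-cluster FAT chasing.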
The swag archive (anyone remember those) used something similar to this approach for unarchiving their code snippets. It kept the snippets in ascii format, so you could load them into your favorite editor, search for the one you wanted, then copy and paste into your own code, or you could extract individual ones, or all of them into separate files if so desired. I usually left them as one big file, because it made it waste less disk space, and it was equally easy to grab the segment I wanted when coding and borrowing code.
The only problem with this approach (well, there are many if you want constantly changing file lengths, but that has already been addressed) is that a couple of times I caught errors in the file size field, indicating the file was shorter than its actual size. So sometimes you'd extract an individual snippet, only to have it cut off short, because the size field was wrong. Easy enough to correct once you realized the problem, but still, only a minor irritation, especially since the whole file was there.

It would allow for (some) editing wiggle room, since you could easily claim a file was 1200 bytes when it was actually only 900 bytes, so you'd have 300 bytes between the end of that one and the start of the next to allow for expansion. Of course, this would have to be something coded into the programs (editors) using the files, but there's nothing preventing this practice from working. The file tools (if there were any) would simply treat the extra bytes as part of the file, and no harm done.

On the other hand, a carefully crafted entry could leave blank space between two files, and without an accounting program, you'd never know it. I.e., file 2 is 100 bytes, and it says it's 100 bytes long, but file 3 claims it begins at a position 20 bytes further along than where file 2 ends. There's nothing to tell file 2 it's wrong, or file 3 it's wrong either. So, you in effect get hidden areas in the file system, but those kinds of things would be easy to catch with a program that went through the file system, added up the bytes, and adjusted file starting locations as needed.
So, again, I see no need to stick to an existing standard. Sometimes I think about how much innovation the computer industry has lost out on, because folks stuck to existing standards instead of finding a new, innovative way to do things.
Then again, I can point to many places where that was done, and it didn't turn out well, but that's not the point (at least not here). Folks seem to stick with what works, instead of finding another method that may eventually prove superior.
Not saying that in this case, only saying that as a general rule.
There's arguments for both sides of course, but I still believe innovation truly comes when someone thinks it might work better this way, tries it, and proves themselves right.
So, add a header to the file, put the filenames, starting locations and file lengths in the header, then leave the rest to the file reading tools.
From my point of view you complain a bit too much about standards!
You forget about one little thing: Standards give you a framework to keep the costs for other innovations low. You don't have to reinvent the wheel for each new thing you want to develop. Just use a well known standard and pick an existing library and you can build something new on top, saving lots of time and money!
And every standards-conforming device can participate in your new device, just as the new device directly finds a big market for its distribution.
Imagine a harddrive manufacturer that does not work with fixed-size sectors of 512 bytes, but with variable-size sectors. Not a single PC/Mac ... would work with this kind of harddrive! A lot of effort is needed to convince all the companies that should use the new harddrive. Now imagine you have 10 harddrive manufacturers, each with its own way!
And the given problem is nearly the same! Rayman wants to use well-known tools to pack together a bunch of files to be used by the Propeller.
But I also agree with you when you say that we, in microcontroller development, should not stick too much to standards which have been defined in the PC world. FSRW_FEMTO, for example, needs quite a lot of longs just for reading the directory and files managed in clusters ...
In our uC-world it would be enough to have a filename/startsector/length table somewhere if varying files are really needed.
So, what I do is:
Have a huge SWAP-file which lives in FAT. Copying it as the first file to an empty SD makes sure that the whole file occupies consecutive sectors.
The SWAP-file contains a table as described above.
I have SPIN files to write data into the SWAP-file at the desired locations.
The main-program can find whatever it needs either via the file-table or it simply knows where to find stuff.
The advantage is that the SWAP-file can be copied with a PC - for example for bundling an installation package. On the Propeller side there is no FAT code needed (except the part that finds the start sector of the SWAP-file), which makes the code smaller and increases load speed (no popen, no clustered read ...)