Calling all retrocomputer experts
Dr_Acula
Posts: 5,484
I've got this idea for a ramdisk. I have the hardware and it is 1megabyte of sram and I have a rudimentary way of writing files to the disk, then searching for them and returning a pointer to the file location. Then retrieve in blocks. It all works very well especially when coupled with an SD drive. The SD drive loads files used frequently into the ram disk, and then the ram disk allows much faster access (16x faster than SD).
But - a disk operating system could do with a little more than this. It would be useful to be able to delete files and reclaim disk space, and also not have to worry about disk fragmentation, and be able to recycle space for different sized files.
There are a number of uses for a ram disk, including fast loading of fonts, fast transfer of bitmap files to a display, editing of large text files, synthesis of .wav files and editing of large music midi files. And it also opens up the way to moving text files in and out of CP/M easily which means all the text editors in CP/M could actually be useful on the propeller.
So this has led me down the path of exploring simple FAT disk operating systems. And I have always been intrigued with how CP/M managed to fit a file operating system into only a few k of assembly. So I have made a concerted effort to decode the CP/M file system.
First little thing - my 30 year old textbooks are wrong! The File Control Block (FCB) is actually 3 bytes less than the books say it is. And they fail to explain how the FCB really goes about storing the filename and pointers to the actual data.
But I think I have cracked the code and I'd like some input and particularly advice if I have got it wrong.
CP/M always was a dogs breakfast when it came to disk formats. Indeed, the specification was so vague that there ended up being multiple formats, all incompatible. Bits of CP/M live on inside DOS and Windows machines in the form of certain BIOS calls and 8.3 file formats and even inside FAT16, and there is a very interesting history as to why DOS beat CP/M. But I have my theory and it is to do with all those incompatible disk formats, which very much affected me in the 1980s when I spent a month writing an article for an electronics magazine which they then couldn't read off my floppy disks.
Fortunately, the Altair SIMH decided to greatly simplify the CP/M format with their 8mb hard disk images and these are relatively easy to decode. Here is the data with the file names and pointers to the location of the data
and in C code
Let's see if we can crack the code.
First, each file is represented by 32 bytes. Take the first two lines of those examples above.
32 bytes per directory entry. Byte 0 = zero, Byte 1 to 8 = filename, byte 9-11 = extension
Byte 12 = file extent if file is more than 16 k then repeat the filename and this is 0 for 0-16k, 1 for 16-32k etc
Byte 15 = Number of 128 byte records = 80H if all filled up, ie 128 records
byte 16 - 31, a series of 8 words, lsb first, pointing to the location in disk
if there are 16k per FCB and 8 of these words then ? 2k per each of these locations. Fits with 2k being the smallest file size
so this is the location of the next group of records (16 records in 2k)
So a file is broken up into 16k parts. If it is less than 16k, it gets one FCB entry. If it is more than 16k, then the FCB is repeated with the same name and there is a number, byte 12, indicating which 16k part.
Next we need some disk parameters. This was the bit that could change and caused all the incompatibility problems.
So the last line reserves 1 track for the system. A track is 32k, so the actual data starts at 8000H and we can see that in the second line of the hex code, there is a value 08 00. Actually, that is a bit of a false lead. It is Big Endian, the value is actually 0008H. The next entry is 0009H and the difference between each of those is 2k, and we know that each FCB can only access 16k of data, so I think there is an offset there. What seems to work is to take that hex value, subtract 0008H, multiply by 2k, then add to the start of the data which is 8000H and that gives the correct address in the disk where that block of 2k is stored. 2k is the minimum block size - a file with 1 byte in it takes 2k on the disk.
To delete a drive, wipe it all to E5 (or more particularly wipe the file entry table to E5). To remove a directory entry, wipe all the FCB entries that match the filename. To find free blocks of 2k data, search through all the file entries and find the lowest free value excluding zero. To add a file, find the next free FCB group of 32 bytes and add the name. Read and write records in groups of 128 bytes. Each function is only a few lines of code.
I think this is all correct. I'm confused about why the file entries don't start at address zero. On an 8mb drive they start at 6000H. It may be to do with the number of files allowed and this is subtracted off the track size of 32k.
I'm hoping to code all this with some fairly compact Spin code. This would then interface with hardware using pasm code and the hardware part could be changed by just changing the pasm. I want to use this in a very practical way for the touchscreen boards (soon to be dual touchscreen) and the GUI and GUI Apps that have already been written (picture viewer, movie, calculator, synthesizer, touchscreen keyboard). All of these at most use 3 cogs so ideally it would be possible to pull in the 8080 or Z80 emulators in a cog or two and run CP/M as needed. Though in a slightly ironic twist, by combining the touchscreen keyboard with a large textfile in ramdisk, that pretty much equals a word processor and so Wordstar might not be needed.
I'd appreciate some feedback from anyone who used these ancient computer systems. Have I decoded the file system correctly?
But - a disk operating system could do with a little more than this. It would be useful to be able to delete files and reclaim disk space, and also not have to worry about disk fragmentation, and be able to recycle space for different sized files.
There are a number of uses for a ram disk, including fast loading of fonts, fast transfer of bitmap files to a display, editing of large text files, synthesis of .wav files and editing of large music midi files. And it also opens up the way to moving text files in and out of CP/M easily which means all the text editors in CP/M could actually be useful on the propeller.
So this has led me down the path of exploring simple FAT disk operating systems. And I have always been intrigued with how CP/M managed to fit a file operating system into only a few k of assembly. So I have made a concerted effort to decode the CP/M file system.
First little thing - my 30 year old textbooks are wrong! The File Control Block (FCB) is actually 3 bytes less than the books say it is. And they fail to explain how the FCB really goes about storing the filename and pointers to the actual data.
But I think I have cracked the code and I'd like some input and particularly advice if I have got it wrong.
CP/M always was a dogs breakfast when it came to disk formats. Indeed, the specification was so vague that there ended up being multiple formats, all incompatible. Bits of CP/M live on inside DOS and Windows machines in the form of certain BIOS calls and 8.3 file formats and even inside FAT16, and there is a very interesting history as to why DOS beat CP/M. But I have my theory and it is to do with all those incompatible disk formats, which very much affected me in the 1980s when I spent a month writing an article for an electronics magazine which they then couldn't read off my floppy disks.
Fortunately, the Altair SIMH decided to greatly simplify the CP/M format with their 8mb hard disk images and these are relatively easy to decode. Here is the data with the file names and pointers to the location of the data
00 42 41 53 49 43 4C 49 42 52 45 4C 01 00 00 80 08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00 00 42 41 53 49 43 4C 49 42 52 45 4C 02 00 00 44 10 00 11 00 12 00 00 00 00 00 00 00 00 00 00 00 00 42 44 4F 53 43 41 4C 4C 54 58 54 00 00 00 4C 13 00 14 00 15 00 00 00 00 00 00 00 00 00 00 00 00 45 4F 46 20 20 20 20 20 54 58 54 00 00 00 34 16 00 17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 45 4C 50 20 20 20 20 43 4F 4D 00 00 00 1D 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 49 4E 54 52 4F 20 20 20 54 58 54 01 00 00 0D 19 00 1A 00 1B 00 1C 00 1D 00 00 00 00 00 00 00 00 4C 49 42 52 41 52 59 20 42 41 53 00 00 00 26 1E 00 1F 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4D 45 4D 20 20 20 20 20 42 41 53 00 00 00 09 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4F 56 45 52 4C 41 59 42 43 4F 4D 00 00 00 36
and in C code
'\0', 'B', 'A', 'S', 'I', 'C', 'L', 'I', 'B', 'R', 'E', 'L', '\x01', '\0', '\0', '\x80', '\b', '\0', '\t', '\0', '\n', '\0', '\v', '\0', '\f', '\0', '\r', '\0', '\x0E', '\0', '\x0F', '\0', '\0', 'B', 'A', 'S', 'I', 'C', 'L', 'I', 'B', 'R', 'E', 'L', '\x02', '\0', '\0', 'D', '\x10', '\0', '\x11', '\0', '\x12', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'B', 'D', 'O', 'S', 'C', 'A', 'L', 'L', 'T', 'X', 'T', '\0', '\0', '\0', 'L', '\x13', '\0', '\x14', '\0', '\x15', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'E', 'O', 'F', ' ', ' ', ' ', ' ', ' ', 'T', 'X', 'T', '\0', '\0', '\0', '4', '\x16', '\0', '\x17', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'H', 'E', 'L', 'P', ' ', ' ', ' ', ' ', 'C', 'O', 'M', '\0', '\0', '\0', '\x1D', '\x18', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'I', 'N', 'T', 'R', 'O', ' ', ' ', ' ', 'T', 'X', 'T', '\x01', '\0', '\0', '\r', '\x19', '\0', '\x1A', '\0', '\x1B', '\0', '\x1C', '\0', '\x1D', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'L', 'I', 'B', 'R', 'A', 'R', 'Y', ' ', 'B', 'A', 'S', '\0', '\0', '\0', '&', '\x1E', '\0', '\x1F', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'M', 'E', 'M', ' ', ' ', ' ', ' ', ' ', 'B', 'A', 'S', '\0', '\0', '\0', '\t',
Let's see if we can crack the code.
First, each file is represented by 32 bytes. Take the first two lines of those examples above.
32 bytes per directory entry. Byte 0 = zero, Byte 1 to 8 = filename, byte 9-11 = extension
Byte 12 = file extent if file is more than 16 k then repeat the filename and this is 0 for 0-16k, 1 for 16-32k etc
Byte 15 = Number of 128 byte records = 80H if all filled up, ie 128 records
byte 16 - 31, a series of 8 words, lsb first, pointing to the location in disk
if there are 16k per FCB and 8 of these words then ? 2k per each of these locations. Fits with 2k being the smallest file size
so this is the location of the next group of records (16 records in 2k)
So a file is broken up into 16k parts. If it is less than 16k, it gets one FCB entry. If it is more than 16k, then the FCB is repeated with the same name and there is a number, byte 12, indicating which 16k part.
Next we need some disk parameters. This was the bit that could change and caused all the incompatibility problems.
DPBLK3: ; DISK PARAMETER BLOCK (IDE HARD DISK 8MB) SPT_3: .DW 256 ; 256 SECTORS OF 128 BYTES PER 32K TRACK BSH_3: .DB 5 ; BLOCK SHIFT FACTOR (SIZE OF ALLOCATION BLOCK) BLM_3: .DB 31 ; PART OF THE ALLOCATION BLOCK SIZE MATH EXM_3: .DB 1 ; DEFINES SIZE OF EXTENT (DIRECTORY INFO) DSM_3: .DW 2017 ; BLOCKSIZE [4096] * NUMBER OF BLOCKS + 1 = DRIVE SIZE ; HD PARTITION 3 IS 16128 SECTORS LONG ; AT 512 BYTES EACH WHICH IS ; 2016 BLOCKS AT 4096 BYTES A PIECE. DRM_3: .DW 511 ; NUMBER OF DIRECTORY ENTRIES AL0_3: .DB 11110000 ; BIT MAP OF SPACE ALLOCATED TO DIRECTORY AL1_3: .DB 00000000 ; DIRECTORY CAN HAVE UP TO 16 BLOCKS ALLOCATED CKS_3: .DW 0 ; SIZE OF DIRECTORY CHECK [0 IF NON REMOVEABLE] OFF_3: .DW 1 ; 1 TRACK (32K) RESERVED FOR SYSTEM
So the last line reserves 1 track for the system. A track is 32k, so the actual data starts at 8000H and we can see that in the second line of the hex code, there is a value 08 00. Actually, that is a bit of a false lead. It is Big Endian, the value is actually 0008H. The next entry is 0009H and the difference between each of those is 2k, and we know that each FCB can only access 16k of data, so I think there is an offset there. What seems to work is to take that hex value, subtract 0008H, multiply by 2k, then add to the start of the data which is 8000H and that gives the correct address in the disk where that block of 2k is stored. 2k is the minimum block size - a file with 1 byte in it takes 2k on the disk.
To delete a drive, wipe it all to E5 (or more particularly wipe the file entry table to E5). To remove a directory entry, wipe all the FCB entries that match the filename. To find free blocks of 2k data, search through all the file entries and find the lowest free value excluding zero. To add a file, find the next free FCB group of 32 bytes and add the name. Read and write records in groups of 128 bytes. Each function is only a few lines of code.
I think this is all correct. I'm confused about why the file entries don't start at address zero. On an 8mb drive they start at 6000H. It may be to do with the number of files allowed and this is subtracted off the track size of 32k.
I'm hoping to code all this with some fairly compact Spin code. This would then interface with hardware using pasm code and the hardware part could be changed by just changing the pasm. I want to use this in a very practical way for the touchscreen boards (soon to be dual touchscreen) and the GUI and GUI Apps that have already been written (picture viewer, movie, calculator, synthesizer, touchscreen keyboard). All of these at most use 3 cogs so ideally it would be possible to pull in the 8080 or Z80 emulators in a cog or two and run CP/M as needed. Though in a slightly ironic twist, by combining the touchscreen keyboard with a large textfile in ramdisk, that pretty much equals a word processor and so Wordstar might not be needed.
I'd appreciate some feedback from anyone who used these ancient computer systems. Have I decoded the file system correctly?
Comments
Feedback from a minimalist perspective -
The more overhead you build in, the less benefit from speed increase. Some guy said "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
Consider limiting yourself to the "minimum necessary and sufficient to get the job done". All the stuff you are looking at is potentially useful, but might be better placed above the driver for the ram disk. Can you place the ram disk support UNDER a file system driver?
If the ram disk is not going to be used for ast loading of fonts, fast transfer of bitmap files to a display, editing of large text files, synthesis of .wav files and editing of large music midi files, then support for this becomes wasted space and effort. Support should only impact the applications that need it.
$0.02
In this way one can swap to different media by changing only the low level block driver. Or one can implement different file systems on the same media.
So, why not create your RAM disk by making a low level block read/write driver for your ram and then use that from an existing FAT file system object?
The trick here is getting one instance of that higher level fs driver to understand multiple devices, SD and ram disk at least, at the same time.
You could set the 1MB sram up as a hdisk just like the 8MB version but smaller, and a batch file to initialise it each time.
If you want more info email me and I can send you the detail.
I wonder about what would be involved in using the fat16/32 system in cpm?