File transfers between FAT16/32 files and CPM2.2
Cluso99
Background...
We have ZiCog that emulates a Z80/8080 and runs CPM2.2. This has lots of good programs and languages such as MBasic, TurboPascal, C, etc, etc. There are editors such as WordStar, spreadsheets such as SuperCalc, etc, etc.
Each CPM Disk is a single contiguous file stored under FAT16/32.
We have various incarnations of a Prop OS such as KyeDos, Sphinx, PropCmd, etc. Nothing that is really complete yet, but we are getting there.
So how can we move forward?
I now have the HALT program in CPM rebooting the Prop, so we can get back to a Prop OS. To use decent text editors, we need to use CPM for now anyway. So we require a mechanism to move text files back and forth between CPM and the Prop OS.
I have started work on this and come up with some questions. I can and will find the answers in the file formats by trial and error. But before I do, does anyone know if EOF on FAT16/32 files still uses Ctl-Z? This is how CPM handles it. And what about CR/LF differences between the two?
Currently I am locating the filename in the CPM disk and will be there shortly to access the file. All this is being done in spin from the Prop OS side.
Comments
Is it possible to decode the file handling format CP/M uses and write a decoder in spin?
Another way I have pondered is dumping a file out to external memory, then reading it from Spin.
Or even a slightly crazy idea (which is possible on the propeller due to things running in parallel but not so easy on other micros) where spin keeps running, CP/M is running, and you set up an xmodem or PIP transfer between the two.
All things being equal, what you are trying to do is not at all crazy. I'm in the middle of playing around with fonts and text boxes, and the code is not easy (a text box is a mini word processor) and it would still be easier to use Wordstar than to write a full featured text editor for the prop.
If you can crack this one Cluso you would be a legend!
Anyway, I will know soon enough.
Then I really have to get back to a Prop OS for all of us. And I would like to see how this goes with Catalina too. So I will be asking you, Drac, about your loading and unloading of spin and pasm objects. Don't expect me to get there for a couple of weeks though.
For random access files, MBASIC can open a file, read or write a random record and close the file.
The file header contains useful things http://members.iinet.net.au/~daveb/cpm/fcb.html
Not sure where the "number of records" is stored. If you know that, then you don't need to worry about ^Z. In MBASIC you can read a file until EOF.
I'm pondering a simple MBASIC program (or it could be assembly) that opens a file, reads the records and then sends them somewhere, or receives them from somewhere. Create a virtual port between spin and CP/M.
Re loading and unloading spin and pasm objects, the secret to that is the basic structure of your memory driver for the dracblade. Pass all data to and from pasm via a few longs. Your driver used four longs - command, source, destination, length. You can do almost anything with that. Then you can uncouple the spin and pasm parts of an object. The key really is to avoid interacting with the cog via bits of hub that have been declared as a result of the way the spin compiler works. In other words, don't declare "myvariable" somewhere in spin, and then use "myvariable" in a cog. Instead, add a simple command and transfer the location of "myvariable" that way.
Then you can compile the pasm part independently, save it as a binary file on the sd card and load it in as needed.
I can now find an entry in the DIR table and from that calculate the block allocations and where they are on the disk, and locate those data entry sectors. As there is no byte count on the file, only sector counts, the balance of a block is filled with Ctl-Z ($1A). I just have to get the bit running where I calculate the actual length (I am reading for a fixed length now), and handle the case where the file runs over into a second directory entry because it runs out of 16 extents/blocks (each block is 4K and the directory entry allows up to 16 blocks per directory).
Then I have to write that out to a Prop os FAT file.
For the other direction, also from prop os, I have to work out where cpm holds its block usage table. I haven't found any reference to that. Otherwise I am pretty sure I will have that working soon. I have to leave off for most of next week for something else.
Once I have text working (that includes .PRN and .BAS files) then I will have to look at binaries like .COM files.
Hence the idea of control-z marking the end of text files and the rest of the block being filled with control-z.
For binary files you only have file size to the block size resolution. The actual amount of data used being known by whatever software reads/writes that binary file format and hence ignoring the unused portion of the last block.
Block size and sector size may or may not be the same thing. CP/M file storage went through a number of revisions as disc sizes increased, with increasingly complicated hacks to cope with the larger disc sizes. Last time I looked (admittedly a couple of decades ago) all the info needed was stored on the disc if you knew how to interpret it.
In any case, though, there is no EOF marker for binary files. If you write a binary file, its length is the number of blocks it takes and there is no way to specify that it occupies only part of the final block. Individual programs can adopt their own conventions, but they do so without support from the file system.
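As a rough illustration of the text-file convention described above, here is a minimal Python sketch that drops the Ctrl-Z padding from a text file copied out of a CP/M image. It only looks in the final 128-byte record, on the assumption that a Ctrl-Z anywhere earlier is data rather than padding; the function name is made up for this example and is not from CPMXFER. It must not be applied to binaries, for the reasons given above.

```python
CTRL_Z = 0x1A        # CP/M end-of-text marker and padding byte
RECORD_SIZE = 128    # CP/M record (sector) size in bytes


def trim_text_padding(data: bytes) -> bytes:
    """Return data cut at the first Ctrl-Z found in its final record.

    Hypothetical helper: assumes `data` is a text file whose length is
    whatever the directory reported (normally a multiple of 128).
    """
    if not data:
        return data
    # Start offset of the last record that contains any data.
    last_start = ((len(data) - 1) // RECORD_SIZE) * RECORD_SIZE
    cut = data.find(CTRL_Z, last_start)
    # No Ctrl-Z means the text ended exactly on a record boundary.
    return data if cut < 0 else data[:cut]


padded = b"A" * 130 + bytes([CTRL_Z]) * 126   # two records, 130 real bytes
trimmed = trim_text_padding(padded)
```

A file that ends exactly on a 128-byte boundary has no Ctrl-Z at all, so the function returns it unchanged.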
Hope that helps, even if only a bit...
While I know the FCB is embedded in the cpm boot sectors, I cannot determine where, or if it is truly fixed. Our implementation of ZiCog uses 6x 8MB harddisks and 2x 1MB. So I am going to hardcode the fcb values to support only the 8MB disks because my program does not run under cpm, but under prop os.
Worst case scenario for writing *to* the disk, you could always start with a completely blank disk filled with E5, copy the file as a contiguous block and then in CP/M do a PIP from that drive to another drive. Might make decoding things a little simpler.
I'm off to see if I can learn more about the FCB format...
http://www.seasip.demon.co.uk/Cpm/format22.html
Postedit:
There is no FAT (file allocation table). CPM2.2 builds this on boot from the DIR entries.
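Since the allocation table only exists in memory, a writer working from outside CP/M would have to rebuild it the same way. The following Python sketch shows one way that could look. It assumes 32-byte directory entries with eight 16-bit little-endian block pointers in bytes 16-31, E5 in the first byte for an unused entry, and a pointer of 0 for an unused slot; the function name and parameters are hypothetical.

```python
import struct

DIR_ENTRY_SIZE = 32   # bytes per directory entry
UNUSED = 0xE5         # first byte of an empty directory entry


def build_allocation_map(directory: bytes, total_blocks: int, dir_blocks: int):
    """Return a list of booleans, True where a block is in use.

    Assumption: the directory itself occupies the first `dir_blocks` blocks.
    """
    used = [False] * total_blocks
    for block in range(dir_blocks):
        used[block] = True
    last = len(directory) - DIR_ENTRY_SIZE
    for offset in range(0, last + 1, DIR_ENTRY_SIZE):
        entry = directory[offset:offset + DIR_ENTRY_SIZE]
        if entry[0] == UNUSED:
            continue
        # Eight 16-bit block numbers, least significant byte first.
        for block in struct.unpack("<8H", entry[16:32]):
            if 0 < block < total_blocks:
                used[block] = True
    return used


# One live entry owning blocks 8 and 9, followed by one empty entry.
live = (bytes([0]) + b"LOREM1  TXT" + bytes([0x00, 0x00, 0x00, 0x40])
        + struct.pack("<8H", 8, 9, 0, 0, 0, 0, 0, 0))
amap = build_allocation_map(live + bytes([UNUSED]) * 32, 2048, 8)
```

For an 8MB disk with 4K blocks there would be 2048 blocks in total, which is why the pointers need two bytes each.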
I'm reading the disk image in a hex editor. The text is there starting at address E000H. It is indeed padded at the end with ^Z characters.
And there is an entry with the filename (the DPB) at address 6000H.
Perhaps you could look at all the DIR entries on your DRAC_A.DSK (8MB version in zicog). They begin at sector 192 (in 128 byte sectors) and are each 32B long and continue to sector 447. I was looking at these entries... Sector 946 should be 193 and 947 should be 194 within cpm.
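For anyone wanting to follow along outside of spin, here is a small Python sketch of decoding one of those 32-byte DIR entries. The field layout (user byte, 8+3 filename, EX, S1, S2, RC, then eight 16-bit block pointers) is my reading of the CP/M 2.2 format for a disk with more than 255 blocks; the class and function names are invented for the example.

```python
import struct
from typing import NamedTuple, Tuple

DIR_ENTRY_SIZE = 32
DIR_START = 192 * 128   # directory begins at 128-byte sector 192 = 6000H


class DirEntry(NamedTuple):
    user: int                 # E5 hex marks an unused entry
    name: str
    ext: str
    ex: int                   # extent number, low bits
    s1: int
    s2: int                   # extent number, high bits
    rc: int                   # record count for this entry
    blocks: Tuple[int, ...]   # eight block numbers, 0 = unused slot


def parse_dir_entry(raw: bytes) -> DirEntry:
    if len(raw) != DIR_ENTRY_SIZE:
        raise ValueError("a directory entry is exactly 32 bytes")
    # The top bit of filename bytes can carry attribute flags, so mask it.
    name = bytes(b & 0x7F for b in raw[1:9]).decode("ascii").rstrip()
    ext = bytes(b & 0x7F for b in raw[9:12]).decode("ascii").rstrip()
    blocks = struct.unpack("<8H", raw[16:32])
    return DirEntry(raw[0], name, ext, raw[12], raw[13], raw[14], raw[15], blocks)


# Hand-built sample entry, not copied from a real disk image.
raw = (bytes([0]) + b"LOREM1  TXT" + bytes([0x05, 0x00, 0x00, 0x67])
       + struct.pack("<8H", *range(0x18, 0x20)))
entry = parse_dir_entry(raw)
```

To walk the whole directory you would read 32 bytes at a time starting at `DIR_START` and skip any entry whose first byte is E5.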
Note the ex byte after the ... (where I removed the filename). They are not sequential. The file is BDOS.PRN
Unfortunately I don't know my way around cpm very well, so not sure of what tools are available to see what I am interested in. I can decode the remaining bytes of the DIR entry correctly.
This would require implementing the SIMH I/O port to do but would then work for any CP/M disk format.
Or does this really have to work from the other OS?
@Cluso, I have a 94k text file. The text is contiguous at E000H
The entry table is at 6000H and it has created three "extents", each one 32 bytes. This is 128 bytes (the E5 in the last row is blank and it goes on with more E5s so this is it). I have a feeling there are some numbers in those extents that say it is a larger file. Everything is hidden in this hex code - just need to decode it. The filename is LOREM1.TXT and you see that come up three times at the beginning of the text block, so this is a way of linking each extent together with the same filename.
I think this page explains the bytes http://linux.die.net/man/5/cpm You can see the block pointers incrementing. The extent number is 01 then 03 then 05 (13th byte on each row) - not sure why it increments by two. Just have to pull it all together...
@Drac: Thanks. I have just figured out that I was misreading how it calculates the extent number. Turns out it has a shift left because we have defined the disk blocks to be 4K and 8 blocks fit in a dir entry. So now I know why I am only seeing every 2nd extent number. I glossed over the part where I had to do a calc.
If you look at your dump you will see that after the filename (last $54) there is a byte called EX. That starts as $01, the next extent (next entry) is $03 and then $05. I was originally expecting to see $01, $02, $03 etc here. Now, back to the first entry: EX is 5 bits only (so 0-31), the next byte is S1 and is always $00, followed by S2, which holds bits 5-12 for the upper part of EX (making 13 bits total). The next byte is RC and contains the record count (in 128-byte records) for this entry. If it is $80, the entry is full and there may be another dir entry.
So from your example you can see 80, 80, 67. From the formula I have to add 128 to $67 (=103), giving 231 128-byte sectors, which means 7x 4K blocks (7x32=224) plus 7 more 128-byte sectors (224+7=231). The blocks (allocations) are pairs of bytes that follow, lsb first. So you can see blocks 0018, 0019, 001A, 001B, 001C, 001D, 001E, 001F. So I only use 7 of the 32 sectors of the last 4K block, $001F.
Now the maths works. Thanks to your dump I was able to see exactly how to do this. Now to code it!
One little test - could you work out from those extent entries how big the file is?
(The i.dsk has data from E000H to 2537FH = 17380H bytes, decimal 95104.)
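As a worked check of that test, here is a short Python sketch that derives the size from the three entries. It takes the logical extent number as S2*32 + EX, with each logical extent holding 128 records, and uses the highest extent plus its RC as the length in records. S2 = 0 for all three entries, block 0 sitting at the directory start (6000H), and the file starting at block 8 are assumptions on my part that fit the addresses quoted in this thread; the function names are made up.

```python
RECORD_SIZE = 128          # bytes per CP/M record
RECORDS_PER_EXTENT = 128   # one logical extent = 16 KB
BLOCK_SIZE = 4096          # 4K blocks on the 8MB disks
DIR_OFFSET = 0x6000        # assumed position of block 0 in the image


def file_records(entries):
    """Length in records from (ex, s2, rc) tuples, one per dir entry.

    Assumes the file has no holes, so the highest extent decides the size.
    """
    return max((s2 * 32 + ex) * RECORDS_PER_EXTENT + rc
               for ex, s2, rc in entries)


def block_offset(block: int) -> int:
    """Byte offset of a block within the disk image (assumed layout)."""
    return DIR_OFFSET + block * BLOCK_SIZE


# EX and RC from the dump: 01/80, 03/80, 05/67. S2 = 0 is assumed.
entries = [(0x01, 0, 0x80), (0x03, 0, 0x80), (0x05, 0, 0x67)]
records = file_records(entries)
size = records * RECORD_SIZE
first_byte = block_offset(8)           # assumed first data block
last_byte = first_byte + size - 1
```

With these inputs `records` comes out as 743 and `size` as 95104 (17380H), and the data runs from E000H to 2537FH, which agrees with the hex editor view.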
* The filenames are fixed (and the filename of the CPM diskfile). Change in the CON block!
* Only text format files (.TXT, .PRN, .BAS)
* The output filename is "CPMXFER.TXT" and can be opened under windows (plug SD card into pc) using WordPad. Didn't try Notepad.
* If the output filename exists, it will be deleted first.
* Currently only handles CPM --> FAT16/32 (using Kye's Driver)
* Currently only handles files <=32KB (1 dir entry in CPM)
* No date/time inserted
* Bug if the program finds an early Ctl-Z (only looks in the last record of each block) and also needs to scan from the end backwards
Please let me know if you find a problem.
CPMXFER_002a - Archive [Date 2012.01.25 Time 11.23].zip
Is there not a way of determining the size of a file from the extent, rather than looking for a ^Z ?
Determining size... No. The dir only holds the number of sectors (128 bytes) used in each segment. So the size it holds is rounded up to the next multiple of 128 bytes, unless the file finishes exactly on a 128 byte boundary, in which case there is no Ctl-Z stored. The cpm writeups acknowledge that the length of binaries must be determined by some other method. Unfortunately, there is no way to tell if a file is text or binary, hence the old problem of some files being truncated.
Maybe tonight I may get to doing the multiple dir entries, allowing larger files.
I have a feeling that won't matter for binaries.
If you are working in assembly or C or Mbasic and open a random access file, you read or write to that file in 128 byte records. So if your file is 5 bytes, it saves 128 bytes.
If you do an xmodem transfer, and the boundary does not fall exactly on the 128 byte boundary xmodem pads with ^Z. But it could pad with anything really as this ^Z is not used subsequently as a marker.
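That rounding-up can be sketched in a few lines of Python. The helper name is invented, and ^Z as the fill byte is just the convention mentioned above, not something the file system requires.

```python
def pad_to_record(data: bytes, record_size: int = 128, fill: int = 0x1A) -> bytes:
    """Pad data up to the next multiple of record_size with the fill byte."""
    remainder = len(data) % record_size
    if remainder == 0:
        return data   # already on a record boundary, nothing to add
    return data + bytes([fill]) * (record_size - remainder)


once = pad_to_record(b"hello")
twice = pad_to_record(once)
```

Padding an already padded file changes nothing, so the size only grows on the first pass.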
I once tried sending the same file (MBASIC.COM) back and forth with xmodem between a PC and CP/M, and yes, on the first transfer the file size grew so it was rounded up, but after that the file size did not change.
So overall, I think you can determine the file size by looking at the extent data and the ^Z does not matter. I have a feeling there are some file read and write processes that can even leave garbage at the end. It wouldn't matter in a .COM program for instance as the code would never get there to run this part.
The maximum file size supported is 1MB and I do check for this and report an error. For now, it is slow as I am transferring a byte at a time due to the printout capability. That will change as soon as I am sure everything is ok.
CPMXFER_003 - Archive [Date 2012.01.25 Time 20.34].zip
Outputs all file types (text and binaries) CPM-->FAT16/32
No test is done for Ctl-Z and the final record is not truncated.
Max file size 1MB.
Faster transfer (write 128 byte sectors)
Filenames still hardcoded; one direction only; no date/time stamp.
CPMXFER_004 - Archive [Date 2012.01.26 Time 08.13].zip
Re the hardcoding of filenames, I'm using this technique
then using kye's string object
and then open the file with
and in Kye's string driver, stringcopy is
Date and time stamps may not matter since CP/M 2.2 doesn't have them anyway?
Thanks for the input ideas. I just hardcode because then I don't have to enter anything - it just runs for each test so it's quicker for me to debug.
Now to get the other direction working.