File transfers between FAT16/32 files and CPM2.2 — Parallax Forums

Cluso99 Posts: 18,069
edited 2012-01-25 18:05 in Propeller 1
Background...

We have ZiCog that emulates a Z80/8080 and runs CPM2.2. This has lots of good programs and languages such as MBasic, TurboPascal, C, etc, etc. There are editors such as WordStar, spreadsheets such as SuperCalc, etc, etc.

Each CPM Disk is a single contiguous file stored under FAT16/32.

We have various incarnations of a Prop OS such as KyeDos, Sphinx, PropCmd, etc. Nothing that is really complete yet, but we are getting there.

So how can we move forward?

I now have the HALT program in CPM rebooting the prop, so we can get back to a Prop OS. In order to use decent text editors, we need to use CPM for now anyway. So we require a mechanism to move text files back and forth between CPM and the Prop OS.

I have started work on this and came up with some questions. I can and will find the answers in the file formats by trial and error. But before I do, does anyone know if the EOF on FAT16/32 files still uses Ctl-Z? This is how CPM handles it. And what about CR/LF line endings between the two?

Currently I am locating the filename in the CPM disk and should shortly be able to access the file. All this is being done in Spin from the Prop OS side.

Comments

  • Dr_Acula Posts: 5,484
    edited 2012-01-21 17:14
    I'm not sure Ctrl-Z ever took off. One problem that I recall from 'back in the day' is that if you use Ctrl-Z and you transfer a binary file, sooner or later a Ctrl-Z comes up by chance.

    Is it possible to decode the file handling format CP/M uses and write a decoder in spin?

    Another way I have pondered is dumping a file out to external memory, then reading it from Spin.

    Or even a slightly crazy idea (which is possible on the propeller due to things running in parallel but not so easy on other micros) where spin keeps running, CP/M is running, and you set up an xmodem or PIP transfer between the two.

    All things being equal, what you are trying to do is not at all crazy. I'm in the middle of playing around with fonts and text boxes, and the code is not easy (a text box is a mini word processor), and it would still be easier to use WordStar than to write a full-featured text editor for the prop.

    If you can crack this one, Cluso, you would be a legend!
  • Cluso99 Posts: 18,069
    edited 2012-01-21 20:12
    According to all documents, CPM does use Ctl-Z to terminate text files. Not sure what happens with binaries though, as Ctl-Z can appear in the file. I know that if you inadvertently put a Ctl-Z in a CPM text file it truncates the file. I just am not sure Ctl-Z is used in MSDOS files - I think the file length is the governing factor. And I may need to handle CR/LF differently.

    Anyway, I will know soon enough.

    Then I really have to get back to a Prop OS for all of us. And I would like to see how this goes with Catalina too. So I will be asking you, Drac, about your loading and unloading of Spin and PASM objects. Don't expect me to get there for a couple of weeks though.
  • Dr_Acula Posts: 5,484
    edited 2012-01-21 20:34
    I never realised it at the time, but MBASIC is fairly close to the metal in terms of how files work.

    For random access files, MBASIC can open a file, read or write a random record and close the file.

    The file header contains useful things http://members.iinet.net.au/~daveb/cpm/fcb.html

    Not sure where the "number of records" is stored. If you know that, then you don't need to worry about ^Z. In MBASIC you can read a file until EOF.

    I'm pondering a simple MBASIC program (or it could be assembly) that opens a file, reads the records and then sends them somewhere, or receives them from somewhere. Create a virtual port between spin and CP/M.

    Re loading and unloading spin and pasm objects, the secret to that is the basic structure of your memory driver for the dracblade. Pass all data to and from pasm via a few longs. Your driver used four longs - command, source, destination, length. You can do almost anything with that. Then you can uncouple the spin and pasm parts from an object. The key really is to avoid interacting with the cog via bits of hub that have been declared as a result of the way the spin compiler works. In other words, don't declare "myvariable" somewhere in spin, and then use "myvariable" in a cog. Instead, add a simple command and transfer the location of "myvariable" that way.

    Then you can compile the pasm part independently, then save it as a binary file on the sd card and load it in as needed.
  • Cluso99 Posts: 18,069
    edited 2012-01-22 03:25
    Drac: Here is what happens. The DPB (disk parameter block) controls quite a bit of the disk layout, such as where the directory entries start, block sizes, etc. This is not stored anywhere accessible in CPM except in the CPM code itself. It is embedded somewhere in the boot code, which could vary if recompiled. So I am hardcoding that part, because I am running under a Prop OS of a kind.

    I can now find an entry in the DIR table and from that calculate the block allocations and where they are on the disk, and locate those data entry sectors. As there is no count on the file, other than sector counts, the balance of a block is filled with Ctl-Z ($1A). I just have to get two bits running: calculating the actual length (I am reading for a fixed length now), and handling a file that runs over into a second directory entry because it runs out of the 16 extents/blocks (each block is 4K and the directory entry allows up to 16 blocks per entry).

    Then I have to write that out to a Prop OS FAT file.

    For the other direction, also from the Prop OS, I have to work out where CPM holds its block usage table. I haven't found any reference to that. Otherwise I am pretty sure I will have that working soon. I have to leave off for most of next week for something else.

    Once I have text working (that includes .PRN and .BAS files) then I will have to look at binaries like .COM files.
  • Heater. Posts: 21,230
    edited 2012-01-22 03:44
    As far as I recall CP/M has no idea about actual file sizes, only how many whole blocks a file takes. So in the directory listings you only see multiples of 4K or whatever.
    Hence the idea of control-Z marking the end of text files and the rest of the block being filled with control-Z.
    For binary files you only have file size to the block size resolution. The actual amount of data used is known by whatever software reads/writes that binary file format, which hence ignores the unused portion of the last block.
  • Cluso99 Posts: 18,069
    edited 2012-01-22 06:05
    Actually Heater, the resolution is down to sector size, being 128 bytes for CPM2.2. Thanks for the binary info. I guess for transfers of binaries to FAT I will just have to set the size to a multiple of the 128-byte sector size. For text files I can detect the Ctl-Z ending and therefore calculate the size correctly for FAT.
  • johnfos Posts: 59
    edited 2012-01-22 14:59
    On the control-Z issue, my understanding is that a text file will end at the first c-Z character *unless* the file size happens to be a multiple of the block size, in which case there is no terminating c-Z (because to add one would require an extra and unnecessary block to be added to the file).

    Block size and sector size may or may not be the same thing. CP/M file storage went through a number of revisions as disc sizes increased, with increasingly complicated hacks to cope with the larger disc sizes. Last time I looked (admittedly a couple of decades ago) all the info needed was stored on the disc if you knew how to interpret it.

    In any case, though, there is no EOF marker for binary files. If you write a binary file, its length is the number of blocks it takes and there is no way to specify that it occupies only part of the final block. Individual programs can adopt their own conventions, but they do so without support from the file system.

    Hope that helps, even if only a bit...
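
A minimal Python sketch of the text-file rule described above, assuming the file content has already been read out of the disk image as whole 128-byte records. The function name `cpm_text_length` is hypothetical and is not taken from any program in this thread. It treats the first Ctrl-Z ($1A) as the end of the text and falls back to the full length when the text fills its last record exactly.

```python
CTRL_Z = 0x1A      # CP/M end-of-text marker
RECORD_SIZE = 128  # CP/M 2.2 record (sector) size in bytes


def cpm_text_length(data: bytes) -> int:
    """Return the text length: up to the first Ctrl-Z, or everything if none."""
    end = data.find(bytes([CTRL_Z]))
    return len(data) if end == -1 else end


# A 5-byte text padded with Ctrl-Z to fill one record.
padded = b"HELLO" + bytes([CTRL_Z]) * (RECORD_SIZE - 5)
# A text that fills the record exactly, so no Ctrl-Z is present.
exact = b"A" * RECORD_SIZE

print(cpm_text_length(padded))  # 5
print(cpm_text_length(exact))   # 128
```

This only makes sense for text files. Applied to a binary, it would cut the file at the first byte that happens to be $1A. CP/M text files use CR LF line endings, as DOS does, so the sketch does no line-ending conversion.
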
  • Cluso99 Posts: 18,069
    edited 2012-01-22 16:44
    johnfos: Yes, I have seen a file of an exact block size where the Ctl-Z does not appear: "MOVER.PRN".
    While I know the DPB is embedded in the CPM boot sectors, I cannot determine where or if it is truly fixed. Our implementation of ZiCog uses 6x 8MB harddisks and 2x 1MB. So I am going to hardcode the DPB values to support only the 8MB disks, because my program does not run under CPM, but under the Prop OS.
  • Dr_Acula Posts: 5,484
    edited 2012-01-22 17:09
    It would make sense to hard code this for just the 8MB drives. These were always the simplest format IMHO.

    Worst case scenario for writing *to* the disk, you could always start with a completely blank disk filled with E5, copy the file as a contiguous block and then in CP/M do a PIP from that drive to another drive. Might make decoding things a little simpler.

    I'm off to see if I can learn more about the FCB format...
  • Cluso99 Posts: 18,069
    edited 2012-01-22 20:07
    I found this article a great help Drac

    http://www.seasip.demon.co.uk/Cpm/format22.html

    Postedit:
    There is no FAT (file allocation table). CPM2.2 builds this on boot from the DIR entries.
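
Rebuilding the allocation map from the directory can be sketched in a few lines of Python. This is an illustration under assumptions, not the BDOS algorithm itself: it assumes 32-byte directory entries with eight 16-bit little-endian block pointers in bytes 16 to 31 (as on the 8MB disks discussed in this thread), $E5 in byte 0 for an unused entry, and a pointer value of 0 for an unused slot. The function name `used_blocks` is hypothetical.

```python
ENTRY_SIZE = 32
UNUSED = 0xE5  # user byte of a deleted or never-used directory entry


def used_blocks(directory: bytes) -> set:
    """Collect every block number referenced by a live directory entry."""
    used = set()
    for off in range(0, len(directory) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = directory[off:off + ENTRY_SIZE]
        if entry[0] == UNUSED:
            continue
        for i in range(16, 32, 2):
            block = entry[i] | (entry[i + 1] << 8)  # little-endian pointer
            if block:  # 0 means "slot not used" (assumption stated above)
                used.add(block)
    return used


# One live entry (block pointers 8..15) followed by one blank entry.
live = bytes.fromhex(
    "00 4C 4F 52 45 4D 31 20 20 54 58 54 01 00 00 80"
    "08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00"
)
blank = bytes([UNUSED]) * ENTRY_SIZE

print(sorted(used_blocks(live + blank)))  # [8, 9, 10, 11, 12, 13, 14, 15]
```

For writing to the disk, the blocks occupied by the directory itself would also have to be treated as in use. The sketch leaves that out.
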
  • Dr_Acula Posts: 5,484
    edited 2012-01-22 21:42
    Hmm - this is a puzzle. I downloaded a new SIMH CP/M 2.2 emulation complete with a clean i.dsk drive. Then created a little text file in Notepad. Saved it to the directory, used R.COM on drive A to copy it into CP/M. Then PIP to transfer the file to the i.dsk.

    I'm reading the disk image in a hex editor. The text is there starting at address E000H. It is indeed padded at the end with ^Z characters.

    And there is an entry with the filename (the directory entry) at address 6000H.
  • Cluso99 Posts: 18,069
    edited 2012-01-23 16:04
    Drac: Can you repeat with a larger file? I need one >32KB but would prefer one >64KB (say 80KB). Unless I have a bug, I don't find dir entries for sequential extents, meaning I am either missing where some are hiding due to a bug, or they are not necessarily consecutive, which is a puzzle.

    Perhaps you could look at all the DIR entries on your DRAC_A.DSK (8MB version in zicog). They begin at sector 192 (in 128 byte sectors) and are each 32B long and continue to sector 447. I was looking at these entries...
    946 00...01 00 00 80 40 00 41 00 42 00 46 00 47 00 48 00 49 00 4A 00 .BDOS    PRN....@.A.B.F.G.H.I.J.
    946 00...03 00 00 80 4B 00 4C 00 4D 00 4E 00 4F 00 50 00 51 00 52 00 .BDOS    PRN....K.L.M.N.O.P.Q.R.
    946 00...05 00 00 80 53 00 54 00 55 00 56 00 57 00 58 00 59 00 5A 00 .BDOS    PRN....S.T.U.V.W.X.Y.Z.
    947 00...07 00 00 80 5B 00 5C 00 5D 00 5E 00 5F 00 60 00 61 00 62 00 .BDOS    PRN....[.\.].^._.`.a.b.
    947 00...09 00 00 80 63 00 64 00 65 00 66 00 67 00 68 00 69 00 80 00 .BDOS    PRN....c.d.e.f.g.h.i...
    947 00...0A 00 00 40 81 00 82 00 00 00 00 00 00 00 00 00 00 00 00 00 .BDOS    PRN...@................
    
    Sector 946 should be 193 and 947 should be 194 within cpm.

    Note the EX byte after the ... (where I removed the filename). They are not sequential. The file is BDOS.PRN

    Unfortunately I don't know my way around cpm very well, so not sure of what tools are available to see what I am interested in. I can decode the remaining bytes of the DIR entry correctly.
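
The directory location quoted above (sectors 192 to 447, 128-byte sectors, 32-byte entries) converts to byte offsets in the disk image as follows. This is a small Python sketch of the arithmetic only; the helper name `entry_offset` is hypothetical.

```python
SECTOR_SIZE = 128        # CP/M 2.2 sector size in bytes
ENTRY_SIZE = 32          # size of one directory entry
DIR_FIRST_SECTOR = 192   # first directory sector, as quoted in the thread
DIR_LAST_SECTOR = 447    # last directory sector, as quoted in the thread

DIR_START = DIR_FIRST_SECTOR * SECTOR_SIZE
DIR_BYTES = (DIR_LAST_SECTOR - DIR_FIRST_SECTOR + 1) * SECTOR_SIZE
DIR_ENTRIES = DIR_BYTES // ENTRY_SIZE
ENTRIES_PER_SECTOR = SECTOR_SIZE // ENTRY_SIZE


def entry_offset(index: int) -> int:
    """Byte offset of directory entry `index` within the disk image file."""
    if not 0 <= index < DIR_ENTRIES:
        raise IndexError("directory entry index out of range")
    return DIR_START + index * ENTRY_SIZE


print(hex(DIR_START))        # 0x6000
print(DIR_ENTRIES)           # 1024
print(ENTRIES_PER_SECTOR)    # 4
print(hex(entry_offset(2)))  # 0x6040
```

The 0x6000 start agrees with the address at which the directory entries show up in the SIMH disk image examined in a hex editor elsewhere in this thread.
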
  • Heater. Posts: 21,230
    edited 2012-01-23 16:20
    A question might be, why not use the CP/M programs R.COM and W.COM to read and write files to the DOS FS from inside CP/M?
    This would require implementing the SIMH I/O port to do but would then work for any CP/M disk format.
    Or does this really have to work from the other OS?
  • Dr_Acula Posts: 5,484
    edited 2012-01-23 18:27
    @heater, I tried to understand the source of R.COM and W.COM but it was too hard to decode.

    @Cluso, I have a 94k text file. The text is contiguous at E000H

    The entry table is at 6000H and it has created three "extents", each one 32 bytes. This is 128 bytes (the E5 in the last row is blank and it goes on with more E5s so this is it). I have a feeling there are some numbers in those extents that say it is a larger file. Everything is hidden in this hex code - just need to decode it. The filename is LOREM1.TXT and you see that come up three times at the beginning of the text block, so this is a way of linking each extent together with the same filename.
    00 4C 4F 52 45 4D 31 20 20 54 58 54 01 00 00 80 08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00
    00 4C 4F 52 45 4D 31 20 20 54 58 54 03 00 00 80 10 00 11 00 12 00 13 00 14 00 15 00 16 00 17 00 
    00 4C 4F 52 45 4D 31 20 20 54 58 54 05 00 00 67 18 00 19 00 1A 00 1B 00 1C 00 1D 00 1E 00 1F 00 
    E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5 E5
    

    I think this page explains the bytes: http://linux.die.net/man/5/cpm. You can see the block pointers incrementing. The extent number is 01 then 03 then 05 (13th byte on each row) - not sure why it increments by two. Just have to pull it all together...
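
One row of the dump above can be split into its fields with a short Python sketch. It assumes the CP/M 2.2 directory entry layout: user number, 8-byte name, 3-byte extension, then EX, S1, S2, RC, and eight 16-bit little-endian block pointers (16-bit because these are 8MB disks). The `DirEntry` class and `parse_entry` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirEntry:
    user: int
    name: str
    ex: int
    s1: int
    s2: int
    rc: int
    blocks: List[int]


def parse_entry(raw: bytes) -> DirEntry:
    """Split one 32-byte CP/M 2.2 directory entry into its fields."""
    if len(raw) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    # Mask bit 7 of the name bytes: CP/M keeps file attribute flags there.
    stem = bytes(b & 0x7F for b in raw[1:9]).decode("ascii").rstrip()
    ext = bytes(b & 0x7F for b in raw[9:12]).decode("ascii").rstrip()
    name = f"{stem}.{ext}" if ext else stem
    blocks = [raw[i] | (raw[i + 1] << 8) for i in range(16, 32, 2)]
    return DirEntry(raw[0], name, raw[12], raw[13], raw[14], raw[15], blocks)


# Third row of the LOREM1.TXT dump.
row = bytes.fromhex(
    "00 4C 4F 52 45 4D 31 20 20 54 58 54 05 00 00 67"
    "18 00 19 00 1A 00 1B 00 1C 00 1D 00 1E 00 1F 00"
)
entry = parse_entry(row)
print(entry.name, hex(entry.ex), hex(entry.rc))  # LOREM1.TXT 0x5 0x67
print([hex(b) for b in entry.blocks])
```
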
  • Cluso99 Posts: 18,069
    edited 2012-01-23 19:53
    @heater: I would prefer to have it work from the prop end. I am not sure why. Maybe because I am more comfortable with that side???

    @Drac: Thanks. I have just figured out that I was misreading how it calculates the extent number. Turns out it has a shift left because we have defined the disk blocks to be 4K and 8 blocks can fit in a dir entry. So now I know why I am only seeing every 2nd extent number. I glossed over the part where I had to do a calc.

    If you look at your dump you will see that after the filename (last $54) there is a byte called EX. That starts as $01, the next extent (next entry) is $03 and then $05. I was originally expecting to see $01, $02, $03 etc. here. Now, back to the first entry: EX holds only 5 bits (so 0-31), the next byte is S1 and is always $00, followed by S2, which holds bits 5-12 for the upper part of EX (making 13 bits total). The next byte is RC and contains the record count (in 128-byte records) for this entry. If it is $80, it means the maximum record count for this extent, and there may be another dir entry.
    So from your example you can see 80, 80, 67. From the formula I have to add 128 to $67 (=103), giving 231 128-byte sectors, which means 7x 4K blocks (7x32=224) plus 7 more 128-byte sectors (224+7=231). The blocks (allocations) are the pairs of bytes that follow, LSB first. So you can see blocks 0018, 0019, 001A, 001B, 001C, 001D, 001E, 001F. So I only use 7 of the 32 sectors of the last 4K block, $001F.
    Now the maths works - thanks to your dump - I was able to see exactly how to do this :) Now to code it!
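
The record arithmetic for the LOREM1.TXT dump can be checked with a short Python sketch. It assumes an extent mask of 1, meaning each directory entry covers two 16KB logical extents (8 blocks of 4K), which is what the EX values 01, 03, 05 suggest. The constant and function names are hypothetical. Note that $67 is 103 decimal.

```python
RECORD_SIZE = 128         # bytes per CP/M record
RECORDS_PER_EXTENT = 128  # one logical extent is 16KB = 128 records
RECORDS_PER_BLOCK = 32    # one 4K block = 32 records
EXTENT_MASK = 1           # assumption: 2 logical extents per directory entry


def records_in_entry(ex: int, rc: int) -> int:
    """Records described by one directory entry, from its EX and RC bytes."""
    return (ex & EXTENT_MASK) * RECORDS_PER_EXTENT + rc


# (EX, RC) pairs from the three directory entries in the dump.
entries = [(0x01, 0x80), (0x03, 0x80), (0x05, 0x67)]

last = records_in_entry(0x05, 0x67)
full_blocks, extra_records = divmod(last, RECORDS_PER_BLOCK)
total_records = sum(records_in_entry(ex, rc) for ex, rc in entries)
size = total_records * RECORD_SIZE

print(last)                        # 231
print(full_blocks, extra_records)  # 7 7
print(total_records)               # 743
print(size, hex(size))             # 95104 0x17380
```

The result, 95104 bytes (17380H), is the same as the span of data from E000H to 2537FH reported for the disk image in this thread.
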
  • Dr_Acula Posts: 5,484
    edited 2012-01-23 21:18
    Good to hear all this.

    One little test - could you work out from those extent entries how big the file is?

    (the i.dsk has data from E000H to 2537FH = 17380H bytes (decimal 95104))
  • Cluso99 Posts: 18,069
    edited 2012-01-24 03:04
    The max file size is (256 + 256 + 231) * 128 = 95104 bytes, less what is not used (filled with Ctl-Z) in the last sector.
  • Cluso99 Posts: 18,069
    edited 2012-01-24 05:30
    I have the CPM file reading working now for text files <32KB, which is one directory entry. Now to output that to FAT16/32, which is quite simple with Kye's routines.
  • Cluso99 Posts: 18,069
    edited 2012-01-24 16:30
    Here is a test version.
    * The filenames are fixed (as is the filename of the CPM diskfile). Change them in the CON block!
    * Only text format files (.TXT, .PRN, .BAS)
    * The output filename is "CPMXFER.TXT" and can be opened under Windows (plug SD card into PC) using WordPad. Didn't try Notepad.
    * If the output filename exists, it will be deleted first.
    * Currently only handles CPM --> FAT16/32 (using Kye's Driver)
    * Currently only handles files <=32KB (1 dir entry in CPM)
    * No date/time inserted
    * Bug if the program finds an early Ctl-Z (it only looks in the last record of each block); it also needs to scan from the end backwards

    Please let me know if you find a problem.
    CPMXFER_002a - Archive [Date 2012.01.25 Time 11.23].zip
  • Dr_Acula Posts: 5,484
    edited 2012-01-24 17:28
    Intriguing code. Looks like you have cracked the secrets of disk files on CP/M.

    Is there not a way of determining the size of a file from the extent, rather than looking for a ^Z?
  • Cluso99 Posts: 18,069
    edited 2012-01-24 20:29
    Yes, I think I understand it all now - I hope anyway. I will update the program to explain it better - I just hurried it out this morning so you could give it a go.

    Determining size...No. The dir only holds the number of sectors (128 bytes) used in each segment. So its size will hold the next higher value rounded to 128 bytes, unless it finishes exactly on a 128 byte boundary, in which case there is no Ctl-Z stored. The CPM writeups acknowledge that the size of binaries must be determined by some other method. Unfortunately, there is no way to tell if a file is text or binary, hence the old problem of some files being truncated.

    Maybe tonight I may get to doing the multiple dir entries, allowing larger files.
  • Dr_Acula Posts: 5,484
    edited 2012-01-24 21:17
    "Determining size...No. The dir only holds the number of sectors (128 bytes) used in each segment. So its size will hold the next higher rounded value to 128 bytes, unless it finishes exactly on a 128 byte boundary, in which case, there is no Ctl-Z stored."

    I have a feeling that won't matter for binaries.

    If you are working in assembly or C or MBASIC and open a random access file, you read or write to that file in 128 byte records. So if your file is 5 bytes, it saves 128 bytes.

    If you do an xmodem transfer, and the boundary does not fall exactly on the 128 byte boundary xmodem pads with ^Z. But it could pad with anything really as this ^Z is not used subsequently as a marker.

    I once tried sending the same file (MBASIC.COM) back and forth with xmodem between a PC and CP/M, and yes, on the first transfer the file size grew so it was rounded up, but after that the file size did not change.

    So overall, I think you can determine the file size by looking at the extent data and the ^Z does not matter. I have a feeling there are some file read and write processes that can even leave garbage at the end. It wouldn't matter in a .COM program for instance as the code would never get there to run this part.
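
The rounding behaviour described above (the size grows on the first transfer and then stays put) follows from padding to a whole number of 128-byte records. A minimal Python sketch, with the hypothetical helper `pad_to_record` and Ctrl-Z as the default fill byte:

```python
RECORD_SIZE = 128  # CP/M 2.2 record size in bytes


def pad_to_record(data: bytes, fill: int = 0x1A) -> bytes:
    """Pad data with `fill` bytes up to the next multiple of 128 bytes."""
    shortfall = -len(data) % RECORD_SIZE
    return data + bytes([fill]) * shortfall


once = pad_to_record(b"12345")
twice = pad_to_record(once)

print(len(once))      # 128
print(twice == once)  # True
```

Padding a second time changes nothing, because the length is already a multiple of 128.
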
  • Cluso99 Posts: 18,069
    edited 2012-01-25 01:53
    Here is a multi-extent (multiple dir entries) version. I tried the BDOS.PRN (168KB) file. I noted that I truncated the file slightly because there is a Ctl-Z about 40 bytes from the end (IIRC). But there is text that follows and no Ctl-Z at the end. So I think I will stop looking for Ctl-Z in the next release.

    The maximum file size supported is 1MB and I do check for this and report an error. For now, it is slow as I am transferring bytes at a time due to the printout capability. That will change as soon as I am sure everything is ok.

    CPMXFER_003 - Archive [Date 2012.01.25 Time 20.34].zip
  • Cluso99 Posts: 18,069
    edited 2012-01-25 13:22
    Here is the latest version v004

    Outputs all file types (text and binaries) CPM-->FAT16/32
    No test is done for Ctl-Z and the final record is not truncated.
    Max file size 1MB.
    Faster transfer (write 128 byte sectors)

    Filenames still hardcoded; one direction only; no date/time stamp.

    CPMXFER_004 - Archive [Date 2012.01.26 Time 08.13].zip
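
The v004 behaviour listed above (copy whole 128-byte records, no Ctl-Z test) can be approximated on a PC against a disk image file. The Python sketch below is not the CPMXFER code, which is written in Spin and is only available as an attachment. It assumes the 8MB layout discussed in this thread: directory at 0x6000 with 1024 entries, 4K blocks, 16-bit block pointers, and an extent mask of 1. It also assumes block 0 starts at the directory, which is inferred from the LOREM1.TXT dump (block 8 sits at E000H = 6000H + 8 x 1000H). The name `extract` is hypothetical.

```python
SECTOR_SIZE = 128
ENTRY_SIZE = 32
BLOCK_SIZE = 4096
DIR_START = 192 * SECTOR_SIZE  # 0x6000; block 0 assumed to start here
DIR_ENTRIES = 1024
EXTENT_MASK = 1                # assumption: 2 logical extents per entry


def extract(image: bytes, name11: bytes, user: int = 0) -> bytes:
    """Return a file's records; name11 is the space-padded 8+3 name."""
    found = []
    for i in range(DIR_ENTRIES):
        off = DIR_START + i * ENTRY_SIZE
        e = image[off:off + ENTRY_SIZE]
        if len(e) < ENTRY_SIZE or e[0] != user:
            continue
        if bytes(b & 0x7F for b in e[1:12]) != name11:
            continue
        extent = (e[12] & 0x1F) | ((e[14] & 0x3F) << 5)  # EX plus S2
        found.append((extent, e))
    if not found:
        raise FileNotFoundError(name11.decode("ascii"))

    out = bytearray()
    for _, e in sorted(found, key=lambda item: item[0]):
        records = (e[12] & EXTENT_MASK) * 128 + e[15]
        data = bytearray()
        for j in range(16, 32, 2):
            block = e[j] | (e[j + 1] << 8)  # little-endian block pointer
            if block == 0:                  # unused slot: end of this entry
                break
            start = DIR_START + block * BLOCK_SIZE
            data += image[start:start + BLOCK_SIZE]
        out += data[:records * SECTOR_SIZE]  # keep whole records only
    return bytes(out)


# Synthetic image: blank disk, one entry for TEST.TXT with 2 records in block 8.
image = bytearray(b"\xE5" * (DIR_START + 9 * BLOCK_SIZE))
entry = (bytes([0]) + b"TEST    TXT" + bytes([0, 0, 0, 2])
         + (8).to_bytes(2, "little") + bytes(14))
image[DIR_START:DIR_START + ENTRY_SIZE] = entry
payload = b"A" * 128 + b"B" * 128
data_start = DIR_START + 8 * BLOCK_SIZE
image[data_start:data_start + len(payload)] = payload

result = extract(bytes(image), b"TEST    TXT")
print(len(result), result == payload)  # 256 True
```

The sketch stops at the first zero block pointer in each entry, so it does not handle sparse files.
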
  • Dr_Acula Posts: 5,484
    edited 2012-01-25 16:29
    Looking very nice.

    Re the hardcoding of filenames, I'm using this technique
    VAR
       byte LineOfText1[20] ' general purpose string buffer
    

    then using kye's string object
        str.copy(string("myfile.raw"),@lineoftext1)  
    

    and then open the file with
        fat.openfile(@lineoftext1,"R")                ' open an image  
    

    and in Kye's string driver, stringcopy is
    PUB stringCopy(whereToPut, whereToGet) '' 5 Stack Longs
    
    '' ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    '' // Copies a string from one location to another. This method can corrupt memory.
    '' //
    '' // Returns a pointer to the new string.
    '' //
    '' // WhereToPut - Address of where to put the copied string.
    '' // WhereToGet - Address of where to get the string to copy.
    '' ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    
      bytemove(whereToPut, whereToGet, (strsize(whereToGet) + 1))
      return whereToPut
    



    Date and time stamps may not matter since CP/M 2.2 doesn't have them anyway?
  • Cluso99 Posts: 18,069
    edited 2012-01-25 18:05
    Thanks Drac. Have you had time to test my latest program yet? I know you have better tools to examine results.

    Thanks for the input ideas. I just hardcode because then I don't have to enter anything - it just runs for each test so it's quicker for me to debug.

    Now to get the other direction working :)