Kye's SD_FATEngine.spin (bug in WriteString???) — Parallax Forums

I have spent weeks looking for bugs in my conversion of Michael Park's Sphinx compiler.

Finally I have found the last obscure bug (hopefully ;) )

In sxfile.spin, WriteString writes out the string characters plus the terminating null. In sd_fatengine.spin, WriteString does not write out the null terminator.

Kye's ReadString terminates on <cr>, <lf> or <null>, and the <cr>, <lf> or <null> is included in the string buffer as a terminator.
Therefore, I think that WriteString (which does not look for <cr> or <lf> BTW) should write out the null terminator.
PUB writeString(stringPointer) '' 36+ Stack Longs

'' ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
'' // Writes a string to the file that is currently open and advances the file position by the string length.
'' //
'' // This method will do nothing if a file is not currently open for writing.
'' //
'' // If an error occurs this method will abort and return a pointer to a string describing that error.
'' //
'' // StringPointer - A pointer to a string to write to the file. Writes nothing when at the end of a maximum size file.
'' ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

' writeData(stringPointer, strsize(stringPointer))
  writeData(stringPointer, strsize(stringPointer) + 1)                          'RR20160228 write out the null terminator

What are your thoughts???
Should SD_FATEngine WriteString include the null terminator when writing to a file or not???
I can just as easily modify the compiler - I am just not sure which way to go.
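
To make the difference concrete, here is a minimal sketch in Python (not Spin) that models the two variants against an in-memory file. The `write_string` function and its `include_null` flag are made up for illustration; only the `strsize` versus `strsize + 1` distinction comes from the snippet above.

```python
import io

def strsize(buf: bytes) -> int:
    """Count the bytes before the first zero byte, like Spin's strsize."""
    return buf.index(0)

def write_string(f, buf: bytes, include_null: bool = False) -> None:
    """Model of writeString: write the characters, optionally plus the null."""
    count = strsize(buf) + (1 if include_null else 0)
    f.write(buf[:count])

memory = b"PUB main\x00"   # a zero-terminated string as it sits in RAM

original = io.BytesIO()
write_string(original, memory)                     # strsize(...) bytes
patched = io.BytesIO()
write_string(patched, memory, include_null=True)   # strsize(...) + 1 bytes

print(original.getvalue())   # b'PUB main'
print(patched.getvalue())    # b'PUB main\x00'
```

The only difference on disk is the single trailing zero byte, which is exactly what a reader expecting sxfile.spin's format would look for.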

Comments

  • Absolutely not. Normal files do not have null terminators in them. Filesystems have metadata such as the size of the file, whereas basic character arrays do not, so files in filesystems don't need and shouldn't have null terminators.
  • ErNa Posts: 1,752
    edited 2016-02-28 10:58
    That was definitely not the case in the old times. When copying files with unknown content you had to use a -b binary option to prevent the copy function from stopping when special characters were in the data. Metadata is a late extension to file systems and I cannot imagine it to be consistent. I can also imagine dumping memory to a file where the memory contains 0 as string terminator, so that when reading, Kye's solution can separate the strings.
    On the other hand, it is a question of convention. I only know that Linux and DOS handle line ends differently: Linux uses LF and DOS uses CR/LF. I thought it would be better to have CR/LF, as this allows different cases to be distinguished: CR/LF means the user wants to start a new line at the first column; CR means start at column 0 in the same line; LF means start at the current column in the next line. That allowed you to print a line, then a CR, and in a second pass print all decorations like hyphens, underlines etc. But times have changed; now with HTML, formatting is done by the browser and is no longer under user control.
    The question remains: what is a string? Is CR part of the string or a control character? Is a string terminated if a CR is detected, and is the CR to be transferred or not? My opinion: 0 is the terminator. If you read a string, you get all the characters and either length information or the 0. Then it is your decision how you make use of the string: append a 0, another separator, or string length information. With UTF-8 times have changed again; confusion persists.
  • ErNa wrote: »
    That was definitely not the case in the old times. When copying files with unknown content you had to use a -b binary option to prevent the copy function from stopping when special characters were in the data.

    As far as I can tell, that was the case on old DOS/Windows systems; *nix systems never had that nonsense. If I want to copy a file, why should the OS bother about the content?
    ErNa wrote: »
    The question remains: what is a string? Is CR part of the string or a control character? Is a string terminated if a CR is detected, and is the CR to be transferred or not? My opinion: 0 is the terminator. If you read a string, you get all the characters and either length information or the 0. Then it is your decision how you make use of the string: append a 0, another separator, or string length information. With UTF-8 times have changed again; confusion persists.

    I don't agree. The string concept is specific to the programming language, and we are talking about files here, so any language-specific concept should be avoided when managing files that may be read or written from other languages. For example, null termination is a convention for C; the old Pascal on DOS used the first byte of a string in memory to hold the length (hence the 255-character limit). If you need files formatted in a specific way, then do it at the application level.

    A function named 'WriteString' implies a text source, so if the convention for that particular language is that a string has a null terminator, then it should write up to that and exclude the terminator (it is a convention of the language; other languages may not understand null termination). Any other character should be written verbatim without translation. If you want to write the null terminator, then use a function named 'WriteBinary' with an additional length argument. WriteString is effectively a shortcut for WriteBinary(s,len(s)).

    Maybe add some methods that better clarify the functionality for text-based files, like 'WriteLine' which adds the line termination (either CR/LF or LF or CR alone), 'ReadLine' which reads up to the line termination. See fputs/fgets from the C library as an example. A 'ReadString' method is not needed as it means nothing.
  • kwinn Posts: 8,697
    A file needs to be able to store any type of data so the beginning and end of the file must be stored as part of the metadata for the file. The application determines how the data from that file is used, so nulls, cr, lf, and any other characters, floating point numbers, or any other type of data can be stored in a file.
  • ErNa Posts: 1,752
    macca wrote: »
    A 'ReadString' method is not needed as it means nothing.
    That may be true, but if I understand correctly what Cluso wrote, the ReadString and WriteString methods exist and do not agree with each other. If ReadString means something, the question is: does ReadString read a 0-terminated string without the 0 and insert a 0 to terminate it, or does it read the string including the 0 and add another 0 to terminate this zero-terminated string? An experiment could show it: read into a non-empty buffer and check for 1 or 2 zeros.
    And: is a CR/LF during read interpreted as end of string? That is, is the string including CR/LF read but not terminated by 0, or is a 0 added to the string x, x, x, CR, LF?

  • ReadString and WriteString should be symmetric. Kye's code has been out a long time, so I don't think it should be changed. I think there should be a wrapper around WriteString that calls WriteString then writes a null to the file, maybe called WriteString0. ReadString should ideally be called ReadLine, but it's late to change it. Perhaps it could be renamed and a wrapper called ReadString could be added.
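
The wrapper idea can be sketched as follows. This is a Python model rather than Spin, and `write_string0` and `read_string0` are hypothetical names (the post only suggests WriteString0); the point is that the existing WriteString stays untouched and the pair round-trips.

```python
import io

def write_string(f, s: bytes) -> None:
    """Stand-in for the unchanged WriteString: characters only, no terminator."""
    f.write(s)

def write_string0(f, s: bytes) -> None:
    """Hypothetical wrapper: call WriteString, then write a single null byte."""
    write_string(f, s)
    f.write(b"\x00")

def read_string0(f) -> bytes:
    """Hypothetical counterpart: read up to the next null (or end of file), drop the null."""
    out = bytearray()
    while True:
        b = f.read(1)
        if b in (b"", b"\x00"):
            return bytes(out)
        out += b

f = io.BytesIO()
for s in (b"first", b"two\r\nlines", b""):
    write_string0(f, s)

f.seek(0)
strings = [read_string0(f) for _ in range(3)]
print(strings)   # [b'first', b'two\r\nlines', b'']
```

Because only the null separates records here, CR and LF pass through as ordinary content.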
  • kwinn Posts: 8,697
    ErNa wrote: »
    macca wrote: »
    A 'ReadString' method is not needed as it means nothing.
    That may be true, but if I understand correctly what Cluso wrote, readString and WriteString methods exist and are not compliant. If ReadString means something, the question is: does readstring return a 0 terminated string without the 0 and inserts a 0 to terminate the string, or does it read the string inclusive 0 and adds another 0 to terminate this zero terminated string. An experiment could show it: read to a non empty buffer and check for 1 or 2 zeros.
    And: is a CR/LF during read interpreted as end of string, means the string including CR/LF is read, but not terminated by 0 or is a 0 added to the string x, x, x, CR, LF.

    IMO for consistency all strings should be 0 terminated, with the 0 written to the file.
    All byte values from $00 to $FF should be treated as characters for file I/O.
  • ErNa wrote: »
    That may be true, but if I understand correctly what Cluso wrote, the ReadString and WriteString methods exist and do not agree with each other. If ReadString means something, the question is: does ReadString read a 0-terminated string without the 0 and insert a 0 to terminate it, or does it read the string including the 0 and add another 0 to terminate this zero-terminated string?

    I believe it is silly to have a function that writes text in the convention expected by the programming language (this would break any other program expecting to read a string in another language or simply with another driver), but if the convention is to have 0 as the string terminator, WriteString should write the terminator, and ReadString should read the string with the terminator, as this is the convention.
    ErNa wrote: »
    And: is a CR/LF during read interpreted as end of string? That is, is the string including CR/LF read but not terminated by 0, or is a 0 added to the string x, x, x, CR, LF?

    CR/LF is not the string terminator, so it should be treated as any other character. Think about what would happen if you WriteString("Hello\r\nWorld\r\n") and then read it back. You expect a single string but get two, or maybe three, because after the last \r\n we have the null terminator and thus another empty string.

    This would be easier and clearer if we had ReadLine/WriteLine functions and WriteString just wrote the text without the terminator.
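
The "two, or maybe three" outcome can be reproduced with a small Python model of a reader that stops at CR, LF or null. This is not a port of Kye's ReadString: the function below is hypothetical, and treating a CR/LF pair as a single terminator is an assumption made for the sketch.

```python
import io

CR, LF, NUL = 13, 10, 0

def read_string(f):
    """Read up to CR, LF, CR/LF or null; return None at end of file."""
    ch = f.read(1)
    if ch == b"":
        return None                    # nothing left in the file
    out = bytearray()
    while ch != b"" and ch[0] not in (CR, LF, NUL):
        out += ch
        ch = f.read(1)
    if ch == b"\r":                    # assumption: swallow the LF of a CR/LF pair
        nxt = f.read(1)
        if nxt not in (b"\n", b""):
            f.seek(-1, io.SEEK_CUR)    # not an LF, so put it back
    return bytes(out)

def read_all(data: bytes):
    """Call read_string repeatedly until the file is exhausted."""
    f = io.BytesIO(data)
    result = []
    while (s := read_string(f)) is not None:
        result.append(s)
    return result

text = b"Hello\r\nWorld\r\n"
print(read_all(text))             # [b'Hello', b'World']
print(read_all(text + b"\x00"))   # [b'Hello', b'World', b'']
```

One WriteString call therefore comes back as two strings without the terminator on disk, and as three (the last one empty) with it.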
  • Mike Green Posts: 23,101
    edited 2016-02-28 17:00
    I think macca is correct. When Mac programming was mostly done in Pascal and you had to live with both string conventions, WriteString wrote the content of the string to the file with no terminator and WriteLine used CR as the line terminator. You could convert C style strings with a null terminator back and forth to Pascal style strings with a length byte as long as the Pascal strings didn't include a null byte.

    WriteString needs to write the content of the string (up to but not including the null byte). WriteLine should be a wrapper around WriteString that adds a user configurable line terminator (CR, LF, CR/LF) to the string content. A 4th choice could be a null byte. ReadLine should read the string up to but not including the line terminator and discard the line terminator. Since what's read is a C-string, the variable's value must be null terminated. A null, should it be present on disk, could be treated as a line terminator since it's invalid in a string value.
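
A rough Python model of this layering, under stated assumptions: the function names and the `terminator` parameter are illustrative and not part of SD_FATEngine, and Python `bytes` values carry their own length, so the "null terminated in the variable" part of a C-string does not appear here.

```python
import io

def write_string(f, s: bytes) -> None:
    """Write the string content only, without any terminator."""
    f.write(s)

def write_line(f, s: bytes, terminator: bytes = b"\r\n") -> None:
    """Wrapper around write_string that appends the chosen line terminator."""
    write_string(f, s)
    f.write(terminator)

def read_line(f, terminator: bytes = b"\r\n") -> bytes:
    """Read up to the terminator and discard it; a null on disk also ends the line."""
    out = bytearray()
    while True:
        b = f.read(1)
        if b in (b"", b"\x00"):        # end of file or stray null
            return bytes(out)
        out += b
        if out.endswith(terminator):
            return bytes(out[:-len(terminator)])

f = io.BytesIO()
write_line(f, b"alpha")
write_line(f, b"beta")
f.seek(0)
print(read_line(f), read_line(f))   # b'alpha' b'beta'
```

Passing `b"\r"`, `b"\n"`, `b"\r\n"` or `b"\x00"` as the terminator covers the four choices listed in the post; an empty terminator is not supported by this sketch.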
  • ErNa Posts: 1,752
    edited 2016-02-28 20:25
    Back to the original question:
    In ReadString, as Cluso says, <cr/lf> or <0> terminates a string. In my opinion a string is an array of (ASCII) characters. Our ancestors defined the ASCII character DLE as an exception. Using <DLE 0> to signal the end of an ASCII string would have solved the problem of interpreting a character as data or control. But other ancestors decided to see strings as a subset of ASCII and to terminate the string with <0>. That is where it begins, not with the apple in paradise.

    Whenever I make use of the driver I will do it this way: exclude <0> from the characters in a string and use it as terminator.
    MyWriteString will write a 0-terminated string including the 0. MyReadString will read the string including the 0. All other characters will be seen as equal, but not the same.
  • Cluso99 Posts: 18,069
    Thanks for the discussion guys.
    To explain further, the output file is a mix of text (strings) and binary, so it's a binary file.

    I can simply avoid the problem and use writeString(@s) followed by writing a null with writeData(null,1). I am using my modified version of SD_FATEngine anyway.

    What I was more concerned with was whether this was considered to be a bug requiring fixing or not. I would like to see more discussion before any changes are made to Kye's original SD_FATEngine.
  • kwinn Posts: 8,697
    Perhaps I am missing the point of this discussion, but if we r/w bytes to an sd card and want to be able to include all 256 of them would it not make sense to have a string function that writes up to and including the 0, and a binary function that r/w x number of bytes including any embedded zeros?
  • ErNa Posts: 1,752
    The question is (as I see it): does it make sense to mix data and code? If they are strictly separated, the data cannot modify the code. The code "read a string" of data is a shortcut for "loop reading an atom until an atom arrives that is not data but control". So the exception "stop-atom" in the data stream modifies the code. In the end it is a philosophical question. Like: more cogs or more RAM ;-)
  • The question is "what is the content of a string". If we implement a string as some sequence of byte values 1-255 terminated by a zero byte, does the content for external purposes include the internal (to the program) marker for the end of the string?
  • ErNa Posts: 1,752
    I believe this discussion is as old as the idea of creating something that is a computer, and much older. Before we answer the question of what the content of a string is, we have to agree on what a string is, and what content is. In this case we agree that there is a byte of 8 bits, each of which can have two states, so a byte can represent 256 different states. That allows us to enumerate 256 other entities, which may have nothing in common except being enumerated by that byte. If we decide that the byte itself is such an entity, only 255 other entities can be enumerated.

    If 256 is not enough, there is a consensus to increase the number of bits in chunks of 8. And now there are two ways: create an object "word" that has 16 bits, and so on, or create an object "array" that has 2 bytes. The first way means new object types are created, and there exist as many types as are created. Going the second way, a new object type "array" is created, along with a rule for how to define the properties of this object. One way is to create a new object type "self-described array", where a first part holds the length of the array and the second part is an array of that length that holds the byte information. A second way is to exclude one code from the content of the bytes (like "0") and use this code to signal the end of the array. And so on...

    It is just a question of convention. But if I read strings that can be delimited by 0, then writing the strings must produce a 0. That doesn't seem to be the case. And I believe there is a bug the moment CR, LF and 0 are treated as equivalent signals for end of string. Writing a string to a terminal is sensitive to the presence or absence of CR and LF, but should not be influenced by 0s. ReadString should only look for "0", while ReadLine takes care of CR and LF, and I would propose signalling what the delimiter of the string is: CR, LF, CR/LF, or no end of line but 0 as end of string.
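
The two array conventions described here, a "self-described array" with a leading length and an array that reserves 0 as its end marker, can be compared in a few lines of Python. The function names are invented for the sketch, and the one-byte length field is an assumption borrowed from the Pascal example earlier in the thread.

```python
def encode_length_prefixed(s: bytes) -> bytes:
    """Self-described array: one length byte followed by the content (max 255 bytes)."""
    if len(s) > 255:
        raise ValueError("a one-byte length field limits the string to 255 bytes")
    return bytes([len(s)]) + s

def decode_length_prefixed(buf: bytes) -> bytes:
    """Use the first byte as the count of content bytes that follow."""
    return buf[1:1 + buf[0]]

def encode_null_terminated(s: bytes) -> bytes:
    """Reserve the value 0 as end marker, so the content may not contain it."""
    if 0 in s:
        raise ValueError("content may not contain the terminator value 0")
    return s + b"\x00"

def decode_null_terminated(buf: bytes) -> bytes:
    """Return everything before the first zero byte."""
    return buf[:buf.index(0)]

data = b"AB\x00CD"                       # content with an embedded zero
print(encode_length_prefixed(data))      # b'\x05AB\x00CD'
print(encode_null_terminated(b"ABCD"))   # b'ABCD\x00'
```

The length-prefixed form can carry all 256 byte values but caps the length; the terminated form has no length cap but only 255 usable values, which is the trade-off the post describes.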
  • All this talk about old systems, different programming languages, outdated conventions... seems pretty ridiculous. Those are all great discussions when designing something new - but this isn't new. Spin is established, the SD interface is established, FAT filesystem rules are established.... what is all this discussion for???

    Kye's driver is a FAT filesystem driver. FAT filesystems have metadata for the file length. We don't need the null terminator in the file; end of story.

    If you want to write data instead of a string, then use a different method such as WriteData which takes an array and a length instead of just an array.
  • ErNa Posts: 1,752
    edited 2016-02-29 15:26
    Take it easy, it's getting complex all by itself ;-)
  • All I can say is that null terminator is a pesky nuisance :)
  • Cluso99 Posts: 18,069
    Perhaps it would be better to just have a basic interface that supports ReadData and WriteData, and a separate module to support things such as Read & Write for Line (ie ReadLine & WriteLine), Byte, Long, String, etc.

    I already have a cut down version of the spin interface to reduce the footprint, with the PASM section a separate object too.
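
One way to picture that split, as a Python sketch rather than Spin: a basic layer exposing only data reads and writes, and a separate set of helpers built on top of it. `RawFile` and the helper names are hypothetical, and the little-endian long is an assumption matching the Propeller's byte order.

```python
import io
import struct

class RawFile:
    """Basic layer: only read_data and write_data on a byte stream."""
    def __init__(self):
        self._f = io.BytesIO()

    def write_data(self, data: bytes) -> None:
        self._f.write(data)

    def read_data(self, count: int) -> bytes:
        return self._f.read(count)

    def rewind(self) -> None:
        self._f.seek(0)

# Separate helper layer, built only on read_data/write_data.
def write_byte(f, value: int) -> None:
    f.write_data(bytes([value & 0xFF]))

def write_long(f, value: int) -> None:
    f.write_data(struct.pack("<I", value & 0xFFFFFFFF))   # 32-bit little-endian

def write_string(f, s: bytes) -> None:
    f.write_data(s)                                       # content only, no terminator

def read_long(f) -> int:
    return struct.unpack("<I", f.read_data(4))[0]

f = RawFile()
write_long(f, 0x12345678)
write_string(f, b"name")
write_byte(f, 0)             # explicit terminator, as in the workaround earlier in the thread
f.rewind()
print(hex(read_long(f)))     # 0x12345678
print(f.read_data(5))        # b'name\x00'
```

With this arrangement the question of whether a string helper writes the null is decided in the helper layer, without touching the basic driver.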
  • ErNa Posts: 1,752
    Yes. KISS. In Qt there is a mechanism where you write a structure with an identifier, so whenever you gain access, the identifier allows you to handle the request correctly. That's a nice idea. For personal use I could imagine enumerating all the data structures I ever created, and so, having a directory of data structures, I could always act accordingly. (One of my goals. At least I know I will never run out of numbers.)