ATTN: WINDOWS Users - Get Rid Of Them Squares — Parallax Forums

ATTN: WINDOWS Users - Get Rid Of Them Squares

idbruceidbruce Posts: 6,197
edited 2015-02-20 10:25 in General Discussion
First off, let me just say that I am not picking on Phil, I just had his code handy to use as an example :)

Over the years I have downloaded many files, and many of those files were not completely compatible with a WINDOWS OS, especially when it comes to newlines and carriage returns. When the log files for one of my websites started arriving with embedded squares, and in total disarray, I finally had to do something, so I wrote a program to make these files WINDOWS compatible.

As you may have guessed, I have altered this program several times to accept a variety of file extensions, so I finally decided today to rewrite the program, to accept any type of file. Since it is a very handy program to have available, I decided I would share it with the Parallax community. However, I am not certain which WINDOWS operating systems it will work on, so all I can say is test it and see if it works.

In the images below, you will see before and after snapshots, taken of Phil's SpinScope PERL code.

[Attached images: before and after screenshots, each 500 x 400]

Comments

  • Dave HeinDave Hein Posts: 6,347
    edited 2015-02-19 18:51
    Bruce, what is the offending character in Phil's code? Is it a LF without a CR? You can use WordPad under Windows to read files without a CR, and then save them with a CR-LF.
  • idbruceidbruce Posts: 6,197
    edited 2015-02-19 19:09
    Dave
    Is it a LF without a CR?

    Yes, WINDOWS requires CR + LF - E.G.: strFilelines.Replace("\n", "\r\n");
    You can use WordPad under Windows to read files without a CR, and then save them with a CR-LF.

    Never knew that.... It works well. However, there is one more step involved, thus making it more time consuming. I suppose my program would be more useful if it allowed multiple file selection, such as editing the contents of an entire folder, regardless of file extension.

    EDIT: I will have to update that option at a later date, unless of course there is some interest, then perhaps I could be persuaded to make the necessary modifications.
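
A minimal sketch of the same LF to CR+LF conversion in plain standard C++ rather than MFC. The function name `to_crlf` is hypothetical and not part of Bruce's program. Unlike a blind replace of "\n" with "\r\n", it leaves an LF alone when a CR already precedes it, so text that is already in Windows format does not end up with CR+CR+LF.

```cpp
#include <string>

// Convert lone LF line endings to CR+LF.
// An LF that already follows a CR is copied unchanged, so running the
// function a second time gives the same result as running it once.
std::string to_crlf(const std::string& text)
{
    std::string out;
    out.reserve(text.size() + text.size() / 16);  // rough guess at the growth
    char prev = '\0';
    for (char c : text) {
        if (c == '\n' && prev != '\r') {
            out += '\r';  // supply the missing carriage return
        }
        out += c;
        prev = c;
    }
    return out;
}
```

For example, `to_crlf("a\nb\n")` yields "a\r\nb\r\n", and passing that result through `to_crlf` again leaves it unchanged.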
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-02-19 19:46
    I always use just a LF to terminate lines in Perl. It's the Linux standard. I had to pick one standard, and that's the one I picked.

    Windows: CR+LF
    Linux: LF
    OS/X: CR

    See? You can't please everyone! :)

    idbruce: If you used a modern programming editor like UltraEdit (highly recommended, BTW), you could convert back and forth among multiple line-termination standards with ease.

    -Phil
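
Before converting anything, it can help to find out which of the three conventions a file actually uses. The following is a rough sketch in standard C++ that counts each kind of terminator in a buffer. The names `LineEndingCounts` and `count_line_endings` are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Tally of each line-terminator style found in a buffer.
struct LineEndingCounts {
    std::size_t crlf = 0;  // CR immediately followed by LF
    std::size_t lf = 0;    // LF not preceded by CR
    std::size_t cr = 0;    // CR not followed by LF
};

LineEndingCounts count_line_endings(const std::string& data)
{
    LineEndingCounts counts;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == '\r') {
            if (i + 1 < data.size() && data[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // skip the LF that belongs to this pair
            } else {
                ++counts.cr;
            }
        } else if (data[i] == '\n') {
            ++counts.lf;
        }
    }
    return counts;
}
```

A file in which more than one counter is non-zero has mixed line endings, which is usually a sign that it has been edited on more than one system.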
  • idbruceidbruce Posts: 6,197
    edited 2015-02-19 20:58
    Phil
    If you used a modern programming editor like UltraEdit (highly recommended, BTW), you could convert back and forth among multiple line-termination standards with ease.

    That is one nice editor, with a lot of features. I always just wrote my PERL scripts with Notepad. Oh what a joy that was.

    I have always hosted my websites on UNIX based machines. Funny thing is, I would write the scripts in Notepad and upload them via FTP. If I ever FTP'ed them back, they were still in WINDOWS style. At least that is how it was with this one provider, back when I was writing scripts. At this point, it has been a while since I have written any serious scripts, so I don't know how they would turn out with my current providers.

    By the way, just love that image data..... New one on me.
  • potatoheadpotatohead Posts: 10,261
    edited 2015-02-19 21:02
    You can get this for windows or any unix. Drop it in your path, and batch it, whatever.

    http://waterlan.home.xs4all.nl/dos2unix.html
  • ElectrodudeElectrodude Posts: 1,661
    edited 2015-02-19 21:14
    OSX doesn't use CR. OS <= 9 did, but OSX is Unix based and uses LF. Windows is the only one still doing weird things.
  • abecedarianabecedarian Posts: 312
    edited 2015-02-19 23:10
    Windows is only following the standards that were in place long before Unix came along.

    Carriage return = move the print head (cursor on the terminal) to the beginning of the current line.
    Line feed = move the page up (or down depending on your point of view) one line.

    If you want to be at the beginning of the next line, you have to CR + LF.

    Sadly, IBM was highly influential in ASCII, but chose to implement EBCDIC on their systems. So, NL was something IBM introduced with EBCDIC; it didn't exist beforehand. And, had they assigned VT (vertical tab) to the same code LF used, there wouldn't be much of a problem, other than converting *nix documents to proper formatting.

    It's more complicated than that, but you want to talk about manipulating the market for their benefit?
  • Heater.Heater. Posts: 21,230
    edited 2015-02-20 03:13
    Wait a minute. The idea of using a single character to represent the end of a line dates back to at least the Multics operating system, 1964. Unix, being born from that environment, adopted the idea.

    It's obviously a good idea. Having CR and LF was only a hang over from the mechanical requirements of the teletypes and printers and has no place in the semantics of the text you are writing.
    Having the mechanics of a printer dictate what goes in your text file is like always having to deal with Ethernet and IP packet headers whenever you send or receive data over a network interface. Multics was smart enough to realize this and invented the idea of device drivers that would do whatever was required to get a new line onto the printed output.

    Conceptually a line end was represented by a single action since the very early typewriters. One typed a line and then swiped the big carriage return lever over to the left. That single action got you to the left of the page and also advanced the paper by one line. So Multics was only following on the ideas of the ancient typewriter.

    MS-DOS got its crude idea of line endings from CP/M. CP/M was very small, rude and crude and behind the state of the ideas in software engineering by ten years or so.

    So here we are with Windows still emulating a teletype four decades later and annoying everyone in the process. Thank you MS.

    Writing your own program to "fix" unix files is kind of pointless. There are programs that already do this, as pointed out above. If you use a half decent editor it will sort out the line endings for you, or you can tell it to convert and save as you like. I suggest Sublime Text, then you get to use the same editor and all its great features and plugins on Linux, Mac and Windows.

    In general I would suggest not "fixing" files. What if you make useful changes to it and want to submit them back to Phil? Then you have given him a problem to deal with. I have seen plenty of messed up source codes where both styles are used.

    Aside: Famously the late George Marsaglia published his "Die Hard" statistical tests for randomness along with millions of random numbers on a CD. Sadly he copied the random binary files using MSDOS as text mode files (forgot the /b switch) and all the 0d became 0d0a. That meant his binary numbers were not as statistically random as he intended. Thank you MS.

    Of course, this is all small fry. Today we have unicode. A gigantic text representation mess that is impossible to deal with and is slowly corrupting all documents as they move around the net.
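
The aside about binary files copied as text applies to program code as well as to the DOS copy command. As a hedged illustration in standard C++ (the function name `copy_binary` is hypothetical), opening both streams with `std::ios::binary` asks the runtime not to translate newline bytes. On Windows, a stream opened without that flag is a text-mode stream, and a written 0x0A comes out as 0x0D 0x0A.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Copy src to dst byte for byte.
// std::ios::binary turns off newline translation, so bytes such as
// 0x0A, 0x0D and 0x1A pass through unchanged on every platform.
bool copy_binary(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::binary);
    if (!in) {
        return false;  // source could not be opened
    }
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;  // destination could not be opened
    }
    // Read the whole file into memory; fine for a sketch, not for huge files.
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}
```

Reading the whole file into a string keeps the sketch short. A real tool would more likely copy in fixed-size chunks.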
  • idbruceidbruce Posts: 6,197
    edited 2015-02-20 03:47
    Writing your own program to "fix" unix files is kind of pointless. There are programs that already do this, as pointed out above.

    Heater, that was a silly comment.

    I can understand that way of thinking for people that do not write programs, but if I can write my own, why wouldn't I? I know what is in there, I know it will be safe, and I know it doesn't cost me anything, plus the added benefit of keeping up with practice coding. Simple code, no sweat, no worries.
    void CNewLineModifierDlg::OnSelectFile() 
    {
    	
    	CFileDialog dlg(TRUE, NULL, NULL, NULL,
    		_T("All Files(*.*)|*.*||"));
    
    	dlg.m_ofn.lpstrTitle = _T("Select A File To Edit");
    
    	if(dlg.DoModal() == IDOK)
    	{
    		SetDlgItemText(IDC_FILE_PATH, dlg.GetPathName());		
    	}
    }
    
    void CNewLineModifierDlg::OnModifyFile() 
    {
    	CString strFilePath;
    	CString strFileline;
    	CString strFilelines;
    	CStdioFile Stdiofile;
    
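    	// ReadString() below reads at most 239 characters per call, so this buffer is large enough.
    	TCHAR szFileLine[MAX_PATH] = _T("");
    
    	GetDlgItemText(IDC_FILE_PATH, strFilePath);	
    
    	// CStdioFile reads in text mode by default, so existing CR+LF pairs arrive as a lone LF.
    	if(!Stdiofile.Open(strFilePath, CFile::modeRead))
    	{
    		MessageBox("Unable to open the file for reading!");
    		return;
    	}
    
    	while(Stdiofile.ReadString(szFileLine, 240))
    	{
    		strFilelines += (LPCTSTR)szFileLine;
    	}
    
    	Stdiofile.Close();
    
    	// Every line ending is a lone LF at this point, so each one becomes CR+LF.
    	strFilelines.Replace("\n", "\r\n");
    
    	// In an ANSI (non-Unicode) build, GetLength() is also the byte count.
    	int nBufferSize = strFilelines.GetLength();
    	LPTSTR lpBuffer = strFilelines.LockBuffer();
    
    	// modeCreate truncates the existing file, so no seek is needed before writing.
    	CFile File;
    	if(!File.Open(strFilePath, CFile::modeReadWrite | CFile::modeCreate))
    	{
    		strFilelines.UnlockBuffer();
    		MessageBox("Unable to open the file for writing!");
    		return;
    	}
    	File.Write(lpBuffer, nBufferSize);
    	File.Close();
    
    	strFilelines.UnlockBuffer();
    
    	MessageBox("Modification is complete!");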
    }
    
  • Heater.Heater. Posts: 21,230
    edited 2015-02-20 04:57
    idbruce,
    Silly comment..
    Me? No, never :)
    I can understand that way of thinking for people that do not write programs...
    In my experience programmers are very lazy. They do not want to be rewriting all the libraries, tools, utilities and operating systems they use unless there is a good reason to do so. In fact programmers work very hard at being lazy, getting all those things to be reusable and portable.
    ...if I can write my own, why wouldn't I?
    You're supposed to be working on the Tea Cup port :)

    What's wrong with using just:
    perl -0pe 's/[\n]/\r\n/g' < unix.txt > dos.txt
    
    A quick, simple, portable solution.

    EDIT: I bet Phil never expected me to ever post a Perl one liner !
  • idbruceidbruce Posts: 6,197
    edited 2015-02-20 05:20
    I bet Phil never expected me to ever post a Perl one liner !

    LOL Me either. If I take a break from Teacup, I may take a peek at that one line solution. :)
  • Heater.Heater. Posts: 21,230
    edited 2015-02-20 05:39
    You are right Bruce, "practice coding". I had to dink around with 'tr' and 'sed' before realizing the Perl version was the way to go. Even that is a pointless waste of time as I have unix2dos/dos2unix here. Still substituting text in a file is a common requirement so it's good to remember how to do it.
  • idbruceidbruce Posts: 6,197
    edited 2015-02-20 05:56
    You are right Bruce, "practice coding".

    Besides all this talk about editors.... Over the last several days, I have been battling the urge to create my own editor.... And it would be good practice, creating a nice interface, working with colors, text formatting, etc.... However, I already have too many irons in the fire and I certainly don't need more distractions.

    I really want to create my own IDE, but I must remain focused.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-02-20 08:18
    heater wrote:
    I bet Phil never expected me to ever post a Perl one liner !
    Especially not one containing a regex!! Are you feeling okay?
    ____________________

    One does have to be careful about simple-minded wholesale changes to a source file. What if the file were to contain multi-line quoted strings? (My Perl code does, BTW; although in this case, changing the quoted line endings is probably harmless.) It's important to distinguish between line endings that are semantically insignificant and those that are part of the content. For one thing, adding CRs to a multiline quoted string changes the length of the string, which could be detrimental to some deeply-embedded program logic. Also, if a program includes a regex that operates on such a string, and it's not expecting to find CRs in it, that could throw off the logic of the regex.

    So be careful, Bruce. If you want to do this right, your conversion program needs to do a deeper syntactical analysis to avoid changing line endings that should not be changed.

    -Phil
  • potatoheadpotatohead Posts: 10,261
    edited 2015-02-20 08:36
    The botched change is one reason why I personally would just fetch a program designed for the purpose. Linked here, as what I've always used when I'm not using an editor capable of it.

    Re: Sublime. Pure awesome. Recommended. Great program.

    As for, "it doesn't cost anything...."

    But it does! Everything costs something. Heck, it cost me maybe $0.20 to author this. Does for you guys too, though your basic economic burden may vary. That cost may see a benefit, such as practice as mentioned. If so, then it's good, but there are always costs. Always.

    I very strongly recommend thinking that through. There are time, materials, and project opportunity cost benefits for doing so. But that's just me. YMMV
  • Heater.Heater. Posts: 21,230
    edited 2015-02-20 09:24
    Phil,
    Especially not one containing a regex!! Are you feeling okay?
    Sometimes I hide my lights under so many bushels even I can't find them! I'm a closet perl user, please don't tell anyone.

    No, I'm not feeling well, just recently I have been playing with parser generators. Like grown up regexps:)
    All that recursive descent expression parsing and operator precedence level handling gives me a headache.
    What if the file were to contain multi-line quoted strings?
    I found that idea disturbing enough that I had to try it.

    Here is my test perl code with multi-line string:
    use strict;
    use warnings;
     
    my $message = <<'END_MESSAGE';
    AAAAAAAAAA
    AAAAAAAAAA
    AAAAAAAAAA
    AAAAAAAAAA
    AAAAAAAAAA
    AAAAAAAAAA
    END_MESSAGE
     
    print $message;
    print length($message);
    

    Here is what happens when I run it as both unix and DOS format:
    $ dos2unix foo.perl 
    dos2unix: converting file foo.perl to Unix format ...
    $ file foo.perl 
    foo.perl: ASCII text
    $ perl foo.perl > outUnix.txt
    $ unix2dos foo.perl 
    unix2dos: converting file foo.perl to DOS format ...
    $ file foo.perl 
    foo.perl: ASCII text, with CRLF line terminators
    $ perl foo.perl > outDos.txt
    $ ls -l out*
    -rw-r--r-- 1 michael michael 68 Feb 20 19:10 outDos.txt
    -rw-r--r-- 1 michael michael 68 Feb 20 19:09 outUnix.txt
    $ diff outUnix.txt outDos.txt 
    $
    
    Seems perl is clever enough not to worry about its own source code being in DOS or unix format.
    The output string is always the same (unix format).

    If you run that the reported string lengths are of course the same.

    Now: The question is what result do you get when you run that on a Windows machine? Does it then output DOS format? Anyone going to try it for me, I have no Windows machines here.

    So, Bruce, sounds like you need not worry about that conversion.
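
The numbers in that test can be checked by hand. The sketch below, in standard C++ with a made-up helper name `repeated_lines`, builds the same six lines of ten characters and shows how long the string is with LF endings and how long it would be if the CRs from a DOS-format source file had been kept in the string.

```cpp
#include <cstddef>
#include <string>

// Build `lines` copies of a ten-character line of 'A's,
// each one terminated with the given end-of-line sequence.
std::string repeated_lines(std::size_t lines, const std::string& eol)
{
    std::string block;
    for (std::size_t i = 0; i < lines; ++i) {
        block += std::string(10, 'A');
        block += eol;
    }
    return block;
}
```

`repeated_lines(6, "\n").size()` is 66 and `repeated_lines(6, "\r\n").size()` is 72. Both output files in the listing are 68 bytes, which is consistent with 66 bytes of message followed by the two digits of the printed length "66". That suggests perl dropped the CRs from the here-document in the DOS-format run, which is the kind of length difference Phil's warning is about in cases where the CRs are kept.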
  • Heater.Heater. Posts: 21,230
    edited 2015-02-20 09:39
    idbruce,
    Over the last several days, I have been battling the urge to create my own editor..
    Stop immediately. Turn off your computer. Go out and get drunk for a week. Or alternatively pull up the source code to emacs and start studying it. That should cure you of this notion.

    The world has thousands of editors already. I'm sure writing a nice editor is not easy. When you throw in all the work you will need for syntax highlighting plug-ins and all the other amazing things editors do nowadays you have a life long project.

    Perhaps, though, using a ready made text editor component in a Qt application or a web-based editor like CodeMirror is a useful thing to learn. Text editing is required in many applications that are not primarily editors.

    I'm with Potatohead, use Sublime Text. It's an amazing program. Works on Linux, Mac and Windows. It's fast. It has a huge selection of plugins to do almost anything you might want.

    And here is another shocker for the forum from me. Despite my Open Source and Free Software advocacy, Sublime is the only closed source program I have actually paid money for and felt that I got my money's worth!
  • idbruceidbruce Posts: 6,197
    edited 2015-02-20 10:25
    Phil
    One does have to be careful about simple-minded wholesale changes to a source file. What if the file were to contain multi-line quoted strings? (My Perl code does, BTW; although in this case, changing the quoted line endings is probably harmless.) It's important to distinguish between line endings that are semantically insignificant and those that are part of the content. For one thing, adding CRs to a multiline quoted string changes the length of the string, which could be detrimental to some deeply-embedded program logic. Also, if a program includes a regex that operates on such a string, and it's not expecting to find CRs in it, that could throw off the logic of the regex.

    So be careful, Bruce. If you want to do this right, your conversion program needs to do a deeper syntactical analysis to avoid changing line endings that should not be changed.

    Yea, I imagine there are probably several instances where it could cause unwanted or unnecessary grief.

    Heater
    When you throw in all the work you will need for syntax highlighting plug-ins and all the other amazing things editors do nowadays you have a life long project.

    I have worked quite extensively on a couple of editors, and you are correct, they are a lot of work. In fact, I was once trying to duplicate the exact behavior of MS Word and I think I came pretty darn close, but something about OLE containers was giving me problems, such as not getting the full functionality that I wanted or needed, so I just threw it on the back burner.
  • cgraceycgracey Posts: 14,237
    edited 2015-09-18 22:01
    Heater. wrote: »
    And here is another shocker for the forum from me. Despite my Open Source and Free Software advocacy, Sublime is the only closed source program I have actually paid money for and felt that I got my money's worth!

    Someone mentioned this on the Propeller 2 forum yesterday and I was intrigued, so I downloaded it. It is fantastic! I went and paid the $70 for the license and grabbed a SystemVerilog setup for it and it has made a world of difference.

    It's really smart about things and very compliant with whatever else you've got going on. In my case, it recognized, on its own, that my file had tab stops at every 4th position. It just figured it out and went with it, without clobbering everything with spaces, or something. It preserves whatever scheme you've got going. It also automatically reloads open files that were written elsewhere. Without even knowing anything about its fancy capabilities, it's really cooperative with my work flow.

    Another nice thing, it has many dark-background color schemes which I need for these 4k monitors that I sit in front of, but it doesn't ruin the quiet effect with blindingly bright white status windows all over the place. It's very subtle. Also, they use really nice anti-aliasing on the text rendering.

    Here is what I get to look at now. Notice the zoomed-out view that stays on the right side, always letting you see the greater context:

    [Attached image: SublimeText.png]

    I haven't felt this excited about an application since I got Copy ][+ for my old Apple ][ in the early 80's. That was another really good program that just clicked.
  • It was me, and yes, I love that program. Somebody thought through the more subtle aspects of text editing and nailed it.

    Yeah, I used it maybe 30 minutes and paid right up.

    Sure wish a Pi port would show up one day.

  • I have often had trouble using NotePad on code that was written on a Mac because the lines would sometimes be all jumbled together.

    I was thrilled to discover that if I open the file in WordPad instead then the formatting is perfect.
    Then I can Copy and Paste it over to NotePad and play with it.
    I like to copy notes into programs and using NotePad I don't need to worry about the Compiler thinking my notes are lines of code.

    Back in the DOS days a good text editor was worth its weight in gold.
  • WhitWhit Posts: 4,191
    I wish I was smart enough to understand all this...
  • Heater.Heater. Posts: 21,230
    Great stuff Chip.

    "that my file had tab stops at every 4th position"

    I hope that is using spaces not TABS....ducks head :)
  • It's really smart about things and very compliant with whatever else you've got going on. In my case, it recognized, on its own, that my file had tab stops at every 4th position. It just figured it out and went with it, without clobbering everything with spaces, or something. It preserves whatever scheme you've got going.

    I believe that is essential for an editor.
  • There is some saying that a real man has to plant a tree, produce a son and write an editor.

    I did that editor thing in COBOL. So not really comparable with GUI things nowadays. Quite challenging at the time of big iron.

    Enjoy!

    Mike