ATTN: WINDOWS Users - Get Rid Of Them Squares
idbruce
Posts: 6,197
First off, let me just say that I am not picking on Phil. I just had his code handy to use as an example.
Over the years I have downloaded many files, and many of those files were not completely compatible with a WINDOWS OS, especially when it comes to newlines and carriage returns. When the log files for one of my websites started arriving with embedded squares, and in total disarray, I finally had to do something, so I wrote a program to make these files WINDOWS compatible.
As you may have guessed, I have altered this program several times to accept a variety of file extensions, so I finally decided today to rewrite the program, to accept any type of file. Since it is a very handy program to have available, I decided I would share it with the Parallax community. However, I am not certain which WINDOWS operating systems it will work on, so all I can say is test it and see if it works.
In the images below, you will see before and after snapshots, taken of Phil's SpinScope PERL code.
Comments
Yes, WINDOWS requires CR + LF - E.G.: strFilelines.Replace("\n", "\r\n");
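One caveat with a plain replace of "\n" by "\r\n": a file that is already in CR+LF form ends up with doubled carriage returns. Below is a minimal Python sketch of a conversion that avoids this by normalizing first. The function names `to_crlf` and `convert_file` are made up for illustration, the UTF-8 encoding is an assumption, and treating a lone CR as a line ending is a design choice that may not suit every file.

```python
def to_crlf(text: str) -> str:
    """Return text with every line ending written as CR+LF."""
    # Collapse existing CR+LF pairs and lone CRs to LF first,
    # so input that is already in Windows format is not doubled up.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "\r\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with CR+LF line endings.

    Assumes the file is UTF-8 text (an assumption, not from the thread).
    """
    # newline="" turns off Python's own newline translation on read and write.
    with open(path, "r", newline="", encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(to_crlf(text))
```

Running `to_crlf` twice on the same text gives the same result as running it once, which is what makes it safe to apply to files of unknown origin.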
Never knew that.... It works well. However, there is one more step involved, thus making it more time consuming. I suppose my program would be more useful if it allowed multiple file selection, such as editing the contents of an entire folder, regardless of file extension.
EDIT: I will have to update that option at a later date, unless of course there is some interest, then perhaps I could be persuaded to make the necessary modifications.
Windows: CR+LF
Linux: LF
Classic Mac OS: CR (OS X and later use LF)
See? You can't please everyone!
idbruce: If you used a modern programming editor like UltraEdit (highly recommended, BTW), you could convert back and forth among multiple line-termination standards with ease.
-Phil
That is one nice editor, with a lot of features. I always just wrote my PERL scripts with Notepad. Oh what a joy that was.
I have always hosted my websites on UNIX based machines. Funny thing is, I would write the scripts in Notepad and upload them via FTP. If I ever FTP'ed them back, they were still in WINDOWS style. At least that is how it was with this one provider, back when I was writing scripts. At this point, it has been a while since I have written any serious scripts, so I don't know how they would turn out with my current providers.
By the way, just love that image data..... New one on me.
http://waterlan.home.xs4all.nl/dos2unix.html
Carriage return = move the print head (cursor on the terminal) to the beginning of the current line.
Line feed = move the page up (or down depending on your point of view) one line.
If you want to be at the beginning of the next line, you have to CR + LF.
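To see which convention a given file actually uses, the three cases can be counted directly from the raw bytes. This is a rough Python sketch; the function name `count_line_endings` is hypothetical, and it assumes a single-byte-compatible encoding such as ASCII or UTF-8.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR+LF pairs, lone LFs, and lone CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CR+LF pair contains one CR and one LF, so subtract the pairs
    # from the totals to get the bytes that stand alone.
    lone_lf = data.count(b"\n") - crlf
    lone_cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}
```

Typical use would be `count_line_endings(open(path, "rb").read())`. More than one non-zero count suggests a file with mixed line endings.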
Sadly, IBM was highly influential in ASCII, but chose to implement EBCDIC on their systems. So, NL was something IBM introduced with EBCDIC; it didn't exist beforehand. And, had they assigned VT (vertical tab) to the same code LF used, there wouldn't be much of a problem, other than converting *nix documents to proper formatting.
It's more complicated than that, but you want to talk about manipulating the market for their benefit?
It's obviously a good idea. Having CR and LF was only a hang over from the mechanical requirements of the teletypes and printers and has no place in the semantics of the text you are writing.
Having the mechanics of a printer dictate what goes in your text file is like always having to deal with Ethernet and IP packet headers whenever you send or receive data over a network interface. Multics was smart enough to realize this and invented the idea of device drivers that would do whatever was required to get a new line onto the printed output.
Conceptually a line end was represented by a single action since the very early typewriters. One typed a line and then swiped the big carriage return lever over to the left. That single action got you to the left of the page and also advanced the paper by one line. So Multics was only following on the ideas of the ancient typewriter.
MS-DOS got its crude idea of line endings from CP/M. CP/M was very small, rude and crude and behind the state of the ideas in software engineering by ten years or so.
So here we are with Windows still emulating a teletype four decades later and annoying everyone in the process. Thank you MS.
Writing your own program to "fix" unix files is kind of pointless. There are programs that already do this, as pointed out above. If you use a half decent editor it will sort out the line endings for you, or you can tell it to convert and save as you like. I suggest Sublime Text, then you get to use the same editor and all its great features and plugins on Linux, Mac and Windows.
In general I would suggest not "fixing" files. What if you make useful changes to it and want to submit them back to Phil? Then you have given him a problem to deal with. I have seen plenty of messed up source code where both styles are used.
Aside: Famously the late George Marsaglia published his "Diehard" statistical tests for randomness along with millions of random numbers on a CD. Sadly he copied the random binary files using MSDOS as text mode files (forgot the /b switch) and all the 0d became 0d0a. That meant his binary numbers were not as statistically random as he intended. Thank you MS.
Of course, this is all small fry. Today we have Unicode. A gigantic text representation mess that is impossible to deal with and is slowly corrupting all documents as they move around the net.
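The general failure mode is easy to reproduce. The Python sketch below simulates a text-mode copy that expands each LF byte (0x0A) to CR+LF, which is an assumption about how such a translation behaves and not a reconstruction of what happened to that CD. The helper name `text_mode_expand` is made up for illustration.

```python
def text_mode_expand(data: bytes) -> bytes:
    """Simulate a text-mode copy that writes every LF byte (0x0A) as CR+LF."""
    return data.replace(b"\n", b"\r\n")


# One of each possible byte value, standing in for a binary data file.
original = bytes(range(256))
damaged = text_mode_expand(original)

print(len(original), len(damaged))                  # 256 257
print(original.count(b"\r"), damaged.count(b"\r"))  # 1 2
```

The copy is one byte longer and the value 0x0D now appears twice as often as any other byte, which is exactly the sort of bias a randomness test is built to catch.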
Heater, that was a silly comment.
I can understand that way of thinking for people that do not write programs, but if I can write my own, why wouldn't I? I know what is in there, I know it will be safe, and I know it doesn't cost me anything, plus the added benefit of keeping up with practice coding. Simple code, no sweat, no worries.
Me? No, never. In my experience programmers are very lazy. They do not want to be rewriting all the libraries, tools, utilities and operating systems they use unless there is a good reason to do so. In fact programmers work very hard at being lazy, getting all those things to be reusable and portable. You're supposed to be working on the Tea Cup port.
What's wrong with using just: A quick, simple, portable solution.
Edit: I bet Phil never expected me to ever post a Perl one-liner!
LOL Me either. If I take a break from Teacup, I may take a peek at that one line solution.
Besides all this talk about editors.... Over the last several days, I have been battling the urge to create my own editor.... And it would be good practice, creating a nice interface, working with colors, text formatting, etc.... However, I already have too many irons in the fire and I certainly don't need more distractions.
I really want to create my own IDE, but I must remain focused.
____________________
One does have to be careful about simple-minded wholesale changes to a source file. What if the file were to contain multi-line quoted strings? (My Perl code does, BTW; although in this case, changing the quoted line endings is probably harmless.) It's important to distinguish between line endings that are semantically insignificant and those that are part of the content. For one thing, adding CRs to a multiline quoted string changes the length of the string, which could be detrimental to some deeply-embedded program logic. Also, if a program includes a regex that operates on such a string, and it's not expecting to find CRs in it, that could throw off the logic of the regex.
So be careful, Bruce. If you want to do this right, your conversion program needs to do a deeper syntactical analysis to avoid changing line endings that should not be changed.
-Phil
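Both of those effects can be shown in a few lines. This is a Python sketch standing in for the Perl case, with made-up strings, and it builds the text directly in memory. Whether the same thing bites a real program depends on how the language reads its own source file.

```python
import re

unix_text = "first line\nsecond line"
dos_text = unix_text.replace("\n", "\r\n")

# Adding a CR before the LF makes the string one character longer.
print(len(unix_text), len(dos_text))  # 22 23

# With MULTILINE, "$" matches just before an LF or at the end of the string.
# In the DOS version the first "line" is followed by a CR, so it no longer matches.
pattern = re.compile(r"line$", re.MULTILINE)
unix_hits = len(pattern.findall(unix_text))
dos_hits = len(pattern.findall(dos_text))
print(unix_hits, dos_hits)  # 2 1
```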
Re: Sublime. Pure awesome. Recommended. Great program.
As for, "it doesn't cost anything...."
But it does! Everything costs something. Heck, it cost me maybe $0.20 to author this. Does for you guys too, though your basic economic burden may vary. That cost may see a benefit, such as practice as mentioned. If so, then it's good, but there are always costs. Always.
I very strongly recommend thinking that through. There are time, materials, and project opportunity cost benefits for doing so. But that's just me. YMMV
No, I'm not feeling well, just recently I have been playing with parser generators. Like grown up regexps:)
All that recursive descent expression parsing and operator precedence level handling gives me a headache. I found that idea disturbing enough that I had to try it.
Here is my test perl code with multi-line string:
Here is what happens when I run it as both unix and DOS format:
Seems perl is clever enough not to worry about its own source code being in DOS or unix format.
The output string is always the same (unix format).
If you run that the reported string lengths are of course the same.
Now: The question is what result do you get when you run that on a Windows machine? Does it then output DOS format? Anyone going to try it for me, I have no Windows machines here.
So, Bruce, sounds like you need not worry about that conversion.
The world has thousands of editors already. I'm sure writing a nice editor is not easy. When you throw in all the work you will need for syntax highlighting, plug-ins and all the other amazing things editors do nowadays, you have a lifelong project.
Perhaps, though, using a ready made text editor component in a Qt application or a web-based editor like CodeMirror is a useful thing to learn. Text editing is required in many applications that are not primarily editors.
I'm with Potatohead, use Sublime Text. It's an amazing program. Works on Linux, Mac and Windows. It's fast. It has a huge selection of plugins to do almost anything you might want.
And here is another shocker for the forum from me. Despite my Open Source and Free Software advocacy, Sublime is the only closed source program I have actually paid money for and felt that I got my money's worth!
Yea, I imagine there are probably several instances where it could cause unwanted or unnecessary grief.
Heater
I have worked quite extensively on a couple of editors, and you are correct, they are a lot of work. In fact, I was once trying to duplicate the exact behavior of MS Word and I think I came pretty darn close, but I think there was something about OLE containers that were giving me some problems, such as not getting the full functionality that I wanted or needed, so I just threw it on the back burner.
Someone mentioned this on the Propeller 2 forum yesterday and I was intrigued, so I downloaded it. It is fantastic! I went and paid the $70 for the license and grabbed a SystemVerilog setup for it and it has made a world of difference.
It's really smart about things and very compliant with whatever else you've got going on. In my case, it recognized, on its own, that my file had tab stops at every 4th position. It just figured it out and went with it, without clobbering everything with spaces, or something. It preserves whatever scheme you've got going. It also automatically reloads open files that were written elsewhere. Without even knowing anything about its fancy capabilities, it's really cooperative with my work flow.
Another nice thing, it has many dark-background color schemes which I need for these 4k monitors that I sit in front of, but it doesn't ruin the quiet effect with blindingly bright white status windows all over the place. It's very subtle. Also, they use really nice anti-aliasing on the text rendering.
Here is what I get to look at now. Notice the zoomed-out view that stays on the right side, always letting you see the greater context:
I haven't felt this excited about an application since I got Copy ][+ for my old Apple ][ in the early 80's. That was another really good program that just clicked.
Yeah, I used it maybe 30 minutes and paid right up.
Sure wish a Pi port would show up one day.
I was thrilled to discover that if I open the file in WordPad instead then the formatting is perfect.
Then I can Copy and Paste it over to NotePad and play with it.
I like to copy notes into programs and using NotePad I don't need to worry about the Compiler thinking my notes are lines of code.
Back in the DOS days a good text editor was worth its weight in gold.
"that my file had tab stops at every 4th position"
I hope that is using spaces not TABS....ducks head
I believe that is essential for an editor.
I did that editor thing in COBOL. So not really comparable with GUI things nowadays. Quite challenging at the time of big iron.
Enjoy!
Mike