Line endings, UTF-16, tabs and other formatting Holy Wars. - Page 2 — Parallax Forums

Comments

  • __red__ Posts: 470
    edited 2014-08-14 08:27
    jazzed wrote: »
    Why should anyone need to go through that? What possible advantage beyond byte frugality is there in having to do it?

    The reason is choice. Some people like 2 space formatting, others 4, others 8. If you use a single tab then people can choose their own value for display.

    Separation of content and presentation is a design principle.
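
A minimal sketch of the claim above, assuming a made-up three-line snippet indented with exactly one tab per level. Python's built-in `str.expandtabs` stands in for a viewer's tab-width setting; `SOURCE` and `render` are names invented for this illustration.

```python
# Hypothetical snippet: one tab character per indentation level.
SOURCE = "if ready:\n\tfor item in queue:\n\t\tsend(item)\n"


def render(text: str, tab_width: int) -> str:
    """Show how a viewer with the given tab width would display the text."""
    return "\n".join(line.expandtabs(tab_width) for line in text.splitlines())


for width in (2, 4, 8):
    print(f"--- tab width {width} ---")
    print(render(SOURCE, width))
```

When tabs are used only for leading indentation, the nesting stays consistent at every width. The sketch does not cover files that mix tabs and spaces.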
  • jazzed Posts: 11,803
    edited 2014-08-14 08:44
    __red__ wrote: »
    The reason is choice. Some people like 2 space formatting, others 4, others 8. If you use a single tab then people can choose their own value for display.

    Separation of content and presentation is a design principle.

    I'll think about that. However, my experience has been that it doesn't work, because personal choices end up committed in a repository used by hundreds of people.
  • potatohead Posts: 10,261
    edited 2014-08-14 10:05
    Why not upload a style document then?
  • ksltd Posts: 163
    edited 2014-08-14 11:39
    The debate between spaces and tabs was resolved in the 70s. The solution is a language sensitive editor that scans, parses and pretty prints as you type.

    If you've been fortunate enough to ever have used such a beast, the pain of using character editors for creating, reading or maintaining software just seems absurd.
  • Dave Hein Posts: 6,347
    edited 2014-08-14 13:00
    In a previous job I worked on code that contained mixtures of tab/space indentation, DOS/Unix line termination and various preferences for the placement of braces -- all within the same source file. Whenever I needed to change one of these files I would first run it through the indent utility to produce a file with my own preferences. This took a file that looked like something from an obfuscated code contest, and turned it into something that was readable. However, once I was ready to check in my changes I incorporated them back into the original obfuscated-looking source file to minimize the number of differences generated in the repository.
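
A rough way to see whether a file is in the state described above is to count its line-ending and indentation styles. This is only a sketch: `whitespace_report` and the `sample` bytes are invented for illustration and are not part of the indent utility or any tool named in the thread.

```python
from collections import Counter


def whitespace_report(data: bytes) -> dict:
    """Count line-ending and leading-indentation styles in raw file contents."""
    endings = Counter()
    indents = Counter()
    for line in data.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            endings["CRLF"] += 1
        elif line.endswith(b"\n"):
            endings["LF"] += 1
        elif line.endswith(b"\r"):
            endings["CR"] += 1
        body = line.rstrip(b"\r\n")
        # Leading run of spaces and tabs on this line.
        lead = body[: len(body) - len(body.lstrip(b" \t"))]
        if b"\t" in lead and b" " in lead:
            indents["mixed"] += 1
        elif b"\t" in lead:
            indents["tabs"] += 1
        elif b" " in lead:
            indents["spaces"] += 1
    return {"endings": dict(endings), "indents": dict(indents)}


# Made-up file contents mixing DOS/Unix endings and tab/space indentation.
sample = b"int main(void)\r\n{\n\treturn 0;\r\n    /* done */\n \t}\n"
print(whitespace_report(sample))
```

More than one key under `endings` or `indents` means the file mixes conventions.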
  • Heater. Posts: 21,230
    edited 2014-08-15 06:18
    @__red__
    __red__ wrote: »
    You can set tabstop in GitHub's web interface to 4 (as Chip uses) like this.... I believe that the reason you had cut and paste issues is because your browser or source-code viewer had tabstop set to 8.....
    I have cut and pasted below from the tab'd version with my editor set tabstop=4... So, if you have your tools configured appropriately for tab usage it works exactly as it should.

    I am very glad to hear that you have finally seen sense and agree that tabs are broken.

    What you have done there is set up all your tools so as to filter out tabs and replace them with spaces. Then when you copy and paste the code around here, there, and everywhere you get the intended result.

    So why not just ban tabs at source? Then you never have to worry about setting tab widths again, you never have to recheck those settings on every tool you use, you never have to scrutinize the result to make sure it came out OK, you never have problems with contributors editing your code with incorrect tabbing and submitting patches that mess everything up, and you never have diffs full of redundant formatting changes because some halfwit has "repaired" your white space, again.
    __red__ wrote: »
    To be clear, it's no dig on you to say you don't have it configured properly for tabs because you don't use them :-)

    I do have my editors configured properly, thank you very much. When I hit the tab key in my editor it inserts the number of spaces required to get the cursor to the next tab stop position. Tabs are set to 4. When I save my files there are no tabs in there.
    __red__ wrote: »
    The reason is choice. Some people like 2 space formatting, others 4, others 8. If you use a single tab then people can choose their own value for display.

    Except that does not work, and it results in people screwing up your formatting because they were not viewing it correctly in the first place.
    __red__ wrote: »
    Separation of content and presentation is a design principle.

    Exactly. Well said.

    So don't go putting any presentational markup, like tabs, in the source code.

    The meaning of a tab is totally dependent on the viewer, the presentation. Tabs might not even mean insert N spaces or move to the next multiple of N position on the line. A tab means "move to the next tab stop", which could be at any distance away, and each tab stop can be a different distance from the last or any other on the line. Further: tab stops need not be the same on all lines!
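
The two meanings of "tab" in this post can be sketched side by side. The function names, the stop columns `[4, 7, 16]` and the fallback rule are all assumptions made for illustration, not the behaviour of any particular editor. `spaces_to_next_stop` is the soft-tab arithmetic described earlier in the post. `expand_with_stops` treats a tab as "move to the next stop" for an arbitrary list of stops.

```python
from typing import Sequence


def spaces_to_next_stop(column: int, tab_width: int = 4) -> int:
    """Spaces a soft-tab editor inserts to reach the next stop (0-based column)."""
    return tab_width - (column % tab_width)


def expand_with_stops(line: str, stops: Sequence[int], fallback: int = 8) -> str:
    """Expand tabs using explicit stop columns, typewriter style.

    Past the last explicit stop, stops are assumed to repeat every
    `fallback` columns, purely so the function always has an answer.
    """
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            target = next((s for s in stops if s > col), None)
            if target is None:
                target = (col // fallback + 1) * fallback
            out.append(" " * (target - col))
            col = target
        else:
            out.append(ch)
            col += 1
    return "".join(out)


print(spaces_to_next_stop(0), spaces_to_next_stop(5))
print(repr(expand_with_stops("a\tb\tc", [4, 7, 16])))
```

The same bytes produce different layouts depending on which stops the viewer happens to use, which is the point being argued here.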

    I don't care how you present the thing, use a wingdings font in pink with silver glitter if you like, but don't put presentational markup, like tabs, in there.

    @ksltd,
    ksltd wrote: »
    The debate between spaces and tabs was resolved in the 70s.

    It was? Seems nobody told anyone. I only discovered this debate in the late 1970s and it has been raging all around ever since.
    ksltd wrote: »
    The solution is a language sensitive editor that scans, parses and pretty prints as you type.

    If you've been fortunate enough to ever have used such a beast, the pain of using character editors for creating, reading or maintaining software just seems absurd.

    Can you name one such beast? I'm curious as I may well have used such a thing.

    The issue is not how it looks whilst you edit. The issue is the damage that goes on to the source which messes things up for many other tools as we have seen.
  • Heater. Posts: 21,230
    edited 2014-08-15 06:20
    Dave,
    Dave Hein wrote: »
    Whenever I needed to change one of these files I would first run it through the indent utility to produce a file with my own preferences. This took a file that looked like something from an obfuscated code contest, and turned it into something that was readable. However, once I was ready to check in my changes I incorporated them back into the original obfuscated-looking source file to minimize the number of differences generated in the repository.

    Now I understand why such files get more and more obfuscated as time goes by :)
  • Dave Hein Posts: 6,347
    edited 2014-08-15 06:48
    The Dilbert comic strip from August 12 shows how most engineers feel about working on someone else's code. We would rather rewrite it from scratch instead of trying to understand the convoluted logic used by the previous developer. :) However, the reality is that code changes are usually made to fix a specific issue, and large changes can introduce regression problems. Quite often this forces us to minimize the changes to as small a change as possible. Reformatting the original file causes a "diff" command to show that the whole file has changed instead of just a few lines. That's why it's best to keep the original formatting.
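
The effect on diffs can be reproduced with Python's standard `difflib`. This is a minimal sketch, assuming a made-up six-line C function. `changed_lines` is a helper invented here to count the added and removed lines in a unified diff.

```python
import difflib

# Made-up original file, indented with tabs.
original = [
    "int f(int x)\n",
    "{\n",
    "\tif (x)\n",
    "\t\treturn 1;\n",
    "\treturn 0;\n",
    "}\n",
]

# Case 1: a one-line fix that keeps the original formatting.
minimal = list(original)
minimal[3] = "\t\treturn 2;\n"

# Case 2: the same fix, checked in after re-indenting with 4 spaces.
reformatted = [line.replace("\t", "    ") for line in minimal]


def changed_lines(old, new):
    """Count '-' and '+' lines in a unified diff, ignoring the file headers."""
    diff = difflib.unified_diff(old, new, lineterm="")
    return sum(
        1
        for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    )


print(changed_lines(original, minimal))      # 2: one line removed, one added
print(changed_lines(original, reformatted))  # 6: every indented line shows up
```

In a real file, where nearly every line is indented, the second number approaches the size of the whole file and the one-line fix is lost in the noise.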
  • Heater. Posts: 21,230
    edited 2014-08-15 07:25
    Dave,

    All very true.

    I was just imagining some big scrambled looking source code file that has had many little changes over the years by different authors. Each one of which pretty printed the thing so they could read it, made their changes, then snipped out some small part as a patch for the original. Result: our file is now a mess of half a dozen different formatting styles and naming conventions (camelCase and underscores) ....

    It just seemed to me I had seen this many times before and you had just explained to me how it happened :)
  • jazzed Posts: 11,803
    edited 2014-08-15 11:30
    potatohead wrote: »
    Why not upload a style document then?


    A coding style goes a long way for an institution or organization to maintain their investment in the way they find productive. It is their right.

    Can we offer a coding style here? Sure, but adoption is up to the herd of cats. The "Gold Standard" OBEX didn't go very far did it? Those of us who have the skills have various quality requirements (my way or the highway in most cases), and probably disagreed with aspects of that "Gold Standard" document anyway (and such concerns were never discussed probably because they were unresolvable or the person required to deal with it was weak from fear or other circumstances).

    Beginners may not understand the importance of doing things a certain way, and actively trying to mold them may not be such a good idea anyway, unless they are paying you to teach them good practices.

    Beginner hobbyists (which is the vast untapped market fountain of the hobbyist population) don't know, care, or even want to know or care what a coding style is. That presents a problem that has to be solved passively. To me easy readability is directly related to coding style. All we can do is suggest something easy to do that ends up creating reproducible results within the tools. We can't expect people to read a manual and learn things, because in this world of copy/paste/smile they are simply not inclined to do it.

    RTFM is like a menu you hold in your hand, and you have to decide generally without any help on the choices made (except maybe the forum contributors which always have their own conflicting opinions anyway). Offering some easy formula even though it may be bland or otherwise distasteful to some customers makes things easier to grasp for all the others.

    McCode or McIDE (pardon the analogy Ray Kroc) should be easily consumable and should be the same whether you live in New York or Kansas City. I've found over and over again that any extra steps are seriously frowned upon ... and dealing with tabs requires unnecessary steps. McCode is a number 1 or a number 2 (take that however you like LOL), but the formula has worked successfully for a long time in the Kroc descendancy.

    When you introduce a menu (RTFM) and a waiter or waitress if available, things slow down and the instant satisfaction factor is lost (no more copy/paste/smile). That is: You only get one chance to make a good first impression, and that's what beginners need. When there is a menu and more options, the quality is better for each individual, but overall it is harder to use and more expensive to the business and the user in time and money than providing a generic solution.

    Yes, I would like to suggest a coding style especially if someone expects me to offer help by just giving them fish. My one rule is "If I can't read it, I won't look at it." (should add this to my .sig but the essence of that statement would likely not be easily understood by new users even if they did read it). If someone shows that they earnestly want to learn how to catch their own fish by offering a good presentation of ideas and what they may have done to try solving a problem themselves, I'm more charitable.

    However, I see no value in trying to cater to someone who won't use the tools I make anyway because of this or that current (or future) unmet subjective judgement conditions. Experienced users will do what they like, and I am happy with that. Experienced users already know how to catch fish using tools (a vast menu of options and their own FM) and usually don't want my help or anyone else's help.

    Asking experienced users to change some entrenched habit is an exercise in futility ... all I can do is offer an opinion and if they get it, great. If not, well, the amount of time I have to spend to fix something, or other factors, certainly influence the volume of any further remarks.
  • Heater. Posts: 21,230
    edited 2014-08-15 11:49
    Quite so Jazzed.

    Actually there is no need for a coding standard or a "style sheet".

    It's very easy: there is a body of code created by Chip in a very nice and consistent style. Contributors can submit patches following that style as a very good example. Or they can choose not to, or they can just accidentally get it wrong. No problem. Chip just has to say "That's a nice patch but I won't merge it until you get the formatting right."

    There is no debate about it.

    My only plea here is that yes the original code looks good so can we get rid of those brain damaged tabs so that it will continue to look good into the future. Make it easy for people to follow along.
  • potatohead Posts: 10,261
    edited 2014-08-15 12:39
    With GitHub, a published document makes a lot of sense. If new info ends up in there, non-compliant, the old info is still there, and a maintainer can always edit, update and then send back to the contributor, with a nice "Please", and the dynamics are nothing like those of OBEX gold.

    Secondly, I suggested exactly this for OBEX gold too.

    Once we have the code, and I will always argue for getting code over not, then we can always invest in it to comply for mutual benefit.

    If said investment does not make sense, then I submit either the code isn't worth it, or the mutual benefit isn't, and it could be both, in which case the code is backed out of the repository, or left as is.
  • Heater. Posts: 21,230
    edited 2014-08-15 12:45
    potatohead,

    The way I see it working is that bad code or badly formatted code simply never gets merged into the repo.

    The maintainer does not have all that "edit, update and then send back" work to do. He simply does not pull it and tells the author why.

    Linus Torvalds could never deal with the rate of change in Linux otherwise. That is why he wrote git. To remove all that busy work.
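
The "tell the author why" step could be as small as a whitespace check run before a pull. What follows is a hypothetical sketch, not a Git feature or anything the repository owner has adopted. The rule set (no tabs, no CRLF, no trailing whitespace) is just the spaces-only position argued in this thread, and `style_violations` and `patch` are invented names.

```python
def style_violations(text: str) -> list:
    """Return (line_number, problem) pairs for a hypothetical spaces-only rule set."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):
            problems.append((number, "CRLF line ending"))
            line = line[:-1]
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


# Made-up contribution containing one tab, one CRLF and one trailing-space line.
patch = "PUB start\n\tx := 1\r\n    y := 2  \n"
for number, problem in style_violations(patch):
    print(f"line {number}: {problem}")
```

An empty result means the patch passes these particular rules. A non-empty result is the list of reasons to hand back to the author.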
  • potatohead Posts: 10,261
    edited 2014-08-15 12:53
    Yes, exactly!

    I should have taken a bit more time with my post. And other code could be used to help with compliance too. Lots of options on this stuff.
  • jazzed Posts: 11,803
    edited 2014-08-15 12:57
    For a repository it comes down to this:

    If you want to contribute and expect your contribution to be accepted into a repository you must follow the rules (if any) set by the repository owner (benevolent dictator).

    It is up to the repository owners to make such rules. The owners have the right to do what makes sense to them.
  • Tor Posts: 2,010
    edited 2014-08-19 01:47
    UTF-16: Bad. Very much so.
    UTF-8: Fine. And necessary these days.
    TAB: They can work OK if, and only if, the tabstop is fixed at 8. One should remember where tabs came from. It was simply a quick way of setting up columns on a typewriter. There it made sense to be able to put down one tab at 4, another at 7, then one at 16 and so on. So that the typist could skip quickly to where the next character should be typed (for a table of numbers, say). And after the letter or whatever was written there was no trace of the tabs. Only spaces. It makes no sense at all to try to revert the usage of tabs as in letting the _viewer_ of a document re-adjust the tabstop to something else, and after all there was never just *one* tabstop on the typewriter.
    This is even more important these days when we have version control systems and people trying to collaborate on source code or whatever. You can't have individuals reformatting the text to their personal preferences whenever they add a change; what happens then is that every change is accompanied by a lot of "whitespace changes" as they're called. Something you definitely do not want. When you do a 'git show' for a commit you want to see the actual code change, not something buried in hundreds of lines of whitespace changes. Even if you're careful to not change anything, you're bound to do that anyway if you keep adjusting the tabspace setting.
    In any case, personal preferences are _not_ what matters when collaborating on source code. The style shall be set out early, for that piece of source code, and developers shall use that style for that code whatever their own preferences [edit. I.e. as jazzed said, above]. Then they may have to use another style for another project, and that's how it works. Fix up the source so that you have a starting point, and that's the only whitespace change that should happen in the version history. There shouldn't be any more later.
    If you really want to keep tabs in the file, they must be set to 8. Otherwise untabify them and use spaces.
    As for this particular case discussed here, if it was set up with tabspace 4 then I too would do what Heater did: Untabify, and use spaces.

    -Tor
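
On the UTF-16 point at the top of this post, which is stated without reasons: one concrete difference is easy to show. The sketch below assumes a made-up, pure-ASCII source string and uses only Python's built-in codecs. It illustrates the encodings and is not a reconstruction of Tor's argument.

```python
# Made-up, pure-ASCII source text.
source = "PUB main\n    x := 1\n"

utf8 = source.encode("utf-8")
utf16 = source.encode("utf-16")  # Python's "utf-16" codec prepends a byte-order mark

print(len(source), len(utf8), len(utf16))
print(utf8 == source.encode("ascii"))  # UTF-8 is byte-identical to ASCII for this text
print(b"\x00" in utf16)                # each ASCII character gains a zero byte
print(utf16[:2] in (b"\xff\xfe", b"\xfe\xff"))  # the BOM depends on platform byte order
```

Byte-oriented tools that expect ASCII can read the UTF-8 bytes unchanged, while the UTF-16 bytes contain a BOM and embedded zero bytes.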
  • ksltd Posts: 163
    edited 2014-08-19 08:45
    Style guidelines are a nice idea, kinda like communism. Neither works in the real world.

    The only workable solution I've ever seen is an editor that implements pretty printing such that non-conforming code cannot be constructed. It's a bit peculiar to settle into such an environment, but after a few days, there's no going back.
  • Heater. Posts: 21,230
    edited 2014-08-19 08:52
    ksltd,

    I think I asked already but can you point us to such an editor?

    Style guides are not like communism at all. They are imposed by a sensible dictator on a project and they work really well. See the Linux kernel for the classic example. Millions of lines of code, hundreds or thousands of developers, hundreds of updates per day! Consistent style maintained.
  • __red__ Posts: 470
    edited 2014-08-19 10:05
    Yes Heater,
    Heater. wrote: »
    Consistent style maintained.

    Consistent with tabs.

    Thank you Ladies and Gentlemen, I'll be here all night!
  • jazzed Posts: 11,803
    edited 2014-08-19 10:07
    __red__ wrote: »
    Thank you Ladies and Gentlemen, I'll be here all night!


    LOL. Tip glass is on the piano.
  • Heater. Posts: 21,230
    edited 2014-08-19 14:14
    Tor,
    Tor wrote: »
    TAB: They can work OK if, and only if, the tabstop is fixed at 8.

    You have introduced two complexities here that are not needed.

    1) The idea that TABs are OK, which as we have demonstrated is not so, because everything goes wrong when you move things from editor to editor or viewer to viewer.

    2) We now not only have to argue whether to allow tabs or not but how wide they should be.

    Never mind that a tab width of 8 is probably the least used setting in the world. Probably because it needlessly wastes a lot of horizontal space when indenting code.

    Talking about TABs back in the days of typewriters is pointless. Those TABs were used as a quick way to make spaces on the page. None of those key inputs were transmitted to anyone else. Only the finished paper with characters printed on it in the correct places.

    To reiterate, banning TABs and insisting on spaces is the logical way to go because:

    0) It does not demand any change in the formatting style, layout, of the source code at all. It has nothing to say about that.

    1) It ensures the document will appear correctly everywhere without the need for anyone to think about what a TAB width should be in any particular case.

    2) It does not prevent anyone using the TAB key when they edit the source as much as they like. As long as they tell their editor to insert the correct number of spaces.

    It's the simplest thing and makes life easy all around.
  • jazzed Posts: 11,803
    edited 2014-08-19 17:19
    Heater. wrote: »
    Talking about TABs back in the days of typewriters is pointless. Those TABs were used as a quick way to make spaces on the page. None of those key inputs were transmitted to anyone else. Only the finished paper with characters printed on it in the correct places.


    Pointless except that perhaps you described exactly some points that Tor originally made. ;-)
  • Tor Posts: 2,010
    edited 2014-08-19 19:10
    Heater,

    We basically agree, except for the one point:
    Heater. wrote: »
    Never mind that a tab width of 8 is probably the least used setting in the world. Probably because it needlessly wastes a lot of horizontal space when indenting code.
    On the contrary, it's the de facto standard tab width. But I was not talking about source code indentation, see below [*]. However, as for actual 8 indentation coding standards out there, you must surely have looked at a lot of BSD C code over the years? 8-column indentation is the BSD programming style (you can set it in Emacs, for example), exactly because 'tab' was so easy to hit in old editors. And as the tab width was 8, that is also the width you get with the Emacs BSD style setting. MicroEmacs even uses a real tab for this. And there must be millions of lines of code out there following the BSD standard.

    As for default tab width, try this in the shell, for example:
    /bin/echo -e "123456678\n\ttab"
    
    I get
    123456678
            tab
    
    It even looks correct with cut&paste - something I didn't actually fully expect.


    ([*] The 8-width tab which I talked about did NOT imply that the code should be indented 8 spaces! Not at all. I was simply stating that if tabs are used then the tab setting must be 8 or it'll be a mess (again, because 8 is the TAB default width). The source code editor can use whatever indentation width setting it wishes. At work, for example, we use 4. That by itself doesn't preclude using tabs. Not that I particularly recommend it, I'm simply pointing out that this does not create 'whitespace mess' when different people update code. Every Git commit looks clean. If it doesn't, it's because of normal mistakes, although using tabs does not make it easier of course. But not that much worse either. The code looks fine in whatever editor the developer uses, because if there's a tab in the code it'll be handled as an 8-width tab (i.e. whitespace up to the next modulo 8 column). When I mentioned the typewriter tabs I meant their irrelevance, and the only part of that which can be allowed to live on in the modern world is the fixed-value 8 tab. It can be argued that it's useful, but variable-size tab is completely useless.)

    So what I'm saying is this:
    1) Define a style for the code. 4 indentations, for example, or whatever the benevolent dictator dictates.
    2) Never define 'tab' as the method to enforce that indentation. That is the really important point. Never do that. And that is what I understand was the problem and which Heater addressed.
    3) Nevertheless, any tab actually found in the source hereafter must be handled with a tab width setting of 8, that's what editors default to and this will let the code look correct, in both Emacs and vi, to cover the two editors most in opposition in the *nix world at least. Well, also every other editor I've tried.
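
Rules 1 and 3 together can be sketched as follows, assuming a made-up file with 4-column indentation in which an editor has stored some runs of 8 leading columns as a single tab. `untabify` and `indent_level` are names invented here. Python's `str.expandtabs` does the "whitespace up to the next multiple of 8" arithmetic.

```python
def untabify(text: str, tab_width: int = 8) -> str:
    """Replace every tab with spaces up to the next multiple of tab_width."""
    # str.expandtabs restarts its column count after each newline.
    return text.expandtabs(tab_width)


def indent_level(line: str, indent: int = 4, tab_width: int = 8) -> int:
    """Indent depth in units of `indent` columns, with tabs read at tab_width."""
    expanded = line.expandtabs(tab_width)
    return (len(expanded) - len(expanded.lstrip(" "))) // indent


# Hypothetical file: 4-column indent style, tabs standing in for 8 columns.
mixed = "a\n    b\n\tc\n\t    d\n"
clean = untabify(mixed)

print([indent_level(line) for line in mixed.splitlines()])
print(repr(clean))
```

Read at width 8, the tab-bearing lines land on the same columns as their space-only equivalents, so converting them changes the bytes but not the layout.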

    -Tor