Line endings, UTF-16, tabs and other formatting Holy Wars.

__red__ · 2014-08-13 06:19

Heater. wrote: »

That is exactly why I am begging and pleading for Chip to kick off a formal repository with no tabs. The formatting does not visibly change for him so it does not impose any altering of style aesthetically. It does ease collaboration into the future.

Holy War lies this way and probably best not to argue about it here. I'll just say that I believe tabs are superior and we can debate it over a drink sometime. We both agree that a mix of tabs and spaces is evil however so there's our common ground :-)

Yes, many diff programs can ignore white space changes. Which has its own downsides. As we all know white space changes can change semantics in languages like Spin and Python and surprisingly JavaScript in subtle ways. Also it means that developers can sneak crappy formatting style into your code without it being noticed when looking over patches.

The reason I don't touch python is that exact thing. I can't deal with whitespace as syntax.

I spent almost an hour chasing a bug in my badgehack because I was misaligned by a single space in SimpleIDE. The continuation of my routine was stuck under an if that shouldn't have been. The function under the if was over a page long so I eventually found it by putting my finger on the screen and scrolling(!)

I can "Holy War" all I want about using whitespace as syntax but it's the reality. Sure I could add braces to the tokenizer and that would make my life easier but that would be harmful to the community.

Oh my God yes, that utf-16 utf-8 thing is a nightmare. Quite frankly I think anything more than ASCII should be banned from source code. In fact unicode itself should be banned completely. Have you ever noticed that there are an increasing number of web pages containing "funny" characters or become totally unreadable, mostly due to some error in translating from one representation to another. We are slowly rotting all our information with this unicode non-sense.

I kinda sympathize a bit but unicode is where we're going over time. It's certainly better than ANSI / ASCII / EBCDIC / code pages. Unicode is at least defined to be platform consistent.

Ironically, we're rotting now but in time as unicode support broadens it will rise from the dead. Zombie SPIN!

We seem to be a bit stuck with it as Spin includes the wacky idea of supporting drawing diagrams in the source with graphic characters. Which of course never renders correctly anywhere outside of Spin tool....

When in a closed ecosystem it isn't that wacky, it's actually kinda cool.

The question of course is whether the ability to draw diagrams in code is worth the interoperability issues? I think now, most of us would say no... but it's a reflection of the time it was made.

The other issue I come across a lot is \n, \r and \r\n - three "standards" to end a line. Argh!
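
For reference, the three conventions in the wild are \n (Unix), \r (classic Mac OS) and \r\n (Windows). A minimal Python sketch of flattening all of them to \n might look like this; the function name normalize_newlines is made up for illustration and is not from any tool mentioned in this thread.

```python
def normalize_newlines(text: str) -> str:
    # Collapse the two-character Windows form first, so that the lone \r
    # replacement that follows does not turn one \r\n into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "one\r\ntwo\rthree\nfour"
print(normalize_newlines(sample).split("\n"))
```

If the text comes from a file, opening it with open(path, newline="") should hand the original line endings through untouched, so the function sees what is really on disk.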

Imagine how much further we'd be ahead if everyone just agreed with me?
Cuddle thy braces, \n your lines, use tabs and worship vim :-)

I really REALLY wish SimpleIDE didn't force me to use spaces. I used it during defcon to set an example but I'm now using Makefiles because I just can't deal with spaces.

I know we'll likely never agree, but at least give me the option to be an idiot in your view because if I'm working on your project I'll conform to what I see as your idiotic ways.

Consistency and interoperability are key. To play nice with others we need to play by the project owner's rules.

idbruce · 2014-08-13 07:22

I don't know where the original quote came from, but if we are "arguing" space versus tabs for formatting.

I choose tabs

I hate when spaces are used for indentation in code

Publison · 2014-08-13 07:37

Bruce,

You're going to be off of Heater's Christmas Card list now.

Gadgetman · 2014-08-13 07:49

Spaces for indentation is one of the deadly sins of programming!
(You use it, whoever gets to maintain the code after you will want to kill you... )

As for source code in anything but ASCII...
See 1.
Also, those who designed ASCII back then should have been lined up against a blackboard and pelted with chalk and sponges...
(They managed to switch about a few letters... The correct sequence, after Z, is

__red__ · 2014-08-13 07:53

I've dealt with source code that failed unit testing because the author's name in the comments(!) contained an "illegal character". Compiled fine on the developer's box where his locale was set to some .eu language - crashed and burned in "en_US.UTF-8"

ASCII assumes an American world.

Gadgetman, What would you suggest over ASCII for portability?

Martin_H · 2014-08-13 08:01

There are really two separate issues with Unicode: encoding the characters, and displaying the characters.

My original beef with Unicode was the UTF-16 encoding which took two bytes to encode data that only required one byte. The UTF-8 encoding solved that and Unicode is no longer on my list of hated technologies
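
To put numbers on the size difference, here is a small Python sketch. The helper name encoded_sizes is hypothetical; it only counts bytes for the two encodings, without a byte order mark.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    # Returns (bytes as UTF-8, bytes as UTF-16 little-endian without BOM).
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for sample in ("Heater", "π"):
    utf8_len, utf16_len = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Plain ASCII text doubles in size under UTF-16, while a character like π costs two bytes either way.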

Displaying Unicode is a non-issue for me. With such a large number of characters a machine is unlikely to have fonts for all of them installed. But if someone is using a character set for a page that isn't installed, it is highly likely that you don't understand the language they're conversing in anyway.

idbruce · 2014-08-13 08:17

You're going to be off of Heater's Christmas Card list now.

UH-OH

I should have expected that there would be consequences for adding my two cents.

ctwardell · 2014-08-13 08:25

I was going to say I prefer TAB for indentation, but if that comes at the cost of being excluded from Heater's Christmas card list and worse yet means agreeing with Bruce, then by all means I support SPACES!

Actually, just kidding, I do prefer using TAB.

C.W.

P.S. Just having some fun with you Bruce!

jazzed · 2014-08-13 08:33

If you want tabs in one of the IDEs I wrote, you are welcome to add it.

The minimum requirements are:

1. A checkbox selects using tabs or spaces and will be saved in settings.
2. The default at startup when not previously selected will be to use spaces.
3. A document will have tabs or spaces and not both (other than strings, comments, or statement delimiters).
4. You must automatically detect which is being used and adjust accordingly.
5. All editor functions including suggestions and "auto-enter" indent must work properly.
6. The IDE will convert any existing files to the selected method.
7. You will be responsible for any bug fix related to tabs or spaces forever.
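
Requirement 4 (automatic detection) could be sketched roughly as below. This is only an illustration in Python, not code from SimpleIDE or any other IDE mentioned here, and the function name detect_indent_style is an assumption. It looks only at leading whitespace and ignores blank lines.

```python
def detect_indent_style(source: str) -> str:
    """Classify leading whitespace as 'tabs', 'spaces', 'mixed' or 'none'."""
    saw_tab = False
    saw_space = False
    for line in source.splitlines():
        if not line.strip():
            continue  # whitespace-only lines say nothing about style
        # Everything before the first non-blank character is the indent.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        saw_tab = saw_tab or "\t" in indent
        saw_space = saw_space or " " in indent
    if saw_tab and saw_space:
        return "mixed"
    if saw_tab:
        return "tabs"
    if saw_space:
        return "spaces"
    return "none"


print(detect_indent_style("if x:\n\ty = 1\n"))
```

A real editor would also have to respect requirement 3 and leave strings and comments alone, which this sketch does not attempt.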

Alternatively, use or design another editor.

Don't touch my vim settings ;-)

__red__ · 2014-08-13 08:38

True, but I come across some of these codes when doing malware reverse engineering.

Unfortunately some of my tools don't deal well with UTF-16 which is what most malware strings are encoded in nowadays so I'm having to evolve my toolset and my mindset to do my job.

It's not really surprising that I'm tab biased, UNIX has always been that way. For example:

0) Only more recent versions of syslog allow spaces in syslog.conf
1) Makefiles require tabs and gnu make will snark at you if you use spaces:

tabsrule ~ # make spaces
Makefile:2: *** missing separator (did you mean TAB instead of 8 spaces?).  Stop.
tabsrule ~ # make --version
GNU Make 3.82

But then again I'm a middle-aged UNIX-beard who prefers the command line, vi as my editor and I do almost all my data manipulation with dd, od, cat, sed, grep, sort, cut, uniq.

Other beards don't like that I do things like:
cat filename | grep pattern

instead of:
grep pattern filename
or
grep pattern < filename

Pipes go left to right so it was always easier for me to conceptualize and now computers are fast enough that I don't have to worry about the additional process overhead for the cat.

... and really the tabs / spaces issue is really two issues:
0) How the code appears (some people like 2 spaces, some like 4, I like 8 (again, UNIXbeard)).
1) How the code is encoded (tabs vs space).

There is a huge psychological effect of formatting on code fluency and I can quote some scientific studies if anyone is interested.

I learned this lesson the hard way last year when packetrider and I were writing the propeller firmware for the skydogcon badge. He's a spaces guy, I'm a tab guy and our codebase was in version control.

I spent 8-10 hours re-formatting the entire thing with tabs as that made it more comfortable for me to work in and pushed it back to the repo. That was the last time the two of us worked on the same version of code - he just never got comfortable with my reformatted code so we ended up effectively having two separate branches which we cut and pasted code between. A horrible loss of productivity.

I learned that day that the holy-war just isn't worth it. I got over myself.

__red__ · 2014-08-13 08:50

jazzed wrote: »

If you want tabs in one of the IDEs I wrote, you are welcome to add it.

The eternal "patches welcome" - I approve :-D

The minimum requirements are:

1. A checkbox selects using tabs or spaces and will be saved in settings.

Agreed

2. The default at startup when not previously selected will be to use spaces.

No issue

3. A document will have tabs or spaces and not both (other than strings, comments, or statement delimiters).

A noble goal but dangerous to implement. Also, there is a third way which is tabs and spaces (which is a known (but stupid) formatting pattern as you get none of the advantages and all of the disadvantages)

4. You must automatically detect which is being used and adjust accordingly.

Throw a warning yes, switch modes - no.

5. All editor functions including suggestions and "auto-enter" indent must work properly.

Yup.

6. The IDE will convert any existing files to the selected method.

Sorry, this is evil because it forces code-style on what may be a shared project. As one of our target languages has whitespace as syntax this is the equivalent of juggling with scalpels. You could load code, make no edits, save code and potentially have changed its flow.
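
A small Python sketch of why that is risky, assuming a file where one line is indented with a tab and the next with eight spaces. The helper indent_width is made up for illustration; it measures how deep a line sits once tabs are expanded at a given width.

```python
def indent_width(line: str, tabsize: int) -> int:
    # Expand tabs to spaces at the given tab width, then count the
    # leading spaces to get the visual indentation depth.
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(" "))


lines = ["\tfoo()", "        bar()"]  # one tab vs. eight spaces

for tabsize in (8, 4):
    print(tabsize, [indent_width(line, tabsize) for line in lines])
```

With a tab width of 8 both lines sit at the same depth. Convert with a width of 4 and the first line ends up shallower than the second, which in an indentation-sensitive language means a different block structure.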

7. You will be responsible for any bug fix related to tabs or spaces forever.

In other words, you'd want that person to maintain that area of code. I would agree. Submitting the patches is easy, maintaining the code-base is hard.

Alternatively, use or design another editor.

That's the thing about OSS. The author of the project typically implements those functions which scratch their own personal itches. I can make the request but unless I'm willing to provide patches I have no option other than to say "pretty please" :-)

potatohead · 2014-08-13 09:14

I like tabs converted to spaces the moment I hit the tab key. I nearly always benefit from the spaces on future edits.

Displaying and working with this mess is one of the reasons I really like Sublime. Doesn't matter what you and others have done to it. Sublime will just display it, and somehow, it just works. (worth the small charge for a license, IMHO Easy.)

Prop Tool has a nice mode, where you move the cursor around anywhere in the visible screen area. It doesn't matter whether there are enough characters to "get you there", the cursor just moves. And if you type something, it fills in nicely for you, no worries. Great for ASCII art, for one, but it's also pretty great for working with mixed format code. Put the cursor where desired, and then type it, and whatever it is will be there for the next editor no matter what.

I like that too.

Finally I agree on going with formatting already there, if I'm collaborating. This is important. And the more times one does this kind of thing, the more fluent one becomes, and the hassle factor goes down.

jazzed · 2014-08-13 09:27

__red__ wrote: »

Sorry, this is evil because it forces code-style on what may be a shared project. As one of our target languages has whitespace as syntax this is the equivalent of juggling with scalpels. You could load code, make no edits, save code and potentially have changed its flow.

Yes, I turned that feature off in SimpleIDE long ago (unwillingly).

But this illustrates the problem very nicely. The problem is that not doing it allows having the ability to mix tabs and spaces in a document which creates a horrendous mess that others have to suffer. I despise mixing tabs and spaces more than I despise using just tabs. Make is broken btw :-) except that it is clear on the requirement.

Any shared code base must have a specification of what is acceptable for white-space in my opinion. The best rule of course is to be aware of and follow the original author's example unless it's simply dysfunctional. Since most people are fundamentally lazy (ignorant of issues, or simply better at something else to put it kindly), the right thing to do is choose a solution that works regardless of what it is and stick to it.

idbruce · 2014-08-13 09:41

Of course I just speak for myself, especially since C.W. does not want to be included in my comments

I prefer tabs for indentation only.

One white space in between function parameters and one white space before and after operators.

Just to clarify my stance a little

Heater. · 2014-08-13 13:52

This post intentionally left blank.

pmrobert · 2014-08-13 14:04

Heater. wrote: »

This post intentionally left blank.

Are spaces or tabs responsible for the blankness? :-)

Heater. · 2014-08-13 14:17

Hmm....as my post above demonstrates it's not just that unicode is a brain damaged system with an increasingly growing pile of nonsense characters in it. It also lacks some really essential characters which can be fundamental to the meaning of what you write.

That text should have been moved over to the centre with some spaces (or dare I say it tabs).

But the equally brain damaged world of the WEB technologies thinks it's a good idea to strip out leading white spaces. That is to say edit away my carefully crafted formatting hence altering the meaning of what I write.

Notice however that it did leave some other white space in there, namely the blank lines before and after the text. Why? Who knows?

Anyway, one of the essential missing characters of unicode I'm referring to is something that takes up the width of an em on the line but does not actually make any marks there.

In the old days we used to call that a "space" but as you see spaces are now banished.

I need something that takes up space on the line but is not a space character. A "Not Space". Anyone know if there is such a thing?

Heater. · 2014-08-13 14:28

Ha, I spoke too soon.

Of course unicode has all kinds of spaces defined.

See the, now edited, post #15 above.

The text indented as intended.

I don't need code tags any more to stop the WEB messing up my formatting.

Try that with your stupid tabs

idbruce · 2014-08-13 15:40

Try that with your stupid tabs

__red__ · 2014-08-13 16:09

It's called an \em

Heater. · 2014-08-13 16:22

__red__

It's called an \em

Not where I got it from it isn't. An "em" or if you like "\em" is a unit of measurement of character width.

The character I used is called an "EM SPACE" and it has a width of 1 em. Its code is U+2003.

One of my favourite pieces of brain damage now is the Unicode character called the "MONGOLIAN VOWEL SEPARATOR". It has code U+180E. It takes up no width at all and displays nothing. Sounds really useful !
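
Anyone who wants to check what an invisible character really is can ask Python's standard unicodedata module. A minimal sketch follows; the helper name describe is hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    # Code point in the usual U+XXXX notation plus the official Unicode name.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("\u2003", "\u180e", "\u00a0"):
    print(describe(ch))
```

Pasting a suspicious character from a source file into describe is a quick way to tell an EM SPACE or NO-BREAK SPACE from an ordinary space.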

Martin_H · 2014-08-13 16:38

@Heater, you're not supposed to use embedded space characters for formatting text on web pages. That's the job of markup and CSS! Granted CSS is an incomprehensible pile of hacks, but that's how you're supposed to do it. For example, the leading blank lines you had above are probably the result of some esoteric CSS padding attribute that cascaded down from god knows where. Nothing ruins my day like trying to figure out why some esoteric cascading padding attribute is causing text to be unaligned when it should be in a column.

Heater. · 2014-08-13 17:13

Martin_H

you're not supposed to use embedded space characters for formatting text on web pages. That's the job of markup and CSS!

I'm very sure you are not saying that seriously. But here we go...

When I take the trouble to write something I expect that if I write a character I get the character. More importantly you get the character. All of my characters are important, at least to me, otherwise I would not bother to type them now would I?

So when I type "A" you see an "A". If I type "π" you see "π". When I type a " " you see a " ".

Ah, except when the WEB decides to censor my spaces and take them away. Whose brilliant idea was that!

For sure I don't want to be writing CSS to get what I want.

Turns out that neither does anyone else. This whole deal has become so useless, and word processors before it, that people now love to write using really simple mark up like, well, markdown.

I did not want to go to this example but you have twisted my arm. A limerick:

There was a young man called Bliss
Whose sex life was strangely amiss
For when reclining with Venus
His recalcitrant penis
Could scarcely do better than t
h
i
s.

potatohead · 2014-08-13 18:06

That little bit of pain is why people use pictures.

I have started exactly that, if I have ASCII art, or some tabular alignment of chars to present to people. Works great, except for being able to use the text itself...

A while back I encountered Asian people struggling with all of this. Turns out, it was quicker and easier for them to just write it, scan, send. Crazy!

I have used "non breaking space" for the same presentation purposes. Given fairly wide support for that character, and maybe it is the same one being discussed, it seems you aren't supposed to put spaces in, until you are. Wonderful.

__red__ · 2014-08-13 22:37

If we're going to talk about unicode and abuse of file formats then I do need to link again to http://poc││gtfo.pwn.me

Gadgetman · 2014-08-13 23:30

If I don't want the web to steal my spaces, I use 'non-breaking spaces'
(Yeah, I use Notepad to write the html manually. I hate Mega-sized code)

Heater. · 2014-08-14 00:33

I don't want the web to steal my spaces.

Does not work. Besides if I put that in my source code the compilers are going to love me.

Heater. · 2014-08-14 02:29

__red__

Holy War lies this way and probably best not to argue about it here. I'll just that say that I believe tabs are superior and we can debate it over a drink sometime...

My request for using spaces instead of tabs in the P1 source code is not a "Holy War", it is not a matter of aesthetics, artistic style or personal preference.

I have no desire to change the coding style we see in the P1 code. I don't want to dictate a different indentation width or change the positioning of brackets and braces or lay down rules about commenting. In fact I want it to look exactly as Chip intended when he wrote it.

No, this is born out of the desire to have things "just work".

It turns out that using tabs causes things to not "just work", the formatting as seen by the end users gets messed up. Whereas with spaces we can get that exact Chip look. Ergo the no-tab rule is the logical way to go.

This is a snippet of code copied and pasted from cog.v from the original P1 sources with the tabs still in place:

always @(posedge clk_cog or negedge ena)
if (!ena)
        m <= 5'b0;
else
        m <= { (m[2] || m[4]) &&  waiti,                                // m[4] = wait
                   (m[2] || m[4]) && !waiti,                            // m[3] = write d
                        m[1],                                                                   // m[2] = read next instruction
                        m[0],                                                                   // m[1] = read d
                   !m[4] && !m[2] && !m[1] && !m[0] };          // m[0] = read s

This is the same code copied and pasted from the repaired version with no tabs:

always @(posedge clk_cog or negedge ena)
if (!ena)
    m <= 5'b0;
else
    m <= { (m[2] || m[4]) &&  waiti,                // m[4] = wait
           (m[2] || m[4]) && !waiti,                // m[3] = write d
            m[1],                                   // m[2] = read next instruction
            m[0],                                   // m[1] = read d
           !m[4] && !m[2] && !m[1] && !m[0] };      // m[0] = read s

I will leave it to the reader to decide if the supporters of tabs could be any more fanatically, illogically attached to a broken idea.

You can view the whole sorry mess in github: https://github.com/ZiCog/P8X32A_Emulation
Just go browsing the source files and see how nicely Chip formatted them. Then change to the originally released version tagged as "Version-2014-08-6" (Hit the button with "Branch" on it) and you will see what a broken mess it is with those tabs in place.

I rest my case.

Edit: __red__, Debating such things over a beer with you would be great.

__red__ · 2014-08-14 06:47

You can set tabstop in github's web interface to 4 (as Chip uses) like this:

https://github.com/ZiCog/P8X32A_Emulation/blob/00a2cef7330832010415dc86ac0b78b4b5e90429/P8X32A_DE0_Nano/cog.v?ts=4

I believe that the reason you had cut and paste issues is because your browser or source-code viewer had tabstop set to 8.

I have cut and pasted below from the tab'd version with my editor set tabstop=4

always @(posedge clk_cog or negedge ena)
if (!ena)
    m <= 5'b0;
else
    m <= { (m[2] || m[4]) &&  waiti,                // m[4] = wait
           (m[2] || m[4]) && !waiti,                // m[3] = write d
            m[1],                                   // m[2] = read next instruction
            m[0],                                   // m[1] = read d
           !m[4] && !m[2] && !m[1] && !m[0] };      // m[0] = read s

So, if you have your tools configured appropriately for tab usage it works exactly as it should. To be clear, it's no dig on you to say you don't have it configured properly for tabs because you don't use them :-)
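
The effect of the tab stop on trailing comments can be reproduced with Python's str.expandtabs. The two lines below are made-up stand-ins in the style of cog.v (the exact tab placement in Chip's file is not reproduced here), and comment_column is a hypothetical helper.

```python
def comment_column(line: str, tabsize: int) -> int:
    # Column at which the trailing // comment starts once tabs are expanded.
    return line.expandtabs(tabsize).index("//")


lines = [
    "\tm[1],\t\t\t// read d",
    "\t(m[2] || m[4]),\t// wait",
]

for tabsize in (4, 8):
    print(tabsize, [comment_column(line, tabsize) for line in lines])
```

At a tab stop of 4 both comments start in the same column. At 8 they drift apart, which is the kind of mess shown in the first paste above.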

Anyways - I see in the other post that Chip is fine with the change so as far as this project is concerned there isn't any debate. Chip is tab/space lingual and as the first pumpking, (heh - see how that happened?) whatever you guys say, goes.

Thanks,

Red

Todd Marshall · 2014-08-14 07:34

__red__ wrote: »

You can set tabstop in github's web interface to 4 (as Chip uses) like this:

https://github.com/ZiCog/P8X32A_Emulation/blob/00a2cef7330832010415dc86ac0b78b4b5e90429/P8X32A_DE0_Nano/cog.v?ts=4

I believe that the reason you had cut and paste issues is because your browser or source-code viewer had tabstop set to 8.

I have cut and pasted below from the tab'd version with my editor set tabstop=4

Red

I changed the tab setting in the Quartus editor to 4 to correspond to Chip's formatting. No big deal.

jazzed · 2014-08-14 08:04

__red__ wrote: »

You can set tabstop in github's web interface to 4 (as Chip uses) like this:
...

So, if you have your tools configured appropriately for tab usage it works exactly as it should. To be clear, it's no dig on you to say you don't have it configured properly for tabs because you don't use them :-)

When I first opened one of the files, I was miffed at the formatting. Then I realized, oh this is a tab thing. Then it took several minutes to find the tab control and play with the tabstop until it seemed right.

Why should anyone need to go through that? What possible advantage beyond byte frugality is there in having to do it?
