Heater, I agree that UTF-16 was a bad idea. Besides doubling the size of most text, it had byte order issues that defeated its use as a universal character encoding. But UTF-8 is actually pretty good. It's basically ASCII with extensions that allow embedding any Unicode character as a sequence of up to four bytes. While emojis don't seem that useful to me, they're kind of a trivial byproduct of having billions of characters.
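To make the comparison concrete, here is a minimal sketch using Python 3's built-in codecs. The sample string and variable names are just for illustration; the point is that ASCII text is unchanged in UTF-8, while UTF-16 doubles it and comes in two byte orders.

```python
# Sketch: how the same text comes out in UTF-8 versus UTF-16.
text = "Hello"

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")  # little-endian, no byte order mark
utf16_be = text.encode("utf-16-be")  # big-endian, no byte order mark

# Plain ASCII text is byte-for-byte identical in UTF-8.
ascii_compatible = (utf8 == text.encode("ascii"))

# A character outside the 16-bit range (U+1F4A9, the pile of poo)
# takes four bytes in UTF-8.
poo_utf8 = "\U0001F4A9".encode("utf-8")

print(utf8)              # b'Hello'
print(utf16_le)          # b'H\x00e\x00l\x00l\x00o\x00'
print(utf16_be)          # b'\x00H\x00e\x00l\x00l\x00o'
print(ascii_compatible)  # True
print(poo_utf8)          # b'\xf0\x9f\x92\xa9'
```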
It isn't perfect: it conflicts with ISO Latin-1 encoded text, because code points 128 - 255 become 2-byte sequences in UTF-8 whereas they are single bytes in Latin-1. You also can't randomly index by character into a string, but that's usually not a problem.
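Both caveats can be seen in a few lines. This is only a sketch with Python 3's standard codecs, and the sample characters are my own choice.

```python
# Sketch: the Latin-1 conflict and the indexing caveat.
ch = "\u00e9"  # e with acute accent, code point 233

latin1_bytes = ch.encode("latin-1")  # one byte:  b'\xe9'
utf8_bytes = ch.encode("utf-8")      # two bytes: b'\xc3\xa9'

# UTF-8 bytes misread as Latin-1 turn into two unrelated characters.
mojibake = utf8_bytes.decode("latin-1")

# Latin-1 bytes above 127 are usually not valid UTF-8 at all.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# Indexing: byte offsets and character offsets diverge after the
# first multi-byte character.
word = "na\u00efve"            # 5 characters
data = word.encode("utf-8")    # 6 bytes
third_char = word[2]           # the whole character
third_byte = data[2:3]         # only the first half of it

print(len(word), len(data))    # 5 6
print(latin1_is_valid_utf8)    # False
```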
I've also heard that text in some East Asian languages is larger when encoded in UTF-8 than in UTF-16, but given the complexities of character encoding some problems were inevitable.
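A rough check of the size difference, again only a sketch in Python 3. The helper name `encoded_sizes` and the three-character Japanese sample are my own, not anything from a library.

```python
# Sketch: byte counts for the same text in UTF-8 and UTF-16.
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

japanese = "\u65e5\u672c\u8a9e"  # three CJK characters ("Japanese language")
english = "Hello"

print(encoded_sizes(japanese))  # (9, 6)
print(encoded_sizes(english))   # (5, 10)
```

So for this sample UTF-8 is 50% larger, while for ASCII text UTF-16 is twice the size.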
I was a CP/M user. And a minicomputer user. Always internationally oriented. I never heard about 'langedit', never saw or heard about anyone using it, and I haven't come across it lately either while going through all the CP/M history/archive sites on the net. So I will maintain my position that it must have been a very obscure tool.
I agree, UTF-8 is the only way to go for data exchange. Although internally, software can turn that into 16 or 32 bit characters as it sees fit.
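A sketch of that split in Python 3: UTF-8 on the wire, wider units inside. The helper names `utf16_units` and `utf32_units` and the sample bytes are hypothetical, made up for this illustration.

```python
import struct

# Sketch: UTF-8 at the boundary, 16 or 32 bit units internally.
def utf16_units(text):
    """Return the 16-bit code units for text."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def utf32_units(text):
    """Return the 32-bit code points for text."""
    return [ord(c) for c in text]

wire = b"Hi \xf0\x9f\x92\xa9"   # UTF-8 as received from outside
text = wire.decode("utf-8")      # decode once at the boundary

units16 = utf16_units(text)      # the emoji needs a surrogate pair here
units32 = utf32_units(text)      # one unit per code point
round_trip = text.encode("utf-8")

print([hex(u) for u in units16])  # ['0x48', '0x69', '0x20', '0xd83d', '0xdca9']
print([hex(u) for u in units32])  # ['0x48', '0x69', '0x20', '0x1f4a9']
```

Note that the 16-bit form is still variable width for anything outside the first 65536 code points, so only the 32-bit form gives true random indexing by code point.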
While emojis don't seem that useful to me, they're kind of a trivial byproduct of having billions of characters.
This is way past trivial.
Consider the humble "Pile Of Poo" emoji. What problems could there be with that?
Well, a few years ago Android rendered the pile of poo as a rather disgusting-looking blob of excrement with flies circling over it. Which presumably was meant to be an expression of disgust. This was apparently too horrible for some people, so today the same code point is rendered as a kind of happy, smiling little Smile.
So, even within the same system the same code point has had its meaning totally changed over time.
Have a look at the way emoji are rendered on Apple and Android and other systems. The same code point can look totally different and convey a totally different meaning.
The classic case is the "I feel like dancing" emoji that was rendered as a girl dancing. Or a boy. Either of which may not be appropriate for the sender, and may be received differently by the receiver.
Now it is proposed that it be possible to specify the gender of emoji to fix that kind of problem. That requires a prefix or postfix code point. Also we should be able to change the hair color of emoji, not everyone is blonde, and the skin tone, not everyone is white, right?
So now any Unicode-aware system has to work in color, in high resolution.
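For what it's worth, here is a sketch of what a postfix modifier looks like at the code point level, assuming the skin tone modifiers U+1F3FB to U+1F3FF that Unicode ended up defining. The thumbs-up base character is just my example.

```python
# Sketch: one visible emoji built from a base plus a postfix modifier.
thumbs_up = "\U0001F44D"         # base emoji
skin_tone = "\U0001F3FD"         # one of the skin tone modifiers

combined = thumbs_up + skin_tone  # drawn as a single glyph if the font supports it

code_points = len(combined)                           # 2
utf8_bytes = len(combined.encode("utf-8"))            # 8
utf16_units = len(combined.encode("utf-16-le")) // 2  # 4

print(code_points, utf8_bytes, utf16_units)  # 2 8 4
```

So what the user sees as a single character is two code points, and software that counts, truncates or reverses strings by code point can split it in the middle.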
I find it appalling that adult human beings are sitting in standards committee meetings, and injecting sexual and race and other politics into a technical standard for computers. And coming up with such things as the Pile of Poo.
And what fun kids will have with the proposed "condom" emoji.
I don't mind that teenagers are illiterate. Historically that has always been true.
I don't mind that they have a way to send silly pictures to each other.
What I do object to is all this madness being built into the very heart and soul of every operating system and other software we use. This is serious technical stuff and it is being filled with 💩
Anyone here know how to write a politically correct, internationalized, "Hello world" program in C?
Comments
Insanity.
Hopefully it won't be that bad, but I agree that with a few exceptions they are a waste of time and effort.
Yes I agree it's all a big mess though:
http://www.npr.org/sections/thetwo-way/2016/04/12/473965971/lost-in-translation-study-finds-interpretation-of-emojis-can-vary-widely
-Phil
http://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz
Oh wait, that's not even internationalized is it? It needs to be even longer.