Is there a Unicode to ASCII converter that can be integrated into a webpage? (PHP?)
Microcontrolled
Posts: 2,461
I am writing a new web-based piece of software, and it needs to convert Unicode files to ASCII before it can work with them. Is there a PHP script (or one in any web language, for that matter) that can do this for me? I have found a converter on the web here: http://www.thepcmanwebsite.com/unicode_converter.shtml
but it would be most convenient if the files were converted automatically, rather than having the user convert them.
Any help would be appreciated.
Thanks,
Microcontrolled
P.S. These are .spin files it's converting
Comments
-Phil
So, as Phil asked, what do you want to do with the characters above 127? That would be 0x80 to 0xFFFD?
Do you mean localization, or do you want a 1-to-1 conversion?
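To make the question concrete, here is a minimal Python sketch of three common answers to "what happens to characters above 127": replace them with `?`, drop them, or strip accents by decomposing first. The function name `to_ascii` and the strategy names are made up for illustration; only `str.encode` and `unicodedata.normalize` are standard library calls.

```python
import unicodedata


def to_ascii(text: str, strategy: str = "replace") -> str:
    """Reduce text to 7-bit ASCII using one of three hypothetical strategies.

    "replace"       -> every non-ASCII character becomes "?"
    "ignore"        -> every non-ASCII character is dropped
    "transliterate" -> decompose accented letters (NFKD), then drop
                       whatever is still outside ASCII
    """
    if strategy == "transliterate":
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")
    if strategy not in ("replace", "ignore"):
        raise ValueError(f"unknown strategy: {strategy}")
    return text.encode("ascii", strategy).decode("ascii")


if __name__ == "__main__":
    sample = "caf\u00e9"  # "café", with U+00E9 as the last character
    for name in ("replace", "ignore", "transliterate"):
        print(name, "->", to_ascii(sample, name))
```

None of these is a true 1-to-1 mapping: symbols with no ASCII decomposition are lost under every strategy, so which one fits depends on what the downstream tool expects.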
I imagine C# can also deal with Unicode, since Microsoft understands culture-aware software. Well, except for their timezone localization APIs, which Java does better. I haven't done anything with Unicode in PHP or Ruby, so I don't know if they can handle it.
Windows uses UTF-16 as its native format in memory, but can save files in ASCII, UTF-8, or UTF-16. When you see an ASCII file saved as UTF-16, every other byte is zero. ASCII and UTF-8 are byte-for-byte the same for the basic Latin characters (code points 0 to 127). Non-Latin language files generally have some Latin characters in them, so you'll see a similar pattern there. It is possible to use a symbol histogram to infer the source language, but I haven't done that myself.
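The "every other byte is zero" pattern can be turned into a rough detector. The Python sketch below is only a heuristic under stated assumptions: the function name `sniff_encoding`, the 512-byte sample size, and the 40% threshold are all arbitrary choices for illustration, and it does not try to recognize UTF-32 or legacy code pages.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess a Python codec name for raw file bytes (heuristic sketch)."""
    # A byte order mark, when present, is the most reliable signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order

    # No BOM: count zero bytes at even and odd offsets in a small sample.
    sample = data[:512]
    pairs = len(sample) // 2
    even_zeros = sample[0::2].count(0)
    odd_zeros = sample[1::2].count(0)

    # Mostly-ASCII text in UTF-16LE has its zero bytes at odd offsets,
    # and in UTF-16BE at even offsets. The 0.4 threshold is arbitrary.
    if pairs and odd_zeros >= 0.4 * pairs:
        return "utf-16-le"
    if pairs and even_zeros >= 0.4 * pairs:
        return "utf-16-be"
    return "utf-8"  # plain ASCII is also valid UTF-8


if __name__ == "__main__":
    print(sniff_encoding("PUB main".encode("utf-16")))
    print(sniff_encoding(b"PUB main"))
```

The returned name can be passed straight to `bytes.decode`. Files with very little Latin text could fall through to the `"utf-8"` default, so a real converter would want a fallback for decode errors.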
-Phil
I noticed that if you insert a single Unicode character into the file, the Propeller Tool will save the file as UTF-16, and even if you remove all Unicode characters, the Propeller Tool does not change it back. To me this makes sense: why go to all the effort of converting the file back and forth, chasing someone who enters Unicode and then takes it back out?
Tom
http://php.net/manual/en/function.utf8-decode.php
I'm not sure whether that one handles UTF-16, so you could try this instead:
http://www.php.net/manual/en/function.mb-convert-encoding.php
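For what it's worth, `utf8_decode` converts UTF-8 to ISO-8859-1, so it would not help with a UTF-16 file, whereas `mb_convert_encoding` takes the string, the target encoding, and the source encoding. The same decode-then-encode idea is sketched below in Python rather than PHP, purely as an illustration; the function name `utf16_to_ascii_bytes` is hypothetical, and replacing unmappable characters with `?` is an assumed policy, not something the original poster asked for.

```python
def utf16_to_ascii_bytes(raw: bytes) -> bytes:
    """Convert the raw bytes of a UTF-16 file to ASCII bytes.

    Assumes the input really is UTF-16. The "utf-16" codec uses the byte
    order mark, if present, to pick the byte order and strips it from the
    decoded text.
    """
    text = raw.decode("utf-16")
    # Characters with no ASCII equivalent become "?" (assumed policy).
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    uploaded = "PUB main\r\n".encode("utf-16")  # stand-in for an uploaded file
    print(utf16_to_ascii_bytes(uploaded))
```

On a web server this would run on the uploaded file's bytes before the rest of the tool sees them, so the user never has to convert anything by hand.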
There's .NET too
http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx