Is there a Unicode to ASCII converter that can be integrated into a webpage? (PHP?) — Parallax Forums


Microcontrolled Posts: 2,461
edited 2011-05-03 18:36 in General Discussion
There is a new web-based piece of software that I am writing, and it needs the ability to convert Unicode files to ASCII files, as it needs them in ASCII before it can work with them. Is there a PHP (or any web language for that matter) script that can do this for me? I have found a converter on the web here: http://www.thepcmanwebsite.com/unicode_converter.shtml
but it would be most convenient if it converted the files automatically, rather than having the user convert them.

Any help would be appreciated.

Thanks,
Microcontrolled

P.S. These are .spin files it's converting

Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-03 15:15
    What do the special Unicode characters get converted to (or what do you want them converted to)? Question marks?

    -Phil
  • Microcontrolled Posts: 2,461
    edited 2011-05-03 15:52
    Not question marks, probably dashes or such. However, it is not just a matter of converting the special characters to ASCII, but the whole file. It appears that when a file is saved as Unicode, all of the characters are encoded differently than in a plain ASCII file. I found this out because when I ran a .spin file with no special characters through the script, it worked fine, but when I used files with Unicode characters, it didn't. Even if I removed ALL the Unicode characters and re-saved the file, it still wouldn't work. As for the online converter I linked to, it will ignore the file if it has any Unicode special characters, but otherwise it converts fine with working results.
  • Mike G Posts: 2,702
    edited 2011-05-03 16:51
    As far as I know, the Latin character "A" is 0x41 in both Unicode and ASCII. The basic ASCII character set is 0 to 127. There's an extended ASCII character set too, 128 to 255.

    So, as Phil asked, what do you want to do with the characters above 127? That would be 0x80 to 0xFFFD?

    Do you mean localization, or do you want to convert 1 to 1?
  • Martin_H Posts: 4,051
    edited 2011-05-03 17:51
    Java can deal with Unicode. One thing you have to know when dealing with Unicode is that it has several different encodings. There's classic UTF-16, which uses two bytes for most characters (four for those outside the Basic Multilingual Plane), and UTF-8, which uses one byte for ASCII characters and two to four bytes for all others. So you need to know the encoding of the file you want to process. There's also UTF-32, which I've never seen used.

    I imagine C# can also deal with Unicode, since Microsoft understands multi-culture-aware software. Well, except for their time zone localization APIs, which Java does better. I haven't done anything with Unicode in PHP or Ruby, so I don't know if they can handle it.

    Windows uses UTF-16 as its native format in memory, but can save files in ASCII, UTF-8, or UTF-16. When you see an ASCII file saved as UTF-16, every other byte is zero, and ASCII and UTF-8 are byte-for-byte the same for plain ASCII characters. Non-Latin language files generally have some Latin characters in them, and you'll see a similar pattern. It is possible to use a symbol histogram to infer the source language, but I haven't done that myself.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-05-03 17:55
    Spin files are saved either as ASCII or UTF-16. You can tell which by looking at the first two bytes. For UTF-16 they will be $FF and $FE, respectively.

    -Phil
  • tdlivings Posts: 437
    edited 2011-05-03 18:02
    To add to what Phil says: in my experimenting with making a Propeller tool using Delphi and Propellent.dll, and studying what the Propeller Tool did with Unicode, I noted that if you insert a single Unicode character into the file, the Propeller Tool will save the file as UTF-16, and even if you remove all Unicode characters, the Propeller Tool does not change it back. To me this makes sense: why go to all the effort of converting the file back and forth, chasing someone entering Unicode and taking it back out?

    Tom
  • Mike G Posts: 2,702
    edited 2011-05-03 18:36
    At first I did not understand what you needed (I'm easily confused), but PHP has UTF-8 encode/decode functions.
    http://php.net/manual/en/function.utf8-decode.php

    I'm not sure about UTF-16, but you could try this.
    http://www.php.net/manual/en/function.mb-convert-encoding.php

    There's .NET too
    http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx
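
Pulling the thread together, here is a minimal sketch of the conversion in Python rather than PHP, only to illustrate the steps. It relies on what Phil described (a UTF-16 .spin file starts with the bytes $FF $FE) and on Microcontrolled's preference for dashes in place of special characters. The function name `spin_to_ascii` and its `replacement` parameter are made up for this example, and treating any file without that marker as plain 8-bit text is an assumption, not something the Propeller Tool documents here.

```python
import codecs


def spin_to_ascii(data: bytes, replacement: str = "-") -> bytes:
    """Return an ASCII-only copy of a .spin file's raw bytes.

    Hypothetical helper for illustration; `replacement` must itself be ASCII.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # the $FF $FE marker from the thread
        # Skip the 2-byte marker, then decode the rest as little-endian UTF-16.
        # Malformed input decodes to U+FFFD, which is replaced further down.
        body = data[len(codecs.BOM_UTF16_LE):]
        text = body.decode("utf-16-le", errors="replace")
    else:
        # Assumption: no marker means one byte per character.
        # latin-1 maps every byte value to exactly one character, so this never fails.
        text = data.decode("latin-1")

    # Keep characters 0..127 as they are and swap everything else for the replacement.
    ascii_text = "".join(ch if ord(ch) < 128 else replacement for ch in text)
    return ascii_text.encode("ascii")


if __name__ == "__main__":
    # A tiny UTF-16 sample containing one box-drawing character (U+2500).
    sample = codecs.BOM_UTF16_LE + "PUB main \u2500 demo".encode("utf-16-le")
    print(spin_to_ascii(sample))
```

In PHP, the `mb_convert_encoding` function Mike G linked looks like the closest counterpart to the decode step, though I have not tested it on .spin files. Checking the first two bytes before converting matters because, as tdlivings noted, a file can stay UTF-16 even after every special character has been removed.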