Using Chinese Chars in an array? — Parallax Forums

jazzedjazzed Posts: 11,803
edited 2013-03-24 15:56 in Propeller 1
Someone has tried using Chinese characters in a char array.
I suspect that the array should be wchar_t and locale set to C.UTF8.

Can someone please describe what it takes to make the program below work?
I'm super busy with bugs; otherwise I would dig into it myself.

Thanks,
--Steve
/**
 * This is the main Blank Simple Project program file.
 */
#include <stdio.h>
#include "simpletools.h"


/*
char chi_TextArray[];
*/
char eng_Text[] = "Hello There"; 
char chi_Text[] = "你好吗？";


int main()
{
    pause(500);
    printf("English: %s\n", eng_Text);
    pause(500);
    printf("Chinese: %s\n", chi_Text);
    pause(1000);
    
    
}

Comments

  • Heater.Heater. Posts: 21,230
    edited 2013-03-23 11:25
    Strangely enough, when I compile that code on my Linux machine with the simpletools include and the pauses commented out, it prints exactly what I see in the chi_Text string in my editor (vim): a square box, a space, a Chinese character, a Chinese character, a square box, a space and another square box.

    Seems to be working fine.

    Those "pause" statements on the other hand are an issue, according to the man pages here:
    pause() causes the calling process (or thread) to sleep until a signal is delivered that either terminates the
    process or causes the invocation of a signal-catching function.
    And sure enough putting those back hangs the whole thing.
  • ersmithersmith Posts: 6,054
    edited 2013-03-23 11:26
    It works for me out of the box on Linux, but then my locale is set to "en_CA.utf8" already and so my terminal program can interpret UTF-8 correctly. On Windows you may have to specify -finput-charset=UTF-8 as an option to propeller-elf-gcc, and also make sure to run a terminal program that understands UTF-8 characters. There's no need to use wide characters (UTF-8 is encoded in bytes) or change locale if they're only printing. Having said that, adding:
        setlocale(LC_ALL, "");
    
    to the start of main() is probably a good idea -- it'll make sure that multibyte and wide character functions will use UTF-8.

    Eric
  • jazzedjazzed Posts: 11,803
    edited 2013-03-23 11:41
    Thanks for the help guys.

    It would be great if adding the option -finput-charset=UTF-8 would just solve the problem.

    Unfortunately I had to add #include <locale.h> and do setlocale(LC_ALL,""); to make it work with the terminal.

    I know it's not a good idea, but if the value for LC_ALL could be used instead of the macro, things would be easier. Objections?

    Postedit: Looks like the include is necessary in any event. Seems like -finput-charset=UTF-8 is not necessary then.
  • ersmithersmith Posts: 6,054
    edited 2013-03-23 14:57
    jazzed wrote: »
    It would be great if adding the option -finput-charset=UTF-8 would just solve the problem.

    Unfortunately I had to add #include <locale.h> and do setlocale(LC_ALL,""); to make it work with the terminal.

    That's very strange. Are you sure you didn't change something, like modify the char to wchar_t? The setlocale() should only change runtime behavior, not compile time, and the default runtime behavior should be the same for programs compiled with Linux and Windows, namely to literally copy the array contents to the terminal. The tricky parts are (a) making sure the array contains UTF-8; this is the default for GCC, but I don't know if a Windows editor and/or runtime might screw this up, and (b) making sure the terminal program interprets UTF-8 correctly.

    jazzed wrote: »
    I know it's not a good idea, but if the value for LC_ALL could be used instead of the macro, things would be easier. Objections?

    If you have to change the source anyway, why not add the include? LC_ALL is almost certainly going to be different on different systems, so it's a good idea to use the symbol rather than the constant.

    Eric
  • jazzedjazzed Posts: 11,803
    edited 2013-03-23 19:41
    My conclusion at this point: all we need to add is #include <locale.h>. It works.
    Is there still a compelling reason for adding the flag -finput-charset=UTF-8 anyway?
  • ersmithersmith Posts: 6,054
    edited 2013-03-24 05:22
    jazzed wrote: »
    My conclusion at this point: all we need to add is #include <locale.h>. It works.

    Just adding the include? Or the include plus setlocale() call? The latter, I presume.

    jazzed wrote: »
    Is there still a compelling reason for adding the flag -finput-charset=UTF-8 anyway?

    Probably not. GCC usually defaults to UTF-8 input, but I don't know if there is any special handling for Windows (which does not handle UTF-8 well, and insists on using obsolete code pages for 8 bit data).

    Eric
  • jazzedjazzed Posts: 11,803
    edited 2013-03-24 10:03
    ersmith wrote: »
    Just adding the include? Or the include plus setlocale() call?

    Just adding the include seems to fix it. I'll suggest using setlocale() regardless.
  • lardomlardom Posts: 1,659
    edited 2013-03-24 10:12
    Google Translate says "你好吗" means "How are you doing?"
  • Heater.Heater. Posts: 21,230
    edited 2013-03-24 10:19
    非常感谢您 (Thank you very much), lardom.
  • ersmithersmith Posts: 6,054
    edited 2013-03-24 12:47
    jazzed wrote: »
    Just adding the include seems to fix it. I'll suggest using setlocale() regardless.

    There's something wrong here... the include file itself has nothing that would suggest a character encoding or change the behavior of the program.

    What exactly is the output if you compile it without the include?

    Eric
  • jazzedjazzed Posts: 11,803
    edited 2013-03-24 15:56
    ersmith wrote: »
    There's something wrong here... the include file itself has nothing that would suggest a character encoding or change the behavior of the program.

    What exactly is the output if you compile it without the include?

    Eric

    I've seen some different behaviors now that I can't really explain. I lost my basis.

    Now I can't get it to fail unless I use Simple printf, which fails without stdio.h but works with stdio.h.
    Normal printf seems to work out of the box.
    #include <stdio.h>
    #include <propeller.h>
    
    
    int main(void)
    {
        char chi_Text[] = "你好吗？";
        waitcnt(CLKFREQ/5+CNT);
        putchar('\0');
        printf("Hello World.\n");
        printf("Chinese: %s\n", chi_Text);
        return 0;
    }
    
    Hello World.
    Chinese: 你好吗？
    
    