SPIN compilers and national charsets/encodings

Andrey Demenev · 2011-02-01 07:42

SPIN compilers are not good at handling strings containing characters that fall outside the Latin-1 charset. PST replaces them with $FF, and BST with $3F (question mark). So the question is: what can be done here? I need to have Russian strings in my programs.

Heater. · 2011-02-01 08:49

Define all the characters you need as constants in a CON section somewhere. Perhaps in an object that has nothing else in it so it can be used anywhere.
Use those constants in byte declarations etc where you normally use quoted "strings".
Tedious but workable.

Perhaps create a utility program for your PC that reads in lines of such text in Unicode and outputs the required Spin declarations.
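A minimal sketch of such a utility in Python, assuming the target encoding is CP1251 and that each string should become a zero-terminated `byte` declaration in a DAT section. The function name `spin_bytes` and the label argument are made up for illustration.

```python
def spin_bytes(label, text, encoding="cp1251"):
    """Return a Spin DAT line declaring `text` as a zero-terminated byte string."""
    # encode() raises UnicodeEncodeError if a character has no mapping
    # in the chosen single-byte encoding.
    raw = text.encode(encoding)
    # Emit every byte as a Spin hex literal, then the terminating zero.
    parts = ["$%02X" % b for b in raw] + ["0"]
    return "%s byte %s" % (label, ", ".join(parts))


if __name__ == "__main__":
    # Prints: greeting byte $CF, $F0, $E8, $E2, $E5, $F2, 0
    print(spin_bytes("greeting", "Привет"))
```

The output line can be pasted into a DAT section and passed to a serial or display object via `@greeting`.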

Andrey Demenev · 2011-02-01 14:58

Actually, this question comes from my wish to promote the Propeller among Russian speakers. I do not mind using character codes myself, but that is not what I would recommend to a novice. So far, I only have one approach that would work more or less painlessly:

Have a program parse the files, starting from the top file, to find all objects and copy them into a temp directory, converting them to a single-byte encoding; then launch the actual compiler on the copy of the top file in the temp directory.
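A rough Python sketch of that approach, not the actual tool. It assumes the sources are saved as UTF-8, that all objects live in the same directory as the top file, and that object references look like `name : "file"`. The regex is a simplification and does not check that the line is really inside an OBJ section; `reencode_tree` is a hypothetical name.

```python
import os
import re

# Simplified pattern for an object reference such as:  term : "FullDuplexSerial"
OBJ_REF = re.compile(r'^\s*\w+(?:\[[^\]]*\])?\s*:\s*"([^"]+)"', re.MULTILINE)


def reencode_tree(top_file, out_dir, src_encoding="utf-8-sig", dst_encoding="cp1251"):
    """Copy top_file and every object it references into out_dir, re-encoded."""
    src_dir = os.path.dirname(os.path.abspath(top_file))
    pending = [os.path.basename(top_file)]
    done = set()
    while pending:
        name = pending.pop()
        if name in done:
            continue
        done.add(name)
        path = os.path.join(src_dir, name)
        if not os.path.exists(path):
            continue  # e.g. a library object; left for the compiler to locate
        with open(path, encoding=src_encoding, newline="") as f:
            text = f.read()
        # Characters with no mapping in dst_encoding are written as "?".
        with open(os.path.join(out_dir, name), "w", encoding=dst_encoding,
                  errors="replace", newline="") as f:
            f.write(text)
        # Queue every referenced object, adding the .spin extension if omitted.
        for ref in OBJ_REF.findall(text):
            pending.append(ref if ref.lower().endswith(".spin") else ref + ".spin")
    return os.path.join(out_dir, os.path.basename(top_file))
```

The returned path is the re-encoded copy of the top file, which is what would then be handed to the compiler.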

Andrey Demenev · 2011-02-02 03:41

I've managed to make my programs speak Russian! The attachments are screenshots of my programming environment (still a work in progress; it uses bstc as the compiler, many thanks to Brad Campbell for creating it) and of a Windows terminal receiving characters in the CP1251 Cyrillic encoding. I used the approach of temporary files re-encoded as desired from Unicode.

Heater. · 2011-02-02 04:24

So what you have there is a more advanced version of my suggestion, in that it starts with the actual Spin file and outputs the converted Spin file.

However I was thinking about this:

In the general case you are tackling internationalization of Spin programs. That is to say that in the original source code I might have a string like "Hello" but I want to compile the program translating "Hello" into some other target language.

So I want to write something like serial.str(string("Hello")) in my code and it comes out as "Terve" for the Finnish version of the program.

Perhaps a smarter scheme could be used which is not much harder to implement. For example in the Qt graphical user environment I might write something like:

serial.str(string(tr("Hello")));

Where the "tr" causes the correct translation of the string "Hello" to be used.

You could do the same in Spin. Have your utility parse the source looking for calls to "tr" and replace them with the translated string characters.

This also needs to work for strings defined in DAT, like:

a_string BYTE tr("Hello")

Translations would be provided in some text files for the parser/converter to use. Then a program could be used in many languages without changing the original source.

Such a feature could be useful in the BST tool, for example.

Andrey Demenev · 2011-02-02 05:43

Well, I was not intending to go that far. I just wanted to allow users to have strings in any charset and encoding they want. But the preprocessor I wrote can be extended to support tr() or _().
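For illustration, a tr() pass in such a preprocessor might look roughly like this in Python. This is only a sketch, not the preprocessor from this thread: `apply_translations` and the dictionary-based catalog are assumptions, and it only handles a single quoted literal per tr() call, with no quote characters inside the translated text.

```python
import re

# Matches tr("...") with one plain quoted literal inside the parentheses.
TR_CALL = re.compile(r'\btr\(\s*"([^"]*)"\s*\)')


def apply_translations(source, catalog):
    """Replace each tr("text") with the quoted translation from catalog.

    Strings missing from the catalog are left untranslated, so the
    program still compiles with its original text.
    """
    def substitute(match):
        original = match.group(1)
        return '"%s"' % catalog.get(original, original)

    return TR_CALL.sub(substitute, source)


if __name__ == "__main__":
    finnish = {"Hello": "Terve"}
    # Prints: serial.str(string("Terve"))
    print(apply_translations('serial.str(string(tr("Hello")))', finnish))
    # Prints: a_string BYTE "Terve"
    print(apply_translations('a_string BYTE tr("Hello")', finnish))
```

Running this pass before the re-encoding step would let the same source be built for several languages, with one catalog per language.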
