NEEDED -- a Chinese dumb terminal emulation software - Linux preferred

LoopyByteloose · 2013-08-01 10:57

Actually, I am hoping for all of Unicode so that I could cover Japanese, Korean, Arabic, Cyrillic, Hebrew, and more ... but Chinese may just be a good place to start.

What's required? An ability to generate all 7000 characters in one reasonable font from 16-bit Unicode sent over a serial port.

Keyboard input to generate all 7000 characters.
I suspect this might be a bigger challenge than the text display. There are several methods of selecting characters. The top three are:
A. Phonetic representation.
1. United Nations recognized Roman Pinyin
2. Taiwan's BPMF representation (have keyboards that show all the right input keys)

B. Character Visual components
3. Tsang Jie (Cangjie) speed typing (this is really the best for anyone who seriously inputs Chinese text).

Items 2 and 3 are the main means of input by native Chinese.

~~~~~~~~~~~~~~~~~~~~~~
I can and do use Chinese from the keyboard and to the display in Linux. Ubuntu Linux supports it. So there are the systems in place to get characters from the keyboard, to place characters in order on the display, and to save to a text file of sorts (Libre Office).

What I'd like is the UNICODE equivalent of an ASCII .txt file format that is capable of being sent over RS232/RS422.

I'd like to be able to communicate in Chinese in a full-duplex mode with two of these terminals.

Heater. · 2013-08-01 16:43

Loopy,

What I'd like is the UNICODE equivalent of an ASCII .txt file format that is capable of being sent over RS232/RS422.

So what you need is already a defined standard. It's called UTF-8. It's the standard of the WEB.

Basically it's your normal ASCII carried in 8-bit bytes, but the byte values greater than 127 are used to get you into the Unicode world. https://en.wikipedia.org/wiki/UTF-8
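
To make that concrete, here is a small Python sketch of how characters map to UTF-8 bytes. The helper name utf8_hex and the sample characters are just illustrative choices, not anything from a particular terminal program.

```python
# Sketch only: show the UTF-8 bytes behind a few characters.
# utf8_hex is a hypothetical helper name chosen for this example.

def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))

# Plain ASCII stays one byte per character, with the same value as before.
print(utf8_hex("A"))    # 41

# A Chinese character becomes a sequence of bytes, all of them above 127.
print(utf8_hex("中"))   # E4 B8 AD
```

Because ASCII text is unchanged, an 8-bit serial link that already carries ASCII can carry this without modification; only the two ends need to agree that the bytes are UTF-8.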

jazzed · 2013-08-01 16:55

SimpleIDE's terminal handles Chinese via UTF8. There was a bug in the input side until recently though.

Phil Pilgrim (PhiPi) · 2013-08-01 17:24

A serial terminal that handles Unicode has to be more complicated than a display for a Unicode file or webpage. In the file/webpage case, which is a static entity, special flag characters are prepended to indicate what kind of encoding it uses (e.g. UTF-8, UTF-16-big-endian, etc.) But in a streaming situation, things get a little dicey. What if the prepended characters are missed? Can you insert the code flags midstream from time to time? And how do you resynchronize on double bytes if you should get out of sync? Does Unicode have a "synchronizing sequence" that can be transmitted occasionally?

-Phil

Heater. · 2013-08-02 03:48

Phil,

As far as I can tell that is not so.

UTF-8 is specifically designed to be "self synchronizing". That is to say you can start reading a stream or document at any point and get correct characters out of it.

I guess one might start reading in the middle of a multi-byte character sequence and have to drop that character. But that's no worse than starting to listen to a serial stream halfway through an ASCII character.

UTF-16/32 are not robust in that way, needing some bytes at the start of a document to indicate what coding it is and endianness. They are falling out of favor though.
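
A rough Python sketch of the self-synchronizing property: continuation bytes in UTF-8 always have the bit pattern 10xxxxxx, so a receiver that joins late can skip them until it sees a byte that starts a character. The function name resync and the sample text are assumptions made for illustration.

```python
# Sketch only: recover a character boundary after joining a UTF-8 stream late.
# resync is a hypothetical helper name chosen for this example.

def resync(chunk: bytes) -> bytes:
    """Drop leading continuation bytes (10xxxxxx) so chunk starts on a character."""
    i = 0
    while i < len(chunk) and (chunk[i] & 0xC0) == 0x80:
        i += 1
    return chunk[i:]

stream = "中文 Forth".encode("utf-8")
late = stream[1:]                      # receiver missed the first byte of "中"
print(resync(late).decode("utf-8"))    # 文 Forth
```

Only the one partial character is lost; everything after it decodes normally, with no special synchronizing sequence needed.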

LoopyByteloose · 2013-08-02 09:51

From what little I understand, MS uses UTF-16, while Linux uses UTF-8.

UTF-8 seems to have been specifically designed to operate with legacy Unix RS232 8-bit communications. That's what I want, and that is likely to be the best for Parallax as well. I guess SimpleIDE also provides a UTF-8 terminal from within Windows. That would be extremely good news.

pedward · 2013-08-02 10:14

I wrote the UTF-8 state machine for SimpleIDE a long time ago, it's worked since the first mention of internationalization.
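
For readers who have not seen one, a byte-at-a-time UTF-8 state machine can be sketched in Python roughly as below. This is not the SimpleIDE code; the class name Utf8Decoder and the helper decode_stream are made up for the example, and it skips the overlong-encoding and surrogate checks a strict decoder would add.

```python
# Sketch only: a minimal byte-at-a-time UTF-8 decoder, as a terminal might
# run on bytes arriving from a serial port. Not SimpleIDE's implementation.

REPLACEMENT = "\uFFFD"  # shown in place of bytes that cannot start a character

class Utf8Decoder:
    def __init__(self) -> None:
        self.needed = 0      # continuation bytes still expected
        self.codepoint = 0   # bits collected so far

    def feed(self, byte: int):
        """Return a decoded character, or None while a sequence is incomplete."""
        if self.needed == 0:
            if byte < 0x80:                      # 0xxxxxxx: plain ASCII
                return chr(byte)
            if (byte & 0xE0) == 0xC0:            # 110xxxxx: 2-byte sequence
                self.needed, self.codepoint = 1, byte & 0x1F
            elif (byte & 0xF0) == 0xE0:          # 1110xxxx: 3-byte sequence
                self.needed, self.codepoint = 2, byte & 0x0F
            elif (byte & 0xF8) == 0xF0:          # 11110xxx: 4-byte sequence
                self.needed, self.codepoint = 3, byte & 0x07
            else:                                # stray continuation or invalid lead
                return REPLACEMENT
            return None
        if (byte & 0xC0) != 0x80:
            # Sequence was cut short: silently drop it and treat this byte as a new start.
            self.needed = 0
            return self.feed(byte)
        self.codepoint = (self.codepoint << 6) | (byte & 0x3F)
        self.needed -= 1
        if self.needed:
            return None
        return chr(self.codepoint) if self.codepoint <= 0x10FFFF else REPLACEMENT

def decode_stream(data: bytes) -> str:
    """Feed bytes one at a time, as if they arrived over a serial line."""
    decoder = Utf8Decoder()
    chars = (decoder.feed(b) for b in data)
    return "".join(ch for ch in chars if ch is not None)

print(decode_stream("中文 Forth".encode("utf-8")))   # 中文 Forth
```

The only state carried between bytes is a counter and a partial code point, which is why this kind of decoder is cheap enough to run on every incoming serial byte.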

Heater. · 2013-08-02 12:44

Loopy,

From what little I understand, MS uses UTF-16, while Linux uses UTF-8.

No. Never mind "Unix uses this and MS uses that". In the modern world the fundamental unit of data, after the bit, is the byte. At the OS level bytes rule for network interfaces, serial interfaces, file systems etc. Unix and Windows are the same.

Where this starts to matter is at the application level. Does your text editor understand ASCII or UTF-8 or UTF-16 and so on? What format are your files saved in? What format are your communication streams in?

At this point we enter a whole world of confusion. Apps, editors, web browsers and such have to understand all this. (Aside: And then people start complaining that software is so "bloated" nowadays. Well, if you want internationalization and such it has to be bloated out to do that.)

Have a look at Wikipedia to see what you are up against: https://en.wikipedia.org/wiki/UTF-8

All in all I think UTF-8 is what you are looking for.

LoopyByteloose · 2013-08-02 15:40

@ Heater
Regardless of what Unix does right, MS will always try to find a way to muck it up and make users dependent on Windows. UTF-16/32 appears to be yet another case in point.

Data is not just data. IBM has their EBCDIC to confound others, we started out with extremely expensive pre-formatted floppy disks (with proprietary formats) to confound everyone and to charge absurd prices for a rather cheap piece of plastic. And the list goes on and on.

~~~~~
I suppose you all might be wondering what I am trying to do.

It is quite simple: I am trying to program Forth in Chinese. I did find Linux provides a UTF-8 terminal application called RXVT-UNICODE, and it appears that with a version of Linux that completely supports Asian languages I can just add words that are Chinese, Japanese, Korean, Hebrew, and so on. All I have to do is open my serial communication (minicom, PuTTY, or whatever) inside the RXVT-UNICODE terminal window.

If Simple IDE will do the same, I can revise all the names for the lower level Forth words to Chinese and have a programming language for the Propeller that is fully Chinese.

If Simple IDE will allow users a terminal in UTF-8 inside Windows, that makes this a widely supported feature.

+++++++++
This is interesting, but it is even more interesting as Taiwan has an active Forth community. And I also suspect that many new learners that are put off by both English and learning to compile programs will take to an interpreted computer language in their native tongue more readily.

It might be a big jump ahead of what the Arduino did... after all, a lot of us got started with computers in interactive Color Basic on a Tandy Color Computer or a Z80 Sinclair. Why not do it again in Asian languages?

Parallax could even sell Propeller boards with preloaded EEproms that allow the Chinese user to just plug in their USB, get a terminal interface, and have a working interpreter. They can learn Spin and Propeller Tool later... maybe never.

The interpreter doesn't have to just be Forth, PropBasic could be in the EEprom instead. Or, you could have two or three EEproms that can be switched to provide different alternative interpreters... IN THE NATIVE LANGUAGE of the new user.

prof_braino · 2013-08-03 07:29

Program FORTH in Asian languages - I thought the Japanese Forth folks had done this back in the 80's? At that time, the DOS type OS's had a Front End Processor to do the complex key sequence to character translation. As OS's evolved, so did the input methods. Now there is just an input mode where you type the first couple of sounds and it gives a list of Chinese characters. Does this not work? Or do they skip this since it makes the name field many times longer or something?

Anyway, I thought I saw something where there was a two part dictionary, where the names were a table ID instead of text. When you search for a word in the dictionary, the name field returns a pointer to the table that holds all the name text. You only access the name table when you need the text, which is only during code interpretation on input. The names can be as long and strange as you want, and they can be kept on the host PC so the FORTH dictionary actually gets smaller instead of larger. The result was supposed to be that they could use standard tools on any standard OS that supported Asian languages. I haven't kept up on this.
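
As a very rough illustration of that two part idea, here is a Python sketch. Everything in it is hypothetical (the class HostNameTable, the interpret function, and the sample word 加倍 meaning "double"); it only shows the shape of the scheme, not how any real Forth does it.

```python
# Sketch only: names live in a host-side table; the "target" dictionary is
# keyed by small integer IDs and never stores the name text itself.
# All names here are hypothetical and chosen for illustration.

class HostNameTable:
    """Host-side table mapping word names (any Unicode text) to integer IDs."""
    def __init__(self) -> None:
        self._ids = {}

    def intern(self, name: str) -> int:
        """Return the ID for name, assigning the next free ID if it is new."""
        return self._ids.setdefault(name, len(self._ids))

    def lookup(self, name: str):
        """Return the ID for name, or None if the word is unknown."""
        return self._ids.get(name)

def interpret(line: str, names: HostNameTable, target: dict, stack: list) -> list:
    """Translate each token to an ID on the host, then run it on the target."""
    for token in line.split():
        word_id = names.lookup(token)
        if word_id is not None:
            target[word_id](stack)      # target only ever sees the ID
        else:
            stack.append(int(token))    # anything else is treated as a number
    return stack

names = HostNameTable()
target = {}

# Define a word with a Chinese name; the target stores only its ID and behavior.
target[names.intern("加倍")] = lambda s: s.append(s.pop() * 2)

print(interpret("21 加倍", names, target, []))   # [42]
```

The name text is needed only while parsing input, so its length and character set cost nothing on the target side.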

Dave Hein · 2013-08-03 08:04

In theory, Forth word names are just strings of bytes, and the value of the bytes can be anything from 0 to 255. However, various Forth interpreters may impose some restrictions on the names, which could conflict with extended character sets. The LF and CR characters are usually used to indicate an end-of-line for ACCEPT and some of the other words. And the standard ASCII characters should be preserved to support the standard Forth words.

Some Forth interpreters convert lower case to upper case when searching the dictionary. This would need to be disabled to support extended character sets. Also, some interpreters depend on the character value being less than 128 so they can search backwards through the name looking for a flag bit set in the MSB of a byte. A different method would be needed to determine the start of a name from the body or XT of a word. Without those restrictions there shouldn't be any problem supporting extended character sets.

jazzed · 2013-08-03 08:17

pedward wrote: »

I wrote the UTF-8 state machine for SimpleIDE a long time ago, it's worked since the first mention of internationalization.

Indeed. And that has been very good to have for output. Thanks.
The only problems I've seen so far are with the Quickstart.

I had to change the input side of the terminal to allow processing UTF8.
It should work in version 0-9-36 or later.

LoopyByteloose · 2013-08-03 08:33

I guess a bit of a mission statement is in order.

The purpose is to deploy Propeller based interpreted languages across language borders.

What the Japanese did in the 1980s may have been significant at the time. And I suppose one could even argue that Japanese is Chinese characters used in a different way. But there are historic details.

1. The Japanese introduced a phonetic alphabet to use in place of Chinese characters with computers, as they just didn't have the capacity to encode and represent the 7000 or so Chinese characters at the time. Forth in the 1980s in DOS likely didn't use Chinese characters. Korean may look like Chinese characters, but it is a phonetic script as well, and a lot easier to encode and display than Chinese.

2. While the Japanese never invented a written language of their own and adopted Chinese, they did so entirely for their own purposes to govern their nation. The phonology and even the meaning of Chinese characters can diverge quite a bit from Chinese.

I just figure that if Forth on the Propeller can handle Unicode at this point, Parallax has an avenue to sell Propellers with pre-loaded interpreted languages to new users in many parts of the world. The new user can have a very different and maybe much easier entry point to learning about digital computation. The interpreter can be a version of Basic or Forth, or even both if more than one EEPROM is provided.

As it is, there is a great deal of English being thrown at the Chinese that front-loads the new user's process of learning about the Propeller. And while Arduino has managed to do quite a bit of translation into Chinese (I do have a nice 350 page Arduino text nearly entirely in Chinese), it falls back on some mysterious English text.

I don't know how many of you have ever tried to type in Chinese, but it is a huge challenge for the non-native. The same goes for the other direction of the non-native user programming in English. Working the keyboard and finding typos can bog down work to a snail's pace. At the same time, a university student in China or Taiwan might actually type fast and accurately in Tsang Jie, a very interesting speed typing method just for Chinese. I have learned it, and if you can visualize a character, you can type any of the 7000 characters in 5 or fewer keystrokes.

In sum, it is all about finding a fit in Unicode that would allow Parallax to succeed in Asia: Japan, Korea, Mainland China, and Taiwan. The barrier is much more significant than many of you perceive.

Giving up case insensitivity may not be necessary. After all, Japanese, Chinese, and Korean don't have capitalization.

+++++++++++++++++
As it stands, I am struggling just to get my Ubuntu 12.04 to allow me to input Chinese to a simple text file. I suppose I can run trials in PropForth with 'cut and paste' to see if words in Chinese really work.

And something similar can also be done in pfth... once I get the text editing squared away.

After a few test words are loaded, then it is the question of whether the serial terminal will allow me to reach them.

+++++++++++++++++
Am I crazy for thinking about a way for Parallax to reach new markets?

So far, this is going slow on my part just because I have too many projects going on at once. Studying the Chinese language interface in Linux has never been much fun for me. I have had the input working in Libre Office previously, but I may have to start over and completely reinstall the Ubuntu to get it running again.

For now, I don't need Simple IDE. I am trying to do it all from within resident Linux resources. But if Simple IDE will support the same outcome in Windows there is a lot to be gained.
