Unicode processing for Chinese character set
LoopyByteloose
I know this may get bumped to the sandbox, but I am considering using the SX-48 protoboard to implement an HP printer in Chinese.
{I really am considering a chopstick printer. Japanese, Koreans, and Chinese are quite amused by such absurdities.}
What I need is a Unicode character set that is very generic [16x16] and easy to send serially as a series of 8-bit bytes (32 bytes per 16x16 character). This would be the Chinese equivalent of dot matrix.
I don't expect the SX to manage a complete lookup of all 7000 characters. I would just like to find a way to select and transfer individual characters on an as-needed basis.
Free or shareware software would be wonderful.
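One way to select and transfer individual characters on an as-needed basis is to keep the whole bitmap font on the PC as one flat file with a fixed-size record per character, so any character's bitmap is found by arithmetic rather than by searching. A minimal Python sketch, assuming a hypothetical file layout of 32 bytes (16x16 dots, 1 bit per dot) per code point, starting at U+4E00, the first code point of the Unicode CJK Unified Ideographs block:

```python
import io

# Hypothetical layout: one 16x16 glyph (32 bytes) per code point,
# stored in code-point order starting at U+4E00.
FIRST_CP = 0x4E00
LAST_CP = 0x9FFF
BYTES_PER_GLYPH = 16 * 16 // 8  # 256 dots / 8 bits per byte = 32 bytes


def glyph_offset(ch: str) -> int:
    """Byte offset of a character's bitmap inside the flat font file."""
    cp = ord(ch)
    if not FIRST_CP <= cp <= LAST_CP:
        raise ValueError(f"U+{cp:04X} is outside the assumed CJK range")
    return (cp - FIRST_CP) * BYTES_PER_GLYPH


def read_glyph(font_file, ch: str) -> bytes:
    """Seek straight to one glyph instead of loading the whole table."""
    font_file.seek(glyph_offset(ch))
    data = font_file.read(BYTES_PER_GLYPH)
    if len(data) != BYTES_PER_GLYPH:
        raise ValueError("font file is truncated")
    return data


# In-memory stand-in for the real (hypothetical) font file, all blank glyphs.
fake_font = io.BytesIO(bytes(BYTES_PER_GLYPH * (LAST_CP - FIRST_CP + 1)))
print(glyph_offset("中"))                 # U+4E2D -> 45 * 32 = 1440
print(len(read_glyph(fake_font, "中")))   # 32
```

Only the 32 bytes for the requested character ever need to go over the serial line.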
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"When all think alike, no one is thinking very much." - Walter Lippmann (1889-1974)
Warm regards, G. Herzog [黃鶴] in Taiwan
Comments
I found this link, which has a CJK TrueType font (CyberCJK.zip). Read the ReadMe.htm; you may be able to generate a bitmap file using a utility (ttf2bmp is one such utility, but I don't know how well it works).
Here's also a page of links about Chinese fonts; I only explored a couple of them: http://cgm.cs.mcgill.ca/~luc/china.html
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
1+1=10
Post Edited (Paul Baker) : 1/20/2006 3:48:44 PM GMT
The page at http://www.sxlist.com/techref/datafile/charsets.htm will assist you in converting the graphic images of the characters into data tables.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
---
James Newton, Host of SXList.com
james at sxlist,com 1-619-652-0593 fax:1-208-279-8767
SX FAQ / Code / Tutorials / Documentation:
http://www.sxlist.com Pick faster!
Eventually changing over to Simplified is quite easy [it is harder to go the other direction because of the actual changes made by the simplification process].
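The asymmetry shows up in even a tiny mapping table: several distinct Traditional characters were merged into one Simplified character, so Traditional to Simplified is a plain lookup, while Simplified to Traditional needs context to pick the right source. The pairs below are real examples, but the table is only an illustrative sample, not a usable converter (real converters use far larger, phrase-aware tables):

```python
# A small sample of real Traditional -> Simplified pairs.
TRAD_TO_SIMP = {
    "發": "发",  # to send out, emit
    "髮": "发",  # hair
    "後": "后",  # after, behind
    "后": "后",  # empress (unchanged)
    "乾": "干",  # dry
    "幹": "干",  # to do; trunk
    "干": "干",  # to interfere (unchanged)
}


def to_simplified(text):
    """One-way conversion: each Traditional character has one target."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def simp_candidates():
    """Invert the table: each Simplified character -> all Traditional sources."""
    out = {}
    for trad, simp in TRAD_TO_SIMP.items():
        out.setdefault(simp, []).append(trad)
    return out


print(to_simplified("頭髮"))      # 頭 is not in this sample table -> 頭发
print(simp_candidates()["发"])    # ['發', '髮']: going back needs context
```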
I see huge LED signs everywhere, every day, in this 16x16 format, but I cannot find anyone among my students who has the savvy to just locate the info.
This whole area of transferring technology across Asian language and cultural barriers is quite challenging. Sometimes the best documentation and resources are in English, and sometimes not.
I have looked at both sites and it is all helpful, but there is still quite a bit to come to terms with. I have to migrate font images from a Windows-based format to something that is purely dot-based before I can transfer them to the HP printer.
Since the HP printer prints at 96 DPI in a swath 1/4 inch high, I may actually need a 24x24-dot format rather than the 16x16 that I was thinking of.
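The 24-dot figure follows directly from the resolution: dots = DPI × height in inches, so 96 × 1/4 = 24. A small Python check using the 96 DPI and 1/4-inch figures from the post (the function name is just for illustration):

```python
from fractions import Fraction


def dots_for_height(dpi, height_inches):
    """Number of printhead dots that fit in a band of the given height."""
    dots = Fraction(dpi) * Fraction(height_inches)
    if dots.denominator != 1:
        raise ValueError("height is not a whole number of dots at this DPI")
    return int(dots)


print(dots_for_height(96, Fraction(1, 4)))  # 24 -> a 24x24 glyph fills the band
print(dots_for_height(96, Fraction(1, 6)))  # 16 -> a 16x16 glyph uses only 1/6 inch
```

Using `Fraction` keeps the arithmetic exact, so a height that does not land on a whole dot is reported instead of silently rounded.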
Think, think, think....
Western culture's invention of the alphabet really is profound: 26 characters versus 7000.
Post Edited (Kramer) : 1/21/2006 4:28:02 PM GMT
There was talk of trying to get a Korean font together for the inkjets, but full Hanzi may prove too complicated for such a small resolution (probably not readable).
Ryan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Ryan Clarke
Parallax Tech Support
RClarke@Parallax.com
This is something I really have to come to terms with.
Currently both seem to be in competition for viability. I suppose there is an issue of functionality, but it has always seemed to indicate a sociopolitical barrier to me [like Simplified and Traditional]. My computer seems currently adapted for both, but I honestly don't know which it is defaulting to. One may actually have more characters and be better at handling the relationship of individual characters to their sound [some characters have two or more pronunciations that indicate different meanings].
Can you tell me which is the best inventory and data management tool? I dunno...
Can you honestly say one or the other is easier to process? I dunno....
As you begin to see, printing is a very small part of the Asian code problem. The basic 7000-character set and the larger 25,000-character set of Chinese require a 16-bit code for identity, and a much higher level of print quality for legible small type.
Of course, when you go from a 5x7 matrix to a 24x24 matrix for printing, things slow down considerably: the dot count grows with the square of the matrix size, from 35 dots to 576 dots per character.
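To put numbers on the slowdown: the dot count is width × height, and if each vertical column is padded to whole bytes, the storage per glyph follows from that. A quick Python tally, assuming column-wise byte packing (an assumption; row-wise packing gives slightly different byte counts, but the same trend):

```python
def glyph_cost(width, height):
    """Dots per glyph, and bytes when each column is padded to whole bytes."""
    dots = width * height
    bytes_per_column = (height + 7) // 8  # round each column up to whole bytes
    return dots, width * bytes_per_column


for w, h in [(5, 7), (16, 16), (24, 24)]:
    dots, nbytes = glyph_cost(w, h)
    print(f"{w}x{h}: {dots} dots, {nbytes} bytes per glyph, "
          f"{nbytes * 7000:,} bytes for 7000 glyphs")
```

For 24x24 this gives 72 bytes per character, or about half a megabyte for a 7000-character set.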
It would be easy to print ASCII on a chopstick, but that wouldn't appeal to the Asian culture at all. There is a deep reverence for the calligraphy and a deep pride of having the most ancient written languages that have survived.
First,
I need 'bitmapped fonts', not 'outline fonts'.
A bitmapped font provides data for each dot in a grid. In my case I plan to use a 24x24-dot grid.
TrueType and various other schemes are outline fonts. These have the advantage of being easier to scale for different pixel and printer resolutions, but they demand more computing power because of that adjustability.
There are bitmapped Chinese fonts that are easy to buy, but what can I do with these if I cannot look up and select from thousands of characters? There are also bitmapped font creators, but I would have to create each character individually as I went along.
So we come to Second,
I need to come to terms with a system that will actually give me access to characters in a genuine usable scheme.
Unicode took over because Big5 and GB were a bit buggy in how they handled 16 bits as double bytes.
You would be going along in Chinese and then suddenly have a few Western characters slip in. I am not sure why it happened, but it was quite prevalent. Unicode also seems to accommodate all the characters of all the world's languages within its encoding scheme, not just Asian languages or Chinese.
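Big5 and GB are mixed-width encodings: ASCII stays one byte, while each Chinese character takes two bytes whose first byte has the high bit set. A reader that loses track of the byte pairs (one dropped byte, or a string cut at an odd offset) pairs the wrong bytes, which is one plausible cause of Western characters appearing in, or vanishing from, Chinese text (an assumption, not a diagnosis of the files described above). The Python sketch uses a simplified splitting rule, not a full Big5/GB parser, and uses GB2312 for the damaged example because both of its bytes are always 0xA1 or higher:

```python
def split_dbcs(data):
    """Split a Big5/GB-style byte stream into character units.

    Simplified rule: a byte below 0x80 is a one-byte ASCII character;
    a byte at or above 0x80 is a lead byte that pairs with the next byte.
    """
    units, i = [], 0
    while i < len(data):
        step = 1 if data[i] < 0x80 else 2
        units.append(data[i:i + step])
        i += step
    return units


text = "中文ABC"
big5 = text.encode("big5")
gb = text.encode("gb2312")
print(len(big5), len(gb), len(text.encode("utf-16-be")))  # 7 7 10

print(len(split_dbcs(gb)))    # 5 units: 中, 文, A, B, C
damaged = split_dbcs(gb[1:])  # the first byte was lost in transmission
print(len(damaged))           # 4: 'A' (0x41) was swallowed as a trail byte
print(damaged[-2:])           # [b'B', b'C']
```

UTF-16 avoids the mixing by giving every one of these characters the same two-byte width, although it still has to be read from an even byte boundary.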
Fortunately, I have TwinBridge installed; I was able to buy a licensed copy here at a clearance sale for 90% off, and I have kept it updated. I even have a book of documentation. Up to this point, it has always seemed cryptic. This is where much of the muddle of Unicode, GB, and Big5 comes into play, and also where I might sort it out.
I really don't care which I use as long as I can get a file created that has a coherent string of byte codes [three per vertical line, times 24 for horizontal movement]. It appears that each character would take 72 bytes for a complete bitmap. Some of that might include white space for horizontal separation, or maybe not; I can add the separation in software and fully use the 24 dots for better resolution.
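The 72-byte figure is 24 columns × 3 bytes per column. A minimal Python sketch of that packing, assuming the top dot of each column goes into the most significant bit of the first byte (the real bit order depends on how the printhead nozzles are driven, which is not stated here):

```python
SIZE = 24                     # glyph is 24 dots wide and 24 dots tall
BYTES_PER_COLUMN = SIZE // 8  # 3 bytes cover one vertical 24-dot column


def pack_columns(glyph):
    """Pack a 24x24 glyph (rows of 0/1) into 72 bytes, column by column.

    Assumed bit order: top dot = MSB of the column's first byte,
    bottom dot = LSB of the column's third byte.
    """
    if len(glyph) != SIZE or any(len(row) != SIZE for row in glyph):
        raise ValueError("expected a 24x24 glyph")
    out = bytearray()
    for x in range(SIZE):                  # left to right: horizontal movement
        for b in range(BYTES_PER_COLUMN):  # three bytes per vertical line
            value = 0
            for bit in range(8):
                y = b * 8 + bit
                value = (value << 1) | (1 if glyph[y][x] else 0)
            out.append(value)
    return bytes(out)


g = [[0] * SIZE for _ in range(SIZE)]
g[0][0] = 1     # top-left dot
g[23][23] = 1   # bottom-right dot
packed = pack_columns(g)
print(len(packed), hex(packed[0]), hex(packed[-1]))  # 72 0x80 0x1
```

Any horizontal spacing between characters can then be added by sending blank columns, as the post suggests, rather than storing it in each glyph.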
Incidentally, I have been Googling around for Chinese bitmaps, and page two of my Google search referred me to the Parallax Forums, this very thread! Information seems somewhat sparse and dated.
And yet, that makes it a really interesting challenge, as there are a lot of graphic LCDs that can display Chinese if one understands how to access the appropriate bitmap data.
2. In a word processor, type each symbol, one after the other
3. Adjust the size to 24 pixels per symbol (or whatever size you need)
4. Press Alt+Prt Sc
5. In Paint or another bitmap editor, press Edit / Paste
6. Save as PNG
7. Use the Character Set Extractor program from my prior post to make data tables from ALL characters in one pass.
http://www.piclist.com/techref/datafile/charset/extractor/charset_extractor.htm
You will have to do it 255-32 characters at a time, but that is much better than one at a time.
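Once the screenshot is saved, it is just a strip of fixed-width character cells, and turning it into bitmaps means cutting the strip and thresholding each pixel. This is not the Character Set Extractor linked above, only a hypothetical stdlib-only Python sketch of the same idea, assuming the pixels are already loaded as rows of 0 to 255 grayscale values (reading a PNG would need an image library such as Pillow) and that the cells start at the left edge with no gaps:

```python
def split_strip(pixels, cell_width, count, threshold=128):
    """Cut a screenshot strip into per-character 0/1 bitmaps.

    pixels: list of rows of grayscale values (0 = black, 255 = white).
    Dark pixels (below threshold) become 1 = ink.
    """
    if any(len(row) < cell_width * count for row in pixels):
        raise ValueError("strip is narrower than count * cell_width")
    glyphs = []
    for i in range(count):
        x0 = i * cell_width  # left edge of character cell i
        glyph = [[1 if row[x0 + x] < threshold else 0 for x in range(cell_width)]
                 for row in pixels]
        glyphs.append(glyph)
    return glyphs


W, B = 255, 0  # white and black pixels
strip = [
    [B, W, W, W, W, W, W, B],
    [W, B, W, W, W, W, B, W],
]
glyphs = split_strip(strip, cell_width=4, count=2)
print(glyphs[0])  # [[1, 0, 0, 0], [0, 1, 0, 0]]
print(glyphs[1])  # [[0, 0, 0, 1], [0, 0, 1, 0]]
```

Each resulting 24x24 glyph can then be packed into bytes for the printer.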
I jumped into my TwinBridge software documentation last night, and it is quite a historic survey of Asian fonts, Asian input schemes, and pre-Unicode language GUIs.
Nonetheless, eliminating all that history and generating a graphic file to get an appropriate bitmap is definitely the way to go.
Since I am trying to print on a chopstick, I will never need 255 characters and probably not even 32 characters.
I just need a file to provide the appropriate bytes in sequence. It seems that the file will be too big to store on the BASIC Stamp or the SX-28, so a serial link [either RF or hardwired] to my computer will have to be used.
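Over that serial link, the PC only needs to send one glyph when the printer asks for it. A hypothetical framing in Python: a sync byte, a one-byte glyph index, the 72 bitmap bytes, and an XOR checksum. The 0xAA sync value and the layout are assumptions for illustration, not an existing Parallax protocol; the receiving side would be written for the SX or BASIC Stamp and is only modelled here:

```python
SYNC = 0xAA        # hypothetical start-of-frame marker
GLYPH_BYTES = 72   # one 24x24 glyph, packed 3 bytes per column


def checksum(data):
    """XOR of all bytes: cheap enough for a small microcontroller."""
    c = 0
    for b in data:
        c ^= b
    return c


def build_frame(index, glyph):
    """PC side: SYNC, glyph index, 72 bitmap bytes, XOR checksum."""
    if not 0 <= index <= 255 or len(glyph) != GLYPH_BYTES:
        raise ValueError("bad index or glyph size")
    body = bytes([index]) + bytes(glyph)
    return bytes([SYNC]) + body + bytes([checksum(body)])


def parse_frame(frame):
    """Receiver side (modelled in Python): returns (index, glyph) or raises."""
    if len(frame) != GLYPH_BYTES + 3 or frame[0] != SYNC:
        raise ValueError("bad frame")
    body, check = frame[1:-1], frame[-1]
    if checksum(body) != check:
        raise ValueError("checksum mismatch")
    return body[0], body[1:]


frame = build_frame(3, bytes(range(GLYPH_BYTES)))
print(len(frame))             # 75 bytes on the wire per glyph
print(parse_frame(frame)[0])  # 3
```

At 9600 baud with 8N1 framing (10 bits per byte), one 75-byte frame takes about 78 ms.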
Congratulations!
You have also pointed out an obvious pathway for printing ANY graphic material with an HP inkjet printhead.
So this really begins to open up the HP Printer Contest to more people who are just starting to learn.
[There is an Art Work Prize in one of the 3 categories.]
As far as GB goes, it was originally Simplified; Big5 was originally Traditional; and Unicode was originally intended to be universal. GB and Big5 have been trying to 'universalize', but while a document may contain both Traditional and Simplified characters, the two encodings are still different and cannot be combined in one document.
They all use 16-bit codes for Chinese characters, but with different data schemes. TwinBridge created its own fonts [the CJK set], which can only be used by people who have TwinBridge software installed. There is also another code just for Internet HTML email.
And finally, Microsoft has for many years produced three versions of its operating system: one for most of the Western world, one for files described in Simplified GB Chinese, and one for files described in Traditional Big5 Chinese. [I had suspected as much and bought the American version of Windows XP at a hefty premium in Taiwan.] You notice this when you try to switch menus to another language or try to install software originating in Chinese on an incompatible system [garbled characters].
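The garbled characters are what you get when bytes written in one code page are read in another. Python ships codecs for the Windows Traditional Chinese code page (cp950, based on Big5) and the Simplified Chinese code page (cp936, based on GBK), so the effect is easy to reproduce:

```python
text = "中文"
traditional_bytes = text.encode("cp950")  # saved on a Traditional Chinese system

# Same bytes, read on a Simplified Chinese system: the text comes out garbled.
misread = traditional_bytes.decode("cp936", errors="replace")
print(traditional_bytes.hex(), misread)
print(misread == text)                            # False

# Read with the code page it was written in, it round-trips cleanly.
print(traditional_bytes.decode("cp950") == text)  # True
```

A pre-rendered bitmap sidesteps this entirely, since dots carry no code page.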
In sum, if you can write Japanese, Korean, or Chinese on your system, don't worry about all that. Just create the graphic file for output. The graphic output is generic and fairly universal.
Post Edited (Kramer) : 1/28/2006 6:33:30 AM GMT
What I must do now is buy the HP Inkjet Kit. I have been holding back because the software part seemed so complex.
GUIs [graphical user interfaces] are complex on their own, but the Asian languages went through a very choppy development phase, and the documentation was never fully translated back into English [I am sure the users who needed it understood it].
While I have a much better understanding than a week ago, I am quite relieved to be able to sidestep all that.
This is a classic example of how one point of view, one set of experiences, one map of the facts of life can make communication difficult with others who have a different mind.
Now, how often does this happen between people who were raised reading English and people who were raised reading Asian languages? How many more assumptions must there be? How many simple, quick, obvious (to one) communications lead to a complete lack of communication (between two)?
Illustrating a method of keeping oneself from making these assumptions would be of great value.
Having a simple bitmap procedure for small displays is actually important to this process, as manufacturers can easily create translated menus rather than expect people to learn English.
Actually, the whole topic of Asian characters is quite a muddle, and quite a programming distraction.
Much of it is tied up in the evolution of the GUI, as it would be nearly impossible to process Chinese data without one.
First, you have three or more 16-bit encoding formats that are used for documents [GB, Big5, Unicode, and Internet], which are mutually exclusive;
then you have different possible keyboard encodings;
then you have TwinBridge coming along with proprietary Asian outline fonts [Windows-incompatible] and a system that adapts Windows via proprietary overlays;
then you have China and Taiwan creating two separate character sets that are not one-to-one translations and don't even use the same radicals to look up a word in a dictionary;
then you have Microsoft creating three versions of their operating system to accommodate the political realities of Taiwan and China [which means that when you buy Chinese fonts, they may not install on your particular Windows system];
then you have a huge variety of printers and print drivers;
then you have all those different pixel densities that have evolved.
Also, systems and their fonts vary from as few as 1,000-plus characters to as many as 17,000.
So, if you can sidestep going into that maze - happiness.
It just ain't ASCII, and it makes EBCDIC look like a snap.
Happily, I am much closer to actually being able to program a graphics LCD with a Chinese menu because of this revelation.
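For a Chinese menu on a graphics LCD, only the characters that actually appear in the menu need bitmaps. A hypothetical Python sketch of the PC-side preparation: collect the distinct characters, assign each a slot number, and store every menu item as a string of one-byte slot numbers. The function name and the menu words are just examples:

```python
def build_menu_tables(menu_items):
    """Collect only the glyphs a menu needs and encode items as slot numbers.

    Returns (glyph_list, encoded_items): glyph_list[i] is the character whose
    bitmap would be stored at slot i on the device; each encoded item is a
    bytes object of slot numbers.
    """
    glyph_list, slot = [], {}
    for item in menu_items:
        for ch in item:
            if ch not in slot:
                slot[ch] = len(glyph_list)
                glyph_list.append(ch)
    if len(glyph_list) > 256:
        raise ValueError("more than 256 distinct glyphs; use 2-byte indices")
    encoded = [bytes(slot[ch] for ch in item) for item in menu_items]
    return glyph_list, encoded


menu = ["設定", "時間", "設定時間"]  # "Settings", "Time", "Set time"
glyphs, items = build_menu_tables(menu)
print("".join(glyphs))  # 設定時間: only 4 bitmaps to store on the device
print(list(items[2]))   # [0, 1, 2, 3]
```

With 72 bytes per 24x24 glyph, a menu using a few dozen distinct characters needs only a few kilobytes of bitmaps.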
Post Edited (Kramer) : 1/30/2006 6:09:14 AM GMT