[Tutor] Printing Chinese characters?

Thu Oct 16 00:44:28 EDT 2003

Well, I think the idea that it is at least similar to big5 is right.
But it may have a Japanese hiragana character also.

But to make that work I had to drop the '?' characters, as well as the
\xc8 byte (found by trial and error....)

I used Linux and the free, quirky but very handy "recode" program to
do the recoding.  I inserted a few newlines to help keep my place....

original string:
 '\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf'

my script:
$ python2 -c "print '\xba\xda\xcf\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0\xac\xba\xda\n\xe7\xba\xf8\xb9\xa5\xa3'" |
 recode big5..dump

Output, in Unicode UCS2 form:

UCS2   Mne   Description

7AAA      
6D09      
90FD      
7357      
8154      
3081   me    hiragana letter me
8867      
7AAA      
000A   LF    line feed (lf)
8765      
7E97      
5974      
000A   LF    line feed (lf)
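
For comparison, here is a minimal Python 3 sketch of the same dump step.
It assumes Python's built-in "big5" codec, which is not the recode
program used above and may map some byte pairs differently, so its rows
need not match the table. The helper name dump() is made up for this
illustration.

```python
import unicodedata

# Bytes from the script above, with the '?' and \xc8 bytes already dropped.
raw = (b'\xba\xda\xcf\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0\xac\xba\xda'
       b'\n\xe7\xba\xf8\xb9\xa5\xa3')


def dump(data, encoding="big5"):
    """Return (hex code point, Unicode name) rows for the decoded bytes.

    Bytes the codec cannot map are replaced with U+FFFD rather than
    raising, so the listing always covers the whole input.
    """
    text = data.decode(encoding, errors="replace")
    # unicodedata.name() has no name for control characters such as LF,
    # so fall back to an empty string instead of raising ValueError.
    return [("%04X" % ord(ch), unicodedata.name(ch, "")) for ch in text]


for code, name in dump(raw):
    print(code, name)
```

Any row showing FFFD marks a spot where the assumed codec could not
decode the bytes, which is a hint that the guess about the encoding may
be off at that position.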

Those characters can be looked up via the Unihan.txt file at
unicode.org, yielding the name of each character, and in many common
cases also pronunciation and a definition:

$ for i in 7AAA 6D09 90FD 7357 8154 3081 8867 7AAA 000A 8765 7E97 5974 000A; do
  grep "^U+$i" Unihan.txt | grep kDefinition; done

U+7AAA kDefinition hollow; pit; depression; swamp
U+90FD kDefinition metropolis, capital; all, the whole; elegant, refined
U+7357 kDefinition unruly, wild, violent, lawless
U+8154 kDefinition chest cavity; hollow in body
U+7AAA kDefinition hollow; pit; depression; swamp
U+8765 kDefinition a fly which is used similarly to cantharides
U+5974 kDefinition slave, servant

The other characters weren't in that "dictionary".
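
The same lookup can be sketched in Python. This is only an illustration:
it assumes each Unihan line has three tab-separated fields (code point,
field name, value), and load_definitions() is a made-up helper name. The
sample lines are copied from the output above, except the one marked
hypothetical.

```python
def load_definitions(lines):
    """Map 'U+XXXX' to its kDefinition text from Unihan-style lines."""
    definitions = {}
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3 and parts[1] == "kDefinition":
            definitions[parts[0]] = parts[2]
    return definitions


# Small in-memory stand-in for Unihan.txt (tab-separated, an assumption).
sample = [
    "# comment line\n",
    "U+7AAA\tkDefinition\thollow; pit; depression; swamp\n",
    "U+7AAA\tkSomeOtherField\tplaceholder value\n",  # hypothetical line
    "U+5974\tkDefinition\tslave, servant\n",
]

defs = load_definitions(sample)
for code in ["7AAA", "6D09", "5974"]:
    print("U+" + code, defs.get("U+" + code, "(no kDefinition entry)"))
```

To run it against the real file, pass an open file object instead of the
sample list, for example load_definitions(open("Unihan.txt",
encoding="utf-8")). Matching on the exact first field avoids picking up
lines that merely contain the same hex digits somewhere else.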

I don't know what the deal is with the characters I had to drop out,
so it may be some other character set which is related to big5.
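
One way to test that guess is to try several codecs on the original
bytes and see which ones accept them. The sketch below assumes a modern
Python 3 with its bundled CJK codecs. The list of candidates is my own
choice for illustration and does not come from the thread, and
try_codecs() is a made-up helper name.

```python
def try_codecs(data, codecs):
    """Return {codec name: decoded text, or None if strict decoding fails}."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# The original string from the post, '?' bytes included.
original = (b'\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0?\xac'
            b'\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')

# Candidate codecs: an assumption, not a claim about the real encoding.
candidates = ["big5", "big5hkscs", "cp950", "gb2312", "gbk"]

for name, text in try_codecs(original, candidates).items():
    print(name, "ok" if text is not None else "failed")
```

Strict decoding may well fail for every candidate while the '?' bytes
are still in the string. In that case, decoding with errors="replace"
and comparing how many U+FFFD characters each codec produces is a
rougher but still useful signal.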

But I think that for someone who knows no Chinese, using
free tools and databases....

Cheers,

Neal McBurnett                 http://bcn.boulder.co.us/~neal/
Signed and/or sealed mail encouraged.  GPG/PGP Keyid: 2C9EBA60

On Thu, Oct 16, 2003 at 01:54:54PM +1000, Alfred Milgrom wrote:
> Hi Danny:
> 
> Thanks for your reply.
> Given that this is a Chinese string, I think it might be a BIG-5 encoding, 
> but I am unable to find the proper encoding files.
> 
> In my distribution of Python, there is an encodings directory under 
> Python22/Lib, and a file called aliases.py. As I understand it, this module 
> is used by the encodings package search function to map encoding names to 
> module names.
> 
> There is an interesting comment under CJK encodings (Chinese, Japanese, 
> Korean) as follows:
>     # The codecs for these encodings are not distributed with the
>     # Python core, but are included here for reference, since the
>     # locale module relies on having these aliases available.
> 
> Do you (or anyone else) know where I can get the Chinese encodings, 
> including BIG-5?
> 
> Thanks in advance,
> Fred Milgrom
> 
> 
> At 02:52 PM 15/10/03 -0700, Danny Yoo wrote:
> 
> ><snip>
> >But that character string you've posted:
> >
> >###
> >s = ('\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6' +
> >     '\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')
> >###
> >will need to be first decoded from whatever byte encoding it is in now
> >into Unicode before any display approach will work.
> >
> ><snip> Do you have more information on
> >the byte encoding that is being used for your string 's'?
> >
> >Good luck to you!
> 
> 
> 