[Tutor] Printing Chinese characters?
Neal McBurnett
neal at bcn.boulder.co.us
Thu Oct 16 00:44:28 EDT 2003
Well, I think the guess that it is at least similar to big5 is right,
though it may also contain a Japanese hiragana character.
To get it to decode at all I had to drop the '?' characters, as well as
the \xc8 (trial and error....)
I used Linux and the free, quirky but very handy "recode" program to
do the recoding. I inserted a few newlines to help keep my place....
original string:
'\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf'
my script:
$ python2 -c "print '\xba\xda\xcf\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0\xac\xba\xda\n\xe7\xba\xf8\xb9\xa5\xa3'" |
recode big5..dump
Output, in Unicode UCS2 form:
UCS2   Mne   Description

7AAA
6D09
90FD
7357
8154
3081   me    hiragana letter me
8867
7AAA
000A   LF    line feed (lf)
8765
7E97
5974
000A   LF    line feed (lf)
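For anyone without recode, roughly the same dump can be sketched in
Python itself. This assumes a Python new enough to bundle the CJK codecs
(the sketch is written for Python 3); the function name dump_codepoints
is made up for illustration. Bytes that don't decode are replaced with
U+FFFD rather than raising an error, and I have not checked that
Python's big5 table agrees with recode's for every byte pair, so the
output may differ from the listing above.

```python
def dump_codepoints(raw, encoding="big5"):
    """Decode raw bytes and return one 4-digit hex code point per character.

    Undecodable bytes become U+FFFD (the replacement character)
    instead of raising UnicodeDecodeError.
    """
    text = raw.decode(encoding, errors="replace")
    return ["%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    # The trimmed byte string from the shell command above,
    # including the newlines inserted to keep track of position.
    raw = (b"\xba\xda\xcf\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0\xac\xba\xda\n"
           b"\xe7\xba\xf8\xb9\xa5\xa3\n")
    for cp in dump_codepoints(raw):
        print(cp)
```

Any FFFD in the result marks a spot where the chosen codec gave up,
which is a hint that the guess at the encoding is off there.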
Those characters can be looked up via the Unihan.txt file at
unicode.org, yielding the name of each character, and in many common
cases also pronunciation and a definition:
$ for i in 7AAA 6D09 90FD 7357 8154 3081 8867 7AAA 000A 8765 7E97 5974 000A; do
fgrep $i Unihan.txt | grep kDefinition; done
U+7AAA kDefinition hollow; pit; depression; swamp
U+90FD kDefinition metropolis, capital; all, the whole; elegant, refined
U+7357 kDefinition unruly, wild, violent, lawless
U+8154 kDefinition chest cavity; hollow in body
U+7AAA kDefinition hollow; pit; depression; swamp
U+8765 kDefinition a fly which is used similarly to cantharides
U+5974 kDefinition slave, servant
The other characters weren't in that "dictionary".
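The same lookup can be sketched in Python instead of fgrep. This
assumes the Unihan data is tab-separated text with lines of the form
"U+XXXX<TAB>field<TAB>value" and comment lines starting with '#'; the
function name lookup_definitions is hypothetical. Matching on the exact
code point field also avoids false hits where the hex digits happen to
appear elsewhere on a line.

```python
def lookup_definitions(lines, codepoints):
    """Return {codepoint: definition} for kDefinition entries found in lines.

    lines      -- any iterable of text lines, e.g. an open Unihan file
    codepoints -- hex strings such as "7AAA" (without the "U+" prefix)
    """
    wanted = {"U+" + cp.upper() for cp in codepoints}
    found = {}
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t", 2)
        if (len(fields) == 3
                and fields[1] == "kDefinition"
                and fields[0] in wanted):
            found[fields[0][2:]] = fields[2]
    return found


# Possible usage, assuming the data file is in the current directory:
#
#   with open("Unihan.txt", encoding="utf-8") as fh:
#       defs = lookup_definitions(fh, ["7AAA", "6D09", "90FD"])
#   for cp, text in defs.items():
#       print("U+%s %s" % (cp, text))
```

Code points with no kDefinition entry are simply absent from the
result, just as they produce no output from the grep pipeline.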
I don't know what the deal is with the characters I had to drop out,
so it may be some other character set which is related to big5.
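One way to explore the "some other character set" idea is to try
several candidate codecs and see which ones decode the bytes without
errors. The sketch below is only that: the function name try_encodings
and the list of candidates are my own guesses, not something
established in this thread, and a clean decode does not prove the codec
is the right one. Someone who reads Chinese still has to judge whether
the result makes sense.

```python
def try_encodings(raw, candidates):
    """Map each codec name to the decoded text, or None if decoding fails.

    Decoding is strict, so any invalid byte sequence counts as failure.
    Codec names that this Python does not know also map to None.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            results[name] = None
    return results


if __name__ == "__main__":
    # Trimmed byte string from the recode experiment above.
    raw = (b"\xba\xda\xcf\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0\xac\xba\xda"
           b"\xe7\xba\xf8\xb9\xa5\xa3")
    # Candidate list is an assumption: common Chinese codecs in Python.
    for name, text in try_encodings(
            raw, ["big5", "cp950", "big5hkscs", "gb2312", "gbk"]).items():
        # ascii() keeps the output printable on any terminal.
        print(name, "->", "failed" if text is None else ascii(text))
```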
But I think that for someone who knows no Chinese, using
free tools and databases....
Cheers,
Neal McBurnett http://bcn.boulder.co.us/~neal/
Signed and/or sealed mail encouraged. GPG/PGP Keyid: 2C9EBA60
On Thu, Oct 16, 2003 at 01:54:54PM +1000, Alfred Milgrom wrote:
> Hi Danny:
>
> Thanks for your reply.
> Given that this is a Chinese string, I think it might be a BIG-5 encoding,
> but I am unable to find the proper encoding files.
>
> In my distribution of Python, there is an encodings directory under
> Python22/Lib, and a file called aliases.py. As I understand it, this module
> is used by the encodings package search function to map encoding names to
> module names.
>
> There is an interesting comment under CJK encodings (Chinese, Japanese,
> Korean) as follows:
> # The codecs for these encodings are not distributed with the
> # Python core, but are included here for reference, since the
> # locale module relies on having these aliases available.
>
> Do you (or anyone else) know where I can get the Chinese encodings,
> including BIG-5?
>
> Thanks in advance,
> Fred Milgrom
>
>
> At 02:52 PM 15/10/03 -0700, Danny Yoo wrote:
>
> ><snip>
> >But that character string you've posted:
> >
> >###
> >s = ('\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6' +
> > '\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')
> >###
> >will need to be first decoded from whatever byte encoding it is in now
> >into Unicode before any display approach will work.
> >
> ><snip> Do you have more information on
> >the byte encoding being used for your string 's'?
> >
> >Good luck to you!
>
>
>