[Tutor] Printing Chinese characters?
Neal McBurnett
neal at bcn.boulder.co.us
Thu Oct 16 00:54:41 EDT 2003
Ahh - and the final step - that would yield this utf-8 encoding (of
the original string minus the troublesome characters) rendered as a
python string:
print '\xe7\xaa\xaa\xe6\xb4\x89\xe9\x83\xbd\xe7\x8d\x97\xe8\x85\x94\xe3\x82\x81\xe8\xa1\xa7\xe7\xaa\xaa\xe8\x9d\xa5\xe7\xba\x97\xe5\xa5\xb4\x0a'
which prints fine on my utf-8-enabled xterm, as described so
wonderfully at http://www.cl.cam.ac.uk/~mgk25/unicode.html by Markus
Kuhn. Though a few characters aren't in the free font I got with
X11/Redhat 7.3.
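
For anyone who would rather do that final step in Python than by hand,
here is a minimal sketch. It assumes a current Python 3 interpreter
(not what I ran above) and takes the code points from the recode dump
quoted further down, leaving out the line feed I had inserted
mid-string. The names code_points and utf8_bytes are only illustrative.

```python
# Code points from the recode dump, without the line feed that was
# inserted mid-string to keep track of position.
code_points = [
    0x7AAA, 0x6D09, 0x90FD, 0x7357, 0x8154, 0x3081,
    0x8867, 0x7AAA, 0x8765, 0x7E97, 0x5974, 0x000A,
]

# Build a Unicode string, then encode it as UTF-8 bytes.
text = "".join(chr(cp) for cp in code_points)
utf8_bytes = text.encode("utf-8")

# Show the bytes as \x escapes, the same form as the string literal above.
print("".join("\\x%02x" % b for b in utf8_bytes))
```

The escapes printed this way should match the string in the print
statement above byte for byte.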
Of course I may be way off-base here - just playing around with it.
-Neal
On Wed, Oct 15, 2003 at 10:44:28PM -0600, Neal McBurnett wrote:
> Well, I think the idea that it is at least similar to big5 is right.
> But it may have a Japanese hiragana character also.
>
> But to make that work I had to drop the '?' characters, as well as the
> \xc8 (trial and error....)
>
> I used linux and the free, quirky but very handy "recode" program to
> do the recoding. I inserted a few newlines to help keep my place....
>
> original string:
> '\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf'
>
> my script:
> $ python2 -c "print '\xba\xda\xcf\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0\xac\xba\xda\n\xe7\xba\xf8\xb9\xa5\xa3'" |
> recode big5..dump
>
> Output, in Unicode UCS2 form:
>
> UCS2   Mne   Description
>
> 7AAA
> 6D09
> 90FD
> 7357
> 8154
> 3081   me    hiragana letter me
> 8867
> 7AAA
> 000A   LF    line feed (lf)
> 8765
> 7E97
> 5974
> 000A   LF    line feed (lf)
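
The recode step above can be approximated in Python. This is only a
sketch, assuming a Python 3 interpreter whose standard library has a
"big5" codec; dump_code_points is a made-up helper name. Python's big5
table may not map every byte pair the way recode does (the pair
\xc6\xe5 is one I would not count on), so undecodable bytes are
replaced with U+FFFD rather than raising, and the output may differ
from the dump above.

```python
def dump_code_points(raw, encoding="big5"):
    """Decode raw bytes and return one hex code point per character."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    text = raw.decode(encoding, errors="replace")
    return ["%04X" % ord(ch) for ch in text]


# Same bytes as in the script above, including the inserted newline and
# the trailing newline that print adds.
raw = (b"\xba\xda\xcf\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0\xac\xba\xda\n"
       b"\xe7\xba\xf8\xb9\xa5\xa3\n")

for cp in dump_code_points(raw):
    print(cp)
```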
>
> Those characters can be looked up via the Unihan.txt file at
> unicode.org, yielding the name of each character, and in many common
> cases also pronunciation and a definition:
>
> $ for i in 7AAA 6D09 90FD 7357 8154 3081 8867 7AAA 000A 8765 7E97 5974 000A; do
> fgrep $i Unihan.txt | grep kDefinition; done
>
> U+7AAA kDefinition hollow; pit; depression; swamp
> U+90FD kDefinition metropolis, capital; all, the whole; elegant,
> refined
> U+7357 kDefinition unruly, wild, violent, lawless
> U+8154 kDefinition chest cavity; hollow in body
> U+7AAA kDefinition hollow; pit; depression; swamp
> U+8765 kDefinition a fly which is used similarly to cantharides
> U+5974 kDefinition slave, servant
>
> The other characters weren't in that "dictionary".
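
The same lookup can be sketched in Python. This assumes each Unihan
data line has three tab-separated fields (code point, field name,
value), which is my reading of the file format rather than something
shown above; lookup_definitions is a hypothetical helper. Unlike the
fgrep loop, it matches the code point field exactly instead of
anywhere in the line.

```python
def lookup_definitions(lines, code_points):
    """Return {code point: definition} for the requested code points."""
    wanted = {"U+" + cp.upper() for cp in code_points}
    found = {}
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) == 3 and fields[1] == "kDefinition" and fields[0] in wanted:
            found[fields[0]] = fields[2]
    return found


# Two sample lines in the assumed layout, taken from the output above.
sample = [
    "U+7AAA\tkDefinition\thollow; pit; depression; swamp\n",
    "U+5974\tkDefinition\tslave, servant\n",
]
print(lookup_definitions(sample, ["7AAA", "6D09", "5974"]))
```

To run it against the real file, pass an open file object (for example
open("Unihan.txt", encoding="utf-8")) in place of sample.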
>
> I don't know what the deal is with the characters I had to drop out,
> so it may be some other character set which is related to big5.
>
> But I think that for someone who knows no Chinese, using
> free tools and databases....
>
> Cheers,
>
> Neal McBurnett http://bcn.boulder.co.us/~neal/
> Signed and/or sealed mail encouraged. GPG/PGP Keyid: 2C9EBA60
>
>
> On Thu, Oct 16, 2003 at 01:54:54PM +1000, Alfred Milgrom wrote:
> > Hi Danny:
> >
> > Thanks for your reply.
> > Given that this is a Chinese string, I think it might be a BIG-5 encoding,
> > but I am unable to find the proper encoding files.
> >
> > In my distribution of Python, there is an encodings directory under
> > Python22/Lib, and a file called aliases.py. As I understand it, this module
> > is used by the encodings package search function to map encodings names to
> > module names.
> >
> > There is an interesting comment under CJK encodings (Chinese, Japanese,
> > Korean) as follows:
> > # The codecs for these encodings are not distributed with the
> > # Python core, but are included here for reference, since the
> > # locale module relies on having these aliases available.
> >
> > Do you (or anyone else) know where I can get the Chinese encodings,
> > including BIG-5?
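
A quick way to check whether a given interpreter actually has a codec
registered, as opposed to only an alias entry, is to ask the codecs
module. This is a rough sketch in Python 3 syntax; codec_available is
a made-up helper name, and the result depends entirely on the
interpreter it runs under.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can find a codec called name."""
    try:
        codecs.lookup(name)
    except LookupError:
        # No codec is registered under that name or any alias of it.
        return False
    return True


for name in ("big5", "gb2312", "euc_jp"):
    print(name, codec_available(name))
```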
> >
> > Thanks in advance,
> > Fred Milgrom
> >
> >
> > At 02:52 PM 15/10/03 -0700, Danny Yoo wrote:
> >
> > ><snip>
> > >But that character string you've posted:
> > >
> > >###
> > >s = ('\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6' +
> > > '\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')
> > >###
> > >will need to be first decoded from whatever byte encoding it is in now
> > >into Unicode before any display approach will work.
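
When the encoding is unknown, one way to explore is to decode the same
bytes with several candidate codecs and compare the results by eye.
This is a sketch in Python 3 syntax; try_decodings is a hypothetical
helper, and the candidate list is just a set of guesses, not something
this thread settles. Undecodable bytes come out as U+FFFD instead of
raising.

```python
def try_decodings(raw, candidates):
    """Decode raw with each candidate codec; return {codec name: text}."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name, errors="replace")
        except LookupError:
            # Codec not installed in this interpreter.
            results[name] = None
    return results


# The posted string, written as bytes.
s = (b'\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6'
     b'\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')

for name, text in try_decodings(s, ["big5", "gb2312", "euc_jp"]).items():
    print(name, repr(text))
```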
> > >
> > ><snip> Do you have more information on
> > >which byte encoding is being used for your string 's'?
> > >
> > >Good luck to you!
> >
> >
> >
> > _______________________________________________
> > Tutor maillist - Tutor at python.org
> > http://mail.python.org/mailman/listinfo/tutor
>