[Tutor] Printing Chinese characters?
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Wed Oct 15 17:52:16 EDT 2003
On Wed, 15 Oct 2003, Alfred Milgrom wrote:
> I have a string which I believe is made up of Chinese characters, but I
> cannot display it properly.
> The string is
> >>> b = '\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf'
>
> This prints out as:
>
> >>> print b
> =BA=DA=CF?=AC=B3=A3=BC=FB=B5=C4=C6=E5=D0?=AC=BA=DA=C8=E7=BA?=F8=B9=A5=A3=BF
>
> which clearly is not Chinese :)
Hi Alfred,
It all looks Greek to me. *grin*
When you say "display", do you mean display in a Tkinter window? Most
terminal windows don't natively support extended character sets, so you
might need to use a GUI to properly see those characters.
I'm guessing that you might be working with Unicode? If so, maybe you can use
your web browser to help? I wrote a small post a while back:
http://mail.python.org/pipermail/tutor/2002-December/019087.html
that generates the Korean unicode character for the letter "Yoo". *wink*
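As a rough sketch of the browser idea (not necessarily what that older post does): once you have properly decoded text, you can wrap it in a small HTML page that declares UTF-8 and let the browser handle the fonts. The function name `render_html_preview`, the file name `preview.html`, and the sample text are all made up for illustration, and this is written in Python 3 syntax.

```python
import html


def render_html_preview(text):
    """Wrap already-decoded text in a minimal HTML page that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body><p>" + html.escape(text) + "</p></body></html>\n"
    )


# Illustrative sample text only; substitute your own decoded string.
page = render_html_preview("中文")

# To view it, write the page out as UTF-8 and open the file in a browser:
# with open("preview.html", "w", encoding="utf-8") as f:
#     f.write(page)
```

This only helps after the bytes have been turned into real Unicode text; handing the browser wrongly decoded text will just show the wrong characters more prettily.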
But that character string you've posted:
###
s = ('\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6' +
'\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')
###
will need to be first decoded from whatever byte encoding it is in now
into Unicode before any display approach will work.
Let's try something:
###
>>> s = ('\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6' +
...      '\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: unexpected code byte
>>> s.decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-16 decoding error: truncated data
###
Hmmm... no luck there. The 'encodings' page:
http://www.python.org/doc/lib/node126.html
shows the native encodings that Python supports out of the box. I'm not
quite sure if we can guess the byte encoding, although the tests above
immediately discount UTF-8 and UTF-16. Do you have more information on
which byte encoding is being used for your string 's'?
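If nobody can tell you the encoding, one thing you could try is running the bytes through a list of candidate codecs and looking at whatever comes back. Here is a minimal sketch in Python 3 syntax. The helper name `try_decodings` and the candidate list are my own guesses, nothing in this thread confirms the real encoding, and the sample bytes are produced by encoding known text rather than taken from your string.

```python
def try_decodings(data, candidates):
    """Return {codec_name: decoded_text} for each candidate codec that accepts data."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    return results


# Illustrative sample only: bytes made by encoding known text as GB2312.
sample = "中文".encode("gb2312")

# Hypothetical candidate list; extend it with whatever encodings seem plausible.
for name, text in try_decodings(sample, ["utf-8", "gb2312", "big5"]).items():
    print(name, repr(text))
```

Keep in mind that a decode succeeding does not prove the codec is the right one. Several encodings will accept nearly any byte sequence, so someone who reads the language still has to look at the results and say which one makes sense.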
Good luck to you!