[Tutor] Printing Chinese characters?

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Wed Oct 15 17:52:16 EDT 2003



On Wed, 15 Oct 2003, Alfred Milgrom wrote:

> I have a string which I believe is made up of Chinese characters, but I
> cannot display it properly.
> The string is
>  >>> b =
> '\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf'
>
> This prints out as:
>
>  >>> print b
> =BA=DA=CF?=AC=B3=A3=BC=FB=B5=C4=C6=E5=D0?=AC=BA=DA=C8=E7=BA?=F8=B9=A5=A3=BF
>
> which clearly is not Chinese :)

Hi Alfred,

It all looks Greek to me.  *grin*


When you say "display", do you mean display in a Tkinter window?  Most
terminal windows don't natively support extended character sets, so you
might need to use a GUI to properly see those characters.

I'm guessing that you might be working with Unicode?  If so, maybe you can use
your web browser to help?  I wrote a small post a while back:

    http://mail.python.org/pipermail/tutor/2002-December/019087.html

that generates the Korean unicode character for the letter "Yoo".  *wink*
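As a rough sketch of the browser idea (assuming a Python 3 interpreter, and
not necessarily how the linked post does it), one can turn each non-ASCII
character into an HTML numeric character reference and let the browser do
the rendering.  The helper names to_html_entities and make_page are made up
for illustration:

```python
import html

# Hypothetical helper: escape markup characters, then replace every
# non-ASCII character with a numeric character reference like &#20013;.
def to_html_entities(text):
    escaped = html.escape(text)
    return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in escaped)

# Hypothetical helper: wrap the escaped text in a bare-bones HTML page.
def make_page(text):
    return "<html><body><p>%s</p></body></html>" % to_html_entities(text)

# Print a page holding two sample Chinese characters (U+4E2D, U+6587).
print(make_page("\u4e2d\u6587"))
```

Saving that output to a file and opening it in a browser should show the
characters, provided the browser has a font that covers them.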



But that character string you've posted:

###
s = ('\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6' +
     '\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')
###

will need to be first decoded from whatever byte encoding it is in now
into Unicode before any display approach will work.
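To make the bytes-versus-characters distinction concrete, here is a small
sketch.  It assumes a Python 3 interpreter, where raw bytes and decoded text
are separate types, and it uses 'gb2312' purely as an example codec; that is
not a claim about what your string is actually encoded in:

```python
# Build a known byte string by encoding one character ourselves.
# 'gb2312' is only an example codec here (an assumption for illustration).
raw = "\u4e2d".encode("gb2312")

# Decoding with the matching codec turns the bytes back into text.
text = raw.decode("gb2312")

# The encoded form takes two bytes; the decoded form is one character.
print(len(raw), len(text))
```

Only the decoded form is something a display layer can render as a single
character.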


Let's try something:

###
>>> s = ('\xba\xda\xcf?\xac\xb3\xa3\xbc\xfb\xb5\xc4\xc6' +
...      '\xe5\xd0?\xac\xba\xda\xc8\xe7\xba?\xf8\xb9\xa5\xa3\xbf')
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: unexpected code byte
>>> s.decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-16 decoding error: truncated data
###

Hmmm... no luck there.  The 'encodings' page:

    http://www.python.org/doc/lib/node126.html

shows the native encodings that Python supports out of the box.  I'm not
quite sure if we can guess the byte encoding, although the tests above
immediately discount UTF-8 and UTF-16.  Do you have more information on
which byte encoding is being used for your string 's'?
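
If nothing else is known, one crude approach is to try a handful of
candidate codecs and see which ones decode without complaint.  The sketch
below assumes Python 3; the helper name try_decodings and the candidate list
are my own choices for illustration, and the sample bytes are ones I encode
myself so that the right answer is known in advance:

```python
# Hypothetical helper: try each candidate codec and record either the
# decoded text or None if the bytes are invalid in that codec.
def try_decodings(data, candidates):
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# Known sample: two characters encoded as GB2312 (an assumed test input).
sample = "\u4e2d\u6587".encode("gb2312")

# The candidate list is arbitrary; extend it with whatever seems plausible.
for name, text in try_decodings(sample, ["utf-8", "gb2312", "big5"]).items():
    print(name, "->", repr(text))
```

Keep in mind that a decode that succeeds only shows the bytes are legal in
that codec, not that the codec is the right one; the result still has to be
checked by someone who can read it.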


Good luck to you!



