[Tutor] Printing Chinese characters?

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Thu Oct 16 02:47:31 EDT 2003



On Wed, 15 Oct 2003, Neal McBurnett wrote:

> Ahh - and the final step - that would yield this utf-8 encoding (of the
> original string minus the troublesome characters) rendered as a python
> string:
>
> print ('\xe7\xaa\xaa\xe6\xb4\x89\xe9\x83\xbd\xe7\x8d\x97\xe8\x85\x94\xe3'
>        '\x82\x81\xe8\xa1\xa7\xe7\xaa\xaa\xe8\x9d\xa5\xe7\xba\x97\xe5\xa5'
>        '\xb4\x0a')


Ah, so it is UTF-8 then?  Oh, I must have introduced some stray
characters when I copied and pasted.  You're right!  Oh, cool!

###
>>> s = ('\xe7\xaa\xaa\xe6\xb4\x89\xe9\x83\xbd\xe7\x8d\x97\xe8'
...    + '\x85\x94\xe3\x82\x81\xe8\xa1\xa7\xe7\xaa\xaa\xe8\x9d'
...    + '\xa5\xe7\xba\x97\xe5\xa5\xb4').decode('utf8')
>>> s
u'\u7aaa\u6d09\u90fd\u7357\u8154\u3081\u8867\u7aaa\u8765\u7e97\u5974'
###

There, now it's decoding properly.  Yes, it matches what Neal decoded:


> > U+7AAA kDefinition hollow; pit; depression; swamp
> > U+90FD kDefinition metropolis, capital; all, the whole; elegant,
> > refined
> > U+7357 kDefinition unruly, wild, violent, lawless
> > U+8154 kDefinition chest cavity; hollow in body
> > U+7AAA kDefinition hollow; pit; depression; swamp
> > U+8765 kDefinition a fly which is used similarly to cantharides
> > U+5974 kDefinition slave, servant


Wow, that sounds rather... um... grim.  *grin*
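
If you want to double-check a decode like this against a code point list
such as Neal's, something along these lines might help.  It's only a
sketch, and it assumes a newer Python (3.x), where byte strings carry a
b prefix; the helper name code_points is made up for illustration.  The
kDefinition glosses come from the Unihan database, which the standard
library doesn't ship, so this prints only the formal Unicode names.

```python
import unicodedata

# The same UTF-8 bytes as in the session above, as a Python 3 bytes literal.
raw = (b'\xe7\xaa\xaa\xe6\xb4\x89\xe9\x83\xbd\xe7\x8d\x97\xe8'
       b'\x85\x94\xe3\x82\x81\xe8\xa1\xa7\xe7\xaa\xaa\xe8\x9d'
       b'\xa5\xe7\xba\x97\xe5\xa5\xb4')


def code_points(text):
    """Return a 'U+XXXX' label for each character (hypothetical helper)."""
    return ['U+%04X' % ord(ch) for ch in text]


text = raw.decode('utf-8')
for label, ch in zip(code_points(text), text):
    # unicodedata.name() returns the formal Unicode name, or the
    # fallback string if the character has no name in the database.
    print(label, unicodedata.name(ch, '<unnamed>'))
```

Comparing the printed labels with the U+ values in the quoted list is a
quick way to confirm that both decodes agree.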


Most web browsers can display UTF-8 encoded pages natively, so, in a
pinch, you might be able to see the message this way:

###
msg = ('\xe7\xaa\xaa\xe6\xb4\x89\xe9\x83\xbd\xe7\x8d\x97\xe8'
       + '\x85\x94\xe3\x82\x81\xe8\xa1\xa7\xe7\xaa\xaa\xe8\x9d'
       + '\xa5\xe7\xba\x97\xe5\xa5\xb4')
print """<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>%s</p>
</body>
</html>""" % msg
###

Redirect the output of this script to an HTML file, and then try opening
that file in your browser.
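
If shell redirection gets in the way (say, the console mangles the
bytes), the script could write the file itself.  Here's a rough sketch,
again assuming a newer Python (3.x); the function names and the output
filename chinese_msg.html are just placeholders I made up.

```python
# Decode the UTF-8 bytes into a text string first (Python 3 bytes literal).
msg = (b'\xe7\xaa\xaa\xe6\xb4\x89\xe9\x83\xbd\xe7\x8d\x97\xe8'
       b'\x85\x94\xe3\x82\x81\xe8\xa1\xa7\xe7\xaa\xaa\xe8\x9d'
       b'\xa5\xe7\xba\x97\xe5\xa5\xb4').decode('utf-8')

TEMPLATE = """<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>%s</p>
</body>
</html>"""


def build_page(text):
    """Return the HTML page as a text string (hypothetical helper)."""
    return TEMPLATE % text


def write_page(path, text):
    """Write the page to path, encoded as UTF-8 to match the meta tag."""
    # An explicit encoding keeps the result independent of the locale.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(build_page(text))


if __name__ == '__main__':
    write_page('chinese_msg.html', msg)  # placeholder filename
```

The explicit encoding matters here: the bytes on disk have to agree with
the charset named in the meta tag, or the browser will show garbage.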

If you're still having trouble seeing it, visit:

    http://hkn.eecs.berkeley.edu/~dyoo/weird_chinese_msg.pdf

I've printed it out as a PDF as a stopgap measure if you're really
desperate to see the Chinese characters.  *grin*


But does anyone know if ReportLab's happy with UTF-8 characters?



> > > There is an interesting comment under CJK encodings (Chinese, Japanese,
> > > Korean) as follows:
> > >     # The codecs for these encodings are not distributed with the
> > >     # Python core, but are included here for reference, since the
> > >     # locale module relies on having these aliases available.
> > >
> > > Do you (or anyone else) know where I can get the Chinese encodings,
> > > including BIG-5?


Here you go:

    http://cjkpython.i18n.org/

It looks like we won't need them this time, but if we run across
BIG5-encoded files, we'll now know how to transform them to UTF-8.
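
For what it's worth, the transformation itself should just be a decode
followed by an encode.  Here's a minimal sketch, assuming a Python where
a 'big5' codec is available (either through the CJK codecs package above,
or a later Python that bundles one) and written with Python 3 syntax.
The function name big5_to_utf8 is made up for illustration.

```python
def big5_to_utf8(data):
    """Re-encode BIG5 bytes as UTF-8 bytes (hypothetical helper)."""
    # Decode to a Unicode string first, then encode to the target encoding.
    return data.decode('big5').encode('utf-8')


# Build some BIG5 test data from two known characters (U+4E2D, U+6587).
sample = '\u4e2d\u6587'.encode('big5')
print(repr(big5_to_utf8(sample)))
```

A file would work the same way: read it in binary mode, pass the bytes
through the function, and write the result back out in binary mode.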


Good luck!



