Chinese language support of Python?

Boudewijn Rempt boud at valdyas.org
Sun Jul 7 02:22:10 EDT 2002


Leon Wang wrote:

> Hi, I got the Chinese displayed correctly in window title without
> change the default encoding in site.py by:
> 
> root.title(u'\u4e2d\u6587')
> 
> But still can not put Chinese directly as string in source, I can not
> live with so much \u... for a whole Chinese sentence/paragraph, it's
> impossible to read and edit them :(
> However, I can print Chinese string (normal string, without u prefix
> and \u codes) in console with command line python.exe. How can I let
> Tkinter accept that?

I don't think that's going to work (caveat: I use PyQt which has different
conventions). If you absolutely want to have Chinese characters in your
source files*, you can do something like the following**:

root.title(unicode('伱好?', 'utf-8'))

Note that you _will_ have to construct a unicode object, not an ordinary
string, if you want the system to understand what you mean: ordinary
strings are just containers for bytes, one character per byte.
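
To make the bytes-versus-characters point concrete, here is a minimal
sketch. It assumes an interpreter that accepts both the b'...' and u'...'
literal prefixes; the names raw and text are only for illustration, and the
Tkinter call is left out so that it runs without a display.

```python
# The UTF-8 bytes for the two characters U+4E2D U+6587, written with
# escapes so that the encoding of this file itself does not matter.
raw = b'\xe4\xb8\xad\xe6\x96\x87'

# Decoding turns the container of bytes into a unicode object.
text = raw.decode('utf-8')

print(len(raw))    # counts bytes
print(len(text))   # counts characters
print(text == u'\u4e2d\u6587')
```

The two lengths differ because each of these characters takes several bytes
in utf-8; only the decoded object knows where one character ends and the
next begins.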

You can find out which encodings are available by inspecting the
python/lib/encodings directory (or python\lib\encodings): you can use
any of them in place of 'utf-8'. Of course, the string must then
actually be in that encoding, too.
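
A rough sketch of that last point: the same text becomes different bytes
under different encodings, so bytes have to be decoded with the codec they
were written in. The codec names 'gb2312' and 'big5' are assumptions about
what a given installation ships, and available() is a hypothetical helper,
not part of the library; codecs.lookup raises LookupError for a name it
does not know.

```python
import codecs

text = u'\u4e2d\u6587'

def available(name):
    """Return True if this installation has a codec called `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

for name in ('utf-8', 'gb2312', 'big5'):
    if not available(name):
        print('%s: not installed' % name)
        continue
    raw = text.encode(name)
    # The round trip works when the same codec is used both ways.
    assert raw.decode(name) == text
    print('%s: %d bytes' % (name, len(raw)))

# Decoding with a codec other than the one used for encoding goes wrong.
if available('gb2312'):
    try:
        text.encode('gb2312').decode('utf-8')
    except UnicodeError:
        print('gb2312 bytes are not valid utf-8')
```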

There are some errors in my handling of this topic in my book, but it might 
still be useful to you:

http://www.opendocspublishing.com/pyqt/index.lxp?lxpwrap=c2029%2ehtm

errata:

http://www.valdyas.org/python/book.html

The paper version has nice pictures that are quite useful in this chapter.

* Actually I still think it would be great to be able to have source files
in utf-8, not limited to unicode strings. I want to type:

def 印刷():
    pass

That this would make my source code unreadable for a lot of other people,
tant pis; I would still like the power. Just as I want the power to do a
quick sys.setAppDefaultEncoding('utf-8') to make sure this application sees
all its strings as encoded in utf-8.

** Note that this posting is encoded in utf-8. If you see gibberish instead
of a friendly greeting, then either the message is mangled, or your 
newsreader can't handle the encoding, or you don't have the fonts to show
Chinese.

-- 
Boudewijn Rempt | http://www.valdyas.org


