How to display Chinese in a list retrieved from database via python
Mark Tolonen
metolone+gmane at gmail.com
Mon Dec 29 04:06:18 EST 2008
"zxo102" <zxo102 at gmail.com> wrote in message
news:2560a6e0-c103-46d2-aa5a-8604de4d1968 at b38g2000prf.googlegroups.com...
> I have a list in a dictionary and want to insert it into the html
> file. I test it with following scripts of CASE 1, CASE 2 and CASE 3. I
> can see "中文" in CASE 1 but that is not what I want. CASE 2 does not
> show me correct things.
> So, in CASE 3, I hacked the script of CASE 2 with a function:
> conv_list2str() to 'convert' the list into a string. CASE 3 can show
> me "中文". I don't know what is wrong with CASE 2 and what is right with
> CASE 3.
>
> Without knowing why, I have just hard coded my python application
> following CASE 3 for displaying Chinese characters from a list in a
> dictionary in my web application.
>
> Any ideas?
>
See below each case...新年快乐!
> Happy a New Year: 2009
>
> ouyang
>
>
>
> CASE 1:
> ########################################################
> f=open('test.html','wt')
> f.write('''<html><head>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> <title>test</title>
> <script language=javascript>
> var test = ['\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']
> alert(test[0])
> alert(test[1])
> alert(test[2])
> </script>
> </head>
> <body></body></html>''')
> f.close()
In CASE 1, the *4 bytes* D6 D0 CE C4 are written to the file, which is the
correct gb2312 encoding for 中文.
> CASE 2:
> #######################################################
> mydict = {}
> mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
> f_str = '''<html><head>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> <title>test</title>
> <script language=javascript>
> var test = %(JUNK)s
> alert(test[0])
> alert(test[1])
> alert(test[2])
> </script>
> </head>
> <body></body></html>'''
>
> f_str = f_str%mydict
> f=open('test02.html','wt')
> f.write(f_str)
> f.close()
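In CASE 2, the *16 characters* "\xd6\xd0\xce\xc4" (backslash, x, d, 6, and so
on) are written to the file, which is NOT the correct gb2312 encoding for 中文.
Javascript reads each \xNN escape as the single character with that code, so
you get four Latin-1 characters instead of two Chinese ones. This happens
because str() of a list calls repr() on each element, so in Python 2.x the
str() representation of mydict['JUNK'] is the characters
"['\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']".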
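A quick way to see the 4-byte versus 16-character difference is to compare the raw bytes with their repr(). This is only a minimal sketch and it assumes Python 3, where a bytes object stands in for the Python 2 byte string; the variable names are illustrative and not from the original script.

```python
# Python 3 sketch: bytes objects stand in for Python 2 byte strings.
raw = '中文'.encode('gb2312')      # the four bytes D6 D0 CE C4
items = [raw, raw, raw]

# str() of a list calls repr() on every element, so each byte comes back
# as a backslash escape spelled out in ASCII characters.
as_text = str(items)

# Drop the b'...' wrapper that Python 3 puts around the escapes.
escaped = repr(raw)[2:-1]

print(len(raw))      # number of bytes in the gb2312 encoding
print(len(escaped))  # number of characters in the escaped form
print(as_text)
```

Writing as_text into the page reproduces the CASE 2 problem: the file contains escape text, not the encoded characters.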
> CASE 3:
> ###################################################
> mydict = {}
> mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
>
> f_str = '''<html><head>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> <title>test</title>
> <script language=javascript>
> var test = %(JUNK)s
> alert(test[0])
> alert(test[1])
> alert(test[2])
> </script>
> </head>
> <body></body></html>'''
>
> import string
>
> def conv_list2str(value):
>     list_len = len(value)
>     list_str = "["
>     for ii in range(list_len):
>         list_str += '"'+string.strip(str(value[ii])) + '"'
>         if ii != list_len-1:
>             list_str += ","
>     list_str += "]"
>     return list_str
>
> mydict['JUNK'] = conv_list2str(mydict['JUNK'])
>
> f_str = f_str%mydict
> f=open('test03.html','wt')
> f.write(f_str)
> f.close()
CASE 3 works because you build your own, correct, gb2312 representation of
mydict['JUNK'] (value[ii] above is the correct 4-byte sequence for 中文).
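The same idea can be sketched at the byte level. This assumes Python 3, and build_js_array is a hypothetical helper, not part of the original script: it joins the raw gb2312 bytes with quotes and commas, so no repr() escaping ever takes place.

```python
def build_js_array(values):
    """Join raw byte strings into a javascript array literal (as bytes)."""
    quoted = [b'"' + v.strip() + b'"' for v in values]
    return b'[' + b','.join(quoted) + b']'

raw = b'\xd6\xd0\xce\xc4'            # gb2312 bytes for 中文
literal = build_js_array([raw, raw, raw])

# The raw bytes survive untouched, so a page declared as gb2312 decodes them.
print(literal.decode('gb2312'))
```

Like the CASE 3 function, this sketch does no escaping, so a value containing a double quote or a backslash would still produce broken javascript.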
That said, learn to use Unicode strings by trying the following program, but
set the first line to the encoding *your editor* saves files in. You can
use the actual Chinese characters instead of escape codes this way. The
encoding used for the source code and the encoding used for the html file
don't have to match, but the charset declared in the file and the encoding
used to write the file *do* have to match.
# coding: utf8
# Set the line above to the encoding your editor saves this file in.
import codecs

mydict = {}
mydict['JUNK'] = [u'中文',u'中文',u'中文']

def conv_list2str(value):
    # Build a javascript array literal from a list of Unicode strings.
    return u'["' + u'","'.join(value) + u'"]'

f_str = u'''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %s
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>'''

s = conv_list2str(mydict['JUNK'])
# The encoding passed here must match the charset declared in the META tag.
f=codecs.open('test04.html','w',encoding='gb2312')
f.write(f_str % s)
f.close()
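To see why the declared charset and the encoding used to write the file have to match, the sketch below (Python 3 assumed, names illustrative) decodes the same gb2312 bytes the way a browser would under a different declared charset.

```python
text = '中文'
data = text.encode('gb2312')           # what ends up in the file

# Matching charset: the browser recovers the original text.
print(data.decode('gb2312') == text)

# Mismatched charset, case 1: the bytes are not valid UTF-8 at all.
try:
    data.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# Mismatched charset, case 2: Latin-1 accepts any byte, so the result is
# silent garbage rather than an error.
mojibake = data.decode('latin-1')
print(mojibake)
```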
-Mark
P.S. Python 3.0 makes this easier for what you want to do: all strings are
Unicode by default, and repr() no longer escapes printable non-ASCII
characters, so the str() of your list already contains the actual Chinese
characters and you'll be able to skip the conv_list2str() function.
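A minimal Python 3 sketch of that idea follows. The shortened template, the file name test05.html and the temp-directory location are placeholders, not from the original post. The shortcut only holds while the strings contain no quotes, backslashes, or non-printable characters, since repr() would escape those using Python syntax.

```python
import os
import tempfile

items = ['中文', '中文', '中文']

template = '''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<script language=javascript>
var test = %s
alert(test[0])
</script>
</head>
<body></body></html>'''

# In Python 3, str(items) keeps printable non-ASCII characters as they are,
# so the list's text form is already a usable javascript array literal.
page = template % str(items)

# The encoding used for writing matches the charset declared in the page.
out_path = os.path.join(tempfile.gettempdir(), 'test05.html')
with open(out_path, 'w', encoding='gb2312') as f:
    f.write(page)
```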