[XML-SIG] how to get xml parse with non ascii charset
pita
jyllyj at gcom-cn.com
Tue Jul 1 14:17:45 EDT 2003
I have installed the CJKCodec package correctly.
In the Python shell:
>>> import codecs
>>> codecs.lookup('gb2312')
(<built-in method encode of MultibyteCodec object at 0x0090E0E0>, <built-in
method decode of MultibyteCodec object at 0x0090E0E0>, <class
cjkcodecs.gb2312.StreamReader at 0x01239E40>, <class
cjkcodecs.gb2312.StreamWriter at 0x01239E10>)
>>>
>>> t='大家好' # some gb2312 characters
>>> ut=t.decode('gb2312')
>>> print ut
大家好 # looks fine on my computer
>>> utf8str=ut.encode('utf-8')
>>>
But I can't parse an XML document that declares gb2312 as its encoding.
For example:
str_gb2312="""<?xml version="1.0" encoding="gb2312"?>
<gb2312_root>
<child1>
this is a children.
</child1>
<child2>
this is other children.
</child2>
</gb2312_root>
"""
from xml.dom import minidom  # import required by the call below

print 'parse str with encoding gb2312'
dom=minidom.parseString(str_gb2312)
error here:
Traceback (most recent call last):
  File "E:\temp\xmlparse_gb2312.py", line 50, in ?
    dom=minidom.parseString(str_gb2312)
  File "C:\Python23\lib\xml\dom\minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: unknown encoding: line 1, column 30
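The traceback suggests that expat itself rejects the gb2312 declaration, even though the Python codec lookup succeeds. One possible workaround, sketched here and not taken from the original script, is to decode the document with the Python codec, rewrite the encoding declaration to UTF-8, and hand expat UTF-8 bytes. The helper name parse_with_codec is hypothetical, and the sketch assumes a current Python 3, where the gb2312 codec ships with the standard library.

```python
import re
from xml.dom import minidom


def parse_with_codec(data, encoding):
    """Parse XML bytes via a Python codec instead of expat's own decoder.

    Hypothetical helper: `data` is the raw document as bytes and
    `encoding` is the name of a codec Python can look up.
    """
    # Decode with Python's codec machinery (works for multi-byte codecs).
    text = data.decode(encoding)
    # Rewrite only the first encoding="..." in the XML declaration so the
    # declaration matches the bytes we are about to pass to the parser.
    text = re.sub(
        r'(<\?xml[^>]*encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>utf-8\g<2>',
        text,
        count=1,
    )
    return minidom.parseString(text.encode('utf-8'))


# Build a small gb2312-encoded sample; the escapes spell three CJK characters.
sample = (
    '<?xml version="1.0" encoding="gb2312"?>\n'
    '<gb2312_root><child1>\u5927\u5bb6\u597d</child1></gb2312_root>'
).encode('gb2312')

dom = parse_with_codec(sample, 'gb2312')
```

The resulting DOM holds Unicode text, so the original byte encoding only matters again when the document is serialized.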
>>>You need to download and install the CJKCodec package I mentioned
>>>in an earlier mail on this list:
>>> http://sourceforge.net/project/showfiles.php?group_id=46747
>>>For more info:
>>> http://mail.python.org/pipermail/i18n-sig/2003-June/001586.html