[XML-SIG] how to get xml parse with non ascii charset

pita jyllyj at gcom-cn.com
Tue Jul 1 14:13:45 EDT 2003


I have installed the CJKCodec package correctly.
In the Python shell:
>>> import codecs
>>> codecs.lookup('gb2312')
(<built-in method encode of MultibyteCodec object at 0x0090E0E0>, <built-in
method decode of MultibyteCodec object at 0x0090E0E0>, <class
cjkcodecs.gb2312.StreamReader at 0x01239E40>, <class
cjkcodecs.gb2312.StreamWriter at 0x01239E10>)
>>>
>>> t='大家好' # some gb2312 characters
>>> ut=t.decode('gb2312')
>>> print ut
大家好 # looks fine on my computer
>>> utf8str=ut.encode('utf-8')
>>>

But I can't use gb2312 as the encoding of an XML document.
For example:
from xml.dom import minidom

str_gb2312="""<?xml version="1.0" encoding="gb2312"?>
<gb2312_root>
 <child1>
  this is a children.
 </child1>
 <child2>
  this is other children.
 </child2>
</gb2312_root>
"""
print 'parse str with encoding gb2312'
dom=minidom.parseString(str_gb2312)


This raises the following error:
Traceback (most recent call last):
  File "E:\temp\xmlparse_gb2312.py", line 50, in ?
    dom=minidom.parseString(str_gb2312)
  File "C:\Python23\lib\xml\dom\minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: unknown encoding: line 1, column 30


>>>You need to download and install the CJKCodec package I mentioned
>>>in an earlier mail on this list:

>>>    http://sourceforge.net/project/showfiles.php?group_id=46747

>>>For more info:

>>>    http://mail.python.org/pipermail/i18n-sig/2003-June/001586.html






