[XML-SIG] how to get xml parse with non ascii charset

Dieter Maurer dieter at handshake.de
Fri Jul 4 21:53:27 EDT 2003


pita wrote at 2003-7-1 13:13 +0800:
 > ...
 > str_gb2312="""<?xml version="1.0" encoding="gb2312"?>
 > <gb2312_root>
 >  <child1>
 >   this is a children.
 >  </child1>
 >  <child2>
 >   this is other children.
 >  </child2>
 > </gb2312_root>
 > """
 > print 'parse str with encoding gb2312'
 > dom=minidom.parseString(str_gb2312)
 > ...
 >   File "C:\Python23\lib\xml\dom\expatbuilder.py", line 223, in parseString
 >     parser.Parse(string, True)
 > xml.parsers.expat.ExpatError: unknown encoding: line 1, column 30

XML processors (such as the "Expat" parser) are only required
to support the UTF-8 and UTF-16 encodings. All other
encodings are optional.

Apparently, "Expat" does not support "gb2312".

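A workaround would be to convert your string first to unicode
and then to UTF-8. The encoding declaration in the XML header
must then be changed to "utf-8" (or removed) as well, because
otherwise the parser will still see "gb2312" and reject it.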
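
The workaround can be sketched roughly as follows. This is only a
minimal illustration, written for a current Python 3 rather than the
Python 2.3 in the traceback; the helper name "parse_with_declared_encoding"
is made up for this example, and it assumes that Python itself ships a
codec for the source encoding and that the document starts with an
ordinary XML declaration.

```python
import re
from xml.dom import minidom


def parse_with_declared_encoding(raw_bytes, encoding):
    """Hypothetical helper: transcode XML bytes to UTF-8, then parse.

    `encoding` must name a codec that Python knows (e.g. "gb2312").
    """
    # Step 1: decode the bytes to a unicode string using Python's codec.
    text = raw_bytes.decode(encoding)

    # Step 2: rewrite the encoding declaration so it matches the new bytes.
    text = re.sub(
        r'(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>utf-8\g<2>',
        text,
        count=1,
    )

    # Step 3: encode as UTF-8, which every XML parser must accept.
    return minidom.parseString(text.encode("utf-8"))


if __name__ == "__main__":
    raw = ('<?xml version="1.0" encoding="gb2312"?>'
           '<gb2312_root><child1>this is a children.</child1></gb2312_root>'
           ).encode("gb2312")
    dom = parse_with_declared_encoding(raw, "gb2312")
    print(dom.documentElement.tagName)
```

The regular expression only touches the first XML declaration; a
document without an encoding declaration is passed through unchanged.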


Dieter


