[XML-SIG] how to get xml parse with non ascii charset
Dieter Maurer
dieter at handshake.de
Fri Jul 4 21:53:27 EDT 2003
pita wrote at 2003-7-1 13:13 +0800:
> ...
> str_gb2312="""<?xml version="1.0" encoding="gb2312"?>
> <gb2312_root>
> <child1>
> this is a children.
> </child1>
> <child2>
> this is other children.
> </child2>
> </gb2312_root>
> """
> print 'parse str with encoding gb2312'
> dom=minidom.parseString(str_gb2312)
> ...
> File "C:\Python23\lib\xml\dom\expatbuilder.py", line 223, in parseString
> parser.Parse(string, True)
> xml.parsers.expat.ExpatError: unknown encoding: line 1, column 30
XML processors (such as the Expat parser) are only required
to support the UTF-8 and UTF-16 encodings. All other
encodings are optional.
Apparently, Expat does not support "gb2312".
A workaround would be to convert your string first to Unicode
and then to UTF-8, changing the encoding declaration to match.
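
A minimal sketch of that conversion, assuming Python 3 (the traceback
above is from Python 2.3, where the equivalent would go through the
unicode type) and assuming the document arrives as a byte string. The
helper names reencode_xml_as_utf8 and parse_with_encoding are made up
for illustration and are not part of minidom or Expat. The encoding
declaration is rewritten as well, since Expat would otherwise still
see "gb2312" and reject the document.

```python
import re
from xml.dom import minidom

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of the document, e.g. <?xml version="1.0" encoding="gb2312"?>
_ENCODING_DECL = re.compile(
    r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._\-]*\2'
)


def reencode_xml_as_utf8(raw, source_encoding):
    """Hypothetical helper: return the document as UTF-8 encoded bytes."""
    # Step 1: bytes in the source encoding -> Unicode text.
    text = raw.decode(source_encoding)
    # Step 2: make the declaration agree with the new encoding.
    # A document without a declaration is left unchanged here.
    text = _ENCODING_DECL.sub(r'\g<1>"utf-8"', text, count=1)
    # Step 3: Unicode text -> UTF-8 bytes, which Expat always accepts.
    return text.encode("utf-8")


def parse_with_encoding(raw, source_encoding):
    """Hypothetical helper: parse bytes in an encoding Expat lacks."""
    return minidom.parseString(reencode_xml_as_utf8(raw, source_encoding))
```

With this, something like parse_with_encoding(data, "gb2312") should
work, provided Python itself ships a codec for the source encoding.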
Dieter