[XML-SIG] Parsing the XML file which has encoding 'gb2312' .
Mike Brown
mike at skew.org
Sat Dec 13 08:14:13 EST 2003
Xinzhi Zhao wrote:
> Hi,
> My XML files have to use other encoding instead of the default one, i.e.
> 'gb2312'. When I was parsing my XML files by dint of DOM or SAX , some
> errors occurred. The Python xml packages can't do it now? Is there any way
> can finish my job? How shall I do it? Please help me.
Limitations of the underlying parser, Expat, prevent certain encodings from
being supported without an additional layer of code. GB2312 is among them.
I think you will have to transcode your document to one of the encodings that
are supported by Expat (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or
US-ASCII; you probably want UTF-8 or UTF-16), and then either rewrite the
encoding declaration in the XML, or find a way to declare the encoding
externally. Expat does support external declaration of encoding, but I don't
know offhand how to do it from Python.
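
For illustration, here is a minimal sketch of both routes, assuming a
present-day Python 3 rather than the 2003-era packages. The function names
are hypothetical. The second function relies on the encoding argument of
xml.parsers.expat.ParserCreate, which the Python documentation describes as
overriding the encoding declared in the document; treat it as one possible
way to make the external declaration, not as the only one.

```python
import re
import xml.dom.minidom
import xml.parsers.expat

# Matches the encoding pseudo-attribute of a leading XML declaration.
_DECL_ENCODING = re.compile(
    rb'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[A-Za-z0-9._-]+\2'
)


def transcode_to_utf8(raw, source_encoding="gb2312"):
    """Route 1 (hypothetical helper): re-encode the bytes as UTF-8 and
    rewrite the encoding declaration to match."""
    utf8 = raw.decode(source_encoding).encode("utf-8")
    return _DECL_ENCODING.sub(rb'\1"utf-8"', utf8, count=1)


def parse_with_external_encoding(raw, source_encoding="gb2312"):
    """Route 2 (hypothetical helper): re-encode as UTF-8, leave the
    declaration untouched, and tell Expat the encoding directly.
    Returns the document's character data as one string."""
    utf8 = raw.decode(source_encoding).encode("utf-8")
    chunks = []
    # Assumption: the encoding passed here takes precedence over the
    # (now stale) encoding="gb2312" declaration inside the document.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    parser.CharacterDataHandler = chunks.append
    parser.Parse(utf8, True)
    return "".join(chunks)
```

The bytes returned by transcode_to_utf8 can be handed to
xml.dom.minidom.parseString or to a SAX parser as usual.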