[XML-SIG] Re: SAX encoding and special characters
fredrik at pythonware.com
Sat Apr 17 11:46:34 EDT 2004
> I'm playing with SAX with Python-2.3.3. My goal is to parse XML files
> (I don't want to generate them).
> My XML file starts with:
> <?xml version="1.0" encoding="iso-8859-2" ?>
> I would like to get the encoding before parsing (I would like to use
> it in ContentHandler class).
just curious, but why do you need the encoding to handle the content?
> My second problem/question is about special characters in XML.
> Sometimes I have spec. chars (with char code 0-31) in XML and the
> parser ends with:
> xml.sax._exceptions.SAXParseException: spec_char.xml:68271:61:
> not well-formed (invalid token)
as the parser says, control characters are not allowed in XML files (except
for a few whitespace codes: tab, line feed, and carriage return). if you
really need to parse those files, you need to fix them up before passing them
to the parser (you can simply read them into a python string, delete all junk
characters, and then use parseString).
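
a minimal sketch of that clean-up, written for a current Python 3 rather than
2.3, and assuming a single-byte or UTF-8 encoding (such as the iso-8859-2 file
above) so that deleting raw bytes is safe. the names strip_control_bytes,
TextCollector and parse_cleaned are made up for illustration; only
xml.sax.parseString and xml.sax.ContentHandler come from the standard library.

```python
import xml.sax

# Bytes 0-31 that XML 1.0 forbids: all of them except tab (9),
# line feed (10) and carriage return (13).
_JUNK = bytes(b for b in range(32) if b not in (9, 10, 13))


def strip_control_bytes(data):
    """Return the bytes in `data` with forbidden control bytes deleted.

    Assumption: the document uses a single-byte encoding or UTF-8.
    This would corrupt UTF-16 input, where zero bytes are legitimate.
    """
    return data.translate(None, _JUNK)


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just collects character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces.
        self.chunks.append(content)


def parse_cleaned(data, handler):
    """Clean the raw bytes, then hand them to the SAX parser."""
    xml.sax.parseString(strip_control_bytes(data), handler)
```

typical use would be to read the file in binary mode
(open(filename, "rb").read()) and pass the result to parse_cleaned together
with your own handler. note that this silently drops data, so it only makes
sense if the control characters really are junk.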