[XML-SIG] problems with encoding and SAX

Daniel Clerc clerc at uni-bremen.de
Thu Feb 9 14:09:23 CET 2006


Hi everybody!

I have some trouble with SAX and encodings...

When I try to parse the following XML code:

<?xml version="1.0" encoding="WINDOWS-1252" ?>
</TRANSACTION>
<TRANSACTION TIME="03.04.2003 01:52:15" TIME_CODED="37714.0779513889"
DURATION="1001">
  <QUESTION>K'R&#174;</QUESTION>
                       ^^^^^^^^^^^^^^
...
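On its own, the visible QUESTION line is well-formed: `&#174;` is a legal character reference for the registered sign (U+00AE). A quick check can confirm this. It is a sketch using the Python 3 standard library `xml.sax`, whereas the original setup is Python 2.4 with PyXML. The `TextCollector` class and the wrapper document are made up for the test. If the line parses cleanly this way, the culprit is more likely a byte that did not survive into this mail than anything shown above.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Illustrative handler that keeps all character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


# The QUESTION line as it appears in the post, wrapped in a minimal
# document with the same encoding declaration.
doc = (b'<?xml version="1.0" encoding="WINDOWS-1252" ?>\n'
       b"<QUESTION>K'R&#174;</QUESTION>\n")

handler = TextCollector()
xml.sax.parseString(doc, handler)  # raises SAXParseException if not well-formed
result = "".join(handler.text)     # "K'R" followed by U+00AE (registered sign)
```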

I get the following error message:

<SNIPP>
    self._err_handler.fatalError(exc)
  File "C:\Python24\Lib\site-packages\_xmlplus\sax\handler.py", line 38, in fatalError
    raise exception
SAXParseException: xml_temp.xml:3766:13: not well-formed (invalid token)
...
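The message only gives a line (3766) and a column (13). To see which raw byte that position refers to, one option is to catch the `SAXParseException` and slice the original bytes around the reported column. Below is a minimal sketch in Python 3. The helper name `show_offending_bytes` and the sample document are hypothetical. It assumes a single-byte encoding such as WINDOWS-1252, where a column should coincide with a byte offset. The 0x81 byte in the sample is just one example of a byte Python's cp1252 codec does not define. It is not a claim about what is actually in the log.

```python
import xml.sax


def show_offending_bytes(data):
    """Parse `data` (bytes). Return None if it is well-formed; otherwise return
    (line, column, raw bytes around the reported position)."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        line_no = exc.getLineNumber()    # 1-based
        col = exc.getColumnNumber()      # expat counts columns from 0
        raw_line = data.splitlines()[line_no - 1]
        # In a single-byte encoding a column is also a byte offset, so a
        # small slice around it shows the byte(s) expat objected to.
        return line_no, col, raw_line[max(col - 5, 0):col + 5]
    return None


# Usage: a byte that Python's cp1252 codec leaves undefined (0x81).
sample = (b'<?xml version="1.0" encoding="WINDOWS-1252" ?>\n'
          b"<QUESTION>K\x81R</QUESTION>\n")
result = show_offending_bytes(sample)
print(result)
```

Running this against line 3766 of `xml_temp.xml` should show whether the bad byte is a control character or a value outside WINDOWS-1252.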

Here is the Python code I use:

   http://knopaste.de/index.php?module=hilight&id=142

Maybe the encoding of the content between the XML elements does not
match the declared encoding. I have to parse quite a lot of log files
(~1 GB zipped), and there are only a handful of such errors. So I would
be very happy if I could find a way to tell SAX not to worry and pass
the string through anyway.
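If the goal is simply to get past the handful of bad bytes, one workaround is to clean the byte stream before expat sees it. This is a sketch using only the Python 3 standard library. `parse_lenient` and the `?` replacement are arbitrary choices, not an existing API. It assumes the files really are WINDOWS-1252, which uses one byte per character, so bytes can be filtered chunk by chunk without decoding. It replaces two kinds of bytes, both plausible but unconfirmed causes of the error:

- the C0 control bytes that XML 1.0 forbids;
- the five byte values Python's cp1252 codec leaves undefined.

Because it uses the incremental `feed()`/`close()` interface, the whole file never has to be in memory.

```python
import io
import xml.sax

# C0 control bytes that XML 1.0 forbids (everything below 0x20 except
# tab, line feed and carriage return).
ILLEGAL_C0 = {b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D)}
# Byte values that Python's cp1252 codec does not define.
UNDEFINED_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

# 256-byte translation table: suspicious bytes become b"?", others are kept.
CLEAN_TABLE = bytes(
    ord("?") if b in ILLEGAL_C0 or b in UNDEFINED_CP1252 else b
    for b in range(256)
)


def parse_lenient(stream, handler, chunk_size=64 * 1024):
    """Feed a WINDOWS-1252 XML byte stream to SAX chunk by chunk, replacing
    bytes that would make expat stop with 'invalid token'."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # One byte is one character in WINDOWS-1252, so chunk boundaries
        # cannot split a character and per-chunk filtering is safe.
        parser.feed(chunk.translate(CLEAN_TABLE))
    parser.close()


class TextCollector(xml.sax.ContentHandler):
    """Illustrative handler that keeps all character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


# Usage: an ESC (0x1B) and an undefined 0x81 byte inside QUESTION.
sample = (b'<?xml version="1.0" encoding="WINDOWS-1252" ?>\n'
          b"<QUESTION>K\x1b\x81R&#174;</QUESTION>\n")
collector = TextCollector()
parse_lenient(io.BytesIO(sample), collector)
result = "".join(collector.text)  # "K??R" followed by U+00AE
```

The trade-off is that the offending bytes are replaced rather than preserved. `stream` can be any binary file-like object. Depending on how the logs are compressed, that could be the object returned by `gzip.open` or `zipfile.ZipFile.open`.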

Parsing the same XML with the MSXML DOM or a Java-based parser is not
a problem, but I would prefer a solution in Python.

Thanks,

Daniel

