[XML-SIG] problems with encoding and SAX
Daniel Clerc
clerc at uni-bremen.de
Thu Feb 9 14:09:23 CET 2006
Hi everybody!
I have some trouble with SAX and encodings...
When I try to parse the following XML-code:
<?xml version="1.0" encoding="WINDOWS-1252" ?>
</TRANSACTION>
<TRANSACTION TIME="03.04.2003 01:52:15" TIME_CODED="37714.0779513889"
DURATION="1001">
<QUESTION>K'R®</QUESTION>
^^^^^^^^^^^^^^
...
I get this error message:
<SNIPP>
self._err_handler.fatalError(exc)
File "C:\Python24\Lib\site-packages\_xmlplus\sax\handler.py", line 38, in fatalError
raise exception
SAXParseException: xml_temp.xml:3766:13: not well-formed (invalid token)
...
Here you can find the python-code I use:
http://knopaste.de/index.php?module=hilight&id=142
Maybe the encoding of the content between the XML elements does not
match the encoding declared in the XML header. As I have to parse quite
a lot of log files (~1GB zipped), and there are only a handful of such
errors, I would be very happy if I could find a way to tell SAX not to
worry and just write the string anyway.
Parsing the XML with the MS-XML-DOM or a Java-based parser is not a
problem, but I would prefer a solution in Python.
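To make the idea of "telling SAX not to worry" concrete, here is a rough sketch of one possible workaround: clean the raw bytes before handing them to the parser. It assumes the offending bytes are either control characters that XML 1.0 forbids or bytes that are undefined in windows-1252, which may or may not be what is actually in my files. The names sanitize_xml_bytes, TextCollector and parse_leniently are made up for this illustration, and the sketch uses the standard-library xml.sax of a current Python 3 rather than the _xmlplus package shown in the traceback.

```python
import re
import xml.sax

# Control characters that XML 1.0 does not allow in a document
# (everything below U+0020 except tab, line feed and carriage return).
_FORBIDDEN_CONTROLS = re.compile("[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize_xml_bytes(raw, encoding="windows-1252"):
    """Return a copy of raw with problematic characters replaced by '?'.

    Assumption: the data is meant to be in the given encoding and only a
    few stray bytes are wrong.
    """
    # Bytes that are undefined in the encoding become U+FFFD here.
    text = raw.decode(encoding, errors="replace")
    # Forbidden control characters are swapped for a visible placeholder.
    text = _FORBIDDEN_CONTROLS.sub("?", text)
    # U+FFFD cannot be encoded back, so errors="replace" turns it into '?'.
    return text.encode(encoding, errors="replace")


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_leniently(raw, handler, encoding="windows-1252"):
    """Sanitize raw and then parse it with the normal SAX machinery."""
    xml.sax.parseString(sanitize_xml_bytes(raw, encoding), handler)
```

This reads a whole document into memory and silently changes the data, so for the large log files it would probably have to be applied file by file, and ideally only to the files that fail a normal parse first.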
Thanks,
Daniel