[XML-SIG] problems with encoding and SAX
clerc at uni-bremen.de
Thu Feb 9 14:09:23 CET 2006
I have some trouble with SAX and encodings...
When I try to parse the following XML-code:
<?xml version="1.0" encoding="WINDOWS-1252" ?>
<TRANSACTION TIME="03.04.2003 01:52:15" TIME_CODED="37714.0779513889"
I get this error message:
File "C:\Python24\Lib\site-packages\_xmlplus\sax\handler.py", line 38, in fatalError
SAXParseException: xml_temp.xml:3766:13: not well-formed (invalid token)
Here you can find the python-code I use:
Maybe the encoding of the content between the XML elements does not
match the encoding specified in the declaration. As I have to parse
quite a lot of log files (~1 GB zipped), and there are only a handful
of such errors, I would be very happy if I could find a way to tell
SAX not to worry and to write the string anyway.
Parsing the XML code with the MS XML DOM or a Java-based parser is
not a problem, but I would prefer a solution in Python.
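
One possible workaround, sketched here under assumptions rather than as a confirmed fix: expat has no "ignore bad characters" switch, so the bytes can be cleaned before they reach the parser. The sketch assumes the offending byte at 3766:13 is either a control character that XML 1.0 forbids or one of the five byte values that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); this has not been verified against the actual log file. The names sanitize and parse_leniently are hypothetical helpers, and the code is written for a current Python 3 standard library rather than the Python 2.4 / PyXML setup shown in the traceback. Because Windows-1252 is a single-byte encoding, filtering byte by byte cannot split a character.

```python
import re
import xml.sax

# Assumption: the "invalid token" is one of these bytes.
#  - C0 control characters other than tab, LF and CR are illegal in XML 1.0.
#  - 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no character assigned in Windows-1252.
_BAD_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x81\x8d\x8f\x90\x9d]")


def sanitize(chunk, replacement=b"?"):
    """Replace every suspicious byte in a chunk with a placeholder byte."""
    return _BAD_BYTES.sub(replacement, chunk)


def parse_leniently(stream, handler, chunk_size=64 * 1024):
    """Feed a binary stream to SAX chunk by chunk, cleaning each chunk first.

    Reading in chunks keeps memory use flat, which matters for large logs.
    """
    parser = xml.sax.make_parser()   # the expat-based incremental parser
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(sanitize(chunk))
    parser.close()
```

Usage would look like parse_leniently(open("xml_temp.xml", "rb"), my_handler), where my_handler is whatever ContentHandler is already in use. The file must be opened in binary mode so that the declared encoding is still handled by the parser. If the bad byte turns out to be something else (for example a stray "&" or "<" in the text), this filter will not help and the byte at the reported position needs to be inspected first.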