Python parsing XML file problem with SAX

Stefan Behnel stefan_ml at behnel.de
Wed Jul 28 06:48:54 EDT 2010


jia li, 28.07.2010 12:10:
> I have an XML file with hundreds of <error> elements.
>
> What's strange is that only one of these elements could not be parsed correctly:
> <error>
> <checker>REVERSE_INULL</checker>
> <function>Dispose_ParameterList</function>
> <unmangled_function>Dispose_ParameterList</unmangled_function>
> <status>UNINSPECTED</status>
> <num>146</num>
> <home>1/146MMSLib_LinkedList.c</home>
> </error>
>
> I printed the data in "characters(self, data)" and again after parsing. In
> the latter output, a "\r\n" has been inserted between "1/" and
> "146MMSLib_LinkedList.c".
>
> But if I reduce my XML file to just this element, it parses correctly.

First of all: don't use SAX. Use ElementTree's iterparse() function. That 
will shrink your code down to a simple loop in a few lines.
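A minimal sketch of such a loop, assuming the <error> elements sit under a
single root element. The full file isn't shown, so the <errors> root and the
read_errors() helper below are hypothetical and only for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: the real root element name is not known.
XML = b"""<errors>
<error>
<checker>REVERSE_INULL</checker>
<function>Dispose_ParameterList</function>
<unmangled_function>Dispose_ParameterList</unmangled_function>
<status>UNINSPECTED</status>
<num>146</num>
<home>1/146MMSLib_LinkedList.c</home>
</error>
</errors>"""


def read_errors(source):
    """Yield one dict per <error> element (hypothetical helper)."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "error":
            # At the "end" event each child's text is complete, no matter
            # how the underlying parser chunked it.
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the processed subtree to keep memory low


errors = list(read_errors(io.BytesIO(XML)))
print(errors[0]["home"])  # 1/146MMSLib_LinkedList.c
```

For a real file, pass the filename (or an open binary file) to read_errors()
instead of the BytesIO object.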

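Then, the problem is likely that you are getting separate characters() 
events for a single text node. SAX parsers are allowed to deliver the 
contents of one text node in several chunks, e.g. when an internal buffer 
boundary falls in the middle of it, which would also explain why only one 
element in the big file is affected and not the file containing just that 
element. The "\r\n" most likely comes from your print statement, which ends 
each chunk's output with a line break; I doubt that it's really in the data 
returned from SAX. Again: using ElementTree instead of SAX will avoid this 
kind of problem.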
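If you do stick with SAX, the usual fix is to collect all characters() calls
for an element and join them in endElement(). A minimal sketch, assuming you
only care about <home>; the ErrorHandler class and the <errors> root are
hypothetical. It feeds the document in two pieces through the
IncrementalParser feed()/close() interface to force a split inside the
<home> text:

```python
import xml.sax

# Hypothetical input: the real root element name is not known.
XML = b"""<errors>
<error>
<checker>REVERSE_INULL</checker>
<num>146</num>
<home>1/146MMSLib_LinkedList.c</home>
</error>
</errors>"""


class ErrorHandler(xml.sax.ContentHandler):
    """Collects the full text of each <home> element (hypothetical handler)."""

    def __init__(self):
        super().__init__()
        self.homes = []
        self._buffer = None  # list of text chunks while inside <home>

    def startElement(self, name, attrs):
        if name == "home":
            self._buffer = []

    def characters(self, content):
        # May be called several times for one text node: accumulate.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "home":
            self.homes.append("".join(self._buffer))
            self._buffer = None


handler = ErrorHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Simulate a buffer boundary falling inside the <home> text.
split = XML.index(b"146MMSLib")
parser.feed(XML[:split])
parser.feed(XML[split:])
parser.close()

print(handler.homes)  # ['1/146MMSLib_LinkedList.c']
```

Printing inside characters() instead would show one line per chunk, which
looks exactly like an inserted line break.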

Stefan




More information about the Python-list mailing list