[Tutor] parsing XML

Stefan Behnel stefan_ml at behnel.de
Tue Nov 10 11:19:26 CET 2009


Alan Gauld, 10.11.2009 06:53:
> "Christopher Spears" <cspears2002 at yahoo.com> wrote
>> I need to parse several XML documents into a Python dictionary.  Is
>> there a module that would be particularly good for this?  I heard
>> beginners should start with ElementTree.  However, SAX seems to make a
>> little more sense to me.  

Note that ElementTree provides both a SAX-like interface (look for the
'target' argument of XMLParser) and an incremental parser (iterparse). So the
question is not "ElementTree or SAX?" but rather "how much time do I have to
implement, run and maintain the code?".
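
As a rough sketch of both interfaces, assuming a flat document whose root
element has children that hold plain text only: the sample document and the
names DictTarget, parse_with_target and parse_with_iterparse are made up for
illustration and appear nowhere in the original question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
SAMPLE = b"<config><host>example.org</host><port>8080</port></config>"


class DictTarget:
    """SAX-like target: the parser calls start(), data(), end(), close()."""

    def __init__(self):
        self.result = {}
        self._depth = 0
        self._text = []

    def start(self, tag, attrib):
        self._depth += 1
        self._text = []  # begin collecting text for this element

    def data(self, text):
        # Text may arrive in several pieces, so collect and join later.
        self._text.append(text)

    def end(self, tag):
        if self._depth == 2:  # a direct child of the root element
            self.result[tag] = "".join(self._text)
        self._depth -= 1

    def close(self):
        # Whatever close() returns is what parser.close() returns.
        return self.result


def parse_with_target(data):
    """Event-driven parsing: no tree is built, only the dictionary."""
    parser = ET.XMLParser(target=DictTarget())
    parser.feed(data)
    return parser.close()


def parse_with_iterparse(data):
    """Incremental parsing: elements are handed over as they complete."""
    result = {}
    depth = 0
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            depth += 1
        else:
            if depth == 2:  # a direct child of the root element
                result[elem.tag] = elem.text or ""
                elem.clear()  # drop content that is no longer needed
            depth -= 1
    return result
```

Both functions return the same dictionary for the sample above. The iterparse
version is shorter because ElementTree assembles the text of each element
for you, whereas the target version has to collect it by hand.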


> XML parsers fall into 2 groups. Those that parse the whole structure and
> create a tree of objects - usually accessed like a dictionary, and those
> that parse line by line looking for patterns.

Except that XML parsers do not work line by line. They consume a stream of
bytes, and a line break can appear in the middle of a tag.
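
To illustrate, here is a small sketch that feeds a document to the parser in
arbitrary byte chunks. The sample document and the function name
parse_in_chunks are assumptions made up for this example. The chunk
boundaries fall inside tags and even inside multi-byte UTF-8 characters, and
the tag itself spans two lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the start tag contains a line break, and the text
# contains characters that take two bytes each in UTF-8.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<greeting\n'
    '    lang="de">Gr\u00fc\u00dfe</greeting>'
).encode("utf-8")


def parse_in_chunks(data, chunk_size):
    """Feed the parser fixed-size byte chunks and return the root element."""
    parser = ET.XMLParser()
    for offset in range(0, len(data), chunk_size):
        parser.feed(data[offset:offset + chunk_size])
    return parser.close()
```

Calling parse_in_chunks(SAMPLE, 1) and parse_in_chunks(SAMPLE, 7) gives the
same root element, because the parser keeps its own state between calls to
feed() and does not care where the chunks or the lines end.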


> The former approach is usually slightly slower and more resource hungry

I'd rather leave the judgement of that statement to a benchmark.
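
A minimal sketch of such a benchmark, using only the standard library. The
generated document, its size and all function names here are assumptions
for illustration. Real numbers depend on your documents, your Python version
and your machine, so measure with your own data.

```python
import timeit
import xml.sax
import xml.etree.ElementTree as ET


def make_document(n):
    """Build a hypothetical flat test document with n child elements."""
    items = "".join("<k%d>%d</k%d>" % (i, i, i) for i in range(n))
    return ("<root>%s</root>" % items).encode("utf-8")


def tree_to_dict(data):
    """Tree approach: parse everything, then walk the root's children."""
    return {child.tag: child.text for child in ET.fromstring(data)}


class DictHandler(xml.sax.ContentHandler):
    """SAX approach: track depth and collect text by hand."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self._depth = 0
        self._text = []

    def startElement(self, name, attrs):
        self._depth += 1
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._depth == 2:  # a direct child of the root element
            self.result[name] = "".join(self._text)
        self._depth -= 1


def sax_to_dict(data):
    handler = DictHandler()
    xml.sax.parseString(data, handler)
    return handler.result


def benchmark(n=1000, repeat=5):
    """Return the best wall-clock time in seconds for each approach."""
    data = make_document(n)
    return {
        "ElementTree": min(
            timeit.repeat(lambda: tree_to_dict(data), number=1, repeat=repeat)
        ),
        "SAX": min(
            timeit.repeat(lambda: sax_to_dict(data), number=1, repeat=repeat)
        ),
    }
```

Running benchmark() returns a dictionary with one timing per approach. Note
how much more code the SAX version needs to produce the same result, which
is part of the maintenance cost mentioned above.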


> If SAX makes sense for you and meets your needs go with it.

I'd change this to:

Unless you really know what you are doing and you have proven in benchmarks
that SAX is substantially faster for the problem at hand, don't use SAX.

Stefan


