"Full" element tag listing possible with Elementtree?

Stefan Behnel stefan_ml at behnel.de
Fri Sep 5 06:15:37 EDT 2008


jaime.dyson at gmail.com wrote:
> I have the unenviable task of turning about 20K strangely formatted
> XML documents from different sources into something resembling a
> clean, standard, uniform format.  I like Elementtree and have been
> using it to step through the documents to get a feel for their
> structure.  .getiterator() gives me a depth-first traversal that
> eliminates the hierarchy of the elements.  What I'd like is to be able
> to traverse elements while keeping track of ancestors, and print out
> the full structure of all of an ancestor's nodes as I arrive at each
> node.

Try lxml.etree. It's an extended re-implementation of the ElementTree API
based on libxml2. Among many other features, its Elements have a getparent()
method, and you can iterate over an element's ancestors (and other XPath
axes), or walk an already-parsed tree with iterparse-style events (this is
called iterwalk).

http://codespeak.net/lxml/
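
A minimal sketch of what that could look like, assuming a reasonably recent
lxml is installed. The sample document and the helper names (full_path,
paths_via_ancestors, paths_via_iterwalk) are made up for illustration; only
iter(), iterancestors(), getparent() and etree.iterwalk() come from lxml
itself. It prints the "full" tag path of every element in document order,
once by walking up the ancestors and once by tracking a stack of open tags
during an iterwalk:

```python
from lxml import etree

# Hypothetical sample document, only for illustration.
XML = b"<root><a><b/><c><d/></c></a><e/></root>"


def full_path(element):
    """Return the slash-separated tag path from the root down to element."""
    # iterancestors() yields the parent first, then the grandparent, etc.,
    # so the list has to be reversed to read from the root downwards.
    tags = [ancestor.tag for ancestor in element.iterancestors()]
    tags.reverse()
    tags.append(element.tag)
    return "/" + "/".join(tags)


def paths_via_ancestors(root):
    """List the full path of every element, in document order."""
    # iter("*") selects elements only (no comments or processing instructions).
    return [full_path(el) for el in root.iter("*")]


def paths_via_iterwalk(root):
    """Same listing, built from start/end events on the parsed tree."""
    stack, paths = [], []
    for event, el in etree.iterwalk(root, events=("start", "end")):
        if event == "start":
            stack.append(el.tag)
            paths.append("/" + "/".join(stack))
        else:
            # "end" event: this element is closed, drop it from the stack.
            stack.pop()
    return paths


root = etree.fromstring(XML)
for path in paths_via_ancestors(root):
    print(path)
```

The ancestor version is simpler if you only need the path for a few elements;
the iterwalk version avoids re-walking the ancestor chain for every node,
which may matter when stepping through many large documents.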

Stefan