"Full" element tag listing possible with ElementTree?

jaime.dyson at gmail.com
Fri Sep 5 02:27:55 EDT 2008


Hello all,

I have the unenviable task of turning about 20K strangely formatted
XML documents from different sources into something resembling a
clean, standard, uniform format.  I like ElementTree and have been
using it to step through the documents to get a feel for their
structure.  .getiterator() gives me a depth-first traversal, but it
flattens the hierarchy of the elements.  What I'd like is to be able
to traverse elements while keeping track of ancestors, and print out
the full chain of ancestor tags as I arrive at each node.  So, for
example, if I had a document that looked like this:

<a>
  <b att="atttag" content="b"> this is node b </b>
  <c> this is node c
    <d />
    <e> this is node e </e>
  </c>
  <f> this is node f </f>
</a>

I would want to print the following:

<a>
<a> <b>
<a> <b> text: this is node b
<a> <c>
<a> <c> text: this is node c
<a> <c> <d>
<a> <c> <e>
<a> <c> <e> text: this is node e
<a> <f>
<a> <f> text: this is node f


Is there a simple way to do this?  Any help would be appreciated.
Thanks..
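
For reference, here is a rough sketch of the kind of traversal I mean,
assuming a plain recursive walk that carries the ancestor path along.
The function name walk() is made up for illustration, text is stripped
of surrounding whitespace, and tail text (text following a closing
tag) is ignored.  It may well not be the simplest way:

```python
import xml.etree.ElementTree as ET


def walk(elem, ancestors=()):
    """Yield one line per element, plus one line for its non-blank text.

    `ancestors` is a tuple of already-formatted ancestor tags, so each
    recursive call sees the full path from the root down to `elem`.
    """
    path = ancestors + ("<%s>" % elem.tag,)
    prefix = " ".join(path)
    yield prefix

    # elem.text is None for empty elements; whitespace-only text is skipped.
    text = (elem.text or "").strip()
    if text:
        yield "%s text: %s" % (prefix, text)

    # Iterating an Element yields its direct children in document order.
    for child in elem:
        yield from walk(child, path)


doc = """<a>
  <b att="atttag" content="b"> this is node b </b>
  <c> this is node c
    <d />
    <e> this is node e </e>
  </c>
  <f> this is node f </f>
</a>"""

lines = list(walk(ET.fromstring(doc)))
print("\n".join(lines))
```

Attributes are not printed here; elem.attrib would be the place to
look if they are needed in the listing.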



