ElementTree oddities

Mon Sep 15 11:44:28 EDT 2008

I'm not sure, but I think your document is not well formatted...

Anyway, as the name of the module suggests, you must think about XML
not as a flat document but as a tree; that's the only way I managed to
parse XML.
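
To make the tree idea concrete, here is a minimal sketch (not the only
way to do it). In ElementTree, text that follows a child's closing tag is
stored on that child's "tail" attribute, not on the parent's "text". The
helper names iter_text_fragments and extract_text are made up for this
illustration, and the sketch assumes a Python recent enough to support
"yield from" (3.3 or later) rather than the Python of the quoted session.

```python
import xml.etree.ElementTree as ET


def iter_text_fragments(node):
    """Yield every text fragment under node, in document order."""
    # Text between the opening tag and the first child (if any).
    if node.text:
        yield node.text
    for child in node:
        # Everything inside the child comes first...
        yield from iter_text_fragments(child)
        # ...then the text that follows the child's closing tag.
        if child.tail:
            yield child.tail


def extract_text(xml_string):
    """Parse an XML string and return all of its text joined together."""
    root = ET.fromstring(xml_string)
    return "".join(iter_text_fragments(root))


print(extract_text("<highlight><sp />Bar</highlight>"))        # Bar
print(extract_text("<highlight><ref>Bar</ref>:</highlight>"))  # Bar:
```

On Python versions that provide it, "".join(root.itertext()) should give
the same result without the hand-written recursion.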

Brian Cole wrote:
> I'm trying to extract the text from some xml. I figured this
> convenient python two-liner would do it for me:
> >>> from xml.etree.ElementTree import *
> >>> from cStringIO import StringIO
> >>> root = parse(StringIO(xml)).getroot()
> >>> ' '.join([n.text for n in root.getiterator() if n.text is not None])
>
> However, it's missing some of the text. For example, the following
> XML:
> >>> xml = "<highlight><sp />Bar</highlight>"
>
> Returns me an empty string. Seems the "<sp />" tag is borking it.
>
>
> Also, for the following XML:
> >>> xml = "<highlight><ref>Bar</ref>:</highlight>"
>
> I only get "Bar". It's missing the trailing colon.
>
> I'm not that experienced with XML so perhaps I am just missing
> something here. Please enlighten me.
>
> Thanks,
> Brian