coleb2 at gmail.com
Mon Sep 15 17:22:29 CEST 2008
I'm trying to extract the text from some XML. I figured this
convenient Python two-liner would do it for me:
>>> from xml.etree.ElementTree import *
>>> from cStringIO import StringIO
>>> root = parse(StringIO(xml)).getroot()
>>> ' '.join([n.text for n in root.getiterator() if n.text is not None])
However, it's missing some of the text. For example, the following
>>> xml = "<highlight><sp />Bar</highlight>"
returns an empty string. It seems the "<sp />" tag is borking it.
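A quick check suggests where "Bar" actually ends up. It assumes Python 3's xml.etree.ElementTree rather than the Python 2 / cStringIO session above. ElementTree stores text that follows a child's closing tag in that child's .tail attribute, not in the parent's .text. A loop that only reads .text therefore never sees it.

```python
import xml.etree.ElementTree as ET

doc = "<highlight><sp />Bar</highlight>"
root = ET.fromstring(doc)
sp = root[0]  # the <sp /> child element

# Nothing sits between <highlight> and <sp />, so root.text is None.
print(repr(root.text))  # None
# <sp /> is empty, so it has no text of its own either.
print(repr(sp.text))    # None
# "Bar" follows </sp>, so ElementTree records it as the tail of <sp>.
print(repr(sp.tail))    # 'Bar'
```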
Also, for the following XML:
>>> xml = "<highlight><ref>Bar</ref>:</highlight>"
I only get "Bar". It's missing the trailing colon.
I'm not that experienced with XML so perhaps I am just missing
something here. Please enlighten me.
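One possible fix, sketched for Python 3, is to collect each element's .text and also every child's .tail, in document order. The recursive helper all_text is a hypothetical name used only for illustration. Element.itertext() is the standard-library method that walks the tree the same way. Note that itertext() did not exist in the ElementTree of 2008; it arrived with Python 2.7 and 3.2.

```python
import xml.etree.ElementTree as ET

def all_text(elem):
    """Hypothetical helper: every text fragment under elem, in document order.

    elem.text is the text before its first child; each child then
    contributes its own subtree followed by its .tail, which is the
    text after the child's closing tag (still inside elem).
    """
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.extend(all_text(child))
        if child.tail:
            parts.append(child.tail)
    return parts

for doc in ("<highlight><sp />Bar</highlight>",
            "<highlight><ref>Bar</ref>:</highlight>"):
    root = ET.fromstring(doc)
    # Hand-rolled walk over .text and .tail.
    print(repr(" ".join(all_text(root))))
    # Built-in equivalent (Python 2.7+ / 3.2+).
    print(repr(" ".join(root.itertext())))
```

This prints 'Bar' twice for the first document and 'Bar :' twice for the second. The space comes from ' '.join, as in the original two-liner. Joining with "" instead glues the fragments back exactly as written ('Bar:').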