ElementTree, how to get the whole content of a tag

Fredrik Lundh fredrik at pythonware.com
Thu Mar 17 11:46:29 CET 2005

Damjan wrote:

>>> Is there any way I could get everything between the <div> and </div> tag?
>>> <div>
>>>  text
>>>  some other text<br/>
>>>  and then some more
>>> </div>
>> >>> gettext(et)
>> '\n  text\n  some other text\n  and then some more\n'
> I actually need to get
> '\n  text\n  some other text<br/>\n  and then some more\n'

that's not the tree content, that's a serialized XML fragment.

the quickest way to do that is to serialize the entire element, and
strip off the start and end tags:

text = ElementTree.tostring(elem)
text = text.split(">", 1)[1].rsplit("<", 1)[0]
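
for readers on Python 3, here is a minimal sketch of the same serialize-and-strip idea using the standard library's xml.etree.ElementTree. the function name inner_xml_by_stripping and the sample document are made up for illustration. it assumes that passing encoding="unicode" to tostring is acceptable, so that the result is a str rather than bytes:

```python
import xml.etree.ElementTree as ET

def inner_xml_by_stripping(elem):
    # Serialize the whole element as a str (the default would be bytes).
    text = ET.tostring(elem, encoding="unicode")
    # Drop the start tag (up to the first ">") and the end tag
    # (from the last "<" onward), keeping what lies between them.
    return text.split(">", 1)[1].rsplit("<", 1)[0]

elem = ET.fromstring(
    "<div>\n  text\n  some other text<br/>\n  and then some more\n</div>"
)
print(repr(inner_xml_by_stripping(elem)))
```

note that the serializer writes the empty element as "<br />" rather than "<br/>", so the result is equivalent XML but not necessarily byte-for-byte identical to the input.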

alternatively, you can serialize the subelements, and add in properly
encoded text and tail attributes:

def innersource(elem, encoding="ascii"):
    text = ElementTree._encode(elem.text or "", encoding)
    for subelem in elem:
        text = text + ElementTree.tostring(subelem)
        if subelem.tail:
            text = text + ElementTree._encode(subelem.tail, encoding)
    return text

(but _encode is not an official part of the elementtree API, so this code
may not work in post-1.2 releases)
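
a rough sketch of the same subelement-by-subelement approach that avoids the private _encode helper, written against the standard library's xml.etree.ElementTree on Python 3. it is an adaptation under stated assumptions, not the code above: xml.sax.saxutils.escape is swapped in to re-escape the leading text, and the tail is not added separately, because the standard-library tostring already appends each subelement's tail:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def innersource(elem):
    # Text is stored unescaped in the tree, so escape &, < and > again.
    parts = [escape(elem.text or "")]
    for subelem in elem:
        # tostring() in the standard-library module already includes
        # subelem.tail, so the tail is not appended a second time.
        parts.append(ET.tostring(subelem, encoding="unicode"))
    return "".join(parts)

elem = ET.fromstring("<div>a &amp; b<br/>tail <i>x</i> end</div>")
print(innersource(elem))
```

this returns a str; encode the result afterwards if bytes in a particular encoding are needed.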

