Hello,

I have a document with a format like this:
<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>

I want to extract 'text1text3text5' from <doc>, i.e. only the text that sits directly inside <doc> and not the text inside its <b> children, but the text attribute returns only 'text1'. Here is an example:

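from lxml import html

doc = html.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')
print(doc.text)            # 'text1'
print(doc.tail)            # None
print(doc.text_content())  # 'text1text2text3text4text5'

# drop_tree() removes each child but keeps its tail text,
# merging it into the parent's text
for child in list(doc):  # iterate over a copy while removing children
    child.drop_tree()
print(doc.text)            # 'text1text3text5'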


As the example shows, I can get what I want by first dropping the subelements, but that modifies the tree.
Is there a better way to access this text without changing the document?
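
For reference, here is a non-destructive variant I could imagine, as a rough sketch only. It assumes the wanted text is exactly the element's own text plus the tail of each direct child. The helper name direct_text is my own invention, not part of lxml. The fallback import is only there so the sketch also runs where lxml is not installed, since the standard library's ElementTree has the same text and tail attributes. I am not sure this is the intended idiom.

```python
try:
    from lxml import etree  # the library used in the example above
except ImportError:
    # Assumption: the stdlib ElementTree behaves the same for .text and .tail
    import xml.etree.ElementTree as etree


def direct_text(element):
    """Hypothetical helper: join the element's own text with the tail
    text of each direct child, leaving the tree untouched."""
    parts = [element.text or '']       # .text is None when there is no leading text
    for child in element:
        parts.append(child.tail or '')  # .tail is None when nothing follows the child
    return ''.join(parts)


doc = etree.fromstring('<doc>text1<b>text2</b>text3<b>text4</b>text5</doc>')
result = direct_text(doc)
print(result)    # text1text3text5
print(len(doc))  # 2, both <b> children are still in the tree
```

With lxml itself, ''.join(doc.xpath('text()')) should select the same direct text nodes, though I have only reasoned about that for this small document.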

Regards,
Richard