Why is .text_content() only in HTML?

Frederik Elwert, 24.06.2014 11:20:
I am using lxml to parse TEI, an XML format for marking up historical documents and things like that. In many cases, I want to get the text content of an element. lxml.html has the method .text_content() for this, which seems to be ideal for this case. Why is it not in lxml.etree? Seems to make sense for a lot of XML formats besides HTML. Would it be possible to add it to the normal Element class?
The usual way to do this is with tostring(), e.g. if you want a Unicode string for further processing:
etree.tostring(element, method='text', encoding='unicode')
Wrap it in a nicely named function and you're done.
Alternatively, you can use an XPath expression, e.g. "string()" or "normalize-space()", and compile it into an XPath() callable to get the same thing.
I generally consider a function better than a method here.
Stefan
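A minimal sketch of the two approaches suggested above, for readers who want something to paste. The helper name text_content and the sample markup are assumptions made for illustration; they are not part of lxml.etree. Note that tostring() includes the element's tail text by default, so the sketch passes with_tail=False.

```python
from lxml import etree


def text_content(element):
    """Return the text of `element` and all its descendants as one string.

    Hypothetical helper (not part of lxml.etree), wrapping the tostring() call.
    """
    # with_tail=False leaves out text that follows the element's closing tag.
    return etree.tostring(
        element, method="text", encoding="unicode", with_tail=False
    )


# Compiled XPath callables as an alternative to the helper above.
string_value = etree.XPath("string()")
normalized_value = etree.XPath("normalize-space()")

# Made-up, TEI-like sample markup.
root = etree.fromstring(
    "<p>Dear <persName>Mr. <surname>Smith</surname></persName>,   welcome.</p>"
)

print(text_content(root))       # whitespace kept as in the source
print(string_value(root))       # same string via XPath string()
print(normalized_value(root))   # runs of whitespace collapsed
print(text_content(root[0]))    # only the <persName> content, no tail
```

The normalize-space() variant differs from the other two only in whitespace handling: it strips leading and trailing whitespace and collapses internal runs to single spaces.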

On 24.06.2014 18:45, Stefan Behnel wrote:
> The usual way to do this is with tostring(), e.g. if you want a Unicode string for further processing:
> etree.tostring(element, method='text', encoding='unicode')
> Wrap it in a nicely named function and you're done.
Thanks, I was unaware of the "text" method of tostring().
> Alternatively, you can use an XPath expression, e.g. "string()" or "normalize-space()", and compile it into an XPath() callable to get the same thing.
Yes, I also thought about that.
> I generally consider a function better than a method here.
I guess I don’t see much of a difference. The method in lxml.html just seemed so nice and intuitive that I looked for something similar. I am currently teaching XML parsing with lxml, and ''.join(element.itertext()) wasn’t really easy to explain to beginners.
Thanks again,
Frederik
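For teaching purposes, the itertext() idiom mentioned above can be unpacked in two steps: first look at the pieces it yields, then join them. This is only a sketch; the sample line of verse and the variable names are invented for illustration.

```python
from lxml import etree

# Made-up sample: a line of verse with one highlighted word.
line = etree.fromstring("<l>Shall I <hi>compare</hi> thee</l>")

# Step 1: itertext() yields the text fragments in document order.
pieces = list(line.itertext())
print(pieces)

# Step 2: joining the fragments gives the full text content.
full_text = "".join(pieces)
print(full_text)

# The same idiom on a child element leaves out that child's tail text
# (here the " thee" that follows </hi>).
inner_text = "".join(line[0].itertext())
print(inner_text)
```

Showing the intermediate list first may make the join step easier to explain, since beginners can see that mixed content is stored as separate fragments.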