
March 19, 2025
8:17 p.m.
Hi,

tomi.belan--- via lxml - The Python XML Toolkit wrote on 15.03.25 at 01:15:
> I noticed that the text_content() method of lxml.html elements returns an _ElementUnicodeResult, i.e. a 'smart' string.
> However, its getparent() returns None, its attrname is None, and is_tail, is_text and is_attribute are all False. This is the case even if the element contains a single text node: the XPath "string()" used in text_content()'s implementation never returns an existing text node, but always builds a new string.
> Wouldn't it make more sense for text_content() to return a normal str, e.g. by adding smart_strings=False to _collect_string_content?

Yes, that seems useless and unintended. I'll change it for lxml 6.0. Thanks for reporting this.

Stefan
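The behavior discussed in this thread can be checked with a short script. This is a minimal sketch, assuming lxml is installed; the sample HTML is made up for illustration. Because the thread says text_content() returns a smart string before 6.0 and a plain str from 6.0 on, the script is written to run on either. It also shows two ways to get a plain str on older versions: calling str() on the result, or evaluating "string()" yourself with the smart_strings=False keyword of xpath().

```python
import lxml.html

# Sample markup invented for illustration: the text is split across a child
# element, so the result cannot be one pre-existing text node anyway.
doc = lxml.html.fromstring("<p>Hello <b>lxml</b> world</p>")

smart = doc.text_content()
# Per this thread: _ElementUnicodeResult before lxml 6.0, plain str afterwards.
print(type(smart).__name__)

# On older versions the smart-string API is present but carries no origin
# information, because XPath string() always builds a new string.
if hasattr(smart, "getparent"):
    print(smart.getparent(), smart.attrname,
          smart.is_text, smart.is_tail, smart.is_attribute)

# Workaround 1: convert explicitly; str() of a str subclass returns a plain str.
plain = str(smart)

# Workaround 2: evaluate the same XPath expression with smart strings disabled.
via_xpath = doc.xpath("string()", smart_strings=False)

print(repr(plain), type(plain).__name__, type(via_xpath).__name__)
```

Either workaround drops the smart-string wrapper without changing the text itself, so code written this way should behave the same before and after the 6.0 change.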