Hi Fredrik,

thanks for the clarification.

Fredrik Lundh wrote:
Not sure - that you can get None back from findtext when the element is there looks like an accidental change when the ElementPath engine was rewritten. I think I'll consider that a bug in findtext.
I thought so, too.
As for distinguishing between <element/> and <element></element>
That's not what I meant, although that actually is the result when you serialise with or without an empty string value. A parsed empty element will always have its .text set to None in lxml.etree, regardless of the way the parser saw it.

I rather meant the difference between users setting el.text = None and el.text = '' in their code. In the second case, lxml.etree creates a text node with an empty string in the underlying libxml2 tree. That way, it can return the expected result on later requests. This is actually compatible with ET, which (obviously) also remembers what the user set as the value.

You can think of the above as an emulation of the ET behaviour, but also as a way to prevent surprised faces on the user side when code like

    el.text = ''
    for i in range(10):
        el.text += 'xyz'

fails mysteriously.
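To make the None vs. '' difference concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml.etree so that it runs without extra packages; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Parsed empty elements: the text is None whichever form the parser saw.
parsed_short = ET.fromstring('<a/>')
parsed_long = ET.fromstring('<a></a>')

# User code that sets an empty string: the value is remembered,
# so appending to it later works as expected.
el = ET.Element('a')
el.text = ''
for i in range(10):
    el.text += 'xyz'

# User code that sets None: the same append raises a TypeError.
broken = ET.Element('a')
broken.text = None
error = None
try:
    broken.text += 'xyz'
except TypeError as exc:
    error = exc
```

The last part is the "mysterious" failure: None does not support +=, so the tree has to remember that the user stored a string.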
the ET specification allows an implementation to use either None or an empty string for the text and tail attributes in either case to simplify the tree building. However, an application shouldn't abuse this - an XML producer should be free to use either form to indicate an empty element, and application code should use "truth testing" when necessary, when inspecting the text/tail attributes of a given element.
I fully agree.
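For reference, the "truth testing" mentioned above could look roughly like this. It is a sketch against the standard library's xml.etree.ElementTree, and is_empty is a made-up helper name, not part of either API.

```python
import xml.etree.ElementTree as ET

def is_empty(el):
    """Return True if the element has no text, stored as None or as ''."""
    # Truth testing: both None and '' are falsy, so the caller does not
    # need to know which form the producer or the tree builder chose.
    return not el.text

a = ET.Element('a')      # text defaults to None
b = ET.Element('b')
b.text = ''              # explicitly empty string
c = ET.Element('c')
c.text = 'data'
```

Written this way, application code treats a and b the same and only c as having content.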
And I think findtext should be reverted to the 1.2 behaviour - just add an <or ""> to the suitable place in ElementPath, and leave the rest as is.
That's what I did for lxml 2.2. It just makes findtext() simpler to use.

Stefan
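A rough sketch of the findtext() semantics discussed in this thread, for anyone following along. It is written as a standalone helper on top of the standard library's xml.etree.ElementTree; findtext_compat is a hypothetical name, and this is not the actual ElementPath or lxml code.

```python
import xml.etree.ElementTree as ET

def findtext_compat(root, path, default=None):
    """Return the text of the first match, '' if it is empty, else default."""
    el = root.find(path)
    if el is None:
        # No matching element at all: hand back the default.
        return default
    # Element found: never return None, fall back to an empty string.
    return el.text or ""

root = ET.fromstring('<root><empty/><full>abc</full></root>')
```

With this, a None result always means "no such element", and a found element always yields a string.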