Re: [lxml-dev] Behaviour change in findtext

21 Feb 2009

Hi Fredrik,

thanks for the clarification.

Fredrik Lundh wrote:
...
> Not sure - that you can get None back from findtext when the element
> is there looks like an accidental change when the ElementPath engine
> was rewritten.  I think I'll consider that a bug in findtext.

I thought so, too.
...
> As for distinguishing between <element/> and <element></element>

That's not what I meant, although that actually is the result when you
serialise with or without an empty string value. A parsed empty element
will always have its .text set to None in lxml.etree, regardless of the way
the parser saw it. I rather meant the difference between users setting

    el.text = None

and

    el.text = ''

in the code. In the second case, lxml.etree creates a text node with an
empty string in the underlying libxml2 tree. That way, it can return the
expected result on later requests. This is actually compatible with ET,
which (obviously) also remembers what the user set as value. You can think
of the above as an emulation of the ET behaviour, but also as a way to
prevent surprised faces on the user side when you see

    el.text = ''
    for i in range(10):
        el.text += 'xyz'

fail mysteriously.
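
To make the difference concrete, here is a minimal sketch. It uses the
standard library's xml.etree.ElementTree rather than lxml.etree, on the
assumption that both remember the assigned value as described above; the
element name and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Case 1: text explicitly set to an empty string, so += works.
el = ET.Element("item")
el.text = ''
for i in range(3):
    el.text += 'xyz'
appended = el.text

# Case 2: text set to None, so += has nothing to append to.
el_none = ET.Element("item")
el_none.text = None
try:
    el_none.text += 'xyz'
    append_failed = False
except TypeError:
    # None does not support += with a str
    append_failed = True
```

If the empty string were silently turned into None, the first loop would
fail in the same way as the second case.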
...
> the ET specification allows an implementation to use either
> None or an empty string for the text and tail attributes in either
> case to simplify the tree building.  However, an application shouldn't
> abuse this - an XML producer should be free to use either form to
> indicate an empty element, and application code should use "truth
> testing" when necessary, when inspecting the text/tail attributes of a
> given element.

I fully agree.
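
A small sketch of what "truth testing" could look like in application
code, again using the standard library's xml.etree.ElementTree. The helper
name has_text is hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

def has_text(el):
    # Hypothetical helper: treats None and '' the same way,
    # instead of comparing against one of them explicitly.
    return bool(el.text)

a = ET.Element("a")
a.text = None      # one way to represent an empty element
b = ET.Element("b")
b.text = ''        # the other way
c = ET.Element("c")
c.text = 'data'

checks = [has_text(a), has_text(b), has_text(c)]
```

Code written this way does not care which of the two forms the producer
(or the parser) happened to choose.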
...
> And I think findtext should be reverted to the 1.2
> behaviour - just add an <or ""> to the suitable place in ElementPath,
> and leave the rest as is.

That's what I did for lxml 2.2. It just makes findtext() simpler to use.
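
For illustration, a short sketch of the resulting findtext() behaviour. It
is written against the standard library's xml.etree.ElementTree, assuming
it behaves the way this thread describes; the document and tag names are
invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><empty/><full>hi</full></root>")

# Element exists but has no text: an empty string, not None.
present_empty = root.findtext("empty")

# Element exists and has text.
present_full = root.findtext("full")

# Element does not exist: None, or the default if one is given.
missing = root.findtext("nope")
missing_default = root.findtext("nope", default="n/a")
```

That way None (or the default) only ever means "no such element", and a
string result can be used directly without an extra check.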

Stefan

Stefan Behnel