lxml empty versus self closed tag
Robin Becker
robin at reportlab.com
Thu Mar 3 04:21:42 EST 2022
On 02/03/2022 18:39, Dieter Maurer wrote:
> Robin Becker wrote at 2022-3-2 15:32 +0000:
>> I'm using lxml.etree.XMLParser and would like to distinguish
>>
>> <tag></tag>
>>
>> from
>>
>> <tag/>
>>
>> I seem to have e.getchildren()==[] and e.text==None for both cases. Is there a way to get the first to have e.text==''?
>
> I do not think so (at least not without a DTD):
I have a DTD which has
<!ELEMENT tag (content)*>
so I guess the empty case is allowed as well as the self closed.
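
A minimal sketch, assuming lxml is installed, of checking that guess by validating both forms against a DTD like the one above. The declaration for `content` is a hypothetical addition so that the DTD is complete; only the `tag` declaration comes from this message.

```python
import io
from lxml import etree

# The 'tag' declaration is quoted from the message; the 'content'
# declaration is an assumption added so the DTD is self-contained.
dtd = etree.DTD(io.StringIO(
    "<!ELEMENT tag (content)*>\n"
    "<!ELEMENT content (#PCDATA)>\n"
))

results = {}
for source in ("<tag></tag>", "<tag/>", "<tag><content>x</content></tag>"):
    root = etree.fromstring(source)
    # validate() returns True when the tree conforms to the DTD
    results[source] = dtd.validate(root)
    print(source, results[source])
```

If the guess is right, all three inputs should validate, since `(content)*` permits zero children.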
I am converting from an older parser which has text=='' for <tag></tag> and text==None for the self closed version. I
don't think I really need to make the distinction. However, I wonder whether lxml can deliberately present empty string
content, or whether that always has to be a semantic decision made by the caller.
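
A small sketch of the behaviour in question, assuming lxml is installed. It parses both forms, shows that they come back identical, and then assigns an empty string to `.text` by hand, which is one way to make the empty content explicit after parsing. The variable names are illustrative only.

```python
from lxml import etree

parser = etree.XMLParser()
empty = etree.fromstring("<tag></tag>", parser)
closed = etree.fromstring("<tag/>", parser)

# Both forms parse to the same tree: no children and text is None.
print(empty.text, closed.text)
print(len(empty), len(closed))

# The parsed empty element serializes in the self closed form.
empty_out = etree.tostring(empty)
print(empty_out)

# Assigning '' is a decision made by the caller, not by the parser;
# the element then serializes with separate start and end tags.
closed.text = ""
closed_out = etree.tostring(closed)
print(closed_out)
```

So the distinction is not recoverable from the parsed tree itself; any mapping to '' has to be applied afterwards, for example based on what the DTD or the application expects for that element.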
> `<tag/>' is just a shorthand notation for '<tag></tag>' and
> the difference has no influence on the DOM.
>
> Note that `lxml` is just a Python binding for `libxml2`.
> All the parsing is done by this library.
Yes, I think I knew that.