[lxml-dev] lxml 1.3 annotate() behaviour for empty string data elements
Hi, I just noticed that annotate() does not add type information to empty string elements when parsed:
>>> from lxml import etree, objectify
>>> root = etree.fromstring("""
... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
...       xmlns:py="http://codespeak.net/lxml/objectify/pytype"
...       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...   <s1>foobar</s1>
...   <s2></s2>
... </root>
... """)
>>> objectify.annotate(root)
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <s1 py:pytype="str">foobar</s1>
  <s2/>
</root>
Whereas type annotation happens when setting attributes manually:
>>> root = objectify.Element("root")
>>> root.s1 = "foobar"
>>> root.s2 = ""
>>> objectify.annotate(root)
>>> print etree.tostring(root, pretty_print=True)
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <s1 py:pytype="str">foobar</s1>
  <s2 py:pytype="str"></s2>
</root>
I know this happens due to the .text of the node being None in the 1st case instead of '' in the second case (which is lxml/ElementTree/libxml2 behaviour that bites me once and again). Still, I'd prefer to have annotate() provide all data elements with type information; after all, the element in question is treated as a StringElement (the default empty_data_class) anyway. Objections?

Holger
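The None-versus-'' difference described above can be reproduced in isolation. This is only a minimal sketch, and it assumes the standard library's xml.etree.ElementTree as a stand-in for lxml.etree (the message states that the behaviour is shared); the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

# Parsed case: an element without content carries no text at all.
parsed = ET.fromstring("<root><s1>foobar</s1><s2></s2></root>")
parsed_text = parsed.find("s2").text

# Manual case: assigning "" stores an empty string rather than None.
built = ET.Element("root")
s2 = ET.SubElement(built, "s2")
s2.text = ""
built_text = s2.text

print(repr(parsed_text), repr(built_text))
```

A check of the form `if element.text is None` therefore treats the two trees differently, even though both represent an empty s2 element.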
Hi Holger, finally coming back to this. jholg@gmx.de wrote:
I just noticed that annotate() does not add type information to empty string elements when parsed:

I know this happens due to the .text of the node being None in the 1st case instead of '' in the second case (which is lxml/ElementTree/libxml2 behaviour that bites me once and again). Still, I'd prefer to have annotate() provide all data elements with type information; after all, the element in question is treated as a StringElement (the default empty_data_class) anyway.
Well, since the default class is definable in the lookup class, I prefer having it definable in annotate(), too. So I would add an "empty_type" keyword argument where you can provide the Python type name as string. Question: should this default to None or to "str"? I have no real preference myself, though "str" would mean we annotate everything by default, so that sounds better.

Stefan
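To make the two candidate defaults concrete, here is a rough sketch of how an "empty_type" argument might behave. It is not lxml's implementation: annotate_empty() is a hypothetical helper, written against the standard library's xml.etree.ElementTree, and it only handles childless elements whose .text is None. The namespace is the pytype one from the example earlier in the thread.

```python
import xml.etree.ElementTree as ET

# pytype namespace as used in the example earlier in the thread.
PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"
PYTYPE_ATTR = "{%s}pytype" % PYTYPE_NS


def annotate_empty(root, empty_type="str"):
    """Hypothetical helper: tag childless elements whose .text is None.

    Passing empty_type=None leaves such elements unannotated.
    """
    if empty_type is None:
        return
    for el in root.iter():
        is_leaf = len(el) == 0
        if is_leaf and el.text is None and PYTYPE_ATTR not in el.attrib:
            el.set(PYTYPE_ATTR, empty_type)


# Default "str": the empty element gets a type, the non-empty one is
# left for the regular annotation pass (not modelled here).
root = ET.fromstring("<root><s1>foobar</s1><s2></s2></root>")
annotate_empty(root)
s1_type = root.find("s1").get(PYTYPE_ATTR)
s2_type = root.find("s2").get(PYTYPE_ATTR)

# Default None: empty elements stay unannotated, as in the report.
untouched = ET.fromstring("<root><s2></s2></root>")
annotate_empty(untouched, empty_type=None)
untouched_type = untouched.find("s2").get(PYTYPE_ATTR)
```

With "str" as the default, every data element ends up annotated; with None, callers have to opt in explicitly.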
participants (2)
- jholg@gmx.de
- Stefan Behnel