[lxml-dev] should _setElementValue add type attributes?

Hi,

I discussed this with Stefan before and I'm anxious to know if this is the way to go (maybe as switchable behaviour), removing the need for a beast like the discussed PT() factory, as well as making type behaviour arguably more "straightforward", at the cost of auto-adding py:pytype attributes:

    # _setElementValue implementation that auto-adds type(RVAL).__name__ as
    # py:pytype
    cdef _setElementValue(_Element element, value):
        if value is None:
            cetree.setAttributeValue(
                element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true")
        elif isinstance(value, _Element):
            _replaceElement(element, value)
        else:
            cetree.delAttributeFromNsName(
                element._c_node, _XML_SCHEMA_INSTANCE_NS, "nil")
            if not python._isString(value):
                pytype_name = type(value).__name__
                if isinstance(value, bool):
                    value = _lower_bool(value)
                else:
                    value = str(value)
            else:
                pytype_name = "str"
            cetree.setAttributeValue(element, PYTYPE_ATTRIBUTE, pytype_name)
            cetree.setNodeText(element._c_node, value)

I'm +1 for that. By making it switchable we could cater for those who don't care about the types that much but who do not want to see any non-explicitly created attributes.

Holger
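For readers without the Cython sources at hand, the proposal above can be modelled roughly in plain Python. This is only a sketch built on the standard library's xml.etree.ElementTree, not lxml's implementation: set_element_value is a hypothetical helper, the element-replacement branch is left out, and the two namespace URIs are the ones objectify and XML Schema use for py:pytype and xsi:nil.

```python
import xml.etree.ElementTree as ET

# Namespaces behind the py:pytype and xsi:nil attributes.
PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"
NIL_ATTR = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def set_element_value(element, value):
    """Toy model of the proposed rule (hypothetical, not lxml code)."""
    if value is None:
        # None is expressed as xsi:nil="true".
        element.set(NIL_ATTR, "true")
        return
    # A real value replaces any earlier xsi:nil marker.
    element.attrib.pop(NIL_ATTR, None)
    if isinstance(value, str):
        pytype_name = "str"
    else:
        pytype_name = type(value).__name__
        if isinstance(value, bool):
            value = str(value).lower()  # True -> "true"
        else:
            value = str(value)
    # Always record the Python type name next to the text.
    element.set(PYTYPE_ATTR, pytype_name)
    element.text = value


flag = ET.Element("flag")
set_element_value(flag, True)

count = ET.Element("count")
set_element_value(count, "3")
```

After these calls, flag carries pytype "bool" with text "true", while count carries pytype "str", so the string "3" is no longer mistaken for a number when it is read back.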

jholg@gmx.de wrote:
Actually, you were the one who proposed it in the first place, so there's nothing to add to. :)
I dislike the idea of adding a switch here. We already add pytype attributes in a couple of places, so people who do not like it will have to deannotate() their XML anyway (or not use objectify...). I think that always adding a pytype will give us more predictable behaviour.

On the other hand, we could just check if the pytype the type inference mechanism returns is the type of the value, and only add the attribute if that is not the case. What do you think? It would not work if you exchange annotated data with other machines that use different setups, but if you do that, you'd probably annotate everything by hand anyway.

Stefan
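The alternative floated here (annotate only when inference would guess a different type) might look like the following plain-Python sketch. Both function names are hypothetical, and infer_type_name is a deliberately crude stand-in for objectify's real inference, which depends on the registered types.

```python
def infer_type_name(text):
    """Crude stand-in for type inference on element text (assumption)."""
    if text in ("true", "false"):
        return "bool"
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "str"


def needs_annotation(value):
    """True if inference on the serialized text would not give back
    the Python type of the value that was assigned."""
    if isinstance(value, bool):
        text = str(value).lower()
    else:
        text = str(value)
    return infer_type_name(text) != type(value).__name__
```

Under this rule needs_annotation(3) is false, because "3" is inferred as an int anyway, while needs_annotation("3") is true, because the string would otherwise come back as a number. The drawback named above still applies: the result depends on the local inference setup, so another machine with different registered types may disagree.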

Hi,
Yes, but I admit I was unsure then if this muddies the API by making
Kind of losing sort of a symmetry. But then again, we actually *do* have more information in the first case, namely the python type, so we should use it. Now I think that practicality beats purity here.
Right, there's also TREE attributes and stuff.
I'd rather always add the pytype, then. I just think this is simpler. And if you want to exchange data with other machines, better xsiannotate() to fall back to XML standard types, or deannotate() and rely on type inference.

Holger

jholg@gmx.de wrote:
I can't see much of a symmetry there anyway. I'm more concerned about putting in "3" and getting back the number 3, than putting in "<value>3</value>" and getting back a number. The latter sounds natural to me.
Sure. So be it. :) (... for lxml 2.0, that is) Stefan

jholg@gmx.de wrote:
What bothers me more (and where I do see a symmetry) is:
root = objectify.fromstring("<root><flag>true</flag></root>")
I'm not sure what to think about that. It would be wrong to special case it, but it kinda feels wrong the way it would work in the future... Stefan

Hi Stefan,
Hm, not for me (any more :). I think this is just the same case as having a literal 3 in the XML document. When parsing XML from a string or a file with no type information whatsoever, there are really only two things we can do:

1. Make strings of everything.
2. Use type-inference provided by the lookup mechanisms.

(1) does not make much sense as we would not really need objectify at all (except for the syntactic sugar of its __setattr__-API). On the other hand, when setting elements by hand, i.e. in Python code, we know the (Python) type information well. For me, it begins to feel more natural to do:
instead of
which is, in the end, pretty much the same as
So, let's go for the auto-pytype-addition in _setElementValue, without special-casing, imo.

Holger

Hi Holger, jholg@gmx.de wrote:
Fine, no special casing here. One more thing, though: we shouldn't store Python type hints that were not registered as their instantiation wouldn't work anyway. So I added a lookup before the attribute setter call.

So, the new rules are:

- what you put in comes back out (as long as the type is registered)
- for non-annotated XML data, type inference is used to determine the return type (which may be ambiguous in some cases).

Simple enough, I'd say.

Stefan
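The two rules can be illustrated with a small self-contained model. This is a sketch using xml.etree.ElementTree and a hypothetical REGISTERED table, not objectify's actual lookup code; set_value, get_value and infer are made-up names, and the inference shown is far simpler than the real one.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"

# Hypothetical registry: type name -> function that parses element text.
REGISTERED = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda text: text == "true",
}


def infer(text):
    """Simplified type inference for text without a type hint."""
    if text in ("true", "false"):
        return text == "true"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def set_value(element, value):
    name = type(value).__name__
    if name in REGISTERED:
        # Rule 1: only store type hints that can be instantiated later.
        element.set(PYTYPE_ATTR, name)
    else:
        element.attrib.pop(PYTYPE_ATTR, None)
    element.text = str(value).lower() if isinstance(value, bool) else str(value)


def get_value(element):
    name = element.get(PYTYPE_ATTR)
    if name in REGISTERED:
        return REGISTERED[name](element.text)
    # Rule 2: no usable hint, so fall back to inference.
    return infer(element.text)


root = ET.fromstring("<root><flag>true</flag></root>")
flag = root.find("flag")
parsed = get_value(flag)        # inferred from the bare text
set_value(flag, "true")
assigned = get_value(flag)      # string in, string out
set_value(flag, Decimal("1.5"))
unregistered = get_value(flag)  # Decimal is not registered here
```

In this model the parsed flag is read as a boolean, the assigned string "true" comes back as a string, and the unregistered Decimal falls back to inference and returns a float.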

Participants (2): jholg@gmx.de, Stefan Behnel