[lxml-dev] [objectify] Typed E-factory for objectify, PT DataElement()-wrapper

Hi,
attached patch (against trunk)
* adds a typed E-factory (called T-factory)
* inserts NoneType into the E-factory/T-factory typemap
* adds the PT() (= "PyTyped(<arg>)") convenience function, a thin wrapper that uses the argument value's type to set the pytype
* provides unit tests for the E-factory, the T-factory and PT()
* fixes DataElement() to handle some previously unhandled corner cases concerning None and/or _pytype "none"
Despite what I said previously ;-) I now think it would be better to rename "none" to "NoneType", to use the same name as the Python builtin original. It is a longer name, but I seriously doubt you will need to use it explicitly very often. By convention, the PyType name should match the Python builtin type name; then both the T-factory and the PT() function can work smoothly (the only special case is the Python type name "unicode", which gets substituted by "str").
Therefore, the patch also changes "none" to "NoneType" in objectify and in the objectify tests/doctests.
I'd really like to see the PT() function go into the 1.3 series, too.
Please take a look; I can come up with some documentation if you like it.
Holger
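The naming convention and the PT() idea above can be sketched with the standard library's xml.etree.ElementTree. This is not the patch's code: `pytype_name()` and `pt_element()` are hypothetical names, and only the py:pytype namespace URI is taken from lxml.objectify.

```python
import xml.etree.ElementTree as ET

# Namespace that lxml.objectify uses for its py:pytype attributes.
PYTYPE_NS = "http://codespeak.net/lxml/objectify/pytype"
PYTYPE_ATTR = "{%s}pytype" % PYTYPE_NS

# Naming convention proposed above: the pytype name is the builtin type
# name, except that Python 2's "unicode" is written as "str".
_RENAMED = {"unicode": "str"}


def pytype_name(value):
    """Return the pytype name implied by the type of `value`."""
    name = type(value).__name__
    return _RENAMED.get(name, name)


def pt_element(tag, value):
    """Hypothetical PT()-style helper: build a leaf element whose
    py:pytype attribute records the Python type of `value`."""
    elem = ET.Element(tag)
    elem.set(PYTYPE_ATTR, pytype_name(value))
    if isinstance(value, bool):
        # objectify writes booleans as "true"/"false".
        elem.text = "true" if value else "false"
    elif value is not None:
        elem.text = str(value)
    # For None the element stays empty (objectify additionally sets
    # xsi:nil; that is left out of this sketch).
    return elem
```

With this convention `pt_element("s", "0003")` keeps the string type (py:pytype "str"), and `pt_element("n", None)` is tagged "NoneType", which is exactly the builtin name `type(None).__name__` returns.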

Hi,
due to the (seemingly) pressing needs of my users, I propose changing the behaviour of ObjectifiedElement '.'-operator subelement setting (the _setElementValue function, to be exact), possibly configurable as a module-level setting. The proposed change is to automatically add the Python type() of the RVAL (right-hand value) of an assignment as a py:pytype attribute. This is essentially what you would use the PT convenience function proposed below for under the current behaviour.
Current behaviour:
>>> from lxml import objectify
>>> root = objectify.Element("root")
>>> root.s = "0003"
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    s = 3 [IntElement]
Proposed behaviour (switchable):
>>> root = objectify.Element("root")
>>> root.s = "0003"
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    s = '0003' [StringElement]
      * py:pytype = 'str'
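The difference between the current and the proposed (switchable) behaviour can be modelled in a few lines of plain Python. This is a deliberately simplified, hypothetical model, not objectify's implementation: `assign()`, `element_value()` and the `annotate` flag are made-up names, and the real type guessing checks more types than int and float.

```python
# Simplified, hypothetical model of objectify's value lookup; the real
# type guessing in lxml checks more types (bool, None, ...) than this.
_CONVERTERS = {"int": int, "float": float, "str": str}


def guess_value(text):
    """Guess a value from element text, trying int, then float, then str."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def element_value(text, pytype=None):
    """Value seen on element access: honour a pytype hint, else guess."""
    if pytype in _CONVERTERS:
        return _CONVERTERS[pytype](text)
    return guess_value(text)


def assign(value, annotate):
    """Model of `root.child = value`: store the text and, if the proposed
    switch is on, the type name of the assigned value."""
    pytype = type(value).__name__ if annotate else None
    return str(value), pytype


# Current behaviour: "0003" comes back as the int 3.
print(element_value(*assign("0003", annotate=False)))
# Proposed behaviour: "0003" comes back unchanged.
print(element_value(*assign("0003", annotate=True)))
```

The same model reproduces the slicing surprise described next: `"this is my 500. comment"[11:14]` is the string "500", which comes back as the int 500 unless the assignment recorded "str".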
I am well aware that this
* auto-adds an attribute to an Element, where you currently have to request this explicitly, e.g. by using DataElement()
* maybe means, in a sense, some loss of symmetry between adding content through the objectify API and parsing it from an XML file or string
However, my users just can't seem to grasp the notion that "assignment only cares about RVAL literals, and type lookup happens on element access by type guessing". In particular, they seem to have trouble with situations like this:
>>> root = objectify.Element("root")
>>> root.comment = "this is my 500. comment"
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    comment = 'this is my 500. comment' [StringElement]
>>> root.comment = root.comment.pyval[11:14]
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    comment = 500 [IntElement]
where you cut part of a string out and may then get the result presented as an IntElement, because it is an int-able literal.
In addition, I still propose what I posted before ;-):
Subject: [lxml-dev] [objectify] Typed E-factory for objectify, PT DataElement()-wrapper
Holger

jholg@gmx.de wrote:
due to the (seemingly) pressing needs of my users, I propose changing the behaviour of ObjectifiedElement '.'-operator subelement setting (the _setElementValue function, to be exact), possibly configurable as a module-level setting. The proposed change is to automatically add the Python type() of the RVAL (right-hand value) of an assignment as a py:pytype attribute. This is essentially what you would use the PT convenience function proposed below for under the current behaviour.
Current behaviour:
>>> root = objectify.Element("root")
>>> root.s = "0003"
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    s = 3 [IntElement]
Proposed behaviour (switchable):
>>> root = objectify.Element("root")
>>> root.s = "0003"
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    s = '0003' [StringElement]
      * py:pytype = 'str'
Hmmm, this makes sense at first sight, but I'll have to think it through to figure out the implications. I'm not altogether happy with the type attribute business today, as it keeps people from generating 'clean' XML. OK, you can run deannotate() on trees before you serialise them, but that might mean that objectify behaves differently the next time you parse them. So it's somewhat quirky either way: live with the artifacts or live with surprises. The first sounds a lot better, though. :)
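The deannotate() trade-off can be illustrated with xml.etree.ElementTree: once the py:pytype hint is stripped before serialising, a reader that guesses types gets a different value back after re-parsing. `strip_pytype()` is a hypothetical stand-in for objectify's deannotate(), and `read_value()` is a deliberately tiny type guesser, not objectify's.

```python
import xml.etree.ElementTree as ET

PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"


def strip_pytype(root):
    """Hypothetical stand-in for deannotate(): drop py:pytype attributes."""
    for elem in root.iter():
        elem.attrib.pop(PYTYPE_ATTR, None)
    return root


def read_value(elem):
    """Honour a py:pytype="str" hint, otherwise guess int before str."""
    if elem.get(PYTYPE_ATTR) == "str":
        return elem.text
    try:
        return int(elem.text)
    except ValueError:
        return elem.text


root = ET.Element("root")
child = ET.SubElement(root, "s", {PYTYPE_ATTR: "str"})
child.text = "0003"

annotated = ET.tostring(root)            # keeps the type hint
clean = ET.tostring(strip_pytype(root))  # 'clean' XML, hint removed

# Re-parse both serialisations and read the value back.
before = read_value(ET.fromstring(annotated).find("s"))  # '0003'
after = read_value(ET.fromstring(clean).find("s"))       # 3
```

Keeping the attribute ("live with the artifacts") preserves the string; stripping it ("live with surprises") turns it into an int on the next parse.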
I am well aware that this
* auto-adds an attribute to an Element, where you currently have to request this explicitly, e.g. by using DataElement()
We already do that in a couple of places now, so it wouldn't add much ugliness. It would even make the type behaviour less surprising - you get out what you put in. I agree that this actually helps users. We're talking about 2.0 behaviour here, though.
In addition, I still propose what I posted before ;-):
Right, I'll look at that also. We really need a bug tracker for lxml... Stefan

Hi,
I'm not altogether happy with the type attribute business today, as it keeps people from generating 'clean' XML. OK, you can run deannotate() on trees before you serialise them, but that might mean that objectify behaves differently the next time you parse them. So it's somewhat quirky either way: live with the artifacts or live with surprises. The first sounds a lot better, though. :)
If we make the behaviour I proposed switchable, then maybe this switch should also affect the auto-generation of py:pytype="TREE" in the objectify.Element() factory. That way the user can make the choice you described:
- no artefacts: build up clean XML, at the cost of type "uncertainty"
- with artefacts: you get py:pytype attributes everywhere and can rely on "stable types"
So one could basically use the objectify API in two ways: one being more or less an (arguably simpler) alternative to the etree API (if you need "clean" trees and are prepared to protect yourself against type confusion), the other being "fully type-annotated". I think objectify.Element() is the only place where "TREE" gets auto-generated.
Holger
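The two modes described above, with one switch controlling both the py:pytype="TREE" marker on new root elements and the per-assignment annotation, can be sketched with xml.etree.ElementTree. The `Builder` class and its `annotate` flag are hypothetical illustrations, not an lxml API.

```python
import xml.etree.ElementTree as ET

PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"


class Builder:
    """Hypothetical builder with one switch controlling all annotations."""

    def __init__(self, annotate):
        self.annotate = annotate

    def element(self, tag):
        # Mirrors objectify.Element() marking its result as a "TREE".
        root = ET.Element(tag)
        if self.annotate:
            root.set(PYTYPE_ATTR, "TREE")
        return root

    def set_child(self, parent, tag, value):
        # Mirrors `parent.tag = value`, optionally recording the type.
        child = ET.SubElement(parent, tag)
        child.text = str(value)
        if self.annotate:
            child.set(PYTYPE_ATTR, type(value).__name__)
        return child


clean = Builder(annotate=False)
root = clean.element("root")
clean.set_child(root, "s", "0003")
print(ET.tostring(root))  # no artefacts, but "0003" may later read as 3
```

With `annotate=True` the same calls leave py:pytype="TREE" on the root and py:pytype="str" on the child, i.e. the "fully type-annotated" mode.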

Hi Holger,
sorry, I keep pushing non-trivial decisions back into the FIFO when I first see them, and the queue was pretty long this time. I already looked at your patch, but didn't get through it completely. You should really cut down the size of your patches... :)
jholg@gmx.de wrote:
* adds a typed E-factory (called T-factory)
* inserts NoneType into the E-factory/T-factory typemap
* adds the PT() (= "PyTyped(<arg>)") convenience function, a thin wrapper that uses the argument value's type to set the pytype
* provides unit tests for the E-factory, the T-factory and PT()
* fixes DataElement() to handle some previously unhandled corner cases concerning None and/or _pytype "none"
I'll take another look at the patch.
Despite what I said previously ;-) I now think it would be better to rename "none" to "NoneType", to use the same name as the Python builtin original. It is a longer name, but I seriously doubt you will need to use it explicitly very often. By convention, the PyType name should match the Python builtin type name; then both the T-factory and the PT() function can work smoothly (the only special case is the Python type name "unicode", which gets substituted by "str").
Therefore, the patch also changes "none" to "NoneType" in objectify and the objectify tests/doctests.
This will break existing documents, though, if they do not additionally use xsi:nil. No idea how many there are... We could accept both names for the time being, though, and write out the new one in 2.0 and the old one in 1.3. Stefan
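The transition plan suggested here (accept both spellings when reading; write "none" in 1.3 and "NoneType" in 2.0) can be sketched as follows. All function names are hypothetical; the sketch only shows the decision logic, not how DataElement() or the parser would wire it in.

```python
# Hypothetical sketch: accept the old name "none" and the new name
# "NoneType" when reading, and pick the name to write by lxml series.
_NONE_NAMES = frozenset(["none", "NoneType"])


def is_none_pytype(name):
    """True if a py:pytype value denotes None under either spelling."""
    return name in _NONE_NAMES


def none_pytype_for(series):
    """Name to write out: old spelling for 1.x, new spelling from 2.0 on."""
    major = int(series.split(".")[0])
    return "NoneType" if major >= 2 else "none"


def read_pytype_value(text, pytype, is_nil=False):
    """Return None for either spelling or for xsi:nil, else the text."""
    if is_nil or is_none_pytype(pytype):
        return None
    return text
```

Documents that also carry xsi:nil are unaffected by the rename either way, which is why `is_nil` short-circuits the name check.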

Hi,
I already looked at your patch, but didn't get through it completely. You should really cut down the size of your patches... :)
In my defence, the biggest portion was the unit tests ;-) By the way, I now have an svn account (thanks, Philipp), so if I can help in a way that is easier for you to manage and quality-assure, just let me know. I also have one other thing in the queue, namely adding a keep_tree option to the *annotate() functions, renaming annotate() to pyannotate() and giving _annotate a public interface that is largely backwards-compatible with the 1.x annotate().
Therefore, the patch also changes "none" to "NoneType" in objectify and the objectify tests/doctests.
This will break existing documents, though, if they do not additionally use xsi:nil. No idea how many there are...
We could accept both names for the time being, though, and write out the new one in 2.0 and the old one in 1.3.
Sounds good. I'll take a look at DataElement() to see where it should handle both names.
Holger
participants (2)
- jholg@gmx.de
- Stefan Behnel