[lxml-dev] objectify feedback

Hi there, I just read through the objectify.txt documentation/doctest. Quite interesting and impressive stuff! One thing that worries me is that it does introduce quite a new API with different behavior than ElementTree in some fundamental ways. How close is the behavior of the new API to Amara? It'd be nice if we weren't inventing too much that's new here somehow. Anyway, this is an "are we inventing too many new wheels?" worry - one of the ideas of lxml is not to invent too many, though on the other hand we shouldn't stop people from building cool stuff on top of the core, which is what this is. The other thing that worries me from a more practical perspective is that this is, as far as I can see, controlled globally. The beginning of the document says "Don't mix!" and this sounds like sensible advice, but as far as I understand, if you use objectify in your application you cannot use ElementTree anymore in the same application, unless you do a lot of registration and reset work. It'd be *very* nice if this were not so - create an objectified tree separately, not affecting the way normal trees are created. This way, you could have one module of your application using objectify but another module still sticking to normal ElementTree. What to do when someone tries to mix bits of one tree with another? Perhaps there's an efficient way to compare the base class between the two lxml objects that are combined, and bail out with some reasonably clear exception in case of illegal combinations. Regards, Martijn

Hi Martijn, Martijn Faassen wrote:
> I just read through the objectify.txt documentation/doctest. Quite interesting and impressive stuff!
Thanks. Wasn't even my idea (not alone, at least).
> One thing that worries me is that it does introduce quite a new API with different behavior than ElementTree in some fundamental ways.
It is fundamentally different in some aspects (e.g. slicing works on siblings, not children), but I'm trying to keep it close enough to take as much advantage of the ET API as possible. What helps is that most parts of etree already work directly on the C tree, which does /not/ change its API, so there are few places that actually break.
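The sibling-versus-children difference can be seen in a small sketch. It uses the lxml.objectify API as it exists in current lxml releases, which may differ from the code under discussion in this thread; the sample document and variable names are made up for illustration.

```python
from lxml import etree, objectify

# Illustrative sample document (not from the thread).
XML = "<root><a>1</a><a>2</a><b>3</b></root>"

plain = etree.fromstring(XML)    # ElementTree-style API
obj = objectify.fromstring(XML)  # objectify API

# ElementTree semantics: len(), indexing and slicing address the children.
plain_child_count = len(plain)
plain_slice_tags = [el.tag for el in plain[0:2]]

# objectify semantics: the same operations address same-named siblings.
obj_sibling_count = len(obj)               # the root has no siblings
a_sibling_count = len(obj.a)               # number of <a> elements under root
a_slice_texts = [el.text for el in obj.a[0:2]]
second_a_text = obj.a[1].text

# Children are still reachable through explicit methods.
obj_child_count = obj.countchildren()
```

The C tree is the same in both cases; only the Python-level element class decides whether indexing means "children" or "siblings".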
There are some things to invent, some things to keep. For example, Amara has all sorts of functionality that ET already provides, so I leave that out as much as possible (XML attribute access, for example). Things like child access through Python attributes and the behaviour of slicing/indexing are directly borrowed from Amara. Another thing is XSD type support. When you add an xsi:type attribute to your elements, objectify will pick it up and look for a corresponding Python data type. So it's even somewhat standards compliant here. :) In a way it's a new API with many ideas borrowed from a few good places.
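As a rough illustration of the xsi:type handling, here is a sketch against the lxml.objectify API of current releases (an assumption; the behaviour at the time of this thread may have differed). The element names are invented for the example.

```python
from lxml import objectify

# Illustrative document: two elements carry an xsi:type hint, one does not.
XML = """\
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <d xsi:type="double">5</d>
  <s xsi:type="string">5</s>
  <n>5</n>
</root>
"""

root = objectify.fromstring(XML)

# The same text content "5" maps to different Python values.
double_value = root.d.pyval  # xsi:type="double" selects a float
string_value = root.s.pyval  # xsi:type="string" keeps the text as str
guessed_value = root.n.pyval # no hint: objectify infers the type from the text
```

Without the attribute, objectify falls back to guessing the type from the text content, so the hint matters mostly where the guess would be wrong.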
It is. You can get things totally messed up by mixing elements from different APIs in the same tree. This will break some parts of the API in a non-obvious way. One prominent example is _elementpath, which traverses the tree level by level. Now think of one element iterating over its children, the other one yielding its siblings. Great. It's only OK as long as you can control which API you use where, but that can be hard enough to control already.
This can be done and I already started providing the infrastructure. Look at the lxml.elements.classlookup module (elements.txt). It allows you to change the way nodes are mapped to element classes. I managed to let it support lookup chains by now so that you can define fallbacks if the selected strategy does not find a suitable class. One of the lookup schemes delegates to the parsers, so when you set that one globally, each parser can have its own lookup mechanism (with a fallback to the default lookup). I will soon integrate the objectify class lookup into this framework, which should answer your question. :)
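A sketch of per-parser lookups with a fallback, written against the class lookup API found in current lxml releases (etree.CustomElementClassLookup, etree.ElementDefaultClassLookup, objectify.ObjectifyElementClassLookup, XMLParser.set_element_class_lookup). This is an assumption about where the framework described above ended up, not the module layout named in this message. SpecialElement and SpecialLookup are hypothetical names.

```python
from lxml import etree, objectify

XML = "<root><special/><other>1</other></root>"


class SpecialElement(etree.ElementBase):
    """Hypothetical custom element class, used only for <special> nodes."""


class SpecialLookup(etree.CustomElementClassLookup):
    """Hypothetical lookup: handles one tag, defers everything else."""

    def lookup(self, node_type, document, namespace, name):
        if node_type == "element" and name == "special":
            return SpecialElement
        return None  # not found here: continue with the fallback lookup


# Parser 1: custom lookup chained to the default lookup as a fallback.
custom_parser = etree.XMLParser()
custom_parser.set_element_class_lookup(
    SpecialLookup(etree.ElementDefaultClassLookup())
)

# Parser 2: objectify lookup, affecting only trees built by this parser.
objectify_parser = etree.XMLParser()
objectify_parser.set_element_class_lookup(
    objectify.ObjectifyElementClassLookup()
)

custom_root = etree.fromstring(XML, custom_parser)
objectified_root = etree.fromstring(XML, objectify_parser)
plain_root = etree.fromstring(XML)  # default parser, plain ElementTree API
```

Each tree keeps the element classes of the parser that created it, so one module can work with objectified trees while another sticks to plain ElementTree behaviour.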
> This way, you could have one module of your application using objectify but another module still sticking to normal ElementTree.
I will try to make sure in the docs that that is the main intention and that mixing elements from different sources is A Bad Idea.
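The base-class comparison suggested in the original message could look roughly like the sketch below. checked_append is a hypothetical helper, not part of lxml, and the check relies on objectify.ObjectifiedElement as found in current lxml releases. It only inspects the Python objects at the moment of the call.

```python
from lxml import etree, objectify


def checked_append(parent, child):
    """Hypothetical guard: refuse to combine elements from different APIs."""
    parent_is_objectified = isinstance(parent, objectify.ObjectifiedElement)
    child_is_objectified = isinstance(child, objectify.ObjectifiedElement)
    if parent_is_objectified != child_is_objectified:
        raise TypeError(
            "cannot mix elements: parent is %s, child is %s"
            % (type(parent).__name__, type(child).__name__)
        )
    parent.append(child)


plain = etree.fromstring("<doc><entry/></doc>")
objectified = objectify.fromstring("<doc><entry/></doc>")

# Same API on both sides: the append goes through.
checked_append(plain, etree.Element("extra"))

# Different APIs: the guard raises instead of silently mixing trees.
try:
    checked_append(plain, objectified.entry)
except TypeError as exc:
    mixed_error = str(exc)
```

A guard like this only sees the classes of the objects passed in, so it does not address cases where the Python class of an element changes between calls.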
Well, even worse, the problem will go away when element classes are garbage collected. Which can lead to nicely surprising effects like a function working in one run and failing in the next - without obvious changes and without an easily visible difference between the elements that were passed in (except for their type, that is). Even better, debugging then means that you have to figure out where the wrong element came from, or where the last reference to the element was stored that prevented garbage collection. Cool. Guess I'll have to make the respective warnings *very* clear in the docs ... Stefan

participants (2): Martijn Faassen, Stefan Behnel