[lxml-dev] extended XPath function

Hi,
before trying to code it I'd like to get opinions about the following. I'd like to add a global XPath function to lxml that takes an element (the context) and an XPath expression (as a string) and returns a list of tuples, where each tuple has
a) 3 parts: the attribute name, value and containing element
b) 2 parts: the text and the element that has the text as text or tail
c) 1 part: the element itself.
I'd like to have this for XPathEvaluator (so I can highlight attribute and text nodes in the tree) and I'm willing to try and implement it in lxml.
Andreas
-- Your talents will be recognized and suitably rewarded.

Hi Andreas,
Andreas Pakulat wrote:
before trying to code it I'd like to get opinions about the following.
I'd like to add a global xpath function to lxml, that takes an element (the context) and an xpath expression (as string) and returns a list of tuples. Where each tuple has
a) 3 parts, the attribute name, value and containing element
b) 2 parts, the text and the element that has the text as text or tail
c) 1 part, the element itself.
I'm not very comfortable with the idea of inferring the semantics of a tuple result from the number of its entries.
I'd like to have this for XPathEvaluator (so I can highlight attribute and text nodes in the tree) and I'm willing to try and implement it in lxml.
Hmmm, it feels like a rather specific requirement. What you really want is to always receive a reference to the libxml2 node that carries the result of an XPath expression. The problem is that this does not match very well with the rest of the API, where attributes 'do not exist', for example.
I mean, your real problem is that you must evaluate arbitrary (opaque) XPath expressions and do not know what the non-element result of them means: Did a string come from an attribute or from element text? Which element did it come from? So you are interested in meta-data about the XPath result, not the result itself. I do not think many applications have this requirement.
Stefan

On 28.06.06 16:42:00, Stefan Behnel wrote:
Hi Andreas,
Andreas Pakulat wrote:
before trying to code it I'd like to get opinions about the following.
I'd like to add a global xpath function to lxml, that takes an element (the context) and an xpath expression (as string) and returns a list of tuples. Where each tuple has
a) 3 parts, the attribute name, value and containing element
b) 2 parts, the text and the element that has the text as text or tail
c) 1 part, the element itself.
I'm not very comfortable with the idea of inferring the semantics of a tuple result from the number of its entries.
Hmm, right...
I'd like to have this for XPathEvaluator (so I can highlight attribute and text nodes in the tree) and I'm willing to try and implement it in lxml.
Hmmm, it feels like a rather specific requirement.
Right.
What you really want is to always receive a reference to the libxml2 node that carries the result of an XPath expression.
I think, yes.
The problem is that this does not match very well with the rest of the API, where attributes 'do not exist', for example. I mean, your real problem is that you must evaluate arbitrary (opaque) XPath expressions and do not know what the non-element result of them means: Did a string come from an attribute or from element text? Which element did it come from? So you are interested in meta-data about the XPath result, not the result itself.
Well, yes. I need to know "where" in the tree the result(s) of the xpath expression are. This is easy enough for elements as I can walk up the tree (or even better use their hash-value as key into a dictionary). However when I get strings back I'm out of options...
I do not think many applications have this requirement.
Possibly not. So then I'll do something better with my time :-) It's just that lxml looks like "the worst implementation with respect to the application" in the list of PyXML, libxml2 and lxml and I would have liked to change that a bit...
Andreas
-- You will always have good luck in your personal affairs.
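The gap described in this message can be illustrated with a small sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption, made only to keep the example self-contained); the sample document and the names `highlight` and `parent_of` are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<doc><item id="a">first</item><item id="b">second</item></doc>'
)

# Element results are objects with identity, so they can serve as
# dictionary keys, e.g. to find the tree-view entry showing each node.
highlight = {elem: False for elem in root.iter()}
for elem in root.findall("./item"):
    highlight[elem] = True

# "Walking up the tree": ElementTree keeps no parent pointer, so build
# a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}

# String results are plain values; nothing ties them back to a node.
texts = [elem.text for elem in root.findall("./item")]
ids = [elem.get("id") for elem in root.findall("./item")]
```

Once `texts` or `ids` has been handed to the caller, there is no way to tell which element or attribute produced each string, which is the situation the message calls being "out of options".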

Hi Andreas,
Andreas Pakulat wrote:
On 28.06.06 16:42:00, Stefan Behnel wrote:
Andreas Pakulat wrote:
I'd like to add a global xpath function to lxml, that takes an element (the context) and an xpath expression (as string) and returns a list of tuples. Where each tuple has
a) 3 parts, the attribute name, value and containing element
b) 2 parts, the text and the element that has the text as text or tail
c) 1 part, the element itself.
I'd like to have this for XPathEvaluator (so I can highlight attribute and text nodes in the tree) and I'm willing to try and implement it in lxml.
Hmmm, it feels like a rather specific requirement. What you really want is to always receive a reference to the libxml2 node that carries the result of an XPath expression. The problem is that this does not match very well with the rest of the API, where attributes 'do not exist', for example. I mean, your real problem is that you must evaluate arbitrary (opaque) XPath expressions and do not know what the non-element result of them means: Did a string come from an attribute or from element text? Which element did it come from? So you are interested in meta-data about the XPath result, not the result itself.
Well, yes. I need to know "where" in the tree the result(s) of the xpath expression are. This is easy enough for elements as I can walk up the tree (or even better use their hash-value as key into a dictionary). However when I get strings back I'm out of options...
Ok, I'm not completely opposed to the idea of providing more semantics for string results. After all, the API works with Elements, so there should be a way to point back to the Element source of a string inside the document. I'm just not sure how to make this fit into the current API.
I could imagine something like returning a string subclass from XPath evaluations that has additional attributes "element" for the Element containing the string and "attribute" for the attribute name (or None for element text content). Still, I'm not sure how to deal with tail text or text generated by the XPath expression itself. Also, how would extension functions fit into this? There are many cases where you cannot determine where a string came from. Should we simply set the element/attribute attributes to None in that case? How much overhead is it to determine where the string came from?
Lots of open questions ...
Stefan
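The string-subclass idea from this message could look roughly like the following. This is only a sketch of the proposal, not lxml code: the class name `LocatedString` is hypothetical, and xml.etree.ElementTree stands in for lxml so that the example runs on its own.

```python
import xml.etree.ElementTree as ET


class LocatedString(str):
    """A str that remembers where in the document it came from (hypothetical)."""

    def __new__(cls, value, element=None, attribute=None):
        self = super().__new__(cls, value)
        self.element = element      # Element carrying the string, None if unknown
        self.attribute = attribute  # attribute name, None for text content
        return self


root = ET.fromstring('<doc><item id="a">first</item></doc>')
item = root.find("./item")

from_attr = LocatedString(item.get("id"), element=item, attribute="id")
from_text = LocatedString(item.text, element=item)
# A string computed by the expression itself has no source node.
computed = LocatedString("a" + "first")
```

The result still behaves like an ordinary string (`from_attr == "a"` holds), but any string operation such as `from_attr.upper()` returns a plain `str` and drops the provenance. The sketch also cannot tell text from tail, one of the open questions raised in the message.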

Stefan Behnel wrote: [snip]
Lots of open questions ...
Yes, lots of good questions. I mentioned XUpdate in my reply just now; it may make sense to take a look at how it tackles some of these issues (if it does). As far as I know it uses XPath to target bits of an XML document to change.
So, Andreas, don't be discouraged but just let's have a discussion about how to get a solid kind of API for this stuff first, and see where we end up. If we can imagine something pretty, then
Some XUpdate stuff, I don't know how recent or relevant:
http://xmldb-org.sourceforge.net/xupdate/index.html
browsing through it briefly it doesn't seem to really discuss these issues in detail. Still, XUpdate support for lxml would be cool. :)
Regards, Martijn

On Wednesday 28 June 2006 15:48, Martijn Faassen wrote:
Stefan Behnel wrote: [snip]
Lots of open questions ...
Yes, lots of good questions. I mentioned XUpdate in my reply just now; it may make sense to take a look at how it tackles some of these issues (if it does). As far as I know it uses XPath to target bits of an XML document to change.
So, Andreas, don't be discouraged but just let's have a discussion about how to get a solid kind of API for this stuff first, and see where we end up. If we can imagine something pretty, then
Some XUpdate stuff, I don't know how recent or relevant:
http://xmldb-org.sourceforge.net/xupdate/index.html
browsing through it briefly it doesn't seem to really discuss these issues in detail. Still, XUpdate support for lxml would be cool. :)
XUpdate is not supported by most implementors, since it's an old and poorly defined standard. Besides, it has been abandoned by its authors (it has been a draft for six years) and there is no sign that this will change.
Better than explaining it myself, please refer to: http://dev.sleepycat.com/resources/faq_show.html?id=19&back=%3Fproduct_id%3D3%26action%3Dsearch
I think it would be much more interesting to support XQuery (through Xerces?) and XQuery Update when it gets defined...
-- Best Regards, Steve Howe

Steve Howe wrote:
On Wednesday 28 June 2006 15:48, Martijn Faassen wrote:
Stefan Behnel wrote: [snip]
Lots of open questions ...
Yes, lots of good questions. I mentioned XUpdate in my reply just now; it may make sense to take a look at how it tackles some of these issues (if it does). As far as I know it uses XPath to target bits of an XML document to change.
So, Andreas, don't be discouraged but just let's have a discussion about how to get a solid kind of API for this stuff first, and see where we end up. If we can imagine something pretty, then
Some XUpdate stuff, I don't know how recent or relevant:
http://xmldb-org.sourceforge.net/xupdate/index.html
browsing through it briefly it doesn't seem to really discuss these issues in detail. Still, XUpdate support for lxml would be cool. :)
XUpdate is not supported by most implementors, since it's an old and poorly defined standard. Besides, it has been abandoned by its authors (it has been a draft for six years) and there is no sign that this will change.
While I agree XUpdate is indeed badly defined and the definition is old, it's simple and implementable. 4suite implements it, along with other libraries in Java-land.
Better than explaining it myself, please refer to: http://dev.sleepycat.com/resources/faq_show.html?id=19&back=%3Fproduct_id%3D3%26action%3Dsearch
I think it would be much more interesting to support XQuery (through Xerces?) and XQuery Update when it gets defined...
XQuery is much much harder to implement. libxml2 won't be implementing it any time soon (if ever), and requiring another library to get XQuery going is a serious step. In addition, I don't even think the XQuery standard is final yet, let alone XQuery update. XQuery is also a *lot* more complicated to support than XUpdate.
It is probably easier to get involved in updating XUpdate and cleaning up the definition than to implement XQuery. :)
Anyway, I think something like XUpdate would be nice. Simplicity has its value.
That's not to say that it wouldn't be really cool to have XQuery support in lxml too. It's just a big project to implement it, and XQuery tends to make the most sense in an XML database setting, and lxml isn't an XML database.
Regards, Martijn

Hi Martijn,
Martijn Faassen wrote:
Steve Howe wrote:
I think it would be much more interesting to support XQuery (through Xerces?) and XQuery Update when it gets defined...
It is probably easier to get involved in updating XUpdate and cleaning up the definition than to implement XQuery. :)
Anyway, I think something like XUpdate would be nice. Simplicity has its value.
XUpdate has its own namespace and uses XPath. It should be easy to implement it on top of lxml in callable namespace classes. Another candidate for "lxml.elementlib", preferably in its own lxml/elementlib/xupdate.py. So, if anyone wants to implement it...
Stefan

Stefan Behnel wrote:
Martijn Faassen wrote:
Steve Howe wrote:
I think it would be much more interesting to support XQuery (through Xerces?) and XQuery Update when it gets defined...
It is probably easier to get involved in updating XUpdate and cleaning up the definition than to implement XQuery. :)
Anyway, I think something like XUpdate would be nice. Simplicity has its value.
XUpdate has its own namespace and uses XPath. It should be easy to implement it on top of lxml in callable namespace classes. Another candidate for "lxml.elementlib", preferably in its own lxml/elementlib/xupdate.py.
I'm not clear on the purpose of 'elementlib' - I haven't followed the discussion on that. Is that code that could apply to any ElementTree implementation? If so, XUpdate wouldn't fit there, as it would rely on XPath.
So, if anyone wants to implement it...
Right. I'm thinking about it, but I don't have immediate time or need, so we'll see. :)
Regards, Martijn

Hi Martijn,
Martijn Faassen wrote:
Stefan Behnel wrote:
Martijn Faassen wrote:
Steve Howe wrote:
I think it would be much more interesting to support XQuery (through Xerces?) and XQuery Update when it gets defined...
It is probably easier to get involved in updating XUpdate and cleaning up the definition than to implement XQuery. :)
Anyway, I think something like XUpdate would be nice. Simplicity has its value.
XUpdate has its own namespace and uses XPath. It should be easy to implement it on top of lxml in callable namespace classes. Another candidate for "lxml.elementlib", preferably in its own lxml/elementlib/xupdate.py.
I'm not clear on the purpose of 'elementlib' - I haven't followed the discussion on that. Is that code that could apply to any ElementTree implementation?
No, I brought up lxml.elementlib in the discussion with Holger on supporting different XML APIs on top of lxml. It's meant as a collection of commonly useful element classes, like a data.binding.like.subelement.attribute.access API (that may still make it into 1.1, BTW). Namespace implementations for things like XUpdate would best fit in there, too, if they get their own modules. You'd then go
    from lxml.elementlib import xupdate  # auto-register XUpdate namespace
    update = lxml.etree.XML("<xupdate:update ...>")
    update(tree)
So, if anyone wants to implement it...
Right. I'm thinking about it, but I don't have immediate time or need, so we'll see. :)
Same for me. :) Stefan
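The calling convention described in this message, an XUpdate document that is applied to a tree by calling it, might be sketched as follows. Everything here is an assumption for illustration: the `XUpdate` class is hypothetical, xml.etree.ElementTree stands in for lxml, only `xupdate:update` and `xupdate:remove` are handled, and `select` is limited to the relative path subset that ElementTree's `findall()` understands.

```python
import xml.etree.ElementTree as ET

XUPDATE_NS = "http://www.xmldb.org/xupdate"


class XUpdate:
    """Callable wrapper around a parsed XUpdate document (hypothetical)."""

    def __init__(self, source):
        self.modifications = ET.fromstring(source)

    def __call__(self, root):
        for op in self.modifications:
            targets = root.findall(op.get("select"))
            if op.tag == "{%s}update" % XUPDATE_NS:
                # Simplification: replace only the text of each target.
                for target in targets:
                    target.text = op.text
            elif op.tag == "{%s}remove" % XUPDATE_NS:
                # ElementTree has no parent pointers, so map child -> parent.
                parent_of = {c: p for p in root.iter() for c in p}
                for target in targets:
                    parent_of[target].remove(target)
            else:
                raise NotImplementedError(op.tag)
        return root


update = XUpdate(f"""
<xupdate:modifications version="1.0" xmlns:xupdate="{XUPDATE_NS}">
  <xupdate:update select="./entry[@id='1']/name">Bob</xupdate:update>
  <xupdate:remove select="./entry[@id='2']"/>
</xupdate:modifications>
""")

tree = ET.fromstring(
    '<book><entry id="1"><name>Alice</name></entry><entry id="2"/></book>'
)
update(tree)
```

After the call, the first entry's name reads "Bob" and the second entry is gone. A real implementation on lxml would evaluate full XPath in `select` and would not need the parent map.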

On Friday 30 June 2006 06:18, Martijn Faassen wrote:
While I agree XUpdate is indeed badly defined and the definition is old, it's simple and implementable. 4suite implements it, along with other libraries in Java-land.
As long as it is a separate module and does not interfere with anything else, I don't see it as a bad thing to implement (something like etree.xupdate()). But let's keep in mind that it's not the definitive solution to the problem.
Better than explaining it myself, please refer to: http://dev.sleepycat.com/resources/faq_show.html?id=19&back=%3Fproduct_id%3D3%26action%3Dsearch
I think it would be much more interesting to support XQuery (through Xerces?) and XQuery Update when it gets defined...
XQuery is much much harder to implement. libxml2 won't be implementing it any time soon (if ever), and requiring another library to get XQuery going is a serious step. In addition, I don't even think the XQuery standard is final yet, let alone XQuery update. XQuery is also a *lot* more complicated to support than XUpdate.
It is probably easier to get involved in updating XUpdate and cleaning up the definition than to implement XQuery. :)
Anyway, I think something like XUpdate would be nice. Simplicity has its value.
The simplest standard is probably XUpdate. XQuery is more complicated, but as I said there is Xerces implementing it; that would be more of a version 2.0 thing. By the way, I'm not aware of any bindings implementing XQuery, so that would be another lxml exclusive.
That's not to say that it wouldn't be really cool to have XQuery support in lxml too. It's just a big project to implement it, and XQuery tends to make the most sense in an XML database setting, and lxml isn't an XML database.
XQuery is totally database-related, just like XUpdate or XML: it is all about data. And XQuery is very good at querying it. I totally see XQuery as an interesting thing to implement, but I agree that it is something more complicated to be done later.
-- Best Regards, Steve Howe

Stefan Behnel wrote:
Hi Andreas,
Andreas Pakulat wrote:
before trying to code it I'd like to get opinions about the following.
I'd like to add a global xpath function to lxml, that takes an element (the context) and an xpath expression (as string) and returns a list of tuples. Where each tuple has
a) 3 parts, the attribute name, value and containing element
b) 2 parts, the text and the element that has the text as text or tail
c) 1 part, the element itself.
I'm not very comfortable with the idea of inferring the semantics of a tuple result from the number of its entries.
Agreed. Better to return some object that has an API that tells you what you got instead. A bit DOMish, but it would work.
I'd like to have this for XPathEvaluator (so I can highlight attribute and text nodes in the tree) and I'm willing to try and implement it in lxml.
Hmmm, it feels like a rather specific requirement. What you really want is to always receive a reference to the libxml2 node that carries the result of an XPath expression.
The problem is that this does not match very well with the rest of the API, where attributes 'do not exist', for example. I mean, your real problem is that you must evaluate arbitrary (opaque) XPath expressions and do not know what the non-element result of them means: Did a string come from an attribute or from element text? Which element did it come from? So you are interested in meta-data about the XPath result, not the result itself. I do not think many applications have this requirement.
That's true, but if Andreas is willing to do the work and we can agree on a good API, I wouldn't mind having an advanced API in lxml that gives this kind of information. I can imagine other situations, such as an XUpdate implementation, that would need such an API. Regards, Martijn
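One way to read the suggestion of "some object that has an API that tells you what you got" is a small result type with an explicit kind field instead of tuples of varying length. The sketch below is an assumption, not an agreed design: `NodeResult` and `describe` are hypothetical names, and xml.etree.ElementTree stands in for lxml.

```python
from dataclasses import dataclass
from typing import Iterator, Optional
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class NodeResult:
    kind: str                    # "element", "attribute", "text" or "tail"
    element: ET.Element          # the element itself, or the one carrying the value
    name: Optional[str] = None   # attribute name, only for kind == "attribute"
    value: Optional[str] = None  # string value, None for kind == "element"


def describe(element: ET.Element) -> Iterator[NodeResult]:
    """Yield one NodeResult for the element and for each string it carries."""
    yield NodeResult("element", element)
    for name, value in element.attrib.items():
        yield NodeResult("attribute", element, name, value)
    if element.text:
        yield NodeResult("text", element, value=element.text)
    if element.tail:
        yield NodeResult("tail", element, value=element.tail)


root = ET.fromstring('<p class="x">Hello <b>big</b> world</p>')
results = [r for elem in root.iter() for r in describe(elem)]
kinds = [r.kind for r in results]
```

A consumer can then branch on `result.kind` rather than on `len(result)`. In the ElementTree model the trailing " world" is reported as the tail of `<b>`, matching the "text or tail" wording of the original proposal.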

On 28.06.06 20:42:00, Martijn Faassen wrote:
Stefan Behnel wrote:
Andreas Pakulat wrote:
I'd like to have this for XPathEvaluator (so I can highlight attribute and text nodes in the tree) and I'm willing to try and implement it in lxml.
Hmmm, it feels like a rather specific requirement. What you really want is to always receive a reference to the libxml2 node that carries the result of an XPath expression.
The problem is that this does not match very well with the rest of the API, where attributes 'do not exist', for example. I mean, your real problem is that you must evaluate arbitrary (opaque) XPath expressions and do not know what the non-element result of them means: Did a string come from an attribute or from element text? Which element did it come from? So you are interested in meta-data about the XPath result, not the result itself. I do not think many applications have this requirement.
That's true, but if Andreas is willing to do the work and we can agree on a good API, I wouldn't mind having an advanced API in lxml that gives this kind of information.
Well, before you two expect too much from me: I only very recently "touched" the C-Python-Bridging stuff in libxml2 and I surely have no real idea how the pyrex stuff works (even though lxml.pyx doesn't look too complicated). On the other hand, I will have quite some time after my last exam at the end of July, so I have time to learn it...
Andreas
-- Avoid reality at all costs.

Andreas Pakulat wrote:
before you two expect too much from me: I only very recently "touched" the C-Python-Bridging stuff in libxml2 and I surely have no real idea how the pyrex stuff works (even though lxml.pyx doesn't look too complicated).
Sadly, etree.pyx is the easy part already. The XPath and extension function stuff is a bit more complicated...
Stefan

participants (4): Andreas Pakulat, Martijn Faassen, Stefan Behnel, Steve Howe