Re: [lxml-dev] xpath extension functions

Jan. 28, 2005

      Marc-Antoine Parent wrote:
...
...
First of all, thanks Kapil and Marc-Antoine for the discussion about 
xpath extension functions. I hope you don't mind me admitting I'm 
somewhat overwhelmed. I thought a good way to look at it now would be 
from the perspective of the API developers see, and write down some 
concrete use cases.
....
OK. I finally took some time to read this.
...
...
...
...
from lxml import etree
doc = etree.parse('doc.xml')
xpath = XPath(doc)
Overall, it is very close to what I had in mind, except that I would not 
call that class XPath. I understand it wraps XPath functionality, but 
what you describe is very much what a xmlXPathContext does... i.e. bind 
namespaces and extension functions! So I suggest we call the class 
XPathContext.... I read your whole document with 
s/XPath/XPathContext/g;    ;->
It's indeed probably just an xmlXPathContext wrapper.

I don't think we need to bother any developer with talk about a 
'Context'; it's a superfluous term carrying over from libxml2 in my 
mind, just like the API doesn't mention RelaxNG contexts or XSLT 
contexts. Unless a class 'XPath' brings something else to mind and 
confuses you, I want to go with XPath. :)
...
...
...
...
...
xpath.registerNamespaces(namespace_dict)
xpath.registerFunction('foo', f)
I had already written something closer to
...
...
...
xpath.registerFunction(f)
which is the same as
xpath.registerFunction(f, 'f')
The name I see as optional, so it should be the second arg.
How would the name otherwise be deduced? If we make the name 
non-optional, they can be dictionary keys...
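
To make the question concrete, here is a minimal sketch of both registration styles. The class name FunctionRegistry is hypothetical and is not part of lxml; it only assumes that an optional name would be deduced from the function's `__name__` attribute.

```python
# Hypothetical registry, NOT lxml's API: it only illustrates how an
# optional name could be deduced and how a dictionary form would work.
class FunctionRegistry:
    def __init__(self):
        self.functions = {}

    def registerFunction(self, f, name=None):
        # Assumption: fall back to the Python function's own name.
        if name is None:
            name = f.__name__
        self.functions[name] = f

    def registerFunctions(self, function_dict):
        # Dictionary form: the keys are the (mandatory) function names.
        for name, f in function_dict.items():
            self.registerFunction(f, name)


def f(context, arg):
    return arg.upper()


registry = FunctionRegistry()
registry.registerFunction(f)             # registered under 'f'
registry.registerFunction(f, 'shout')    # explicit name
registry.registerFunctions({'loud': f})  # dictionary keys as names
```

One caveat with deduction: a lambda's `__name__` is '<lambda>', which is not a usable XPath function name, so the explicit or dictionary form would still be needed in that case.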
...
...
...
...
...
results = xpath.evaluate('//p')
OK, here you assume relative to the document. The context also allows 
the idea of current node, which is a good thing when you work within an 
extension function.
So I would have a syntax
...
...
...
results = xpath.evaluate('//p', node)
which accepts a given node; by default the context's current location.
I thought I described that later on in my document. :)
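
A rough sketch of an evaluate() with an optional context node could look like the following. It uses the standard library's xml.etree.ElementTree rather than lxml, and the class name XPathSketch is hypothetical. ElementTree only supports a subset of XPath and does not accept absolute paths on elements, so the example uses './/p' instead of '//p'.

```python
import xml.etree.ElementTree as ET


class XPathSketch:
    """Hypothetical stand-in for the XPath class under discussion."""

    def __init__(self, root, namespaces=None):
        self.root = root
        self.namespaces = dict(namespaces or {})

    def evaluate(self, path, node=None):
        # Without an explicit node, evaluate relative to the document root.
        if node is None:
            node = self.root
        return node.findall(path, self.namespaces)


root = ET.fromstring(
    '<doc><sec><p>a</p><p>b</p></sec><sec><p>c</p></sec></doc>'
)
xpath = XPathSketch(root)

all_p = xpath.evaluate('.//p')             # relative to the document
first_sec = root.find('sec')
sec_p = xpath.evaluate('.//p', first_sec)  # relative to a given node
```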
...
On that note, if you want to reuse functions and namespaces, I would 
allow a clone function:
...
...
...
xpathContext2 = xpathContext.cloneWithDoc(doc)
or something like that.
Yeah, a clone() could be doable, and is a reasonable idea, but I won't 
worry about it for now. If initializing the XPath object is as easy as 
putting in a document and two dictionaries (one for namespaces, one for 
functions), I think people can do that themselves. I'm looking to cut 
out the API we really need first.
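
As an illustration of doing without clone(): if the constructor takes a document and two dictionaries, reusing a configuration for another document is just a second constructor call. The class below is a hypothetical sketch (not lxml's API) and uses the standard library's ElementTree only to have documents to pass in.

```python
import xml.etree.ElementTree as ET


class XPathSketch:
    """Hypothetical stand-in for the proposed XPath class."""

    def __init__(self, doc, namespaces=None, functions=None):
        self.doc = doc
        # Copy both dictionaries so each object owns its own bindings.
        self.namespaces = dict(namespaces or {})
        self.functions = dict(functions or {})


def double(context, value):
    return value * 2


# Shared configuration, kept by the caller instead of cloned.
namespaces = {'x': 'http://example.com/ns'}
functions = {'double': double}

doc1 = ET.fromstring('<a/>')
doc2 = ET.fromstring('<b/>')

xpath1 = XPathSketch(doc1, namespaces, functions)
xpath2 = XPathSketch(doc2, namespaces, functions)

# Changing one object's bindings does not affect the other.
xpath2.namespaces['y'] = 'http://example.com/other'
```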
...
An upside would be the following: Assume that we store the (Pyrex) 
XPathContext object in the void* userData in the (C) xmlXPathContext.
Recall that (Python) extension functions receive the (C) 
xmlXPathParserContext (as a Pyrex object of course), and hence can 
access the (C) xmlXPathContext, so we could give them access to the 
(Pyrex) XPathContext.
That way, Kapil can store user data in some object variable in the 
(per-session) clone of the XPathContext; and make all the clones from a 
single one which is configured with namespaces and extensions.
I'm not sure I understand. Why not provide a new object altogether for 
UserData? I don't see why this need be the task of the XPath context. 
Could be a third argument to evaluate(), perhaps?
...
Also, each document gets to reuse an XPathContext, which means that we do 
not have to set one up each time we evaluate an XPath expression.
This is a separate idea from the above user data story, right? I see 
this as an optimization of the .xpath method. It would require a way to 
see the XPath object of the document to be seeded with namespaces 
through a separate API on the document or something. An alternative is 
just to do away with the .xpath method altogether and require people to 
use XPath() directly, which would make it harder for people to make 
performance mistakes. I'm not sure; the convenience it offers right now, 
especially for evaluating in element contexts, is pretty nice.
...
Finally, if we want to use a XPathContext with a different origin from 
within an extension function, I see two solutions:
What is an XPathContext with a different origin, and why would we want 
to do that? You mean to have an extension function do its own XPath 
evaluation? I can see here why cloning would be convenient, as you 
wouldn't need to work out again which extension functions to set up; 
presumably you'd want to use the same set as before.
...
The first is, again, to clone the context:
xpathContext2 = xpathContext.cloneWithNode(node)
(so the target node would be read-only)
The second way could be to save the target node in a local variable 
while calling the extension function, in case it messes the target node. 
(Much less happy about that option.)
What is the target node?
...
Finally, another thing I would add to the xpathContext API is the option 
to declare variables (again as a dictionary) that can be read (only) by 
extension functions. Another way for Kapil to do things... though there 
he only gets a literal or node, not a Python object. Still, a literal 
can also be a key into a thread-global dictionary of session objects.
Having some way to pass along Python objects to extension functions 
would be nice. This is separate from the XPath $variable concept, which 
we'll also need to support.
...
Overall, I think this can fly.
...
Registration of functions could work as a dictionary too:
...
...
...
xpath.registerFunctions(function_dict) # function name : python func
Yes... I do like this, but let us look at the XSLT philosophy, which 
assumes modules, before we do too much that is incompatible... So I am a 
tad less sure here.
I think worrying about XSLT when we get to it would be fine. A 
dictionary of extension functions could be turned into some kind of 
module, right?
...
The way I see it, we should actually be able to associate extension 
functions and elements with a namespace. A
Extension elements are a separate story again, right? Can extension 
functions have a namespace?
...
module allows that in a neat fashion:
a module encapsulates a namespace URI, extension functions and elements, 
and management functions that are called at beginning/end of module setup 
and document transform respectively. (And yes, I have found both of 
those useful!)
Also, modules are very much registered globally. (They are associated to 
a local name at the level of an individual transform, however.)
SO....
One option is to actually reuse the existing libxslt.extensionModule 
class.
I won't allow anything like the libxml2-style APIs near the lxml API. :) 
I'm sure there are useful concepts in there, but I first want to tackle 
XPath. Then we'll look at XSLT. If we keep the XPath API as minimal and 
simple as we can (for the Python developer using lxml), we should be 
able to translate some of those concepts into the XSLT API.

Anyway, I'll skip the XSLT part for now until we've implemented an XPath 
API. I think I'll sit down with your patch sometime soon and try to 
build up the API we've sketched here, at least in basic form. Then 
hopefully you can give feedback on how to tackle the tricky bits, of 
which there are many.

[snip rest of long mail which I'll look at later;
I see there's some XPath stuff mixed in that I need to read too]

Regards,

Martijn

Martijn Faassen