Mailman 3 Fwd: [lxml-dev] xpath extension functions - lxml - The Python XML Toolkit

29 Jan 2005

      ...
...
...
...
...
...
from lxml import etree
doc = etree.parse('doc.xml')
xpath = XPath(doc)
 Overall, it is very close to what I had in mind, except that I would 
not call that class XPath. I understand it wraps XPath functionality, 
but what you describe is very much what a xmlXPathContext does... 
i.e. bind namespaces and extension functions! So I suggest we call 
the class XPathContext.... I read your whole document with 
s/XPath/XPathContext/g;    ;->
It's indeed probably just an xmlXPathContext wrapper.
I don't think we need to bother any developer with talk about a 
'Context'; it's a superfluous term carrying over from libxml2 in my 
mind, just like the API doesn't mention RelaxNG contexts or XSLT 
contexts. Unless a class 'XPath' brings something else to mind and 
confuses you, I want to go with XPath. :)
Well, precisely. An XPath is an XPath, like "/p"; this, on the other 
hand, is an object that will be used to interpret many XPaths. Frankly, 
even if this were not what it is called in the libxml library, XPathContext 
is probably how I would name it... I am not opposed to another name, 
but -1 on XPath, and frankly I cannot think of a name that describes it 
better than XPathContext... (DocumentContext, since it also contains 
namespace aspects? But again, the XPath may introduce namespaces that 
the document did not know about.)
...
...
...
...
...
...
xpath.registerNamespaces(namespace_dict)
xpath.registerFunction('foo', f)
I had already written something closer to
xpath.registerFunction(f)
which is the same as
xpath.registerFunction(f, 'f')
The name I see as optional, so it should be the second arg.
How would the name otherwise be deduced? If we make the name 
non-optional, they can be dictionary keys...
The name of the function, f.__name__. Or, even better, a translation of 
camel-caps to hyphenated conventions which are the norm in the xslt 
world.
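
A minimal sketch of the name deduction suggested above, assuming a 
hypothetical helper xpath_name() (not part of lxml): it uses the explicit 
name when one is given, otherwise f.__name__ translated from camel-caps 
to the hyphenated form.

```python
import re


def xpath_name(func, name=None):
    """Return the name to register func under (hypothetical helper)."""
    if name is not None:
        # An explicit name always wins over the deduced one.
        return name
    # Put a hyphen before each capital that follows a lowercase letter
    # or digit, then lowercase the whole thing: formatNumber -> format-number.
    return re.sub(r'(?<=[a-z0-9])([A-Z])', r'-\1', func.__name__).lower()


def formatNumber(context, value):
    # Example extension function; the signature is only illustrative.
    return str(value)


print(xpath_name(formatNumber))
print(xpath_name(formatNumber, 'fmt'))
```

With this, registerFunction(f) and registerFunction(f, 'f') would indeed 
be equivalent for a function named f.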
...
...
...
...
...
...
results = xpath.evaluate('//p')
 OK, here you assume relative to the document. The context also 
allows the idea of current node, which is a good thing when you work 
within an extension function.
So I would have a syntax
results = xpath.evaluate('//p', node)
which accepts a given node; by default the context's current location.
I thought I described that later on in my document. :)
Oh, sorry. I misinterpreted the "context element" later in your 
document. I thought you meant the context of the application. It's true 
that the word context can be misleading.
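
To make the optional context node concrete, here is a rough sketch. It 
is not the proposed lxml API: it uses the standard library's ElementTree 
(which supports only a subset of XPath) as a stand-in, and the class and 
attribute names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


class XPathContext:
    """Toy stand-in: evaluates paths relative to a current node."""

    def __init__(self, root):
        # By default the current node is the document root.
        self.current = root

    def evaluate(self, path, node=None):
        # Use the given node if there is one, else the context's own.
        start = self.current if node is None else node
        return start.findall(path)


root = ET.fromstring('<doc><p>a</p><div><p>b</p><p>c</p></div></doc>')
context = XPathContext(root)
div = root.find('div')

everywhere = context.evaluate('.//p')       # relative to the document
inside_div = context.evaluate('.//p', div)  # relative to the given node
```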
...
...
xpathContext2 = xpathContext.cloneWithNode(node)
(so the target node would be read-only)
The second way could be to save the target node in a local variable 
while calling the extension function, in case it messes up the target 
node. (Much less happy about that option.)
What is the target node?
That is what I call it ;-)
...
...
Finally, if we want to use an XPathContext with a different origin 
from within an extension function, I see two solutions:
What is an XPathContext with a different origin and why would we 
want to do that? You mean to have an extension function do its own 
XPath evaluation?
Precisely. I have used this in some functions I wrote.
...
I can see here why cloning would be convenient, as you wouldn't need 
to figure out the extension functions anymore to set up; presumably 
you'd want to use the same set as before.
Correct. This was the recursion problem described in an earlier email.
...
...
On that note, if you want to reuse functions and namespaces, I would 
allow a clone function:
...
...
...
xpathContext2 = xpathContext.cloneWithDoc(doc)
or something like that.
Yeah, a clone() could be doable, and is a reasonable idea, but I won't 
worry about it for now. If initializing the XPath object is as easy as 
putting in a document and two dictionaries (one for namespaces, one 
for functions), I think people can do that themselves. I'm looking to 
cut out the API we really need first.
Fair enough, though I think we do need some way to do the clone with a 
different [ origin | target node | context element ] for the reasons 
above.
If you want to keep the API down, a way to do this would be to have 
getters on the namespace, functions _and_modules_ dictionaries, which 
we could use in the constructor:

newContext = XPathContext(doc, oldContext.namespaces, 
oldContext.extensionFunctions, oldContext.extensionModules)

(discussion later on why modules.)
I think this is possible, but heavy; this is why I would prefer a 
convenience function
newContext = oldContext.clone()
or
newContext = XPathContext(context=oldContext)
But that is a matter of preference.
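
A rough sketch of how both convenience forms could sit on one class. 
Everything here is an assumption for illustration: the attribute names, 
the keyword arguments, and the omission of the modules dictionary.

```python
class XPathContext:
    """Toy model of the context: a document plus two dictionaries."""

    def __init__(self, doc=None, namespaces=None, functions=None,
                 context=None):
        if context is not None:
            # Anything not given explicitly is inherited from the old context.
            doc = context.doc if doc is None else doc
            if namespaces is None:
                namespaces = context.namespaces
            if functions is None:
                functions = context.functions
        self.doc = doc
        # Copy the dictionaries so later registrations on one context
        # do not leak into the other.
        self.namespaces = dict(namespaces or {})
        self.functions = dict(functions or {})

    def clone(self, doc=None):
        return XPathContext(doc=doc, context=self)


oldContext = XPathContext('doc-a', {'x': 'urn:x'}, {'f': len})
newContext = oldContext.clone()
otherContext = XPathContext(doc='doc-b', context=oldContext)
```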
...
...
An upside would be the following: Assume that we store the (Pyrex) 
XPathContext object in the void* userData in the (C) xmlXPathContext.
Recall that (Python) extension functions receive the (C) 
xmlXPathParserContext (as a Pyrex object of course), and hence can 
access the (C) xmlXPathContext, so we could give them access to the 
(Pyrex) XPathContext.
That way, Kapil can store user data in some object variable in the 
(per-session) clone of the XPathContext; and make all the clones from 
a single one which is configured with namespaces and extensions.
I'm not sure I understand. Why not provide a new object altogether for 
UserData? I don't see why this need be the task of the XPath context. 
Could be a third argument to evaluate(), perhaps?
See it from the viewpoint of the extension function: All it receives 
(in C) is the xmlXPathContext. I assume that what the python extension 
function would receive would be a Pyrex object that wraps that C 
structure (however we call it!), or a Python object that we would have 
stored in that C structure (using the userData field). The latter 
choice actually risks losing information; the (wrapped) xmlXPathContext 
contains a lot of information that the extension function might need, 
like namespaces. Hence, I assume that further user data, for Kapil's 
need, would have to be accessible from this object that the Python 
function receives, which is still probably the xmlXPathContext wrapper. 
  Hence the above paragraph.
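
In pure Python terms, and leaving the C and Pyrex layers out entirely, 
the idea could look something like this. The names user_data, clone() 
and call() are assumptions made up for the sketch.

```python
class XPathContext:
    """Toy model: the object an extension function would receive."""

    def __init__(self, functions=None, user_data=None):
        self.functions = dict(functions or {})
        self.user_data = user_data

    def clone(self, user_data=None):
        # Same extension functions, new per-session data.
        return XPathContext(self.functions, user_data)

    def call(self, name, *args):
        # The context itself is passed as the first argument, so the
        # function can reach both the registrations and the user data.
        return self.functions[name](self, *args)


def current_user(context):
    return context.user_data['user']


base = XPathContext({'current-user': current_user})
session_a = base.clone({'user': 'alice'})
session_b = base.clone({'user': 'bob'})
```

Each session clone answers from its own data, while the functions are 
configured only once on the base context.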
...
...
Also, each document gets to reuse an XPathContext, which means that we 
do not have to set one up each time we evaluate an XPath.
This is a separate idea from the above user data story, right? I see 
this as an optimization of the .xpath method. It would require a way 
to see the XPath object of the document to be seeded with namespaces 
through a separate API on the document or something.. An alternative 
is just to do away with the .xpath method altogether and require 
people to use XPath() directly, which would make it harder for people 
to make performance mistakes. I'm not sure; the convenience it offers 
right now, especially for evaluating in element contexts, is pretty 
nice.
It is indeed... This is why I was toying with the idea of tying a 
default XPathContext to a document. Still ambivalent about it.
...
...
Finally, another thing I would add to the xpathContext API is the 
option to declare variables (again as a dictionary) that can be read 
(only) by extension functions. Another way for Kapil to do things... 
though there he only gets a literal or node, not a Python object. 
Still, a literal can also be a key into a thread-global dictionary of 
session objects.
Having some way to pass along Python objects to extension functions 
would be nice. This is separate from the XPath $variable concept, 
which we'll also need to support.
Yes, you are right, they are distinct. I was musing here.
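
For what it is worth, the "literal as a key" musing could be sketched 
as below. The registry and the function names are hypothetical; the 
extension function sees only a string and looks the session object up 
itself.

```python
import threading

# Process-wide registry shared by all threads, guarded by a lock.
_sessions = {}
_sessions_lock = threading.Lock()


def register_session(key, session):
    with _sessions_lock:
        _sessions[key] = session


def lookup_session(key):
    with _sessions_lock:
        return _sessions[key]


def session_user(context, session_key):
    # Extension function: session_key arrives as a plain string literal,
    # e.g. the value of an XPath variable set up by the caller.
    return lookup_session(session_key)['user']


register_session('s1', {'user': 'kapil'})
```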
...
...
Overall, I think this can fly.
...
Registration of functions could work as a dictionary too:
...
...
...
xpath.registerFunctions(function_dict)  # function name : python func
Yes... I do like this, but let us look at the XSLT philosophy, which 
assumes modules, before we do too much that is incompatible... So I 
am a tad less sure here.
I think worrying about XSLT when we get to it would be fine.
Sorry, but this is where we really disagree. I also started that way, 
and then realized that it was a way to get this nice API for extension 
functions, and then realized that I'd have to build a completely 
different one for XSLT... Which also involves extension functions, in a 
very different way. This is a way to seriously get into impedance 
mismatch with two incompatible APIs that attempt to do the same thing.
...
A dictionary of extension functions could be turned into some kind of 
module, right?
No. A module involves
a) extension functions
b) extension elements (each of which is strictly speaking a pair of 
functions, but I decided to sweep that under the carpet...)
c) a namespace URI
d) convenience methods for module initialization, for each document 
parsed

More annoying, modules are registered globally, unlike XPath extension 
functions.
...
...
The way I see it, we should actually be able to associate extension 
functions and elements with a namespace. A
Extension elements are a separate story again, right? Can extension 
functions have a namespace?
Yes, very much so. (You will notice I have them in my toy 
implementation.)
...
...
module allows that in a neat fashion:
a module encapsulates a namespace URI, extension functions and 
elements, and management functions that are called at beginning/end of 
module setup and document transform respectively. (And yes, I have 
found both of those useful!)
Also, modules are very much registered globally. (They are associated 
to a local name at the level of an individual transform, however.)
SO....
One option is to actually reuse the existing libxslt.extensionModule 
class.
I won't allow anything like the libxml2-style APIs near the lxml API. 
:) I'm sure there are useful concepts in there, but I first want to 
tackle XPath. Then we'll look at XSLT. If we keep the XPath API as 
minimal and simple as we can (for the Python developer using lxml), we 
should be able to translate some of those concepts into the XSLT API.
Anyway, I'll skip the XSLT part for now until we've implemented an 
XPath API.
Again, sorry to differ, and it is your project, but I hope you will 
reconsider.
Note that I am very open to thinking of other ways to treat the 
modules that are more Pythonic and less close to libxslt; I am 
however quite convinced that tackling the two projects as separate is a 
way to wind up with a two-headed API, or a lot of glue to make it 
look like a single one.

Also, to be honest, though I really admire what has happened so far in 
lxml to simplify the libxml API, I am not so convinced that the xslt 
extension API (i.e. using modules) so desperately needs fixing. But as 
I said, I would love to discuss alternatives if you care to discuss 
them.  (I have not thought of any yet, myself, but I can try.)

I hope you will not be offended by my strong opinions in the matter.

Regards,
Marc-Antoine Parent
