[lxml-dev] Some thoughts re XPath extension functions

OK, I have a first (barely) functional implementation of registerXPathExtensionFunc. (I sent it to Martijn, but it is not ready for check-in until some collective thinking happens.)
For one thing, I currently collect functions globally and register them with the new XPathContext that is created on demand every time the xpath function is called. That operation is unfortunately slow. I think there are reasons why XPathContexts are created on demand: if your extension function calls the xpath method, I think it is necessary to use a new context. (Needs checking.) But then that means re-registering extensions, of course...
I first thought of keeping an XPathContext with the xmlDoc. It would save creation time, for one thing. But it means that I have to guard against the recursion problem above. (Yes, I have done such things; it is a real situation.) It also takes more care to guard against leaks. Do people here think it is worth the trouble?
I also thought that the extension functions should be registered with the document, and not globally. Do people agree this is a good thing? It is more complicated in some ways, but it would allow different documents to have different extension functions registered. Is this useful in real life? I cannot think of a use case. I would definitely like feedback on this issue.
Another alternative would be to make people manipulate the XPathContext explicitly, and provide it (as an optional argument?) when calling XPath functions. I think that is ugly, and again I cannot see use cases for using two distinct sets of functions on a single document. Does anybody disagree?
On that note, extension functions receive an XPathParserContext, which indirectly gives access to a lot of relevant information that an extension function might need. For example, it gives access to the XPathContext, which can give access to the xsltTransformContext, which gives you access to the node being constructed as a result of an XSL transformation... I have built extension functions that use the latter; and of course the XPathContext tells you the node where the XPath is being evaluated. Is there anything else that people here know they might need from any of those contexts in XPath extension functions? (Not to mention XSL extension elements, when I get there...) Because we would have to put methods/accessors on appropriate Python wrappers around all of those, and worry about wrapping and hence releasing them. Sigh. (I am thinking of putting a lot of accessors on the XPathContext only, so we do not have to wrap the other ones in Python and worry about memory management. Objections?)
OK, I am through bombarding people for a while; my next step when I look at this again will be to build a test suite... MAP
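A minimal sketch of the cost described in this message, in plain Python with no libxml2 involved. Everything here (`GLOBAL_FUNCTIONS`, `register_xpath_extension_func`, `XPathContext`, `new_context_for_call`) is a hypothetical stand-in chosen for illustration, not lxml or libxml2 API. It only shows why a fresh context per call implies re-registering every function each time.

```python
# Hypothetical stand-ins: none of these names are lxml or libxml2 API.
GLOBAL_FUNCTIONS = {}  # (namespace_uri, name) -> callable


def register_xpath_extension_func(ns_uri, name, func):
    """Record an extension function in the process-wide registry."""
    GLOBAL_FUNCTIONS[(ns_uri, name)] = func


class XPathContext:
    """Toy context that only tracks which functions it can see."""

    def __init__(self):
        self.functions = {}

    def register_function(self, ns_uri, name, func):
        self.functions[(ns_uri, name)] = func


def new_context_for_call():
    """Build a fresh context and re-register every global function.

    The loop runs once per evaluation, so its cost grows with the
    number of registered functions and is paid on every call.
    """
    ctx = XPathContext()
    for (ns_uri, name), func in GLOBAL_FUNCTIONS.items():
        ctx.register_function(ns_uri, name, func)
    return ctx


register_xpath_extension_func("urn:example", "upper", lambda ctx, s: s.upper())

outer = new_context_for_call()
# A second, independent context, as would be needed if an extension
# function itself evaluated another XPath expression.
inner = new_context_for_call()
```

Caching one context per document would avoid the loop, at the price of the re-entrancy and leak concerns raised above.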

On Jan 20, 2005, at 10:07 PM, Marc-Antoine Parent wrote:
very cool!
dunno.
i've been working on putting together an xsl engine in zope. i originally went with pyana/xalan for this very reason: the ability to have non-global xpath extensions. giving the extension functions access to a zope request context (basically an http request) needed access from a global perspective, which was tricky, as was conditional availability of certain functions based on that context. i've since rewritten the engine (since pyana doesn't allow for returning nodesets from ext functions) to use libxml/libxslt and play lots of thread-local storage games to get access to the context (and manage the global error handlers). anyways, i'd like to see the capability of non-global registration of extension functions, and i think the above is a valid use case, but the lack thereof can be worked around. one abstraction that pyana has that i like a lot is that of a reusable transformer object, where functions and transform aspects can be set and reused against a given set of stylesheet transforms.
that's interesting.. if the xpathcontext is stored on the document, then they wouldn't necessarily need to pass it on method invocation; they could set the xpath context for the document after manipulating it. cheers, -kapil

First, thank you for the feedback. You do indeed have a very valid use case for non-global functions; let me make sure I understand it thoroughly. I understand that the extensions needed access to an application context (the HTTP request context) that varied from document to document; and I suppose that, by applying functions selectively to a transformation, you could introduce local variables in the function that knew about the application context information. Fair...
So I say this invalidates my original approach, i.e. an XPathContext attached to the document, as it is possible you will want to apply the same stylesheet to the same document with a different instance of the extension functions.
I doubt I can optimize around that, but I get the impression that, for the same document and/or stylesheet, the extension functions would always be the same functions, though they might need access to data that varies per call. Is that right?
So maybe if we could somehow define access to a user-data parameter within the extension functions... Maybe from the Python wrapper around the XPathParserContext parameter... But that also complicates the API, which is very much what lxml is working against. Still, it might be easier than exposing XPathContext manipulations in the API. Would you agree that is so?
I also very much have the intuition that different documents should have different sets of extension functions, somehow. So I say that global registration is out. That was only a proof of concept, anyway.
But having a way to package a set of extensions sounds like a very good idea. (Then, the XSLT extension API also allows registering a whole module's worth of extensions. I like that. I started with the basic XPath extension, outside of XSLT, because that is my primary use case.) Marc-Antoine
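The user-data idea floated in this message could look roughly like the following. This is a sketch under stated assumptions: `ParserContextWrapper`, its `user_data` attribute, and the toy `evaluate` helper are all hypothetical names invented here, standing in for a Python wrapper around the XPathParserContext. The point is only that the registered function stays the same while the data it sees varies per call.

```python
# Hypothetical wrapper: not an lxml class. It stands in for a Python
# object wrapped around the XPathParserContext passed to each function.
class ParserContextWrapper:
    def __init__(self, context_node, user_data=None):
        self.context_node = context_node
        self.user_data = user_data


def current_user(pctx):
    """Extension function: the same function every time, data varies per call."""
    return pctx.user_data["user"]


def evaluate(func, context_node, user_data):
    """Toy evaluator: builds the wrapper and invokes one extension function."""
    return func(ParserContextWrapper(context_node, user_data))


first = evaluate(current_user, "<doc/>", {"user": "alice"})
second = evaluate(current_user, "<doc/>", {"user": "bob"})
```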

On Jan 21, 2005, at 7:25 AM, Marc-Antoine Parent wrote:
at the moment, the application context would be constant for a given set of documents, though that's only because i'm reparsing docs as needed between requests and caching for the scope of the request context. based on the app context, an extension manager would install extension functions into an xpath context, which would be used for/from an xsl transform context. currently, as a workaround, all extensions are installed by an extension manager with extension function wrappers that use thread-local storage accessors to manage the app context and pass it directly to the extension function. so the ideal would be getting away from these workarounds and being able to register an extension already bound with the app context into an xpath context. also i forgot to mention it's open source.. the extension management code is here: http://svn.objectrealms.net/view/public/xslmethod/branches/libxsl/extension.py the trunk has the same against pyana, which has far fewer moving parts.
it doesn't necessarily mean that.. first, i'll try and work with whatever is there. second, if the xpath context can be explicitly set against the doc then it could perhaps still be stored doc-local. third, recreating documents between requests is fine. i still need to investigate some of the libxml2/xsl internals in terms of understanding access to the xpath context of an xsl transform context and its interaction with documents. i think explicitly setting the xpath context, with app-context-bound ext functions, on an xsl transform context would be ideal if it's possible; i think it might be possible right now with the libxml2 bindings.
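The two approaches contrasted here, a globally installed wrapper that reads the app context from thread-local storage versus a function bound to its app context before registration, can be sketched with the standard library alone. The function names and the shape of `app_context` are assumptions made for illustration; this is not the code from the repository linked above.

```python
import functools
import threading

_local = threading.local()  # workaround: app context parked per thread


def base_url(app_context, path):
    """The real extension logic; it needs the application context."""
    return app_context["base"] + path


def via_thread_local(path):
    """Globally registered wrapper that fetches the context from thread-local storage."""
    return base_url(_local.app_context, path)


def bind(app_context):
    """Alternative: bind the context up front, then register the result locally."""
    return functools.partial(base_url, app_context)


# Workaround path: the context must be set on the thread before evaluation.
_local.app_context = {"base": "http://example.org"}
workaround_result = via_thread_local("/a")

# Bound path: no hidden state, each context gets its own callable.
bound = bind({"base": "http://example.net"})
bound_result = bound("/a")
```

The bound form only helps if functions can be registered per context, which is the capability being asked for in this message.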
yes.
implementation-wise it might be easier, but as a better overall approach, not really.. just to be clear, you're saying lxml should use a userdata parameter as a workaround for global extension functions needing local context, instead of local functions? ideally, for an easier api, i think a separate abstraction of a transformer to wrap xpath parsing and transform context and functions, with an api for registering local xpath functions, would be a step forward.
i think global registration definitely has its place, there are lots of potential functions which don't care about local contexts, and the easiest mechanism for a user to install them would be globally.
i tried the xslt extension api without much luck, and gave up on it after a few attempts cause i didn't really see any strong benefit to it. regardless, having ways to register whole module/namespaces/sets of functions would be cool. Kapil Thangavelu <hazmat@objectrealms.net> Vision Implemented objectrealms.net <http://www.objectrealms.net>

Yes... Since that last letter, I have become more and more convinced that exposing the XPathContext was not a bad idea after all. We could declare an XPathContext, with extension functions and namespaces that would look like two dictionaries on the XPathContext. That said, I am much less sure after looking at the XSLT extension elements... The XSLT extension API is very different from the libxml XPath extension API, in that it uses global registration of modules. (There is registration against a transform context, but that seems deprecated.) That makes your use case a bit difficult to handle, unless (again) we hide user data in a way that is accessible through the transform context... What bothers me is the inconsistency between both mechanisms. Sigh...
Well... If I provide an API that allows non-global registration, I am not happy about providing another one that allows global registration. Two ways to do something is not so pythonic! But, because of the XSLT API, I am again thinking in terms of global registration and access to user data. Getting the right design is never easy! Marc-Antoine
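The "two dictionaries" shape mentioned in this message might look like the sketch below. The class, its attribute names, and the `lookup` helper are hypothetical and were chosen only to make the idea concrete; no XPath is evaluated.

```python
class XPathContext:
    """Hypothetical context object; not lxml API."""

    def __init__(self, namespaces=None, functions=None):
        self.namespaces = dict(namespaces or {})  # prefix -> namespace URI
        self.functions = dict(functions or {})  # (namespace URI, name) -> callable

    def lookup(self, qname):
        """Resolve 'prefix:name' (or a bare 'name') to a registered function."""
        prefix, _, name = qname.rpartition(":")
        ns_uri = self.namespaces[prefix] if prefix else None
        return self.functions[(ns_uri, name)]


ctx = XPathContext()
ctx.namespaces["ex"] = "urn:example"
ctx.functions[("urn:example", "double")] = lambda value: value * 2

result = ctx.lookup("ex:double")(21)
```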

Marc-Antoine Parent wrote: [snip]
Dropping into the middle of things, and having not followed most of this discussion yet (I haven't had the time!), I was considering exposing a separate XPath object, like the RelaxNG and XSLT objects I have already. The xpath method could then be implemented in terms of this XPath object, and would just be a convenience thing. The XPath object might have an XPathContext inside, and you could indeed register namespaces and functions and so on on it. Sorry if I'm saying something obvious or obviously wrong. :) Regards, Martijn
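A rough sketch of that layering, using the standard library's `xml.etree.ElementTree` as a stand-in because its `findall` accepts a limited XPath subset and a namespace mapping. The `XPath` class and the `xpath` helper are hypothetical names for illustration; ElementTree has no extension-function support, so only the "reusable object plus convenience call" structure is shown.

```python
import xml.etree.ElementTree as ET


class XPath:
    """Reusable expression object that carries its own namespace mapping."""

    def __init__(self, path, namespaces=None):
        self.path = path
        self.namespaces = dict(namespaces or {})

    def __call__(self, element):
        # ElementTree only understands a small XPath subset.
        return element.findall(self.path, self.namespaces)


def xpath(element, path, namespaces=None):
    """Convenience wrapper implemented in terms of the XPath object."""
    return XPath(path, namespaces)(element)


root = ET.fromstring('<r xmlns:x="urn:x"><a/><a/><x:b/></r>')
find_a = XPath("a")
plain_matches = find_a(root)
ns_matches = xpath(root, "x:b", {"x": "urn:x"})
```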

On Jan 23, 2005, at 5:56 AM, Marc-Antoine Parent wrote:
i think there are two distinct use cases here, and for people expecting either one it's a lot easier to have both options catered to in the api. if you have an extension which is meant to be global, reregistering the extension for every xpath context is a bit tedious. the converse (local extensions with a global api) is a much more difficult affair. so, if you're set on providing one, i'd go for the local approach. -kapil

Kapil Thangavelu wrote:
On Jan 23, 2005, at 5:56 AM, Marc-Antoine Parent wrote:
I think we should focus on making local work first. We can always come up with an easy way to have some functions always reregistered for every context we set up, if necessary. I hope actually pulling in the global functions will be as simple as combining a few dictionaries, though, easy enough for a user to just do it themselves perhaps. Regards, Martijn
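"Combining a few dictionaries" could be as small as the following. The registry name, the key shape, and the `functions_for` helper are assumptions for illustration; the sketch also assumes that a local registration should win when both dictionaries define the same key.

```python
# Hypothetical process-wide defaults; keys are (namespace URI, name).
GLOBAL_FUNCTIONS = {
    ("urn:example", "upper"): str.upper,
    ("urn:example", "strip"): str.strip,
}


def functions_for(local_functions):
    """Return the function table for one context: globals first, locals override."""
    merged = dict(GLOBAL_FUNCTIONS)
    merged.update(local_functions)
    return merged


local = {
    ("urn:example", "upper"): str.title,  # overrides the global entry
    ("urn:app", "greet"): lambda name: "hello " + name,
}
table = functions_for(local)
```

Because the merge copies the global dictionary, registering into one context's table never changes what other contexts see.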

participants (3): Kapil Thangavelu, Marc-Antoine Parent, Martijn Faassen