[lxml-dev] XInclude does not support Resolvers?

Hi there, I was looking at the XInclude functionality and noticed that it does not use the 'resolvers'. Is that a missing feature, or can it just not be done at all? -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

Hi, Sidnei da Silva wrote:
It's rather hard to do with the current libxml2. The way it works with parsers is: we create a libxml2 parser context and store a pointer to the Python resolver context in it, which allows us to call the resolvers from the C code when requested. libxml2's XInclude API does not allow us to modify the parser context it uses, so there is currently no way to hand the resolvers over to the lookup function. When lxml's custom lookup function is called from the XInclude code, we just can't figure out whether there are any resolvers to call at that point. We could consider using something like thread contexts to store the resolvers, but that would uglify the way it's currently done, and I don't know if we'd get into trouble in other places. So I currently do not consider it worth the effort. Stefan

On Tue, 2006-11-28 at 12:36 -0200, Sidnei da Silva wrote:
It wouldn't be on the C side, but I think lxml might stick close enough to the ElementTree API to use its Python XInclude implementation (ElementInclude.py), which does support specifying a resolver if you need one. If you don't want to use that directly, it's probably a decent example. There is also a modified version of it that we use with lxml in Deliverance, but it's recursive, I believe. (http://codespeak.net/svn/z3/deliverance/trunk/deliverance/xinclude.py) - Luke
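
As a rough sketch of what that looks like, here is ElementTree's ElementInclude from the Python standard library with a custom loader, which plays the role of the resolver. The `DOCUMENTS` table and `memory_loader` are made up for the example; only `ElementInclude.include()` and its `loader` argument come from the library.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; a real loader might consult a catalog
# or a database instead.
DOCUMENTS = {
    "chapter.xml": "<chapter>Included text</chapter>",
}


def memory_loader(href, parse, encoding=None):
    """Loader with the signature that ElementInclude calls."""
    source = DOCUMENTS[href]
    if parse == "xml":
        return ET.fromstring(source)  # included as an element
    return source                     # parse == "text": included as text


root = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    '</book>'
)

# Replaces every xi:include element in place, asking memory_loader
# for the referenced resource.
ElementInclude.include(root, loader=memory_loader)
print(ET.tostring(root, encoding="unicode"))
```

After the call, the `xi:include` element has been replaced by the `chapter` element that the loader returned.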

Luke Tucker wrote:
It is, right. For a Python-based solution in lxml, I'd personally prefer something based on getiterator("{http://.../xinclude}include"), which should be much faster. Maybe we could even switch to something like that internally in lxml... Stefan
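
A minimal sketch of the iterator-based lookup, using the standard library's ElementTree so that it runs without lxml. It assumes the standard XInclude namespace, http://www.w3.org/2001/XInclude, and uses `iter()`, the current spelling of what the thread calls `getiterator()`. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# The W3C XInclude namespace in ElementTree's "{namespace}tag" notation.
XINCLUDE_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

doc = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="intro.xml"/>'
    '<part><xi:include href="chapter1.xml"/></part>'
    '</book>'
)

# iter() walks the whole subtree in document order and yields only the
# elements with the requested tag, so no hand-written recursion is needed.
hrefs = [el.get("href") for el in doc.iter(XINCLUDE_INCLUDE)]
print(hrefs)
```

The tag filter finds include elements at any depth, which is what lets an implementation avoid walking the tree recursively in Python code.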

Stefan Behnel wrote:
I'm fine with supporting something Python-based in addition to the libxml2 version, but I think the XInclude implementation in libxml2 has the benefit that it's probably fairly complete, and besides, *they*'re maintaining it, not us. :) So, I'm fine with adding our own XInclude support, as long as it's in addition and not a replacement, along the same lines as the way we support ElementTree's 'find' together with our own 'xpath'. Once the remaining buildout issues get taken care of, it'll be a lot easier to work with lxml and newer versions of libxml2. This may enable us to push against libxml2 a bit harder. Regards, Martijn

Hi, Martijn Faassen wrote:
I copied ET's ElementInclude module over to lxml (trunk) and modified it a bit. The related tests in ET's selftest.py pass (with one minor exception), although the serialisations can look a little different (so I had to fix the doctests a little). The implementation is adapted in that it uses Element.getiterator() to find the XInclude elements. I also had to extend lxml's API in order to make the original parser of a document available at the API level. There is now a 'parser' property on _ElementTree that is used by ElementInclude to provide the same parser configuration (including resolvers) as for the source document. It's not tested much, so I'd be glad if others could give it a try. Hope it's useful, Stefan

participants (4): Luke Tucker, Martijn Faassen, Sidnei da Silva, Stefan Behnel