[lxml-dev] Evaluate XSLT directives inside custom XSLT extension elements

Hi,

I have a problem that I couldn't find a way to solve. It would be sad if it's not possible.

This is the code from the XSLT extensions tutorial:

    <xsl:template match="*">
        <my:python-extension>
            <some-content />
        </my:python-extension>
    </xsl:template>

Everything works fine, but I need a bit more:

    <xsl:template match="*">
        <my:python-extension>
            <something><xsl:attribute name="test"><xsl:value-of select="111" /></xsl:attribute></something>
        </my:python-extension>
    </xsl:template>

Let's assume this is the signature of my execute function:

    execute(self, context, self_node, input_node, output_parent)

When the XSLT processor calls it, self_node contains the <something> tag with the <xsl:attribute> and <xsl:value-of> tags inside it. What should I do to ask the XSLT processor to evaluate them? I need self_node to contain <something test="111" />.

And even more. What if my XSLT file looks like this:

    <xsl:template match="*">
        <my:python-extension>
            <something>
                <xsl:attribute name="test"><xsl:value-of select="111" /></xsl:attribute>
                <my:python-extension>
                    <xsl:attribute name="test2"><xsl:value-of select="222" /></xsl:attribute>
                </my:python-extension>
            </something>
        </my:python-extension>
    </xsl:template>

I need the execute function to be called for the deepest <my:python-extension> first, and I need the <xsl:attribute> inside it to be evaluated so that I get <my:python-extension test2="222">... Then the outer <my:python-extension> should be processed with the result of the processed <my:python-extension test2="222"> and all the XSLT instructions inside it.

Is there a way to do this? Or is there some other way to get such capabilities?

Thanks.

--
Marat
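
For context, the tutorial pattern referred to above can be sketched roughly as follows. This is a minimal sketch that assumes lxml's etree.XSLTExtension API; the "testns" namespace, the "ext" element name and the CopyContent class are illustrative choices, not names from the original message. It shows that the children of self_node reach execute() as raw stylesheet nodes, so copying them does not evaluate anything.

```python
from lxml import etree

# Stylesheet with a custom extension element <my:ext> (names are illustrative).
stylesheet = etree.XML('''
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="testns"
    extension-element-prefixes="my">
  <xsl:template match="*">
    <foo><my:ext><some-content/></my:ext></foo>
  </xsl:template>
</xsl:stylesheet>''')


class CopyContent(etree.XSLTExtension):
    def execute(self, context, self_node, input_node, output_parent):
        # self_node is the <my:ext> element inside the stylesheet; its
        # children are copied to the output as-is, without XSLT evaluation.
        output_parent.extend(list(self_node))


transform = etree.XSLT(stylesheet, extensions={("testns", "ext"): CopyContent()})
result = transform(etree.XML("<dummy/>"))
print(str(result))
```

With static content such as <some-content/> this is enough. The question in this thread starts where the content itself contains XSLT instructions.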

Marat Dakota, 08.11.2009 23:28:
It's funny that you found the example above but didn't make it to the text sections right below it. http://codespeak.net/lxml/extensions.html#applying-xsl-templates Stefan

Then it looks like I don't completely understand how it works. Could you please help me?

The XSLT file is:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:my="testns"
        extension-element-prefixes="my">
        <xsl:template match="/">
            <foo>
                <my:ext>
                    <xsl:attribute name="test"><xsl:value-of select="111" /></xsl:attribute>
                    blabla
                </my:ext>
            </foo>
        </xsl:template>
    </xsl:stylesheet>

The function is:

    def execute(self, context, self_node, input_node, output_parent):
        ?????
        print(etree.tostring(deepcopy(self_node)))

What should I put in place of ????? so that self_node gets an attribute named test with "111" as its value? I tried calling self.apply_templates for everything (for self_node, for self_node[0], even for input_node and output_parent). There was no result in any case; the printed result is always the same:

    <my:ext xmlns:my="testns" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:attribute name="test"><xsl:value-of select="111"/></xsl:attribute>
        blabla
    </my:ext>

What am I doing wrong?

Thanks.

--
Marat

On Mon, Nov 9, 2009 at 10:44 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:

Hi, please don't top-post. Marat Dakota, 09.11.2009 10:13:
Ah, yes, attributes. Attributes are mapped to smart strings when passed through Python code.
Remember that self_node is the extension element itself. It will not change during the evaluation, so printing it is uninteresting.
I would guess that you want to call it on "input_node". Calling .apply_templates() should return a string (although I never tested that), which you can then add as a new attribute to the tree. You have to do that manually, though. I would expect the "is_attribute" flag of the smart string to be True in your case (see the XPath docs), which would allow you to distinguish attributes from plain text content. Stefan
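
The "smart string" flags mentioned here can be seen on plain XPath results. The following is a small sketch using lxml's XPath API on a made-up document; whether apply_templates() inside an extension element returns strings with the same flag set is exactly the part described above as untested, so this only illustrates the flags themselves.

```python
from lxml import etree

root = etree.XML('<foo test="111">blabla</foo>')

attr_value = root.xpath("@test")[0]    # smart string for an attribute value
text_value = root.xpath("text()")[0]   # smart string for a text node

for value in (attr_value, text_value):
    # is_attribute distinguishes attribute values from text content;
    # getparent() returns the element the string was read from.
    kind = "attribute" if value.is_attribute else "text"
    print(kind, repr(str(value)), value.getparent().tag)
```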

Thanks for the reply. It looks like it's not working as expected.

    class MyExtElement(etree.XSLTExtension):
        def execute(self, context, self_node, input_node, output_parent):
            results = self.apply_templates(context, input_node)
            print(results)

This code causes:

    Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in 'lxml.etree._callExtensionElement' ignored
    [<Element foo at 1007b0998>]
    [<Element foo at 1007b0940>]
    [<Element foo at 1007b0a48>]
    [<Element foo at 1007b0aa0>]
    [<Element foo at 1007b0af8>]
    ...

and so on for my example.

When I try input_node[0] instead of input_node:

    results = self.apply_templates(context, input_node[0])
    print(results)

results is an empty list.

By the way, this simple code causes a segmentation fault on my OS X, Linux and Windows machines:

    class MyExtElement(etree.XSLTExtension):
        def execute(self, context, self_node, input_node, output_parent):
            print(input_node)

--
Marat

On Tue, Nov 10, 2009 at 11:26 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:

Hi, please, don't top-post. Marat Dakota, 10.11.2009 10:38:
Ok, that's a bug. It shouldn't swallow that exception. It's because it's trying to call Python functions in the exception handler, which fails if the recursion limit is already reached. I'll see if I can make this more robust. But, yes, the above will necessarily lead to infinite recursion. Sorry, didn't think of that. The feature you want is not "apply_templates", as that just mimics the behaviour of xsl:apply-templates in XSLT, i.e. you can't define which template will be applied or which XSL tags will run. That feature is currently not available (apart from creating a new stylesheet from the content of the extension element and applying that to input_node, but that's a clumsy and also incomplete solution).
You didn't show your input document, so I don't know what "input_node" or "input_node[0]" actually are in your case.
No idea why, I'll have to look into that. Stefan
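
To illustrate the point about apply_templates() mirroring xsl:apply-templates: a hedged sketch, assuming lxml's XSLTExtension API, in which templates are applied to the children of input_node rather than to input_node itself. The stylesheet, the "testns" namespace and the ApplyToChildren class are made up for this example. Template selection still happens by match rules, so the extension cannot pick which instructions run, but the recursion ends because the "b" template does not contain the extension element.

```python
from lxml import etree

stylesheet = etree.XML('''
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="testns"
    extension-element-prefixes="my">
  <xsl:template match="a">
    <A><my:ext/></A>
  </xsl:template>
  <xsl:template match="b">
    <B><xsl:value-of select="."/></B>
  </xsl:template>
</xsl:stylesheet>''')


class ApplyToChildren(etree.XSLTExtension):
    def execute(self, context, self_node, input_node, output_parent):
        # Applying templates to input_node itself would match the "a"
        # template again and re-enter this extension without end.
        for child in input_node:
            for item in self.apply_templates(context, child):
                if isinstance(item, str):
                    continue  # this sketch ignores plain text results
                output_parent.append(item)


transform = etree.XSLT(stylesheet, extensions={("testns", "ext"): ApplyToChildren()})
result = transform(etree.XML("<a><b>E</b></a>"))
print(str(result))
```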

On Tue, Nov 10, 2009 at 6:22 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Hi,
please, don't top-post.
Sorry, I didn't know what 'top-post' means until you asked me twice not to do it and I checked Wikipedia.
That's sad. Do you think it's hard to implement this feature? Are there libxml or libxslt limitations that would prevent it? Maybe I could join in and dig a bit? I just don't know where to start. I hope it's possible and not too complicated: I have an elegant solution to my problem, but it needs this feature.
My input document is just: <dummy />
My colleague traced it a bit. He was just very curious whether it's a Python segfault (which he has never seen, hence the curiosity) or an lxml one. He said lxml dies when trying to read the tag property of input_node. -- Marat

Marat Dakota, 10.11.2009 16:58:
All it takes is figuring out how to make libxslt start its evaluation at a given point in the stylesheet document. That's usually more work than looking it up in the docs, although the place to start would likely be here: http://xmlsoft.org/XSLT/html/index.html http://xmlsoft.org/XSLT/html/libxslt-templates.html
You can try to dig into libxslt to find out. I don't currently have the time to implement major new features, but if I get an outline of how this should work, so that I can estimate the amount of work it takes, I may get around to doing it.
In that case, input_node[0] just doesn't exist, so there's nothing to do. Actually, I wonder why that doesn't raise an IndexError...
My guess is that input_node is not an element here but a different kind of XML node. Looks like that case isn't handled in the sources (a seriously blatant omission, although I guess I just didn't expect that this can happen...) Stefan

Stefan Behnel, 10.11.2009 17:20:
Yes, that was the reason. Template matching on "/" makes the *document node* the context node instead of the root node. While this is ok from the POV of XPath/XSLT, it doesn't make sense in lxml.etree. lxml 2.2.5 will no longer crash here (fix is committed) and lxml 2.3 will support other types of context nodes. Stefan
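
The distinction between the document node and the root element can be shown with the standard library's DOM, which models both as separate objects. This is only an illustration of the XPath/XSLT data model using xml.dom.minidom; it is not lxml code, and lxml.etree itself has no element object for the document node.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<dummy/>")   # the document node, which "/" selects
root = doc.documentElement              # the root element, which "/*" selects

# The document node is not an element; it is the parent of the root element.
print(doc.nodeType == Node.DOCUMENT_NODE)
print(root.nodeType == Node.ELEMENT_NODE)
print(root.parentNode is doc)
```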

Maybe I could join
I've dug in. After some time spent figuring things out and reading the libxslt code (the only way to understand what's really happening, because libxslt's documentation is ugly), I've ended up with a solution. The patch is below and it's rather simple. I'm almost sure it needs a few fixes from you, simply because you know much better how to manage elements, memory and so on. But it does the things I really needed. I worked with the source code of lxml 2.2.4.

--------------- PATCH STARTS HERE ---------------
diff -r c8813376f20b -r 0a195f4f7df2 xslt.pxd
--- a/xslt.pxd Tue Dec 29 19:03:23 2009 +0300
+++ b/xslt.pxd Tue Dec 29 19:25:19 2009 +0300
@@ -30,10 +30,13 @@
         xmlNode* node
         xmlDoc* output
         xmlNode* insert
+        xmlNode* inst
         xsltTransformState state
 
     ctypedef struct xsltStackElem
 
+    ctypedef struct xsltTemplate
+
     cdef xsltStylesheet* xsltParseStylesheetDoc(xmlDoc* doc) nogil
     cdef void xsltFreeStylesheet(xsltStylesheet* sheet) nogil
 
@@ -84,6 +87,10 @@
     cdef xsltTransformContext* xsltNewTransformContext(xsltStylesheet* style,
                                                        xmlDoc* doc) nogil
     cdef void xsltFreeTransformContext(xsltTransformContext* context) nogil
+    cdef void xsltApplyOneTemplate(xsltTransformContext* ctxt,
+                                   xmlNode* contextNode, xmlNode* list,
+                                   xsltTemplate* templ,
+                                   xsltStackElem* params) nogil
 
 cdef extern from "libxslt/xsltutils.h":
     cdef int xsltSaveResultToString(char** doc_txt_ptr,
diff -r c8813376f20b -r 0a195f4f7df2 xsltext.pxi
--- a/xsltext.pxi Tue Dec 29 19:03:23 2009 +0300
+++ b/xsltext.pxi Tue Dec 29 19:25:19 2009 +0300
@@ -66,6 +66,30 @@
             tree.xmlFreeNode(c_parent)
         return results
 
+    def evaluate(self, _XSLTContext context not None, _Element output_parent):
+        u"""evaluate(self, context, output_parent)
+
+        Call this method to evaluate XSLT content of extension element.
+
+        Evaluation result will be placed into output_parent element.
+        """
+        cdef xslt.xsltTransformContext* ctxt
+        cdef xmlNode* c_backup
+
+        ctxt = context._xsltCtxt
+        c_backup = ctxt.insert
+
+        # I'm not sure about output_parent's type, maybe it should be some type
+        # of proxy. This needs better knowing man's opinion.
+        # And I'm using output_parent node for adding results instead of
+        # elements list used in apply_templates, that's easier and allows to
+        # use attributes added to extension element with <xsl:attribute>.
+        # And that's exactly the thing I need.
+        ctxt.insert = output_parent._c_node
+        xslt.xsltApplyOneTemplate(ctxt,
+            ctxt.node, ctxt.inst.children, NULL, NULL)
+        ctxt.insert = c_backup
+
 
 cdef _registerXSLTExtensions(xslt.xsltTransformContext* c_ctxt,
                              extension_dict):
--------------- PATCH ENDS HERE ---------------

So, with this patch we can handle the XSLT content of extension elements (including attributes), and we can have extension elements inside extension elements.

For example, I can have an XSLT file that looks like this:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:my="testns"
        extension-element-prefixes="my">
        <xsl:template match="/">
            <foo>
                <my:ext>
                    <xsl:attribute name="test">123</xsl:attribute>
                    <child>
                        <xsl:attribute name="test2">
                            <my:ext>blabla</my:ext>
                        </xsl:attribute>
                    </child>
                </my:ext>
            </foo>
        </xsl:template>
    </xsl:stylesheet>

And the execute method for my:ext looks like this:

    def execute(self, context, self_node, input_node, output_parent):
        tmp = etree.Element('tmp')
        self.evaluate(context, tmp)
        output_parent.append(tmp)

And the result is:

    <?xml version="1.0"?>
    <foo>
        <tmp test="123">
            <child test2="blabla"/>
        </tmp>
    </foo>

I think it's a great feature. Is there any chance it will be included in the next release?

Thanks.

--
Marat
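
For readers on a current lxml release: to the best of my knowledge, the capability proposed in this patch is available in lxml 2.3 and later as XSLTExtension.process_children(), not under the name evaluate() used in the patch. The sketch below assumes that method and an lxml version that provides it; the stylesheet, the "testns" namespace and the EvaluateContent class are illustrative only.

```python
from lxml import etree

stylesheet = etree.XML('''
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="testns"
    extension-element-prefixes="my">
  <xsl:template match="*">
    <foo>
      <my:ext>
        <xsl:attribute name="test"><xsl:value-of select="111"/></xsl:attribute>
        <child/>
      </my:ext>
    </foo>
  </xsl:template>
</xsl:stylesheet>''')


class EvaluateContent(etree.XSLTExtension):
    def execute(self, context, self_node, input_node, output_parent):
        tmp = etree.Element("tmp")
        # Run the XSLT instructions inside <my:ext> and collect the
        # result (attributes included) in the temporary element.
        self.process_children(context, tmp)
        output_parent.append(tmp)


transform = etree.XSLT(stylesheet, extensions={("testns", "ext"): EvaluateContent()})
result = transform(etree.XML("<dummy/>"))
print(str(result))
```

Passing a temporary element as the output parent follows the same idea as the execute method shown in the message above: the evaluated attribute lands on an element the extension controls before it is appended to the real output.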

Hi, Marat Dakota, 29.12.2009 18:13:
Thanks a lot, it looks reasonable at first glance and I'll take a closer look as soon as I get to it. If it works well, it should make it into 2.3. Could you add a couple of tests to src/lxml/tests/test_xslt.py? That would help in making sure that it keeps working as expected even if I find that I need to rework the patch. Also, it's best to send patches as a readable attachment rather than inline. Mail programs tend to reformat text and it's easy to lose empty trailing lines etc. Thanks for pulling this out! Stefan

Thanks a lot, it looks reasonable at first glance and I'll take a closer look as soon as I get to it. If it works well, it should make it into 2.3.
Is there a roadmap date for 2.3 release?
I've added tests, renamed variables to fit your code better, and added the possibility to evaluate the extension element's content directly into an _AppendOnlyElementProxy as well as into an _Element. I'm satisfied with the code now. I wonder what you will say about it.
The patch is attached. Can't wait to see it in trunk :)
Thanks for pulling this out!
And thank you for making such a nice and useful thing! -- Marat

Hi, I wonder if you've noticed my last letter with patch and questions... -- Marat

Marat Dakota, 17.01.2010 21:12:
I wonder if you've noticed my last letter with patch and questions...
Sorry! Yes, I noticed it, but didn't have the time to reply at the time. I haven't looked at it yet, but I definitely will. As I said, the last one looked good already, so I'll see that I get it applied as soon as I get to it. Thanks! Stefan

Hi, Stefan Behnel, 18.01.2010 08:21:
I have committed an extended version of the patch to the trunk. Please review the new API to see if it works for you. https://codespeak.net/viewvc/?view=rev&revision=70799 Stefan

Marat Dakota, 12.01.2010 14:05:
Not yet, no.
Hmm, and did you *run* the tests? The test code actually contains obvious errors (such as non well-formed XML), so I wonder how you tested it at all. After fixing the tests, they even crash on my machine. So, sorry, but this patch isn't in an acceptable state. Could you please open up a ticket on launchpad for this? That would make it easier to track the progress of this patch. Stefan

Participants (2): Marat Dakota, Stefan Behnel