[lxml-dev] xslt transformators

Hi,

Just an interesting thing to think about: I think it would be pythonic to treat XSLT transformations just like other functions, for example:
What do you think?

Andrey Tatarinov wrote:
Looks nice. And it would simply require making XSLT objects callable. Still, there are two functions in XSLT: apply and tostring, so maybe the right API would be something like

    class XSLT:
        def __init__(self): "parse and prepare"
        def __call__(self, to_what): "apply it"
        def __str__(self): "get the result"

    tr55 = XSLT(some_xslt_tree)
    xml = str(tr1(tr55(some_xml_tree)))

Stefan
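The callable-object idea can be illustrated in plain Python. This is only a minimal sketch and not lxml code: `Transform` is a hypothetical stand-in that wraps an ordinary function where a real XSLT class would hold a compiled stylesheet, and strings stand in for trees.

```python
class Transform:
    """Hypothetical stand-in for a callable XSLT object (not lxml's API)."""

    def __init__(self, func):
        # "parse and prepare": a real class would compile a stylesheet here.
        self._func = func

    def __call__(self, doc):
        # "apply it": calling the object runs the transformation.
        return self._func(doc)


# Two toy transformations on strings, standing in for stylesheets.
upper = Transform(str.upper)
wrap = Transform(lambda text: "<out>" + text + "</out>")

# Callable transformations chain like ordinary functions.
result = wrap(upper("hello"))
print(result)  # prints <out>HELLO</out>
```

Because each object is just a callable, chaining two of them needs no extra API.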

A few more notes on this.
I prefer the general feature of passing file names to XSLT, RNG and the others. See my mail about helper functions for that.
A classmethod on XSLT may be more sensible here.

    .>>> tr0 = XSLT(tree)
    .>>> tr1 = XSLT('xslt.file')
    .>>> tr2 = XSLT.fromstring('''<?xml...''')

It's not symmetric that way, but I think it resembles common idioms. Obviously, all other API classes that accept element trees should have the same interface (where it makes sense).

    .>>> data = lxml.etree.parse('data.xml') # maybe .fromfile() ?

fromfile() would then be symmetric, but not ElementTree ...
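The classmethod idiom mentioned above might look roughly like this in plain Python. The `Stylesheet` class and its placeholder "parsing" are hypothetical and exist only to show the shape of an alternative constructor; nothing here is lxml's actual interface.

```python
class Stylesheet:
    """Hypothetical API class illustrating the constructor idiom."""

    def __init__(self, source):
        # Assumption: the main constructor takes an already parsed tree.
        self.source = source

    @classmethod
    def fromstring(cls, text):
        # Alternative constructor: "parse" the text, then delegate to
        # the main constructor. Using cls keeps subclasses working.
        parsed = ("parsed", text)  # placeholder for real XML parsing
        return cls(parsed)


tr0 = Stylesheet(("parsed", "<a/>"))
tr2 = Stylesheet.fromstring("<a/>")
print(tr0.source == tr2.source)  # prints True
```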
Stefan Behnel wrote:
I replied too fast here; this actually doesn't work:
str() is called on the result here, not on the XSLT object. A better way of doing this would be to return a subclass of _ElementTree (_XSLTResultTree?) that would then have a __str__ method. We would just have to split up the ElementTree factory. I think that would be the right thing to do, since it would also prevent calling XSLT.tostring() with a non-XSLT-generated _ElementTree object.

Actually, if we go for __str__, we should even deprecate XSLT.tostring(), as it is much more pythonic to just call str() on the result.

BTW: What about putting another __str__ into _ElementTree and having it return the unindented UTF-8 serialization of the tree?

Stefan
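The result-subclass design can be sketched in plain Python. All names below (`Tree`, `ResultTree`, `Transform`, `serialize`) are hypothetical stand-ins for _ElementTree, the proposed _XSLTResultTree and XSLT; the toy "serialization" just appends a suffix.

```python
class Tree:
    """Hypothetical stand-in for _ElementTree."""

    def __init__(self, text):
        self.text = text


class ResultTree(Tree):
    """Hypothetical stand-in for the proposed _XSLTResultTree."""

    def __init__(self, text, transform):
        super().__init__(text)
        # Remember which transform produced this result.
        self._transform = transform

    def __str__(self):
        # Serialization is guided by the originating transform.
        return self._transform.serialize(self.text)


class Transform:
    """Hypothetical stand-in for a callable XSLT object."""

    def __init__(self, suffix):
        self._suffix = suffix

    def __call__(self, tree):
        # Applying the transform returns a ResultTree, not a plain Tree.
        return ResultTree(tree.text.upper(), self)

    def serialize(self, text):
        return text + self._suffix


tr = Transform("\n")
result = tr(Tree("hello"))
print(repr(str(result)))  # prints 'HELLO\n'
```

Since only a `ResultTree` carries a reference to its transform, str() cannot be applied with stylesheet guidance to a tree that did not come from a transformation.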

Hi there, Stefan Behnel wrote: [snip]
Basically the proposal would be:

* the ElementTree API gains a __str__()
* we have a special subclass of ElementTree for the result returned by an XSLT transformation, which implements its own __str__() that does the XSLT.tostring().

This sounds interesting. I wonder whether the first point, adding a __str__(), would also make sense for ElementTree proper?

Regards,
Martijn

Martijn Faassen wrote:
Here is a patch (against the trunk) that implements the XSLT part. _ElementTree's __str__() is more in the area of Geert's work. I only ran the default test suite on it, so no new functionality tested. @Andrey: you can apply it to run your tests with it.

Stefan

Index: src/lxml/etree.pyx
===================================================================
--- src/lxml/etree.pyx	(Revision 19888)
+++ src/lxml/etree.pyx	(Arbeitskopie)
@@ -294,8 +294,11 @@
     tree.xmlFree(data)
 
 cdef _ElementTree _elementTreeFactory(xmlDoc* c_doc):
+    return _newElementTree(c_doc, _ElementTree)
+
+cdef _ElementTree _newElementTree(xmlDoc* c_doc, object baseclass):
     cdef _ElementTree result
-    result = _ElementTree()
+    result = baseclass()
     result._ns_counter = 0
     result._c_doc = c_doc
     return result
Index: src/lxml/xslt.pxi
===================================================================
--- src/lxml/xslt.pxi	(Revision 19888)
+++ src/lxml/xslt.pxi	(Arbeitskopie)
@@ -56,7 +56,7 @@
         # this cleans up copy of doc as well
         xslt.xsltFreeStylesheet(self._c_style)
 
-    def apply(self, _ElementTree doc, **kw):
+    def __call__(self, _ElementTree doc, **kw):
         cdef xmlDoc* c_result
         cdef char** params
         cdef int i
@@ -88,26 +88,39 @@
             cstd.free(params)
         if c_result is NULL:
             raise XSLTApplyError, "Error applying stylesheet"
-        # XXX should set special flag to indicate this is XSLT result
-        # so that xsltSaveResultTo* functional can be used during
-        # serialize?
-        return _elementTreeFactory(c_result)
+        return _xsltResultTreeFactory(c_result, self)
 
-    def tostring(self, _ElementTree doc):
+    def apply(self, _ElementTree doc, **kw):
+        return self(doc, **kw)
+
+    def tostring(self, _ElementTree result_tree):
         """Save result doc to string using stylesheet as guidance.
         """
+        return str(result_tree)
+
+cdef class _XSLTResultTree(_ElementTree):
+    cdef XSLT _xslt
+    def __str__(self):
         cdef char* s
         cdef int l
         cdef int r
-        r = xslt.xsltSaveResultToString(&s, &l, doc._c_doc, self._c_style)
+        r = xslt.xsltSaveResultToString(&s, &l, self._c_doc,
+                                        self._xslt._c_style)
         if r == -1:
-            raise XSLTSaveError, "Error saving stylesheet result to string"
+            raise XSLTSaveError, "Error saving XSLT result to string"
         if s is NULL:
             return ''
         result = funicode(s)
         tree.xmlFree(s)
         return result
 
+cdef _xsltResultTreeFactory(xmlDoc* c_doc, XSLT xslt):
+    cdef _XSLTResultTree result
+    result = <_XSLTResultTree>_newElementTree(c_doc, _XSLTResultTree)
+    result._xslt = xslt
+    return result
+
+
 
 ################################################################################
 # XPath
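The factory split in the patch follows a general pattern: one shared helper takes the class to instantiate, and thin wrappers pick the class and fill in extra fields. A plain-Python sketch of that pattern follows; the names are hypothetical and only mirror the roles of _newElementTree, _elementTreeFactory and _xsltResultTreeFactory, not the Pyrex code itself.

```python
class Tree:
    """Hypothetical stand-in for _ElementTree."""

    def __init__(self):
        self.doc = None


class ResultTree(Tree):
    """Hypothetical stand-in for _XSLTResultTree."""

    def __init__(self):
        super().__init__()
        self.transform = None


def new_tree(doc, baseclass):
    # Shared helper: instantiate whichever class the caller asks for.
    result = baseclass()
    result.doc = doc
    return result


def tree_factory(doc):
    # Plain trees go through the shared helper unchanged.
    return new_tree(doc, Tree)


def result_tree_factory(doc, transform):
    # Result trees reuse the helper, then attach their transform.
    result = new_tree(doc, ResultTree)
    result.transform = transform
    return result


plain = tree_factory("doc")
produced = result_tree_factory("doc", "stylesheet")
print(type(plain).__name__, type(produced).__name__)
```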

participants (3)
- Andrey Tatarinov
- Martijn Faassen
- Stefan Behnel