[lxml-dev] xslt transformators
Hi, Just an interesting thing to think of: I think that it would be pythonic to treat xslt transformations just as other functions, for example:
    tr1 = lxml.etree.xslt_fromfile('tr1.xsl')
    tr2 = lxml.etree.xslt_fromstring('''<?xml ...
    ... <!-- transformation here -->
    ... ''')
    data = lxml.etree.parse('data.xml') # maybe .fromfile() ?
    data1 = tr1(data)
    data2 = tr2(data)
    data12 = tr2(tr1(data))
What do you think?
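The idea of treating a transformation as an ordinary function can be sketched without any XSLT engine. What follows is only an illustration under stated assumptions, not lxml's API: the `Transform` class and the two toy transformations are hypothetical stand-ins, and the standard library's `xml.etree.ElementTree` takes the place of `lxml.etree`.

```python
import copy
import xml.etree.ElementTree as ET


class Transform:
    """Hypothetical stand-in for a compiled stylesheet: a callable object."""

    def __init__(self, func):
        self._func = func

    def __call__(self, tree):
        # Work on a copy so the input tree is left untouched,
        # the way a real transformation produces a new document.
        result = copy.deepcopy(tree)
        self._func(result.getroot())
        return result


def _uppercase_tags(root):
    # Toy "stylesheet" 1: rename every element to upper case.
    for el in root.iter():
        el.tag = el.tag.upper()


def _drop_text(root):
    # Toy "stylesheet" 2: remove all text content.
    for el in root.iter():
        el.text = None


tr1 = Transform(_uppercase_tags)
tr2 = Transform(_drop_text)

data = ET.ElementTree(ET.fromstring("<doc><item>x</item></doc>"))
data1 = tr1(data)
data2 = tr2(data)
data12 = tr2(tr1(data))  # transformations compose like plain functions

print(ET.tostring(data12.getroot(), encoding="unicode"))
```

Because each call returns a new tree, chaining `tr2(tr1(data))` needs no extra API beyond `__call__`.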
Andrey Tatarinov wrote:
Just an interesting thing to think of:
I think that it would be pythonic to treat xslt transformations just as other functions, for example:
    tr1 = lxml.etree.xslt_fromfile('tr1.xsl')
    tr2 = lxml.etree.xslt_fromstring('''<?xml ...
    ... <!-- transformation here -->
    ... ''')
    data = lxml.etree.parse('data.xml') # maybe .fromfile() ?
    data1 = tr1(data)
    data2 = tr2(data)
    data12 = tr2(tr1(data))
What do you think?
Looks nice. And it would simply require making XSLT objects callable.

Still, there are two functions in XSLT: apply and tostring, so maybe the right API would be something like

    class XSLT:
        def __init__(self, xslt_tree):
            "parse and prepare"
        def __call__(self, to_what):
            "apply it"
        def __str__(self):
            "get the result"

    tr55 = XSLT(some_xslt_tree)
    xml = str(tr1(tr55(some_xml_tree)))

Stefan
I'm interested. I think the right next step would be to write a document/doctest which demonstrates this usage pattern in a few cases. Then if we're all happy, let's implement it. :) Andrey, are you up to writing such a short document? Regards, Martijn
On Mon, 2005-11-14 at 16:44 +0100, Martijn Faassen wrote:
I'm interested. I think the right next step would be to write a document/doctest which demonstrates this usage pattern in a few cases. Then if we're all happy, let's implement it. :) Andrey, are you up to writing such a short document?
Yep, I can do it. I'll send it, say, tomorrow.
Andrey Tatarinov wrote: [snip]
Yep, I can do it. I'll send it, say, tomorrow.
Thanks! Regards, Martijn
A few more notes on this.
Andrey Tatarinov wrote:
I think that it would be pythonic to treat xslt transformations just as other functions, for example:
.>>> tr1 = lxml.etree.xslt_fromfile('tr1.xsl')
I prefer the general feature of passing file names to XSLT, RNG and the others. See my mail about helper functions for that.
.>>> tr2 = lxml.etree.xslt_fromstring('''<?xml ... ... <!-- transformation here --> ... ''')
A classmethod on XSLT may be more sensible here.

.>>> tr0 = XSLT(tree)
.>>> tr1 = XSLT('xslt.file')
.>>> tr2 = XSLT.fromstring('''<?xml...''')

It's not symmetric that way, but I think it resembles common idioms. Obviously, all other API classes that accept element trees should have the same interface (where it makes sense).
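The three construction paths mentioned here (tree, file, string) could look roughly like the following. This is a sketch under assumptions: `Stylesheet` is a hypothetical class name, the standard library's `xml.etree.ElementTree` stands in for lxml, and a file-like object is used in place of a file name so the example runs without touching the file system.

```python
import io
import xml.etree.ElementTree as ET


class Stylesheet:
    """Hypothetical wrapper; only the construction paths are shown."""

    def __init__(self, source):
        # Accept either an already parsed tree or anything ET.parse()
        # understands (a file name or a file-like object).
        if isinstance(source, ET.ElementTree):
            self.tree = source
        else:
            self.tree = ET.parse(source)

    @classmethod
    def fromstring(cls, text):
        # Alternative constructor for stylesheet source held in a string.
        return cls(ET.ElementTree(ET.fromstring(text)))


source = "<stylesheet><template/></stylesheet>"
tr0 = Stylesheet(ET.ElementTree(ET.fromstring(source)))  # from a tree
tr1 = Stylesheet(io.StringIO(source))                    # from a file-like object
tr2 = Stylesheet.fromstring(source)                      # from a string
```

The asymmetry noted above shows up directly: the plain constructor dispatches on the argument type, while string input needs its own named constructor because a string could otherwise be mistaken for a file name.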
.>>> data = lxml.etree.parse('data.xml') # maybe .fromfile() ?
fromfile() would then be symmetric, but not ElementTree ...
.>>> data1 = tr1(data)
.>>> data2 = tr2(data)
.>>> data12 = tr2(tr1(data))
Stefan Behnel wrote:
Looks nice. And it would simply require making XSLT objects callable. Still, there are two functions in XSLT: apply and tostring, so maybe the right API would be something like

    class XSLT:
        def __init__(self, xslt_tree):
            "parse and prepare"
        def __call__(self, to_what):
            "apply it"

I replied too fast here, this actually doesn't work:

        def __str__(self):
            "get the result"

    tr55 = XSLT(some_xslt_tree)
    xml = str(tr1(tr55(some_xml_tree)))

str() is called on the result, not on the XSLT object here.

A better way of doing this would be to return a subclass of _ElementTree (_XSLTResultTree?) that would then have a __str__ method. We would just have to split up the ElementTree factory.

I think that would be the right thing to do, since it would also prevent calling XSLT.tostring() with a non-XSLT-generated _ElementTree object. Actually, if we go for __str__, we should even deprecate XSLT.tostring(), as it is much more pythonic to just call str() on the result.

BTW: What about putting another __str__ into _ElementTree and having it return the unindented UTF-8 serialization of the tree?

Stefan
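The point about str() being called on the result rather than on the transformer can be made concrete with a small pure-Python model. All names here are stand-ins chosen for illustration: `Tree` and `ResultTree` mimic `_ElementTree` and the proposed `_XSLTResultTree`, while `serialize()` and `output_method` are hypothetical and only imitate the stylesheet-guided serialization.

```python
class Tree:
    """Stand-in for _ElementTree."""

    def __init__(self, content):
        self.content = content


class ResultTree(Tree):
    """Stand-in for the proposed _XSLTResultTree."""

    def __init__(self, content, xslt):
        super().__init__(content)
        self._xslt = xslt  # remember which stylesheet produced this tree

    def __str__(self):
        # str() lands here, on the result object, so the result must
        # know its stylesheet in order to serialise itself correctly.
        return self._xslt.serialize(self.content)


class XSLT:
    def __init__(self, output_method="xml"):
        self.output_method = output_method  # hypothetical setting

    def __call__(self, tree):
        return ResultTree(tree.content, self)

    def serialize(self, content):
        # Hypothetical helper imitating stylesheet-guided output.
        if self.output_method == "text":
            return content
        return "<result>%s</result>" % content


to_xml = XSLT("xml")
to_text = XSLT("text")
doc = Tree("hello")

xml = str(to_xml(doc))
text = str(to_text(to_xml(doc)))
```

In the chained call, the outermost transformer decides how the final result is serialised, which is why the reference to the stylesheet has to travel with each result tree.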
Hi there, Stefan Behnel wrote: [snip]
str() is called on the result, not on the XSLT object here.
A better way of doing this would be to return a subclass of _ElementTree (_XSLTResultTree?) that would then have a __str__ method. We would just have to split up the ElementTree factory.
I think that would be the right thing to do, since it would also prevent calling XSLT.tostring() with a non-XSLT-generated _ElementTree object. Actually, if we go for __str__, we should even deprecate XSLT.tostring(), as it is much more pythonic to just call str() on the result.
BTW: What about putting another __str__ into _ElementTree and having it return the unindented UTF-8 serialization of the tree?
Basically the proposal would be:

* the ElementTree API gains a __str__()

* we have a special subclass of ElementTree for the result returned by XSLT transformation, which implements its own __str__() that does the XSLT.tostring().

This sounds interesting. I wonder whether the first point, adding a __str__(), would also make sense for ElementTree proper?

Regards,
Martijn
Martijn Faassen wrote:
* the ElementTree API gains a __str__()
* we have a special subclass of ElementTree for the result returned by XSLT transformation, which implements its own __str__() that does the XSLT.tostring().
This sounds interesting. I wonder whether the first point, adding a __str__(), would also make sense for ElementTree proper?
Here is a patch (against the trunk) that implements the XSLT part. _ElementTree's __str__() is more in the area of Geert's work.

I only ran the default test suite on it, so no new functionality is tested.

@Andrey: you can apply it to run your tests with it.

Stefan

Index: src/lxml/etree.pyx
===================================================================
--- src/lxml/etree.pyx (revision 19888)
+++ src/lxml/etree.pyx (working copy)
@@ -294,8 +294,11 @@
     tree.xmlFree(data)
 
 cdef _ElementTree _elementTreeFactory(xmlDoc* c_doc):
+    return _newElementTree(c_doc, _ElementTree)
+
+cdef _ElementTree _newElementTree(xmlDoc* c_doc, object baseclass):
     cdef _ElementTree result
-    result = _ElementTree()
+    result = baseclass()
     result._ns_counter = 0
     result._c_doc = c_doc
     return result
Index: src/lxml/xslt.pxi
===================================================================
--- src/lxml/xslt.pxi (revision 19888)
+++ src/lxml/xslt.pxi (working copy)
@@ -56,7 +56,7 @@
         # this cleans up copy of doc as well
         xslt.xsltFreeStylesheet(self._c_style)
 
-    def apply(self, _ElementTree doc, **kw):
+    def __call__(self, _ElementTree doc, **kw):
         cdef xmlDoc* c_result
         cdef char** params
         cdef int i
@@ -88,26 +88,39 @@
             cstd.free(params)
         if c_result is NULL:
             raise XSLTApplyError, "Error applying stylesheet"
-        # XXX should set special flag to indicate this is XSLT result
-        # so that xsltSaveResultTo* functional can be used during
-        # serialize?
-        return _elementTreeFactory(c_result)
+        return _xsltResultTreeFactory(c_result, self)
 
-    def tostring(self, _ElementTree doc):
+    def apply(self, _ElementTree doc, **kw):
+        return self(doc, **kw)
+
+    def tostring(self, _ElementTree result_tree):
         """Save result doc to string using stylesheet as guidance.
         """
+        return str(result_tree)
+
+cdef class _XSLTResultTree(_ElementTree):
+    cdef XSLT _xslt
+    def __str__(self):
         cdef char* s
         cdef int l
         cdef int r
-        r = xslt.xsltSaveResultToString(&s, &l, doc._c_doc, self._c_style)
+        r = xslt.xsltSaveResultToString(&s, &l, self._c_doc,
+                                        self._xslt._c_style)
         if r == -1:
-            raise XSLTSaveError, "Error saving stylesheet result to string"
+            raise XSLTSaveError, "Error saving XSLT result to string"
         if s is NULL:
             return ''
         result = funicode(s)
         tree.xmlFree(s)
         return result
 
+cdef _xsltResultTreeFactory(xmlDoc* c_doc, XSLT xslt):
+    cdef _XSLTResultTree result
+    result = <_XSLTResultTree>_newElementTree(c_doc, _XSLTResultTree)
+    result._xslt = xslt
+    return result
+
+
 
 ################################################################################
 # XPath
Stefan Behnel wrote:
Martijn Faassen wrote:
* the ElementTree API gains a __str__()
* we have a special subclass of ElementTree for the result returned by XSLT transformation, which implements its own __str__() that does the XSLT.tostring().
This sounds interesting. I wonder whether the first point, adding a __str__(), would also make sense for ElementTree proper?
Here is a patch (against the trunk) that implements the XSLT part. _ElementTree's __str__() is more in the area of Geert's work.
Cool! I'd like to review them together and then submit __str__() for the baseclass as well. I want to give Fredrik a few days to respond too.

Regards,
Martijn

P.S. I'm going to be off-list for a few days as I'm on a business trip.
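For readers who do not follow Pyrex, the factory split that Stefan's patch performs can be paraphrased in plain Python. This is an illustrative sketch only: the class and function names are simplified analogues of `_ElementTree`, `_XSLTResultTree`, `_newElementTree`, `_elementTreeFactory` and `_xsltResultTreeFactory`, and the `doc` argument is an ordinary Python object standing in for the `xmlDoc*` pointer.

```python
class ElementTree:
    def __init__(self):
        self.ns_counter = 0
        self.doc = None


class XSLTResultTree(ElementTree):
    def __init__(self):
        super().__init__()
        self.xslt = None


def new_element_tree(doc, baseclass):
    # Shared initialisation, parameterised by the class to instantiate.
    result = baseclass()
    result.ns_counter = 0
    result.doc = doc
    return result


def element_tree_factory(doc):
    # The old entry point keeps its signature and delegates.
    return new_element_tree(doc, ElementTree)


def xslt_result_tree_factory(doc, xslt):
    # The result tree additionally records the stylesheet that made it.
    result = new_element_tree(doc, XSLTResultTree)
    result.xslt = xslt
    return result


plain = element_tree_factory("some document")
transformed = xslt_result_tree_factory("some document", "some stylesheet")
```

Existing callers of the plain factory are unaffected, while the XSLT code path gets a subclass instance that can carry its own `__str__`.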
participants (3): Andrey Tatarinov, Martijn Faassen, Stefan Behnel