
Hi Martijn,

Martijn Faassen wrote:
Stefan Behnel wrote:
I noticed that exslt:regexp was not supported by libexslt, so I wrote three extension functions that use Python's re module (which is not really JavaScript compatible as requested by the spec, but who cares...).
I think one might care if one had a stylesheet that uses EXSLT and it then failed to work with lxml because the regex behavior is different?
The API is identical, it just depends on what sort of expressions you use. The normal ().*+ stuff should be the same, also \w and the like. But you'll never find two RE implementations that are completely compatible. So, well, you'll just have to take care if you want to write portable stylesheets. Note that many processors do not even support REs at all and different processors base their support on different libraries (JavaScript or Apache or whatever).
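As a rough illustration of the portability point, here is a small sketch using only Python's re module. The sample string and the variable names are made up for this example. The first pattern sticks to constructs that most RE dialects share; the second uses Python's (?P<name>...) named-group syntax, which JavaScript-style engines generally reject, so a stylesheet relying on it would not be portable.

```python
import re

text = "mail user@example.org now"

# Portable subset: grouping, + and \w behave alike in most RE dialects.
portable = re.compile(r"(\w+)@(\w+)\.org")
m = portable.search(text)
user, host = m.group(1), m.group(2)

# Python-specific syntax: (?P<name>...) named groups.
# Other engines may reject this pattern or spell named groups differently.
python_only = re.compile(r"(?P<user>\w+)@(?P<host>\w+)\.org")
named = python_only.search(text).groupdict()

print(user, host)
print(named)
```

Sticking to the first style is the safer choice for stylesheets that have to run on more than one processor.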
Here's an example:
----------------------------------------
from lxml import etree

# Stylesheet that copies every child whose string value matches '8.'
xslt = etree.XSLT(etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:regexp="http://exslt.org/regular-expressions"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">
    <test><xsl:copy-of select="*[regexp:test(string(.), '8.')]"/></test>
  </xsl:template>
</xsl:stylesheet>
"""))

result = xslt(etree.XML('<a><b>123</b><b>098</b><b>987</b></a>'))
print(str(result))

<test><b>987</b></test>
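To see why only the last element is selected, the same pattern can be tried directly with Python's re module. This is only a sketch and assumes that regexp:test performs an unanchored search, as re.search does, which is consistent with the result shown in the example. The list of values simply repeats the text content of the three elements.

```python
import re

# Text content of the three <b> elements from the example document.
values = ["123", "098", "987"]

# '8.' means: an '8' followed by any one character.
pattern = re.compile("8.")

# "098" ends with the '8', so there is nothing left for '.' to match.
matches = [v for v in values if pattern.search(v)]
print(matches)
```

Only "987" contains an 8 that is followed by another character, so it is the only value that matches.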
Since the test cases worked out perfectly, it's already in the trunk. So, when the regular exslt support gets merged, lxml will have more complete exslt support than libxslt itself. :)
Cool. :)
One thing that I wonder about is potential security issues. Are there ways to break out of the Python regexes and call arbitrary Python code? If not, then we don't need to worry about it. XSLT can be run from fairly unsafe sources, so this may be a concern.
I wouldn't know why there should be any risks. The regexps are just handed to the re.compile function as is, and there shouldn't be any way to break out of the (s)re module. There are no calls to "eval" or anything like it. The EXSLT extensions shouldn't do any harm either.

On the other hand, registering the libxslt "extra" extension functions may be a risk. There is a "debug" element that becomes accessible, and the "output" and "write" elements that can write(!) to files. So maybe we should require some initialization function call to add those extras. I'll just remove the "extra" registration for now.

Also, remember that the document() function can be used to access local XML files. That may already be a risk in some cases.

Stefan