Martijn Faassen wrote:
Stefan Behnel wrote: [snip]
For comparison, I have now implemented the examples from the page as unit tests, which sadly showed that Python's regexps are incompatible with what EXSLT requires. The Python RE "([a-z])+ " does not match "test " the way EXSLT does: re.findall() returns only the last "t" for the group. So we can't claim compatibility with EXSLT at this point. -- Note, though, that I never really said it was compatible; it just builds on Python's re module. I still think that's enough for a Python XML library.
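For readers following along, the group behaviour described above can be reproduced with the standard re module alone. This is only a minimal sketch: the pattern and input are written without the trailing space (an assumption about the intended example; the result is the same with it), and the variable names are purely illustrative.

```python
import re

pattern = re.compile(r"([a-z])+")
text = "test"

# With exactly one group in the pattern, findall() returns the group's
# contents rather than the whole match, and a repeated group keeps only
# its last iteration.
groups = pattern.findall(text)
print(groups)

# The whole match is still available as group 0 on a match object.
m = pattern.match(text)
whole_match = m.group(0)
last_group = m.group(1)
print(whole_match, last_group)

# finditer() is one way to collect whole matches instead of group contents.
whole_matches = [hit.group(0) for hit in pattern.finditer(text)]
print(whole_matches)
```

So the pattern itself does match the full string; it is the repeated capturing group that reports only its final iteration.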
If it's not compatible, I think it should be invoked differently from the EXSLT way. That way, someone dropping in an EXSLT stylesheet with regexes gets not a half-working stylesheet but one that fails completely and clearly: lxml doesn't support the regexes. In addition, the path forward to getting the stylesheet working is clear: use the Python-based and deliberately incompatible regex facility instead, and rewrite the regexes.
Hmmm, I feel invited to disagree here. I reread the EXSLT spec on this topic and it does not contain any RE syntax specification and is rather unclear about what is required for compliance. It says this in the introduction of the RE module:
while in the description of the functions, it mainly uses this wording:
So, the way I read it, the "currently" does not seem to indicate a clear obligation to obey the actual RE syntax used in the spec. The "ease of implementation" in particular calls for a Python 're' implementation in lxml. :)