Hi Dmitri,

Stefan Behnel wrote:
The way XSLT is implemented in lxml is a bit tricky, as libxslt makes it hard to control some libxml2 features that lxml relies on for performance reasons. In particular, lxml uses a thread-local hash table for constant strings, which is much faster than a malloc() for each string that occurs in a document. However, libxslt doesn't honour this dictionary and creates its own, based on the stylesheet dictionary. The result is that the stylesheet can leak into the result document through string references that now point into the hash table of the stylesheet.
There isn't a way in libxslt that would allow us to prevent this or to control the allocation. That's why I decided to restrict the execution of XSL transformations to threads that inherit the same hash table as the stylesheet. This should normally prevent any problems.
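To make the restriction described above concrete, here is a minimal pure-Python model of the check. It is only a sketch and not lxml code: `GLOBAL_DICT`, `get_thread_dict`, `check_thread_dict`, `ToyStylesheet` and `run_in_thread` are hypothetical names, and ordinary Python dicts stand in for the libxml2 string dictionaries.

```python
import threading

# Hypothetical stand-in for the dictionary of the global (main thread) context.
GLOBAL_DICT = {}
_local = threading.local()


def get_thread_dict():
    """Return the calling thread's own string table (assumed model)."""
    if threading.current_thread() is threading.main_thread():
        return GLOBAL_DICT
    if not hasattr(_local, "string_dict"):
        _local.string_dict = {}
    return _local.string_dict


def check_thread_dict(string_dict):
    """Accept the global parent table or the calling thread's own table."""
    if string_dict is GLOBAL_DICT:
        return True  # main thread
    return get_thread_dict() is string_dict  # local thread dict


class ToyStylesheet:
    """Remembers which string table it was built against."""

    def __init__(self, string_dict=None):
        self.string_dict = get_thread_dict() if string_dict is None else string_dict

    def __call__(self, text):
        if not check_thread_dict(self.string_dict):
            raise RuntimeError("stylesheet is not usable in this thread")
        return text.upper()  # placeholder for the real transformation


def run_in_thread(func):
    """Run func in a fresh thread and collect its result or its exception."""
    box = {}

    def target():
        try:
            box["value"] = func()
        except Exception as exc:
            box["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return box
```

In this model, a stylesheet built against the global table can be run from any thread, while one built inside a worker thread raises RuntimeError as soon as a different thread calls it.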
Here is a trivial patch (the one against xslt.pxi) that, instead of raising an exception, copies the stylesheet into the current thread context and thus works around the current thread restrictions. It seems to work for me. Any chance you could give it a try? In case it doesn't work reliably, could you additionally check the second change (in parser.pxi)? It should restrict 'acceptable' hash tables to the local thread's own table, no longer accepting the main thread's table (as it did before).

Stefan

=== src/lxml/xslt.pxi
==================================================================
--- src/lxml/xslt.pxi	(revision 3205)
+++ src/lxml/xslt.pxi	(local)
@@ -373,7 +373,7 @@
         cdef xmlDoc* c_doc
 
         if not _checkThreadDict(self._c_style.doc.dict):
-            raise RuntimeError, "stylesheet is not usable in this thread"
+            return self.__copy__()(_input, profile_run=profile_run, **_kw)
 
         input_doc = _documentOrRaise(_input)
         root_node = _rootNodeOrRaise(_input)
=== src/lxml/parser.pxi
==================================================================
--- src/lxml/parser.pxi	(revision 3205)
+++ src/lxml/parser.pxi	(local)
@@ -132,8 +132,8 @@
     """Check that c_dict is either the local thread dictionary or the global
     parent dictionary.
     """
-    if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict:
-        return 1 # main thread
+    #if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict:
+    #    return 1 # main thread
     if __GLOBAL_PARSER_CONTEXT._getThreadDict(NULL) is c_dict:
         return 1 # local thread dict
     return 0
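For illustration, the combined effect of both changes can be modelled in plain Python. This is a hedged sketch rather than the lxml implementation: `ToyStylesheet`, `get_thread_dict`, `call_in_new_thread` and the `copies_made` counter are hypothetical, and a per-character replacement table stands in for a real stylesheet.

```python
import threading

_local = threading.local()


def get_thread_dict():
    """Return the calling thread's own string table (assumed model)."""
    if not hasattr(_local, "string_dict"):
        _local.string_dict = {}
    return _local.string_dict


class ToyStylesheet:
    copies_made = 0  # counts copies, only to make the fallback observable

    def __init__(self, rules):
        self.rules = dict(rules)
        # Bind the stylesheet to the table of the thread that creates it.
        self.string_dict = get_thread_dict()

    def __copy__(self):
        # Rebuilding the object in the calling thread binds the copy
        # to that thread's table.
        ToyStylesheet.copies_made += 1
        return ToyStylesheet(self.rules)

    def __call__(self, text):
        # Strict check: only the calling thread's own table is acceptable.
        if self.string_dict is not get_thread_dict():
            # Patched behaviour: copy and run the copy instead of raising.
            return self.__copy__()(text)
        return "".join(self.rules.get(ch, ch) for ch in text)


def call_in_new_thread(func, *args):
    """Call func(*args) in a fresh thread and return its result."""
    results = []
    worker = threading.Thread(target=lambda: results.append(func(*args)))
    worker.start()
    worker.join()
    return results[0]
```

Note that in this model every call from a foreign thread pays for a fresh copy, so a caller that transforms many documents in one worker thread would probably want to copy the stylesheet once in that thread and reuse the copy.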