[lxml-dev] XSLT and threading
Hi there,
We've been trying to do XSLT transformations in a multi-threaded (Zope 2) situation. On 1.3.6 this won't work, but 2.0.x is supposed to have support for this. Unfortunately we're getting memory errors from within XSLT.__call__, and we think this is the problem:
    transform_ctxt = xslt.xsltNewTransformContext(self._c_style, c_doc)
    if transform_ctxt is NULL:
        _destroyFakeDoc(input_doc._c_doc, c_doc)
        python.PyErr_NoMemory()
For some reason this sometimes works, sometimes fails; it doesn't always do this, so we suspect perhaps we're already in a copied XSLT sheet (by _copyXSLT) when this happens.
We're also wondering about the threading strategy of 2.1.x; the copyXSLT code is removed. Is there a new strategy? I couldn't find anything about it in CHANGES.txt. I mean, I wouldn't be unhappy with a new strategy as actually re-parsing the stylesheet each time this gets called from a different thread is rather expensive (the stylesheet isn't cached as far as I can see). What's the new strategy, if any?
(Unfortunately 2.1.x also gives us an error, though a different one. I don't have this error handy here.)
We're trying to reduce this to a simpler test case that demonstrates the problem but we're having a hard time so far. Any hints would be welcome.
Regards,
Martijn
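On the cost of re-parsing the stylesheet per thread: one way to avoid it, independent of whatever lxml does internally, is to cache compiled stylesheets in thread-local storage, so that each thread parses a stylesheet once and then reuses it. The following is a minimal sketch, not code from this thread. The helper name get_transform, the cache key and the demo stylesheet are made up for illustration, and the import is guarded because lxml is a third-party package.

```python
import threading

try:
    from lxml import etree  # third-party; assumed to be installed
except ImportError:
    etree = None

# Hypothetical demo stylesheet, only here to make the sketch runnable.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root">
    <out><xsl:value-of select="."/></out>
  </xsl:template>
</xsl:stylesheet>"""

_local = threading.local()


def get_transform(key, stylesheet_bytes):
    """Return this thread's compiled stylesheet for key, parsing it on first use."""
    cache = getattr(_local, "transforms", None)
    if cache is None:
        cache = _local.transforms = {}
    transform = cache.get(key)
    if transform is None:
        # Parsed and compiled in the calling thread, then reused by that thread only.
        transform = cache[key] = etree.XSLT(etree.XML(stylesheet_bytes))
    return transform


if etree is not None:
    first = get_transform("demo", STYLESHEET)
    again = get_transform("demo", STYLESHEET)  # cache hit in the same thread

    other = []
    worker = threading.Thread(
        target=lambda: other.append(get_transform("demo", STYLESHEET)))
    worker.start()
    worker.join()
    print(first is again, first is other[0])
```

Each thread pays the parsing cost once per stylesheet, and no compiled stylesheet is ever executed outside the thread that created it.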
Martijn Faassen wrote:
Hi there,
We've been trying to do XSLT transformations in a multi-threaded (Zope 2) situation. On 1.3.6 this won't work, but 2.0.x is supposed to have support for this. Unfortunately we're getting memory errors from within XSLT.__call__, and we think this is the problem:
    transform_ctxt = xslt.xsltNewTransformContext(self._c_style, c_doc)
    if transform_ctxt is NULL:
        _destroyFakeDoc(input_doc._c_doc, c_doc)
        python.PyErr_NoMemory()
For some reason this sometimes works, sometimes fails; it doesn't always do this, so we suspect perhaps we're already in a copied XSLT sheet (by _copyXSLT) when this happens.
We're also wondering about the threading strategy of 2.1.x; the copyXSLT code is removed. Is there a new strategy? I couldn't find anything about it in CHANGES.txt. I mean, I wouldn't be unhappy with a new strategy as actually re-parsing the stylesheet each time this gets called from a different thread is rather expensive (the stylesheet isn't cached as far as I can see). What's the new strategy, if any?
(Unfortunately 2.1.x also gives us an error, though a different one. I don't have this error handy here.)
We're trying to reduce this to a simpler test case that demonstrates the problem but we're having a hard time so far. Any hints would be welcome.
Important to note: we are using a filename resolver for the URL of an xsl:import in the setup where these errors occur.
-- - eric casteleijn http://infrae.com
Hi, Martijn Faassen wrote:
We've been trying to do XSLT transformations in a multi-threaded (Zope 2) situation. On 1.3.6 this won't work,
Yep, I should really put up a warning somewhere that threading in 1.3.x is buggy in a couple of use cases and should be used with caution. (isn't there one in the FAQ anyway?) 2.1 is much cleaner here, but the latest 2.0 versions should also work in most cases.
Unfortunately we're getting memory errors from within XSLT.__call__, and we think this is the problem:
    transform_ctxt = xslt.xsltNewTransformContext(self._c_style, c_doc)
    if transform_ctxt is NULL:
        _destroyFakeDoc(input_doc._c_doc, c_doc)
        python.PyErr_NoMemory()
For some reason this sometimes works, sometimes fails; it doesn't always do this, so we suspect perhaps we're already in a copied XSLT sheet (by _copyXSLT) when this happens.
I think I'll need a test case to see this. Anyway, the _copyXSLT() is really just a work-around in 2.0 up to 2.0.5, which lacked a reliable way of keeping (sub-)trees within thread boundaries, where the dictionary keeping their tag names is defined. lxml 2.1 does this at the end of XSLT.__call__():
    if not _checkThreadDict(c_result.dict):
        # fix document dictionary
        c_node = _findChildForwards(<xmlNode*>c_result, 0)
        if c_node is not NULL:
            __GLOBAL_PARSER_CONTEXT.initThreadDictRef(&c_result.dict)
            moveNodeToDocument(result_doc, self._c_style.doc, c_node)
I don't remember why 2.0.6 doesn't do this. It shouldn't break anything to backport it - maybe I just left it out at the time because it hasn't been tested very well yet, due to lack of user feedback. That makes the copy work-around appear as a safer choice (especially if you are still running into problems with 2.1beta).
We're also wondering about the threading strategy of 2.1.x; the copyXSLT code is removed. Is there a new strategy?
2.1 detects it when a subtree is merged into a tree from a different thread and migrates the names stored in the thread dictionary over to the target thread. This is a bit more overhead, but you only pay it in multi-threaded environments, where you gain safety in parallel execution.
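To picture the migration described above, here is a toy model in plain Python. It is not lxml or libxml2 code: NameDict, Node and migrate_names are invented names, and the real implementation works on C-level dictionaries and node structures. It only shows the idea that every name in a subtree is re-registered in the dictionary of the thread that receives the subtree.

```python
class NameDict:
    """Toy stand-in for a per-thread name dictionary (not libxml2's xmlDict)."""

    def __init__(self, label):
        self.label = label
        self._names = {}

    def intern(self, name):
        # Return the dictionary's own copy of the name, adding it if missing.
        return self._names.setdefault(name, name)

    def __contains__(self, name):
        return name in self._names


class Node:
    """Minimal tree node whose tag name is registered in a NameDict."""

    def __init__(self, tag, name_dict, children=()):
        self.tag = name_dict.intern(tag)
        self.children = list(children)


def migrate_names(node, target_dict):
    """Re-register every tag name of the subtree in the target dictionary."""
    node.tag = target_dict.intern(node.tag)
    for child in node.children:
        migrate_names(child, target_dict)


def all_tags(node):
    """Yield the tag of the node and of all its descendants."""
    yield node.tag
    for child in node.children:
        yield from all_tags(child)


source = NameDict("thread that built the tree")
target = NameDict("thread that receives the tree")
tree = Node("root", source, [Node("item", source), Node("item", source)])

migrate_names(tree, target)
print(all(tag in target for tag in all_tags(tree)))
```

The walk over the whole subtree is the extra overhead mentioned above; a tree that never crosses a thread boundary never needs it.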
(Unfortunately 2.1.x also gives us an error, though a different one. I don't have this error handy here.)
Luckily, 2.1 is still in beta, so it would be great to sort this out soon. Stefan
Hey Stefan, Yeah, we realize you need a test case. Eric worked quite hard to try to isolate the problem, but no luck yet so far. Hopefully we'll be able to tell you more next week. The situation seems to occur when custom resolvers are involved, and it's conceivable it's not really a threading problem at all. Regards, Martijn P.S. If not, we can just personally say "thank you" at EuroPython - Eric is also going to be there. ;)
Martijn Faassen wrote:
Hey Stefan,
Yeah, we realize you need a test case. Eric worked quite hard to try to isolate the problem, but no luck yet so far. Hopefully we'll be able to tell you more next week. The situation seems to occur when custom resolvers are involved, and it's conceivable it's not really a threading problem at all.
I *finally* succeeded in building a small (semi) reliable test that shows the behavior. In the end, it has nothing to do with custom resolvers or even xsl:imports, which had me barking up quite a few wrong trees. Enclosed is a small python file, which when run, segfaults pretty reliably for me. I don't think it's doing anything special anymore except the threading. If anyone can help with this, I will be very grateful! eric
Hi, eric casteleijn wrote:
Martijn Faassen wrote: I *finally* succeeded in building a small (semi) reliable test that shows the behavior. In the end, it has nothing to do with custom resolvers or even xsl:imports, which had me barking up quite a few wrong trees. Enclosed is a small python file, which when run, segfaults pretty reliably for me.
I don't think it's doing anything special anymore except the threading. If anyone can help with this, I will be very grateful!
Thanks for the test. I get it crashing here, even under valgrind. I'll look into it. Stefan
Hi, eric casteleijn wrote:
Martijn Faassen wrote:
Hey Stefan,
Yeah, we realize you need a test case. Eric worked quite hard to try to isolate the problem, but no luck yet so far. Hopefully we'll be able to tell you more next week. The situation seems to occur when custom resolvers are involved, and it's conceivable it's not really a threading problem at all.
I *finally* succeeded in building a small (semi) reliable test that shows the behavior. In the end, it has nothing to do with custom resolvers or even xsl:imports, which had me barking up quite a few wrong trees. Enclosed is a small python file, which when run, segfaults pretty reliably for me.
I don't think it's doing anything special anymore except the threading. If anyone can help with this, I will be very grateful!
I investigated this a little more. The way lxml now handles passing trees between threads is that it migrates all tag names from the dictionary of the thread that created the tree to the dictionary of the target thread.
It seems that there is more that has to be done here. Your test specifically crashes when trying to free the "xml" special name (or maybe "xmlns", not sure). Breakpointing on xmlDictLookup() showed me that libxml2 also stores (some?) namespace URIs and prefixes in the dictionary, at least during parsing. It doesn't do this when declaring namespaces later on with xmlNewNs(). How's that for consistency...
So the fix is even a bit more involved than I initially thought. It may take a bit to figure out how to get this straight.
Stefan
I investigated this a little more. The way lxml now handles passing trees between threads is that it migrates all tag names from the dictionary of the thread that created the tree to the dictionary of the target thread.
It seems that there is more that has to be done here. Your test specifically crashes when trying to free the "xml" special name (or maybe "xmlns", not sure). Breakpointing on xmlDictLookup() showed me that libxml2 also stores (some?) namespace URIs and prefixes in the dictionary, at least during parsing. It doesn't do this when declaring namespaces later on with xmlNewNs(). How's that for consistency...
Ugh. This would explain why the tests (often) didn't break when I took out the custom namespaces. (But they did sometimes...)
So the fix is even a bit more involved than I initially thought. It may take a bit to figure out how to get this straight.
Ok, well I'm just very glad we (for very 'you' values of 'we') are getting closer, since hunting for it by myself nearly drove me to drink. ;) Let me know if there is anything I can do to help, like test, since my C skills are probably not going to be much use without a lot of dusting off. -- - eric casteleijn http://infrae.com
Hi, eric casteleijn wrote:
well I'm just very glad we (for very 'you' values of 'we') are getting closer, since hunting for it by myself nearly drove me to drink.
Here's a simplified test script that reliably crashes for me each time I run it. The problem appears to be parsing the stylesheet in one thread and then using it in another thread. So one way to work around this (for now) is to parse all stylesheets in the main thread. I'm working on it, but libxslt is really nasty when it comes to tag dictionaries... Stefan
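The workaround suggested here, parsing every stylesheet in the main thread and only applying it in worker threads, could look roughly like the sketch below. It is not the test script from this thread (that attachment is not part of the archive). The stylesheet, the function names and the worker count are made up, and the lxml import is guarded because it is a third-party package.

```python
import threading

try:
    from lxml import etree  # third-party; assumed to be installed
except ImportError:
    etree = None

# Hypothetical demo stylesheet, only here to make the sketch runnable.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root">
    <out><xsl:value-of select="."/></out>
  </xsl:template>
</xsl:stylesheet>"""


def run_workers(transform, count=4):
    """Apply an already compiled stylesheet from several worker threads."""
    results = [None] * count

    def worker(index):
        # Each worker parses its own input document but never parses a stylesheet.
        doc = etree.XML("<root>%d</root>" % index)
        results[index] = str(transform(doc))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if etree is not None:
    # Parse and compile in the main thread, before any worker is started.
    main_thread_transform = etree.XSLT(etree.XML(STYLESHEET))
    outputs = run_workers(main_thread_transform)
    for text in outputs:
        print(text.strip())
```

In a server setup this means compiling the stylesheets at start-up, before the worker threads begin handling requests.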
Hi, quick follow-up here, just for the archives: this is hopefully fixed in lxml 2.0.8 and 2.1.1.
Stefan
Stefan Behnel wrote:
eric casteleijn wrote:
well I'm just very glad we (for very 'you' values of 'we') are getting closer, since hunting for it by myself nearly drove me to drink.
Here's a simplified test script that reliably crashes for me each time I run it. The problem appears to be parsing the stylesheet in one thread and then using it in another thread. So one way to work around this (for now) is to parse all stylesheets in the main thread.
I'm working on it, but libxslt is really nasty when it comes to tag dictionaries...
Stefan
Stefan Behnel <stefan_ml@behnel.de> writes:
Hi,
quick follow-up here, just for the archives: this is hopefully fixed in lxml 2.0.8 and 2.1.1.
Stefan
Yes, I was meaning to confirm: all our testing seems to indicate 2.1.1 solves the problems. Thanks once again for the quick fix and release! -- - eric casteleijn http://infrae.com
Hi, Martijn Faassen wrote:
Unfortunately we're getting memory errors from within XSLT.__call__, and we think this is the problem:
    transform_ctxt = xslt.xsltNewTransformContext(self._c_style, c_doc)
    if transform_ctxt is NULL:
        _destroyFakeDoc(input_doc._c_doc, c_doc)
        python.PyErr_NoMemory()
Does "memory errors" mean you get an exception or a crash? Maybe there are cases in libxslt where xsltNewTransformContext() can return NULL that do not involve a malloc problem and must be handled differently? If you get a crash, is there any chance you can come up with a valgrind trace? Stefan
Hi again, On Fri, Jun 27, 2008 at 8:14 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Martijn Faassen wrote:
Unfortunately we're getting memory errors from within XSLT.__call__, and we think this is the problem:
    transform_ctxt = xslt.xsltNewTransformContext(self._c_style, c_doc)
    if transform_ctxt is NULL:
        _destroyFakeDoc(input_doc._c_doc, c_doc)
        python.PyErr_NoMemory()
Does "memory errors" mean you get an exception or a crash? Maybe there are cases in libxslt where xsltNewTransformContext() can return NULL that do not involve a malloc problem and must be handled differently?
The exception, not a crash. Yes, it looks like that returns NULL for some reason. Possibly the resolvers haven't been set up correctly in the copy and that's why it fails? I think we got a crash though with lxml2., so we can oblige you with both sets of information. :) Unfortunately I don't have a set up here that demonstrates the behavior. Regards, Martijn
Hi, Martijn Faassen wrote:
On Fri, Jun 27, 2008 at 8:14 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Martijn Faassen wrote:
Unfortunately we're getting memory errors from within XSLT.__call__, and we think this is the problem:
    transform_ctxt = xslt.xsltNewTransformContext(self._c_style, c_doc)
    if transform_ctxt is NULL:
        _destroyFakeDoc(input_doc._c_doc, c_doc)
        python.PyErr_NoMemory()
Does "memory errors" mean you get an exception or a crash? Maybe there are cases in libxslt where xsltNewTransformContext() can return NULL that do not involve a malloc problem and must be handled differently?
The exception, not a crash. Yes, it looks like that returns NULL for some reason. Possibly the resolvers haven't been set up correctly in the copy and that's why it fails?
Hmmm, I wouldn't know what impact custom resolvers could have here... Maybe this is really a memory problem after all? Do you have a chance to check the error log in lxml.etree when this happens? libxml2/libxslt often write a message there when something fails. Stefan
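For checking the error log as suggested, a small sketch follows. It relies on the error_log attribute that lxml.etree provides on XSLT objects and on its exceptions. The stylesheet and the helper transform_with_log are made up for illustration, and the xsl:message only serves to put a predictable entry into the log. The import is guarded because lxml is a third-party package.

```python
try:
    from lxml import etree  # third-party; assumed to be installed
except ImportError:
    etree = None

# Hypothetical stylesheet that writes one message while it runs.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:message terminate="no">checkpoint reached</xsl:message>
    <out/>
  </xsl:template>
</xsl:stylesheet>"""


def transform_with_log(transform, doc):
    """Run the transform and return (result, log messages), also on failure."""
    try:
        result = transform(doc)
    except etree.XSLTApplyError as exc:
        # A failed transformation carries its own copy of the error log.
        return None, [entry.message for entry in exc.error_log]
    # After a successful run, the XSLT object holds the log of that run.
    return result, [entry.message for entry in transform.error_log]


if etree is not None:
    transform = etree.XSLT(etree.XML(STYLESHEET))
    result, messages = transform_with_log(transform, etree.XML("<root/>"))
    for message in messages:
        print(message)
```

When a transformation fails with an exception rather than a crash, printing these messages next to the traceback is often the quickest way to see what libxml2 or libxslt complained about.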
participants (4): eric casteleijn, Eric Casteleijn, Martijn Faassen, Stefan Behnel