[lxml-dev] Error with thread

Hello,

I have a problem with an XSLT filter:

Traceback (most recent call last):
  File "/home/pymad/PyMAD/server/macros/freeScan.py", line 247, in doAction
    DataManager().newScanPoint(numPoint, 'NSF')
  File "/home/pymad/PyMAD/server/managers/dataManager.py", line 253, in newScanPoint
    self.__scan.setPoint(numPoint, numPal)
  File "/home/pymad/PyMAD/server/data/scanData.py", line 221, in setPoint
    self.__serialize()
  File "/home/pymad/PyMAD/server/data/scanData.py", line 196, in __serialize
    filter_.dumpScan(self.__tree, self.__scanBaseName)
  File "/home/pymad/PyMAD/server/data/dataFilters.py", line 122, in dumpScan
    result = self.__xslFilter.apply(tree, fileName="'%s'" % baseName)
  File "xslt.pxi", line 450, in etree.XSLT.apply
  File "xslt.pxi", line 356, in etree.XSLT.__call__
RuntimeError: stylesheet is not usable in this thread

This error occurs on one machine but not on another that has the same install (Debian etch, lxml 1.1.1). Any ideas?

-- Frédéric

Hi, Frédéric Mantegazza wrote:
That's a pretty old version, but the general restriction still applies. You cannot use an XSLT object in a different thread if it was not created in the main thread (that's due to some optimisations in libxslt). Try preparing the stylesheets in the main thread. Or, if you do not control the main thread, consider creating them on the fly and maybe caching them in thread local storage. Stefan
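
The thread-local caching suggested above might look roughly like the following sketch. It uses only the standard library; `ThreadLocalCache` and `factory` are hypothetical names chosen for illustration, not part of lxml. In the setting of this thread, the factory would presumably be something like `lambda path: etree.XSLT(etree.parse(path))`, so that each stylesheet is parsed in the thread that later applies it.

```python
import threading


class ThreadLocalCache:
    """Build one object per (thread, key) on first use and reuse it afterwards.

    Hypothetical helper for illustration only; not an lxml API.
    """

    def __init__(self, factory):
        self._factory = factory          # called as factory(key) in the using thread
        self._local = threading.local()  # each thread sees its own attributes

    def get(self, key):
        try:
            cache = self._local.cache
        except AttributeError:
            # First access from this thread: start with an empty per-thread cache.
            cache = self._local.cache = {}
        if key not in cache:
            cache[key] = self._factory(key)
        return cache[key]


# Assumed usage with lxml (not executed here):
#   stylesheets = ThreadLocalCache(lambda path: etree.XSLT(etree.parse(path)))
#   result = stylesheets.get("scan.xsl")(tree)
```

A cache like this only pays off when threads live long enough to reuse what they built; with one short-lived thread per request it degenerates into creating the objects on the fly.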

On Monday 29 October 2007 17:36, Stefan Behnel wrote:
Ok, so I correctly guessed the problem ;o)
Hmm, I can't do that, because: 1) I would have to create the stylesheets every second (I use them to output data during instrument scans, and I have 5 or 6 different stylesheets, used for screen output and for output in different file formats); 2) I can't cache them, as a new thread is created for each new point of the scan (Pyro mechanism). I have to find another solution: adding some new methods and propagating the new params without creating a new stylesheet. Thanks, -- Frédéric

Frédéric Mantegazza wrote:
I think that's the best and most efficient way to do it. Generalise your stylesheets to make them configurable through parameters so that you can instantiate them once and then pass parameters that tell the stylesheet how to behave in this specific thread for the task at hand. With a bit of thinking, you should end up with a small set of different stylesheets (5-6 sounds reasonable to me) and some parameters for them that make them work in all use cases. If you want to specialise the stylesheets even further, consider creating them (partially) programmatically by adapting the XSL tree to your needs (adding xsl:include tags or templates as you see fit), then create the XSLT instances and store them in a read-only dictionary, maybe addressed by a tuple of parameters or whatever you find appropriate. Stefan
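
As a rough illustration of the "read-only dictionary addressed by a tuple of parameters" idea, here is a minimal sketch. `build_registry`, the example keys and the string-producing factory are all assumptions made for the example; in a real application the factory would build the XSLT instances, and the whole registry would be created once, in the main thread, before worker threads start.

```python
from types import MappingProxyType


def build_registry(specs, factory):
    """Create every entry once and expose them through a read-only mapping.

    specs   -- iterable of hashable keys, e.g. (output_kind, file_format) tuples
    factory -- callable that turns one key into a ready-to-use object
    (Hypothetical helper, not part of lxml.)
    """
    return MappingProxyType({key: factory(key) for key in specs})


# Placeholder factory: a real one would return a prepared stylesheet object.
registry = build_registry(
    [("screen", "text"), ("file", "csv")],
    factory=lambda key: "stylesheet for %s/%s" % key,
)
print(registry[("file", "csv")])
```

`MappingProxyType` rejects item assignment, so code in other threads can look stylesheets up but cannot accidentally replace them.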

Frédéric Mantegazza wrote:
I assume you meant "libxml2", the Python wrapper around the libxml2 C library.
Is this optimisation in your C/python code, or in the original C libxslt code?
We are talking about two optimisations here, one in libxslt and one in lxml. libxslt uses a hash table for XML names to avoid re-allocation of memory. AFAIR, the optimisation is that the dict used for the generated document inherits from the dict of the stylesheet document, which is treated as read-only dictionary fallback. The optimisation in lxml is that it uses one dictionary per thread, for all documents that are parsed in that thread. So, if you take a stylesheet that was parsed in one thread (and thus depends on the dictionary of that thread), and use it in a different thread that uses a different dictionary, you end up with a result document that tries to mix entries from different dictionaries and will therefore free some of them although they are still referenced in the dictionary of another thread. A sure way to crash your system. Sadly, the dictionary used by libxslt is not configurable, so all we can do is raise an exception if we detect this problem. I don't really see that as a disadvantage, as it is a fast and safe solution, and you can usually work around the restriction without major hassle. Stefan
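
The ownership check described above can be pictured with a toy model. This is not lxml's or libxslt's actual code: `current_dict` and `ToyStylesheet` are invented for illustration, plain Python dicts stand in for the per-thread name dictionaries, and the model deliberately ignores the special case of stylesheets created in the main thread.

```python
import threading

_state = threading.local()


def current_dict():
    """Return the name dictionary owned by the calling thread (created lazily)."""
    try:
        return _state.names
    except AttributeError:
        _state.names = {}
        return _state.names


class ToyStylesheet:
    """Remembers which thread's dictionary it was built against."""

    def __init__(self):
        self._owner_dict = current_dict()  # dictionary of the creating thread

    def apply(self):
        # Mixing entries from two dictionaries would be unsafe, so refuse early.
        if self._owner_dict is not current_dict():
            raise RuntimeError("stylesheet is not usable in this thread")
        return "ok"
```

Calling `apply()` from the creating thread succeeds, while calling it from another thread raises the error, which mirrors the "detect and raise instead of crashing" behaviour described above.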

On Monday 29 October 2007 17:02, Frédéric Mantegazza wrote:
Ok, I think I found the problem, but not the solution. I'm using lxml in a client/server app based on Pyro. The first time I instantiate all my XSLT filters, I'm in the main thread. Then I can use them in other threads, and it works fine. But I have some commands that create a new session in the DataManager, and I then re-instantiate all XSLT filters (with different params). But at that point I'm not in the main thread anymore! And it seems that if XSLT filters are instantiated in a non-main thread, they can't be used in other threads. Is this analysis correct? Is there a workaround? -- Frédéric

On Tuesday 30 October 2007 08:33, Frédéric Mantegazza wrote:
Sorry, I didn't see the answer from Stefan: replies sent both to the list and to my personal address are delivered only to my personal address, and I missed it among the dozen or so other messages... BTW, why this behaviour? This is the only ML that works like this (I use other Mailman lists, and I administer some)... -- Frédéric

Hi, Frédéric Mantegazza wrote:
If by "params" you mean XSLT parameters, you can pass these at call time; you do not need to reinstantiate the XSLT for that.
Correct. As I said, a work-around would be to either create them on the fly or cache the XSLT objects in thread-local storage and reuse them from there. Stefan
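
Passing XSLT parameters at call time could look like the sketch below. It assumes a reasonably current lxml release (the `etree.XSLT.strparam` helper is newer than the lxml versions discussed in this thread); the stylesheet, the `<scan>` document and the `render` function are made up for the example, with only the `fileName` parameter name borrowed from the traceback above.

```python
from lxml import etree

# Made-up stylesheet: prints the fileName parameter and the number of points.
XSL_SOURCE = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="fileName" select="'default'"/>
  <xsl:template match="/scan">
    <xsl:value-of select="$fileName"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="count(point)"/>
  </xsl:template>
</xsl:stylesheet>
"""

# The XSLT object is instantiated exactly once...
transform = etree.XSLT(etree.XML(XSL_SOURCE))
doc = etree.XML(b"<scan><point/><point/></scan>")


def render(file_name):
    # ...and only the parameter changes from call to call.
    # strparam() wraps the value so it is passed as a string, not as XPath.
    result = transform(doc, fileName=etree.XSLT.strparam(file_name))
    return str(result)


print(render("scan_001"))
print(render("scan_002"))
```

Quoting the value by hand, as in `fileName="'%s'" % baseName`, achieves the same thing as long as the value itself contains no quote characters.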

Hello, Looks like I've got the same problem. Sometimes I get the same error message: 'stylesheet is not usable in this thread'. As far as I understand, that's because of an attempt to use, in one thread, an XSLT object that was created in another thread. Sounds reasonable.
Correct. As I said, a work-around would be to either create them on the fly or cache the XSLT objects in thread-local storage and reuse them from there.
Nice. My application works under the same scheme. I'm using mod_python and several Apache processes started in prefork mode. In every Apache process I'm using a global general object that contains XSLT objects. When a request comes to the next Apache process, my general object is initialized (if that has not been done yet) and is then used inside this thread and this process. I cannot see why one instance of mod_python should conflict with another. Nevertheless, I occasionally get this error message without any idea why. I cannot see any dependency or rule yet. The only solution is to restart Apache. My software is the following: Apache/2.0.61, Python 2.5.1, mod_python-3.3.1, lxml-1.3.4, libxslt-1.1.20, FreeBSD 6.2-20070330-SNAP. Do you have any idea how I can fix this situation, or at least how I can track down the cause? Maybe this is a question for some other mailing lists too? Dmitri

Hi, I had the same problem with mod_python a while ago; it seems mod_python does some trickery with threads in its internals. The only solution I found is setting PythonInterpreter "myoneandonlyinterpreter" in the Apache config of each virtual host while running prefork servers. I couldn't find any other solution besides hoping that the threading problem in lxml will go away sometime. Hans On Monday, 03.12.2007, at 16:17 +0300, Dmitri Fedoruk wrote:

participants (4)
- Dmitri Fedoruk
- Frédéric Mantegazza
- Hans-Jürgen Hay
- Stefan Behnel