[lxml-dev] lxml \ libxslt \ libxml2 leads to apache 2 crash on freebsd/amd64
Greetings!

First of all, happy holidays. It looks like this is not the perfect time to report a problem, but I can't find a solution by myself. More than that, I'm not sure this is an lxml problem. But I really hope somebody will take a look at this long message :)

My application is a web service written in Python. It runs inside mod_python under Apache 2 (precise versions follow). Apache runs in prefork mode. I use XML to collect data from sources, and XSLT to process it and, eventually, to form the HTML output. I use the lxml library for this. XSLT transformations are compiled once for every Apache child process and then run independently in their own thread.

Up to a certain point everything went smoothly and painlessly. Then I decided to upgrade all the software and arrived at the following situation. When I try to serialize an etree object

    ...
    transform = etree.XSLT(xslt_doc)
    result_tree = transform(data, **variables )
    return etree.tostring(result_tree, 'utf-8')
    ( or return str(result_tree) )

Apache dumps core. The point is that it happens during serialization. I tried omitting 'utf-8' and I tried using str() - neither helped. What is strange is that this error occurs with only a few XSL templates, and not all the time. What is even more strange: it happens only on amd64 machines _within_ Apache. On my i386 desktop I can not reproduce this bug. I also can not reproduce it from the console.

Now the configuration details. I tried almost all possible combinations of this software:

    FreeBSD 6.2-20070330-SNAP
    Apache/2.0.61, Apache/2.2.6
    mod_python-3.3.1
    lxml 1.3.5, 1.3.6
    libxml2-2.6.27, libxml2-2.6.30
    libxslt-1.1.20, libxslt-1.1.22

I have to mention that I also use some modules written in C that have Python bindings. We suspected that they could be the cause of the memory corruption. We ran the application without them and still got the same core dump, so they are not the cause.
The error messages are

    httpd in free(): error: chunk is already free
    httpd in free(): error: modified (chunk-) pointer
    httpd in free(): error: pointer to wrong page

And the backtrace:

(gdb) bt
#0 0x00000008013824bc in kill () from /lib/libc.so.6
#1 0x000000080138134d in abort () from /lib/libc.so.6
#2 0x000000080131a265 in _UTF8_init () from /lib/libc.so.6
#3 0x000000080131a29c in _UTF8_init () from /lib/libc.so.6
#4 0x000000080131b23d in _UTF8_init () from /lib/libc.so.6
#5 0x00000008055dc3f9 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#6 0x00000008055dc29d in xmlFreeProp () from /usr/local/lib/libxml2.so.5
#7 0x00000008055dc2dc in xmlFreePropList () from /usr/local/lib/libxml2.so.5
#8 0x00000008055dc4bb in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#9 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#10 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#11 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#12 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#13 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#14 0x00000008055dcad5 in xmlFreeDoc () from /usr/local/lib/libxml2.so.5
#15 0x00000008051a8399 in __pyx_tp_dealloc_5etree__Document () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#16 0x00000008051be15b in __pyx_tp_dealloc_5etree__Element () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#17 0x00000008051a9b8f in __pyx_tp_dealloc_5etree__ElementTree () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#18 0x00000008036cbb4b in _PyFloat_Unpack8 () from /usr/local/lib/libpython2.5.so
#19 0x0000000803723c03 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so
#20 0x00000008037229eb in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#21 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so
#22 0x00000008037229eb in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#23 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#24 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#25 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#26 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#27 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#28 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#29 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so
#30 0x00000008036cd8ae in PyFunction_SetClosure () from /usr/local/lib/libpython2.5.so
#31 0x00000008036b3c73 in PyObject_Call () from /usr/local/lib/libpython2.5.so
#32 0x0000000803721262 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#33 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so
#34 0x00000008037229eb in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#35 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#36 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#37 0x0000000803723326 in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#38 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so
#39 0x00000008037229eb in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.5.so
#40 0x0000000803723c24 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.5.so
#41 0x00000008036cd8ae in PyFunction_SetClosure () from /usr/local/lib/libpython2.5.so
#42 0x00000008036b3c73 in PyObject_Call () from /usr/local/lib/libpython2.5.so
#43 0x00000008036bbd64 in PyMethod_New () from /usr/local/lib/libpython2.5.so
#44 0x00000008036b3c73 in PyObject_Call () from /usr/local/lib/libpython2.5.so
#45 0x00000008036b3d09 in PyObject_Call () from /usr/local/lib/libpython2.5.so
#46 0x00000008036b4024 in PyObject_CallMethod () from /usr/local/lib/libpython2.5.so
#47 0x000000080356acf2 in python_handler () from /usr/local/libexec/apache2/mod_python.so
#48 0x0000000000425e4a in ap_run_handler (r=0x190b090) at config.c:152
#49 0x0000000000426755 in ap_invoke_handler (r=0x190b090) at config.c:364
#50 0x0000000000422550 in ap_process_request (r=0x190b090) at http_request.c:249
#51 0x000000000041bbd7 in ap_process_http_connection (c=0x77f1b0) at http_core.c:253
#52 0x0000000000433c1a in ap_run_process_connection (c=0x77f1b0) at connection.c:43
#53 0x0000000000434075 in ap_process_connection (c=0x77f1b0, csd=0x77f090) at connection.c:176
#54 0x000000000042437f in child_main (child_num_arg=3) at prefork.c:610
#55 0x000000000042450b in make_child (s=0x5a94f8, slot=3) at prefork.c:704
#56 0x00000000004247b0 in perform_idle_server_maintenance (p=0x578028) at prefork.c:839
#57 0x0000000000424c37 in ap_mpm_run (_pconf=0x578028, plog=0x5a4028, s=0x5a94f8) at prefork.c:1040
#58 0x000000000042d27e in main (argc=1, argv=0x7fffffffea08) at main.c:656

The irony of it all is that I still can not downgrade all these packages to the versions that did not dump core... I've already spent a week trying to figure out the gist of the problem, combining different lxml, libxml2/libxslt and Apache versions. I am starting with this mailing list because the fatal call is made from an lxml function. I really hope there is an explanation and a solution for this. I do not want to have to switch to another XSLT processor or, in the worst case, rebuild the architecture of the whole service. I'd be grateful for any ideas and suggestions.

Cheers,
Dmitri
Hi, Dmitri Fedoruk wrote:
First of all, happy holidays. Looks like it's not the perfect time to report a problem, but I can't find any solution by myself. More than that, I'm not sure this is a lxml problem. But I really hope for somebody to take a look at this long message :) ... transform = etree.XSLT(xslt_doc) result_tree = transform(data, **variables ) return etree.tostring(result_tree, 'utf-8') ( or return str(result_tree) )
FreeBSD 6.2-20070330-SNAP Apache/2.0.61, Apache/2.2.6 mod_python-3.3.1 lxml 1.3.5, 1.3.6 libxml2-2.6.27, libxml2-2.6.30 libxslt-1.1.20, libxslt-1.1.22
(gdb) bt
#0 0x00000008013824bc in kill () from /lib/libc.so.6
#1 0x000000080138134d in abort () from /lib/libc.so.6
#2 0x000000080131a265 in _UTF8_init () from /lib/libc.so.6
#3 0x000000080131a29c in _UTF8_init () from /lib/libc.so.6
#4 0x000000080131b23d in _UTF8_init () from /lib/libc.so.6
#5 0x00000008055dc3f9 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#6 0x00000008055dc29d in xmlFreeProp () from /usr/local/lib/libxml2.so.5
#7 0x00000008055dc2dc in xmlFreePropList () from /usr/local/lib/libxml2.so.5
#8 0x00000008055dc4bb in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#9 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#10 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#11 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#12 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#13 0x00000008055dc385 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#14 0x00000008055dcad5 in xmlFreeDoc () from /usr/local/lib/libxml2.so.5
#15 0x00000008051a8399 in __pyx_tp_dealloc_5etree__Document () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#16 0x00000008051be15b in __pyx_tp_dealloc_5etree__Element () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#17 0x00000008051a9b8f in __pyx_tp_dealloc_5etree__ElementTree () from /usr/local/lib/python2.5/site-packages/lxml-1.3.6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
Hmm, this looks like a deallocation problem - and there shouldn't be any left in lxml 1.3.6... Also, the usage you describe sounds perfectly reasonable and shouldn't lead to any problems. For a quick shot: could you try switching to lxml 2.0alpha6 to see if the problem persists? Stefan
Hello, Thank you for your reply!
> For a quick shot: could you try switching to lxml 2.0alpha6 to see if the problem persists?
Hm, switching turned out not to be a piece of cake - in fact, some transformations do not run at all.
1) There is a template that was processed by lxml 1.3.x with no problem, but it can not be processed in 2.0: "Entity 'mdash' not defined, line 67, column 43". All entities are defined in an included file. There are two similar templates that are processed with no problem in both 1.3.x and 2.0.
2) I get something like this: function takes at most 1 positional arguments (2 given). I'll try to figure it out in the morning.
I had hoped that downgrading to lxml 1.3.3[4] could solve my problem - it did not, at least with libxml2 2.6.30. I'll try another libxml2 version tomorrow.
Dmitri
Hi, Dmitri Fedoruk wrote:
>> For a quick shot: could you try switching to lxml 2.0alpha6 to see if the
>> problem persists?
> Hm, switching turned out not to be a piece of cake - in fact, some
> transformations do not run at all.
> 1) There is a template that was processed with lxml 1.3.x with no
> problem, but it can not be processed in 2.0: "Entity 'mdash' not
> defined, line 67, column 43". All entities are defined in an included
> file. There are two similar templates that are processed with no
> problem both in 1.3.x and 2.0
There were changes regarding entities in 2.0. They are now supported as real Element classes rather than requiring the parser to resolve them (which could cause problems in 1.3 if they were not resolved). This change depends on the parser configuration and therefore shouldn't normally affect old code. But as I do not know what exactly you are doing, I can't guess what the impact is in your case.
> 2) I get something like this:
> function takes at most 1 positional arguments (2 given)
This comes from one of the API changes that require keyword-only arguments for optional parameters. You should be able to use keyword arguments here in both 2.0 and 1.3.
Stefan
Hi,
2) I get something like this: function takes at most 1 positional arguments (2 given)
The 2.0 series has moved to keyword-only arguments in some signatures somewhere in its alpha-phase, so it might complain when you use certain positional args in legacy code. This should be fixed rather easily by using keyword arguments instead. Cheers, Holger
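For illustration, a minimal sketch of the change described above. It assumes a current lxml release, in which the optional parameters of etree.tostring() are keyword-only; the sample document is made up, and the exact wording of the TypeError may differ between versions.

```python
from lxml import etree

# Made-up sample document, for illustration only.
root = etree.XML(b"<doc><title>hello</title></doc>")

# Passing the encoding positionally is rejected when the parameter is
# keyword-only; we only record whether the call was accepted.
try:
    etree.tostring(root, "utf-8")
    positional_ok = True
except TypeError:
    positional_ok = False

# Naming the parameter is the portable spelling.
serialized = etree.tostring(root, encoding="utf-8")

print(positional_ok)
print(serialized)
```

Spelling out encoding=... costs nothing in old code and, according to the replies above, is accepted by both the 1.3 and the 2.0 series.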
Hello once again, I've upgraded my code to be lxml 2.0-compatible.
Entity 'hellip' not defined Parsing of the incoming data fails when I have html entities in it.
Literally I have this code:
    xmlParser = etree.XMLParser( no_network = False, resolve_entities = False )
    storedDoc = etree.parse( StringIO.StringIO(reply['data']), xmlParser )
I tried to set resolve_entities = True, which did not help either. The point is that all entities are defined in the files included in the DTD file, and I do not want to validate the data at runtime - I have strict time limitations. It worked fine with 1.3.x without a special parser, just with
    storedDoc = etree.parse( StringIO.StringIO(reply['data']) )
So, is there any chance to deal with entities in my incoming data without validating?
> function takes at most 1 positional arguments (2 given)
That was the very line that leads to the problems; I had to add the 'encoding' keyword:
    return etree.tostring(result_tree, encoding = 'utf-8')
Nevertheless the upgrade did not help.

(gdb) bt
#0 0x00000008011464bc in kill () from /lib/libc.so.6
#1 0x0000000800f5261e in raise () from /lib/libpthread.so.2
#2 0x000000080114534d in abort () from /lib/libc.so.6
#3 0x00000008010de265 in _UTF8_init () from /lib/libc.so.6
#4 0x00000008010de29c in _UTF8_init () from /lib/libc.so.6
#5 0x00000008010df23d in _UTF8_init () from /lib/libc.so.6
#6 0x00000008069d7a19 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#7 0x00000008069d78bd in xmlFreeProp () from /usr/local/lib/libxml2.so.5
#8 0x00000008069d78fc in xmlFreePropList () from /usr/local/lib/libxml2.so.5
#9 0x00000008069d7adb in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#10 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#11 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#12 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#13 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#14 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#15 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#16 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#17 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#18 0x00000008069d79a5 in xmlFreeNodeList () from /usr/local/lib/libxml2.so.5
#19 0x00000008069d80f5 in xmlFreeDoc () from /usr/local/lib/libxml2.so.5
#20 0x000000080656d589 in __pyx_tp_dealloc_4lxml_5etree__Document () from /usr/local/lib/python2.5/site-packages/lxml-2.0alpha6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#21 0x000000080658f48b in __pyx_tp_dealloc_4lxml_5etree__Element () from /usr/local/lib/python2.5/site-packages/lxml-2.0alpha6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#22 0x000000080656eacf in __pyx_tp_dealloc_4lxml_5etree__ElementTree () from /usr/local/lib/python2.5/site-packages/lxml-2.0alpha6-py2.5-freebsd-6.2-20070912-SNAP-amd64.egg/lxml/etree.so
#23 0x0000000804bb7b4b in _PyFloat_Unpack8 () from /usr/local/lib/libpython2.5.so

As I have already said, this happens with only a few particular stylesheets. Could this be a data/stylesheet problem?

Cheers,
Dmitri
Hi, Dmitri Fedoruk wrote:
I've upgraded my code to be lxml 2.0-compatible.
Cool. Hope it wasn't too hard.
Entity 'hellip' not defined Parsing of the incoming data fails when I have html entities in it.
Literally I have this code:
xmlParser = etree.XMLParser( no_network = False, resolve_entities = False ) storedDoc = etree.parse( StringIO.StringIO(reply['data']), xmlParser )
I tried to turn resolve_entities = True, did not help either. The point is that all entities are defined in the files included in the DTD file, and I do not want to validate the data in the runtime - I have strict time limitations.
You can load the DTD without triggering validation by passing "load_dtd = True". I never tested the performance impact, though. The XML parser needs to read the DTD to learn about the entities (that's how it works). If you are dealing with HTML, you can also try the HTMLParser() - it's not only good for fixing HTML, it also knows a lot of HTML specifics.
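A small sketch of the load_dtd suggestion, assuming a current lxml release. The file names and the DTD content are hypothetical and only stand in for the real DTD and data. resolve_entities is passed explicitly because its default has changed between lxml versions.

```python
import os
import tempfile

from lxml import etree

# Hypothetical DTD and document, written to a scratch directory.
DTD = '<!ELEMENT doc (#PCDATA)>\n<!ENTITY hellip "&#8230;">\n'
XML = '<!DOCTYPE doc SYSTEM "entities.dtd">\n<doc>wait&hellip;</doc>\n'

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "entities.dtd"), "w", encoding="utf-8") as f:
    f.write(DTD)
xml_path = os.path.join(workdir, "data.xml")
with open(xml_path, "w", encoding="utf-8") as f:
    f.write(XML)

# load_dtd=True makes the parser read the DTD so that the entity
# declarations are known. dtd_validation keeps its default (False),
# so the document is not validated against the DTD.
parser = etree.XMLParser(load_dtd=True, resolve_entities=True, no_network=True)
tree = etree.parse(xml_path, parser)
text = tree.getroot().text

print(repr(text))
```

The document is parsed from a file path here so that the relative system identifier of the DTD can be resolved; when parsing from a string or a StringIO object, the location of the DTD would have to be made resolvable some other way.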
As I have already said, this happens with only several given stylesheets. May this be the data\stylesheet problem?
Not sure what you mean here. Can you figure out what is different in the stylesheets that fail? Something like "only they call document() to read from other XML files" or "only they (or all of them) use stylesheet-local data" or "they were created at a different place in the code". As a quick fix, did you try changing the mod_python config as proposed in the FAQ? http://codespeak.net/lxml/dev/FAQ.html#my-program-crashes-when-run-with-mod-... Again, no idea about the performance impact here. Stefan
Hi once again,

So, just to be more precise - it is truly a deallocation problem of libxml2 inside of Apache. Here is the code with debugging traces:

    ...
    result = ''
    try:
        result_tree = transform(data, **variables )
        logging.debug('try')
        if isText:
            result = str(result_tree)
        else:
            result = etree.tostring(result_tree, encoding = 'utf-8')
        logging.debug('passed')
    except Exception, exc:
        inLogger.error( exc.__str__() )
        inLogger.error( "xslt error" )
        return ""
    logging.debug('fake object')
    result_tree = '1' # as well as None, etree.Element(), etc
    logging.debug('exiting')
    return result

Here is the log of a normal run:

    Sat, 29 Dec 2007 19:42:38 DEBUG try
    Sat, 29 Dec 2007 19:42:38 DEBUG passed
    Sat, 29 Dec 2007 19:42:38 DEBUG fake object
    Sat, 29 Dec 2007 19:42:38 DEBUG exiting

and here is the log of the crashing call:

    Sat, 29 Dec 2007 19:42:37 DEBUG try
    Sat, 29 Dec 2007 19:42:37 DEBUG passed
    Sat, 29 Dec 2007 19:42:37 DEBUG fake object
    httpd in free(): error: modified (chunk-) pointer

So, it happens when I try to replace my result_tree value, that is, when the old result tree is deallocated. Is it worth reporting this crash to the libxml2 / Apache mailing lists, what would you say?

Cheers,
Dmitri
Hi, Dmitri Fedoruk wrote:
So, just to be more precise - it is truly a deallocation problem of libxml2 inside of Apache. [example code stripped] Is it worth reporting this crash to the libxml2 / Apache mailing lists, what would you say?
I'm sure it's not a problem in libxml2. Since I do not have enough information, I do not know if the following explanation fits here, but I'll give it anyway. The way XSLT is implemented in lxml is a bit tricky, as libxslt makes some things hard to control that lxml uses in libxml2 for performance reasons. In particular, lxml uses a thread-local hash table for constant strings, which is much faster than a malloc() for each string that occurs in a document. However, libxslt doesn't honour this dictionary and creates its own one based on the stylesheet dictionary. The result is that the stylesheet can leak into the result document through string references that now point into the hash table of the stylesheet. There isn't a way in libxslt that would allow us to prevent this or to control the allocation. That's why I decided to restrict the execution of XSL transformations to threads that inherit the same hash table as the stylesheet, this should normally prevent any problems. As I said, this might or might not be the source of this particular problem. Threading is always hard to get right, so maybe there are constellations where the current restrictions are not enough. So far, I'm not aware of any. Redesigning the way XSLT interacts with threads is not a small change and quite risky, so I'd prefer considering that the last resort... Stefan
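One way to stay within the restriction described above is to compile the stylesheet in the thread that uses it. The following is only a sketch under that assumption, not lxml's internal mechanism; the stylesheet and the helper names get_transform() and render() are made up for illustration.

```python
import threading

from lxml import etree

# Made-up stylesheet, for illustration only.
STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/data">
    <out><xsl:value-of select="item"/></out>
  </xsl:template>
</xsl:stylesheet>"""

_local = threading.local()


def get_transform():
    """Return an XSLT object that was compiled in the calling thread."""
    transform = getattr(_local, "transform", None)
    if transform is None:
        transform = etree.XSLT(etree.XML(STYLESHEET))
        _local.transform = transform
    return transform


def render(xml_bytes):
    # Parse, transform and serialize within one and the same thread.
    result_tree = get_transform()(etree.XML(xml_bytes))
    return etree.tostring(result_tree, encoding="utf-8")


results = []


def worker(n):
    results.append(render(b"<data><item>%d</item></data>" % n))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

The cost is one stylesheet compilation per thread instead of one per process, which is usually acceptable for long-lived worker threads.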
Hi Dmitri, Stefan Behnel wrote:
The way XSLT is implemented in lxml is a bit tricky, as libxslt makes some things hard to control that lxml uses in libxml2 for performance reasons. In particular, lxml uses a thread-local hash table for constant strings, which is much faster than a malloc() for each string that occurs in a document. However, libxslt doesn't honour this dictionary and creates its own one based on the stylesheet dictionary. The result is that the stylesheet can leak into the result document through string references that now point into the hash table of the stylesheet.
There isn't a way in libxslt that would allow us to prevent this or to control the allocation. That's why I decided to restrict the execution of XSL transformations to threads that inherit the same hash table as the stylesheet, this should normally prevent any problems.
Here is a trivial patch (the one against xslt.pxi) that, instead of raising an exception, copies the stylesheet into the current thread context, and thus works around the current thread restrictions. It seems to work for me, any chance you could give it a try? In case it doesn't work reliably, could you additionally check the second change (in parser.pxi)? It should restrict 'acceptable' hash tables to the local thread, not including the main thread (as it did before).

Stefan

=== src/lxml/xslt.pxi
==================================================================
--- src/lxml/xslt.pxi (revision 3205)
+++ src/lxml/xslt.pxi (local)
@@ -373,7 +373,7 @@
         cdef xmlDoc* c_doc
 
         if not _checkThreadDict(self._c_style.doc.dict):
-            raise RuntimeError, "stylesheet is not usable in this thread"
+            return self.__copy__()(_input, profile_run=profile_run, **_kw)
 
         input_doc = _documentOrRaise(_input)
         root_node = _rootNodeOrRaise(_input)
=== src/lxml/parser.pxi
==================================================================
--- src/lxml/parser.pxi (revision 3205)
+++ src/lxml/parser.pxi (local)
@@ -132,8 +132,8 @@
     """Check that c_dict is either the local thread dictionary or the
     global parent dictionary.
     """
-    if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict:
-        return 1 # main thread
+    #if __GLOBAL_PARSER_CONTEXT._c_dict is c_dict:
+    #    return 1 # main thread
     if __GLOBAL_PARSER_CONTEXT._getThreadDict(NULL) is c_dict:
         return 1 # local thread dict
     return 0
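The same idea can be tried from user code without patching lxml: copy the compiled stylesheet inside the thread that runs it. This is a sketch that assumes copy.copy() on an XSLT object goes through the __copy__() method the patch calls; the stylesheet and the function name run_with_copy() are made up.

```python
import copy
import threading

from lxml import etree

# Made-up stylesheet with text output, compiled once in the main thread.
transform = etree.XSLT(etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/data"><xsl:value-of select="item"/></xsl:template>
</xsl:stylesheet>"""))


def run_with_copy(xml_bytes, out):
    # The copy is created in the worker thread, so the copied
    # stylesheet belongs to that thread's context.
    local_transform = copy.copy(transform)
    out.append(str(local_transform(etree.XML(xml_bytes))))


out = []
worker = threading.Thread(
    target=run_with_copy, args=(b"<data><item>hello</item></data>", out)
)
worker.start()
worker.join()

print(out)
```

Copying on every call repeats work, so a real service would probably cache the copy per thread.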
Hello,
> Here is a trivial patch (the one against xslt.pxi) It seems to work for me, any chance you could give it a try?
Thank you for the patches, I'll apply them and see what happens next. The thing is that such an exception occurs very seldom and I can not reproduce it.
Nevertheless, coming back to the thread subject. As we have managed to find out, it is indeed a deallocation problem. I've played around with the variable that caused the trouble, tried to make it global, for example - this changed only the position of the crash, but not the reason. When the memory has to be freed, the crash happens.

Unfortunately I managed to reproduce this bug on 3 versions of FreeBSD 6.2 and on the i386 architecture too, which had never happened in 6 months of development. But i386 is capable of running valgrind. So I got these errors:

==77394== Invalid free() / delete / delete[]
==77394==    at 0x3C03867F: free (in /usr/local/lib/valgrind/vgpreload_memcheck.so)
==77394==    by 0x3CF97668: xmlFreeNodeList (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3CF974F0: xmlFreeProp (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3CF9754F: xmlFreePropList (in /usr/X11R6/lib/libxml2.so.5)
==77394== Address 0x3C9C5E8B is 743 bytes inside a block of size 1024 alloc'd
==77394==    at 0x3C038183: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck.so)
==77394==    by 0x3D02B4EE: xmlDictAddString (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3D02BBEB: xmlDictLookup (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3CF80425: xmlDetectSAX2 (in /usr/X11R6/lib/libxml2.so.5)

More than that, these messages were preceded by a bunch of errors from libpython itself:

==77390== Use of uninitialised value of size 4
==77390==    at 0x3C6D2AC9: PyObject_Realloc (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C735BA4: _PyObject_GC_Resize (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C6BE525: PyFrame_New (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C711382: PyEval_EvalFrameEx (in /usr/X11R6/lib/libpython2.5.so)
==77390==
==77390== Invalid read of size 4
==77390==    at 0x3C6D2AAF: PyObject_Realloc (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C735BA4: _PyObject_GC_Resize (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C6BE525: PyFrame_New (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C711382: PyEval_EvalFrameEx (in /usr/X11R6/lib/libpython2.5.so)
==77390== Conditional jump or move depends on uninitialised value(s)
==77390==    at 0x3C6D2AB8: PyObject_Realloc (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C735BA4: _PyObject_GC_Resize (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C6BE525: PyFrame_New (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C711382: PyEval_EvalFrameEx (in /usr/X11R6/lib/libpython2.5.so)

(repeated many times during the Apache thread initialisation).

So, this is not really an lxml problem... But maybe somebody has an idea? Right now I'm thinking of replacing mod_python with mod_fastcgi.

Thanks for your attention so far!
Dmitri
Greetings once more, Update of the previous message: never mind the different PIDs in the valgrind output, I just copied the wrong ones. The error messages from libpython2.5.so in the process with the invalid free/delete are the same as the ones I posted before.
==77394== Invalid free() / delete / delete[] [omitted]
More than that, these messages were preceded by a bunch of errors from libpython itself: ==77390== Use of uninitialised value of size 4 [omitted]
Cheers, Dmitri
Hi, Dmitri Fedoruk wrote:
Here is a trivial patch (the one against xslt.pxi) It seems to work for me, any chance you could give it a try? Thank you for the patches, I'll apply them and see what happens next.
Thanks.
The thing is that such an exception occurs very seldom and I can not reproduce it.
You will still notice if it doesn't work. :)
Nevertheless, coming back to the thread subject. As we have managed to find out, it is indeed a deallocation problem. I've played around with the variable that caused the trouble, tried to make it global, for example - this changed only the position of the crash, but not the reason. When the memory has to be freed, the crash happens.
Unfortunately I managed to reproduce this bug on 3 versions of FreeBSD 6.2 and on the i386 architecture too, which had never happened in 6 months of development.
It shouldn't be machine dependent. If it's there, it's in the code. Garbage collection and threading might work different on different architectures, but that won't remove the actual problem.
But i386 is capable of running valgrind. So I got these errors:
==77394== Invalid free() / delete / delete[]
==77394==    at 0x3C03867F: free (in /usr/local/lib/valgrind/vgpreload_memcheck.so)
==77394==    by 0x3CF97668: xmlFreeNodeList (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3CF974F0: xmlFreeProp (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3CF9754F: xmlFreePropList (in /usr/X11R6/lib/libxml2.so.5)
==77394== Address 0x3C9C5E8B is 743 bytes inside a block of size 1024 alloc'd
==77394==    at 0x3C038183: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck.so)
==77394==    by 0x3D02B4EE: xmlDictAddString (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3D02BBEB: xmlDictLookup (in /usr/X11R6/lib/libxml2.so.5)
==77394==    by 0x3CF80425: xmlDetectSAX2 (in /usr/X11R6/lib/libxml2.so.5)
Funny place for a malloc. Anyway, this is only a symptom. The problem is that the document or an XML node gets freed either while it's still in use, or by two independent parties (i.e. Python element proxies that refer to it). With a bug that occurs this seldom and a setup as complex as mod_python, it's really hard to narrow down the test case, so I don't know how far you could get here. However, I'm currently chasing a (pretty old) bug myself. Maybe it's related. Could you check whether the current SVN trunk works better for you? Although the stack trace above gives me doubts...
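The ownership rule behind this can be observed from Python. A small sketch, assuming a current lxml release and a made-up document: each element proxy keeps its document alive, so the underlying tree should only be freed once the last proxy is gone, and only once.

```python
import gc

from lxml import etree

# Made-up document, for illustration only.
root = etree.XML(b"<root><child>text</child></root>")
child = root[0]

# Dropping the root proxy must not free the tree while another proxy
# (child) still refers to the same document.
del root
gc.collect()

parent_tag = child.getparent().tag
child_text = child.text

print(parent_tag, child_text)
```

A crash like the one in this thread would mean that this bookkeeping broke down somewhere, for example because a string in the tree was owned by a dictionary that the freeing code did not know about.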
More than that, these messages were preceded by a bunch of errors from libpython itself:
==77390== Use of uninitialised value of size 4
==77390==    at 0x3C6D2AC9: PyObject_Realloc (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C735BA4: _PyObject_GC_Resize (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C6BE525: PyFrame_New (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C711382: PyEval_EvalFrameEx (in /usr/X11R6/lib/libpython2.5.so)
==77390==
==77390== Invalid read of size 4
==77390==    at 0x3C6D2AAF: PyObject_Realloc (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C735BA4: _PyObject_GC_Resize (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C6BE525: PyFrame_New (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C711382: PyEval_EvalFrameEx (in /usr/X11R6/lib/libpython2.5.so)
==77390== Conditional jump or move depends on uninitialised value(s)
==77390==    at 0x3C6D2AB8: PyObject_Realloc (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C735BA4: _PyObject_GC_Resize (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C6BE525: PyFrame_New (in /usr/X11R6/lib/libpython2.5.so)
==77390==    by 0x3C711382: PyEval_EvalFrameEx (in /usr/X11R6/lib/libpython2.5.so)
(repeated many times during the apache thread initialisation).
Hmmm, not sure what this means. Might be entirely unrelated. Valgrind uses a suppression file that drops a lot of false positives, maybe those are just false positives of mod_python.
So, this is not the lxml problem really... But maybe somebody has any idea? Right now I'm thinking of opportunity to replace mod_python with mod_fastcgi .
I have neither experience with mod_python nor with mod_fastcgi, sorry. Stefan
participants (3)
- Dmitri Fedoruk
- jholg@gmx.de
- Stefan Behnel