Fwd: [lxml-dev] Segfault on OS X

FYI, I also have the same problem. I ran the tests through gdb, and this is what I get.

xmllint: using libxml version 20619

gdb python
GNU gdb 5.3-20030128 (Apple version gdb-330.1) (Fri Jul 16 21:42:28 GMT 2004)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin".
Reading symbols for shared libraries ... done
(gdb) set args test.py -p -v
(gdb) run
Starting program: /usr/bin/python test.py -p -v
Reading symbols for shared libraries . done
...
Reading symbols for shared libraries . done
3/238 ( 1.3%): test_attribute_items (lxml.tests.test_etree.ETreeTestCase)
Program received signal EXC_BAD_ACCESS, Could not access memory.
0x94ad072c in xmlDictOwns ()
(gdb) bt
#0 0x94ad072c in xmlDictOwns ()
#1 0x94a53068 in xmlFreeNodeList ()
#2 0x94a518f8 in xmlFreeProp ()
#3 0x94a51810 in xmlFreePropList ()
#4 0x94a53028 in xmlFreeNodeList ()
#5 0x94a504f4 in xmlFreeDoc ()
#6 0x004f4230 in __pyx_tp_dealloc_5etree__DocumentBase (o=0x3e110) at src/lxml/etree.c:354
#7 0x004f43fc in __pyx_tp_dealloc_5etree__NodeBase (o=0xd4e8) at src/lxml/etree.c:9688
#8 0x10024898 in frame_dealloc (f=0x124ae0) at Objects/frameobject.c:394
#9 0x1007cab8 in fast_function (func=0x121d30, pp_stack=0xc, n=7, na=5194268, nk=1187120) at Python/ceval.c:3632
#10 0x1007c908 in call_function (pp_stack=0xbfffdfcc, oparg=1255136) at Python/ceval.c:3568
#11 0x1007a384 in PyEval_EvalFrame (f=0x83d810) at Python/ceval.c:2163
#12 0x1007b4c8 in PyEval_EvalCodeEx (co=0x0, globals=0x1326e0, locals=0x42657475, args=0x10078444, argcount=1049072, kws=0x1007a374, kwcount=1, defs=0x83d96c, defcount=1, closure=0x0) at Python/ceval.c:2730
#13 0x100264b8 in function_call (func=0x10c809, arg=0x1001f0, kw=0x83d810) at Objects/funcobject.c:548
#14 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#15 0x1007d078 in ext_do_call (func=0x6d430, pp_stack=0xbfffe290, flags=269601980, na=1, nk=0) at Python/ceval.c:3824
#16 0x1007a474 in PyEval_EvalFrame (f=0x116730) at Python/ceval.c:2203
#17 0x1007b4c8 in PyEval_EvalCodeEx (co=0x11688c, globals=0x1326e0, locals=0x42657475, args=0xa30d0, argcount=1049072, kws=0x1007a374, kwcount=1, defs=0x83d96c, defcount=0, closure=0x0) at Python/ceval.c:2730
#18 0x100264b8 in function_call (func=0x627eb, arg=0x1001f0, kw=0x116730) at Objects/funcobject.c:548
#19 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#20 0x10015ccc in instancemethod_call (func=0x6d470, arg=0x567878, kw=0x0) at Objects/classobject.c:2431
#21 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#22 0x1005959c in slot_tp_call (self=0xa30d0, args=0x54c570, kwds=0x0) at Objects/typeobject.c:4526
#23 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#24 0x1007cc28 in do_call (func=0xa30d0, pp_stack=0xa30d0, na=0, nk=1049072) at Python/ceval.c:3755
#25 0x1007c920 in call_function (pp_stack=0x0, oparg=1255136) at Python/ceval.c:3570
#26 0x1007a384 in PyEval_EvalFrame (f=0x109bc0) at Python/ceval.c:2163
#27 0x1007b4c8 in PyEval_EvalCodeEx (co=0x1, globals=0x1326e0, locals=0x42657475, args=0x10078444, argcount=1049072, kws=0x1007a374, kwcount=1, defs=0x83d96c, defcount=0, closure=0x0) at Python/ceval.c:2730
#28 0x100264b8 in function_call (func=0x2092dc, arg=0x1001f0, kw=0x109bc0) at Objects/funcobject.c:548
#29 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#30 0x1007d078 in ext_do_call (func=0x6d8b0, pp_stack=0xbfffebd0, flags=269601980, na=1, nk=0) at Python/ceval.c:3824
#31 0x1007a474 in PyEval_EvalFrame (f=0x107160) at Python/ceval.c:2203
#32 0x1007b4c8 in PyEval_EvalCodeEx (co=0x1072bc, globals=0x1326e0, locals=0x42657475, args=0x55bc70, argcount=1049072, kws=0x1007a374, kwcount=1, defs=0x83d96c, defcount=0, closure=0x0) at Python/ceval.c:2730
#33 0x100264b8 in function_call (func=0x62cc3, arg=0x1001f0, kw=0x107160) at Objects/funcobject.c:548
#34 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#35 0x10015ccc in instancemethod_call (func=0x6d8f0, arg=0xd3350, kw=0x0) at Objects/classobject.c:2431
#36 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#37 0x1005959c in slot_tp_call (self=0x55bc70, args=0x565690, kwds=0x0) at Objects/typeobject.c:4526
#38 0x1000c58c in PyObject_Call (func=0x121d30, arg=0x1326e0, kw=0x0) at Objects/abstract.c:1751
#39 0x1007cc28 in do_call (func=0x55bc70, pp_stack=0x55bc70, na=0, nk=1049072) at Python/ceval.c:3755
#40 0x1007c920 in call_function (pp_stack=0x0, oparg=1255136) at Python/ceval.c:3570
#41 0x1007a384 in PyEval_EvalFrame (f=0x104c50) at Python/ceval.c:2163
#42 0x1007ca80 in fast_function (func=0x121d30, pp_stack=0x104dc8, n=268928068, na=5194268, nk=1187120) at Python/ceval.c:3629
#43 0x1007c908 in call_function (pp_stack=0xbffff3fc, oparg=1255136) at Python/ceval.c:3568
#44 0x1007a384 in PyEval_EvalFrame (f=0x842c10) at Python/ceval.c:2163
#45 0x1007ca80 in fast_function (func=0x121d30, pp_stack=0x842dc8, n=268928068, na=5194268, nk=1187120) at Python/ceval.c:3629
#46 0x1007c908 in call_function (pp_stack=0xbffff5ac, oparg=1255136) at Python/ceval.c:3568
#47 0x1007a384 in PyEval_EvalFrame (f=0x108850) at Python/ceval.c:2163
#48 0x1007b4c8 in PyEval_EvalCodeEx (co=0x1, globals=0x1326e0, locals=0x42657475, args=0x10078444, argcount=1049072, kws=0x1007a374, kwcount=1, defs=0x83d96c, defcount=0, closure=0x0) at Python/ceval.c:2730
#49 0x1007e8bc in PyEval_EvalCode (co=0x121d30, globals=0x1326e0, locals=0x0) at Python/ceval.c:484
#50 0x100b3124 in run_node (n=0x10078444, filename=0x1326e0 "Alpha", globals=0x1, locals=0x1089a4, flags=0x0) at Python/pythonrun.c:1265
#51 0x100b28b0 in PyRun_SimpleFileExFlags (fp=0xa0009818, filename=0xbffffa10 "test.py", closeit=1083472, flags=0x10b849) at Python/pythonrun.c:860
#52 0x100bf884 in Py_Main (argc=129616, argv=0xbffffa13) at Modules/main.c:484
#53 0x000018d0 in _start (argc=129616, argv=0xa0009818, envp=0xbffffa13) at /SourceCache/Csu/Csu-47/crt.c:267
#54 0x8fe1a558 in __dyld__dyld_start ()

Marc-Antoine Parent wrote:
Really odd. I just tried the tests with libxml2 2.6.19 to see whether this made a difference, but no segfault, and no complaints from valgrind. I've also tried Paul's example, but no segfault, no valgrind complaints. From this:
it looks like *something* is thrown away before it should be, or perhaps thrown away again when it shouldn't be. Whether it's really a problem with libxml2 dictionary sharing or whether it's actually just the symptom I do not know -- often enough I saw these errors pop up on a simple double deallocation of a document.

Since I cannot reproduce this memory error on my linux/athlon box I do not have a clear idea on how to proceed. One would assume this is some stealthy logic error in lxml that is only exposed on the PowerPC architecture for some reason, but everything is so quiet on ix86 that it surprises me...

Debugging this one is going to be tricky and I'll need lots of help. A minimal test case (like perhaps Paul's fragment) is helpful, and then you could look at enabling some of the print statements in etree.pyx and recompiling -- there are quite a few that have to do with deallocation debugging.

I believe you guys *did* manage to run the tests successfully on the Mac in the past, correct? If so, another route we could take is to track down what change introduced this problem... Checking out older svn revisions and running the tests, doing a binary search to identify where it broke, might be the way to go. Again, I'll need help, as Mac OS X is not a familiar development platform to me.

Regards, Martijn
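
The binary search over revisions suggested above can be sketched roughly as follows. This is only an illustration, not something from the thread: `first_bad_revision` and the `is_bad` callback are hypothetical names, and the sketch assumes that once a revision is broken, every later revision is broken too. In practice `is_bad` would check out the given svn revision, rebuild, and run test.py; here it is left as a plain callable.

```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision for which is_bad(revision) is true.

    revisions must be sorted from oldest to newest, and is_bad is assumed
    to be monotonic: false for every revision before the breakage and
    true for every revision from the breakage onwards.
    Returns None if no revision is bad.
    """
    lo, hi = 0, len(revisions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid        # breakage is at mid or earlier
        else:
            lo = mid + 1    # breakage is after mid
    return revisions[lo] if lo < len(revisions) else None
```

With N candidate revisions this needs about log2(N) build-and-test runs rather than N, which matters when each run means a full recompile.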

Marc-Antoine Parent wrote: [snip]
Thanks! Interesting. The differences in etree.pyx between those versions aren't so large, and presumably/hopefully the root cause of the breakage is there. Did you have a problem with lxml 0.5, as opposed to 0.5.1? One of the changes that is slightly suspicious is this block:

    1320c1323
    < cdef tree.PyFileObject* o
    ---
    > cdef tree.PyObject* o
    1326c1329
    < o = <tree.PyFileObject*>f
    ---
    > o = <tree.PyObject*>f

introduced for Python 2.2 compatibility. Since Paul's example features opening files, it might be this one. In fact I *hope* it's this one, as it'd be an easy fix. I hope it doesn't have to do with this change:

    1420,1421c1423
    < if not hasProxy(c_node):
    <     tree.xmlFreeNode(c_node)
    ---
    > attemptDeallocation(c_node)

but I think that this change is correct; the old code was definitely wrong.

Regards, Martijn

Further analysis: It also did look like a more obscure segfault than it was because OSX contains an earlier libxml in /usr/lib, which the linker knows about, so we ended up compiling against one version and linking against another. Do I ever feel silly. MAP
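
For readers who want to check for this kind of mismatch: libxml2 reports its version as a single integer, such as the 20619 printed by xmllint above, which encodes 2.6.19 as major * 10000 + minor * 100 + micro. The sketch below decodes such numbers and compares two of them. The function names are hypothetical, and how the compile-time and run-time numbers are obtained is deliberately left out, since the thread does not say.

```python
def decode_libxml_version(number):
    """Split a libxml2 version integer (e.g. 20619) into (major, minor, micro)."""
    major, rest = divmod(int(number), 10000)
    minor, micro = divmod(rest, 100)
    return (major, minor, micro)


def same_libxml_version(compiled, loaded):
    """True if the headers compiled against and the library loaded agree."""
    return decode_libxml_version(compiled) == decode_libxml_version(loaded)
```

For example, `decode_libxml_version(20619)` gives `(2, 6, 19)`, matching the libxml2 2.6.19 mentioned earlier in the thread.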

OK... This may not be Paul's problem, but it sure is mine. (<sheepish/>)

The change that kills my install lies in setup.py. Here is the faulty block and line, around line 140:

    ext_modules = [
        Extension('lxml.etree',
                  sources=['src/lxml/etree.pyx'],
                  include_dirs=include_dirs,
                  library_dirs=library_dirs,  # Put this line back!
                  libraries=['xml2', 'xslt'],
                  extra_compile_args=['-w'])
    ]

What happens is that the default link mode on OSX 10.3 is -undefined dynamic_lookup, i.e. if a symbol is not found, we will link anyway and hope to find it later in a dynlib. So the linking without that line looks like this:

    gcc -I/sw/lib -bundle -undefined dynamic_lookup build/temp.darwin-7.8.0-Power_Macintosh-2.4/src/lxml/etree.o -lxml2 -lxslt -o src/lxml/etree.so

Missing is "-L /usr/local/lib", which is where the libxml libraries are installed by default, and certainly on my machine.

Note that we are running into the annoying fact that many unixy libraries are installed in /usr/local, which is not known to OS X, and the dynamic linker hence fails to look in this location. So another solution to the segfault is to tell the dynamic linker where to look, with

    export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/usr/local/lib

But it is much better to specify the library location at library link time, as above, so that the etree.so library will itself contain the location of libxml.dylib and the dynamic linker will know what to do.

After this, I pass all tests.

Cheers, Marc-Antoine
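
To make the effect of the missing library_dirs line concrete, here is a deliberately simplified model of how the Extension arguments end up as linker flags. It is not distutils' actual code, and `link_flags` is a hypothetical helper; it only shows why dropping library_dirs removes the -L option from the gcc line while the -l options stay.

```python
def link_flags(library_dirs, libraries):
    """Model the -L/-l part of a link command built from Extension arguments."""
    flags = ["-L" + directory for directory in library_dirs]
    flags += ["-l" + name for name in libraries]
    return flags


# Without library_dirs the linker is only told which libraries to use,
# not where to find them, so it falls back to its default search path.
without_dirs = link_flags([], ["xml2", "xslt"])

# With library_dirs restored, the intended location is searched first.
with_dirs = link_flags(["/usr/local/lib"], ["xml2", "xslt"])
```

The first call yields only `-lxml2 -lxslt`, as in the gcc line quoted above; the second adds `-L/usr/local/lib` in front.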

Marc-Antoine Parent wrote:
OK... This may not be Paul's problem, but it sure is mine. (<sheepish/>)
Yay! I'm glad it was only a configuration issue for you. I hope we can also track down Paul's problem to this.
I've extended the use of the guessing code in setup.py to also extract the library information. Try the trunk and see whether that helps? Regards, Martijn
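
The guessing code itself is not shown in this thread. As a rough idea of what extracting library information could look like, the sketch below parses a string of linker flags, such as the output of `xml2-config --libs`, into the library_dirs and libraries lists that Extension expects. The function name is hypothetical, the sample strings in the comment are illustrative rather than captured output, and this should not be read as lxml's actual implementation.

```python
import shlex


def parse_link_flags(output):
    """Split linker flags into (library_dirs, libraries).

    Accepts both the joined form (-L/usr/local/lib) and the separated
    form (-L /usr/local/lib). Any other flags are ignored.
    """
    library_dirs, libraries = [], []
    tokens = iter(shlex.split(output))
    for token in tokens:
        if token in ("-L", "-l"):
            # Flag and value were given as two words; glue them together.
            token += next(tokens, "")
        if token.startswith("-L") and len(token) > 2:
            library_dirs.append(token[2:])
        elif token.startswith("-l") and len(token) > 2:
            libraries.append(token[2:])
    return library_dirs, libraries


# Illustrative input only, e.g. "-L/usr/local/lib -lxml2 -lz -lm".
# In setup.py the string would come from running the config script.
```

Passing the resulting library_dirs to Extension is what puts the -L option back on the link line.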

participants (2)
- Marc-Antoine Parent
- Martijn Faassen