Multi threaded lxml usage causes crash in etree.so (lxml.etree.c:160651)

Hey, Spent about a month trying to track down the cause of a crash in one of our webapps. When using uWSGI+Django, if you import lxml (without even using it) it will cause the worker to crash after the response has been sent back. I have replicated the same problem in a fresh vbox install. Attached is the gdb trace, and below is some info about the env + crash. Any advice on this would be *very* much appreciated, and will do anything I can to assist. Cal Leeming Simplicity Media Ltd ---- Tue Jan 17 20:01:51 2012 - *** Starting uWSGI 1.0.2.1 (64bit) on [Tue Jan 17 20:01:51 2012] *** Tue Jan 17 20:01:51 2012 - compiled with version: 4.4.5 on 16 January 2012 19:31:32 Tue Jan 17 20:01:51 2012 - current working directory: /home/duke Tue Jan 17 20:01:51 2012 - writing pidfile to /home/chroot/home/duke_local/webapps/ddcms/pid/uwsgi.pid Tue Jan 17 20:01:51 2012 - detected binary path: /home/duke/bin/uwsgi Tue Jan 17 20:01:51 2012 - limiting address space of processes... Tue Jan 17 20:01:51 2012 - your process address space limit is 536870912 bytes (512 MB) Tue Jan 17 20:01:51 2012 - your memory page size is 4096 bytes Tue Jan 17 20:01:51 2012 - allocated 7808 bytes (7 KB) for 8 cores per worker. Tue Jan 17 20:01:51 2012 - uwsgi socket 0 bound to UNIX address /home/chroot/home/duke_local/webapps/ddcms/sock/uwsgi.sock fd 3 Tue Jan 17 20:01:51 2012 - Python main interpreter initialized at 0x7c5e90 Tue Jan 17 20:01:51 2012 - threads support enabled Tue Jan 17 20:01:51 2012 - your server socket listen backlog is limited to 1024 connections Tue Jan 17 20:01:51 2012 - *** Operational MODE: threaded *** Tue Jan 17 20:01:51 2012 - added /home/chroot/home/duke_local/webapps/ddcms/webapp/ to pythonpath. Tue Jan 17 20:01:51 2012 - WSGI application 0 (mountpoint='/') ready on interpreter 0x7c5e90 pid: 3527 (default app) Tue Jan 17 20:01:51 2012 - spawned uWSGI master process (pid: 3527) Tue Jan 17 20:01:51 2012 - spawned uWSGI worker 1 (pid: 3528, cores: 8) Tue Jan 17 20:01:54 2012 - DAMN ! worker 1 (pid: 3528) died :( trying respawn ... Tue Jan 17 20:01:54 2012 - Respawned uWSGI worker 1 (new pid: 3541) Tue Jan 17 20:01:56 2012 - DAMN ! worker 1 (pid: 3541) died :( trying respawn ... Tue Jan 17 20:01:56 2012 - Respawned uWSGI worker 1 (new pid: 3553) [ 3276.149102] uwsgi[3531]: segfault at 27b00 ip 00007fefd0d802d4 sp 00007fefd952d6f0 error 4 in etree.so[7fefd0d10000+109000] [ 3278.533480] uwsgi[3542]: segfault at 27b00 ip 00007fefd0d802d4 sp 00007fefda52f6f0 error 4 in etree.so[7fefd0d10000+109000] Python version: 2.6.6 (r266:84292, Dec 26 2010, 22:48:11) [GCC 4.4.5] lxml.etree: (2, 3, 3, 0) libxml used: (2, 7, 8) libxml compiled: (2, 7, 8) libxslt used: (1, 1, 26) libxslt compiled: (1, 1, 26) Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f912f2df700 (LWP 3432)] 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 160651 src/lxml/lxml.etree.c: No such file or directory. in src/lxml/lxml.etree.c (gdb) bt full #0 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 __pyx_t_1 = 0x0 __pyx_t_5 = <value optimized out> __pyx_t_6 = 0x7f91335ec848 __pyx_t_7 = 0x0 __pyx_t_8 = 0x7f9126b75fe0 __pyx_t_9 = 0x7f91328baa60 __pyx_t_11 = <value optimized out> __pyx_t_12 = <value optimized out> __pyx_t_13 = <value optimized out> __pyx_t_14 = <value optimized out> #0 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 __pyx_t_7 = ((PyObject *)PyUnicode_DecodeASCII(xmlParserVersion, strlen(xmlParserVersion), NULL)); if (unlikely(!__pyx_t_7)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 232; __pyx_clineno = __LINE__; goto __pyx_L34_error;}

Cal Leeming [Simplicity Media Ltd], 17.01.2012 21:04:
Spent about a month trying to track down the cause of a crash in one of our webapps.
When using uWSGI+Django, if you import lxml (without even using it) it will cause the worker to crash after the response has been sent back.
I have replicated the same problem in a fresh vbox install.
Attached is the gdb trace, and below is some info about the env + crash.
Any advice on this would be *very* much appreciated, and will do anything I can to assist.
Thanks for this very detailed report.
Tue Jan 17 20:01:51 2012 - *** Starting uWSGI 1.0.2.1 (64bit) on [Tue Jan 17 20:01:51 2012] *** Tue Jan 17 20:01:51 2012 - Python main interpreter initialized at 0x7c5e90 Tue Jan 17 20:01:51 2012 - threads support enabled Tue Jan 17 20:01:51 2012 - *** Operational MODE: threaded *** Tue Jan 17 20:01:51 2012 - WSGI application 0 (mountpoint='/') ready on interpreter 0x7c5e90 pid: 3527 (default app) Tue Jan 17 20:01:51 2012 - spawned uWSGI master process (pid: 3527) Tue Jan 17 20:01:51 2012 - spawned uWSGI worker 1 (pid: 3528, cores: 8) Tue Jan 17 20:01:54 2012 - DAMN ! worker 1 (pid: 3528) died :( trying respawn ... Tue Jan 17 20:01:54 2012 - Respawned uWSGI worker 1 (new pid: 3541) Tue Jan 17 20:01:56 2012 - DAMN ! worker 1 (pid: 3541) died :( trying respawn ... Tue Jan 17 20:01:56 2012 - Respawned uWSGI worker 1 (new pid: 3553)
[ 3276.149102] uwsgi[3531]: segfault at 27b00 ip 00007fefd0d802d4 sp 00007fefd952d6f0 error 4 in etree.so[7fefd0d10000+109000] [ 3278.533480] uwsgi[3542]: segfault at 27b00 ip 00007fefd0d802d4 sp 00007fefda52f6f0 error 4 in etree.so[7fefd0d10000+109000]
Python version: 2.6.6 (r266:84292, Dec 26 2010, 22:48:11) [GCC 4.4.5]
lxml.etree: (2, 3, 3, 0) libxml used: (2, 7, 8) libxml compiled: (2, 7, 8) libxslt used: (1, 1, 26) libxslt compiled: (1, 1, 26)
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f912f2df700 (LWP 3432)] 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 160651 src/lxml/lxml.etree.c: No such file or directory. in src/lxml/lxml.etree.c (gdb) bt full #0 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 __pyx_t_1 = 0x0 __pyx_t_5 = <value optimized out> __pyx_t_6 = 0x7f91335ec848 __pyx_t_7 = 0x0 __pyx_t_8 = 0x7f9126b75fe0 __pyx_t_9 = 0x7f91328baa60 __pyx_t_11 = <value optimized out> __pyx_t_12 = <value optimized out> __pyx_t_13 = <value optimized out> __pyx_t_14 = <value optimized out>
#0 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 __pyx_t_7 = ((PyObject *)PyUnicode_DecodeASCII(xmlParserVersion, strlen(xmlParserVersion), NULL)); if (unlikely(!__pyx_t_7)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 232; __pyx_clineno = __LINE__; goto __pyx_L34_error;}
The code line you present here is very unexpected, and therefore also very indicative. All lxml.etree does at this point is to read the version string of libxml2 from that library, using an exported C variable. I can only guess that something has seriously gone wrong with the thread initialisation that prevented the libxml2 library from initialising properly, so that it couldn't even assign that variable. Normally, CPython's import lock should at least prevent multiple threads from importing lxml simultaneously, but I can't tell how libxml2 behaves when it is loaded and configured from within a thread. The top-of-the-list thing that you should try is to import lxml into the interpreter *before* the server starts any threads. I don't know if uWSGI has a way to do that (at least, the log indicated that it starts CPython before enabling thread support), but if there is any chance of hooking into the initialisation for that purpose, even just site.py or other CPython init scripts, I'd say it's your best bet. Stefan

Hi Stefan, Thank you so much for your fast response on this. I forwarded this info to Roberto at unbit (uwsgi vendors), he suggested that the uWSGI xml support may be causing some sort of conflict, and to either remove xml support from the compile or not reference it in command line. After removing xml config support (i.e. not using -x), the crashing problem seems to have been fixed! (it's going to be stress tested for 7+ days, but it's looking very promising). Really a big thank you to giving us a pointer in the right direction, and for providing a detailed explanation. Hope this information will help someone else in the future. Cheers Cal On Wed, Jan 18, 2012 at 7:47 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Cal Leeming [Simplicity Media Ltd], 17.01.2012 21:04:
Spent about a month trying to track down the cause of a crash in one of our webapps.
When using uWSGI+Django, if you import lxml (without even using it) it will cause the worker to crash after the response has been sent back.
I have replicated the same problem in a fresh vbox install.
Attached is the gdb trace, and below is some info about the env + crash.
Any advice on this would be *very* much appreciated, and will do anything I can to assist.
Thanks for this very detailed report.
Tue Jan 17 20:01:51 2012 - *** Starting uWSGI 1.0.2.1 (64bit) on [Tue Jan 17 20:01:51 2012] *** Tue Jan 17 20:01:51 2012 - Python main interpreter initialized at 0x7c5e90 Tue Jan 17 20:01:51 2012 - threads support enabled Tue Jan 17 20:01:51 2012 - *** Operational MODE: threaded *** Tue Jan 17 20:01:51 2012 - WSGI application 0 (mountpoint='/') ready on interpreter 0x7c5e90 pid: 3527 (default app) Tue Jan 17 20:01:51 2012 - spawned uWSGI master process (pid: 3527) Tue Jan 17 20:01:51 2012 - spawned uWSGI worker 1 (pid: 3528, cores: 8) Tue Jan 17 20:01:54 2012 - DAMN ! worker 1 (pid: 3528) died :( trying respawn ... Tue Jan 17 20:01:54 2012 - Respawned uWSGI worker 1 (new pid: 3541) Tue Jan 17 20:01:56 2012 - DAMN ! worker 1 (pid: 3541) died :( trying respawn ... Tue Jan 17 20:01:56 2012 - Respawned uWSGI worker 1 (new pid: 3553)
[ 3276.149102] uwsgi[3531]: segfault at 27b00 ip 00007fefd0d802d4 sp 00007fefd952d6f0 error 4 in etree.so[7fefd0d10000+109000] [ 3278.533480] uwsgi[3542]: segfault at 27b00 ip 00007fefd0d802d4 sp 00007fefda52f6f0 error 4 in etree.so[7fefd0d10000+109000]
Python version: 2.6.6 (r266:84292, Dec 26 2010, 22:48:11) [GCC 4.4.5]
lxml.etree: (2, 3, 3, 0) libxml used: (2, 7, 8) libxml compiled: (2, 7, 8) libxslt used: (1, 1, 26) libxslt compiled: (1, 1, 26)
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f912f2df700 (LWP 3432)] 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 160651 src/lxml/lxml.etree.c: No such file or directory. in src/lxml/lxml.etree.c (gdb) bt full #0 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 __pyx_t_1 = 0x0 __pyx_t_5 = <value optimized out> __pyx_t_6 = 0x7f91335ec848 __pyx_t_7 = 0x0 __pyx_t_8 = 0x7f9126b75fe0 __pyx_t_9 = 0x7f91328baa60 __pyx_t_11 = <value optimized out> __pyx_t_12 = <value optimized out> __pyx_t_13 = <value optimized out> __pyx_t_14 = <value optimized out>
#0 0x00007f9126b7e2d4 in initetree () at src/lxml/lxml.etree.c:160651 __pyx_t_7 = ((PyObject *)PyUnicode_DecodeASCII(xmlParserVersion, strlen(xmlParserVersion), NULL)); if (unlikely(!__pyx_t_7)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 232; __pyx_clineno = __LINE__; goto __pyx_L34_error;}
The code line you present here is very unexpected, and therefore also very indicative. All lxml.etree does at this point is to read the version string of libxml2 from that library, using an exported C variable. I can only guess that something has seriously gone wrong with the thread initialisation that prevented the libxml2 library from initialising properly, so that it couldn't even assign that variable. Normally, CPython's import lock should at least prevent multiple threads from importing lxml simultaneously, but I can't tell how libxml2 behaves when it is loaded and configured from within a thread.
The top-of-the-list thing that you should try is to import lxml into the interpreter *before* the server starts any threads. I don't know if uWSGI has a way to do that (at least, the log indicated that it starts CPython before enabling thread support), but if there is any chance of hooking into the initialisation for that purpose, even just site.py or other CPython init scripts, I'd say it's your best bet.
Stefan _________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
participants (2)
-
Cal Leeming [Simplicity Media Ltd]
-
Stefan Behnel