How to debug pyexpat SIGSEGV with GDB?
Sorry if this is OT.

I've hit a repeatable segfault in pyexpat on RH Linux 9 with Python 2.3.3 (I'm trying rss2email).

It seems that XML_Parse is returning an error, but when XML_GetCurrentLineNumber is called positionPtr is not valid.

I catch this in GDB and have looked at the stack (see below), but when I get back up the stack into PyCFunction_Call I don't know what to do.

Ideally, I want to find out the Python source file and line number that is currently being executed, then look at the Python source to figure out exactly which pyexpat call is being made just before the call to get_parse_result.

It seems strange that Python code appears to be calling get_parse_result directly.

So, how can I figure out where in the Python source the function call is coming from using gdb? I'm sure it involves "print" and some casts.. I couldn't find a howto on python.org

--

Python 2.3.3 (#1, Dec 22 2003, 14:01:09)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2

Starting program: /usr/local/bin/python2.3 rss2email.py feeds.dat run --no-send
[New Thread 1074948352 (LWP 2379)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1074948352 (LWP 2379)]
normal_updatePosition (enc=0x407211c0, ptr=0x40785000, end=0x823ac09 "öteborg, Sweden, 7-9 June 2004. To help us serve the community in the best way possible, we need your input on what you think we should do in the way of tutorials.\"</i>]\n\n<!-- /newsinfo -->\n</p>\n\n\n", pos=0x8238954) at xmltok_impl.c:1745
1745        switch (BYTE_TYPE(enc, ptr)) {
(gdb) list
1740                 const char *ptr,
1741                 const char *end,
1742                 POSITION *pos)
1743    {
1744      while (ptr != end) {
1745        switch (BYTE_TYPE(enc, ptr)) {
1746    #define LEAD_CASE(n) \
1747        case BT_LEAD ## n: \
1748          ptr += n; \
1749          break;
(gdb)
(gdb) frame 1
#1 0x40701804 in XML_GetCurrentLineNumber (parser=0x82387c0) at /usr/local/src/Python-2.3.3/Modules/expat/xmlparse.c:1605
1605        XmlUpdatePosition(encoding, positionPtr, eventPtr, &position);
(gdb) list
1600
1601    int XMLCALL
1602    XML_GetCurrentLineNumber(XML_Parser parser)
1603    {
1604      if (eventPtr) {
1605        XmlUpdatePosition(encoding, positionPtr, eventPtr, &position);
1606        positionPtr = eventPtr;
1607      }
1608      return position.lineNumber + 1;
1609    }
(gdb) frame 2
#2 0x406ff800 in set_error (self=0x4067e8ec, code=XML_ERROR_INVALID_TOKEN) at /usr/local/src/Python-2.3.3/Modules/pyexpat.c:124
124         int lineno = XML_GetErrorLineNumber(parser);
(gdb) list
119     set_error(xmlparseobject *self, enum XML_Error code)
120     {
121         PyObject *err;
122         char buffer[256];
123         XML_Parser parser = self->itself;
124         int lineno = XML_GetErrorLineNumber(parser);
125         int column = XML_GetErrorColumnNumber(parser);
126
127         /* There is no risk of overflowing this buffer, since
128            even for 64-bit integers, there is sufficient space. */
(gdb) print *parser
$2 = {m_userData = 0x2, m_handlerArg = 0x40720ac0, m_buffer = 0x82387c0 "ìèg@ìèg@\b@v@TA\005\b$D\005\b´E\005\b\b@v@=Dw@\b@x@5\004\001", m_mem = {malloc_fcn = 0x1, realloc_fcn = 0, free_fcn = 0}, m_bufferPtr = 0x0, m_bufferEnd = 0x1, m_bufferLim = 0x0, m_parseEndByteIndex = 8192, m_parseEndPtr = 0x0, m_dataBuf = 0x0, m_dataBufEnd = 0x81e66b0 "<\224f@¬\222f@\204\222f@ì\223f@ü\222f@\024\224f@

Brad Clements wrote:
Sorry if this is OT.
I've hit a repeatable segfault in pyexpat on RH Linux 9 with Python 2.3.3 (I'm trying rss2email).
It seems that XML_Parse is returning an error, but when XML_GetCurrentLineNumber is called positionPtr is not valid.
I catch this in GDB and have looked at the stack (see below), but when I get back up the stack into PyCFunction_Call I don't know what to do.
Ideally, I want to find out the Python source file and line number that is currently being executed, then look at the Python source to figure out exactly which pyexpat call is being made just before the call to get_parse_result.
It seems strange that Python code appears to be calling get_parse_result directly.
So, how can I figure out where in the Python source the function call is coming from using gdb? I'm sure it involves "print" and some casts.. I couldn't find a howto on python.org
Just as a wild idea: could this be related to the fact that Python 2.3.3 no longer maintains explicit line numbers?

-- Christian Tismer
"Brad Clements"
Sorry if this is OT.
Marginally.
I've hit a repeatable segfault in pyexpat on RH Linux 9 with Python 2.3.3 (I'm trying rss2email).
It seems that XML_Parse is returning an error, but when XML_GetCurrentLineNumber is called positionPtr is not valid.
I catch this in GDB and have looked at the stack (see below), but when I get back up the stack into PyCFunction_Call I don't know what to do.
That's possibly because you haven't gone far enough. PyCFunction_Call is what you call to execute a builtin function. You need to work your way a couple more levels up the stack.
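The point about builtins can be seen from the Python side as well: a function implemented in C runs without a Python frame of its own, so the Python-level caller is found in the next interpreter frame further up. A minimal sketch, assuming CPython (`sys._getframe` is CPython-specific; `record_caller` and `sort_values` are names made up for this illustration):

```python
import sys

seen_callers = []

def record_caller(value):
    # Frame 1 is the nearest *Python* frame above this one. sorted() is
    # implemented in C, so it contributes no Python frame of its own and
    # the frame we see here belongs to sort_values().
    seen_callers.append(sys._getframe(1).f_code.co_name)
    return value

def sort_values():
    # The builtin calls record_caller once per element.
    return sorted([3, 1, 2], key=record_caller)

result = sort_values()
```

In a GDB backtrace the same situation would presumably appear as the builtin's C frames sitting between two interpreter-loop frames, which is why stopping at PyCFunction_Call is not far enough.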
Ideally, I want to find out the Python source file and line number that is currently being executed, then look at the Python source to figure out exactly which pyexpat call is being made just before the call to get_parse_result.
It seems strange that Python code appears to be calling get_parse_result directly.
Build a debug build, maybe? If a function looks like:

PyObject* foo()
{
    /* stuff with no function calls, or only inlineable function calls */
    return Function(args);
}

gcc, at least, will not set up a stack frame for foo, so foo never shows up in the backtrace.
So, how can I figure out where in the Python source the function call is coming from using gdb? I'm sure it involves "print" and some casts.. I couldn't find a howto on python.org
Read lots of source.

Cheers,
mwh
On Fri, 2004-02-27 at 18:00, Brad Clements wrote:
So, how can I figure out where in the Python source the function call is coming from using gdb? I'm sure it involves "print" and some casts.. I couldn't find a howto on python.org
First, make sure that the code from Misc/gdbinit is in your .gdbinit file. Get the stack trace in gdb and move up or down until you reach an eval_frame() frame. Then run the pyframe command defined there. It prints the filename, function name, and line number of the current frame. The lineno usually points to the first line of the function.

Jeremy
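For orientation, the three pieces of information that pyframe reports are also visible from Python itself on a live interpreter. The sketch below is not the gdb command, only an illustration of the frame and code-object attributes involved; `describe_current_frame` is a hypothetical helper and `sys._getframe` is CPython-specific. It also shows the difference between the first line of a function and the line currently executing, which is my reading of the caveat about lineno (an assumption, not something stated in the thread).

```python
import sys

def describe_current_frame():
    frame = sys._getframe()      # the frame executing this function
    code = frame.f_code          # the code object the frame is running
    return {
        "filename": code.co_filename,
        "function": code.co_name,
        "first_line": code.co_firstlineno,  # line of the "def" statement
        "current_line": frame.f_lineno,     # line being executed right now
    }

info = describe_current_frame()
```

If the line number printed in gdb equals `first_line` rather than `current_line`, the Python source still has to be read from the top of the function to find the actual call site.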
>> So, how can I figure out where in the Python source the function call
>> is coming from using gdb? I'm sure it involves "print" and some
>> casts.. I couldn't find a howto on python.org

Jeremy> First, make sure that the code from Misc/gdbinit is in your
Jeremy> .gdbinit file. Get the stack trace in gdb and move up/down
Jeremy> until you get to an eval_frame() frame. Then call the function
Jeremy> pyframe. It will print the filename, function name, and line
Jeremy> number of the current frame. The lineno usually points to the
Jeremy> first line of the function.

I have this in my .gdbinit file:

    define ppystack
        while $pc < Py_Main || $pc > Py_GetArgcArgv
            if $pc > eval_frame && $pc < PyEval_EvalCodeEx
                set $__fn = PyString_AsString(co->co_filename)
                set $__n = PyString_AsString(co->co_name)
                printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n
            end
            up-silently 1
        end
        select-frame 0
    end

Skip
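The macro above walks up the C stack and prints one line for every frame whose program counter falls inside eval_frame. As a rough analogue, the same "file (line): function" listing can be produced from inside a running interpreter by following `f_back` links. This is only a sketch for comparison: `format_stack`, `inner`, and `outer` are hypothetical names, `sys._getframe` is CPython-specific, and unlike the gdb macro this cannot be used on a process that has already crashed.

```python
import sys

def format_stack(frame=None):
    """Return one "file (line): function" string per Python frame, innermost first."""
    if frame is None:
        frame = sys._getframe(1)   # start at the caller, not at this helper
    lines = []
    while frame is not None:
        code = frame.f_code
        lines.append("%s (%d): %s" % (code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back       # plays the role of the macro's "up-silently 1"
    return lines

def inner():
    return format_stack()

def outer():
    return inner()

stack = outer()
```

The format string is copied from the macro so that the two outputs can be compared side by side when a bug is reproducible both under gdb and in a live session.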
On Mon, 2004-03-01 at 10:09, Skip Montanaro wrote:
>> So, how can I figure out where in the Python source the function call
>> is coming from using gdb? I'm sure it involves "print" and some
>> casts.. I couldn't find a howto on python.org

Jeremy> First, make sure that the code from Misc/gdbinit is in your
Jeremy> .gdbinit file. Get the stack trace in gdb and move up/down
Jeremy> until you get to an eval_frame() frame. Then call the function
Jeremy> pyframe. It will print the filename, function name, and line
Jeremy> number of the current frame. The lineno usually points to the
Jeremy> first line of the function.

I have this in my .gdbinit file:

    define ppystack
        while $pc < Py_Main || $pc > Py_GetArgcArgv
            if $pc > eval_frame && $pc < PyEval_EvalCodeEx
                set $__fn = PyString_AsString(co->co_filename)
                set $__n = PyString_AsString(co->co_name)
                printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n
            end
            up-silently 1
        end
        select-frame 0
    end

That's nice! I never learned how to write real programs in gdb. You should add a copy to gdbinit.

Jeremy
>> I have this in my .gdbinit file:
...

Jeremy> That's nice! I never learned how to write real programs in gdb.
Jeremy> You should add a copy to gdbinit.

Done. I renamed it simply "pystack" and added a short comment describing its while and if tests, as well as flag comments in the relevant files alerting people that pystack depends on those files.

Skip
participants (6)
- Barry Warsaw
- Brad Clements
- Christian Tismer
- Jeremy Hylton
- Michael Hudson
- Skip Montanaro