
Guido van Rossum <guido@python.org> writes:
Very interesting! Had you done a previous non-debug build in the same directory? (Even if you did a "make clobber" before restarting -- you never know what leave-behind could cause this.)
Yes. I did do a make clobber && ./configure before building the debug version, but it's not a clean checkout. OK, just did a clean checkout, no patch, and got the same result.
The next thing I'd try would be to start the python executable that was built under gdb and play around in it, like this:
$ gdb ./python
[...]
(gdb) run
Starting program: /home/guido/projects/python/dist/src/debug/python
[New Thread 1074895104 (LWP 30404)]
Python 2.4a0 (#2, Dec 22 2003, 11:02:19)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
print 2+2
# try various things that don't need external modules except sys
# define a function, etc...
# if that doesn't segfault, try:
from test import autotest
[...]
Eventually I expect you'd get a segfault; at that point you can use the gdb 'bt' command to get a stack trace. Hopefully it'll point to some innocent C code that gets mistreated by the non-debug compiler...
It doesn't get as far as the banner:

Script started on Mon Dec 22 14:42:19 2003
hydra /home/kbk/proj/sandbox/python_clean$ gdb ./python
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd3.3"...
(gdb) run
Starting program: /home/kbk/proj/sandbox/python_clean/./python

Program received signal SIGSEGV, Segmentation fault.
0x401900a0 in strchr ()
(gdb) quit
The program is running. Quit anyway (and kill it)? (y or n) y
hydra /home/kbk/proj/sandbox/python_clean$

Script done on Mon Dec 22 14:42:38 2003

Investigating.

-- KBK

It doesn't get as far as the banner:
Script started on Mon Dec 22 14:42:19 2003
hydra /home/kbk/proj/sandbox/python_clean$ gdb ./python
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd3.3"...
(gdb) run
Starting program: /home/kbk/proj/sandbox/python_clean/./python

Program received signal SIGSEGV, Segmentation fault.
0x401900a0 in strchr ()
(gdb) quit
The program is running. Quit anyway (and kill it)? (y or n) y
hydra /home/kbk/proj/sandbox/python_clean$

Script done on Mon Dec 22 14:42:38 2003
Investigating.
That suggests it's still in Py_Initialize(). What does the gdb command 'bt' say???

I'd also try another experiment: instead of "run" try "run -S". This passes the -S option to Python when it is started, so that it doesn't try to load site.py (which executes rather a lot of Python code). I'd be interested in seeing how much you can do interactively in that case, or if it still crashes in Py_Initialize().

--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum <guido@python.org> writes:
That suggests it's still in Py_Initialize(). What does the gdb command 'bt' say???
I'd also try another experiment: instead of "run" try "run -S". This passes the -S option to Python when it is started, so that it doesn't try to load site.py (which executes rather a lot of Python code). I'd be interested in seeing how much you can do interactively in that case, or if it still crashes in Py_Initialize().
Script started on Mon Dec 22 15:20:25 2003
hydra /home/kbk/proj/sandbox/python_clean$ gdb ./python
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd3.3"...
(gdb) run -S
Starting program: /home/kbk/proj/sandbox/python_clean/./python -S

Program received signal SIGSEGV, Segmentation fault.
0x401900a0 in strchr ()
(gdb) bt
#0 0x401900a0 in strchr ()
#1 0x1d11d in load_next (mod=0x10905c, altmod=0x10905c, p_name=0xcfbfd640,
   buf=0xcfbfd230 "__builtin__", p_buflen=0xcfbfd22c) at Python/import.c:2004
#2 0x1cc40 in import_module_ex (name=0x19786 "__builtin__", globals=0x0,
   locals=0x0, fromlist=0x0) at Python/import.c:1888
#3 0x1ce29 in PyImport_ImportModuleEx (name=0x19786 "__builtin__", globals=0x0,
   locals=0x0, fromlist=0x0) at Python/import.c:1922
#4 0x1dfe9 in PyImport_Import (module_name=0x115598) at Python/import.c:2333
#5 0x1caec in PyImport_ImportModule (name=0xb895e "__builtin__")
   at Python/import.c:1853
#6 0xb8b3d in _PyExc_Init () at Python/exceptions.c:1755
#7 0x25ad4 in Py_Initialize () at Python/pythonrun.c:205
#8 0x282f in Py_Main (argc=2, argv=0xcfbfd82c) at Modules/main.c:376
#9 0x17e3 in main (argc=2, argv=0xcfbfd82c) at Modules/python.c:23
(gdb) q
The program is running. Quit anyway (and kill it)? (y or n) y
hydra /home/kbk/proj/sandbox/python_clean$

Script done on Mon Dec 22 15:21:06 2003

Same result w/o -S

=============================================================
Slightly later:

(gdb) b import.c:2004
Breakpoint 1 at 0x1d10f: file Python/import.c, line 2004.
(gdb) r
Starting program: /home/kbk/proj/sandbox/python_clean/./python

Breakpoint 1, load_next (mod=0xe98ec, altmod=0xe98ec, p_name=0xcfbfd88c,
   buf=0xcfbfd47c "", p_buflen=0xcfbfd478) at Python/import.c:2004
2004 char *dot = strchr(name, '.');
(gdb) p *p_name
$1 = 0x19786 "__builtin__"
(gdb) p name
$2 = 0x19786 "__builtin__"
(gdb) p strchr(name, '.')

Program received signal SIGSEGV, Segmentation fault.

OTOH, if I break at 2004 and then step once, I get by the strchr call OK. Also if I stepi through it. If I continue, it segfaults at the next execution of line 2004. Weird.

Investigating.

-- KBK

(JD: yes, it's real!)
Script started on Mon Dec 22 15:20:25 2003
hydra /home/kbk/proj/sandbox/python_clean$ gdb ./python
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd3.3"...
(gdb) run -S
Starting program: /home/kbk/proj/sandbox/python_clean/./python -S

Program received signal SIGSEGV, Segmentation fault.
0x401900a0 in strchr ()
(gdb) bt
#0 0x401900a0 in strchr ()
#1 0x1d11d in load_next (mod=0x10905c, altmod=0x10905c, p_name=0xcfbfd640,
   buf=0xcfbfd230 "__builtin__", p_buflen=0xcfbfd22c) at Python/import.c:2004
#2 0x1cc40 in import_module_ex (name=0x19786 "__builtin__", globals=0x0,
   locals=0x0, fromlist=0x0) at Python/import.c:1888
#3 0x1ce29 in PyImport_ImportModuleEx (name=0x19786 "__builtin__", globals=0x0,
   locals=0x0, fromlist=0x0) at Python/import.c:1922
#4 0x1dfe9 in PyImport_Import (module_name=0x115598) at Python/import.c:2333
#5 0x1caec in PyImport_ImportModule (name=0xb895e "__builtin__")
   at Python/import.c:1853
#6 0xb8b3d in _PyExc_Init () at Python/exceptions.c:1755
#7 0x25ad4 in Py_Initialize () at Python/pythonrun.c:205
#8 0x282f in Py_Main (argc=2, argv=0xcfbfd82c) at Modules/main.c:376
#9 0x17e3 in main (argc=2, argv=0xcfbfd82c) at Modules/python.c:23
(gdb) q
The program is running. Quit anyway (and kill it)? (y or n) y
hydra /home/kbk/proj/sandbox/python_clean$

Script done on Mon Dec 22 15:21:06 2003
Same result w/o -S
=============================================================
Slightly later:

(gdb) b import.c:2004
Breakpoint 1 at 0x1d10f: file Python/import.c, line 2004.
(gdb) r
Starting program: /home/kbk/proj/sandbox/python_clean/./python

Breakpoint 1, load_next (mod=0xe98ec, altmod=0xe98ec, p_name=0xcfbfd88c,
   buf=0xcfbfd47c "", p_buflen=0xcfbfd478) at Python/import.c:2004
2004 char *dot = strchr(name, '.');
(gdb) p *p_name
$1 = 0x19786 "__builtin__"
(gdb) p name
$2 = 0x19786 "__builtin__"
(gdb) p strchr(name, '.')
Program received signal SIGSEGV, Segmentation fault.
OTOH, if I break at 2004 and then step once, I get by the strchr call OK. Also if I stepi through it. If I continue, it segfaults at the next execution of line 2004. Weird.
Investigating.
The most likely cause then is some kind of bug in the platform's strchr(). This could explain why -O3 fixes the issue: I think I've heard of GCC replacing calls to strchr(), strcpy() etc. with inline code, thereby avoiding the buggy library version (and explaining why the buggy code could persist undetected in the library -- most system code is of course compiled fully optimized).

As to why stepi doesn't trigger the segfault: possibly it's a timing bug that doesn't occur when run one instruction at a time. This would even make it CPU dependent, which would explain that some folks didn't see this.

I don't have the OpenBSD strchr.c source code online here so I'll stop speculating here...

--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
I don't have the OpenBSD strchr.c source code online here so I'll stop speculating here...
It's this:

http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/index.c

I can't see anything wrong in it, and it hasn't significantly changed in ages, either.

I'd be curious what p_name points to when the process crashes.

(gdb) p *p_name

Regards,
Martin

[Guido]
I don't have the OpenBSD strchr.c source code online here so I'll stop speculating here...
[Martin v. Loewis]
It's this:
http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/index.c
I can't see anything wrong in it, and it hasn't significantly changed in ages, either.
Looks fine to me too -- and it's a very simple function. Google didn't turn up any suggestion of strchr problems under OpenBSD either. Hate to say it, but the pointer passed *to* strchr must be insane, and that makes it more likely a Python, or platform compiler, bug.
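To illustrate why a bad argument rather than strchr() itself is the natural suspect: strchr() has no defined behaviour for a null pointer, so a caller that hands it one will typically crash inside the library. The sketch below is not from import.c; first_component_length() is a hypothetical helper that makes the precondition explicit with an assert, so a null pointer is reported at the call site instead of inside libc. It only catches NULL, not other garbage pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of CPython): length of the first
 * dotted component of a module name such as "os.path". */
static size_t first_component_length(const char *name)
{
    const char *dot;

    /* strchr(NULL, ...) is undefined behaviour; fail loudly here
     * rather than segfault somewhere inside the C library. */
    assert(name != NULL);

    dot = strchr(name, '.');
    /* No dot: the whole string is a single component. */
    return (dot != NULL) ? (size_t)(dot - name) : strlen(name);
}
```

With a check like this in a debug build, the failure would be attributed to the caller that produced the null pointer, which is where the investigation needs to go.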
I'd be curious what p_name points to when the process crashes.
(gdb) p *p_name

"Tim Peters" <tim.one@comcast.net> writes:
Hate to say it, but the pointer passed *to* strchr must be insane, and that makes it more likely a Python, or platform compiler, bug.
There are two calls to load_next() in import_module_ex(). The segfault is occurring during the second call.

The code is somewhat pathological in that the callee, load_next(), is modifying the caller's /parameters/ by changing the contents of name.

For some reason, the compiler emits code which makes a copy of import_module_ex()'s parameters in the stack frame. When load_next() is called, the reference &name is the location in the parameter area of the frame, but when name is tested in the while loop, the copy in the local area of the frame is used. Since this has not been modified by load_next(), the fact that name has been set to 0x00 is missed. load_next() gets called erroneously and passes a null pointer to strchr.

I tried a volatile declaration, but no joy. Adding a proper local, mod_name, resolved the problem.

-- KBK

Index: Python/import.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/import.c,v
retrieving revision 2.225
diff -c -r2.225 import.c
*** Python/import.c 20 Nov 2003 01:44:58 -0000 2.225
--- Python/import.c 23 Dec 2003 14:56:40 -0000
***************
*** 1871,1876 ****
--- 1871,1877 ----
          PyObject *fromlist)
  {
      char buf[MAXPATHLEN+1];
+     char *mod_name;
      int buflen = 0;
      PyObject *parent, *head, *next, *tail;

***************
*** 1878,1891 ****
      if (parent == NULL)
          return NULL;

!     head = load_next(parent, Py_None, &name, buf, &buflen);
      if (head == NULL)
          return NULL;

      tail = head;
      Py_INCREF(tail);
!     while (name) {
!         next = load_next(tail, tail, &name, buf, &buflen);
          Py_DECREF(tail);
          if (next == NULL) {
              Py_DECREF(head);
--- 1879,1893 ----
      if (parent == NULL)
          return NULL;

!     mod_name = name;
!     head = load_next(parent, Py_None, &mod_name, buf, &buflen);
      if (head == NULL)
          return NULL;

      tail = head;
      Py_INCREF(tail);
!     while (mod_name) {
!         next = load_next(tail, tail, &mod_name, buf, &buflen);
          Py_DECREF(tail);
          if (next == NULL) {
              Py_DECREF(head);
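The pattern behind the workaround can be shown in isolation. This is a minimal standalone sketch, not the real import.c code: consume_component() is a hypothetical stand-in for load_next() that only advances the name pointer, and count_components() is a hypothetical stand-in for the loop in import_module_ex(). The point it illustrates is that the address handed to the callee and the variable tested in the loop are the same dedicated local, never the function's own parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for load_next(): consumes one dotted component
 * and advances *p_name, or sets *p_name to NULL after the last one. */
static void consume_component(const char **p_name)
{
    const char *dot;

    assert(p_name != NULL && *p_name != NULL);
    dot = strchr(*p_name, '.');
    *p_name = (dot != NULL) ? dot + 1 : NULL;
}

/* Hypothetical stand-in for the loop in import_module_ex(): walks a
 * dotted name and counts its components. */
static int count_components(const char *name)
{
    /* Dedicated local: its address is passed to the callee and the
     * same variable is tested by the loop, so the parameter 'name'
     * is never written through a pointer. */
    const char *mod_name = name;
    int count = 0;

    consume_component(&mod_name);
    count++;
    while (mod_name != NULL) {
        consume_component(&mod_name);
        count++;
    }
    return count;
}
```

For example, count_components("__builtin__") makes a single call and leaves the loop immediately, which is the case that crashed here when the stale copy of name was tested instead.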

There are two calls to load_next() in import_module_ex(). The segfault is occurring during the second call.
The code is somewhat pathological in that the callee, load_next(), is modifying the caller's /parameters/ by changing the contents of name.
For some reason, the compiler emits code which makes a copy of import_module_ex()'s parameters in the stack frame. When load_next() is called, the reference &name is the location in the parameter area of the frame, but when name is tested in the while loop, the copy in the local area of the frame is used. Since this has not been modified by load_next(), the fact that name has been set to 0x00 is missed. load_next() gets called erroneously and passes a null pointer to strchr.
I tried a volatile declaration, but no joy. Adding a proper local, mod_name, resolved the problem.
Wow. Thanks for the analysis. But this is clearly a compiler bug. Where do we report that? And why would it be unique to OpenBSD?

In the meantime, Kurt, please check in your fix -- it can't hurt, and we might as well avoid the pain for the next person who wants to build a debugging Python. The fix could use a comment referring to a compiler bug, to keep the next maintainer from unfixing it. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
participants (4)
- Guido van Rossum
- kbk@shore.net
- Martin v. Loewis
- Tim Peters