Tracing down segfault
t-meyer at ihug.co.nz
Sun Jun 26 10:49:28 CEST 2005
>> I have (unfortunately) a Python program that I can
>> consistently (in a reproducible way) segfault.
> The _best_ thing to do next is to rebuild Python, and as many other
> packages as possible, in debug mode.
> It's especially useful to rebuild Python that way. Many asserts are
> enabled then, and all of Python's memory allocations go thru a special
> debug allocator then with gimmicks to try and catch out-of-bounds
> stores, double frees, and use of free()'d memory.
I wondered if that might help. I really ought to get around to doing a
debug build someday, I guess. It just doesn't seem like the easiest thing
to do on Windows without the MS tools (although I do recall various c.l.p
messages indicating that the required patches were around somewhere).
Luckily (see below), I managed to avoid it this time.
> You didn't mention which version of any of these you're using, or the
> OS in use. Playing historical odds, and assuming relatively recent
> versions of all, wx is the best guess.
Sorry (although your guesses were pretty good; just as well you pop up
everywhere I post to help me out <0.5 wink>). Windows XP SP2, Python 2.3.5
or 2.4.1, ZODB 3.4.0, wxPython 22.214.171.124.
After taking 24 hours off, I figured out that I could reasonably easily run
the code without using anything that imported wx, and, sure enough, the
segfault doesn't occur. Good in that it's much less code to look at, but
bad in that I didn't write any of the code that uses wx...
[Tony, suspecting threading to be the cause]
> It's unlikely to be the true cause. Apart from some new-in-2.4
> thread-local storage gimmicks, all of the threading module is written
> in Python too. NULL pointers are a (depressingly common) C problem.
Oh well. I was working on a few things at once, and it's possible that I
only noticed it after adding the new threading code.
> So only a single thread is running at the time the segfault occurs?
As far as I know, yes. At least, all the threads that I created I have called
join() on without any timeout.
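For the record, a minimal sketch of that kind of check, written for a current Python 3 rather than the 2.3/2.4 used here. The names `work` and `run_and_join` are made up for illustration and are not from the real program. It joins every thread without a timeout and then confirms that none of them is still alive:

```python
import threading

def work(n, results):
    # Hypothetical worker: stands in for whatever the real threads do.
    results.append(n * n)

def run_and_join(count=3):
    results = []
    threads = [threading.Thread(target=work, args=(i, results))
               for i in range(count)]
    for t in threads:
        t.start()
    # join() with no timeout blocks until each thread has finished.
    for t in threads:
        t.join()
    # Any thread listed here would still be running at exit time.
    still_alive = [t for t in threads if t.is_alive()]
    return results, still_alive

results, still_alive = run_and_join()
print(sorted(results), still_alive)
```

This only covers threads the program created itself; threading.enumerate() would also list threads started by other libraries.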
> Is Python also in the process of tearing itself down (i.e., is the
> program trying to exit?).
Yes, the program is trying to exit. It's after a call is made to
"wx.GetApp().ExitMainLoop()" (but a print statement after that does print).
If I run with python -v, it dies before any of the # clear statements get
printed out. I'm not sure what runs between those :(
The 24 hours off helped me think much more clearly. Once I had narrowed it
down to wx (and stopped worrying about all the threading code) I managed to
find a line of Python that commenting out would get rid of the segfault and
uncommenting would return the segfault.
FWIW, I believe what was happening is that there was a wx.TaskBarIcon class
that, when the "Exit" menu item was chosen, would call an exit function of
the main frame of the wx application. That exit function called "Destroy()"
on the TaskBarIcon class - and it was this call that caused the segfault. I
presume that the problem was that the TaskBarIcon class was waiting for that
exit function to be finished, and didn't appreciate being Destroy()d while
it was waiting.
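A rough sketch of the pattern that avoids this, under the assumption that my diagnosis is right: rather than destroying the object from inside its own event handler, queue the destruction so it runs after the handler has returned. The code below uses toy stand-ins (`MiniLoop`, `Icon`, `call_after` are all hypothetical names, not wx classes) so that it runs without wx installed:

```python
from collections import deque

class MiniLoop:
    """Toy stand-in for a GUI event loop (hypothetical, not a wx class)."""
    def __init__(self):
        self._pending = deque()

    def call_after(self, func, *args):
        # Queue the call instead of running it immediately.
        self._pending.append((func, args))

    def run_pending(self):
        # Run queued calls once the current handler has returned.
        while self._pending:
            func, args = self._pending.popleft()
            func(*args)

class Icon:
    """Toy stand-in for the task bar icon."""
    def __init__(self, loop):
        self.loop = loop
        self.in_handler = False
        self.destroyed = False
        self.destroyed_in_handler = False

    def destroy(self):
        # Record whether we were torn down while a handler was active.
        self.destroyed_in_handler = self.in_handler
        self.destroyed = True

    def on_exit_menu(self):
        self.in_handler = True
        try:
            # Defer destruction rather than calling self.destroy() here.
            self.loop.call_after(self.destroy)
        finally:
            self.in_handler = False

loop = MiniLoop()
icon = Icon(loop)
icon.on_exit_menu()      # handler returns with the icon still intact
loop.run_pending()       # destruction happens afterwards
print(icon.destroyed, icon.destroyed_in_handler)
```

In wxPython itself, wx.CallAfter is the usual way to schedule a call for after the current event handler returns. I have not tested whether it would have avoided this particular segfault.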
Many thanks for the help! (Like the recent ZODB problem, the help was
somewhat lateral - here, pointing me away from the threads).