Core dump in garbage collection: _PyGC_Insert???

No problems (apart from the usual test failures) with the CVS from 10 July, but with CVS from 11 July I get core dumps at random points in "make test" which seem to point to a problem with the GC code. Sample dbx output:
(Platform is DEC (Compaq) Alpha, Compaq C T6.3-125 (dtk) on Digital UNIX V4.0F (Rev. 1229))
dbx version 3.11.10
Type 'help' for help.
Core file created by program "python"

thread 0xb signal Segmentation fault at >*[__nxm_thread_kill, 0x3ff805c7ca8] ret r31, (r26), 1
(dbx) where
 0 __nxm_thread_kill(0x3ffc01b3c10, 0x1200f47dc, 0x0, 0x11fffeed0, 0x3ff8059c724) [0x3ff805c7ca8]
 1 pthread_kill(0x1200ea970, 0x1, 0x0, 0x11fffeee0, 0x3ffc01b36c0) [0x3ff805ad6f4]
 2 (unknown)() [0x3ff8059712c]
 3 (unknown)() [0x3ff807e370c]
 4 exc_unwind(0x11fffaf28, 0xabadabad00beed00, 0x3ff80592b90, 0x11fffb1c8, 0x3ff807e3acc) [0x3ff807e380c]
 5 exc_raise_signal_exception(0x86, 0x0, 0x120159490, 0x1, 0x1) [0x3ff807e3ac8]
 6 (unknown)() [0x3ff805af254]
 7 (unknown)() [0x12015948c]
 8 (unknown)() [0x12014f36c]
 9 (unknown)() [0x120159844]
10 _PyGC_Insert(0x7f, 0x6f, 0x120111690, 0x2, 0x140155e20) [0x120159d04]
This _PyGC_Insert always appears here in the trace - sometimes called from PyTuple_New, sometimes from PyMethod_New...

Mark Favas wrote:
> No problems (apart from the usual test failures) with the CVS from 10 July, but with CVS from 11 July I get core dumps at random points in "make test" which seem to point to a problem with the GC code. Sample dbx output:
I wouldn't bet my life on it, but I don't think the GC code is responsible. The only changes checked in during the last days were the "long n = 0;" initialization and my ANSI-fication (which I just reviewed and which should be harmless).
don't-have-a-clue-though-ly y'rs
Peter
--
Peter Schneider-Kamp ++47-7388-7331
Herman Krags veg 51-11 mailto:peter@schneider-kamp.de
N-7050 Trondheim http://schneider-kamp.de

On Tue, Jul 11, 2000 at 01:05:10AM +0000, Peter Schneider-Kamp wrote:
> I wouldn't bet my life on it, but I don't think the GC code is responsible.
It's hard to say. The GC code will probably pick up on a lot of problems because it touches many objects. On the other hand, it could be my bug. I'm pretty limited here as my computer is somewhere between Calgary and Reston as I type this. I will try to find the problem though.
Neil

[Peter Schneider-Kamp]
> I wouldn't bet my life on it, but I don't think the GC code is responsible.
[Neil Schemenauer]
> It's hard to say.
I'll second that <wink>.
> The GC code will probably pick up on a lot of problems because it touches many objects. On the other hand, it could be my bug. I'm pretty limited here as my computer is somewhere between Calgary and Reston as I type this. I will try to find the problem though.
I also doubt it's gc's fault: the recent patches to the gc code were absolutely vanilla, this all worked fine (at least for me) yesterday, and it's failing for multiple people on multiple platforms today. (BTW, Barry, your core.py does not fail under my Windows build.) I haven't had more time tonight to look at it, though, and won't tomorrow either. "Somehow or other" the list of objects it's crawling over is totally hosed (length 3, first one is a legit string object, last two seemingly random trash).
If I remembered how to do this with CVS, I'd just do a mindless binary search over the last day's patches, rebuilding until the problem goes away ...
OK, if I back out *only* Jeremy's patch to stringobject.c:
http://www.python.org/pipermail/python-checkins/2000-July/006424.html
all my Windows gc failures go away. I picked on that patch because it's the only non-trivial patch that's gone in recently to a popular part of the code. Jeremy, want to double-check your refcounts <wink>?
suggestive-but-not-yet-proven-ly y'rs - tim

[Peter Schneider-Kamp]
> I wouldn't bet my life on it, but I don't think the GC code is responsible.
[Neil Schemenauer]
> It's hard to say.
I'll second that <wink>.
> The GC code will probably pick up on a lot of problems because it touches many objects. On the other hand, it could be my bug. I'm pretty limited here as my computer is somewhere between Calgary and Reston as I type this. I will try to find the problem though.
I also doubt it's gc's fault: the recent patches to the gc code were absolutely vanilla, this all worked fine (at least for me) yesterday, and it's failing for multiple people on multiple platforms today. (BTW, Barry, your core.py does not fail under my Windows build.) I haven't had more time tonight to look at it, though, and won't tomorrow either. "Somehow or other" the list of objects it's crawling over is totally hosed (length 3, first one is a legit string object, last two seemingly random trash).
If I remembered how to do this with CVS, I'd just do a mindless binary search over the last day's patches, rebuilding until the problem goes away ...
OK, if I back out *only* Jeremy's very recent patch to stringobject.c:
http://www.python.org/pipermail/python-checkins/2000-July/006424.html
all my Windows gc failures go away. I picked on that patch because it's the only non-trivial patch that's gone in recently to a popular part of the code. Jeremy, want to double-check your refcounts <wink -- but it smells like an extra decref>?
suggestive-but-not-yet-proven-ly y'rs - tim
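
The "mindless binary search over the last day's patches" mentioned above can be sketched in a few lines. This is only an illustration, not what anyone in the thread ran: the patch names and the `is_broken` callback are hypothetical stand-ins for "apply these patches, rebuild, run make test", and the sketch assumes the tree is good with no patches applied, broken with all of them, and stays broken once the culprit is in.

```python
from typing import Callable, Sequence


def first_bad_patch(
    patches: Sequence[str],
    is_broken: Callable[[Sequence[str]], bool],
) -> str:
    """Return the first patch whose inclusion breaks the build.

    Assumes: no patches applied is good, all patches applied is bad,
    and the build stays bad once the culprit has been applied.
    """
    if not patches:
        raise ValueError("need at least one patch to search")
    lo, hi = 0, len(patches)  # invariant: patches[:lo] good, patches[:hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(patches[:mid]):
            hi = mid          # culprit is among the first `mid` patches
        else:
            lo = mid          # first `mid` patches are fine
    return patches[hi - 1]


# Hypothetical patch list, oldest first; the names are made up for the example.
day_of_patches = ["patch-a", "patch-b", "patch-c", "patch-d", "patch-e"]
tested = []  # records how many rebuilds the search needed


def fake_rebuild_and_test(applied: Sequence[str]) -> bool:
    """Stand-in for a real rebuild: 'patch-d' is the pretend culprit."""
    tested.append(len(applied))
    return "patch-d" in applied


culprit = first_bad_patch(day_of_patches, fake_rebuild_and_test)
```

With five patches the search needs only three rebuilds instead of up to five, which is the whole appeal when each rebuild plus test run is slow.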

"NS" == Neil Schemenauer nascheme@enme.ucalgary.ca writes:
NS> On Tue, Jul 11, 2000 at 01:05:10AM +0000, Peter Schneider-Kamp
NS> wrote:
> I wouldn't bet my life on it, but I don't think the GC code is responsible.
NS> It's hard to say. The GC code will probably pick up on a lot of
NS> problems because it touches many objects. On the other hand, it
NS> could be my bug.
My first guess would be something other than the GC. When I was working on the string_join fix this afternoon, a couple of my interim versions had refcount problems that led to core dumps in the garbage collector. The GC catches lots of memory problems because it's touching all the objects; that doesn't mean it's to blame for all those problems.
Which leads me to ask, Barry, did you run purify on an interpreter with the latest stringobject.c?
Jeremy

"JH" == Jeremy Hylton jeremy@beopen.com writes:
JH> Which leads me to ask, Barry, did you run purify on an
JH> interpreter with the latest stringobject.c?
Yes. And I did get other memory errors before the core, which I should have looked at more carefully.
Looks like that might have been it. My fault for encouraging you to check your changes in without looking at them, and then I had to disappear for a few hours. I've got them now and am looking at the code (I see a missing decref, but that should only leak memory).
Will Purify shortly. -Barry

"BAW" == Barry A Warsaw bwarsaw@beopen.com writes:
"JH" == Jeremy Hylton jeremy@beopen.com writes:
JH> Which leads me to ask, Barry, did you run purify on an
BAW> Looks like that might have been it. My fault for encouraging
BAW> you to check your changes in without looking at them, and then
BAW> I had to disappear for a few hours.
I was in a rush, too. I should have waited until after my softball game.
Jeremy

"JH" == Jeremy Hylton jeremy@beopen.com writes:
"NS" == Neil Schemenauer nascheme@enme.ucalgary.ca writes:
NS> It's hard to say. The GC code will probably pick up on a lot
NS> of problems because it touches many objects. On the other
NS> hand, it could be my bug.
It also appears to mask many problems. I just compiled out the gcmodule (which is sadly undocumented, I believe) and I've just gotten a number of new memory leaks. This is basically running a very simple Python script:
-------------------- snip snip --------------------
'-'.join(('one',))
'-'.join((u'one',))
-------------------- snip snip --------------------
Will investigate further, but at first blush, they might be real cycles created in the exception initialization code.
But anyway...
JH> Which leads me to ask, Barry, did you run purify on an
JH> interpreter with the latest stringobject.c?
...I'd actually expect the above script to leak *seq twice in the current CVS string_join(). It doesn't, but then neither does including the missing decrefs cause the objects to be decref'd extra times. Interesting. Still I think string_join() needs to be patched in two places to decref *seq (first, in the seqlen==1 clause, and later just before the return PyUnicode_Join()).
I'll check in a patch, but would appreciate a proofread.
-Barry
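
A rough way to look for the kind of leaked reference discussed above, from the Python side, is to compare an object's reference count before and after many calls. The sketch below is an assumption-laden illustration for a current CPython, not the Purify run from the thread: `refcount_delta` and `hoard` are hypothetical names, `sys.getrefcount` is CPython-specific, and the check can only show a leak (a missing decref). An extra decref tends to crash the interpreter rather than show up as a number.

```python
import sys


def refcount_delta(func, obj, rounds=1000):
    """Call func(obj) `rounds` times and return the change in obj's refcount.

    A steadily growing count suggests func keeps a reference it never
    releases, e.g. a missing decref in C code.
    """
    before = sys.getrefcount(obj)
    for _ in range(rounds):
        func(obj)
    after = sys.getrefcount(obj)
    return after - before


# Build the tuple at run time so it is an ordinary, freshly allocated object.
seq = tuple(["one"])

# A correct join should leave the sequence's refcount unchanged.
join_delta = refcount_delta("-".join, seq)

# For contrast, a function that deliberately holds on to every argument.
hoard = []
leak_delta = refcount_delta(hoard.append, seq, rounds=1000)
```

Here `join_delta` staying at zero says nothing about the 2000-era `string_join()`; it only shows the shape of the check.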

Barry A. Warsaw writes:
> It also appears to mask many problems. I just compiled out the gcmodule (which is sadly undocumented, I believe) and I've just gotten
Neil, Can you provide something for this module? Plain text is fine if you prefer; I can add the LaTeX markup.
-Fred
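
For readers wondering how a collector can "mask" leaks that turn out to be real cycles, here is a small sketch using today's `gc` and `weakref` modules. It assumes a current CPython; the `Node` class and `make_cycle` helper are hypothetical. With the collector switched off, two objects that refer to each other survive after the last outside reference is gone, and an explicit collection reclaims them.

```python
import gc
import weakref


class Node:
    """Plain container; instances can point at each other."""


def make_cycle():
    """Create two objects that reference each other, return a weak probe."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)  # does not keep `a` alive by itself


was_enabled = gc.isenabled()
gc.disable()  # roughly "compiling out" the collector for this experiment
try:
    probe = make_cycle()
    # Reference counting alone cannot free the pair: each keeps the other alive.
    alive_without_gc = probe() is not None
    gc.collect()  # an explicit pass finds and frees the unreachable cycle
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()
```

With the collector running normally the cycle would be cleaned up quietly, which is why leaks of this kind only become visible once it is turned off.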
participants (7)
- bwarsaw@beopen.com
- Fred L. Drake, Jr.
- Jeremy Hylton
- Mark Favas
- Neil Schemenauer
- Peter Schneider-Kamp
- Tim Peters