[Python-bugs-list] [ python-Bugs-742911 ] Memory fault on complex weakref/weakkeydict delete
SourceForge.net
noreply@sourceforge.net
Tue, 10 Jun 2003 10:05:29 -0700
Bugs item #742911, was opened at 2003-05-24 17:29
Message generated for change (Comment added) made by jacobs99
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=742911&group_id=5470
Category: Python Interpreter Core
Group: Python 2.2.2
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Mike C. Fletcher (mcfletch)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Memory fault on complex weakref/weakkeydict delete
Initial Comment:
Attached find two modules which together form a
test-case. The cache.py file is ripped out of a
production system (OpenGLContext), and I am seeing
memory faults under both Python 2.2.2 and 2.2.3 when I
run the code. Under 2.2.2 while single-stepping
through the code I was able to provoke an error-message:
Fatal Python error: GC object already in linked list
The error message doesn't show up under 2.2.3, but the
memory-fault does.
Modules here don't use any extension modules, so there
shouldn't be any loose memory references or the like.
Note, you'll likely need to make weakkeydictionary's
__delitem__ use keys instead of iterkeys to even get to
the crashing.
----------------------------------------------------------------------
Comment By: Kevin Jacobs (jacobs99)
Date: 2003-06-10 12:05
Message:
Logged In: YES
user_id=459565
The fix checked in to solve this problem causes a potentially
more serious one in Bug 751998. In short, it seems that
data descriptors become unavailable in __del__ methods,
preventing finalization from completing. I have yet to figure
out why, though I have isolated the problem to revision 2.234
of typeobject.c. I've not looked to see if this also affects
Python 2.2.3, though I wouldn't be too suprised if it does.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-06-01 00:21
Message:
Logged In: YES
user_id=31435
Which service pack <wink>? Thanks for reporting this in
time for 2.2.3, Mike.
----------------------------------------------------------------------
Comment By: Mike C. Fletcher (mcfletch)
Date: 2003-05-31 14:21
Message:
Logged In: YES
user_id=34901
2.2.3 fixes the original problem on Win2k.
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2003-05-29 16:35
Message:
Logged In: YES
user_id=33168
2.3 works for me with the original, but didn't before.
Redhat 9.
----------------------------------------------------------------------
Comment By: Jeff Epler (jepler)
Date: 2003-05-29 15:47
Message:
Logged In: YES
user_id=2772
Just one question: does the final test-case cover the
original bug? I have a redhat 9 system with Python 2.2.2,
and temp2.py runs just fine:
$ python2 temp2.py
<__main__.Oops object at 0x8162ecc>
$
The Python version is:
Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-4)] on linux2
I have not tried any other version of the test-case. I also
didn't try a non-redhat version, so it's barely possible
that some patch they apply affected this issue.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-05-29 09:53
Message:
Logged In: YES
user_id=31435
Just changed Resolution to Fixed. Thanks everyone!
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2003-05-29 09:35
Message:
Logged In: YES
user_id=6380
Thanks to Tim for the analysis, to Neal for the simplified
test, and to Mike for the bug report! The fix was actually
simple: clear the weak ref list *before* calling __del__ or
clearing __dict__. This is the same order as used by classic
classes, so can't be wrong. Applied to 2.2.3 branch and 2.3
trunk, with test case.
----------------------------------------------------------------------
Comment By: Mike C. Fletcher (mcfletch)
Date: 2003-05-28 20:16
Message:
Logged In: YES
user_id=34901
To give timeline: This is from real-life code (PyOpenGL's
OpenGLContext demo/teaching module). It's currently about
to go beta, with a release date target of end-of-summer.
I've worked around the problem for the most common case w/in
the system (exiting from the VRML browser), but that
work-around's not usable for end-user long-running projects
until the users can deallocate the scenegraph to load
another (i.e. go from world to world within a single run of
the application).
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-05-28 18:26
Message:
Logged In: YES
user_id=31435
That's cool. The analysis is much easier to follow if you run
Neal's little program, which crashes right away in a debug
build. Then the analysis just follows the call stack, and it's
instantly obvious (if you know to look for it <wink>) that
we're trying to deallocate a single object twice.
One thing subtype_delloc does that instance_dealloc
doesn't is clear the instance dict before clearing the
weakrefs. Clearing the instance dict can end up executing
anything, and in this case does, in particular materializing a
strong reference to the object we're in the middle of
deallocating (via a weakref that hasn't been cleared). That
seems to be the key point.
Mike found the problem in 2.2.2, I believe in his real-life
OpenGL code. That's why I'm keen to see it fixed for 2.2.3.
I'll be in the office Thursday, BTW.
Ah, here, I'll attach a simpler program that has the same
kind of problem (temp2.py).
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2003-05-28 17:16
Message:
Logged In: YES
user_id=6380
Tim, let's look at this when you're back in the office. My
head spins from just reading the analysis below.
Note that this is a 2.2 and 2.3 bug. I don't necessarily
want to hold up the 2.2.3 release until this is fixed,
unless we have a quick breakthrough.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-05-24 23:49
Message:
Logged In: YES
user_id=31435
Assigned to Guido, because I suspect the problem lies in
the order subtype_dealloc does things.
With reference to Neal's whittled-down case: when
makeSome() ends, we decref i and then decref item. item's
refcount hits 0 then. There's a weakref remaining to item
(in CacheHolder.client), but subtype_dealloc doesn't clear
the weakref at this point. First it clears item's instance
dict. That contains the last strong reference to i.
subtype_dealloc is inovked again, and clears i's instance
dict, and then deals with the weak reference to i. The
weakref to i has a callback associated with it, and
CacheHolder.__call__() is invoked. That invokes self.client
(), still a weakref to item, and because item's weakrefs still
haven't been dealt with, self.client() returns item.
Now we're hosed. item *had* a refcount of 0 at this point,
and is still in the process of getting cleaned out by the first
call to subtype_dealloc (it still thinks it's clearing item's
instance dict). We already called _Py_ForgetReference on
item when its refcount fell to 0. Its refcount gets boosted
back to 1 by virtue of item getting returned by the
self.client() weakref call. Cleaning out the frame for
CacheHolder.__call__() knocks the refcount down to 0
again, and the second attempt to call _Py_ForgetReference
on it blows up.
In a release build, nothing visibly bad happens when I try
it. It crashes if I add
print client.items
at the end of __call__ in a release-build run, though. Looks
like that's just the luck of the draw after things have gone
insane.
I note that instance_dealloc deals with weakrefs much
earlier in the process, so that when Neal makes Items a
classic class instead, the eventual call to self.client()
returns None (instead of items), and nothing bad happens..
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-05-24 21:53
Message:
Logged In: YES
user_id=31435
Outstanding, Neal -- thanks! I can confirm that this crashes
in a current 2.3 debug build on Windows too. I'm feeling
sick and won't pursue it now, though. When cleaning up
from the call to makeSome() (the body of makeSome() has
completed, and we're cleaning up its execution frame,
decref'ing the locals), we're dying in _Py_ForgetReference
with a NULL-pointer derefernce. The refcount on an Items
object has just fallen to 0, and the code is trying to verify
that the object is in the debug-build "list of all objects". But
its prev and next pointers are both NULL -- it's not in the
list, and simply trying to check that it is blows up.
I don't have a theory for what's causing this, but it's
probably not a good thing <heh>.
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2003-05-24 18:31
Message:
Logged In: YES
user_id=33168
I cut out a lot of stuff from the test. The new file is
much more minimal, but still crashes for me in a 2.3 debug
build. You only need the one file too (not both files).
There is an issue with new style classes. If Items doesn't
derive from object, I don't get a crash.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=742911&group_id=5470