[Python-bugs-list] [Bug #113812] Serious garbage collection problems with 2.0b1
noreply@sourceforge.net
Thu, 14 Sep 2000 23:43:25 -0700
Bug #113812, was updated on 2000-Sep-07 10:10
Here is a current snapshot of the bug.
Project: Python
Category: Modules
Status: Open
Resolution: None
Bug Group: None
Priority: 8
Summary: Serious garbage collection problems with 2.0b1
Details: Since I installed version 2.0b1, I have run into a few (serious) problems
with a non-trivial application (80,000+ lines of Python). It seems that
they are all related to the new garbage collection:
- Suddenly, assertions fail because lists have become empty
while they shouldn't have. It looks like their elements have been
garbage collected while they were still reachable.
- Under other circumstances, I get coredumps which always
seem to happen in function "move_root_reachable" of gcmodule.c:
the "traverse" function pointer seems to contain a bogus address.
- When I run a Purified version of the interpreter, I can't reproduce
either problem, but instead, the garbage collector seems to get
stuck in an endless loop each time. This always happens in the
same function "move_root_reachable". Purify doesn't produce any
relevant warning.
The code runs just fine with version 1.6 of the interpreter, or when I
disable the garbage collector.
Probably relevant is the fact that the objects (tens of thousands) in my application
are very heavily cross-linked (nearly all links are bi-directional), which
probably puts a lot of stress on the garbage collector.
My platform: HP-UX 10.20 / c89 compiler
Follow-Ups:
Date: 2000-Sep-07 10:14
By: edg
Comment:
Just one more thing: when I turn on the gc debugging,
the interpreter also seems to get stuck in an endless loop.
-------------------------------------------------------
Date: 2000-Sep-07 14:41
By: jhylton
Comment:
If you set the GC threshold to 0,
    import gc
    gc.set_threshold(0)
Do you get the same problem? Not that I doubt there is some sort of gc problem, but I wonder if there is something going wrong in the accounting or in the collection.
How hard would it be for someone to try to reproduce this bug? Obviously, it would be helpful to get a smaller test case that has the same behavior as your large program and also tickles the bug.
Do you have any C extension modules in your application or is it pure Python?
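
For reference, a minimal sketch of the check being asked for. It is written against the gc module as it exists in current Python 3, which is an assumption; the module was brand new in 2.0b1. A threshold0 of 0 turns automatic collection off, while an explicit gc.collect() still works, so the two can be tested separately.

```python
import gc

saved = gc.get_threshold()      # current (threshold0, threshold1, threshold2)

gc.set_threshold(0)             # threshold0 == 0 disables automatic collection
auto_disabled = gc.get_threshold()[0] == 0

# An explicit collection still runs while automatic collection is off.
unreachable = gc.collect()      # number of unreachable objects found

gc.set_threshold(*saved)        # restore the previous settings
restored = gc.get_threshold()[0] == saved[0]
```

If the problem disappears with the threshold at 0 but comes back with an explicit gc.collect(), that points at the collection itself rather than at the allocation accounting.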
-------------------------------------------------------
Date: 2000-Sep-07 15:06
By: jhylton
Comment:
Please do triage on this bug.
-------------------------------------------------------
Date: 2000-Sep-08 01:33
By: edg
Comment:
The code is 100% pure Python.
When I set the threshold to 0, the problem doesn't occur.
The problem is probably very hard to reproduce by anyone else.
The slightest change in the input data for my application
can make the problem go away. Even running an optimized instead
of a debug version of the interpreter can make a difference.
I'm certainly going to try to strip it down, but that won't be
easy (the same application, using other input data, running
6 times as long, runs just fine).
-------------------------------------------------------
Date: 2000-Sep-11 07:48
By: edg
Comment:
After 2 full days of debugging, I think that I finally found the
cause for the gc problems.
To keep a (very) long story short, what I found out was the following:
- Crashes were due to objects that were destructed twice.
- Endless loops were due to messed-up gc generation lists. The lists should
always remain perfectly circular, but sometimes they ended up like this:
list <-> ... <-> X <-> ... <-> ... -> X
No wonder that the gc code could easily get stuck; even turning on
gc debugging caused the counting code to run in circles forever.
The multiple-destruction problem is almost certainly caused by lists being
messed up.
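
To make the list corruption concrete, here is a small Python model of a circular doubly-linked list, with a check that walks it with a step limit instead of looping forever. This is only a sketch: GCHead, append, remove and is_circular are hypothetical names, not the structures in gcmodule.c, and clearing the pointers on removal is an assumption about what a debugging build might do.

```python
class GCHead:
    """Hypothetical stand-in for a gc list node (not the real C struct)."""
    def __init__(self, name):
        self.name = name
        self.next = self    # an empty list is a head pointing at itself
        self.prev = self

def append(head, node):
    """Link node in just before head, i.e. at the tail of the ring."""
    tail = head.prev
    tail.next = node
    node.prev = tail
    node.next = head
    head.prev = node

def remove(node):
    """Unlink node; pointers are cleared so a second removal is detected."""
    if node.next is None:
        raise RuntimeError("node %s is not in any list" % node.name)
    node.prev.next = node.next
    node.next.prev = node.prev
    node.next = node.prev = None

def is_circular(head, limit=1000):
    """True only if a forward walk returns to head with matching back links."""
    node = head
    for _ in range(limit):
        if node.next is None or node.next.prev is not node:
            return False
        node = node.next
        if node is head:
            return True
    return False    # never came back to head: a naive walk would not end

head, a, x, b = GCHead("list"), GCHead("A"), GCHead("X"), GCHead("B")
for node in (a, x, b):
    append(head, node)
ok_before = is_circular(head)   # well-formed ring

b.next = x                      # corrupt it: list <-> A <-> X <-> B -> X
ok_after = is_circular(head)    # the walk no longer closes on head
```

A counting loop that simply follows next until it reaches head again would run forever on the corrupted list, which matches the endless loops described above.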
By turning on the debugging code in gc_list_remove and reducing the gc
threshold to a very small value, I could trigger the crash more
reliably, which allowed me to strip down my 80000 line application to this:
-----------------------------------------------------------------------------
#
# Note: to trigger a crash reliably, the debugging code in gc_list_remove
# _must_ be turned on.
#
import gc
gc.set_threshold(1)
class Node:
    def __del__(self):
        dir(self)
a = Node()
del a # -> Crash
-----------------------------------------------------------------------------
You wonder: can it be that simple ? :-)
This is what happens when the Node instance `a' is destructed:
1) The Node instance is removed from the gc lists.
2) An instance method is created due to the call of the __del__ method.
THAT METHOD CREATES A NEW REFERENCE TO THE INSTANCE !
3) The code in the __del__ method triggers the allocation of new
objects and because the gc threshold has been set very low, it also
triggers a gc run.
4) During the gc run, the instance method is encountered and its reachable
objects are visited.
5) Since the instance is referenced by the method, the gc code tries to move
the instance to another list, while it was no longer present in any list
-> BINGO
The reason this problem was so hard to reproduce is that most classes
don't have a __del__ method, and the problem only occurs when a gc run
happens during the execution of a __del__ method.
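
Step 2 is the key one: a bound method object holds its own reference to the instance. The sketch below shows that in current CPython 3, which is an assumption: in 2.0 the attribute was im_self rather than __self__, and sys.getrefcount() is CPython-specific. The method name is made up for the illustration.

```python
import sys

class Node:
    def method(self):   # hypothetical method, stands in for __del__
        pass

a = Node()
before = sys.getrefcount(a)

bound = a.method                  # creates a bound method object
after = sys.getrefcount(a)

holds_instance = bound.__self__ is a   # the method points back at the instance
extra_refs = after - before            # references added by the bound method
```

Because the bound method is itself a container the collector can traverse, a collection that runs while it is alive will reach the instance through it.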
I think the fix is as simple as this (I'm not too confident, but it
seems to work):
------------------------------------------------------------------------------
*** Objects/classobject.c.orig Mon Sep 11 15:55:03 2000
--- Objects/classobject.c Mon Sep 11 16:12:26 2000
***************
*** 490,496 ****
#ifdef Py_TRACE_REFS
extern long _Py_RefTotal;
#endif
- PyObject_GC_Fini(inst);
/* Call the __del__ method if it exists. First temporarily
revive the object and save the current exception, if any. */
#ifdef Py_TRACE_REFS
--- 490,495 ----
***************
*** 523,529 ****
#ifdef COUNT_ALLOCS
inst->ob_type->tp_free--;
#endif
- PyObject_GC_Init((PyObject *)inst);
return; /* __del__ added a reference; don't delete now */
}
#ifdef Py_TRACE_REFS
--- 522,527 ----
***************
*** 537,542 ****
--- 535,541 ----
#endif /* Py_TRACE_REFS */
Py_DECREF(inst->in_class);
Py_XDECREF(inst->in_dict);
+ PyObject_GC_Fini(inst);
inst = (PyInstanceObject *) PyObject_AS_GC(inst);
PyObject_DEL(inst);
}
------------------------------------------------------------------------------
i.e., delay the removal from the gc list until everything has stabilized.
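
The effect of the reordering can be illustrated with a toy model in Python. This is a sketch under loud assumptions: ToyCollector and dealloc are hypothetical, a set stands in for the gc generation lists, and the "collection" only checks whether the object it is asked to move is still tracked. It is not how classobject.c or gcmodule.c are written.

```python
class ToyCollector:
    """Toy model: the set `tracked` stands in for the gc generation lists."""
    def __init__(self):
        self.tracked = set()
        self.errors = []

    def track(self, obj):       # plays the role of PyObject_GC_Init
        self.tracked.add(obj)

    def untrack(self, obj):     # plays the role of PyObject_GC_Fini
        self.tracked.discard(obj)

    def collect(self, reachable):
        # Moving an object between lists is only valid if it is in a list.
        for obj in reachable:
            if obj not in self.tracked:
                self.errors.append("moved untracked object %r" % (obj,))

def dealloc(gc_model, obj, untrack_first):
    if untrack_first:
        gc_model.untrack(obj)   # original order: remove, then run __del__
    gc_model.collect([obj])     # a gc run inside __del__ reaches obj
    if not untrack_first:
        gc_model.untrack(obj)   # patched order: remove at the very end

buggy = ToyCollector()
buggy.track("node")
dealloc(buggy, "node", untrack_first=True)

fixed = ToyCollector()
fixed.track("node")
dealloc(fixed, "node", untrack_first=False)
```

In this model the original order records one error, because the collection touches an object that has already left the lists, while the patched order records none.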
I hope this helps.
-------------------------------------------------------
Date: 2000-Sep-14 23:43
By: nascheme
Comment:
Your analysis looks correct. Great work. I think there is a small problem with your fix, however. You should call PyObject_GC_Fini() before the DECREFs on in_class and in_dict. If, for some reason, decrementing the reference counts of these objects causes a garbage collection, then the instance could still be on the gc lists and have an invalid in_class or in_dict pointer.
-------------------------------------------------------
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=113812&group_id=5470