[Python-bugs-list] [ python-Bugs-429570 ] GC objects are tracked prematurely

noreply@sourceforge.net noreply@sourceforge.net
Tue, 04 Sep 2001 11:26:53 -0700


Bugs item #429570, was opened at 2001-06-02 04:40
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=429570&group_id=5470

Category: Python Interpreter Core
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Ernst Jan Plugge (rmc)
Assigned to: Neil Schemenauer (nascheme)
Summary: GC objects are tracked prematurely

Initial Comment:
Under certain circumstances, the Python garbage
collector tracks objects that haven't yet been added
to the GC chain. This causes hard-to-debug segfaults
when GC occurs.

I've reduced the problem to this minimal case: a small
extension module with one extension type that holds a
reference to another object. The single module method
creates two objects, lets them point to each other, and
returns one of them. This creates a cycle to be broken
by the GC.

In the attached source, the second PyObject_GC_Init() 
call is done after the references have been set up. If 
called in a 'while 1: a = trouble.createT( "foo" )' 
loop, it segfaults as soon as a GC cycle is performed 
while the method is still setting up the objects.

If the PyObject_GC_Init() call is moved up to 
immediately after the obj field is set to Py_None, no 
segfault occurs.
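
A loop like the one quoted above reaches the collector far more often if
the generation-0 threshold is lowered first. The following is only a
sketch under stated assumptions, not the attached test case: `Node` and
`create_pair` are hypothetical pure-Python stand-ins for the extension
type and for trouble.createT, and the code targets a current Python 3
interpreter rather than 2.1.

```python
import gc


class Node:
    """Hypothetical stand-in for the extension type: holds one reference."""

    def __init__(self):
        self.obj = None


def create_pair():
    """Build two objects that point to each other and return one of them."""
    a, b = Node(), Node()
    a.obj = b
    b.obj = a
    return a


def stress(iterations=1000):
    """Create many cycles while automatic collection runs as often as possible."""
    old = gc.get_threshold()
    gc.set_threshold(1)  # lowest non-zero threshold; 0 would disable collection
    try:
        for _ in range(iterations):
            create_pair()  # the result is dropped, leaving a garbage cycle
    finally:
        gc.set_threshold(*old)  # always restore the original thresholds
    return gc.collect()  # number of unreachable objects found by a final pass
```

With a real extension module, the call to `create_pair()` would be replaced
by the module's own constructor, so that a collection is likely to run while
that constructor is still wiring up its objects.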

I believe this should not happen, because one should be
free to modify GC-enabled objects as long as they
haven't been added to the GC chain. If it is in fact
working as designed, this should be documented.

I'm using Python 2.1 on Linux/Intel. 2.0 has the same 
problem, but it shows in slightly different places. 
2.1 made it easier to create this simple minimal case.

It took a few long, frustrating days to track this one 
down, because it didn't show up until the module was 
up to about 20000 twisty lines of code, all 
interconnected... :-(



----------------------------------------------------------------------

>Comment By: Neil Schemenauer (nascheme)
Date: 2001-09-04 11:26

Message:
This problem has been fixed by recent changes to the GC.  It's
now acceptable to have container objects that are reachable
by the collector but are not tracked.  Also, collection is
now triggered by PyObject_GC_New or PyObject_GC_NewVar, not by
PyObject_GC_Track.
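
The difference between an object being reachable and being tracked can
be observed from Python. This is a small sketch assuming a current
CPython, where gc.is_tracked() is available (it postdates this report);
`tracking_report` is a hypothetical helper name.

```python
import gc


def tracking_report(*objs):
    """Return (type name, is tracked by the collector) for each object."""
    return [(type(o).__name__, gc.is_tracked(o)) for o in objs]


# A list is a container and is tracked; ints and strings cannot hold
# references to other objects and are never tracked.
report = tracking_report([], 0, "a")
```

Here `report` is `[("list", True), ("int", False), ("str", False)]`.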

----------------------------------------------------------------------

Comment By: Ernst Jan Plugge (rmc)
Date: 2001-08-21 04:52

Message:
The cause of the segfault in my original report is the
collision of two issues: first the fact that, at GC time,
all reachable containers must be tracked, and second the
fact that merely adding an object to the GC chain may
trigger a GC sweep. The latter is counter-intuitive to me at
least; I expected GC to occur only at object allocation
time because the GC module doc mentions that GC is triggered
by the number of allocations exceeding a threshold.

The combination of these two issues is what caused the
problem in the first place, and although I had read the docs
reasonably carefully, I missed the connection. Therefore,
I still think this should be mentioned in the docs, for
example by listing the conditions that may trigger a GC
sweep and referring to that list from the text that
documents the restriction that containers reachable from
tracked objects must also be tracked.
Of course, if the restriction has been lifted, the issue has
become academic anyway.
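
One way to rule out an automatic sweep while a structure is only half
built is to switch the collector off for that stretch of code. The
sketch below assumes a current Python 3; `gc_paused` is a hypothetical
helper, and whether disabling the collector also suppressed the sweep
triggered from the 2.1 C API is not something this thread confirms.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Suspend automatic collection for the duration of a with-block."""
    was_enabled = gc.isenabled()
    gc.disable()  # explicit gc.collect() calls still work while disabled
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()  # restore only if it was enabled to begin with


# gc.get_threshold() reports the counts that trigger automatic sweeps.
thresholds = gc.get_threshold()

with gc_paused():
    # Build the partially initialised structure here.
    paused_state = gc.isenabled()  # False inside the block
```

The helper restores the previous state rather than enabling the collector
unconditionally, so it is safe to nest.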



----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2001-08-20 13:41

Message:
The API document says:

  Any container which may be referenced from another object
  reachable by the collector must itself be tracked by the
  collector [...]

Can I close this bug or should the documentation be yet
more explicit?  Note that this restriction is lifted by my
"GC API cleanup" patch (i.e. it's okay to have an object
with the GC flag set that is reachable from a tracked
object but is not itself tracked).


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-06-02 23:06

Message:
Assigned to Neil.  The docs should at least be more
explicit that when a container C is added to the GC
list via PyObject_GC_Init(), all objects reachable from C
then and until PyObject_GC_Fini() is called must either (a)
not participate in GC at all, or (b) themselves have been
PyObject_GC_Init()'ed before becoming reachable from C.

It looks like this particular case would not have blown up
if, in _PyGC_Insert, op were added to generation0 *before*
checking to see whether collection should run.  Would that
be more robust?  Perhaps in a case where A references B
references A, but if the cycle being created were longer
than that, this style of programming would still lead to
problems.

Ernst, independent of all that, when doing

x = y;

get into the rigid habit of incref'ing y before decref'ing 
x.  Sooner or later they're going to point to the same 
object without you realizing it, and then decref'ing x 
first can leave y pointing at trash.
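
The ordering hazard can be modelled without any C at all. This is a toy
sketch, not CPython's implementation: `Obj`, `Holder`, `incref` and
`decref` are hypothetical names, and the `freed` flag merely stands in
for memory being released when a count reaches zero.

```python
class Obj:
    """Toy object with a manual reference count."""

    def __init__(self):
        self.refcnt = 1      # the creator's reference
        self.freed = False


def incref(o):
    o.refcnt += 1


def decref(o):
    o.refcnt -= 1
    if o.refcnt == 0:
        o.freed = True       # models the memory being released


class Holder:
    """Owns one reference, stored in self.slot (the 'x' in x = y)."""

    def __init__(self, value):
        incref(value)
        self.slot = value

    def set_unsafe(self, value):
        decref(self.slot)    # may free the very object value refers to
        incref(value)
        self.slot = value

    def set_safe(self, value):
        incref(value)        # take the new reference first
        old = self.slot
        self.slot = value
        decref(old)          # only then drop the old reference


y = Obj()
holder = Holder(y)           # refcnt is now 2
decref(y)                    # creator lets go; the holder keeps it alive

holder.set_safe(holder.slot)     # same object on both sides: still alive
alive_after_safe = not y.freed

holder.set_unsafe(holder.slot)   # same object again: freed while in use
freed_after_unsafe = y.freed
```

In this model `alive_after_safe` and `freed_after_unsafe` both end up true:
the safe ordering survives a self-assignment, while the unsafe ordering
frees the object that the slot is about to point at.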

----------------------------------------------------------------------
