Misc questions about type objects and Python 3.0

Wed Oct 9 12:37:31 EDT 2002

"David Brown" <david at no.westcontrol.spam.com> writes:

>  As far as I
> know, the only other way to do this is to regularly scan through all current
> pointers to see what's in use - if there are any objects around that aren't
> pointed to, they can be garbage collected.

Yes, more or less.

For example, the mark-and-sweep method works by traversing the
web of objects, setting a bit in every object reached (marking
them), and when that is finished, work through *all* objects,
deleting those that weren't marked (sweeping).

But there are lots of other methods and variations too.

> This has got to be a much less efficient method.

Not necessarily.  You are likely underestimating the overhead of
reference counting.  For example, think about this little code
snippet:

	y = 17
	z = 69
	x = y + z

Lets look at the last line.  First the values of y and z are
extracted, and the reference counts of 17 and 23 are incremented.
Then the __add__ method of 17 is located, and its refcount is
incremented (you don't want it removed during its execution,
which could happen if the __add__ method called some other code
which eventually removed all other references to the int class).
Then int.__add__ is called.  It calculates the value 86, creates
(or finds) the object 86, increments its refcount, and returns
it.  The refcounts of 17, 69, and the int.__add__ method, can now
be decremented.  86 is bound to x, incrementing the refcount of
86.  86 is finally removed from the stack, decrementing its
refcount.  That's a total of nine (9) refcount manipulations.
And the *only* object that has a different refcount than we
started with, is 86; eight of those changes were in effect
unnecessary.

Now, that is admittedly a rather naive implementation.  The last
two changes of the refcount of 86 could for example easily be
optimized away, leaving us with "only" six unnecessary refcount
changes.  But the int value 86 is, if I remember correctly,
immortal in C-Python, and it is therefore really unnecessary to
change its refcount too...

A more intelligent implementation could probably optimize away a
couple more of those refcount changes for this specific case, but
in the general case you can't always do that.

There are several GC methods that can be decidedly faster than
reference counting.  A google search should give you some reading
material.  http://www.cs.utexas.edu/users/oops/papers.html seems
like a good start; especially Paul R Wilson's "Uniprocessor
Garbage Collection Techniques" is supposed to be a good read.

Besides, C-Python version 2 already implements true GC.  Having
*two* methods for GC running simultaneously, is *definitely*
wasteful.

-- 
Thomas Bellman,   Lysator Computer Club,   Linköping University,  Sweden
"When C++ is your hammer, everything         !  bellman @ lysator.liu.se
 looks like a thumb."                        !  Make Love -- Nicht Wahr!