[Python-3000] Delayed reference counting idea

Adam Olsen rhamph at gmail.com
Mon Sep 18 22:34:44 CEST 2006


On 9/18/06, Raymond Hettinger <rhettinger at ewtllc.com> wrote:
> [Adam Olsen]
> > I don't like the idea of a conservative GC at all in general, but
> > Boehm GC seems to have very good quality, and it's easy to use from
> > the point of view of a C API.

This was Marcin, not me ;)


> Several thoughts:
>
> * An easier C API would significantly benefit the language in terms of
> more extensions being available and in terms of increased reliability
> for those extensions.  The current refcount scheme results in pervasive
> refleak bugs and subsequent, interminable bughunts.  It adds to code
> verbosity/complexity and makes it tricky for beginning extension writers
> to get their first apps done correctly.  IOW, I agree that GC without
> refcounts will make it easier to write good C code.
>
> * I doubt the anecdotal comments about Boehm GC with respect to
> performance.  It may be better or it may be worse.  While I think the
> latter is more likely, only an implementation patch will tell the tale.

I have played with it before, on the CPython codebase.  I really can't
imagine it getting more than a minor speed boost, or else we'd already
be finding that refcounting was taking up a large portion of our CPU
time.  (Anybody have actual numbers on the time spent in malloc/free?)

The real advantage of Boehm is with threading.  Avoiding the locking
means you don't get the giant penalty you'd otherwise get.  It's still
not inherently faster than a single-threaded program (which needs no
locking).

I discount Boehm because of the complexity and non-standardness
though.  I'd never want to maintain it, especially since it would
affect all the libraries we link to as well.  Although, with suitable
proxying, it may be possible to limit it to just Python objects.

If I were to seriously consider a Python implementation with a tracing
GC, I'd want it to be a moving GC, to fix the high-water mark problem
of malloc.  That seems incompatible with conservative GCs such as
Boehm, although, come to think of it, I could do it using
standard-conforming C (if any API rewrite were permissible).


> * At my company, we write real-time apps that benefit from the current
> refcounting scheme.  We would have to stick with Py2.x unless Boehm GC
> can be implemented without periodically killing responsiveness.

Boehm does have options for incremental GC.


> [Barry Warsaw]
> > What worries me is the unpredictability of gc vs. refcounting.
> > For some class of Python applications it's important that when
> > an object is dereferenced it really goes away right then.
> > I /like/ reference counting!
>
> No doubt that those exist; however, that sort of design is somewhat
> fragile and bugprone leading to endless sessions to find-out who or what
> is keeping an object alive.  This situation can only get worse when
> new-style classes become the norm.  Also, IIRC, bugs involving __del__
> have been one of the more complex, buggy, and dark corners of the
> language.  Statistics incontrovertibly prove that people who habitually
> avoid __del__ lead happier lives and spend fewer hours in therapy ;-)

I agree here.  I think an executor approach is much better; kill the
object, then make a weakref callback do any further cleanups using
copies it made in advance.


-- 
Adam Olsen, aka Rhamphoryncus

