Reference counting garbage collection

Paul Rubin phr-n2001 at
Thu Aug 23 05:46:24 CEST 2001

"Tim Peters" < at> writes:
> > Is there any particular reason Python uses reference counting garbage
> > collection
> + 100% portable out of the box, even to bizarre platforms.

I don't see any conflict between portability and other GC methods,
aside from those that play with the machine's memory-protection
hardware.

> + Simple M&S is slower given Python's extremely high dynamic memory
> throughput (*everything* in Python is heap-allocated, from execution frames
> to ints, and RC is cache-friendly by nature, eager to reuse a piece of
> memory ASAP; just picture "sum += i" in a loop, while keeping
> "*everything's* heap-allocated" in mind).

Yes, simple M&S isn't appropriate except for small systems; a
generational scheme is indicated here.  However, maybe some future
implementation can avoid doing so much heap allocation.  At that point
it should be possible to compile Python into pretty good machine code.
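Tim's point about eager reuse is easy to observe from Python itself.  A
minimal sketch (CPython-specific, since `sys.getrefcount` only means
anything under a refcounting implementation; note it reports one extra
count for its own argument):

```python
import sys

x = []                      # a fresh object with one real reference, "x"
base = sys.getrefcount(x)   # baseline (includes getrefcount's temp reference)

y = x                       # binding another name bumps the count by one
print(sys.getrefcount(x) - base)   # 1

del y                       # dropping a name decrements immediately;
print(sys.getrefcount(x) - base)   # 0 -- at zero, memory is freed at once
```

The "freed at once" behavior is exactly what makes RC cache-friendly:
the just-released block is the first candidate for the next allocation.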

> + Sharing data with random C, Fortran, Java etc programs is a godawful mess
> when GC decides to move things around.  Python never moves objects, and
> that's a major blessing when integrating with foreign systems.

Yes, good point.

> > (which leaks memory if there's circular structure)
> Recent Pythons collect those.

OK, that was my main concern.  I started writing an LRU cache in the
obvious way (with the entries on a circular linked list) then started
worrying about whether it would ever get GC'd without special attention.
If this is fixed I can stop worrying :).
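The worry can in fact be checked directly.  A sketch of the cycle in
question (the `Node` class here is my own illustration, not from any
real LRU cache), showing that recent CPythons reclaim it via the `gc`
module even though the refcounts never reach zero:

```python
import gc
import weakref

class Node:
    """One entry on a circular doubly linked list."""
    def __init__(self, key):
        self.key = key
        self.prev = self.next = None

# Build a two-node ring, as an LRU cache's list might.
a, b = Node("a"), Node("b")
a.next = b; b.prev = a
b.next = a; a.prev = b

probe = weakref.ref(a)   # observe collection without keeping "a" alive
del a, b                 # refcounts stay at 1 each: the cycle holds them

gc.collect()             # the cycle detector reclaims the whole ring
print(probe() is None)   # True: the nodes are gone
```

Without `gc.collect()` (or the automatic collection it stands in for),
pure reference counting would leak this ring forever.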

> > instead of Lisp-style garbage collection?
> Afraid that doesn't mean anything specific without reference to a specific
> Lisp implementation; you seem to have compacting M&S in mind, perhaps
> generational or perhaps not, but, regardless, the seminal role Lisp played
> in driving GC algorithms means there's virtually nothing that isn't "Lisp-
> style" for *some* value of "Lisp".

By Lisp-style GC, I mostly just mean GC that handles circular
structure properly.  At another level, it means GC that doesn't require
a lot of programmer attention when writing C extensions, as reference
counting does.  But I certainly wasn't thinking specifically of M&S.

> Cycles are irrelevant now.  While it *is* a PITA to worry about refcounts,
> it would be a bigger PITA to worry about memory moving.

This is valid.

> > Also, using a compacting GC helps localize memory references, speeding
> > up the program by improving cache performance.
> If you dig for solid evidence that this is true in real programs, you may be
> surprised at how little you find.  Try this for a starting point on why
> deeper analysis suggests life is more complicated than picturing easy worst
> cases (notwithstanding that I invited you to picture one above <wink>):

Thanks, that's an interesting paper but I don't know what to conclude
from it.  Zorn's measurements started at Berkeley in the '80s.  Back
then, a typical workstation was a 16 MHz 68000 with 80 nsec memory
access time.  These days, a typical workstation is a 1 GHz Athlon with
50 nsec memory access time.  So today's CPU is > 100x faster (more
parallelism in addition to higher frequency) but memory is only
slightly faster.  So the cycle cost of a cache miss has gone up
tremendously--you can execute 50 or more instructions in the time it
takes to read something from DRAM on a cache miss.  This -has- to have
some effect on the cost of memory fragmentation.
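That ratio is easy to sanity-check.  A rough sketch using the round
figures above (assumed for illustration, not measurements):

```python
# Figures quoted above: a 16 MHz machine with 80 nsec memory access
# vs. a 1 GHz machine with 50 nsec memory access.
old_cycle_ns = 1000 / 16    # 62.5 nsec per cycle in the 80s
new_cycle_ns = 1000 / 1000  # 1 nsec per cycle today
old_miss_ns, new_miss_ns = 80, 50

print(old_miss_ns / old_cycle_ns)  # ~1.3 cycles stalled per miss then
print(new_miss_ns / new_cycle_ns)  # 50.0 cycles stalled per miss now
```

So a miss that once cost about one instruction's worth of time now
costs around fifty, which is the "50 or more instructions" figure above.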

> > So maybe the GC implementation should be revisited at some point.
> It's been revisited several times by several people.  The only thing the
> implemented alternatives to date have consistently had in common is worse
> performance.
> ain't-nothin'-obvious-about-gc-strategies-ly y'rs  - tim

That's for sure--there's a large literature of GC research that goes
back for decades.  It's not appropriate to ignore everything that's
been tried.

More information about the Python-list mailing list