Reference counting garbage collection
tdelaney at avaya.com
Thu Aug 23 05:19:04 CEST 2001
> From: Paul Rubin [mailto:phr-n2001 at nightsong.com]
> "Delaney, Timothy" <tdelaney at avaya.com> writes:
> > I believe another major reason is that adding "real" GC
> would cause problems
> > for C extensions. All Python's cycle collector does is
> break the links
> > forming the cycle - the reference counting semantics then
> handle the rest.
> Actually, conventional GC is simpler for C extensions since the C
> extensions for the most part don't have to think about the GC.
> In the current situation we see a lot of reference counting bugs
> in C extensions.
That depends on the GC implementation of course. Any implementation which
moves memory would cause *major* problems for C extensions.
> > Java has a "real" GC - and this often leads to problems -
> in particular,
> > programs being relatively fast until a memory threshold is
> reached - and
> > then the GC starts kicking in and things really start
> slowing down in a
> > non-deterministic way.
> That's all a matter of how the GC is implemented. There are certainly
> realtime GC algorithms that have been used in some Lisp and
> Java systems.
Very true - however, there are very few deterministic garbage collectors
around (any?). There are a few more time-bounded collectors (what you are
referring to as a realime GC system) which guarantee that they will return
control within a set period of time.
All GC implementations have major tradeoffs - these usually involve being
non-deterministic, prone to "clumping" or having much higher memory
requirements (choose at least 2). Generational collectors go some way to
avoiding the "clumping" effect, time-bound collectors go much further
towards this goal. Interruptible collectors may find themselves attempting
to collect the same piece of garbage many times over if the system is
particularly busy. Copy collectors suffer from extreme memory overhead.
Conservative M&S will likely miss some garbage, non-conservative is much
less likely to, but usually take longer.
One of the major problems with *all* non-reference-counting GC systems is
that the point when you need GC to be working best is when it starts working
worst. When a system becomes busy, with rapid allocations and deallocations
of memory, there tends to be less time to perform garbage collection.
Python's reference counting GC implementation seems to me a good tradeoff -
some slowdown (but in a deterministic manner), some memory overhead
(constant per object). It does suffer somewhat from clumping, but no more
than objects allocated on the stack in C++ when they go out of scope.
> I guess that might be the case, at least with the current body of
> Python programs. I'm thinking of Python's implementation in terms of
> whether it's suitable for large, long-running programs, and it's
> not clear whether anyone really writes those in Python.
It depends entirely on the requirements of the system. However, I would bet
on Python's memory management for large, long-running programs over any of
the garbage collectors in common usage today - it is less likely to suffer
as the system gets bigger.
> The combination of RC and GC (or cycle breaking) that's been described
> here sounds like a pretty good approach. It wasn't described in the
> Python docs that I read, which described a pure RC system, but that's
> ok :).
More information about the Python-list