steve at REMOVE-THIS-cybersource.com.au
Sat Aug 28 13:25:05 CEST 2010
On Fri, 27 Aug 2010 18:06:19 -0700, Paul Rubin wrote:
> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
>> I've repeatedly asked, both here and elsewhere, why reference counting
>> isn't "real" garbage collection. Nobody has been able to give me a
>> satisfactory answer. As far as I can tell, it's a bit of
>> pretentiousness with no basis in objective fact.
> Well, it's a bit of a subjective matter. I'd say it's not real gc
> because 1) it's unsound (misses reference cycles),
You can add cycle detection to a reference count gc, at the cost of more
If you read the Wikipedia article I linked to, tracing algorithms can
also be unsound:
Some collectors running in a particular environment can
correctly identify all pointers (references) in an object;
these are called "precise" (also "exact" or "accurate")
collectors, the opposite being a "conservative" or "partly
conservative" collector. Conservative collectors have to
assume that any bit pattern in memory could be a pointer if
(when interpreted as a pointer) it would point into any
allocated object. Thus, conservative collectors may have
some false negatives, where storage is not released because
of accidental fake pointers...
> and 2) it requires
> constant attention from the mutator to incr and decr the reference
> So developing modules for the CPython API means endlessly
> finding and fixing refcount bugs that lead to either crashes/security
> failures, or memory leaks. If you program the Java JNI or a typical
> Lisp FFI, you'll find that real gc is a lot simpler to use since you
> avoid all the refcount maintenance hassles. You allocate memory and
> shut your eyes, and the gc takes care of freeing it when it figures out
> that you are done. Refcounting is basically a form of manual memory
> management, while gc is automatic.
That's a problem with the CPython API, not reference counting. The
problem is that the CPython API is written at too low a level, beneath
that at which the garbage collector exists, so naturally you have to
manually manage memory.
> Someone said here recently that as a program gets larger, saying "this
> will work as long as we do X every time without fail" becomes equal to
> saying "this won't work". Substitute "properly maintain all ref counts"
> for X and you can see the problem. I've seen released "production"
> "tested" Python C modules with subtle refcount bugs on more than one
> occasion. In gc'd systems there are fewer places for the code to go
On the other hand, tracing gcs have their own set of problems too, mostly
due to the use of finalizers and attempts to make garbage collection run
more predictably. See here:
For tidying Java resources, think about the use of a clean
up routine. When you have finished with an object, call the
routine to null out all references, deregister listeners,
clear out hash tables, and so on. This is far more efficient
than using a finalizer and has the useful side-benefit of
speeding up garbage collection. The Garbage Collector does
not have so many object references to chase in the next
garbage collection cycle.
Translated: "Rather than relying on the garbage collector to clean up
resources after you, do it yourself, manually, so the garbage collector
has less work to do."
Tracing garbage collectors aren't a panacea. They're software themselves,
and complex software, which means they're subject to bugs like the one
which plagued Flash plugin 9:
The more complicated the garbage collector, the more scope you have for
some interaction between your high-level code and the gc leading to
memory not be reclaimed or extreme slowdown. Like this:
More information about the Python-list