Python vs Java garbage collection?

Stuart D. Gathman stuart at bmsi.com
Sat Dec 21 21:49:40 EST 2002


On Sat, 21 Dec 2002 16:19:19 -0500, Robert Oschler wrote:

> This is a very unscientific observation I have here, in the sense that
> I've done no formal research, but in my web and newsgroup perusals, I
> seem to have come across quite a few mentions of problems with Java
> applications in regards to untimely garbage collection and memory
> "hogging".  Yet I have come across very few of the same complaints with
> Python.

I speak from lots of experience with both Python and Java.  As someone
else has mentioned, the very early (1996) collectors for Java were
conservative.

However, since JDK 1.1.6, both Sun and IBM implementations of Java have
had robust garbage collection and fast allocation.  I have yet to see a
memory problem in Java that was actually due to the GC.  Python can also
have the same problems, but seems to have them less - perhaps because the
language is higher level.  In order of prevalence, I have seen:

  1. The application has a reference to a large and growing collection
which the programmer has forgotten about.  Some call this a "memory
leak", but it is not really a leak since all the memory is in fact
reachable.  I call it "data cancer", because it is an unwanted and often
fatal growth of a data structure.

  2. A library has a non-Java resource such as a window, which is not
disposed because the application forgot to do so and the library programmer
forgot to do so in the finalizer.  (Ditto for Python C extensions.)

  3. A native code library has a good old fashioned C memory leak.

Python uses reference counting.  This is the slowest form of garbage
collection, but it has the virtue that (apart from cycles) memory is
released at the earliest possible moment.  Since the language is
interpreted anyway, the overhead for reference counting is not
objectionable.  Other forms of GC are faster, but use more memory because
reclamation is delayed.
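
The "apart from cycles" caveat can be observed directly.  The following is
a sketch that assumes CPython (other implementations need not use reference
counting at all); Node and is_alive are hypothetical names used only here.
A weak reference lets us check whether an object still exists without
keeping it alive ourselves.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only for this illustration."""
    def __init__(self):
        self.other = None

def is_alive(ref):
    # A weak reference returns None once its target has been reclaimed.
    return ref() is not None

# Keep the cycle collector from running on its own during the demo.
was_enabled = gc.isenabled()
gc.disable()
try:
    # Case 1: no cycle.  Dropping the last reference frees the object at once.
    a = Node()
    ref_a = weakref.ref(a)
    del a
    freed_without_cycle = not is_alive(ref_a)

    # Case 2: two objects referring to each other.  Their reference counts
    # never reach zero, so they survive until the cycle collector runs.
    b = Node()
    c = Node()
    b.other = c
    c.other = b
    ref_b = weakref.ref(b)
    del b, c
    alive_before_collect = is_alive(ref_b)
    gc.collect()
    alive_after_collect = is_alive(ref_b)
finally:
    if was_enabled:
        gc.enable()

print(freed_without_cycle, alive_before_collect, alive_after_collect)
```

The acyclic object disappears as soon as its name is deleted, while the
cyclic pair lingers until gc.collect() is called.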

The problem with Python reference counting is that it encourages sloppy
programming like:

  data = open('myfile','r').read()

which depends on the reference-counting GC to release and close the file
object immediately when read() returns.  This habit must be broken before
Python can evolve to Lisp-like speed, since a faster collector would not
close the file so promptly.  The proper code:

  fp = open('myfile','r')
  data = fp.read()
  fp.close()

is not as pretty.  Perhaps some clever pythonista will invent some
syntactic sugar to help the medicine go down.
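
For readers on later Python versions: the with statement, added in Python
2.5, closes the file when the block exits, whether normally or through an
exception, and does not rely on the collector at all.  A minimal sketch,
assuming Python 2.5 or later; read_all is a hypothetical helper name, and
the temporary file exists only so the example runs on its own.

```python
import os
import tempfile

def read_all(path):
    """Hypothetical helper: return the file contents and whether fp was closed."""
    with open(path, 'r') as fp:
        data = fp.read()
    # The with block has already closed fp by the time we get here.
    return data, fp.closed

# Demo on a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'w') as out:
        out.write('hello\n')
    data, closed = read_all(path)
finally:
    os.remove(path)

print(repr(data), closed)
```

The file is closed at a point fixed by the program text, so the code
behaves the same under reference counting or any other collector.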

-- 
	      Stuart D. Gathman <stuart at bmsi.com>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


