Moving towards Python 3.0 (was Re: [Python-Dev] Speed up functioncalls)

Michael Chermside mcherm at mcherm.com
Mon Jan 31 17:51:05 CET 2005


Evan Jones writes:
> My knowledge about garbage collection is weak, but I have read a little
> bit of Hans Boehm's work on garbage collection. [...] The biggest
> disadvantage mentioned is that simple pointer assignments end up
> becoming "increment ref count" operations as well...

Hans Boehm certainly has some excellent points. I believe a little
searching through the Python dev archives will reveal that attempts
have been made in the past to use his GC tools with CPython, and that
the results have been disapointing. That may be because other parts
of CPython are optimized for reference counting, or it may be just
because this stuff is so bloody difficult!

However, remember that changing away from reference counting is a change
to the semantics of CPython. Right now, people can (and often do) assume
that objects which don't participate in a reference loop are collected
as soon as they go out of scope. They write code that depends on
this... idioms like:

    >>> text_of_file = open(file_name, 'r').read()

Perhaps such idioms aren't a good practice (they'd fail in Jython or
in IronPython), but they ARE common. So we shouldn't stop using
reference counting unless we can demonstrate that the alternative is
clearly better. Of course, we'd also need to devise a way for extensions
to cooperate (which is a problem Jython, at least, doesn't face).

So it's NOT an obvious call, and so far numerous attempts to review
other GC strategies have failed. I wouldn't be so quick to dismiss
reference counting.

> My only argument for making Python capable of leveraging multiple
> processor environments is that multithreading seems to be where the big
> performance increases will be in the next few years. I am currently
> using Python for some relatively large simulations, so performance is
> important to me.

CPython CAN leverage such environments, and it IS used that way.
However, this requires using multiple Python processes and inter-process
communication of some sort (there are lots of choices, take your pick).
It's a technique which is more trouble for the programmer, but in my
experience usually has less likelihood of containing subtle parallel
processing bugs. Sure, it'd be great if Python threads could make use
of separate CPUs, but if the cost of that were that Python dictionaries
performed as poorly as a Java HashTable or synchronized HashMap, then it
wouldn't be worth the cost. There's a reason why Java moved away from
HashTable (the threadsafe data structure) to HashMap (not threadsafe).

Perhaps the REAL solution is just a really good IPC library that makes
it easier to write programs that launch "threads" as separate processes
and communicate with them. No change to the internals, just a new
library to encourage people to use the technique that already works.

-- Michael Chermside



More information about the Python-Dev mailing list