[Python-ideas] Ideas towards GIL removal
talin at acm.org
Fri Apr 13 08:51:11 CEST 2007
Greg Ewing wrote:
> I've been thinking about some ideas for reducing the
> amount of refcount adjustment that needs to be done,
> with a view to making GIL removal easier.
I'm thinking along similar lines, but my approach is to eliminate
refcounting entirely. (Note that the incref and decref macros could
still exist for backwards compatibility, but would do nothing.)
Garbage collection for concurrent systems is an active area of research,
however it appears that many of the research systems out there have
settled on a few basic design parameters. Most of them use a copying
collector for the "young" generation (with a separate heap per thread,
for exactly the reasons you suggest), and a shared mark-and-sweep heap
store for the older generation that uses a traditional free list.
For my own amusement and curiosity, I've been playing around with
implementing such a collector, using a heap allocator design that's
loosely based on the one from dlmalloc, which is an open source malloc
implementation with very good overall performance. The idea is to build
a stand-alone garbage collection library, similar to the popular Boehm
collector, but parallel by design and intended for "cooperative"
language interpreters rather than uncooperative languages such as C and C++.
Of course, this is purely a hobby-level effort, and I admit that I
really have no clue as to what I am doing here - the real point of the
exercise is to educate myself about the problem space, not to actually
produce a working library, although that might be a possible side effect
(equally likely is that I'll never finish it.) I doubt that an untutored
amateur such as myself can actually create a robust, efficient parallel
implementation, given how hard such programming actually is, and how
inexperienced I am. But it's interesting and enjoyable to work on, and
that's the only reason I need. (And also produces some interesting side
effects - I wasn't happy with the various graphical front-ends to gdb,
so I took a couple of days off on wrote one using wxPython :)
Now, all that being said, even if such a GC library were to exist, that
is a long way from removal of the GIL, although it is a necessary step.
For example, take the case of a dictionary in which more than one thread
is inserting values. Clearly, that will require a lock or some other
mechanism to prevent corruption of the hash table as it is updated. I
think we want to avoid the Java situation where every object has its own
lock. Instead, we'd have to require the user to provide a lock around
that insertion operation. But what about dictionaries that the user
isn't aware of, such as class methods and module contents? In a world
without a GIL, what kind of steps need to be taken to insure that shared
data structures can be updated without creating chaos?
More information about the Python-ideas