Greg Ewing wrote:
I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier.
I'm thinking along similar lines, but my approach is to eliminate refcounting entirely. (Note that the incref and decref macros could still exist for backwards compatibility, but would do nothing.)
Garbage collection for concurrent systems is an active area of research, however it appears that many of the research systems out there have settled on a few basic design parameters. Most of them use a copying collector for the "young" generation (with a separate heap per thread, for exactly the reasons you suggest), and a shared mark-and-sweep heap store for the older generation that uses a traditional free list.
For my own amusement and curiosity, I've been playing around with implementing such a collector, using a heap allocator design that's loosely based on the one from dlmalloc, which is an open source malloc implementation with very good overall performance. The idea is to build a stand-alone garbage collection library, similar to the popular Boehm collector, but parallel by design and intended for "cooperative" language interpreters rather than uncooperative languages such as C and C++.
Of course, this is purely a hobby-level effort, and I admit that I really have no clue as to what I am doing here - the real point of the exercise is to educate myself about the problem space, not to actually produce a working library, although that might be a possible side effect (equally likely is that I'll never finish it.) I doubt that an untutored amateur such as myself can actually create a robust, efficient parallel implementation, given how hard such programming actually is, and how inexperienced I am. But it's interesting and enjoyable to work on, and that's the only reason I need. (And also produces some interesting side effects - I wasn't happy with the various graphical front-ends to gdb, so I took a couple of days off on wrote one using wxPython :)
Now, all that being said, even if such a GC library were to exist, that is a long way from removal of the GIL, although it is a necessary step. For example, take the case of a dictionary in which more than one thread is inserting values. Clearly, that will require a lock or some other mechanism to prevent corruption of the hash table as it is updated. I think we want to avoid the Java situation where every object has its own lock. Instead, we'd have to require the user to provide a lock around that insertion operation. But what about dictionaries that the user isn't aware of, such as class methods and module contents? In a world without a GIL, what kind of steps need to be taken to insure that shared data structures can be updated without creating chaos?