I've written some text for the What's New article on the GCing of cycles, but it wasn't a topic I paid close attention to at the time, so I'd like corrections. Here's what I've got; please send me comments privately. 6. Optional Collection of Cycles The C implementation of Python uses reference counting to implement garbage collection. Every Python object maintains a count of the number of references pointing to itself, and adjusts the count as references are created or destroyed. Once the reference count reaches zero, the object is no longer accessible, since you need to have a reference to an object to access it, and if the count is zero, no references exist any longer. Reference counting has some pleasant properties: it's easy to understand and implement, and the resulting implementation is portable, fairly fast, and reacts well with other libraries that implement their own memory handling schemes. The major problem with reference counting is that it sometimes doesn't realise that objects are no longer accessible, resulting in a memory leak. This happens when there are cycles of references. Consider the simplest possible cycle, a class instance which has a reference to itself: instance = SomeClass() instance.myself = instance After the above two lines of code have been executed, the reference count of instance is 2; one reference is from the variable named "'instance'", and the other is from the "myself"attribute of the instance. If the next line of code is del instance, what happens? The reference count of instance is decreased by 1, so it has a reference count of 1; the reference in the "myself" attribute still exists. Yet the instance is no longer accessible through Python code, and it could be deleted. Several objects can participate in a cycle if they have references to each other, causing all of the objects to be leaked. An experimental step has been made toward fixing this problem. When compiling Python, the -with-cycle-gc (XXX correct option flag?) option can be specified. This causes a cycle detection algorithm to be periodically executed, which looks for inaccessible cycles and deletes the objects involved. Why isn't this enabled by default? Running the cycle detection algorithm takes some time, and some tuning will be required to minimize the overhead cost. It's not yet obvious how much performance is lost, because benchmarking this is tricky and depends sensitively on how often the program creates and destroys objects. XXX is this actually the correct reason? Or is it fear of breaking software that runs happily while leaving garbage? Several people worked on this problem. Early versions were written by XXX1, XXX2. (I vaguely remember several people writing first cuts at this. Anyone recall who?) The implementation that's in Python 1.6 is a rewritten version, this time done by Neil Schemenauer. Lots of other people offered suggestions along the way, such as (in alphabetical order) Marc-André Lemburg, Tim Peters, Greg Stein, Eric Tiedemann. (XXX other names? If you feel you should be added, e-mail me). The March 2000 archives of the python-dev mailing list contain most of the relevant discussion, especially in the threads titled ``Reference cycle collection for Python'' and ``Finalization again''.
"A.M. Kuchling" wrote:
...
An experimental step has been made toward fixing this problem. When compiling Python, the -with-cycle-gc (XXX correct option flag?) option can be specified. This causes a cycle detection algorithm to be periodically executed, which looks for inaccessible cycles and deletes the objects involved.
I propose (without any investigation into the difficulty of implementation) that import cyclicgc should turn on the -with-cycle-gc flag. Then you could do a import cyclicgc import someDOM And not worry about cycles. If it's too slow, you remove the line and take responsibility for cycle checking. The important thing is that you don't have to tell YOUR USERS to turn on GC in order for your code to work. I think that this model will get more people actively using (and depending upon!) GC and thus give us a better idea about usage patterns, stability and performance. This is especially important in the beta period when we are trying to shake out bugs. Really, this comes back to Greg's point that we should not have incompatible sub-languages running around. A module depends on the feature or it doesn't. The end-user should not be responsible for knowing whether any module in a whole program needs GC. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski
"PP" == Paul Prescod
writes:
PP> "A.M. Kuchling" wrote:
...
An experimental step has been made toward fixing this problem. When compiling Python, the -with-cycle-gc (XXX correct option flag?) option can be specified. This causes a cycle detection algorithm to be periodically executed, which looks for inaccessible cycles and deletes the objects involved.
PP> I propose (without any investigation into the difficulty of PP> implementation) that PP> import cyclicgc PP> should turn on the -with-cycle-gc flag. The -with-cycle-gc flag is an option to configure when you build Python. It can't be turned on or off at runtime. We hope that everyone will build with the -with-cycle-gc flag during the beta testing, but we don't expect to have enough confidence in it by final release to make it anything other than an experimental option. Jeremy
Jeremy Hylton wrote:
....
The -with-cycle-gc flag is an option to configure when you build Python. It can't be turned on or off at runtime.
Doh! Sorry. Andrew's text does say that but I didn't read carefully and testing it with my stuff hasn't floated to the top yet (because when I looked into it, I realized I'd have to recompile...double doh!) Compiling is really slow on my "transitional" computer. I'm a little disappointed that it will be so difficult for people to test with GC on, but I'm not going to claim that there is some easy way to make it a runtime option because I haven't looked at it at all. There probably isn't. -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski
On Wed, Jun 28, 2000 at 11:09:38AM -0700, Paul Prescod wrote:
I'm a little disappointed that it will be so difficult for people to test with GC on, but I'm not going to claim that there is some easy way to make it a runtime option because I haven't looked at it at all. There probably isn't.
There isn't. With an interpreter compiled --with-cycle-gc you can turn off the collection with gc.set_threshold(0) but that's not the same thing. Neil
On Wed, Jun 28, 2000 at 11:09:38AM -0700, Paul Prescod wrote:
I'm a little disappointed that it will be so difficult for people to test with GC on, but I'm not going to claim that there is some easy way to make it a runtime option because I haven't looked at it at all. There probably isn't.
There isn't. With an interpreter compiled --with-cycle-gc you can turn off the collection with gc.set_threshold(0) but that's not the same thing.
Neil
I wonder if we should turn this option *on* during beta testing? That way we gather a lot more experience with its use! Maybe nobody complains and we can leave it on in the final release... --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, Jun 28, 2000 at 03:59:46PM -0500, Guido van Rossum wrote:
On Wed, Jun 28, 2000 at 11:09:38AM -0700, Paul Prescod wrote:
I'm a little disappointed that it will be so difficult for people to test with GC on, but I'm not going to claim that there is some easy way to make it a runtime option because I haven't looked at it at all. There probably isn't.
There isn't. With an interpreter compiled --with-cycle-gc you can turn off the collection with gc.set_threshold(0) but that's not the same thing.
Neil
I wonder if we should turn this option *on* during beta testing?
That way we gather a lot more experience with its use! Maybe nobody complains and we can leave it on in the final release...
That presumes an adequate number of testers and a wide variety of deployment/usage scenarios. Given the relatively short timeframe, that may be a bit aggressive. I *do* agree with enabling it during the beta. But I'm just not sure about leaving it on. Cheers, -g -- Greg Stein, http://www.lyra.org/
Guido van Rossum wrote:
I wonder if we should turn this option *on* during beta testing?
Why not, as long as --without-gc does not suffer from bugs hidden by --with-gc.
That way we gather a lot more experience with its use! Maybe nobody complains and we can leave it on in the final release...
This is risky. We don't have enough experience with this implementation. All we know is that things get slower and consume more memory. Without a clear picture of the impact of this GC implementation, it's probably a bad idea to enable (i.e. impose) it by default. The "optional experimental feature" label is good and IMO it complies with anybody's expectations. Interested people will jump in and will eventually contribute with improvements, those who don't care won't and will live happily without it (at least until the day the feature and its implementation reach their expected maturity). -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Vladimir Marangozov wrote:
...
This is risky. We don't have enough experience with this implementation. All we know is that things get slower and consume more memory. Without a clear picture of the impact of this GC implementation, it's probably a bad idea to enable (i.e. impose) it by default.
Won't we have a clear picture by the end of the beta period?
The "optional experimental feature" label is good and IMO it complies with anybody's expectations. Interested people will jump in and will eventually contribute with improvements, those who don't care won't and will live happily without it (at least until the day the feature and its implementation reach their expected maturity).
A million Window users will live unhappily without it because they don't know how to recompile to get it! -- Paul Prescod - Not encumbered by corporate consensus The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made the modern world possible. - The Advent of the Algorithm (pending), by David Berlinski
Won't we have a clear picture by the end of the beta period?
Not clear. Unfortunately, few people who run important apps will download a beta and try it. But they *will* download a new release labeled "final" and install it without making sure it works for their app. This has been my experience throughout Python's life. I've become pretty conservative about staying backwards compatible as a result...
The "optional experimental feature" label is good and IMO it complies with anybody's expectations. Interested people will jump in and will eventually contribute with improvements, those who don't care won't and will live happily without it (at least until the day the feature and its implementation reach their expected maturity).
A million Window users will live unhappily without it because they don't know how to recompile to get it!
This can be solved the way we solve everything for Windows users: give them two distributions to choose from -- or maybe one distribution with a checkbox to choose which version to use. --Guido van Rossum (home page: http://www.python.org/~guido/)
On 28 June 2000, Jeremy Hylton said:
PP> import cyclicgc
PP> should turn on the -with-cycle-gc flag.
The -with-cycle-gc flag is an option to configure when you build Python. It can't be turned on or off at runtime.
No, no, *obviously* Paul meant that "import cyclicgc" should configure and compile a new Python interpreter with "--with-cycle-gc" and 'exec()' it for you. What else would it mean? ;-) Greg (PS. if this were Perl, you could say use cyclegc; to turn it on, and then elsewhere in your code no cyclegc; to turn it off. Maybe Python needs an "unimport" command?) (Kidding!)
participants (9)
-
A.M. Kuchling
-
Fredrik Lundh
-
Greg Stein
-
Greg Ward
-
Guido van Rossum
-
Jeremy Hylton
-
Neil Schemenauer
-
Paul Prescod
-
Vladimir.Marangozov@inrialpes.fr