reference leaks, __del__, and annotations

The checkins list has been struggling with generator reference leaks; the latest conclusion was that some are unavoidable because of __del__ cycles. That sort of defeats the purpose of resource managers. (Yes, it can be worked around on a case-by-case basis.)
As I see it, part of the problem is that
(1) When there is a cycle, Python refuses to guess.
(2) There is no way for a __del__ method to hint at ordering constraints.
(3) There is no lightweight version of __del__ to say "I don't care about ordering constraints."
How about using an (optional) annotation on __del__ methods, to indicate how cycles should be broken? As a strawman proposal:
    deletes = [obj for obj in cycle if hasattr(obj, "cycle")]
    deletes.sort()
    for obj in deletes:
        obj.__del__()
Lightweight __del__ methods (such as most resource managers) could set the cycle attribute to True, and thereby ensure that they don't cause unbreakable cycles. Fancier object frameworks could use different values for the cycle attribute. Any object whose __del__ is not annotated will still be at least as likely to get finalized as it is today.
-jJ

(Apologies for the thinko -- corrected because it was in the example code.)
The checkins list has been struggling with generator reference leaks; the latest conclusion was that some are unavoidable because of __del__ cycles. That sort of defeats the purpose of resource managers. (Yes, it can be worked around on a case-by-case basis.)
As I see it, part of the problem is that
(1) When there is a cycle, Python refuses to guess.
(2) There is no way for a __del__ method to hint at ordering constraints.
(3) There is no lightweight version of __del__ to say "I don't care about ordering constraints."
How about using an (optional) annotation on __del__ methods, to indicate how cycles should be broken? As a strawman proposal:
    deletes = [(obj.__del__.cycle, obj) for obj in cycle
               if hasattr(obj, "__del__") and hasattr(obj.__del__, "cycle")]
    deletes.sort()
    for (cycle, obj) in deletes:
        obj.__del__()
Lightweight __del__ methods (such as most resource managers) could set the cycle attribute to True, and thereby ensure that they won't cause unbreakable cycles. Fancier object frameworks could use different values for the cycle attribute. Any object whose __del__ is not annotated will still be at least as likely to get finalized as it is today.
-jJ
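For concreteness, here is a minimal runnable model of the corrected strawman. It is a sketch under stated assumptions: `cycle` is an ordinary list standing in for the trash a collector found (CPython's gc exposes no such hook), and `cycle_order`, `finalize_cycle` and the example classes are hypothetical names. It sorts with a key on the annotation rather than on `(cycle, obj)` tuples, so tied annotations never fall back to comparing the objects themselves.

```python
def cycle_order(value):
    """Hypothetical decorator: tag a __del__ method with a 'cycle' value."""
    def mark(func):
        func.cycle = value
        return func
    return mark


def finalize_cycle(cycle):
    """Run the annotated __del__ methods of a simulated trash cycle, in order.

    'cycle' is an ordinary list standing in for what a collector found;
    CPython's gc module exposes no such hook.
    """
    # Bound methods forward attribute lookups to the underlying function,
    # so obj.__del__.cycle reads the value set by the decorator.
    deletes = [obj for obj in cycle
               if hasattr(obj, "__del__") and hasattr(obj.__del__, "cycle")]
    # Sort on the annotation alone, so ties never compare the objects.
    deletes.sort(key=lambda obj: obj.__del__.cycle)
    for obj in deletes:
        obj.__del__()
    return deletes


finalized = []


class FileLike:
    def __init__(self, name):
        self.name = name

    @cycle_order(2)
    def __del__(self):
        finalized.append(self.name)


class SocketLike(FileLike):
    @cycle_order(1)          # lower value: finalized first
    def __del__(self):
        finalized.append(self.name)


class Unannotated:
    def __del__(self):       # no 'cycle' attribute: left alone
        finalized.append("unannotated")


a, b, c = FileLike("file"), SocketLike("socket"), Unannotated()
a.peer, b.peer, c.peer = b, c, a     # a three-object reference cycle
finalize_cycle([a, b, c])
print(finalized)   # ['socket', 'file']
```

On Python 3 the tuple sort in the proposal would raise TypeError as soon as two annotations tie, because it would then compare two arbitrary objects; the key= form sidesteps that.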

Jim Jewett wrote:
The checkins list has been struggling with generator reference leaks; the latest conclusion was that some are unavoidable because of __del__ cycles. That sort of defeats the purpose of resource managers.
Seems to me we need a whole new approach to finalization that's friendly to cyclic gc, such as a way of registering a finalizer that doesn't depend on the original object. If such a mechanism were available, could it be used instead of a __del__ method to clean up after a generator? (I'm asking because I'm not sure exactly what a generator __del__ needs to do.)
As a strawman proposal:
    deletes = [(obj.__del__.cycle, obj) for obj in cycle
               if hasattr(obj, "__del__") and hasattr(obj.__del__, "cycle")]
    deletes.sort()
    for (cycle, obj) in deletes:
        obj.__del__()
I think we need to be very careful about doing anything like this. From what Tim said recently, the consequences of an object getting its __del__ annotation wrong could be as bad as crashing the interpreter. -- Greg

"Jim Jewett" <jimjjewett@gmail.com> wrote in news:fb6fbf560603301716x13c4cda7x7fd5e462850b5a03@mail.gmail.com:
As a strawman proposal:
    deletes = [(obj.__del__.cycle, obj) for obj in cycle
               if hasattr(obj, "__del__") and hasattr(obj.__del__, "cycle")]
    deletes.sort()
    for (cycle, obj) in deletes:
        obj.__del__()
Lightweight __del__ methods (such as most resource managers) could set the cycle attribute to True, and thereby ensure that they won't cause unbreakable cycles. Fancier object frameworks could use different values for the cycle attribute. Any object whose __del__ is not annotated will still be at least as likely to get finalized as it is
That doesn't look right to me. Surely if you have a cycle what you want to do is to pick just *one* of the objects in the cycle and break the link which makes it participate in the cycle. That should be sufficient to cause the rest of the cycle to collapse with __del__ methods being called from the normal refcounting mechanism. So something like this:
    for obj in cycle:
        if hasattr(obj, "__breakcycle__"):
            obj.__breakcycle__()
            break
Every object which knows it can participate in a cycle then has the option of defining a method which it can use to tear down the cycle, e.g. by releasing the resource and then deleting all of its attributes, but no guarantees are made about which object has this method called. An object with a __breakcycle__ method would have to be extra careful as its methods could still be called after it has broken the cycle, but it does mean that the responsibilities are in the right place (i.e. defining the method implies taking that into account).
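A small model of the `__breakcycle__` idea, assuming the collector simply hands over the list of cycle members. The `break_one` helper and the `Node` and `Owner` classes are hypothetical. Once one volunteer drops its link, ordinary reference counting frees the rest, which the weak reference makes visible on CPython.

```python
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None


class Owner(Node):
    def __breakcycle__(self):
        # Hypothetical hook from the proposal: drop the link that closes
        # the cycle so plain reference counting can reclaim the rest.
        self.peer = None


def break_one(cycle):
    """Ask the first object that volunteers to break the (simulated) cycle."""
    for obj in cycle:
        if hasattr(obj, "__breakcycle__"):
            obj.__breakcycle__()
            return obj
    return None


owner, other = Owner("owner"), Node("other")
owner.peer, other.peer = other, owner       # two-object reference cycle
probe = weakref.ref(other)

broke = break_one([owner, other])
del owner, other, broke                     # drop the last outside references
print(probe() is None)  # True on CPython: refcounting freed the former cycle
```

Nick's objection (a) in the reply still applies: after `__breakcycle__` runs, the owner is a partially torn-down object, and a finaliser elsewhere in the former cycle may still reach it.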

Duncan Booth wrote:
Surely if you have a cycle what you want to do is to pick just *one* of the objects in the cycle and break the link which makes it participate in the cycle. That should be sufficient to cause the rest of the cycle to collapse with __del__ methods being called from the normal refcounting mechanism.
So something like this:
    for obj in cycle:
        if hasattr(obj, "__breakcycle__"):
            obj.__breakcycle__()
            break
Every object which knows it can participate in a cycle then has the option of defining a method which it can use to tear down the cycle. e.g. by releasing the resource and then deleting all of its attributes, but no guarantees are made over which obj has this method called. An object with a __breakcycle__ method would have to be extra careful as its methods could still be called after it has broken the cycle, but it does mean that the responsibilities are in the right place (i.e. defining the method implies taking that into account).
Unfortunately, there are two problems with that idea:
a. It's broken, since we now have a partially torn down object at the tail end of our former cycle. What happens if the penultimate object's finaliser tries to access that broken one?
b. It doesn't actually help in the case of generators (which are the ones causing all the grief). The generator object itself (which implements the __del__ method) knows nothing about what caused the cycle (the cycle is going to be due to the Python code in the body of the generator).
As PJE posted the other day, the problem is that the GC assumes that because the *type* has a __del__ method, the *instance* needs finalisation. And for objects with an explicit close method (like generators), context management semantics (like generator-based context managers), or the ability to be finalised in the normal course of events (like generator-iterators), most instances *don't* need finalisation, as they'll have already been finalised in the normal course of events. Generators are even more special, in that they only require finalisation in the first place if they're stopped on a yield statement inside a try-finally block.
A simple Boolean attribute (e.g. __finalized__) should be enough. If the type has a __del__ method, then the GC would check the __finalized__ attribute. If it's both present and true, the GC can ignore the finaliser on that instance (i.e. it never invokes it, and doesn't treat cycles as uncollectable because of it).
I don't know the GC well enough to know how hard that would be to implement, but I suspect we need to do it (or something like it) if PEP 342 isn't going to cause annoying memory leaks in real applications.
Cheers,
Nick.
-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
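The proposed rule can be written as a predicate. This is a hypothetical model of the decision described above, not of anything CPython's gc actually does; `Stream` and `gc_must_finalize` are illustrative names, and `__finalized__` is the attribute name floated in the message.

```python
class Stream:
    """Toy resource with an explicit close() and a __del__ fallback."""

    def __init__(self):
        self.__finalized__ = False   # dunder names are not name-mangled
        self.closed = False

    def close(self):
        self.closed = True
        self.__finalized__ = True    # nothing left for a finaliser to do

    def __del__(self):
        if not self.__finalized__:
            self.close()


def gc_must_finalize(obj):
    """Model of the proposed rule; hypothetical, not CPython's gc logic."""
    if not hasattr(type(obj), "__del__"):
        return False   # no finaliser on the type: nothing to run
    return not getattr(obj, "__finalized__", False)


s = Stream()
before = gc_must_finalize(s)   # still open: the finaliser matters
s.close()
after = gc_must_finalize(s)    # finalised in the normal course of events
print(before, after, gc_must_finalize(object()))   # True False False
```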

At 11:27 PM 3/31/2006 +1000, Nick Coghlan wrote:
Generators are even more special, in that they only require finalisation in the first place if they're stopped on a yield statement inside a try-finally block.
Or a try-except block. Or a 'with' statement. It's only loop blocks that are exempt.
A simple Boolean attribute (e.g. __finalized__) should be enough. If the type has a __del__ method, then the GC would check the __finalized__ attribute. If it's both present and true, the GC can ignore the finaliser on that instance (i.e. never invokes it, and doesn't treat cycles as uncollectable because of it)
I don't know the GC well enough to know how hard that would be to implement, but I suspect we need to do it (or something like it) if PEP 342 isn't going to cause annoying memory leaks in real applications.
As Tim suggested, it'd be better to have the code be generator-specific, at least for now. That had actually been my original plan, to make it generator-specific, but I was afraid of breaking encapsulation in the garbage collector by having it know about generators. But now that Uncle Timmy has blessed the approach, I'll go back and add it in. (On Monday, unless somebody gets to it before me.)
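The cases PJE lists can be checked from Python: `close()` raises GeneratorExit at the suspended `yield`, so a try-finally, try-except, or with block around that `yield` has cleanup to run, while a bare loop has none. The toy generators and the `Tracker` context manager are illustrative only. Since Python 2.6 GeneratorExit derives from BaseException, so only handlers such as a bare `except:` or `except GeneratorExit:` intercept it; `except Exception:` no longer does.

```python
import inspect

events = []


class Tracker:
    """Minimal context manager that records how its block was exited."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append("with: " + exc_type.__name__)
        return False   # do not swallow the exception


def in_finally():
    try:
        yield 1
    finally:
        events.append("finally")


def in_except():
    try:
        yield 1
    except GeneratorExit:
        events.append("except: GeneratorExit")
        raise   # re-raise so close() sees a clean exit


def in_with():
    with Tracker():
        yield 1


def in_loop():
    for i in range(3):
        yield i   # no cleanup code around this yield


for factory in (in_finally, in_except, in_with, in_loop):
    gen = factory()
    next(gen)      # suspend at the first yield
    gen.close()    # raises GeneratorExit at that yield
    assert inspect.getgeneratorstate(gen) == inspect.GEN_CLOSED

print(events)   # ['finally', 'except: GeneratorExit', 'with: GeneratorExit']
```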

[Phillip J. Eby]
... As Tim suggested, it'd be better to have the code be generator-specific, at least for now. That had actually been my original plan, to make it generator-specific, but I was afraid of breaking encapsulation in the garbage collector by having it know about generators.
It sucks in a way, but so would adding yet another new slot just for (at present, and possibly forever) making gc and generators play nicer together. "Practicality beats purity" here.
But now that Uncle Timmy has blessed the approach, I'll go back and add it in. (On Monday, unless somebody gets to it before me.)
It won't be me: I wasn't even able to make enough time to understand the new generator features at the Python level, let alone the implementation. At PyCon, when Guido showed his slide with a new yield-as-expression example, for the rest of his talk I was wondering what the heck the example meant <0.3 wink>.

Nick Coghlan wrote:
Generators are even more special, in that they only require finalisation in the first place if they're stopped on a yield statement inside a try-finally block.
I find it rather worrying that there could be a few rare cases in which my generators cause memory leaks, through no fault of my own and without my being able to do anything about it. Will there be a coding practice one can follow to ensure that this doesn't happen? -- Greg

Greg Ewing wrote:
Nick Coghlan wrote:
Generators are even more special, in that they only require finalisation in the first place if they're stopped on a yield statement inside a try-finally block.
I find it rather worrying that there could be a few rare cases in which my generators cause memory leaks, through no fault of my own and without my being able to do anything about it.
The GC changes PJE is looking at are to make sure you *can* do something about it. If the generator hasn't been started, or has already finished, then the GC won't consider it as needing finalisation.
Will there be a coding practice one can follow to ensure that this doesn't happen?
I believe PJE's fix should take care of most cases (depending on how aggressive we can safely be, it may even take care of all of them). If there are any remaining cases, I think the main thing is to avoid keeping half-finished generators around:
    from contextlib import closing
    with closing(itr):
        # Use the iterator in here as you wish
        # secure in the knowledge it will be
        # cleaned up promptly when you are done
        # whether it is a file, a generator or
        # something with a database connection
        for item in itr:
            print(item)
Cheers, Nick.
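A runnable version of the `closing()` pattern, using a toy `rows` generator (hypothetical) whose finally block stands in for releasing a cursor or file. Breaking out early leaves the generator half-finished, and `closing()` still finalises it deterministically when the with block exits.

```python
from contextlib import closing

log = []


def rows(n):
    """Toy generator standing in for one that holds a cursor or file."""
    try:
        for i in range(n):
            yield i
    finally:
        log.append("cleanup")   # runs on exhaustion *or* on close()


seen = []
with closing(rows(10)) as itr:
    for item in itr:
        seen.append(item)
        if item == 2:
            break               # abandon the generator half-finished

# closing() called itr.close() on exit, so the finally block already ran,
# without waiting for refcounting or the cycle collector.
print(seen, log)   # [0, 1, 2] ['cleanup']
```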

Nick Coghlan wrote:
    from contextlib import closing
    with closing(itr):
        # Use the iterator in here as you wish
        # secure in the knowledge it will be
        # cleaned up promptly when you are done
        # whether it is a file, a generator or
        # something with a database connection
        for item in itr:
            print(item)
I seem to remember we've been here before. I'll be disappointed if I have to wrap every for-loop that I write in a with-statement on the off chance that it might be using a generator that needs finalisation in order to avoid leaking memory. I'm becoming more and more convinced that we desperately need something better than __del__ methods to do finalisation. A garbage collector that can't be relied upon to collect garbage is simply not acceptable. -- Greg

On 4/1/06, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'm becoming more and more convinced that we desperately need something better than __del__ methods to do finalisation. A garbage collector that can't be relied upon to collect garbage is simply not acceptable.
Sure. I don't believe it's too hard, it just means violating some of the premises people have been writing __del__ methods under. For instance, to clean up cycles nicely we might have to set some attributes to None before calling __del__, so you can't rely on attributes being meaningful anymore. However, this is already the case for global names; I've seen many people wonder about their __del__ method raising warnings (since exceptions are ignored) along the lines of "'NoneType' object has no attribute 'registry'" when they try to unregister their class but the global registry has already been cleaned up.
While we're at it, I would like for the new __del__ (which would probably have to be a new method) to disallow reviving self, just because it makes it unnecessarily complicated and it's rarely needed. Allowing a partially deleted object (or an object that is part of a partially deleted reference cycle) to revive itself is not terribly useful, and there's no way to restore the rest of the cycle. I suggested a __dealloc__ method earlier in the thread to do this. I didn't think of allowing attributes to be cleared before calling the method, but I do believe that is necessary to allow, now that I've thought more about it.
An alternative would be to make GC check for a 'cleanup-cycle' method on any of the objects in the cycle, and just feed it the complete cycle of objects, asking it to clean it up itself (or maybe reconnect one of the objects itself.) That would also make debugging uncollectable cycles a lot easier ;-) But I'm not sure whether that will improve things. The generator example, the trigger for this discussion, could solve its cycle by just closing itself, after which the cycle is either broken or reconnected, but I don't know if other typical cycles could be resolved that easily.
-- 
Thomas Wouters <thomas@python.org>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

"Thomas Wouters" <thomas@python.org> writes:
While we're at it, I would like for the new __del__ (which would probably have to be a new method) to disallow reviving self, just because it makes it unnecessarily complicated and it's rarely needed.
I'm not sure the problem is so much that anyone _wants_ to support resurrection in __del__, it's just that it can't be prevented.
    l = []
    class A(object):
        def __del__(self):
            l.append(self)
    a = A()
    a = 1
What would you have this do?
And if we want to have a version of __del__ that can't reference 'self', we have it already: weakrefs with callbacks.
What happened to the 'get rid of __del__ in py3k' idea?
Cheers,
mwh
-- 
<freeside> On a scale of One to AWESOME, twisted.web is PRETTY ABSTRACT!!!! -- from Twisted.Quotes

On 4/3/06, Michael Hudson <mwh@python.net> wrote:
I'm not sure the problem is so much that anyone _wants_ to support resurrection in __del__, it's just that it can't be prevented.
Well, Java has an answer to that (at least I believe Tim Peters told me so years ago): it allows resurrection, but will only call the finalizer once. IOW if the resurrected object is GC'ed a second time, its finalizer won't be called. This would require a bit "__del__ already called" on an object, but don't we have a whole word of GC-related flags? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
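Michael's resurrection example can be made observable. On CPython 3.4 and later (PEP 442, "Safe object finalization", which also stopped cycles with __del__ methods from ending up in gc.garbage) the behaviour Guido describes is what happens: the language reference says it is implementation-dependent whether `__del__` is called a second time on a resurrected object, and that CPython only calls it once. The sketch relies on that CPython detail and on prompt refcount-driven deallocation; other implementations may differ. `Phoenix`, `calls` and `graveyard` are illustrative names.

```python
calls = []
graveyard = []


class Phoenix:
    def __del__(self):
        calls.append("__del__")
        graveyard.append(self)   # resurrect: store a new reference to self


p = Phoenix()
del p                 # refcount hits zero: __del__ runs and revives the object
print(len(graveyard), calls)   # 1 ['__del__']

graveyard.clear()     # drop the revived object for good
print(calls)          # ['__del__']  (CPython 3.4+ does not call __del__ again)
```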

On Apr 3, 2006, at 3:12 PM, Neil Schemenauer wrote:
Guido van Rossum <guido@python.org> wrote:
This would require a bit "__del__ already called" on an object, but don't we have a whole word of GC-related flags?
No.
Actually there is. Kinda. Currently Python's refcounting scheme uses 4 words per object (gc_next, gc_prev, gc_refs, ob_refcnt), and has one spare word in the padding of PyGC_Head that's just sitting there wasting memory. So really it's using up 5 words per object, and that 5th word could actually be used for flags...
    /* GC information is stored BEFORE the object structure. */
    typedef union _gc_head {
        struct {
            union _gc_head *gc_next;
            union _gc_head *gc_prev;
            int gc_refs;
        } gc;
        long double dummy;  /* force worst-case alignment */
    } PyGC_Head;
    #define PyObject_HEAD \
        _PyObject_HEAD_EXTRA \
        int ob_refcnt; \
        struct _typeobject *ob_type;
    typedef struct _object {
        PyObject_HEAD
    } PyObject;
James

[Guido]
but don't we have a whole word of GC-related flags?
[Neil S]
No.
[James Y Knight]
Actually there is. Kinda. Currently python's refcounting scheme uses 4 words per object (gc_next, gc_prev, gc_refs, ob_refcnt), and has one spare word in the padding of PyGC_Head that's just sitting there wasting memory.
Using which compiler? This varies across boxes. Most obviously, on a 64-bit box all these members are 8 bytes (note that ob_refcnt is Py_ssize_t in 2.5, not int anymore), but even on some 32-bit boxes the "long double" trick only forces 4-byte alignment.

On Apr 3, 2006, at 4:02 PM, Tim Peters wrote:
Using which compiler? This varies across boxes. Most obviously, on a 64-bit box all these members are 8 bytes (note that ob_refcnt is Py_ssize_t in 2.5, not int anymore), but even on some 32-bit boxes the "long double" trick only forces 4-byte alignment.
Hm, yes, my mistake. I see that on Linux/x86, long double only takes 12 bytes and is 4-byte aligned. Even though the actual CPU really wants it 8-byte aligned, the ABI has not been changed to allow that. On OSX/ppc32, OSX/ppc64 and Linux/x86-64, long doubles are 16 bytes, and 8-byte aligned. So the struct uses 16 bytes or 32 bytes, and has the extra word of unused space. All right, then, you could use the top bit of the ob_refcnt field. There's no way to possibly have 2**32 objects on a 32-bit system anyhow. James
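Whether the spare word exists on a given box can be checked by mirroring the union James quoted with ctypes. This models the 2.4-era declaration only (CPython's real header has changed since); `_GCFields` and `PyGCHeadModel` are illustrative names, and the printed sizes depend on the platform ABI, which is exactly Tim's point.

```python
import ctypes


class _GCFields(ctypes.Structure):
    # Mirrors the quoted 'gc' struct: two pointers and an int.
    _fields_ = [
        ("gc_next", ctypes.c_void_p),
        ("gc_prev", ctypes.c_void_p),
        ("gc_refs", ctypes.c_int),
    ]


class PyGCHeadModel(ctypes.Union):
    # The 'long double dummy' member is what forces the alignment.
    _fields_ = [
        ("gc", _GCFields),
        ("dummy", ctypes.c_longdouble),
    ]


fields = ctypes.sizeof(_GCFields)
total = ctypes.sizeof(PyGCHeadModel)
print("struct fields:", fields, "bytes")
print("union total:", total, "bytes, aligned to", ctypes.alignment(PyGCHeadModel))
print("padding after the gc struct:", total - fields, "bytes")
```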

James Y Knight wrote:
All right, then, you could use the top bit of the ob_refcnt field. There's no way to possibly have 2**32 objects on a 32-bit system anyhow.
Of this kind, we have several spare bits: there are at least two bits in the ob_type field, since the type pointer is at least 4-aligned on all platforms we support. OTOH, using these bits would break existing code. Regards, Martin
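Martin's point relies on pointer tagging: a 4-byte-aligned address always has its two low bits clear, so they can carry flags as long as every reader masks them off again. The sketch works on plain integers with a made-up address; `tag`, `untag` and the flag meaning are hypothetical.

```python
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1   # 0b11: room for two flags in a 4-aligned pointer


def tag(address, flags):
    """Pack flags into the low bits of a 4-byte-aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not 4-byte aligned")
    if not 0 <= flags <= TAG_MASK:
        raise ValueError("too many flag bits")
    return address | flags


def untag(word):
    """Split a tagged word back into (address, flags)."""
    return word & ~TAG_MASK, word & TAG_MASK


# A made-up, suitably aligned address standing in for a type pointer.
type_ptr = 0x7F3A1C40
word = tag(type_ptr, 0b01)        # e.g. bit 0 = "__del__ already called"
addr, flags = untag(word)
print(hex(word), hex(addr), flags)   # 0x7f3a1c41 0x7f3a1c40 1
```

The `untag` step is the cost Martin warns about: existing C code that reads `ob_type` directly would see the tagged value and would have to be changed.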

James Y Knight wrote:
All right, then, you could use the top bit of the ob_refcnt field. There's no way to possibly have 2**32 objects on a 32-bit system anyhow.
That would slow down every Py_INCREF and Py_DECREF, which would need to be careful to exclude the top bit from their operations. -- Greg
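Greg's cost argument in miniature: with a flag stored in the top bit of a 32-bit refcount, DECREF can no longer test the word against zero directly. A hypothetical model on Python integers; `FLAG`, `naive_decref` and `masked_decref` are illustrative names.

```python
FLAG = 1 << 31            # hypothetical "finalized" bit in a 32-bit ob_refcnt
COUNT_MASK = FLAG - 1


def naive_decref(word):
    """Plain DECREF logic: the object dies when the word reaches zero."""
    word -= 1
    return word, word == 0


def masked_decref(word):
    """DECREF that must ignore the stolen top bit: one extra AND per call."""
    word -= 1
    return word, (word & COUNT_MASK) == 0


word = FLAG | 1           # flag set, one real reference left
print(naive_decref(word)[1])    # False: the flag hides the zero count (a leak)
print(masked_decref(word)[1])   # True: masking recovers the real count
```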

[various people debating how to steal a bit from an existing PyObject member] Let's stop this. It's a "bike shed" argument: even if we had 1000 spare bits, it wouldn't do any good. Having a bit to record whether an object has been finalized doesn't solve any problem that's been raised recently (it doesn't cure any problems with resurrection, for example). Even if we wanted it just to support a slow, dubious way to collect trash cycles containing objects with __del__ methods, gcmodule.c would need major rewriting to make that happen. If someone is interested enough to do that, fine, but it sure ain't me ;-)

[Michael Hudson]
I'm not sure the problem is so much that anyone _wants_ to support resurrection in __del__, it's just that it can't be prevented.
[Guido]
Well, Java has an answer to that (at least I believe Tim Peters told me so years ago): it allows resurrection, but will only call the finalizer once. IOW if the resurrected object is GC'ed a second time, its finalizer won't be called.
Right, that's a technical trick Java uses. Note that it doesn't stop resurrection: all the resurrection-related pitfalls remain. One good result is that cycles containing objects with finalizers don't stop gc progress forever; some progress can always be made, although it may be as little as reclaiming one object per full gc cycle (ignoring that "full gc cycle" is a fuzzy concept in a runs-in-parallel threaded gc). A bad result is an endless stream of nearly-impenetrable articles encouraging deep fear of Java finalizers ;-); e.g., http://www.devx.com/Java/Article/30192/0/page/1
This would require a bit "__del__ already called" on an object, but don't we have a whole word of GC-related flags?
Nope! You're probably thinking of gc_refs. That's a Py_ssize_t today, and is overloaded to hold, at various times, a status enum (which only needs a few bits) or a copy of the object's refcount (which uses all the bits).

Michael Hudson wrote:
And if we want to have a version of __del__ that can't reference 'self', we have it already: weakrefs with callbacks.
Does that actually work at the moment? Last I heard, there was some issue with gc and weakref callbacks as well. Has that been resolved?
-- 
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg.ewing@canterbury.ac.nz
Carpe post meridiam! (I'm not a morning person.)

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Michael Hudson wrote:
And if we want to have a version of __del__ that can't reference 'self', we have it already: weakrefs with callbacks.
Does that actually work at the moment? Last I heard, there was some issue with gc and weakref callbacks as well. Has that been resolved?
Talk about FUD. Yes, it works, as far as I know. Cheers, mwh -- <lament> Slashdot karma, unfortunately, is not real karma, because it doesn't involve the death of the people who have it -- from Twisted.Quotes

On 4/3/06, Michael Hudson <mwh@python.net> wrote:
Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Michael Hudson wrote:
And if we want to have a version of __del__ that can't reference 'self', we have it already: weakrefs with callbacks.
Does that actually work at the moment? Last I heard, there was some issue with gc and weakref callbacks as well. Has that been resolved?
Talk about FUD. Yes, it works, as far as I know.
Not sure if everyone is talking about the same thing. This is still a problem (at least for me): http://svn.python.org/projects/python/trunk/Lib/test/crashers/weakref_in_del... It creates a weakref to self in __del__. There are 7 crashers, plus 5 more due to infinite recursion. :-( That doesn't include the parts of test_trace that are commented out. At least test_trace needs to be fixed prior to 2.5. n

"Neal Norwitz" <nnorwitz@gmail.com> writes:
On 4/3/06, Michael Hudson <mwh@python.net> wrote:
Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Michael Hudson wrote:
And if we want to have a version of __del__ that can't reference 'self', we have it already: weakrefs with callbacks.
Does that actually work at the moment? Last I heard, there was some issue with gc and weakref callbacks as well. Has that been resolved?
Talk about FUD. Yes, it works, as far as I know.
Not sure if everyone is talking about the same thing. This is still a problem (at least for me): http://svn.python.org/projects/python/trunk/Lib/test/crashers/weakref_in_del...
It creates a weakref to self in __del__.
Yes, but that has nothing to do with the cycle collector. I even have a way to fix it, but I don't know if it breaks anything else... Cheers, mwh -- I wouldn't trust the Anglo-Saxons for much anything else. Given they way English is spelled, who could trust them on _anything_ that had to do with writing things down, anyway? -- Erik Naggum, comp.lang.lisp

[Michael Hudson]
And if we want to have a version of __del__ that can't reference 'self', we have it already: weakrefs with callbacks.
[Greg Ewing]
Does that actually work at the moment? Last I heard, there was some issue with gc and weakref callbacks as well. Has that been resolved?
[Michael]
Talk about FUD. Yes, it works, as far as I know.
I'm sure Greg has in mind this thread (which was in fact also the thread that floated the idea of getting rid of __del__ in P3K): http://mail.python.org/pipermail/python-dev/2004-November/049744.html As that said, some weakref gc semantics are pretty arbitrary now, and it gave two patches that implemented distinct semantic variants. A problem is that the variant semantics also seem pretty arbitrary ;-), and there's a dearth of compelling use cases to guide a decision. If someone devoted enough time to seriously trying to get rid of __del__, I suspect compelling use cases would arise. I never use __del__ anyway, so my motivation to spend time on it is hard to detect.

Tim Peters wrote:
A problem is that the variant semantics also seem pretty arbitrary ;-), and there's a dearth of compelling use cases to guide a decision.
At the time I think I suggested that it would be reasonable if weakref callbacks were guaranteed to be called as long as they weren't trash themselves and they didn't reference any trash. From what you said in a recent message, it sounds like that's the way it currently works. These semantics would be sufficient to be able to use weakrefs to register finalisers, I think. You keep a global list of weakrefs with callbacks that call the finalizer and then remove the weakref from the global list. I'll put together a prototype some time and see if it works. I actually have a use case at the moment -- a couple of types that need to release external resources, and since they're part of a library I'm distributing, I can't be sure people won't put them in cycles. -- Greg

[Tim Peters]
A problem is that the variant semantics also seem pretty arbitrary ;-), and there's a dearth of compelling use cases to guide a decision.
[Greg Ewing]
At the time I think I suggested that it would be reasonable if weakref callbacks were guaranteed to be called as long as they weren't trash themselves and they didn't reference any trash. From what you said in a recent message, it sounds like that's the way it currently works.
Nope, and that was the point of the first message in the thread I referenced. The issues were explained well in that thread, so I don't want to repeat it all here again.
These semantics would be sufficient to be able to use weakrefs to register finalisers, I think. You keep a global list of weakrefs with callbacks that call the finalizer and then remove the weakref from the global list.
I'll put together a prototype some time and see if it works. I actually have a use case at the moment -- a couple of types that need to release external resources, and since they're part of a library I'm distributing, I can't be sure people won't put them in cycles.
Note that it's very easy to do this with __del__. The trick is for your type not to have a __del__ method itself, but to point to a simple "cleanup object" with a __del__ method. Give that "contained" object references to the resources you want to close, and you're done. Because your "real" objects don't have __del__ methods, cycles involving them can't inhibit gc. The cleanup object's only purpose in life is to close resources. Like:
    class _RealTypeResourceCleaner:
        def __init__(self, *resources):
            self.resources = resources
        def __del__(self):
            if self.resources is not None:
                for r in self.resources:
                    r.close()
                self.resources = None
        # and typically no other methods are needed, or desirable, in
        # this helper class
    class RealType:
        def __init__(self, resource1, resource2):
            ...  # other setup
            # and then, e.g.,
            self.cleaner = _RealTypeResourceCleaner(resource1, resource2)
        # ... tons of stuff, but no __del__ ...
That's the simple, general way to mix cycles and __del__ without problems.
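To see the helper-object pattern survive a cycle, here is a self-contained demo using Tim's class names. The `partner` attribute and the use of `io.StringIO` objects as resources are assumptions made for the example; the point is that the object with `__del__` sits outside the cycle and holds only the resources.

```python
import gc
import io


class _RealTypeResourceCleaner:
    """Holds only the resources; its __del__ never sees the RealType object."""

    def __init__(self, *resources):
        self.resources = resources

    def __del__(self):
        if self.resources is not None:
            for r in self.resources:
                r.close()
            self.resources = None


class RealType:
    def __init__(self, *resources):
        self.cleaner = _RealTypeResourceCleaner(*resources)
        self.partner = None    # room for cycles; no __del__ on this class


a = RealType(io.StringIO())
b = RealType(io.StringIO())
a.partner, b.partner = b, a                   # a cycle between two RealTypes
handles = a.cleaner.resources + b.cleaner.resources
del a, b
gc.collect()                                  # cycle freed; cleaners' __del__ run
print([h.closed for h in handles])   # [True, True]
```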

Tim Peters wrote:
Note that it's very easy to do this with __del__. The trick is for your type not to have a __del__ method itself, but to point to a simple "cleanup object" with a __del__ method. Give that "contained" object references to the resources you want to close, and you're done. Because your "real" objects don't have __del__ methods, cycles involving them can't inhibit gc. The cleanup object's only purpose in life is to close resources. Like:
    class _RealTypeResourceCleaner:
        def __init__(self, *resources):
            self.resources = resources
        def __del__(self):
            if self.resources is not None:
                for r in self.resources:
                    r.close()
                self.resources = None
        # and typically no other methods are needed, or desirable, in
        # this helper class
    class RealType:
        def __init__(self, resource1, resource2):
            ...  # other setup
            # and then, e.g.,
            self.cleaner = _RealTypeResourceCleaner(resource1, resource2)
        # ... tons of stuff, but no __del__ ...
That's the simple, general way to mix cycles and __del__ without problems.
So, stealing this trick for generators would involve a "helper" object with a close() method, a __del__ method that invoked it, and access to the generator's frame stack (rather than to the generator itself).
    class _GeneratorCleaner:
        __slots__ = ["_gen_frame"]
        def __init__(self, gen_frame):
            self._gen_frame = gen_frame
        def close(self):
            # Do whatever gen.close() currently does to
            # raise GeneratorExit in the frame stack
            # and catch it again
            ...
        def __del__(self):
            self.close()
The generator's close() method would then change to be:
    def close(self):
        self._cleaner.close()
Would something like that eliminate the current cycle problem?
Cheers, Nick.

"Tim Peters" <tim.peters@gmail.com> writes:
[Michael Hudson]
... What happened to the 'get rid of __del__ in py3k' idea?
Apart from its initial mention, every now & again someone asks what happened to it :-).
Good enough for me :) Cheers, mwh (not subscribed to python-3000) -- You're going to have to remember that I still think of Twisted as a big multiplayer game, and all this HTTP stuff is just kind of a grotty way to display room descriptions. -- Glyph Lefkowitz

On 4/1/06, Nick Coghlan <ncoghlan@gmail.com> wrote:
Greg Ewing wrote:
I find it rather worrying that there could be a few rare cases in which my generators cause memory leaks, through no fault of my own and without my being able to do anything about it.
The GC changes PJE is looking at are to make sure you *can* do something about it. If the generator hasn't been started, or has already finished, then the GC won't consider it as needing finalisation.
Actually, if a generator has already finished, it no longer holds a suspended frame alive, and there is no cycle (at least not through the generator.) That's why test_generators no longer leaks; explicitly closing the generator breaks the cycle. So the only thing fixing GC would add is cleaning up cycles where a created but not started generator is the only thing keeping the cycle alive. -- Thomas Wouters <thomas@python.org>
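Thomas's observation can be checked from Python: once a generator runs to completion it drops its frame, and with it everything the frame kept alive. `Payload` and `holder` are illustrative names, and the immediate release relies on CPython's reference counting.

```python
import weakref


class Payload:
    """Stands in for anything a suspended generator frame keeps alive."""


def holder(payload):
    yield "started"
    yield "still running"


p = Payload()
probe = weakref.ref(p)
gen = holder(p)
del p                       # now only the generator's frame refers to it

next(gen)
print(probe() is not None)  # True: the suspended frame still holds it

for _ in gen:               # run the generator to completion
    pass
print(gen.gi_frame is None, probe() is None)   # True True
```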

On 3/31/06, Jim Jewett <jimjjewett@gmail.com> wrote:
The checkins list has been struggling with generator reference leaks; the latest conclusion was that some are unavoidable because of __del__ cycles.
That was Tim's conclusion, but I wasn't quite done thinking about it ;)
That sort of defeats the purpose of resource managers. (Yes, it can be worked around on a case-by-case basis.)
As I see it, part of the problem is that
(1) When there is a cycle, Python refuses to guess.
(2) There is no way for a __del__ method to hint at ordering constraints.
(3) There is no lightweight version of __del__ to say "I don't care about ordering constraints."
An additional (and much more complicating) problem is that __del__ can (and is allowed to) revive 'self'. That means that halfway through your cleanup, however you decided to do it, you can suddenly find out that you shouldn't be cleaning up at all. And of course any given __del__ can rely on all of the parts of the cycle, even if one of those parts _claims_ it can safely break the cycle.
I think there are three different scenarios involving cycles and/or __del__ methods:
- An object may consciously create a cycle, and know how to resolve it. A '__breakcycle__' method or such may be the right way to handle those cases. It would have to be pretty sure that no one outside itself can (or should) rely on the attribute or closure it breaks the cycle with, though. (Sensible exceptions ought to be fine, IMHO.)
- An object may not care about cycles, but want to do some cleanup when it is deleted. The C types have 'tp_dealloc' for this, and that's what PyFile uses to close files. If you want to emulate this behaviour in Python, you are forced to use __del__, creating the unreclaimable cycle problem. Perhaps a __dealloc__ class/staticmethod makes sense; it would be passed a dictionary of instance data, but not 'self', so it can never revive 'self' and can be sanely used in cycles -- that is, some of its instance data may still suddenly be None, but that's a problem with __del__ methods and global variables, too.
- An object may care about cycles, actually need a __del__ method that can revive the object, but not have a sane way to say 'break the cycle at this point'. Generators may be an example of that kind of object, although I'm not sure: they could throw() an exception to end the cycle.
I don't think we can reclaim cycles where none of the objects can fairly break the cycle.
-- 
Thomas Wouters <thomas@python.org>
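One way to approximate the proposed `__dealloc__` today, assuming Python 3.4+: register it through `weakref.finalize`, passing the instance's attribute dict rather than `self`. `DeallocMixin`, `_register_dealloc` and `TempBuffer` are hypothetical names, and the scheme only works if nothing in that dict refers back to the instance, since the finalizer keeps the dict alive until it runs.

```python
import io
import weakref


class DeallocMixin:
    """Hypothetical emulation of the proposed __dealloc__ hook."""

    def _register_dealloc(self):
        # The finalizer holds the attribute dict, never 'self', so the hook
        # cannot revive the instance.
        weakref.finalize(self, type(self).__dealloc__, self.__dict__)


class TempBuffer(DeallocMixin):
    def __init__(self, name):
        self.name = name
        self.buf = io.StringIO()
        self._register_dealloc()

    @staticmethod
    def __dealloc__(data):
        # Only instance data is available here; as with __del__, module
        # globals may already be gone during interpreter shutdown.
        data["buf"].close()
        released.append(data["name"])


released = []
t = TempBuffer("scratch")
buf = t.buf
del t                          # CPython frees t at once; the hook then runs
print(released, buf.closed)    # ['scratch'] True
```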
participants (13)
- Martin v. Löwis
- Duncan Booth
- Greg Ewing
- Guido van Rossum
- James Y Knight
- Jim Jewett
- Michael Hudson
- Neal Norwitz
- Neil Schemenauer
- Nick Coghlan
- Phillip J. Eby
- Thomas Wouters
- Tim Peters
Tim Peters