
On Mon, 26 Dec 2005 17:07:35 +0100, Andrea Arcangeli <andrea@cpushare.com> wrote:
Hello,
Hey, this is really a question for a Python list. However, I've attached some comments below.
I was just shocked today when I noticed this:
-------------------
import sys

class A(object):
    y = None

    def x(self):
        pass

    def __del__(self):
        print 'deleted'

a = A()
print sys.getrefcount(a)
if 1:
    a.y = a.x
print sys.getrefcount(a)
del a
-------------------
I understood the cross-reference memleaks well, like "x.y = y; y.x = x; del x, y", but I didn't imagine that "a.y = a.x" would be enough to generate a memleak. "a.y = a.x" isn't referencing another structure, it's referencing itself only. In fact, if I do this instead, the memleak goes away!!
I'm not sure how far you've gotten into this, but here's the basic explanation: "a.x" gives you a "bound method" object. Since you might do anything at all with the object it evaluates to, it wraps up a reference to the object "a" refers to, so that it knows what to use as "self". This has the effect of increasing the reference count of "a", but it doesn't actually leak any memory.

Of course, by creating a cycle which contains an object with an implementation of __del__, you have created a leak, since Python's GC cannot collect that kind of graph. Hopefully the __del__ implementation is only included as an aid to understanding what is going on, and you don't actually need it in any of your real applications. Once it is removed, the cycle will be collectable by Python.

Another strategy is to periodically examine gc.garbage and manually break cycles. That way, if you do have any __del__ implementations, the objects will no longer be part of a cycle, and Python will again be able to collect them.
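To make that concrete, here is a minimal sketch (class and attribute names are my own) showing both halves: the bound method bumps the refcount of "a", and, with no __del__ in the way, the cycle collector still reclaims the object once it is unreachable:

```python
import gc
import sys
import weakref

class A(object):
    # note: no __del__, so the cycle collector can handle instances of A
    def x(self):
        pass

a = A()
before = sys.getrefcount(a)
a.y = a.x                # the bound method stores a reference back to a
after = sys.getrefcount(a)
print(after - before)    # 1: one extra reference, held by the bound method

r = weakref.ref(a)
del a                    # refcount never reaches zero: a -> a.y -> a
gc.collect()             # but the cycle collector finds and frees the cycle
print(r() is None)       # True: the object really was released
```

So the extra reference is real, but it is a cycle, not a leak, as long as __del__ stays out of the picture.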
-------------------
import sys

class A(object):
    def x(self):
        pass
    y = x

    def __del__(self):
        print 'deleted'

a = A()
print sys.getrefcount(a)
a.x()
a.y()
print a.x, a.y
del a
-------------------
Now, the fact that a static field doesn't generate a reference but a dynamic one does is quite confusing to me, and it also opened a can of worms in my code. I can handle that now that I know about it, but I wonder what people recommend to solve memleaks of this kind.
This is an interesting case. Python does not do what you probably expect here. When you define a class with methods, Python does not actually create any method objects! It is the actual attribute lookup on an instance which creates the method object. You can see this in the following example:

>>> class X:
...     def y(self): pass
...
>>> a = X()
>>> a.y is a.y
False
>>> a.y is X.__dict__['y']
False
>>> X.__dict__['y'] is X.__dict__['y']
True
>>>

So when you added "y" to your class "A", Python didn't care, because there aren't even any method objects until you access an attribute which is bound to a function. Continuing the above example:

>>> sys.getrefcount(a)
2
>>> L = [a.y, a.y, a.y, a.y]
>>> sys.getrefcount(a)
6
>>>
I'd also like to know how other languages like Ruby and Java behave in terms of self-references of objects. Can't the language understand that it's a self-reference, and in turn that it's the same as an integer or a string, like it already does when the member is initialized statically?
I don't know Ruby well enough to comment directly, but I believe Ruby's GC is much simpler (and less capable) than Python's. Java doesn't have bound methods (or unbound methods, or heck, functions): the obvious way in which you would construct them on top of the primitives the language does offer seems to me as though it would introduce the same "problem" you are seeing in Python, but that may just be due to the influence Python has had on my thinking.
In fact, can't the language be smart enough to understand when two cross-referenced objects have lost visibility from all points of view, and drop both objects even if they hold a reference to each other? I understand this is a lot more complicated, but wouldn't it be possible in theory? What does the garbage collection of other languages like Ruby and Java do: the same as Python, or something more advanced?
When you have "two cross referenced objects", that's a cycle, and Python will indeed clean it up. The only exception is if there is a __del__ implementation, as I mentioned above. This is a general problem with garbage collection. If you have two objects which refer to each other and which each wish to perform some finalization, which finalizer do you call first?
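For example, a plain two-object cycle with no __del__ is exactly the case the collector handles (a sketch with invented names; refcounting alone can never free these two, but the cycle detector can):

```python
import gc
import weakref

class Node(object):
    pass

x = Node()
y = Node()
x.other = y          # x -> y
y.other = x          # y -> x: a reference cycle
wx, wy = weakref.ref(x), weakref.ref(y)

del x, y             # refcounts stay nonzero because of the cycle
gc.collect()         # the cycle detector frees both objects anyway
print(wx() is None and wy() is None)   # True
```

Give either Node a __del__ and the collector can no longer decide which finalizer runs first, which is the ambiguity described above.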
So far my Python programs never really needed to release memory (so my incomplete understanding of Python refcounts wasn't a problem), but now that I'm dealing with a server I must make sure that the "proto" is released after a loseConnection invocation. So I must clean up all cross- (and self-!) references in loseConnection and use weakrefs where needed.
Now, the structures that I'm leaking (like the protocol object) are so tiny that there's no chance I could ever notice the memleak in real life, so I had to add debugging code to trap memleaks. You can imagine my server code like this:
You might be surprised :) These things tend to build up, if your process is long-running.
class cpushare_protocol(Int32StringReceiver):
    def connectionMade(self):
        [..]
        self.hard_handlers = {
            PROTO_SECCOMP : self.seccomp_handler,
            PROTO_LOG : self.log_handler,
        }
        [..]

    def log_handler(self, string):
        [..]

    def seccomp_handler(self, string):
        [..]

    def __del__(self):
        print 'protocol deleted'

    def connectionLost(self, reason):
        [..]
        # memleaks
        del self.hard_handlers
        print 'protocol refcount:', sys.getrefcount(self)
        #assert sys.getrefcount(self) == 4
For things like hard_handlers (which are self-referencing callbacks) I can't even use weakref.WeakValueDictionary, because the dictionary wouldn't keep the bound methods alive by itself: they get released immediately. So the only chance I have to release the memory of the protocol object when the connection is dropped is to do an explicit del self.hard_handlers in loseConnection.
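That immediate disappearance is easy to demonstrate (a minimal sketch with made-up names; each attribute access creates a fresh bound method, so the WeakValueDictionary entry is its only strong reference and it dies as soon as it is stored):

```python
import weakref

class Proto(object):
    def seccomp_handler(self, s):
        return s

p = Proto()
d = weakref.WeakValueDictionary()
d['seccomp'] = p.seccomp_handler   # a fresh bound method object
# the bound method had no other strong reference, so it is already gone:
print('seccomp' in d)              # False
```

So a WeakValueDictionary of bound methods is empty by the time you look at it, which is why the strategies below (explicit cleanup, or string-based dispatch) are the workable ones.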
I wonder what other Twisted developers do to avoid these troubles. Perhaps I shouldn't use self-referencing callbacks to hold the state machine, and should instead do like the smtp protocol, which does this:
    def lookupMethod(self, command):
        return getattr(self, 'do_' + command.upper(), None)
basically working with strings instead of pointers. Or I can simply make sure to clean up all structures when I stop using them (like with the del self.hard_handlers above), but then I'll lose part of the automatic garbage collection features of Python. I really want garbage collection; I could have written this in C++ if I were forced to clean up by hand.
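A sketch of that string-based dispatch (invented class and method names): because the instance never stores bound methods in its own attributes, no self-referencing cycle is created in the first place, and there is nothing to clean up in loseConnection:

```python
class Protocol(object):
    def do_LOG(self, data):
        return 'log: ' + data

    def do_SECCOMP(self, data):
        return 'seccomp: ' + data

    def lookupMethod(self, command):
        # the bound method is created on demand and not stored anywhere,
        # so looking it up never adds a reference that outlives the call
        return getattr(self, 'do_' + command.upper(), None)

p = Protocol()
print(p.lookupMethod('log')('hello'))   # log: hello
print(p.lookupMethod('unknown'))        # None
```

The dictionary of handlers is replaced by a naming convention, which trades a pointer graph for a string lookup.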
(You can probably guess what I'm going to say here. ;) In general, I avoid implementing __del__. My programs may end up with cycles, but as long as I don't have __del__, Python can figure out how to free the objects. Note that it does sometimes take a while (and this has implications for peak memory usage which may be important to you), but if you find a case that it doesn't handle, you've probably found a bug in the GC that python-dev will fix.

Hope this helps, and happy holidays,
Jean-Paul