
On Mon, 26 Dec 2005 17:07:35 +0100, Andrea Arcangeli <andrea@cpushare.com> wrote:
Hello,
Hey, this is really a question for a Python list. However, I've attached some comments below.
I was just shocked today when I noticed this:
-------------------
import sys

class A(object):
    y = None

    def x(self):
        pass

    def __del__(self):
        print 'deleted'

a = A()
print sys.getrefcount(a)
if 1:
    a.y = a.x
print sys.getrefcount(a)
del a
-------------------
I understood the cross-reference memleaks well, like "x.y = y; y.x = x; del x, y", but I didn't imagine that "a.y = a.x" would be enough to generate a memleak. "a.y = a.x" isn't referencing another structure, it's referencing itself only. In fact, if I do this instead, the memleak goes away!!
I'm not sure how far you've gotten into this, but here's the basic explanation: "a.x" gives you a "bound method" object. Since you might do anything at all with the object it evaluates to, it wraps up a reference to the object "a" refers to, so that it knows what to use as "self". This has the effect of increasing the reference count of "a", but it doesn't actually leak any memory.

Of course, by creating a cycle which contains an object with an implementation of __del__, you have created a leak, since Python's GC cannot collect that kind of graph. Hopefully the __del__ implementation is only included as an aid to understanding what is going on, and you don't actually need it in any of your real applications. Once it is removed, the cycle will be collectable by Python.

Another strategy is to periodically examine gc.garbage and manually break cycles. That way, if you do have any __del__ implementations, the objects will no longer be part of a cycle, and Python will again be able to collect them.
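To make that concrete, here is a minimal sketch (class and attribute names are my own) showing both halves: the bound method bumps the refcount of "a", and, with no __del__ in the way, the cycle collector still reclaims the object once it is unreachable:

```python
import gc
import sys
import weakref

class A(object):
    # note: no __del__, so the cycle collector can handle instances of A
    def x(self):
        pass

a = A()
before = sys.getrefcount(a)
a.y = a.x                # the bound method stores a reference back to a
after = sys.getrefcount(a)
print(after - before)    # 1: one extra reference, held by the bound method

r = weakref.ref(a)
del a                    # refcount never reaches zero: a -> a.y -> a
gc.collect()             # but the cycle collector finds and frees the cycle
print(r() is None)       # True: the object really was released
```

So the extra reference is real, but it is a cycle, not a leak, as long as __del__ stays out of the picture.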
-------------------
import sys

class A(object):
    def x(self):
        pass
    y = x

    def __del__(self):
        print 'deleted'

a = A()
print sys.getrefcount(a)
a.x()
a.y()
print a.x, a.y
del a
-------------------
Now, the fact that a static field doesn't generate a reference but a dynamic one does is quite confusing to me, and it also opened a can of worms in my code. I can handle that now that I know about it, but I wonder what people recommend to solve memleaks of this kind.
This is an interesting case. Python does not do what you probably expect here. When you define a class with methods, Python does not actually create any method objects! It is the actual attribute lookup on an instance which creates the method object. You can see this in the following example:

>>> class X:
...     def y(self): pass
...
>>> a = X()
>>> a.y is a.y
False
>>> a.y is X.__dict__['y']
False
>>> X.__dict__['y'] is X.__dict__['y']
True
>>>

So when you added "y" to your class "A", Python didn't care, because there aren't even any method objects until you access an attribute which is bound to a function. Continuing the above example:

>>> sys.getrefcount(a)
2
>>> L = [a.y, a.y, a.y, a.y]
>>> sys.getrefcount(a)
6
>>>
I'd also like to know how other languages like Ruby and Java behave in terms of self-references of objects. Can't the language understand that it's a self-reference, and in turn that it's the same as an integer or a string, like it already does when the member is initialized statically?
I don't know Ruby well enough to comment directly, but I believe Ruby's GC is much simpler (and less capable) than Python's. Java doesn't have bound methods (or unbound methods, or heck, functions): the obvious way in which you would construct them on top of the primitives the language does offer seems to me as though it would introduce the same "problem" you are seeing in Python, but that may just be due to the influence Python has had on my thinking.
In fact, can't the language be smart enough to understand when two cross-referenced objects have lost visibility from all points of view, and drop both objects even if they hold a reference to each other? I understand this is a lot more complicated, but wouldn't it be possible in theory? What does the garbage collection of other languages like Ruby and Java do: the same as Python, or something more advanced?
When you have "two cross referenced objects", that's a cycle, and Python will indeed clean it up. The only exception is if there is a __del__ implementation, as I mentioned above. This is a general problem with garbage collection. If you have two objects which refer to each other and which each wish to perform some finalization, which finalizer do you call first?
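For example, a plain two-object cycle with no __del__ is exactly the case the collector handles (a sketch with invented names; refcounting alone can never free these two, but the cycle detector can):

```python
import gc
import weakref

class Node(object):
    pass

x = Node()
y = Node()
x.other = y          # x -> y
y.other = x          # y -> x: a reference cycle
wx, wy = weakref.ref(x), weakref.ref(y)

del x, y             # refcounts stay nonzero because of the cycle
gc.collect()         # the cycle detector frees both objects anyway
print(wx() is None and wy() is None)   # True
```

Give either Node a __del__ and the collector can no longer decide which finalizer runs first, which is the ambiguity described above.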
So far my Python programs never really needed to release memory (so my incomplete understanding of Python refcounts wasn't a problem), but now that I'm dealing with a server I must make sure that the "proto" is released after a loseConnection invocation. So I must clean up all cross- (and self-!) references in loseConnection and use weakrefs where needed.
Now, the structures that I'm leaking (like the protocol object) are so tiny that there's no chance I could ever notice the memleak in real life, so I had to add debugging code to trap memleaks. You can imagine my server code like this:
You might be surprised :) These things tend to build up, if your process is long-running.
class cpushare_protocol(Int32StringReceiver):
    def connectionMade(self):
        [..]
        self.hard_handlers = {
            PROTO_SECCOMP : self.seccomp_handler,
            PROTO_LOG : self.log_handler,
        }
        [..]

    def log_handler(self, string):
        [..]

    def seccomp_handler(self, string):
        [..]

    def __del__(self):
        print 'protocol deleted'

    def connectionLost(self, reason):
        [..]
        # memleaks
        del self.hard_handlers
        print 'protocol refcount:', sys.getrefcount(self)
        #assert sys.getrefcount(self) == 4
For things like hard_handlers (which are self-referencing callbacks) I can't even use weakref.WeakValueDictionary, because the dictionary wouldn't keep the bound methods alive by itself: they get released immediately. So the only chance I have to release the memory of the protocol object when the connection is dropped is to do an explicit del self.hard_handlers in loseConnection.
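That immediate disappearance is easy to demonstrate (a minimal sketch with made-up names; each attribute access creates a fresh bound method, so the WeakValueDictionary entry is its only strong reference and it dies as soon as it is stored):

```python
import weakref

class Proto(object):
    def seccomp_handler(self, s):
        return s

p = Proto()
d = weakref.WeakValueDictionary()
d['seccomp'] = p.seccomp_handler   # a fresh bound method object
# the bound method had no other strong reference, so it is already gone:
print('seccomp' in d)              # False
```

So a WeakValueDictionary of bound methods is empty by the time you look at it, which is why the strategies below (explicit cleanup, or string-based dispatch) are the workable ones.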
I wonder what other Twisted developers do to avoid these troubles. Perhaps I shouldn't use self-referencing callbacks to hold the state machine, and should instead do like the smtp protocol, which does this:
    def lookupMethod(self, command):
        return getattr(self, 'do_' + command.upper(), None)
basically working with strings instead of pointers. Or I can simply make sure to clean up all structures when I stop using them (like with the del self.hard_handlers above), but then I'll lose part of the automatic garbage collection features of Python. I really want garbage collection; I could have written this in C++ if I were forced to clean up by hand.
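A sketch of that string-based dispatch (invented class and method names): because the instance never stores bound methods in its own attributes, no self-referencing cycle is created in the first place, and there is nothing to clean up in loseConnection:

```python
class Protocol(object):
    def do_LOG(self, data):
        return 'log: ' + data

    def do_SECCOMP(self, data):
        return 'seccomp: ' + data

    def lookupMethod(self, command):
        # the bound method is created on demand and not stored anywhere,
        # so looking it up never adds a reference that outlives the call
        return getattr(self, 'do_' + command.upper(), None)

p = Protocol()
print(p.lookupMethod('log')('hello'))   # log: hello
print(p.lookupMethod('unknown'))        # None
```

The dictionary of handlers is replaced by a naming convention, which trades a pointer graph for a string lookup.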
(You can probably guess what I'm going to say here. ;) In general, I avoid implementing __del__. My programs may end up with cycles, but as long as I don't have __del__, Python can figure out how to free the objects. Note that it does sometimes take a while (and this has implications for peak memory usage which may be important to you), but if you find a case that it doesn't handle, you've probably found a bug in the GC that python-dev will fix.

Hope this helps, and happy holidays,
Jean-Paul