Looping-related Memory Leak

Tom Davis binjured at gmail.com
Tue Jul 1 15:44:53 CEST 2008


On Jun 30, 3:12 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> On Mon, 30 Jun 2008 10:55:00 -0700, Tom Davis wrote:
> > To me, this seems illogical.  I can understand that the GC is
> > reluctant to reclaim objects that have many connections to other
> > objects and so forth, but once those objects' scopes are gone, why
> > doesn't it force a reclaim?  For instance, I can use timeit to create
> > an object instance, run a method of it, then `del` the variable used
> > to store the instance, but each loop thereafter continues to require
> > more memory and take more time. 1000 runs may take .27 usec/pass
> > whereas 100000 takes 2 usec/pass (Average).
>
> `del` just removes the name and one reference to that object.  Objects are
> only deleted when there's no reference to them anymore.  Your example
> sounds like you keep references to objects somehow that are accumulating.
> Maybe by accident.  Any class level bound mutables or mutable default
> values in functions in that source code?  Would be my first guess.
>
> Ciao,
>         Marc 'BlackJack' Rintsch

Marc,

Thanks for the tips.  A quick confirmation:

I took "class level bound mutables" to mean something like:

  class A(object):
    SOME_MUTABLE = [1, 2]
  ...
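
(For illustration, here's a minimal sketch of how such a class-level
mutable keeps state alive across instances; the class and method names
are made up, not from my actual code:)

```python
class A(object):
    SOME_MUTABLE = [1, 2]

    def add(self, item):
        # Looks like instance state, but this mutates the one list
        # shared by the whole class.
        self.SOME_MUTABLE.append(item)

a = A()
a.add(3)
del a            # the name is gone, but the class attribute lives on

b = A()
print(b.SOME_MUTABLE)  # [1, 2, 3] -- state survived the del
```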

And "mutable default values" to mean:

  ...
  def a(self, arg=[1,2]):
  ...

If this is correct, I have none of these.  I understand your point
about the references, but in my `timeit` example the statement is as
simple as this:

  from MyClass import MyClass
  a = MyClass()
  del a
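
(To check my understanding of your point that `del` only removes the
name: a second reference keeps the object alive, and in CPython the
object goes away as soon as the last reference does. `MyClass` here is
a stand-in, and the weakref is just a way to watch the object without
keeping it alive:)

```python
import weakref

class MyClass(object):
    pass

a = MyClass()
ref = weakref.ref(a)   # observes the object without referencing it
b = a                  # a second, perhaps accidental, reference

del a
print(ref() is None)   # False -- 'b' still keeps the object alive

del b
print(ref() is None)   # True -- last reference gone, object reclaimed
```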

So, yes, it would seem that object references are piling up and not
being removed, entirely by accident.  Is there some kind of list
somewhere along the lines of "if your class has any of these
attributes (mutable defaults, class-level mutables, etc.), its
instances may not be properly dereferenced"?  My obvious workaround is
to process only X documents per run and set up a cron job to invoke
the script repeatedly until all the files have been processed, but I'd
much prefer to make the code run as intended.

I ran a test overnight last night: at first it handled a few documents
per second, but by the time I woke up it had slowed down so much that
a single document took over an hour to process!  RAM usage went from
20 MB at the start to over 300 MB, when it should never need more than
about 20 MB, since everything is handled with local variables and new
objects are instantiated for each document.  This is a serious
problem.
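
(In case it helps anyone diagnose this, here is roughly how I plan to
look for the accumulating objects -- a sketch using only the stdlib
`gc` module, with the "document pass" faked by a list of lists:)

```python
import gc

def type_counts():
    """Count live objects (those tracked by the cyclic GC) by type."""
    counts = {}
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts

before = type_counts()
leaky = [[] for _ in range(100)]   # stand-in for one processing pass
after = type_counts()

# Comparing snapshots shows which types are piling up between passes.
print(after["list"] - before.get("list", 0))  # grows by at least 100
```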

Thanks,

Tom


