Looping-related Memory Leak
binjured at gmail.com
Tue Jul 1 09:50:49 EDT 2008
On Jun 30, 8:24 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Jun 30, 1:55 pm, Tom Davis <binju... at gmail.com> wrote:
> > On Jun 26, 5:38 am, Carl Banks <pavlovevide... at gmail.com> wrote:
> > > On Jun 26, 5:19 am, Tom Davis <binju... at gmail.com> wrote:
> > > > I am having a problem where a long-running function will cause a
> > > > memory leak / balloon for reasons I cannot figure out. Essentially, I
> > > > loop through a directory of pickled files, load them, and run some
> > > > other functions on them. In every case, each function uses only local
> > > > variables and I even made sure to use `del` on each variable at the
> > > > end of the loop. However, as the loop progresses the amount of memory
> > > > used steadily increases.
> > > Do you happen to be using a single Unpickler instance? If so, change
> > > it to use a different instance each time. (If you just use the module-
> > > level load function you are already using a different instance each
> > > time.)
> > > Unpicklers hold a reference to everything they've seen, which prevents
> > > the objects they unpickle from being garbage collected until the
> > > Unpickler itself is collected.
> > > Carl Banks
> > Carl,
> > Yes, I was using the module-level unpickler. I changed it with little
> > effect. I guess perhaps this is my misunderstanding of how GC works.
> > For instance, if I have `a = Obj()` and run `a.some_method()` which
> > generates a highly-nested local variable that cannot be easily garbage
> > collected, it was my assumption that either (1) completing the method
> > call or (2) deleting the object instance itself would automatically
> > destroy any variables used by said method. This does not appear to be
> > the case, however. Even when a variable/object's scope is destroyed,
> > it would seem that variables/objects created within that scope cannot
> > always be reclaimed, depending on their complexity.
> > To me, this seems illogical. I can understand that the GC is
> > reluctant to reclaim objects that have many connections to other
> > objects and so forth, but once those objects' scopes are gone, why
> > doesn't it force a reclaim?
> Are your objects involved in circular references, and do you have any
> objects with a __del__ method? Normally objects are reclaimed when
> the reference count goes to zero, but if there are cycles then the
> reference count never reaches zero, and they remain alive until the
> generational garbage collector makes a pass to break the cycle.
> However, the generational collector doesn't break cycles that involve
> objects with a __del__ method.
There are some circular references, but these are produced by objects
created by BeautifulSoup. I try to decompose all of them, but if
there's one part of the code to blame it's almost certainly this. I
have no objects with __del__ methods, at least none that I wrote.
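To make the cycle behavior described above concrete, here is a minimal sketch using only the standard gc and weakref modules. The Node class is a hypothetical stand-in for a tree node that points back at its parent; it is not BeautifulSoup's actual class, and the automatic collector is switched off only so the effect is easy to see.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for a tree node that points back at its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_and_drop():
    """Build a parent/child cycle, drop it, and return a weak reference to it."""
    root = Node()
    Node(parent=root)          # child -> root and root -> child form a cycle
    probe = weakref.ref(root)  # lets us observe whether root is still alive
    del root                   # the cycle keeps the reference count above zero
    return probe


gc.disable()                          # keep the collector from running on its own
probe = build_and_drop()
alive_before = probe() is not None    # nothing but the cycle holds the object
gc.collect()                          # the cyclic collector finds and frees it
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```

If the first value is true and the second is false, the objects were being kept alive by the cycle alone, and a collector pass (or breaking the back-references by hand) is enough to release them.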
> Are you calling any C extensions that might be failing to decref an
> object? There could be a memory leak.
Perhaps. Yet another thing to look into.
> Are you keeping a reference around somewhere? For example, appending
> results to a list, and the result keeps a reference to all of your
> unpickled data for some reason.
> You know, we can throw out all these scenarios, but these suggestions
> are just common pitfalls. If it doesn't look like one of these
> things, you're going to have to do your own legwork to help isolate
> what's causing the behavior. Then if needed you can come back to us
> with more detailed information.
> Start with your original function, and slowly remove functionality
> from it until the bad behavior goes away. That will give you a clue
> what's causing it.
I realize this and thank you folks for your patience. I thought
perhaps there was something simple I was overlooking, but in this case
it would seem that there are dozens of things outside of my direct
control that could be causing this, most likely from third-party
libraries I am using. I will continue to try to debug this on my own
and see if I can figure anything out. Memory leaks and failing GC and
so forth are all new concerns for me.
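As a starting point for that debugging, one rough approach is to snapshot how many objects of each type the collector is tracking before and after a batch of loop iterations, then look at which types grew. The sketch below assumes nothing beyond the standard gc and collections modules; Record and the leaked list are hypothetical stand-ins for whatever the real loop produces and accidentally retains.

```python
import gc
from collections import Counter


def type_counts():
    """Count the objects currently tracked by the collector, keyed by type name."""
    gc.collect()  # clear collectable garbage so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the types whose instance count grew the most between snapshots."""
    delta = after - before   # Counter subtraction keeps positive counts only
    return delta.most_common(top)


class Record(object):
    """Hypothetical stand-in for whatever each loop iteration produces."""

    def __init__(self, n):
        self.payload = [n]


leaked = []                       # simulates a reference that is kept by accident
before = type_counts()
for n in range(1000):
    leaked.append(Record(n))      # each pass leaves one Record behind
after = type_counts()

report = growth(before, after)
print(report)
```

A type whose count rises by about one per iteration, as Record does here, is a good candidate for the object being retained. Types defined in C extensions may not be tracked by the collector at all, so a clean report does not rule those out.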