Leaking memory problem
Hi!
I was wondering if anyone could help me find a memory leak in a program using NumPy. My project is quite large and I haven't been able to construct a simple example that reproduces the problem.
I have an iterative algorithm whose memory usage should not grow as the iteration progresses. However, after the first iteration 1 GB of memory is used, and it steadily increases until, at about 100-200 iterations, 8 GB is used and the program exits with MemoryError.
I have a collection of objects which contain large arrays. In each iteration, the objects are updated in turn by recomputing the arrays they contain. The number of arrays and their sizes are constant (they do not change during the iteration). So the memory usage should not increase, and I'm a bit confused: how can the program run out of memory if it can easily compute at least a few iterations?
I've tried to use Pympler, but as far as I understand it doesn't show the memory usage of NumPy arrays?
I also tried gc.set_debug(gc.DEBUG_UNCOLLECTABLE) and then printing gc.garbage at each iteration, but that doesn't show anything.
Does anyone have any ideas on how to debug this kind of memory leak? And how can I find out whether the bug is in my code, in NumPy, or elsewhere?
Thanks for any help!
Jaakko
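A minimal way to watch the growth per iteration, assuming a Unix system, is the standard-library resource module; the loop body below is just a placeholder for one iteration's recomputation, not the original algorithm:

```python
import resource

def peak_rss_mb():
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS);
    # Linux is assumed here.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

for i in range(3):
    data = [0.0] * 100000  # placeholder for recomputing the large arrays
    print("iteration %d: peak RSS %.1f MB" % (i, peak_rss_mb()))
```

If the peak RSS keeps climbing even though the arrays are simply rebound each iteration, something is still holding references to the old ones.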
I added allocation tracking tools to numpy for exactly this reason. They are not very well documented, but you can see how to use them here:
https://github.com/numpy/numpy/tree/master/tools/allocation_tracking
Ray
On Mon, Feb 25, 2013 at 8:41 AM, Jaakko Luttinen jaakko.luttinen@aalto.fi wrote:
_______________________________________________
NumPyDiscussion mailing list
NumPyDiscussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpydiscussion
On Mon, Feb 25, 2013 at 8:41 AM, Jaakko Luttinen jaakko.luttinen@aalto.fi wrote:
There are some stories where Python's garbage collection kicks in too slowly. Try calling gc.collect() in the loop to see if it helps.
Roughly what I remember: collection is triggered by object counts, so if you have only a few very large arrays, memory grows but garbage collection doesn't start yet.
Josef
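Josef's suggestion amounts to forcing a full collection once per iteration. A minimal sketch of the idea (the state dict and its update rule are made up for illustration, not the original code):

```python
import gc
import numpy as np

def iterate(state, n_iter=5):
    for _ in range(n_iter):
        for key in state:
            # Rebinding drops the old array; it is freed immediately
            # unless a reference cycle keeps it alive.
            state[key] = state[key] + 1.0
        # The collector is triggered by object counts, not bytes, so a
        # few huge cyclic arrays can accumulate before it runs on its
        # own; collect explicitly each iteration.
        gc.collect()
    return state

state = iterate({"a": np.zeros(1000), "b": np.ones(1000)})
```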
Josef's suggestion is the first thing I'd try.
Are you doing any of this in C? It is easy to end up duplicating memory that you need to Py_DECREF. In a C debugger you should be able to monitor the reference count of your Python objects.
By the way, for manual tracking of reference counts you can use sys.getrefcount(). It has come in handy for me every once in a while, but usually the garbage collector is all I've needed, besides patience.
The way I usually run the gc is by doing

    gc.enable()
    gc.set_debug(gc.DEBUG_LEAK)

as pretty much my first lines, and then after everything is said and done I do something along the lines of:

    gc.collect()
    for x in gc.garbage:
        print(type(x), str(x))

You'd have to set up your program to quit before it runs out of memory, of course, but I understand you get to run quite a few iterations before failure.
Raul
On 25/02/2013 9:03 AM, josef.pktd@gmail.com wrote:
Is this with 1.7? There were a few memory leak fixes in 1.7, so if you aren't using it you should try it to be sure. And if you are using it, then there is one known memory leak bug in 1.7 that you might want to check whether you're hitting: https://github.com/numpy/numpy/issues/2969
-n
On 25 Feb 2013 13:41, "Jaakko Luttinen" jaakko.luttinen@aalto.fi wrote:
Thanks for all the answers, they were helpful!
I was using 1.7.0 and now installed from git: https://github.com/numpy/numpy/archive/master.zip
And it looks like the memory leak is gone, so I guess I was hitting that known memory leak bug. Thanks!
Jaakko
On 02/26/2013 09:04 AM, Nathaniel Smith wrote:
participants (5)
- Jaakko Luttinen
- josef.pktd@gmail.com
- Nathaniel Smith
- Raul Cota
- Thouis (Ray) Jones