Debugging memory exhaustion in Python?
I've written Python code to calculate a bunch of things for a bunch of simulations. The code goes through about 5GB in 10-100MB chunks. The problem is that Python eventually runs out of memory, consuming (according to top) 3GB. I don't see why it should be doing this--as far as I know I'm not hanging on to any references to anything.
I've fooled around with the garbage collector, turning debugging information on and trying to see if it will give me useful info about who or what is still hanging around in memory.
I've tried to delete all the user variables, but even after this the garbage collector can't free any more memory.
What I need is du for Python memory, just to get a sense of how/why this is happening. Anyone have suggestions about how to get traction on this?
Thanks,
Greg
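A rough sketch of what a "du for Python memory" could look like, assuming a modern Python 3 interpreter rather than the 2.5 discussed in this thread. The helper name memory_by_type is hypothetical. It only sees objects tracked by the garbage collector and uses shallow sizes, so buffers allocated outside Python's own allocator (for example by C extensions) may not show up at all.

```python
import gc
import sys
from collections import Counter


def memory_by_type(limit=10):
    """Return (type_name, instance_count, total_bytes) tuples for the
    `limit` types whose tracked instances use the most memory."""
    counts = Counter()
    sizes = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] += 1
        # getsizeof is shallow: it does not follow references, and the
        # default of 0 covers objects that cannot report a size.
        sizes[name] += sys.getsizeof(obj, 0)
    return [(name, counts[name], total)
            for name, total in sizes.most_common(limit)]


# Print a small report, largest consumers first.
for name, count, total in memory_by_type(5):
    print("%-25s %8d objects %12d bytes" % (name, count, total))
```

Calling it before and after a suspect function and comparing the two reports gives a crude per-type view of what grew.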
Are you using a specific IDE?
Matthieu
2007/6/15, Greg Novak <novak@ucolick.org>:
I've written Python code to calculate a bunch of things for a bunch of simulations. [...]
Thanks, Greg _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
Some questions:
1) What version of Python are you using? Python 2.4 and below has some issues with memory not being released back to the OS.
2) What data structures are you using to represent the data?
Brian
On 6/14/07, Greg Novak <novak@ucolick.org> wrote:
I've written Python code to calculate a bunch of things for a bunch of simulations. [...]
Thanks, Greg
Matthieu Brucher <matthieu.brucher@gmail.com> wrote:
Are you using a specific IDE ?
Plain old IPython, but it happens when I run it in a bare Python interpreter as well.
On 6/15/07, Brian Granger <ellisonbg.net@gmail.com> wrote:
1) What version of python are you using? Python 2.4 and below has some issues with memory not being released back to the OS.
2.5
2) What data structures are you using to represent the data?
Lots of arrays... It's mostly particle data, although I do flagrantly generate lots of temporaries. I'm not careful at all about that. I thought this could have speed implications, but I didn't realize it could have memory exhaustion implications, too. Since I'm only handling tens of MB at a time, I also thought that memory fragmentation wouldn't be a severe problem. If I had GB arrays and started generating lots of temporary copies, I could see that that would lead to trouble...
I found a program called heapy that's supposed to help with this. Anyone have any experience with it?
Thanks for your thoughts,
Greg
On Jun 15, 2007, at 15:48, Greg Novak wrote:
[...]
2) What data structures are you using to represent the data?
Lots of arrays... It's mostly particle data, although I do flagrantly generate lots of temporaries. I'm not careful at all about that. I thought this could have speed implications, but I didn't realize it could have memory exhaustion implications, too. Since I'm only handling 10's of MB at a time, I also thought that memory fragmentation wouldn't be a severe problem. If I had GB arrays and started generating lots of temporary copies, I could see that that would lead to trouble...
When this happens to me, it's either because I screwed up handling the reference counts in a C extension, or because I'm keeping old copies of arrays in a cache or a log object.
--
David M. Cooke
http://arbutus.physics.mcmaster.ca/dmc/
cookedm@physics.mcmaster.ca
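To illustrate the second cause mentioned above, here is a minimal sketch of a log object that accidentally keeps data alive. Everything in it (Chunk, log, process, is_freed_after_processing) is hypothetical and only stands in for real arrays and a real log; it assumes CPython 3 and uses a weak reference to check whether the chunk was actually freed.

```python
import gc
import weakref


class Chunk:
    """Stand-in for a large array (hypothetical, for illustration)."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


log = []  # hypothetical log object that lives for the whole run


def process(chunk, keep_reference):
    """Record something about the chunk in the log."""
    if keep_reference:
        log.append(chunk)            # leak: the log keeps the chunk alive
    else:
        log.append(len(chunk.data))  # store a small summary instead


def is_freed_after_processing(keep_reference):
    """Return True if the chunk is gone once the caller drops it."""
    chunk = Chunk(1000)
    probe = weakref.ref(chunk)  # does not keep the chunk alive
    process(chunk, keep_reference)
    del chunk
    gc.collect()
    return probe() is None


print(is_freed_after_processing(keep_reference=False))
print(is_freed_after_processing(keep_reference=True))
```

The same weak-reference probe can be pointed at a real array to find out whether something outside the processing loop is still holding on to it.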
2007/6/15, Greg Novak <novak@ucolick.org>:
Matthieu Brucher <matthieu.brucher@gmail.com> wrote:
Are you using a specific IDE ?
Plain old IPython, but it happens when I run it in a bare python interpreter as well.
I asked this because IPython can keep some extra references, so memory is not freed, but if that happens with the simple interpreter :|
Matthieu
Is there a simple way to get IPython to release its references? I'm interested in that, too, independently. Is it as simple as clearing out the In[] and Out[] lists?
Greg
On 6/15/07, Matthieu Brucher <matthieu.brucher@gmail.com> wrote:
I asked this because IPython can keep some extra references, so memory is not freed, but if that happens with the simple interpreter :|
Matthieu
Greg Novak wrote:
Is there a simple way to get IPython to release its references? I'm interested in that, too, independently. Is it as simple as clearing out the In[] and Out[] lists?
There are also variables _NN which correspond to Out[NN] that need to be deleted. There are also _, __, and ___, but those will get rotated shortly. Also, don't worry about In; it's just the strings you typed, nothing too memory consuming. Here's a function that you can use:

import bisect

def clearout(__IP, upto=None):
    """ Clear the IPython Out cache, possibly only up to a given entry.
    """
    ns = __IP.ns_table['user']
    Out = ns.get('Out', None)
    if Out is not None:
        keys = sorted(Out)
        if upto is not None:
            keys = keys[:bisect.bisect_right(keys, upto)]
        for key in keys:
            del Out[key]
    else:
        # No cache.
        # Still might have the _NN variables sitting around.
        keys = []
        for var in ns:
            if var.startswith('_'):
                try:
                    nn = int(var[1:])
                except ValueError:
                    continue
                # Match the cached branch: take everything when upto is
                # None, otherwise entries up to and including upto.
                if upto is None or nn <= upto:
                    keys.append(nn)
    # Drop the _NN variables that mirror the removed Out entries.
    for key in keys:
        _key = '_%s' % key
        del ns[_key]
    print 'Remove Out entries: %s' % keys

In [1]: from clearout import clearout

In [2]: 2
Out[2]: 2

In [3]: 3
Out[3]: 3

In [4]: 4
Out[4]: 4

In [5]: 5
Out[5]: 5

In [6]: 6
Out[6]: 6

In [7]: 7
Out[7]: 7

In [8]: 8
Out[8]: 8

In [9]: 9
Out[9]: 9

In [10]: 10
Out[10]: 10

In [11]: print Out
{2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10}

In [12]: clearout(__IP, upto=6)
Remove Out entries: [2, 3, 4, 5, 6]

In [13]: print Out
{7: 7, 8: 8, 9: 9, 10: 10}

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
On 6/15/07, Brian Granger <ellisonbg.net@gmail.com> wrote:
1) What version of python are you using? Python 2.4 and below has some issues with memory not being released back to the OS.
It seems like this is what's happening, even though I'm using Python 2.5. I have a function that pretty reliably makes the memory usage and resident size go up by ~40 megabytes every time it's called. I'm looking at the VmSize and VmRSS lines in /proc/pid/status on an Ubuntu machine to determine memory usage.
I expected to find zillions of objects added to the list returned by gc.get_objects. However, there were only 27 objects added, and they all seemed small -- strings, small dicts, one Frame object, and that's about it.
I mentioned a Python module called Heapy: http://guppy-pe.sourceforge.net/ It lets you set a reference point and then look at the sizes of all objects allocated after that time. This confirms what I found above manually -- only a few objects created, and they're small.
So it does seem as though the Python garbage collector has freed the objects, but it hasn't returned the memory to the operating system. This continues until I have several GB allocated and the program crashes.
I'm not using any of my own C extensions for this (where I could screw up the reference counting), and it doesn't look like the problem is leaking objects anyway.
So... does anyone have any thoughts about what could cause this?
Thanks,
Greg
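The measurement described above (watching VmSize and VmRSS around a function call) can be automated along these lines. This is only a sketch for Linux and Python 3; the helper names parse_vm_fields, current_memory_kb and memory_growth_kb are made up for the example and are not part of any library.

```python
import re


def parse_vm_fields(status_text, fields=("VmSize", "VmRSS")):
    """Extract the requested fields (in kB) from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if match and match.group(1) in fields:
            result[match.group(1)] = int(match.group(2))
    return result


def current_memory_kb():
    """Read VmSize and VmRSS for the running process (Linux only)."""
    with open("/proc/self/status") as status_file:
        return parse_vm_fields(status_file.read())


def memory_growth_kb(func, *args, **kwargs):
    """Call func and return how much each field changed across the call."""
    before = current_memory_kb()
    func(*args, **kwargs)
    after = current_memory_kb()
    return {name: after[name] - before[name] for name in before}
```

Calling memory_growth_kb(suspect_function) repeatedly, where suspect_function is whatever routine is under suspicion, shows whether the growth per call is steady, as reported here, or levels off after a few calls.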
On 19/06/07, Greg Novak <novak@ucolick.org> wrote:
It seems like this is what's happening, even though I'm using Python 2.5. I have a function that, when called, pretty reliably makes the memory usage and resident size go up by ~40 megabytes every time it's called. I'm looking at the VmSize and VmRSS lines in /proc/pid/status on an Ubuntu machine to determine memory usage. I expected to find zillions of objects added to the list returned by gc.get_objects. However, there were only 27 objects added, and they all seemed small -- strings, small dicts, one Frame object, and that's about it.
What does the Frame object contain? Doesn't it have the complete set of function local variables? I suppose you're listing everything it points to as well.
Keep in mind that numpy objects sometimes keep alive big hunks of memory. For example, if you allocate a huge array and then pick out a small piece using a view, the original huge chunk of memory is kept (and it is not allocated using Python's malloc so it may not be accounted for in your tools). There's also the problem that a view holds a reference to the array object it's a view of, so taking views of views of views of ... can lead to arbitrarily long chains of objects.
Anne
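The view behaviour described above can be checked directly. The sketch below assumes a current NumPy on Python 3, and the array names are arbitrary. It shows that a small slice keeps a reference to the large array it came from, while an explicit copy does not.

```python
import numpy as np

big = np.zeros(1000000)        # roughly 8 MB of float64 data
small_view = big[:10]          # a view: shares memory with big
small_copy = big[:10].copy()   # an independent 10-element array

# The view holds a reference to the array that owns the buffer, so the
# whole 8 MB stays allocated for as long as the view is alive.
print(small_view.base is big)
print(np.shares_memory(small_view, big))

# The copy owns its own data, so big can be freed once it is unreferenced.
print(small_copy.base is None)
print(small_copy.nbytes)
```

If only a small piece of a large chunk is needed later on, copying that piece before dropping the chunk is one way to avoid pinning the full buffer.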
Participants (6): Anne Archibald, Brian Granger, David M. Cooke, Greg Novak, Matthieu Brucher, Robert Kern