Hi, I am starting to push the limits of the available memory and I'd like to understand a bit better how Python handles memory... If I try to allocate something too big for the available memory I often get a MemoryError exception. However, in other situations, Python memory use continues to grow until the machine falls over. I was hoping to understand the difference between those cases. From what I've read Python never returns memory to the OS (is this right?) so in the second case, Python is holding on to memory that it isn't really using (for objects that have been destroyed). I guess my question is why doesn't it reuse the memory freed from object deletions instead of requesting more - and even then when requesting more, why does it continue until the machine falls over and not cause a MemoryError? While investigating this I found this script: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474 which does wonders for my code. I was wondering if this function should be included in Numpy as it seems to provide an important feature, or perhaps an entry on the wiki (in Cookbook section?) Thanks, Robin
Robin wrote:
If I try to allocate something too big for the available memory I often get a MemoryError exception. However, in other situations, Python memory use continues to grow until the machine falls over. I was hoping to understand the difference between those cases. From what I've read Python never returns memory to the OS (is this right?) so in the second case, Python is holding on to memory that it isn't really using (for objects that have been destroyed). I guess my question is why doesn't it reuse the memory freed from object deletions instead of requesting more - and even then when requesting more, why does it continue until the machine falls over and not cause a MemoryError?
Your assumption isn't correct. Python does release memory. For small objects Python uses its own memory allocation system, as explained in http://svn.python.org/projects/python/trunk/Objects/obmalloc.c . For integers and floats it uses a separate block allocation scheme. Christian
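One way to check the claim that Python releases memory, at least at the interpreter level, is the standard-library tracemalloc module. It exists only in Python 3.4 and later, so it is newer than the versions discussed in this thread. The sketch below is only an illustration: the list size is arbitrary, and tracemalloc reports what the interpreter's allocators have handed out, not the resident size the OS sees for the process.

```python
import tracemalloc

tracemalloc.start()

# Memory traced before the test data exists.
baseline, _ = tracemalloc.get_traced_memory()

# Allocate many small objects (one float object per element, plus the list).
data = [float(i) for i in range(200_000)]
with_data, _ = tracemalloc.get_traced_memory()

# Drop the only reference; the floats and the list are freed immediately
# by reference counting, without any help from the gc module.
del data
after_del, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print("held by data: %d bytes" % (with_data - baseline))
print("left after del: %d bytes" % (after_del - baseline))
```

Whether the freed memory also goes back to the operating system depends on the allocator and on fragmentation, which this sketch does not measure.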
Robin wrote:
Hi,
I am starting to push the limits of the available memory and I'd like to understand a bit better how Python handles memory...
This is why I switched to 64 bit linux and never looked back.
If I try to allocate something too big for the available memory I often get a MemoryError exception. However, in other situations, Python memory use continues to grow until the machine falls over. I was hoping to understand the difference between those cases.
I don't know what "falls over" means. It could be that you're getting swap death -- the kernel starts using virtual memory (hard disk) in place of some of the RAM. This would be characterized by your CPU use dropping to near-zero, your hard disk grinding away, and your swap space use increasing.
The MemoryError simply means that Python made a request for memory that the kernel didn't grant. There's something else you might run into -- the maximum memory size of a process before the kernel kills that process. On linux i686, IIRC this limit is 3 GB. I'm not sure why you get different behavior on different runs. FWIW, with 64 bit linux the worst that happens to me now is swap death, which can be forestalled by adding lots of RAM.
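The difference between the two outcomes can be sketched with the standard library alone. The helper name try_allocate is hypothetical, and the sizes are chosen only for illustration, assuming a typical 64-bit build of Python 3. A request the system refuses outright raises MemoryError, which the program can catch. A request that is granted but exceeds physical RAM raises nothing and may lead to swapping instead.

```python
import sys


def try_allocate(nbytes):
    """Return True if a zero-filled buffer of nbytes could be allocated.

    Hypothetical helper, for illustration only.
    """
    try:
        buf = bytearray(nbytes)
    except MemoryError:
        # The allocation request was refused; nothing was allocated.
        return False
    del buf  # release the buffer again
    return True


# A small request is granted.
small_ok = try_allocate(1024)

# A request of about half the maximum object size is far beyond what a
# typical 64-bit system can address, so it is refused immediately.
huge_ok = try_allocate(sys.maxsize // 2)

print(small_ok, huge_ok)
```

Sizes between these two extremes are the risky ones: the kernel may grant them, and the process then slows down through swapping rather than failing with an exception.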
From what I've read Python never returns memory to the OS (is this right?) so in the second case, Python is holding on to memory that it isn't really using (for objects that have been destroyed). I guess my question is why doesn't it reuse the memory freed from object deletions instead of requesting more - and even then when requesting more, why does it continue until the machine falls over and not cause a MemoryError?
It's hard to say without knowing what your code does. A first guess is that you're allocating lots of memory without allowing it to be freed. Specifically, you may have references to objects which you no longer need, and you should eliminate those references and allow them to be garbage collected. In some cases, circular references can be hard for python to detect, so you might want to play around with the gc module and judicious use of the del statement. Note also that IPython keeps references to past results by default (the history).
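A minimal sketch of the circular-reference case, using only the gc and weakref modules. The Node class and make_cycle function are hypothetical stand-ins for objects that hold large buffers, and the example assumes CPython on Python 3. Reference counting alone cannot free two objects that point at each other; an explicit gc.collect() can.

```python
import gc
import weakref


class Node:
    """Hypothetical object holding a large buffer and a link to a peer."""

    def __init__(self):
        self.payload = bytearray(1024 * 1024)
        self.other = None


def make_cycle():
    """Build two nodes that reference each other; return a weak reference."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)  # a weak reference does not keep `a` alive


gc.disable()  # keep automatic collection from running mid-example
ref = make_cycle()

# The local names a and b are gone, but the cycle keeps both nodes alive.
alive_before = ref() is not None

# The cycle collector finds the unreachable pair and frees it.
unreachable = gc.collect()
alive_after = ref() is not None
gc.enable()

print(alive_before, alive_after, unreachable)
```

For objects that are not part of a cycle, deleting the last reference (for example with del) is enough, and gc.collect() makes no difference.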
While investigating this I found this script: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474 which does wonders for my code. I was wondering if this function should be included in Numpy as it seems to provide an important feature, or perhaps an entry on the wiki (in Cookbook section?)
I don't think it belongs in numpy per se, and I'm not sure of the necessity of a spot on the scipy cookbook given that it's in the python cookbook. Perhaps more useful would be starting a page called "MemoryIssues" on the scipy wiki -- I imagine this subject, as a whole, is of particular interest for many in the numpy/scipy crowd. Certainly adding a link and description to that recipe would be useful in that context. But please, feel free to add to or edit the wiki as you see fit -- if you think something will be useful, by all means, go ahead and do it. I think there are enough eyes on the wiki that it's fairly self-regulating. -Andrew
--- Robin <robince@gmail.com> wrote: [...]
While investigating this I found this script: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474 which does wonders for my code. I was wondering if this function should be included in Numpy as it seems to provide an important feature, or perhaps an entry on the wiki (in Cookbook section?)
I am the author of the mentioned recipe, and the reason I wrote it is similar to your situation. I would add, however, that ideally there shouldn't be such a problem, but in reality there is. I have no clue why. As Christian said, Python does release memory. As I understand it, there was a problem before Python 2.5, but the memory manager was patched (see http://evanjones.ca/python-memory-part3.html), and I personally don't use Python <2.5 for that reason. The new manager helped, but I still face that problem, so I wrote the recipe.
--- Andrew wrote: [...]
It's hard to say without knowing what your code does. A first guess is that you're allocating lots of memory without allowing it to be freed. Specifically, you may have references to objects which you no longer need, and you should eliminate those references and allow them to be garbage collected. In some cases, circular references can be hard for python to detect, so you might want to play around with the gc module and judicious use of the del statement. Note also that IPython keeps references to past results by default (the history).
Sound advice, especially the part about IPython, which is often overlooked. I have played a lot with the gc module: calling gc.collect, enable and disable, and adjusting the thresholds. In practice it helps a little but not much. In my experience, numpy code that uses only arrays of numbers is more likely to hold references or views to arrays that you no longer need than to have circular references. I haven't looked at the internals of gc, obmalloc or any other Python code. What usually happens to me is that the machine starts to use virtual memory, slowing the whole computation a lot. I wonder whether your algorithm, which needs to allocate enough memory to cause a MemoryError, can be modified to avoid that. I have found that to be the case in some situations. As an example, for PCA you might find, depending on your matrix size, that using the transpose or another algorithm is more suitable -- I ended up using http://folk.uio.no/henninri/pca_module. While I am of course partial to the fate of the cookbook recipe, I also feel that it doesn't directly belong in numpy -- it should be useful for other Pythonistas. Maybe in numpy, Python proper somewhere, or one of the parallel processing libraries. I agree that a wiki page will be more beneficial -- though I am not sure what else should be there. Regards, Muhammad Alkarouri
participants (4)
- Andrew Straw
- Christian Heimes
- Muhammad Alkarouri
- Robin