I'm trying to understand how numpy decides when to release memory and whether it's possible to exert any control over that. The situation is that I'm profiling memory usage on a system in which a great deal of the overall memory is tied up in ndarrays. Since numpy manages ndarray memory on its own (i.e. without the Python gc, or so it seems), I'm finding that I can't do much to convince numpy to release memory when things get tight. For Python objects, for example, I can explicitly run gc.collect().
So, in an effort to at least understand the system better, can anyone tell me how/when numpy decides to release memory? And is there any way via either the Python or C-API to explicitly request release?
Thanks.
Austin
On Tue, Nov 13, 2012 at 8:26 AM, Austin Bingham <austin.bingham@gmail.com> wrote:
I'm trying to understand how numpy decides when to release memory and whether it's possible to exert any control over that. The situation is that I'm profiling memory usage on a system in which a great deal of the overall memory is tied up in ndarrays. Since numpy manages ndarray memory on its own (i.e. without the python gc, or so it seems), I'm finding that I can't do much to convince numpy to release memory when things get tight. For python object, for example, I can explicitly run gc.collect().
So, in an effort to at least understand the system better, can anyone tell me how/when numpy decides to release memory? And is there any way via either the Python or C-API to explicitly request release? Thanks.
Numpy array memory is released when the corresponding Python objects are deleted, so it exactly follows Python's rules. You can't explicitly request release, because by definition, if memory has not been released, then it is still accessible somehow, so releasing it could create segfaults. Perhaps you have stray references sitting around that you have forgotten to clear -- that's a common cause of memory leaks in Python. gc.get_referrers() can be useful to debug such things.
Some things to note:
- Numpy uses malloc() instead of going through the Python low-level memory allocation layer (which itself is a wrapper around malloc with various optimizations for small objects). This is really only relevant because it might create some artifacts depending on how your memory profiler gathers data.
- gc.collect() doesn't do that much in Python... it only matters if you have circular references. Mostly Python releases the memory associated with objects as soon as the object becomes unreferenced. You could try avoiding circular references, and then gc.collect() won't even do anything.
- If you have multiple views of the same memory in numpy, then they share the same underlying memory, so that memory won't be released until all of the view objects are released. (The one thing to watch out for is you can do something like 'huge_array = np.zeros((2, 10000000)); tiny_array = huge_array[:, 100]' and now, since tiny_array is a view onto huge_array, so long as a reference to tiny_array exists the full big memory allocation will remain.)
-n
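The point about views can be checked directly. The following is a minimal sketch, assuming a reasonably recent NumPy; the array size is smaller than in the message above and the variable names are illustrative only. It shows that a slice keeps the whole parent buffer alive through its .base attribute, while an explicit .copy() owns only its own few bytes.

```python
import numpy as np

# About 16 MB of float64 data (2 * 1_000_000 * 8 bytes); the size is arbitrary.
huge_array = np.zeros((2, 1_000_000))

# Basic slicing returns a view: no data is copied, and the view keeps a
# reference to the array that owns the buffer in its .base attribute.
tiny_view = huge_array[:, 100]
view_shares_buffer = tiny_view.base is huge_array

# An explicit copy owns its own (two-element) buffer and has no base.
tiny_copy = huge_array[:, 100].copy()
copy_owns_data = tiny_copy.base is None and tiny_copy.flags.owndata

# Dropping the name huge_array does not free the 16 MB buffer, because
# tiny_view still references the owning array through .base.
del huge_array
still_alive_nbytes = tiny_view.base.nbytes

# Once the view is gone too, nothing references the big buffer any more
# and CPython can free it; tiny_copy is unaffected.
del tiny_view
```

If only a small piece of a large array is needed for a long time, taking a .copy() of that piece and dropping the view is one way to let the large allocation go.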
OK, if numpy is just subject to Python's behavior then what I'm seeing must be due to the vagaries of Python. I've noticed that things like removing a particular line of code or reordering seemingly unrelated calls (unrelated to the memory issue, that is) can affect when memory is reported as free. I'll just assume that everything is in order and carry on. Thanks!
Austin
On 11/13/12 10:27 AM, Austin Bingham wrote:
OK, if numpy is just subject to Python's behavior then what I'm seeing must be due to the vagaries of Python. I've noticed that things like removing a particular line of code or reordering seemingly unrelated calls (unrelated to the memory issue, that is) can affect when memory is reported as free. I'll just assume that everything is in order and carry on. Thanks!
Profiling memory can be tricky because the operating system may not reclaim freed memory *immediately*, which can mislead you in some situations. So do not trust memory profilers to be exact; focus on the big picture instead (i.e. is my app holding on to a lot of memory for a long time? If yes, then start worrying, but not before).
-- Francesc Alted
How are you monitoring memory usage? Personally I've been using psutil and it seems to work well, although I've used it only on Windows and not in applications with large numpy arrays, so I can't tell whether it would work for you.
Also, keep in mind that:
- The "auto-delete object when it goes out of scope" behavior is specific to the CPython implementation and not part of the Python standard, so if you're actually using a different implementation you may see different behavior.
- CPython deals with small objects in a special way, not actually releasing allocated memory. For more info: http://deeplearning.net/software/theano/tutorial/python-memory-management.ht...
-=- Olivier
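The difference between reference counting and the cycle collector can be seen with the standard library alone. This is a rough sketch that assumes CPython; Node is a hypothetical placeholder class, and the automatic collector is switched off only to make the demonstration deterministic.

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder object used only for this demonstration."""


# Disable automatic cycle collection so it cannot run at an arbitrary moment.
gc.disable()

# Case 1: no cycle. On CPython the object is destroyed as soon as its
# reference count reaches zero; gc.collect() plays no part in this.
a = Node()
ref_a = weakref.ref(a)
del a
freed_immediately = ref_a() is None

# Case 2: a reference cycle. The object refers to itself, so its reference
# count never reaches zero and 'del' alone does not destroy it.
b = Node()
b.self_ref = b
ref_b = weakref.ref(b)
del b
cycle_survives_del = ref_b() is not None

# Only the cycle collector can reclaim it.
gc.collect()
cycle_freed_after_collect = ref_b() is None

gc.enable()
```

On other Python implementations the first case may behave differently, since immediate destruction is a CPython detail rather than a language guarantee.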
I've been using psutil, pmap (Linux command), and resource in various capacities, all on CPython. When I wasn't seeing memory freed when I expected, I got to wondering if maybe numpy was maintaining pools of buffers for reuse or something like that. It sounds like that's not the case, though, so I'm following up other possibilities.
Austin
On Tue, Nov 13, 2012 at 1:31 PM, Austin Bingham <austin.bingham@gmail.com> wrote:
I've been using psutil, pmap (linux command), and resource in various capacities, all on cpython. When I wasn't seeing memory freed when I expected, I got to wondering if maybe numpy was maintaining pools of buffers for reuse or something like that. It sounds like that's not the case, though, so I'm following up other possibilities.
Those tools show how much memory the OS has allocated to the process. In general, processes can request memory from the OS, but *they cannot give it back*. At the C level, if you call free(), then what actually happens is that the memory management library in your process makes a note for itself that that memory is not used, and may return it from a future malloc(), but from the OS's point of view it is still "allocated". (And python uses another similar system on top for malloc()/free(), but this doesn't really change anything.) So the OS memory usage you see is generally a "high water mark", the maximum amount of memory that your process ever needed.
The exception is that for large single allocations (e.g. if you create a multi-megabyte array), a different mechanism is used. Such large memory allocations *can* be released back to the OS. So it might specifically be the non-numpy parts of your program that are producing the issues you see.
-n
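One way to watch this from inside the process is to sample the resident set size around a large allocation. The sketch below is an assumption-laden illustration, not a profiler: current_rss_bytes is a hypothetical helper that reads the Linux-specific /proc/self/statm file and returns None elsewhere, a bytearray stands in for a large array buffer, and the 64 MiB size is arbitrary. The numbers printed depend on the platform and the C library's allocator.

```python
import os


def current_rss_bytes():
    """Return this process's resident set size in bytes, or None.

    Assumes a Linux-style /proc filesystem; on platforms without it
    (or without os.sysconf) the function returns None.
    """
    try:
        with open("/proc/self/statm") as statm:
            # The second field is the number of resident pages.
            resident_pages = int(statm.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError, AttributeError):
        return None


before = current_rss_bytes()

# One large, zero-filled allocation standing in for a big array buffer.
big = bytearray(64 * 1024 * 1024)
during = current_rss_bytes()

# Drop the only reference; CPython frees the buffer right away. Whether the
# OS-level figure also drops is up to the allocator, as described above.
del big
after = current_rss_bytes()

print("before:", before, "during:", during, "after:", after)
```

Comparing the three figures for one large allocation against the same experiment with many small objects gives a feel for the "high water mark" effect on a particular system.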
On Tue, Nov 13, 2012 at 2:27 AM, Austin Bingham <austin.bingham@gmail.com> wrote:
OK, if numpy is just subject to Python's behavior then what I'm seeing must be due to the vagaries of Python. I've noticed that things like removing a particular line of code or reordering seemingly unrelated calls (unrelated to the memory issue, that is) can affect when memory is reported as free. I'll just assume that everything is in order and carry on. Thanks!
If you are running interactively in IPython, references will be kept to return values. That can eventually eat up memory if you are working with a lot of big arrays.
<snip>
Chuck
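A forgotten reference of this kind can be tracked down with gc.get_referrers(), mentioned earlier in the thread. The following sketch assumes CPython; output_cache is a hypothetical stand-in for an interactive shell's history of return values, and Payload is a placeholder for a large array.

```python
import gc
import weakref


class Payload:
    """Hypothetical placeholder for a large object such as an array."""


# Hypothetical stand-in for a shell's cache of previous results.
output_cache = {}

obj = Payload()
output_cache[1] = obj

# Ask the garbage collector which objects refer directly to obj.
referrers = gc.get_referrers(obj)
cache_found = any(r is output_cache for r in referrers)

# A weak reference lets us observe the object's lifetime without
# keeping it alive ourselves.
watcher = weakref.ref(obj)

del obj
alive_while_cached = watcher() is not None

# Clearing the cache drops the last strong reference.
output_cache.clear()
freed_after_clear = watcher() is None
```

In a real session the list of referrers usually also contains module or frame namespaces, so it helps to filter for the containers you did not expect.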
participants (5)
- Austin Bingham
- Charles R Harris
- Francesc Alted
- Nathaniel Smith
- Olivier Delalleau