Strange memory consumption in numpy?
Hi all,

In the context of memory profiling an application (with the memory_profiler module), we came across some strange behaviour in numpy; see for yourselves:

Line #    Mem usage    Increment   Line Contents
================================================
    29                             @profile
    30    23.832 MB     0.000 MB   def main():
    31    46.730 MB    22.898 MB       arr1 = np.random.rand(1000000, 3)
    32    58.180 MB    11.449 MB       arr1s = arr1.astype(np.float32)
    33    35.289 MB   -22.891 MB       del arr1
    34    35.289 MB     0.000 MB       gc.collect()
    35    58.059 MB    22.770 MB       arr2 = np.random.rand(1000000, 3)
    36    69.500 MB    11.441 MB       arr2s = arr2.astype(np.float32)
    37    69.500 MB     0.000 MB       del arr2
    38    69.500 MB     0.000 MB       gc.collect()
    39    69.500 MB     0.000 MB       arr3 = np.random.rand(1000000, 3)
    40    80.945 MB    11.445 MB       arr3s = arr3.astype(np.float32)
    41    80.945 MB     0.000 MB       del arr3
    42    80.945 MB     0.000 MB       gc.collect()
    43    80.945 MB     0.000 MB       return arr1s, arr2s, arr3s

Lines 31-34 behave as expected, but we don't understand lines 35-38 (why is arr2 not garbage collected?) or lines 39-42 (why doesn't np.random.rand allocate any memory?).

Can anyone give a reasonable explanation?

I attach the full script for reference.

Best regards,
Martin
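[Since the attached script is not included in the archive, here is a minimal reconstruction of the profiled function, taken directly from the table above; the imports and the __main__ guard are assumptions.]

import gc

import numpy as np
from memory_profiler import profile


@profile
def main():
    arr1 = np.random.rand(1000000, 3)   # 1e6 x 3 float64 values: ~22.9 MB
    arr1s = arr1.astype(np.float32)     # float32 copy: ~11.4 MB
    del arr1                            # drop the float64 original
    gc.collect()
    arr2 = np.random.rand(1000000, 3)
    arr2s = arr2.astype(np.float32)
    del arr2
    gc.collect()
    arr3 = np.random.rand(1000000, 3)
    arr3s = arr3.astype(np.float32)
    del arr3
    gc.collect()
    return arr1s, arr2s, arr3s


if __name__ == "__main__":
    main()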
On Thu, May 16, 2013 at 8:35 AM, Martin Raspaud <martin.raspaud@smhi.se> wrote:
> The lines 31-34 are behaving as expected, but then we don't understand 35-38 (why is arr2 not garbage collected?) and 39-42 (why doesn't np.random.rand allocate any memory?). Can anyone give a reasonable explanation?
memory_profiler only looks at the amount of memory that the OS has allocated to the Python process. It cannot measure the amount of memory actually given to living objects. Python does not always return memory back to the OS immediately when it frees the memory for an object.

Your two observations are linked. Python freed the memory of arr2 immediately, but it did not return the memory to the OS, so memory_profiler could not notice it. When arr3 was allocated, it happened to fit into the block of memory that arr2 once owned, so Python's memory allocator just used that block again. Since Python did not have to go out to the OS to get more memory, memory_profiler could not notice that, either.

--
Robert Kern
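[One way to convince yourself of this reuse is to compare the data pointers before and after; a sketch only, since whether the addresses actually coincide depends on the allocator and the platform.]

import numpy as np

arr2 = np.random.rand(1000000, 3)
addr2 = arr2.__array_interface__['data'][0]  # address of arr2's data buffer
del arr2                                     # Python frees the block...

arr3 = np.random.rand(1000000, 3)
addr3 = arr3.__array_interface__['data'][0]
# ...and the allocator may hand the very same block to arr3,
# in which case the process's memory footprint does not grow.
print(addr2 == addr3)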
On 16/05/13 10:26, Robert Kern wrote:
>> Can anyone give a reasonable explanation?
>
> memory_profiler only looks at the amount of memory that the OS has allocated to the Python process. It cannot measure the amount of memory actually given to living objects. [...]
Robert,

Thanks a lot for the clear explanation, it makes perfect sense now.

You're talking about living objects, but as I understand it, the few memory profilers for Python that I found around the web can't track numpy arrays. Any pointers on something that would work with numpy?

Best regards,
Martin
On Thu, May 16, 2013 at 1:32 PM, Martin Raspaud <martin.raspaud@smhi.se> wrote:
> You're talking about living objects, but as I understand it, the few memory profilers for Python that I found around the web can't track numpy arrays. Any pointers on something that would work with numpy?
meliae has special support for numpy.ndarray objects. It's a little broken, in that it will double-count views, but you can provide a better specialization if you wish (look for the add_special_size() function).

https://launchpad.net/meliae

--
Robert Kern
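[For what it's worth, a typical meliae session looks roughly like this; a sketch from memory, so treat the exact calls as assumptions and check the project's documentation.]

from meliae import scanner, loader

# In the process you want to inspect: dump every live object to JSON.
scanner.dump_all_objects('objects.json')

# Later, ideally in a fresh process (loading is itself memory-hungry):
om = loader.load('objects.json')
print(om.summarize())  # table of object counts and sizes by type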