[Numpy-discussion] memory usage (Emil Sidky)

emil sidky at uchicago.edu
Wed Oct 15 14:31:04 EDT 2008


> Huang-Wen Chen wrote:
>> Robert Kern wrote:
>>>> from numpy import *
>>>> for i in range(1000):
>>>>   a = random.randn(512**2)
>>>>   b = a.argsort(kind='quick')
>>> Can you try upgrading to numpy 1.2.0? On my machine with numpy 1.2.0
>>> on OS X, the memory usage is stable.
>>>   
>> I tried the code fragment on two platforms and the memory usage is also 
>> normal.
>>
>> 1. numpy 1.1.1, python 2.5.1 on Vista 32bit
>> 2. numpy 1.2.0, python 2.6 on RedHat 64bit
> 
> If I recall correctly, there were some major improvements in python's 
> memory management/garbage collection from version 2.4 to 2.5. If you 
> could try to upgrade your python to 2.5 (and possibly also your numpy to 
> 1.2.0), you'd probably see some better behaviour.
> 
> Regards,
> Vincent.
> 

Problem fixed. Thanks.

But it turns out there were two things going on:
(1) Upgrading to numpy 1.2 (even with python 2.4) fixed the memory usage
for the loop with argsort in it.
(2) Unfortunately, when I went back to my original program and ran it
with the upgraded numpy, it was still chewing up tons of memory. I
finally found the problem.
Consider the following two code snippets (an extension of my previous
example):
import numpy as np

d = []
for i in range(1000):
    a = np.random.randn(512**2)
    b = a.argsort(kind='quicksort')
    c = b[-100:]           # slice of b, no copy made
    d.append(c)

and

import numpy as np

d = []
for i in range(1000):
    a = np.random.randn(512**2)
    b = a.argsort(kind='quicksort')
    c = b[-100:].copy()    # explicit copy of the last 100 elements
    d.append(c)

The difference is that in the first example c is a view onto the last
100 elements of b, while in the second example c is an independent copy
of those elements.
Both examples yield identical results (provided randn is run with the
same seed value). But the former chews up tons of memory, and the latter
doesn't.
As far as I can tell, the explanation is that a slice view holds a
reference to the whole array it was taken from. In the first example
every generated b therefore has to stay alive for as long as its c sits
in d, while in the second example each b can be freed as soon as the
next iteration rebinds it.
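
A minimal sketch of the view-versus-copy behaviour described above, for a
single iteration of the loop. The variable names (view, copy,
retained_by_view, retained_by_copy) and the seed are illustrative
assumptions, not part of the original program; it uses the ndarray.base
attribute and np.shares_memory to show which array owns the buffer.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for illustration only
a = rng.standard_normal(512**2)
b = a.argsort(kind='quicksort')     # full index array, 512**2 elements

view = b[-100:]                     # basic slice: shares b's buffer
copy = b[-100:].copy()              # owns a fresh 100-element buffer

# The two hold the same values ...
same_values = np.array_equal(view, copy)

# ... but only the view is tied to b's memory.
view_shares_b = np.shares_memory(view, b)
copy_shares_b = np.shares_memory(copy, b)

# Bytes that must stay allocated while each object is referenced.
retained_by_view = view.base.nbytes   # the whole of b
retained_by_copy = copy.nbytes        # just 100 elements

print(same_values, view_shares_b, copy_shares_b)
print(retained_by_view, retained_by_copy)
```

With 1000 iterations, the list of views keeps 1000 full index arrays
alive, whereas the list of copies keeps only 1000 * 100 elements.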

Anyway, bottom line is that my problem is solved.
Thanks,
Emil


