Hi everybody,

I was having a massive memory leak in some simulations I coded, and I traced it down to the setmember1d function. After running for about an hour, my program, which should have been using 0.6% of the RAM, was using 100% of my 4 GB and would eventually get killed by the system. After replacing that function with the following inefficient code, the memory problems disappeared, and I can run for days without my memory use rising above 0.6%:

    from numpy import zeros

    def ismember(a, b):
        # Boolean mask, True where a[i] equals some element of b
        ainb = zeros(len(a), dtype=bool)
        # One full pass over a for every element of b: O(len(a) * len(b))
        for item in b:
            ainb = ainb | (a == item)
        return ainb

Here's the information about my setup:

    % uname -a
    Linux 2.4.21-27.0.4.ELsmp #1 SMP Sat Apr 16 18:53:14 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
    % python
    Python 2.4.4 (#1, Jan 21 2007, 12:09:48)
    [GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-49)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy
    >>> numpy.version.version
    '1.0.1'
    >>>

It is an AMD Opteron machine running Red Hat Enterprise Linux with an older version of GCC. The same error occurred with an older version of Python (the April 2006 release, version 2.4.3).

Does anyone have any idea what might be happening in setmember1d, in combination with this setup, that would cause such a massive memory leak?

Thanks,
Per
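For comparison, here is a minimal loop-free sketch of the same membership test that, unlike setmember1d, tolerates repeated values in both arrays. It is written for Python 3 and current NumPy; ismember_sorted is a hypothetical name, not a NumPy function. It sorts and deduplicates b once with np.unique, then uses np.searchsorted to look up every element of a in that sorted array, so the cost is roughly O((len(a) + len(b)) log len(b)) instead of the O(len(a) * len(b)) of the loop above.

```python
import numpy as np


def ismember_sorted(a, b):
    """Return a boolean array that is True where a[i] occurs anywhere in b.

    Hypothetical helper: repeated values are allowed in both a and b.
    """
    a = np.asarray(a)
    b_sorted = np.unique(b)  # sorted copy of b with repeats removed
    if b_sorted.size == 0:
        return np.zeros(a.shape, dtype=bool)
    # Position at which each a[i] would be inserted to keep b_sorted ordered
    idx = np.searchsorted(b_sorted, a)
    # Values larger than max(b) get idx == len(b_sorted); clip to stay in bounds
    idx = np.clip(idx, 0, b_sorted.size - 1)
    # a[i] is a member exactly when the element at its insertion point equals it
    return b_sorted[idx] == a


print(ismember_sorted([1, 2, 2, 5, 7], [2, 7, 7]).tolist())
```

Current NumPy releases also ship np.isin, which performs this test directly and accepts repeated values.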
Per B. Sederberg wrote:
Does anyone have any idea of what might be occurring in setmember1d in combination with this setup that would cause such a massive memory leak?
Can you check out numpy from SVN and see if you can reproduce the leak? I do not see a leak with a recent checkout on OS X with the following code:

    In [15]: from numpy import *

    In [16]: ar1 = arange(1000000)

    In [17]: ar2 = arange(3, 7)

    In [18]: import itertools

    In [19]: for i in itertools.count(1):
       ....:     if not i % 1000:
       ....:         print i
       ....:     x = setmember1d(ar1, ar2)
       ....:

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
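A minimal sketch of the same reproduction loop for Python 3, extended to print the process's peak memory so that growth is visible without watching top. Assumptions: it relies on the Unix-only standard-library resource module, the helper names peak_rss_mb and watch_for_leak are hypothetical, and because setmember1d is no longer part of current NumPy releases, np.isin stands in for it. ru_maxrss is a high-water mark, so it never decreases: a leak shows up as a peak that keeps climbing, while numbers that level off after the first few samples suggest no leak.

```python
import resource
import sys

import numpy as np


def peak_rss_mb():
    """Peak resident set size of this process in megabytes (Unix only)."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS reports it in bytes
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return rss / divisor


def watch_for_leak(func, *args, iterations=5000, report_every=1000):
    """Call func(*args) repeatedly, sampling peak RSS every report_every calls."""
    samples = []
    for i in range(1, iterations + 1):
        func(*args)  # result is discarded, so its memory should be freed
        if i % report_every == 0:
            samples.append(peak_rss_mb())
            print(f"{i:>8} calls: peak RSS {samples[-1]:.1f} MB")
    return samples


if __name__ == "__main__":
    ar1 = np.arange(1000000)
    ar2 = np.arange(3, 7)
    # Small iteration count for a quick check; raise it for a longer soak test
    watch_for_leak(np.isin, ar1, ar2, iterations=500, report_every=100)
```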
Robert Kern <robert.kern <at> gmail.com> writes:
Per B. Sederberg wrote:
Does anyone have any idea of what might be occurring in setmember1d in combination with this setup that would cause such a massive memory leak?
Can you check out numpy from SVN and see if you can reproduce the leak? I do not see a leak with a recent checkout on OS X with the following code:
In [15]: from numpy import *
In [16]: ar1 = arange(1000000)
In [17]: ar2 = arange(3, 7)
In [18]: import itertools
In [19]: for i in itertools.count(1):
   ....:     if not i % 1000:
   ....:         print i
   ....:     x = setmember1d(ar1, ar2)
I tried it out, and your test code does NOT leak with the SVN version of numpy, but memory use explodes with the current stable release: it uses up 2 GB in around 20 seconds.

So the next question is: what changed to fix that bug? It looks like the only difference in the code is that the call to argsort now passes kind='mergesort'.

Also, why does setmember1d require unique elements? It seems as though that really narrows its uses.

Thanks,
Per
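What kind='mergesort' guarantees is stability: values that compare equal keep their original relative order. The small illustration below (Python 3, current NumPy; the arrays are made up for the example) concatenates two arrays the way a setmember1d-style algorithm would. Whether the sort kind has anything to do with the leak is not established here; a plausible reason for requesting a stable sort is correctness, since it ensures that when a value appears in both arrays, the copy from ar1 lands immediately before the copy from ar2.

```python
import numpy as np

ar1 = np.array([5, 2, 9])          # no repeated values
ar2 = np.array([2, 7])             # no repeated values
ar = np.concatenate((ar1, ar2))    # [5, 2, 9, 2, 7]

# kind="mergesort" is a stable sort: equal values keep their input order,
# so the 2 from ar1 (index 1) is guaranteed to come before the 2 from ar2 (index 3)
order = np.argsort(ar, kind="mergesort")
print(order.tolist())       # [1, 3, 0, 4, 2]
print(ar[order].tolist())   # [2, 2, 5, 7, 9]
```

Recent NumPy versions also accept kind="stable", which requests a stable algorithm without naming a specific one.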
Per B. Sederberg wrote:
So the next question is what changed to fix that bug? It looks like the only difference in the code is that the call to argsort now has (kind='mergesort').
Possibly the default sort code also had a bug that was fixed, or the fix lies somewhere deeper in the array object. I've rerun the above test after removing kind='mergesort', and I still see no leak.
Also, why does setmember1d require unique elements? It seems as though that really narrows its uses.
The algorithm it uses requires it. That's how it can be fast without being written with Python for loops.

--
Robert Kern
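To illustrate why the algorithm needs unique inputs, here is a minimal sketch of a sort-based membership test in the spirit of setmember1d (Python 3, current NumPy; member_unique is a hypothetical name, and this is not NumPy's actual source). It concatenates both arrays, sorts them stably, and flags each element whose sorted neighbour is equal. With unique inputs, an equal pair can only be one copy from ar1 followed by one copy from ar2; a value repeated within ar1 also forms an equal pair and is wrongly reported as a member.

```python
import numpy as np


def member_unique(ar1, ar2):
    """Sort-based membership test; assumes ar1 and ar2 each have unique elements.

    Hypothetical helper for illustration, not NumPy's implementation.
    """
    ar1 = np.asarray(ar1)
    ar = np.concatenate((ar1, np.asarray(ar2)))
    # Stable sort keeps an ar1 element ahead of an equal ar2 element
    order = np.argsort(ar, kind="mergesort")
    sar = ar[order]
    # Flag the first element of each equal neighbouring pair; pad the last slot
    flag = np.concatenate((sar[1:] == sar[:-1], [False]))
    # Scatter the flags back to the original (unsorted) positions
    ret = np.empty(ar.shape, dtype=bool)
    ret[order] = flag
    return ret[:len(ar1)]


print(member_unique([5, 2, 9], [2, 7]).tolist())  # [False, True, False]: correct
print(member_unique([2, 2, 4], [7]).tolist())     # [True, False, False]: wrong, 2 is not in [7]
```

The whole test is a handful of vectorized array operations, which is where the speed comes from; the price is the uniqueness precondition on ar1.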
participants (2)
- Per B. Sederberg
- Robert Kern