Mailman 3 efficient use of numpy.where() and .any() - NumPy-Discussion

efficient use of numpy.where() and .any()

older
Problems building numpy and scipy...

Mark.Miller

23 Apr 2007 23 Apr '07

2:37 p.m.

Greetings: In some of my code, I need to use large matrix of random numbers that meet specific criteria (i.e., some random numbers need to be removed and replaces with new ones). I have been working with .any() and .where() to facilitate this process. In the code below, .any() is used in a while loop to test for the presence of matrix elements that do not meet criteria. After that, .where() is used to obtain the tuple of indices of elements that do not meet criteria. I then use iteration over the tuple of indices and replace each 'bad' random number one at a time. Here's an example:

...

...
...
a=numpy.random.normal(0,1,(3,3)) a array([[ 0.79653228, -2.28751484, 1.15989261], [-0.7549573 , 2.35133816, 0.22551564], [ 0.85666713, 1.17484243, 1.18306248]]) while (a<0).any() or (a>1).any(): ind=numpy.where(a<0) for aa in xrange(len(ind[0])): a[ind[0][aa],ind[1][aa]]=numpy.random.normal(0,1) ind=numpy.where(a>1) for aa in xrange(len(ind[0])): a[ind[0][aa],ind[1][aa]]=numpy.random.normal(0,1)

...

...
...
a array([[ 0.79653228, 0.99298488, 0.24903299], [ 0.10884186, 0.10139654, 0.22551564], [ 0.85666713, 0.76554597, 0.38383126]])

My main question: is there a more efficient approach that I could be using? I would ideally like to be able to remove the two for loops. An ancillary question: I don't quite see why the test in the while loop above works fine, however, the following formation does not.

...

...
...
while (0<a<1).any(): ind=numpy.where(a<0) for aa in xrange(len(ind[0])): a[ind[0][aa],ind[1][aa]]=numpy.random.normal(0,1) ind=numpy.where(a>1) for aa in xrange(len(ind[0])): a[ind[0][aa],ind[1][aa]]=numpy.random.normal(0,1) print type(ind)

Traceback (most recent call last): File "<pyshell#71>", line 1, in <module> while (0<a<1).any(): ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

...

...
...

Can someone clarify this? Thanks, -Mark

Show replies by date

Pierre GM

23 Apr 23 Apr

3:49 p.m.

On Monday 23 April 2007 10:37:57 Mark.Miller wrote:

...

Greetings:

In some of my code, I need to use large matrix of random numbers that meet specific criteria (i.e., some random numbers need to be removed and replaces with new ones).

I have been working with .any() and .where() to facilitate this process.

Have you tried nonzero() ? a[a<0] = numpy.random.normal(0,1) will put a random number from the normal distribution where your initial a is negative. No Python loops needed, no Python temps.

...

Traceback (most recent call last): File "<pyshell#71>", line 1, in <module> while (0<a<1).any():

The double condition (0<a<1) is not legit. You should try logical.and(a>0,a<1) or (a>0) & (a<1) Note the () around each condition in case #2.

Mark.Miller

4:16 p.m.

Excellent suggestions...just a few comments: Pierre GM wrote:

...

On Monday 23 April 2007 10:37:57 Mark.Miller wrote:

...
Greetings:

In some of my code, I need to use large matrix of random numbers that meet specific criteria (i.e., some random numbers need to be removed and replaces with new ones).

I have been working with .any() and .where() to facilitate this process.

Have you tried nonzero() ?

Nonzero isn't quite what I'm after, as the tests are more complicated than what I illustrated in my example.

...

a[a<0] = numpy.random.normal(0,1)

This is a neat construct that I didn't realize was possible. However, it has the undesirable (in my case) effect of placing a single new random number in each locations where a<0. While this could work, I ideally need a different random number chosen for each replaced value. Does that make sense?

...

will put a random number from the normal distribution where your initial a is negative. No Python loops needed, no Python temps.

...
Traceback (most recent call last): File "<pyshell#71>", line 1, in <module> while (0<a<1).any():

The double condition (0<a<1) is not legit. You should try logical.and(a>0,a<1) or (a>0) & (a<1)

Note the () around each condition in case #2.

This makes perfect sense. Thanks for pointing it out to me. It should easily do the trick. Any and all additional suggestions are greatly appreciated, -Mark

...

_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion

Robert Kern

4:23 p.m.

Mark.Miller wrote:

...

Pierre GM wrote:

...

...
a[a<0] = numpy.random.normal(0,1)

This is a neat construct that I didn't realize was possible. However, it has the undesirable (in my case) effect of placing a single new random number in each locations where a<0. While this could work, I ideally need a different random number chosen for each replaced value. Does that make sense?

Certainly. How about this? mask = (a<0) a[mask] = numpy.random.normal(0, 1, size=mask.sum()) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

Mark.Miller

4:40 p.m.

Robert Kern wrote:

...

Certainly. How about this?

mask = (a<0) a[mask] = numpy.random.normal(0, 1, size=mask.sum())

That's slick. I believe it's precisely what I'm after. Appreciate it, -Mark

Pierre GM

4:32 p.m.

...

...
Have you tried nonzero() ?

Nonzero isn't quite what I'm after, as the tests are more complicated than what I illustrated in my example.

Tests such as (a<0)&(b>1) will give you arrays of booleans. The nonzero give you where the two conditions are met (viz, where the results is True, or 1)

...

...
a[a<0] = numpy.random.normal(0,1)

This is a neat construct that I didn't realize was possible. However, it has the undesirable (in my case) effect of placing a single new random number in each locations where a<0. While this could work, I ideally need a different random number chosen for each replaced value. Does that make sense?

Completely. So what about the slightly more complicated: test = a<0 if test .any() a[test] = numpy.random.normal(0,1,size=len(test.nonzero()[0])) test.nonzero() outputs a tuple of indices. test.nonzero()[0] gives you the indices along the first axis, len(test.nonzero()[0]) the number of elements where the condition is met. You could also get the same result with something like test = a<0 if test .any() a[test] = numpy.random.normal(0,1,size=test.sum()) Actually, the following simpler solution works as well: a[a<0] = numpy.random.normal(0,1,size=a.size)

Anne Archibald

4:25 p.m.

On 23/04/07, Pierre GM <pgmdevlist@gmail.com> wrote:

...

Have you tried nonzero() ?

a[a<0] = numpy.random.normal(0,1)

will put a random number from the normal distribution where your initial a is negative. No Python loops needed, no Python temps.

When you say "no python temps" I guess you mean, no temporary *variables*? If I understand correctly, this allocates a temporary boolean array to hold the result of "a<0".

...

The double condition (0<a<1) is not legit. You should try logical.and(a>0,a<1) or (a>0) & (a<1)

Note the () around each condition in case #2.

This is an unfortunate limitation that comes from the fact that we can't override the behaviour of python's logical operations. a<b<c does the right thing for python scalars, but it does it by being expanded to (approximately) "a<b and b<c", and "and" doesn't do the right thing for arrays. The best we can do is override the bitwise operators for boolean arrays. This is a shame as I often want to select array elements that fall into a given range, and creating three temporary arrays instead of one is unpleasant. Anne

Pierre GM

4:36 p.m.

...

When you say "no python temps" I guess you mean, no temporary *variables*? If I understand correctly, this allocates a temporary boolean array to hold the result of "a<0".

Indeed, hence my precising "no *python* temps". There still be a tmp created at one point or another (if I'm not mistaken)

...

This is an unfortunate limitation that comes from the fact that we can't override the behaviour of python's logical operations. a<b<c does the right thing for python scalars, but it does it by being expanded to (approximately) "a<b and b<c", and "and" doesn't do the right thing for arrays.

Note that in addition of the bitwise operators, you can use the "logical_" functions. OK, you'll still end up w/ temporaries, but I wonder whether there couldn't be some tricks to bypass that...

Anne Archibald

4:47 p.m.

On 23/04/07, Pierre GM <pgmdevlist@gmail.com> wrote:

...

Note that in addition of the bitwise operators, you can use the "logical_" functions. OK, you'll still end up w/ temporaries, but I wonder whether there couldn't be some tricks to bypass that...

If you're really determined not to make many temps, you can use their output arguments and do it all in-place on the first temporary. The few times I've rewritten code that way it hasn't made an appreciable positive difference in the speed, and it was sometimes significantly slower, perhaps because of the increased memory footprint (the temps were longer-lived than when I did it by hand). malloc() is really really cheap, and garbage collection is also extremely cheap for huge arrays. Anne

Francesc Altet

5:57 p.m.

El dl 23 de 04 del 2007 a les 12:47 -0400, en/na Anne Archibald va escriure:

...

On 23/04/07, Pierre GM <pgmdevlist@gmail.com> wrote:

...
Note that in addition of the bitwise operators, you can use the "logical_" functions. OK, you'll still end up w/ temporaries, but I wonder whether there couldn't be some tricks to bypass that...

If you're really determined not to make many temps, you can use their output arguments and do it all in-place on the first temporary. The few times I've rewritten code that way it hasn't made an appreciable positive difference in the speed, and it was sometimes significantly slower, perhaps because of the increased memory footprint (the temps were longer-lived than when I did it by hand). malloc() is really really cheap, and garbage collection is also extremely cheap for huge arrays.

Or you can use numexpr. Numexpr doesn't need temporaries (provided that all the arrays in the expresion are contiguous). It only uses a small buffer for carrying out computations whose size is, when compared with large matrices, negligible. It is also usally faster than the pure NumPy approach: # Pure NumPy

...

...
...
Timer("c=(i>2) | ((f**2>3) & ~(i*f<2))", "import numpy; i=numpy.arange(10**6); f=numpy.random.rand(10**6)").repeat(3, 10) [0.55586099624633789, 0.55565214157104492, 0.556427001953125]

...

...
...
Timer("c=numexpr.evaluate('(i>2) | ((f**2>3) & ~(i*f<2))')", "import

# Using Numexpr tables; import tables.numexpr as numexpr; import numpy; i=numpy.arange(10**6); f=numpy.random.rand(10**6)").repeat(3, 10) [0.25569510459899902, 0.25547599792480469, 0.25554895401000977] I've used here the numexpr instance that comes with PyTables, but you can use also the one in the scipy's sandbox as it also supports booleans since some months ago. Cheers, -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth

Pierre GM

3:53 p.m.

Oh, I pressed "send" too early. Just an addition: numpy.where creates a new array from some condition. If you only want to change elements of an existing array that satisfies a given condition, indexing is far more efficient: no temporary is created. Hence the suggestion of a[a<0]

6409

Age (days ago)

6409

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

Anne Archibald
Francesc Altet
Mark.Miller
Pierre GM
Robert Kern

efficient use of numpy.where() and .any()

Mark.Miller

Pierre GM

Mark.Miller

Robert Kern

Mark.Miller

Pierre GM

Anne Archibald

Pierre GM

Anne Archibald

Francesc Altet

Pierre GM

tags

participants (5)