Mailman 3 numarray.where confusion - NumPy-Discussion

numarray.where confusion

Alok Singhal

May 26, 2004

7:49 a.m.

Hi, I am having trouble understanding how exactly "where" works in numarray. What I am trying to do: I am preparing a two-level mask on an array and then assigning values to the array where both masks are true:

>>> from numarray import *
>>> a = arange(10)
>>> # First mask
>>> m1 = where(a > 5)
>>> a[m1]
array([6, 7, 8, 9])
>>> # Second mask
>>> m2 = where(a[m1] < 8)
>>> a[m1][m2]
array([6, 7])
>>> # So far so good
>>> # Now change some values
>>> a[m1][m2] = array([10, 20])
>>> a[m1][m2]
array([6, 7])
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> # Didn't work
>>> # Let's try a temporary variable
>>> t = a[m1]
>>> t[m2]
array([6, 7])
>>> t[m2] = array([10, 20])
>>> t[m2], t
(array([10, 20]), array([10, 20, 8, 9]))
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

So, my assignment to a[m1][m2] seems to work (no messages), but it doesn't produce the effect I want it to.

I have read the documentation but I couldn't find anything that would explain this behavior.

So my questions:

- did I miss something important in the documentation,
- am I expecting something I shouldn't, or
- is there a bug in numarray?

Thanks,
Alok

--
Alok Singhal (as8ca@virginia.edu)
Graduate Student, dept. of Astronomy
University of Virginia
http://www.astro.virginia.edu/~as8ca/


Perry Greenfield

May 2004

8:25 a.m.

(Due to possible confusion with "a" in the text, I'll use x in place of "a".)

I believe the problem you are seeing (I'm not 100% certain yet) is that although it is possible to assign to an array-indexed array, doing that twice over doesn't work, since Python is, in effect, treating x[m1] as an expression even though it is on the left side. That expression results in a new array that the second indexing updates, but that array is then thrown away since it is not assigned to anything else.

Your second try creates a temporary t which is also not a view into a, so when you update t, a is not updated.

Try

x[m1[0][m2]] = array([10,20])

instead. The intent here is to provide x with the net index array by indexing m1 first rather than indexing x first. (Note the odd use of m1[0]: this is necessary since where() returns a tuple of index arrays, to allow use in multidimensional cases as indices, so m1[0] extracts the array from the tuple. Since m1 is a tuple, indexing it with another index array (well, a tuple containing an index array) doesn't work.)

Perry Greenfield
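The behavior described above can be reproduced in a minimal sketch. It assumes NumPy as a stand-in for numarray (an assumption: the thread uses numarray, which this code does not exercise), where a one-argument where() also returns a tuple of index arrays. The names chained_result, net_index and composed_result are illustrative only.

```python
import numpy as np

x = np.arange(10)
m1 = np.where(x > 5)        # tuple holding one index array: positions 6..9 in x
m2 = np.where(x[m1] < 8)    # indices into the *selected* values, not into x

# Chained array indexing: x[m1] builds a new array (a copy),
# so this assignment modifies the copy and x is left untouched.
x[m1][m2] = [10, 20]
chained_result = x.copy()

# Compose the indices first, then index x exactly once.
net_index = m1[0][m2]       # positions in x itself: 6 and 7
x[net_index] = [10, 20]
composed_result = x.copy()

print(chained_result)       # unchanged: 0..9
print(composed_result)      # positions 6 and 7 now hold 10 and 20
```

The point of the design is that only the outermost indexing operation on the left-hand side turns into an in-place write; everything before it is evaluated as an ordinary expression.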

Todd Miller

8:42 a.m.

On Wed, 2004-05-26 at 10:48, Alok Singhal wrote:

>>> from numarray import *
>>> a = arange(10)
>>> # First mask
>>> m1 = where(a > 5)
>>> a[m1]
array([6, 7, 8, 9])
>>> # Second mask
>>> m2 = where(a[m1] < 8)
>>> a[m1][m2]

a[m1] is a new array here.

array([6, 7])
>>> # So far so good
>>> # Now change some values
>>> a[m1][m2] = array([10, 20])

And here too. This does a write into what is effectively a temporary variable returned by the expression a[m1]. Although the write occurs, it is lost.

>>> a[m1][m2]
array([6, 7])
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here's how I did it (there was an easier way I overlooked):

a = arange(10)
m1 = where(a > 5, 1, 0).astype('Bool')
m2 = where(a < 8, 1, 0).astype('Bool')
a[m1 & m2] = array([10, 20])

The principle here is to keep the masks as "full sized" boolean arrays rather than index arrays so they can be combined using the bitwise and operator. The resulting mask can be used to index just once, eliminating the temporary.

Regards,
Todd
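The full-sized mask principle can be sketched as follows, assuming NumPy rather than numarray (an assumption; in NumPy a comparison already yields a boolean array, so the where(...).astype('Bool') step is not needed). The name combined is illustrative.

```python
import numpy as np

a = np.arange(10)

# Each mask has the same shape as a, one boolean per element.
m1 = a > 5
m2 = a < 8

# Same-shaped boolean masks combine element-wise with the bitwise and operator.
combined = m1 & m2

# A single indexing operation on the left-hand side writes into a itself.
a[combined] = [10, 20]
print(a)
```

Because both masks refer to positions in the original array, neither depends on the result of a previous selection, which is what makes them combinable.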

Francesc Alted

9:07 a.m.

On Wednesday 26 May 2004 17:41, Todd Miller wrote:

Here's how I did it (there was an easier way I overlooked):

a = arange(10)
m1 = where(a > 5, 1, 0).astype('Bool')
m2 = where(a < 8, 1, 0).astype('Bool')
a[m1 & m2] = array([10, 20])

Perhaps the easier way looks like this?

>>> a = arange(10)
>>> a[(a>5) & (a<8)] = array([10, 20])
>>> a
array([ 0, 1, 2, 3, 4, 5, 10, 20, 8, 9])

Indexing is a very powerful (and fun) thing, indeed :)

--
Francesc Alted
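One detail of the one-liner above is worth a short sketch: the parentheses around each comparison matter, because in Python & binds more tightly than > and <. The code assumes NumPy as a stand-in for numarray (an assumption); precedence_error is an illustrative name.

```python
import numpy as np

a = np.arange(10)

# Without parentheses this parses as a > (5 & a) < 8, a chained comparison
# that needs the truth value of a whole array, which NumPy refuses to give.
try:
    a[a > 5 & a < 8] = [10, 20]
    precedence_error = None
except ValueError as exc:
    precedence_error = exc

# With parentheses, each comparison produces a full-sized boolean mask first.
a[(a > 5) & (a < 8)] = [10, 20]
print(a)
```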

Todd Miller

9:29 a.m.

On Wed, 2004-05-26 at 12:06, Francesc Alted wrote:

On Wednesday 26 May 2004 17:41, Todd Miller wrote:

Here's how I did it (there was an easier way I overlooked):

a = arange(10)
m1 = where(a > 5, 1, 0).astype('Bool')
m2 = where(a < 8, 1, 0).astype('Bool')
a[m1 & m2] = array([10, 20])

Perhaps the easier way looks like this?

>>> a = arange(10)
>>> a[(a>5) & (a<8)] = array([10, 20])

Much, much better. Thanks!

Todd

>>> a
array([ 0, 1, 2, 3, 4, 5, 10, 20, 8, 9])

Indexing is a very powerful (and fun) thing, indeed :)

--
Todd Miller <jmiller@stsci.edu>

Andrew Straw

10:44 a.m.

Francesc Alted wrote:

Perhaps the easier way looks like this?

>>> a = arange(10)
>>> a[(a>5) & (a<8)] = array([10, 20])

Is there an equivalently slick way to accomplish what I'm trying below (so that the values in c[:,1] get changed based on the same-row values in c[:,0])?

from numarray import *
a = arange(10)
b = arange(10) + 20
c = concatenate((a[:,NewAxis], b[:,NewAxis]), axis=1)
c[c[:,0]>7][:,1] = 0  # doesn't work because it makes a copy and therefore doesn't modify c

Cheers!
Andrew

Alok Singhal

11:04 a.m.

On 26/05/04: 10:43, Andrew Straw wrote:

Is there an equivalently slick way to accomplish what I'm trying below (so that the values in c[:,1] get changed based on the same-row values in c[:,0])?

from numarray import *
a = arange(10)
b = arange(10) + 20
c = concatenate((a[:,NewAxis], b[:,NewAxis]), axis=1)
c[c[:,0]>7][:,1] = 0  # doesn't work because it makes a copy and therefore doesn't modify c

Well, for your case, the following works:

>>> print c
[[ 0 20]
 [ 1 21]
 [ 2 22]
 [ 3 23]
 [ 4 24]
 [ 5 25]
 [ 6 26]
 [ 7 27]
 [ 8 28]
 [ 9 29]]
>>> t0 = c[:, 0]
>>> t1 = c[:, 1]
>>> t1[t0 > 7] = 0
>>> print c
[[ 0 20]
 [ 1 21]
 [ 2 22]
 [ 3 23]
 [ 4 24]
 [ 5 25]
 [ 6 26]
 [ 7 27]
 [ 8  0]
 [ 9  0]]

Not sure this helps in your real code though.

Alok

--
Alok Singhal (as8ca@virginia.edu)
Graduate Student, dept. of Astronomy
University of Virginia
http://www.astro.virginia.edu/~as8ca/
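The column-slice trick above, plus a single-step alternative, can be sketched as follows. This assumes NumPy as a stand-in for numarray (an assumption: np.newaxis replaces NewAxis, and whether numarray accepted the combined boolean-row, integer-column index is not established by the thread). The names c_view and c_single are illustrative.

```python
import numpy as np

a = np.arange(10)
b = np.arange(10) + 20

# Approach 1: a basic slice such as c[:, 1] is a view, so writing
# through it with a boolean mask changes the original array.
c_view = np.concatenate((a[:, np.newaxis], b[:, np.newaxis]), axis=1)
t0 = c_view[:, 0]
t1 = c_view[:, 1]
t1[t0 > 7] = 0

# Approach 2: one indexing operation with a boolean row mask and a
# column number, so no intermediate copy is ever assigned to.
c_single = np.concatenate((a[:, np.newaxis], b[:, np.newaxis]), axis=1)
c_single[c_single[:, 0] > 7, 1] = 0

print(c_single[7:])   # row 7 is untouched, rows 8 and 9 have 0 in column 1
```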

Alok Singhal

10:19 a.m.

On 26/05/04: 11:24, Perry Greenfield wrote:

...

(due to confusions with "a" in text I'll use x in place of "a") I believe the problem you are seeing (I'm not 100% certain yet) is that although it is possible to assign to an array-indexed array, that doing that twice over doesn't work since Python is, in effect, treating x[m1] as an expression even though it is on the left side. That expression results in a new array that the second indexing updates, but then is thrown away since it is not assigned to anything else.

Your second try creates a temporary t which is also not a view into a so when you update t, a is not updated.

Thanks for this info. It makes sense now. I suspected earlier that t was not a view but a copy, but didn't realise that the same thing was happening with x[m1][m2].

...

try

x[m1[0][m2]] = array([10,20])

instead. The intent here is to provide x with the net index array by indexing m1 first rather than indexing x first. (note the odd use of m1[0]; this is necessary since where() will return a tuple of index arrays (to allow use in multidimensional cases as indices, so the m1[0] extracts the array from the tuple; Since m1 is a tuple, indexing it with another index array (well, tuple containing an index array) doesn't work).

This works, but for the fact that in my real code I *am* dealing with multidimensional arrays. But this is a nice trick to remember.

(So, the following "does not work":

x = arange(9)
x.shape = (3,3)
m1 = where(x > 4)
m2 = where(x[m1] < 7)
x[m1[0][m2]]
)

On 26/05/04: 11:41, Todd Miller wrote:

Here's how I did it (there was an easier way I overlooked):

a = arange(10)
m1 = where(a > 5, 1, 0).astype('Bool')
m2 = where(a < 8, 1, 0).astype('Bool')
a[m1 & m2] = array([10, 20])

Ah. This works! Even for multidimensional arrays.

On 26/05/04: 18:06, Francesc Alted wrote:

Perhaps the easier way looks like this?

>>> a = arange(10)
>>> a[(a>5) & (a<8)] = array([10, 20])
>>> a
array([ 0, 1, 2, 3, 4, 5, 10, 20, 8, 9])

Indexing is a very powerful (and fun) thing, indeed :)

I like this too. Thank you all for the help!

Alok

--
Alok Singhal (as8ca@virginia.edu)
Graduate Student, dept. of Astronomy
University of Virginia
http://www.astro.virginia.edu/~as8ca/
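The remark that the boolean-mask approach carries over to multidimensional arrays can be illustrated with a small sketch, assuming NumPy in place of numarray (an assumption). The replacement values 50 and 60 are arbitrary.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# The combined mask has the same 3x3 shape as x, so it works unchanged
# regardless of the number of dimensions.
mask = (x > 4) & (x < 7)

# Selected elements are visited in row-major order: 5 first, then 6.
x[mask] = [50, 60]
print(x)
```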

Perry Greenfield

12:03 p.m.

Alok Singhal wrote:

This works, but for the fact that in my real code I *am* dealing with multidimensional arrays. But this is a nice trick to remember.

(So, the following "does not work":

x = arange(9)
x.shape = (3,3)
m1 = where(x > 4)
m2 = where(x[m1] < 7)
x[m1[0][m2]]
)

Correct. You'd have to break apart the m1 tuple and index all the components, e.g.,

m11, m12 = m1
x[m11[m2],m12[m2]] = ...

This gets clumsier the more dimensions must be handled, but you can still do it. It would be most useful if the indexed array is very large, the number of items selected is relatively small, and one doesn't want to incur the memory overhead of all the mask arrays required by the admittedly much nicer notational approach that Francesc illustrated.

Perry
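Breaking apart the index tuple, as described above, can be sketched like this. It assumes NumPy as a stand-in for numarray (an assumption), and the generalization to any number of dimensions via a generator expression is an illustration, not something proposed in the thread. The values 50 and 60 and the name net_index are arbitrary.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
m1 = np.where(x > 4)        # (row_indices, column_indices) of the first selection
m2 = np.where(x[m1] < 7)    # indices into the flat selection x[m1]

# Two-dimensional case, written out by hand.
m11, m12 = m1
x[m11[m2], m12[m2]] = [50, 60]

# Same idea for any number of dimensions: refine every per-axis index array.
net_index = tuple(axis_idx[m2] for axis_idx in m1)
print(x[net_index])         # the two elements that were just assigned
```

Only small index arrays are created here, which is the memory argument made above; the price is the extra bookkeeping per axis.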

Francesc Alted

12:47 a.m.

On Wednesday 26 May 2004 21:01, Perry Greenfield wrote:

correct. You'd have to break apart the m1 tuple and index all the components, e.g.,

m11, m12 = m1
x[m11[m2],m12[m2]] = ...

This gets clumsier with the more dimensions that must be handled, but you still can do it. It would be most useful if the indexed array is very large, the number of items selected is relatively small and one doesn't want to incur the memory overhead of all the mask arrays of the admittedly much nicer notational approach that Francesc illustrated.

Well, boolean arrays have the property that they use very little memory (only 1 byte / element), and normally perform quite well doing indexing. Some timings:

>>> import timeit
>>> t1 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]","from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t2 = timeit.Timer("x[(x>4) & (x<7)]","from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t1.repeat(3,1000)
[3.1320240497589111, 3.1235389709472656, 3.1198310852050781]
>>> t2.repeat(3,1000)
[1.1218469142913818, 1.117638111114502, 1.1156759262084961]

i.e. using boolean arrays for indexing is roughly 3 times faster. For larger arrays this difference is even more noticeable:

>>> t3 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]","from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t4 = timeit.Timer("x[(x>4) & (x<7)]","from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
>>> t3.repeat(3,10)
[3.1818649768829346, 3.20477294921875, 3.190640926361084]
>>> t4.repeat(3,10)
[0.42328095436096191, 0.42140507698059082, 0.41979002952575684]

As you can see, the difference is now almost an order of magnitude (!).

So, accepting the small memory overhead, in most cases it is probably better to use boolean selections. However, it would be nice to know the underlying reason why this happens, because Perry's approach seems intuitively faster.

--
Francesc Alted
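For readers who want to repeat the comparison today, here is a hedged sketch of the same benchmark using NumPy instead of numarray (an assumption: the figures quoted above were measured with numarray in 2004 and will not be reproduced by this code). The function names, array sizes and repeat counts are illustrative, and no particular outcome is claimed.

```python
import timeit
import numpy as np

def select_with_index_arrays(x):
    # Two-step selection with index arrays, refined per axis.
    m1 = np.where(x > 4)
    m2 = np.where(x[m1] < 7)
    m11, m12 = m1
    return x[m11[m2], m12[m2]]

def select_with_boolean_mask(x):
    # One-step selection with a combined full-sized boolean mask.
    return x[(x > 4) & (x < 7)]

all_equal = True
for dim in (3, 300):
    x = np.arange(dim * dim).reshape(dim, dim)
    # Both approaches must pick the same elements before timing means anything.
    all_equal = all_equal and bool(
        np.array_equal(select_with_index_arrays(x), select_with_boolean_mask(x))
    )
    t_index = min(timeit.repeat(lambda: select_with_index_arrays(x), repeat=3, number=20))
    t_mask = min(timeit.repeat(lambda: select_with_boolean_mask(x), repeat=3, number=20))
    print(f"dim={dim}: index arrays {t_index:.4f}s, boolean mask {t_mask:.4f}s")
```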

Perry Greenfield

10:48 a.m.

Yes, I agree. It was good of you to post these timings. I don't think we had actually compared the two approaches, though the results don't surprise me. I suspect the results may change if the first mask selects only a very small percentage of the elements; in the large timing test, nearly all elements are selected by the first mask.

Perry
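The caveat above suggests an experiment: vary how many elements the first mask selects. The sketch below sets that up with NumPy instead of numarray (an assumption); lower_bound, the window width of 3 and the array size are hypothetical choices, and the code prints timings without asserting which approach wins.

```python
import timeit
import numpy as np

def via_index_arrays(x, lower_bound):
    m1 = np.where(x > lower_bound)              # first mask as index arrays
    m2 = np.where(x[m1] < lower_bound + 3)      # second mask on the selection
    return x[tuple(axis_idx[m2] for axis_idx in m1)]

def via_boolean_mask(x, lower_bound):
    return x[(x > lower_bound) & (x < lower_bound + 3)]

dim = 300
x = np.arange(dim * dim).reshape(dim, dim)

results_match = True
# A low bound makes the first mask select nearly everything;
# a high bound makes it select only a handful of elements.
for lower_bound in (4, dim * dim - 10):
    results_match = results_match and bool(
        np.array_equal(via_index_arrays(x, lower_bound), via_boolean_mask(x, lower_bound))
    )
    t_index = min(timeit.repeat(lambda: via_index_arrays(x, lower_bound), repeat=3, number=5))
    t_mask = min(timeit.repeat(lambda: via_boolean_mask(x, lower_bound), repeat=3, number=5))
    print(f"lower_bound={lower_bound}: index arrays {t_index:.4f}s, boolean mask {t_mask:.4f}s")
```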
