![](https://secure.gravatar.com/avatar/911703b2b4ecdd4d0579b516ff31756a.jpg?s=120&d=mm&r=g)
Hi, I am having trouble understanding how exactly "where" works in numarray. What I am trying to do: I am building a two-level mask for an array and then assigning values to the array where both masks are true:

    >>> from numarray import *
    >>> a = arange(10)
    >>> # First mask
    >>> m1 = where(a > 5)
    >>> a[m1]
    array([6, 7, 8, 9])
    >>> # Second mask
    >>> m2 = where(a[m1] < 8)
    >>> a[m1][m2]
    array([6, 7])
    >>> # So far so good
    >>> # Now change some values
    >>> a[m1][m2] = array([10, 20])
    >>> a[m1][m2]
    array([6, 7])
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> # Didn't work
    >>> # Let's try a temporary variable
    >>> t = a[m1]
    >>> t[m2]
    array([6, 7])
    >>> t[m2] = array([10, 20])
    >>> t[m2], t
    (array([10, 20]), array([10, 20, 8, 9]))
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

So, my assignment to a[m1][m2] seems to work (no messages), but it doesn't produce the effect I want. I have read the documentation but couldn't find anything that would explain this behavior. So my questions:

- did I miss something important in the documentation,
- am I expecting something I shouldn't, or
- is there a bug in numarray?

Thanks,
Alok

--
Alok Singhal (as8ca@virginia.edu)
Graduate Student, dept. of Astronomy, University of Virginia
http://www.astro.virginia.edu/~as8ca/
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
(To avoid confusion with the word "a" in the text, I'll use x in place of "a".)

I believe the problem you are seeing (I'm not 100% certain yet) is that, although it is possible to assign to an array-indexed array, doing that twice over doesn't work, since Python is in effect treating x[m1] as an expression even though it is on the left side. That expression results in a new array that the second indexing updates, but which is then thrown away since it is not assigned to anything else. Your second try creates a temporary t, which is also not a view into a, so when you update t, a is not updated.

Try

    x[m1[0][m2]] = array([10,20])

instead. The intent here is to provide x with the net index array by indexing m1 first rather than indexing x first. Note the odd use of m1[0]: this is necessary since where() returns a tuple of index arrays (to allow use as indices in multidimensional cases), so m1[0] extracts the array from the tuple. Since m1 is a tuple, indexing it with another index array (well, a tuple containing an index array) doesn't work.

Perry Greenfield
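A minimal sketch of the two behaviours described above. It uses NumPy as a stand-in for numarray, which is an assumption: the two libraries index in a similar way here, but this has not been run against numarray itself. The names `unchanged` and `net_index` are illustrative only.

```python
import numpy as np  # assumption: NumPy standing in for numarray

x = np.arange(10)
m1 = np.where(x > 5)        # tuple holding one index array: positions 6, 7, 8, 9
m2 = np.where(x[m1] < 8)    # positions *within the selection* x[m1]: 0 and 1

# Chained form: x[m1] builds a new array, the write lands in that
# temporary copy, and the copy is discarded.
x[m1][m2] = np.array([10, 20])
unchanged = x.copy()

# Composed form: index the index array first, then index x only once.
net_index = m1[0][m2]       # positions 6 and 7 in x
x[net_index] = np.array([10, 20])

print(unchanged)
print(x)
```

The chained assignment raises no error, which is why the lost write is easy to miss.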
![](https://secure.gravatar.com/avatar/faf9400121dca9940496a7473b1d8179.jpg?s=120&d=mm&r=g)
On Wed, 2004-05-26 at 10:48, Alok Singhal wrote:
>     >>> from numarray import *
>     >>> a = arange(10)
>     >>> # First mask
>     >>> m1 = where(a > 5)
>     >>> a[m1]
>     array([6, 7, 8, 9])
>     >>> # Second mask
>     >>> m2 = where(a[m1] < 8)
>     >>> a[m1][m2]

a[m1] is a new array here.

>     array([6, 7])
>     >>> # So far so good
>     >>> # Now change some values
>     >>> a[m1][m2] = array([10, 20])

And here too. This does a write into what is effectively a temporary variable returned by the expression a[m1]. Although the write occurs, it is lost.

>     >>> a[m1][m2]
>     array([6, 7])
>     >>> a
>     array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here's how I did it (there was an easier way I overlooked):

    a = arange(10)
    m1 = where(a > 5, 1, 0).astype('Bool')
    m2 = where(a < 8, 1, 0).astype('Bool')
    a[m1 & m2] = array([10, 20])

The principle here is to keep the masks as "full sized" boolean arrays rather than index arrays, so they can be combined using the bitwise and operator. The resulting mask can then be used to index just once, eliminating the temporary.

Regards,
Todd
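A short sketch of why the cast to a boolean type matters in this approach, again with NumPy standing in for numarray (an assumption; the original uses `astype('Bool')`, and whether numarray treats 0/1 integer arrays the same way is not checked here). In NumPy, an integer array used as an index is read as a list of positions, while a boolean array of the same shape is read as a mask. The names `ints`, `as_positions` and `as_mask` are illustrative.

```python
import numpy as np  # assumption: NumPy standing in for numarray

a = np.arange(10)

# Three-argument where() yields 0/1 integers; & combines them elementwise.
ints = np.where(a > 5, 1, 0) & np.where(a < 8, 1, 0)
mask = ints.astype(bool)    # same information, boolean dtype

as_positions = a[ints]      # integer index: picks a[0] and a[1] repeatedly
as_mask = a[mask]           # boolean index: picks elements where mask is True

# A single indexing operation on a itself, so the write is kept.
a[mask] = np.array([10, 20])

print(as_positions)
print(as_mask)
print(a)
```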
![](https://secure.gravatar.com/avatar/81b3970c8247b2521d2f814de5b24475.jpg?s=120&d=mm&r=g)
On Wednesday 26 May 2004 17:41, Todd Miller wrote:

> Here's how I did it (there was an easier way I overlooked):
>
>     a = arange(10)
>     m1 = where(a > 5, 1, 0).astype('Bool')
>     m2 = where(a < 8, 1, 0).astype('Bool')
>     a[m1 & m2] = array([10, 20])

Perhaps the easier way looks like this?

    >>> a = arange(10)
    >>> a[(a>5) & (a<8)] = array([10, 20])
    >>> a
    array([ 0, 1, 2, 3, 4, 5, 10, 20, 8, 9])

Indexing is a very powerful (and fun) thing, indeed :)

--
Francesc Alted
![](https://secure.gravatar.com/avatar/faf9400121dca9940496a7473b1d8179.jpg?s=120&d=mm&r=g)
On Wed, 2004-05-26 at 12:06, Francesc Alted wrote:
> Perhaps the easier way looks like this?
>
>     a = arange(10)
>     a[(a>5) & (a<8)] = array([10, 20])

Much, much better. Thanks!

Todd

--
Todd Miller <jmiller@stsci.edu>
![](https://secure.gravatar.com/avatar/d5321459a9b36ca748932987de93e083.jpg?s=120&d=mm&r=g)
Todd Miller wrote:

> On Wed, 2004-05-26 at 12:06, Francesc Alted wrote:
>
> > Perhaps the easier way looks like this?
> >
> >     a = arange(10)
> >     a[(a>5) & (a<8)] = array([10, 20])

Is there an equivalently slick way to accomplish what I'm trying below? (The values in c[:,1] should get changed based on the same-row values in c[:,0].)

    from numarray import *
    a=arange(10)
    b=arange(10)+20
    c=concatenate((a[:,NewAxis],b[:,NewAxis]),axis=1)
    c[c[:,0]>7][:,1] = 0 # doesn't work because it makes a copy and therefore doesn't modify c

Cheers!
Andrew
![](https://secure.gravatar.com/avatar/911703b2b4ecdd4d0579b516ff31756a.jpg?s=120&d=mm&r=g)
On 26/05/04: 10:43, Andrew Straw wrote:
> Is there an equivalently slick way to accomplish what I'm trying below? (The values in c[:,1] should get changed based on the same-row values in c[:,0].)
>
>     from numarray import *
>     a=arange(10)
>     b=arange(10)+20
>     c=concatenate((a[:,NewAxis],b[:,NewAxis]),axis=1)
>     c[c[:,0]>7][:,1] = 0 # doesn't work because it makes a copy and therefore doesn't modify c

Well, for your case, the following works:

    >>> print c
    [[ 0 20]
     [ 1 21]
     [ 2 22]
     [ 3 23]
     [ 4 24]
     [ 5 25]
     [ 6 26]
     [ 7 27]
     [ 8 28]
     [ 9 29]]
    >>> t0 = c[:, 0]
    >>> t1 = c[:, 1]
    >>> t1[t0 > 7] = 0
    >>> print c
    [[ 0 20]
     [ 1 21]
     [ 2 22]
     [ 3 23]
     [ 4 24]
     [ 5 25]
     [ 6 26]
     [ 7 27]
     [ 8  0]
     [ 9  0]]

Not sure this helps in your real code though.

Alok
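For comparison, a sketch of the same update written as one indexing operation, so that no intermediate copy is involved. This is NumPy, not numarray (an assumption; whether numarray accepted a boolean row mask mixed with a column number in a single subscript has not been checked). The name `rows` is illustrative.

```python
import numpy as np  # assumption: NumPy standing in for numarray

a = np.arange(10)
b = np.arange(10) + 20
c = np.concatenate((a[:, np.newaxis], b[:, np.newaxis]), axis=1)

# Row mask computed from column 0, applied together with column number 1
# in a single subscript, so the assignment goes straight into c.
rows = c[:, 0] > 7
c[rows, 1] = 0

print(c)
```

The slice-based version above works for the same reason: c[:, 1] is a view, so only one copy-making index is applied before the write.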
![](https://secure.gravatar.com/avatar/911703b2b4ecdd4d0579b516ff31756a.jpg?s=120&d=mm&r=g)
On 26/05/04: 11:24, Perry Greenfield wrote:
> (due to confusions with "a" in text I'll use x in place of "a") I believe the problem you are seeing (I'm not 100% certain yet) is that although it is possible to assign to an array-indexed array, that doing that twice over doesn't work since Python is, in effect, treating x[m1] as an expression even though it is on the left side. That expression results in a new array that the second indexing updates, but then is thrown away since it is not assigned to anything else.
>
> Your second try creates a temporary t which is also not a view into a so when you update t, a is not updated.

Thanks for this info. It makes sense now. I suspected earlier that t was not a view but a copy, but didn't realise that the same thing was happening with x[m1][m2].

> try
>
>     x[m1[0][m2]] = array([10,20])
>
> instead. The intent here is to provide x with the net index array by indexing m1 first rather than indexing x first. (note the odd use of m1[0]; this is necessary since where() will return a tuple of index arrays (to allow use in multidimensional cases as indices), so the m1[0] extracts the array from the tuple; Since m1 is a tuple, indexing it with another index array (well, tuple containing an index array) doesn't work).

This works, but for the fact that in my real code I *am* dealing with multidimensional arrays. But this is a nice trick to remember. (So, the following "does not work":

    x = arange(9)
    x.shape=(3,3)
    m1 = where(x > 4)
    m2 = where(x[m1] < 7)
    x[m1[0][m2]]

)

On 26/05/04: 11:41, Todd Miller wrote:

> Here's how I did it (there was an easier way I overlooked):
>
>     a = arange(10)
>     m1 = where(a > 5, 1, 0).astype('Bool')
>     m2 = where(a < 8, 1, 0).astype('Bool')
>     a[m1 & m2] = array([10, 20])

Ah. This works! Even for multidimensional arrays.

On 26/05/04: 18:06, Francesc Alted wrote:

> Perhaps the easier way looks like this?
>
>     >>> a = arange(10)
>     >>> a[(a>5) & (a<8)] = array([10, 20])
>     >>> a
>     array([ 0, 1, 2, 3, 4, 5, 10, 20, 8, 9])
>
> Indexing is a very powerful (and fun) thing, indeed :)

I like this too. Thank you all for the help!

Alok
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
Alok Singhal wrote:

> This works, but for the fact that in my real code I *am* dealing with multidimensional arrays. But this is a nice trick to remember.
>
> (So, the following "does not work":
>
>     x = arange(9)
>     x.shape=(3,3)
>     m1 = where(x > 4)
>     m2 = where(x[m1] < 7)
>     x[m1[0][m2]]
>
> )

Correct. You'd have to break apart the m1 tuple and index all the components, e.g.,

    m11, m12 = m1
    x[m11[m2],m12[m2]] = ...

This gets clumsier the more dimensions must be handled, but you can still do it. It would be most useful if the indexed array is very large, the number of items selected is relatively small, and one doesn't want to incur the memory overhead of all the mask arrays required by the admittedly much nicer notational approach that Francesc illustrated.

Perry
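The "break apart the tuple" step can be sketched for any number of dimensions by applying the second selection to every component of the first. NumPy is used in place of numarray (an assumption), and the name `net_index` and the values 50 and 60 are illustrative only.

```python
import numpy as np  # assumption: NumPy standing in for numarray

x = np.arange(9).reshape(3, 3)
m1 = np.where(x > 4)        # (row indices, column indices) of the values 5..8
m2 = np.where(x[m1] < 7)    # positions within that selection: the values 5 and 6

# Apply m2 to each per-axis index array; for two axes this is the same as
#   m11, m12 = m1; x[m11[m2], m12[m2]] = ...
net_index = tuple(idx[m2] for idx in m1)
x[net_index] = np.array([50, 60])

print(x)
```

Only the small index arrays are held in memory here, which is the trade-off described above against full-sized masks.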
![](https://secure.gravatar.com/avatar/81b3970c8247b2521d2f814de5b24475.jpg?s=120&d=mm&r=g)
On Wednesday 26 May 2004 21:01, Perry Greenfield wrote:

> correct. You'd have to break apart the m1 tuple and index all the components, e.g.,
>
>     m11, m12 = m1
>     x[m11[m2],m12[m2]] = ...
>
> This gets clumsier with the more dimensions that must be handled, but you still can do it. It would be most useful if the indexed array is very large, the number of items selected is relatively small and one doesn't want to incur the memory overhead of all the mask arrays of the admittedly much nicer notational approach that Francesc illustrated.

Well, boolean arrays have the property that they use very little memory (only 1 byte / element), and normally perform quite well doing indexing. Some timings:

    >>> import timeit
    >>> t1 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
    ...                   "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
    >>> t2 = timeit.Timer("x[(x>4) & (x<7)]",
    ...                   "from numarray import arange,where;dim=3;x=arange(dim*dim);x.shape=(dim,dim)")
    >>> t1.repeat(3,1000)
    [3.1320240497589111, 3.1235389709472656, 3.1198310852050781]
    >>> t2.repeat(3,1000)
    [1.1218469142913818, 1.117638111114502, 1.1156759262084961]

i.e. using boolean arrays for indexing is roughly 3 times faster. For larger arrays this difference is even more noticeable:

    >>> t3 = timeit.Timer("m1=where(x>4);m2=where(x[m1]<7);m11,m12=m1;x[m11[m2],m12[m2]]",
    ...                   "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
    >>> t4 = timeit.Timer("x[(x>4) & (x<7)]",
    ...                   "from numarray import arange,where;dim=1000;x=arange(dim*dim);x.shape=(dim,dim)")
    >>> t3.repeat(3,10)
    [3.1818649768829346, 3.20477294921875, 3.190640926361084]
    >>> t4.repeat(3,10)
    [0.42328095436096191, 0.42140507698059082, 0.41979002952575684]

As you see, now the difference is almost an order of magnitude (!). So, perhaps accepting the small memory overhead, in most cases it is better to use boolean selections. However, it would be nice to know the ultimate reason why this happens, because Perry's approach seems intuitively faster.

--
Francesc Alted
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
Francesc Alted wrote:

> So, perhaps assuming the small memory overhead, in most of cases it is better to use boolean selections. However, it would be nice to know the ultimate reason of why this happens, because the Perry approach seems intuitively faster.

Yes, I agree. It was good of you to post these timings. I don't think we had actually compared the two approaches, though the results don't surprise me. I suspect the results may change if the first mask selects a very small percentage of elements; the large timing test has nearly all elements selected by the first mask.

Perry
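One way to probe that suspicion is to time both approaches once with a first mask that selects nearly everything and once with a first mask that selects only a handful of elements. The harness below is a sketch under assumptions: it uses NumPy rather than numarray, the function names, the array size and the thresholds are all made up for illustration, and no particular outcome is claimed, since the numbers depend on the library version and machine.

```python
import timeit
import numpy as np  # assumption: NumPy standing in for numarray

def index_arrays(x, lo, hi):
    """Two-stage selection using where() index arrays."""
    m1 = np.where(x > lo)
    m2 = np.where(x[m1] < hi)
    return x[tuple(idx[m2] for idx in m1)]

def boolean_masks(x, lo, hi):
    """Single selection using full-sized boolean masks."""
    return x[(x > lo) & (x < hi)]

dim = 300
x = np.arange(dim * dim).reshape(dim, dim)

cases = {
    "dense": (4, 7),                          # first mask keeps almost all elements
    "sparse": (dim * dim - 10, dim * dim - 5) # first mask keeps only 9 elements
}

timings = {}
for label, (lo, hi) in cases.items():
    # Both approaches must select the same elements before timing them.
    assert np.array_equal(index_arrays(x, lo, hi), boolean_masks(x, lo, hi))
    t_idx = min(timeit.repeat(lambda: index_arrays(x, lo, hi), number=10, repeat=3))
    t_bool = min(timeit.repeat(lambda: boolean_masks(x, lo, hi), number=10, repeat=3))
    timings[label] = (t_idx, t_bool)
    print(label, "index arrays:", t_idx, "boolean masks:", t_bool)
```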
participants (5)
- Alok Singhal
- Andrew Straw
- Francesc Alted
- Perry Greenfield
- Todd Miller