DEP: Deprecate boolean array indices with non-matching shape #4353
Hi All, I've not strong feelings one way or the other on this proposed deprecation for numpy 1.10 and would like some feedback from interested users. Chuck
On Thu, Jun 4, 2015 at 6:26 PM, Charles R Harris
Umm, link is #4353 https://github.com/numpy/numpy/pull/4353. Chuck
So specifically the question is -- if you have an array with five items,
and a Boolean array with three items, then currently you can use the latter
to index the former:
arr = np.arange(5)
mask = np.asarray([True, False, True])
arr[mask] # returns array([0, 2])
This is justified by the rule that indexing with a Boolean array should be
the same as indexing with the same array that's been passed to
np.nonzero(). Empirically, though, this causes constant confusion and does
not seem very useful, so the question is whether we should deprecate it.
-n
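The np.nonzero() equivalence described above can be sketched as follows. This is only an illustration: the names via_nonzero, via_mask, padded and via_padded are made up here, and whether the short mask is accepted directly depends on the installed NumPy version, so that step is wrapped in try/except rather than assumed.

```python
import numpy as np

arr = np.arange(5)
mask = np.asarray([True, False, True])

# The justification quoted above: a boolean index behaves like np.nonzero(mask).
via_nonzero = arr[np.nonzero(mask)]

# Indexing with the short mask directly is version dependent: releases from
# the time of this thread accept it, later ones may raise IndexError instead.
try:
    via_mask = arr[mask]
except IndexError as exc:
    via_mask = None
    print("short mask rejected:", exc)

# An explicit, version-independent spelling of the "pad with False" reading.
padded = np.zeros(arr.shape, dtype=bool)
padded[:mask.size] = mask
via_padded = arr[padded]

print(via_nonzero, via_padded)
```

Writing the padding out by hand makes the intent visible to a reader, which is the main complaint about the implicit behaviour.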
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith
One place where the current behavior is particularly baffling and annoying is when you have multiple boolean masks in the same indexing operation. I think everyone would expect this to index separately on each axis ("outer product indexing" style, like slices do), and that's really the only useful interpretation, but that's not what it does...:
In [3]: a = np.arange(9).reshape((3, 3))
In [4]: a
Out[4]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [6]: a[np.asarray([True, False, True]), np.asarray([False, True, True])]
Out[6]: array([1, 8])
In [7]: a[np.asarray([True, False, True]), np.asarray([False, False, True])]
Out[7]: array([2, 8])
In [8]: a[np.asarray([True, False, True]), np.asarray([True, True, True])]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-30b3427bec2a> in <module>()
----> 1 a[np.asarray([True, False, True]), np.asarray([True, True, True])]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
-n
-- Nathaniel J. Smith -- http://vorpus.org
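A rough sketch of why the session above pairs elements up instead of indexing each axis separately, and of one way to get the "outer product" selection explicitly. The variable names (rows, cols, paired, outer) are illustrative, and np.ix_ is suggested here as a workaround; it is not something proposed in the thread.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
rows = np.asarray([True, False, True])
cols = np.asarray([False, True, True])

# What a[rows, cols] does: each mask becomes integer indices, and the
# two index arrays are then paired up element by element.
r_idx, = np.nonzero(rows)   # row positions 0 and 2
c_idx, = np.nonzero(cols)   # column positions 1 and 2
paired = a[r_idx, c_idx]    # picks a[0, 1] and a[2, 2]

# "Outer product" selection: every selected row crossed with every
# selected column, keeping a 2-d result.
outer = a[np.ix_(rows, cols)]

print(paired)
print(outer)
```

With masks that select different numbers of elements, the paired form fails to broadcast, which is the IndexError shown in In [8].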
On Thu, Jun 4, 2015 at 9:04 PM, Nathaniel Smith
As a huge user of boolean indexes, I have never expected this to work in any way, shape or form. I don't think it works in MATLAB (but someone should probably check that), so you wouldn't have to worry about converts missing a feature from there. I have always been told that boolean indexing will produce a flattened array, and I wouldn't want to be dealing with magic when the array does not match up right.
Now, what if the boolean array is broadcastable (dimension-wise, not length-wise)? I do see some uses there. Modulo that, my vote is to deprecate.
Ben Root
On Thu, Jun 4, 2015 at 6:22 PM, Benjamin Root
Note that there are two types of boolean indexing:
type 1: arr[mask] where mask is n-d (ideally the same shape as "arr", but I think that it *is* broadcast if not). This always produces 1-d output.
type 2: arr[..., mask, ...], where mask is 1-d and only applies to the given dimension.
My comment was about the second type. Are your comments about the second type? The second type definitely does not produce a flattened array:
In [7]: a = np.arange(9).reshape(3, 3)
In [8]: a[np.asarray([True, False, True]), :]
Out[8]:
array([[0, 1, 2],
       [6, 7, 8]])
-n
-- Nathaniel J. Smith -- http://vorpus.org
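The two types can be put side by side in a small example. It only uses masks whose shapes match the array exactly, so it does not depend on the broadcasting question raised above; the variable names are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Type 1: one mask with the same shape as the array -> always 1-d output.
full_mask = a % 2 == 0
flat = a[full_mask]

# Type 2: a 1-d mask applied to a single axis -> the other axes are kept.
axis_mask = np.asarray([True, False, True])
rows_kept = a[axis_mask, :]   # shape (2, 3)
cols_kept = a[:, axis_mask]   # shape (3, 2)

print(flat.shape, rows_kept.shape, cols_kept.shape)
```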
On Thu, Jun 4, 2015 at 10:41 PM, Nathaniel Smith
I was talking about the second type in that I never even knew it existed. My understanding of boolean indexing has always been that it flattens, so the second type is a surprise to me.
Ben Root
On Thu, 2015-06-04 at 18:04 -0700, Nathaniel Smith wrote:
This is not being deprecated there for the moment; it is a different discussion. Though maybe we can improve the error message to mention that the array was originally boolean. That has always been bugging me a bit (it used to be mentioned in some cases, but it is not anymore).
- Sebastian
On Fri, Jun 5, 2015 at 3:16 AM, Sebastian Berg
What is actually being deprecated? It looks like there are different examples.
Wrong length: Nathaniel's first example above, where the mask is not broadcastable to the original array because the mask is longer or shorter than shape[axis]. I also wouldn't have expected this to work. Although I use np.nonzero and boolean mask indexing interchangeably, I would assume we need the correct length for the mask.
The second case, where the boolean mask has an extra dimension of length one, or where there are several boolean arrays, might need more checking. I'm pretty sure I have used various versions, assuming they are a feature, and when I see arrays, I usually don't assume "outer product indexing" (that might lead to a similar discussion as the recent fancy versus orthogonal indexing).
Josef
On Fri, 2015-06-05 at 08:36 -0400, josef.pktd@gmail.com wrote:
For the moment we are only talking about wrong length (along a given dimension), not about wrong number of dimensions or multiple boolean indices.
As a side note: I don't think the single boolean index behaviour needs to change; it is OK. Yes, it is not quite broadcasting, but there is no help considering transparent multidimensional indexing. As for multiple booleans, I think that is more part of the "outer" indexing discussion, which is interesting but not here :).
- Sebastian
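For code that wants the strict "length must match along the given dimension" rule regardless of NumPy version, a small wrapper is one option. The helper name select_along_axis is hypothetical and not part of NumPy or of PR #4353; it is a sketch built on np.compress.

```python
import numpy as np

def select_along_axis(arr, mask, axis=0):
    """Hypothetical helper: apply a 1-d boolean mask along one axis, strictly."""
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 1:
        raise IndexError("mask must be 1-d")
    if mask.shape[0] != arr.shape[axis]:
        raise IndexError(
            "mask has length %d but axis %d has length %d"
            % (mask.shape[0], axis, arr.shape[axis])
        )
    # np.compress keeps the slices along `axis` where mask is True.
    return np.compress(mask, arr, axis=axis)

x = np.arange(20).reshape(4, 5)
picked = select_along_axis(x, [True, False, True, False, True], axis=1)

# A mask of the wrong length is rejected instead of being silently padded.
try:
    select_along_axis(x, [True, False, True], axis=1)
    rejected = False
except IndexError:
    rejected = True

print(picked)
print("wrong-length mask rejected:", rejected)
```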
On Fri, Jun 5, 2015 at 5:45 PM Sebastian Berg
I am pro-deprecation then, definitely. I don't see a use case for padding a wrong-shaped boolean array with Falses, and the padding has burned me in the past.
It's not orthogonal to the wrong-number-of-dimensions issue, though, because if your Boolean array has a dimension of length 1, broadcasting says duplicate it along that axis to match the indexee, and wrong-length says pad it with Falses. This ambiguity/pitfall disappears if the padding never happens, and that kind of broadcasting is very useful.
Anne
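The ambiguity between the two readings of a length-1 dimension can be made concrete by spelling each one out explicitly. This is a sketch with made-up variable names; np.broadcast_to is used for the broadcasting reading and a zero-filled array for the padding reading, so neither relies on what a given NumPy version does implicitly.

```python
import numpy as np

x = np.arange(4 * 5).reshape(4, 5)
mask = np.array([1, 0, 1, 0, 1], dtype=bool)
row_vector = mask[None, :]                    # shape (1, 5)

# Reading 1, broadcasting: repeat the length-1 axis to match x.
broadcast_mask = np.broadcast_to(row_vector, x.shape)
by_broadcast = x[broadcast_mask]

# Reading 2, padding: the missing rows are treated as all-False.
padded_mask = np.zeros(x.shape, dtype=bool)
padded_mask[:row_vector.shape[0], :] = row_vector
by_padding = x[padded_mask]

print(by_broadcast)   # columns 0, 2, 4 of every row
print(by_padding)     # columns 0, 2, 4 of the first row only
```

The same (1, 5) mask selects twelve elements under one reading and three under the other, which is the pitfall described above.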
On Fri, Jun 5, 2015 at 11:50 AM, Anne Archibald
Good argument, now I understand why we only get a single column.
>>> x = np.arange(4*5).reshape(4,5)
>>> mask = np.array([1,0,1,0,1], bool)
Padding with False; this would also be deprecated AFAIU, as Anne pointed out:
>>> x[mask[:4][:,None]]
array([ 0, 10])
>>> x[mask[None,:]]
array([0, 2, 4])
Masks can only be combined with slices, so no "fancy masking" is allowed or defined (yet):
>>> x[mask[:4][:,None], mask[None,:]]
Traceback (most recent call last):
  File "", line 1, in <module>
    x[mask[:4][:,None], mask[None,:]]
IndexError: too many indices for array
I'm using 1d masks quite often to select rows or columns, which seems to work in more than two dimensions (Benjamin's surprise):
>>> x[:, mask]
array([[ 0,  2,  4],
       [ 5,  7,  9],
       [10, 12, 14],
       [15, 17, 19]])
>>> x[mask[:4][:,None] * mask[None,:]]
array([ 0,  2,  4, 10, 12, 14])
>>> x[:,:,None][mask[:4][:,None] * mask[None,:]]
array([[ 0],
       [ 2],
       [ 4],
       [10],
       [12],
       [14]])
Josef
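The row-and-column combination in the session above can also be written with a logical AND instead of multiplication, and compared with a selection that keeps the 2-d layout. This is a sketch; row_mask and col_mask are illustrative names, and np.ix_ is offered as an alternative that the thread itself does not use.

```python
import numpy as np

x = np.arange(4 * 5).reshape(4, 5)
row_mask = np.array([True, False, True, False])
col_mask = np.array([True, False, True, False, True])

# Build one full-shape mask by broadcasting; & is the logical AND for
# boolean arrays and states the intent more directly than *.
both = row_mask[:, None] & col_mask[None, :]   # shape (4, 5)
flat = x[both]                                 # 1-d result

# np.ix_ selects the same elements but keeps them as a 2-d block.
block = x[np.ix_(row_mask, col_mask)]

print(flat)
print(block)
```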
participants (6)
- Anne Archibald
- Benjamin Root
- Charles R Harris
- josef.pktd@gmail.com
- Nathaniel Smith
- Sebastian Berg