Indexing changes/deprecations
Hey,

since I am working on the indexing. I was wondering about a few smaller things:

* 0-d boolean array, `np.array(0)[True]` (will work now) would give np.array([0]) as a copy, instead of the original array. I guess I could add a FutureWarning or so, but I am not sure and overall the chance of creating bugs seems low. (The boolean index should always add 1 dimension and here, remove 0 dimensions -> 1-d result.)
* All index operations return a view; never the object. This means that `v = arr[...]` is slightly slower. But since it does not affect `arr[...] = vals`, I think the speed implications are negligible.
* Does anyone have an idea if there is a way to change the subclass logic that view based item setting is implemented as:

      np.asarray(subclass[index]) = vals

  I somewhat think the subclass should rather implement `__setitem__` instead of relying on numpy calling its `__getitem__`, but I don't see how it can be changed.
* Still thinking a bit about implementing a keepdims keyword or function, to handle matrix type logic mostly in the C-code.

And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?

- Sebastian
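The first two points can be checked interactively. The following is only a minimal sketch, assuming a recent NumPy release in which the changes described above have landed; the variable names are illustrative and not from the original message.

```python
import numpy as np

# 0-d array indexed with a scalar boolean: the boolean index adds one
# dimension and removes none, so the result is 1-d.
x = np.array(0)
picked = x[True]     # one element selected
dropped = x[False]   # nothing selected
print(picked.shape, dropped.shape)

# The boolean result is a copy: writing to it leaves x untouched.
picked[0] = 99
print(x)

# Ellipsis indexing returns a new view object rather than arr itself,
# while assignment through the same index still writes into arr.
arr = np.arange(4)
v = arr[...]
print(v is arr, v.base is arr)
arr[...] = 7
print(arr)
```

The view returned by `arr[...]` shares memory with `arr`, so only object identity changes, not the data being referenced.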
On Fri, Sep 27, 2013 at 8:27 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
> Hey,
>
> since I am working on the indexing. I was wondering about a few smaller things:
>
> * 0-d boolean array, `np.array(0)[True]` (will work now) would give np.array([0]) as a copy, instead of the original array. I guess I could add a FutureWarning or so, but I am not sure and overall the chance of creating bugs seems low.
>   (The boolean index should always add 1 dimension and here, remove 0 dimensions -> 1-d result.)
> * All index operations return a view; never the object. This means that `v = arr[...]` is slightly slower. But since it does not affect `arr[...] = vals`, I think the speed implications are negligible.
> * Does anyone have an idea if there is a way to change the subclass logic that view based item setting is implemented as: np.asarray(subclass[index]) = vals
>   I somewhat think the subclass should rather implement `__setitem__` instead of relying on numpy calling its `__getitem__`, but I don't see how it can be changed.
> * Still thinking a bit about implementing a keepdims keyword or function, to handle matrix type logic mostly in the C-code.
>
> And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?
>
> - Sebastian
Boolean indexing could use a facelift. First, consider the following (albeit minor) annoyance:
>>> import numpy as np
>>> a = np.arange(5)
>>> a[[True, False, True, False, True]]
array([1, 0, 1, 0, 1])
>>> b = np.array([True, False, True, False, True])
>>> a[b]
array([0, 2, 4])
Next, it would be nice if boolean indexing returned a view (wishful thinking, I know):
>>> c = a[b]
>>> c
array([0, 2, 4])
>>> c[1] = 7
>>> c
array([0, 7, 4])
>>> a
array([0, 1, 2, 3, 4])
Cheers! Ben Root
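For readers who want to reproduce both points, here is a small sketch. It assumes a recent NumPy release, where a list of booleans is treated as a mask rather than as integers (the session above shows the older behaviour), and it uses assignment through the mask as the usual substitute for a writable view.

```python
import numpy as np

a = np.arange(5)
b = np.array([True, False, True, False, True])

# In recent NumPy, a plain list of booleans selects like the array mask.
from_list = a[[True, False, True, False, True]]
from_array = a[b]

# Reading through a boolean mask copies the selected elements,
# so modifying the copy does not touch the original.
c = a[b]
c[1] = 7
unchanged = a.copy()

# Assigning through the same mask writes into the original array.
a[b] = [0, 7, 4]
print(from_list, from_array, c, unchanged, a)
```

Assignment works in place because `a[b] = vals` goes through `__setitem__` and never materialises the intermediate copy.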
On Fri, 2013-09-27 at 09:26 -0400, Benjamin Root wrote:
<snip>
> Boolean indexing could use a facelift. First, consider the following (albeit minor) annoyance:

Done. Well, there will be deprecation warnings for the time being, though.

<snip>
> Next, it would be nice if boolean indexing returned a view (wishful thinking, I know):

Yeah, that is impossible unless you create some intermediate non-array object, so that is out of the scope of things for now I think.

> >>> c = a[b]
> >>> c
> array([0, 2, 4])
> >>> c[1] = 7
> >>> c
> array([0, 7, 4])
> >>> a
> array([0, 1, 2, 3, 4])
>
> Cheers!
> Ben Root
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Fri, Sep 27, 2013 at 5:27 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
> And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?
I find this behavior of boolean indexing a little bit annoying:
>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> row_idx = np.array([True, True, False])
>>> col_idx = np.array([False, True, True, False])
This shouldn't work, but it does, because there are the same number of Trues in both indexing arrays. Do we really want this to happen?:
>>> a[row_idx, col_idx]
array([1, 6])
This shouldn't work, and it doesn't:
>>> col_idx = np.array([False, True, True, True])
>>> a[row_idx, col_idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
It would be nice if something like this worked, or at least it should raise a different error, because those arrays **can** be broadcast to a single shape:
>>> a[row_idx[:, np.newaxis], col_idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
For this there is the following workaround, although it creates a fully expanded boolean indexing array, which I was hoping the previous non-working code would avoid:
>>> a[row_idx[:, np.newaxis] & col_idx]
array([1, 2, 3, 5, 6, 7])
Jaime
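One way to get the broadcast-style selection without building the full 2-d mask is to convert each mask to integer positions first. This is only a sketch of that idea, not something proposed in the message above; `np.flatnonzero` is a standard NumPy function, while the names `rows`, `cols`, `block` and `flat` are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
row_idx = np.array([True, True, False])
col_idx = np.array([False, True, True, True])

# Convert each mask to integer positions, then let the (2, 1) row index
# broadcast against the (3,) column index to select a 2 x 3 block.
rows = np.flatnonzero(row_idx)
cols = np.flatnonzero(col_idx)
block = a[rows[:, np.newaxis], cols]

# The fully expanded mask from the workaround selects the same elements,
# but returns them flattened to 1-d.
flat = a[row_idx[:, np.newaxis] & col_idx]
print(block)
print(flat)
```

The integer version keeps the two-dimensional shape of the selection, which the expanded-mask workaround loses.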
On Fri, 2013-09-27 at 08:45 -0700, Jaime Fernández del Río wrote:
> On Fri, Sep 27, 2013 at 5:27 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
> > And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?
>
> I find this behavior of boolean indexing a little bit annoying:
>
> >>> a = np.arange(12).reshape(3, 4)
> >>> a
> array([[ 0,  1,  2,  3],
>        [ 4,  5,  6,  7],
>        [ 8,  9, 10, 11]])
> >>> row_idx = np.array([True, True, False])
> >>> col_idx = np.array([False, True, True, False])
>
> This shouldn't work, but it does, because there are the same number of Trues in both indexing arrays. Do we really want this to happen?:
>
> >>> a[row_idx, col_idx]
> array([1, 6])
I agree that this can be confusing, but I think the fancy indexing logic (plus the "boolean is much like a nonzero(boolean) call") dictates this. One could think about doing something here, but it would basically require a secondary indexing mechanism like `arr.no_broadcast_fancy[index]`. `np.ix_` currently implements a conversion logic for this, though it cannot support slices.
> This shouldn't work, and it doesn't:
>
> >>> col_idx = np.array([False, True, True, True])
> >>> a[row_idx, col_idx]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: shape mismatch: objects cannot be broadcast to a single shape
In [11]: a[np.ix_(row_idx, col_idx)]
Out[11]:
array([[1, 2, 3],
       [5, 6, 7]])

<snip>
However, there is one further point here that I think is likely worth changing, and that is adding a check that the boolean array has the correct shape. The `nonzero` logic works well, but it allows things like:

    np.array([1, 2])[np.array([True, False, False, False])]

which is a bit weird, though maybe not harmful.

- Sebastian
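The three behaviours discussed in this reply can be put side by side. This is a hedged sketch that assumes a recent NumPy release, where the shape check suggested above exists and a mismatched boolean mask raises `IndexError`; older releases, as the message notes, accepted it. The variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
row_idx = np.array([True, True, False])
col_idx = np.array([False, True, True, False])

# Two 1-d masks behave like their nonzero() positions, paired elementwise.
paired = a[row_idx, col_idx]
via_nonzero = a[row_idx.nonzero()[0], col_idx.nonzero()[0]]

# np.ix_ turns the masks into an open mesh, selecting the full sub-block.
block = a[np.ix_(row_idx, col_idx)]

# A mask whose length does not match the indexed axis is rejected
# in recent releases (assumption: IndexError is the exception raised).
try:
    np.array([1, 2])[np.array([True, False, False, False])]
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc

print(paired, via_nonzero)
print(block)
print(type(mismatch_error).__name__)
```

As the reply points out, `np.ix_` only accepts 1-d sequences, so it cannot be mixed with slices in the same call.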
On 27 September 2013 13:27, Sebastian Berg <sebastian@sipsolutions.net> wrote:
> And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?
Well, since you asked... I'd *love* to see the fancy indexing behaviour moved to separate method(s).

Yes, I know! I'm not realistically expecting that to be tackled right now. And it sometimes seems like something of a sacred idol that one is not supposed to question. But I've kept quiet on the issue for too long and would love to know if anyone else thinks the same.

It confuses people. Actually, it confuses the hell out of people. I'm *still* finding out new quirks of its behaviour and I've been using NumPy in a professional role for years... although you should bear in mind I could just be a slow learner. ;-)
participants (4)
- Benjamin Root
- Jaime Fernández del Río
- Richard Hattersley
- Sebastian Berg