Mailman 3 Indexing changes/deprecations - NumPy-Discussion

Indexing changes/deprecations

Sebastian Berg

27 Sep 2013

7:27 a.m.

Hey,

since I am working on the indexing, I was wondering about a few smaller things:

* 0-d boolean array, `np.array(0)[True]` (will work now) would give np.array([0]) as a copy, instead of the original array. I guess I could add a FutureWarning or so, but I am not sure and overall the chance of creating bugs seems low. (The boolean index should always add 1 dimension and here, remove 0 dimensions -> 1-d result.)

* All index operations return a view; never the object. This means that `v = arr[...]` is slightly slower. But since it does not affect `arr[...] = vals`, I think the speed implications are negligible.

* Does anyone have an idea if there is a way to change the subclass logic, in which view-based item setting is implemented as: np.asarray(subclass[index]) = vals

  I somewhat think the subclass should rather implement `__setitem__` instead of relying on numpy calling its `__getitem__`, but I don't see how it can be changed.

* Still thinking a bit about implementing a keepdims keyword or function, to handle matrix type logic mostly in the C-code.

And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?

- Sebastian
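A minimal sketch of the first two points, assuming a NumPy release in which the behaviour described above is in effect (the variable names are illustrative only):

```python
import numpy as np

a = np.array(0)  # 0-d array

# A boolean scalar index adds one dimension, so the result is 1-d.
picked = a[True]
print(picked.shape)

# The result of a boolean index is a copy, not the original array.
print(np.shares_memory(picked, a))

# Indexing with Ellipsis gives a new view object onto the same data.
view = a[...]
print(view is a, np.shares_memory(view, a))
```

On older releases the results may differ, which is exactly the change under discussion.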

Benjamin Root

27 Sep 2013

8:26 a.m.

On Fri, Sep 27, 2013 at 8:27 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:

> Hey,
>
> since I am working on the indexing, I was wondering about a few smaller things:
>
> * 0-d boolean array, `np.array(0)[True]` (will work now) would give np.array([0]) as a copy, instead of the original array. I guess I could add a FutureWarning or so, but I am not sure and overall the chance of creating bugs seems low.
>
>   (The boolean index should always add 1 dimension and here, remove 0 dimensions -> 1-d result.)
>
> * All index operations return a view; never the object. This means that `v = arr[...]` is slightly slower. But since it does not affect `arr[...] = vals`, I think the speed implications are negligible.
>
> * Does anyone have an idea if there is a way to change the subclass logic, in which view-based item setting is implemented as: np.asarray(subclass[index]) = vals
>
>   I somewhat think the subclass should rather implement `__setitem__` instead of relying on numpy calling its `__getitem__`, but I don't see how it can be changed.
>
> * Still thinking a bit about implementing a keepdims keyword or function, to handle matrix type logic mostly in the C-code.
>
> And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?
>
> - Sebastian

Boolean indexing could use a facelift. First, consider the following (albeit minor) annoyance:

>>> import numpy as np
>>> a = np.arange(5)
>>> a[[True, False, True, False, True]]
array([1, 0, 1, 0, 1])
>>> b = np.array([True, False, True, False, True])
>>> a[b]
array([0, 2, 4])
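One way to sidestep the ambiguity, sketched here on the assumption that the mask starts out as a plain Python list: convert it to a boolean array explicitly before indexing, so the result does not depend on how a given NumPy version interprets a list of bools. The name `mask_list` is illustrative only.

```python
import numpy as np

a = np.arange(5)
mask_list = [True, False, True, False, True]

# Converting explicitly guarantees boolean (mask) semantics.
mask = np.asarray(mask_list, dtype=bool)
selected = a[mask]
print(selected)
```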

Next, it would be nice if boolean indexing returned a view (wishful thinking, I know):

>>> c = a[b]
>>> c
array([0, 2, 4])
>>> c[1] = 7
>>> c
array([0, 7, 4])
>>> a
array([0, 1, 2, 3, 4])
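The copy semantics can be checked directly. In this sketch, assignment through the mask on the left-hand side is what modifies `a`; `np.shares_memory` is used only to confirm that `c` is independent of `a`.

```python
import numpy as np

a = np.arange(5)
b = np.array([True, False, True, False, True])

c = a[b]   # boolean indexing produces a copy
c[1] = 7   # changes the copy only
print(np.shares_memory(c, a))

# Assigning through the mask writes into the original array.
a[b] = [0, 7, 4]
print(a)
```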

Cheers! Ben Root

Sebastian Berg

8:35 a.m.

On Fri, 2013-09-27 at 09:26 -0400, Benjamin Root wrote: <snip>

> Boolean indexing could use a facelift. First, consider the following (albeit minor) annoyance:

Done. Well, there will be deprecation warnings for the time being, though.

<snip>

> Next, it would be nice if boolean indexing returned a view (wishful thinking, I know):

Yeah, that is impossible unless you create some intermediate non-array object, so that is out of the scope of things for now I think.

> >>> c = a[b]
> >>> c
> array([0, 2, 4])
> >>> c[1] = 7
> >>> c
> array([0, 7, 4])
> >>> a
> array([0, 1, 2, 3, 4])
>
> Cheers! Ben Root
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Jaime Fernández del Río

10:45 a.m.

On Fri, Sep 27, 2013 at 5:27 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:

> And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?

I find this behavior of boolean indexing a little bit annoying:

>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> row_idx = np.array([True, True, False])
>>> col_idx = np.array([False, True, True, False])

This shouldn't work, but it does, because there are the same number of Trues in both indexing arrays. Do we really want this to happen?:

>>> a[row_idx, col_idx]
array([1, 6])

This shouldn't work, and it doesn't:

>>> col_idx = np.array([False, True, True, True])
>>> a[row_idx, col_idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape

It would be nice if something like this worked, or at least it should raise a different error, because those arrays **can** be broadcast to a single shape:

>>> a[row_idx[:, np.newaxis], col_idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape

For this there is the following workaround, although it creates a fully expanded boolean indexing array, which I was hoping the previous non-working code would avoid:

>>> a[row_idx[:, np.newaxis] & col_idx]
array([1, 2, 3, 5, 6, 7])
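A sketch of one alternative that avoids building the full 2-D mask, at the cost of an intermediate copy of the selected rows: index the rows first, then the columns. This is offered as an illustration under that assumption, not as something proposed in the thread.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
row_idx = np.array([True, True, False])
col_idx = np.array([False, True, True, True])

# Select rows, then columns; each step uses a 1-d mask only.
block = a[row_idx][:, col_idx]

# The full-mask workaround gives the same elements, flattened.
full_mask = a[row_idx[:, np.newaxis] & col_idx]
print(np.array_equal(block.ravel(), full_mask))
```

Unlike the full-mask form, the two-step form keeps the 2-D shape of the selected block.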

Jaime

Sebastian Berg

11:36 a.m.

On Fri, 2013-09-27 at 08:45 -0700, Jaime Fernández del Río wrote:

> On Fri, Sep 27, 2013 at 5:27 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
>
> > And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?
>
> I find this behavior of boolean indexing a little bit annoying:
>
> >>> a = np.arange(12).reshape(3, 4)
> >>> a
> array([[ 0,  1,  2,  3],
>        [ 4,  5,  6,  7],
>        [ 8,  9, 10, 11]])
> >>> row_idx = np.array([True, True, False])
> >>> col_idx = np.array([False, True, True, False])
>
> This shouldn't work, but it does, because there are the same number of Trues in both indexing arrays. Do we really want this to happen?:
>
> >>> a[row_idx, col_idx]
> array([1, 6])

I agree that this can be confusing, but I think the fancy indexing logic (plus the "boolean is much like a nonzero(boolean) call") dictates this. One could think about doing something here, but it would basically require a secondary indexing mechanism like `arr.no_broadcast_fancy[index]`. `np.ix_` currently implements a conversion logic for this, though it cannot support slices.
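The "boolean is much like a nonzero(boolean) call" rule can be sketched as follows; the names `rows`, `cols` and `paired` are illustrative only.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
row_idx = np.array([True, True, False])
col_idx = np.array([False, True, True, False])

# Each boolean index behaves like the integer positions of its True entries.
rows = row_idx.nonzero()[0]   # positions of True in row_idx
cols = col_idx.nonzero()[0]   # positions of True in col_idx

# The integer arrays are then paired element by element (fancy indexing).
paired = a[rows, cols]
print(np.array_equal(paired, a[row_idx, col_idx]))
```

Because the two integer arrays have the same length, they broadcast against each other and are paired element-wise, which is why the result is 1-d instead of a 2 x 2 block.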

> This shouldn't work, and it doesn't:
>
> >>> col_idx = np.array([False, True, True, True])
> >>> a[row_idx, col_idx]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: shape mismatch: objects cannot be broadcast to a single shape

In [11]: a[np.ix_(row_idx, col_idx)]
Out[11]:
array([[1, 2, 3],
       [5, 6, 7]])

<snip>
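To make the conversion visible, this sketch unpacks what `np.ix_` returns for the two masks before indexing with it; `rows`, `cols` and `block` are illustrative names.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
row_idx = np.array([True, True, False])
col_idx = np.array([False, True, True, True])

# np.ix_ turns the 1-d masks into integer index arrays shaped for broadcasting.
rows, cols = np.ix_(row_idx, col_idx)
print(rows.shape, cols.shape)

# Broadcasting the two index arrays selects the full rows-by-columns block.
block = a[rows, cols]
print(block)
```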

However, there is one further point here that I think is likely worth changing, and that is adding a check that the boolean array has the correct shape. The `nonzero` logic works well, but it allows things like:

    np.array([1, 2])[np.array([True, False, False, False])]

which is a bit weird, though maybe not harmful.

- Sebastian
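A sketch of such a shape check, written as a user-level wrapper. `checked_bool_index` is a hypothetical helper for illustration, not a NumPy function, and the choice of `IndexError` is an assumption.

```python
import numpy as np

def checked_bool_index(arr, mask):
    """Hypothetical helper: index arr with mask only if the shapes match."""
    mask = np.asarray(mask)
    if mask.dtype != bool:
        raise TypeError("mask must be a boolean array")
    # The mask must match the leading dimensions of the array it indexes.
    if mask.shape != arr.shape[:mask.ndim]:
        raise IndexError(
            f"boolean index shape {mask.shape} does not match "
            f"array shape {arr.shape[:mask.ndim]}"
        )
    return arr[mask]

data = np.array([1, 2])
ok = checked_bool_index(data, np.array([True, False]))
print(ok)
```

Calling the helper with the four-element mask from the example above raises instead of silently returning a result.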

> Jaime

Richard Hattersley

1:14 p.m.

On 27 September 2013 13:27, Sebastian Berg <sebastian@sipsolutions.net> wrote:

> And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now?

Well, since you asked... I'd *love* to see the fancy indexing behaviour moved to a separate method(s). Yes, I know! I'm not realistically expecting that to be tackled right now. And it sometimes seems like something of a sacred idol that one is not supposed to question. But I've kept quiet on the issue for too long and would love to know if anyone else thinks the same. It confuses people. Actually, it confuses the hell out of people. I'm *still* finding out new quirks of its behaviour and I've been using NumPy in a professional role for years... although you should bear in mind I could just be a slow learner. ;-)
