Mailman 3 New Indexing Methods Revival #N (subclasses!) - NumPy-Discussion

New Indexing Methods Revival #N (subclasses!)

Sebastian Berg

4 Sep 2016 4 Sep '16

12:38 a.m.

Hi all, not that I am planning to spend much time on this right now, however, I did a small rebase of the stuff I had (did not push yet) on oindex and remembered the old problem ;). The one remaining issue I have with adding things like (except making the code prettier and writing tests): arr.oindex[...] # outer/orthogonal indexing arr.vindex[...] # Picking of elements (much like current) arr.lindex[...] # current behaviour for backward compat is what to do about subclasses. Now what I can do (and have currently in my branch) is to tell someone on `subclass.oindex[...]`: This won't work, the subclass implements `__getitem__` or `__setitem__` so I don't know if the result would be correct (its a bit annoying if you also warn about using those attributes, but...). However, with or without such error, we need a nice way for subclasses to define these attributes! This is important even within numpy at least for masked arrays (possibly also matrix and memmap). They (typically) do some stuff before or after the plain indexing operation, so how do we make it convenient to allow them to do the same stuff for the special indexing attributes without weird code duplication? I can think of things, but nothing too great yet so maybe you guys got an elegant idea. - Sebastian

Attachments:

signature.asc (application/pgp-signature — 819 bytes)

Show replies by date

Sebastian Berg

4 Sep 4 Sep

5:40 p.m.

On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:

...

Hi all,

not that I am planning to spend much time on this right now, however, I did a small rebase of the stuff I had (did not push yet) on oindex and remembered the old problem ;).

The one remaining issue I have with adding things like (except making the code prettier and writing tests):

arr.oindex[...] # outer/orthogonal indexing arr.vindex[...] # Picking of elements (much like current) arr.lindex[...] # current behaviour for backward compat

is what to do about subclasses. Now what I can do (and have currently in my branch) is to tell someone on `subclass.oindex[...]`: This won't work, the subclass implements `__getitem__` or `__setitem__` so I don't know if the result would be correct (its a bit annoying if you also warn about using those attributes, but...).

Hmm, I am considering to expose a new indexing helper object. So that subclasses could implement something like `__numpy_getitem__` and `__numpy_setitem__` and if they do (and preferably nothing else) they would get back passed a small object with some information about the indexing operation. So that the subclass would implement: ``` def __numpy_setitem__(self, indexer, values): indexer.method # one of {"plain", "oindex", "vindex", "lindex"} indexer.scalar # Will the result be a scalar? indexer.view # Will the result be a view or a copy? # More information might be possible (note that not all checks are # done at this point, just basic checks will have happened already). # Do some code, that prepares self or values, could also use # indexer for another array (e.g. mask) of the same shape. result = indexer(self, values) # Do some coded to fixup the result if necessary. # Should discuss whether result is first a plain ndarray or # already wrapped. ``` This could be implemented in the C-side without much hassle, I think. Of course it adds some new API which we would have to support indefinitely. But it seems something like this would also fix the hassle of identifying e.g. if the result should be a scalar for a subclass (which may even be impossible in some cases). Would be very happy about feedback from subclassers! - Sebastian

...

However, with or without such error, we need a nice way for subclasses to define these attributes! This is important even within numpy at least for masked arrays (possibly also matrix and memmap).

They (typically) do some stuff before or after the plain indexing operation, so how do we make it convenient to allow them to do the same stuff for the special indexing attributes without weird code duplication? I can think of things, but nothing too great yet so maybe you guys got an elegant idea.

- Sebastian _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Sebastian Berg

5:53 p.m.

On So, 2016-09-04 at 14:10 +0200, Sebastian Berg wrote:

...

On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:

...
Hi all,

not that I am planning to spend much time on this right now, however, I did a small rebase of the stuff I had (did not push yet) on oindex and remembered the old problem ;).

The one remaining issue I have with adding things like (except making the code prettier and writing tests):

arr.oindex[...] # outer/orthogonal indexing arr.vindex[...] # Picking of elements (much like current) arr.lindex[...] # current behaviour for backward compat

is what to do about subclasses. Now what I can do (and have currently in my branch) is to tell someone on `subclass.oindex[...]`: This won't work, the subclass implements `__getitem__` or `__setitem__` so I don't know if the result would be correct (its a bit annoying if you also warn about using those attributes, but...).

Hmm, I am considering to expose a new indexing helper object. So that subclasses could implement something like `__numpy_getitem__` and `__numpy_setitem__` and if they do (and preferably nothing else) they would get back passed a small object with some information about the indexing operation. So that the subclass would implement:

``` def __numpy_setitem__(self, indexer, values): indexer.method # one of {"plain", "oindex", "vindex", "lindex"} indexer.scalar # Will the result be a scalar? indexer.view # Will the result be a view or a copy? # More information might be possible (note that not all checks are # done at this point, just basic checks will have happened already).

# Do some code, that prepares self or values, could also use # indexer for another array (e.g. mask) of the same shape.

result = indexer(self, values)

# Do some coded to fixup the result if necessary. # Should discuss whether result is first a plain ndarray or # already wrapped. ```

Hmm, field access is a bit annoying, but I guess can/has to be included.

...

This could be implemented in the C-side without much hassle, I think. Of course it adds some new API which we would have to support indefinitely. But it seems something like this would also fix the hassle of identifying e.g. if the result should be a scalar for a subclass (which may even be impossible in some cases).

Would be very happy about feedback from subclassers!

- Sebastian

...
However, with or without such error, we need a nice way for subclasses to define these attributes! This is important even within numpy at least for masked arrays (possibly also matrix and memmap).

They (typically) do some stuff before or after the plain indexing operation, so how do we make it convenient to allow them to do the same stuff for the special indexing attributes without weird code duplication? I can think of things, but nothing too great yet so maybe you guys got an elegant idea.

- Sebastian _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Marten van Kerkwijk

8:50 p.m.

Hi Sebastian, I haven't given this as much thought as it deserves, but thought I would comment from the astropy perspective, where we both have direct subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and classes that store their data internally as ndarray (subclass) instances (`Time`, `SkyCoord`, ...). One comment would be that if one were to introduce a special method, one should perhaps think a bit more broadly, and capture more than the indexing methods with it. I wonder about this because for the array-holding classes mentioned above, we initially just had `__getitem__`, which got the relevant items from the underlying arrays, and then constructed a new instance with those. But recently we realised that methods like `reshape`, `transpose`, etc., require essentially the same steps, and so we constructed a new `ShapedLikeNDArray` mixin, which provides all of those [1] as long as one defines a single `_apply` method. (Indeed, it turns out that the same technique works without any real change for some numpy functions such as `np.broadcast_to`.) That said, in the actual ndarray subclasses, we have not found a need to overwrite any of the reshaping methods, since those methods are all handled OK via `__array_finalize__`. We do overwrite `__getitem__` (and `item`) as we need to take care of scalars. And we would obviously have to overwrite `oindex`, etc., as well, for the same reason, so in that respect a common method might be useful. However, perhaps it is worth considering that the only reason we need to overwrite them in the first place, unlike what is the case for all the shape-changing methods, is that scalar output does not get put through `__array_finalize__`. Might it be an idea to have the new indexing methods return array scalars instead of normal ones so we can get rid of this? All the best, Marten [1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L856

Sebastian Berg

5 Sep 5 Sep

11:18 a.m.

On So, 2016-09-04 at 11:20 -0400, Marten van Kerkwijk wrote:

...

Hi Sebastian,

I haven't given this as much thought as it deserves, but thought I would comment from the astropy perspective, where we both have direct subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and classes that store their data internally as ndarray (subclass) instances (`Time`, `SkyCoord`, ...).

One comment would be that if one were to introduce a special method, one should perhaps think a bit more broadly, and capture more than the indexing methods with it. I wonder about this because for the array-holding classes mentioned above, we initially just had `__getitem__`, which got the relevant items from the underlying arrays, and then constructed a new instance with those. But recently we realised that methods like `reshape`, `transpose`, etc., require essentially the same steps, and so we constructed a new `ShapedLikeNDArray` mixin, which provides all of those [1] as long as one defines a single `_apply` method. (Indeed, it turns out that the same technique works without any real change for some numpy functions such as `np.broadcast_to`.)

That said, in the actual ndarray subclasses, we have not found a need to overwrite any of the reshaping methods, since those methods are all handled OK via `__array_finalize__`. We do overwrite `__getitem__` (and `item`) as we need to take care of scalars. And we would obviously have to overwrite `oindex`, etc., as well, for the same reason, so in that respect a common method might be useful.

However, perhaps it is worth considering that the only reason we need to overwrite them in the first place, unlike what is the case for all the shape-changing methods, is that scalar output does not get put through `__array_finalize__`. Might it be an idea to have the new indexing methods return array scalars instead of normal ones so we can get rid of this?

I did not realize the new numpys are special with the scalar handling? The indexing (already before 1.9. I believe) always goes through PyArray_ScalarReturn or so, which I thought was used by almost all functions. If you mean the attributes (oindex, etc.), they could behave a bit different of course, though not sure to what it extend it actually helps since that would also create disparity. If we implement a new special method (__numpy_getitem__), they definitely should behave slightly different in some places. One option might be to not even do the wrapping, but leave it to the subclass. However, if you have an array with arrays inside, knowing whether to return a scalar correctly would have to rely on inspecting the index object, which is why I suggested the indexer to give a few extra informations (such as this one). Of course, since the scalar return goes through a ScalarReturn function, that function could maybe also be tought to indicate the scalar to `__array_finalize__`/`__array_wrap__` (not sure what exactly applies). - Sebastian

...

All the best,

Marten

[1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc .py#L856 _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Marten van Kerkwijk

6 Sep 6 Sep

12:24 a.m.

Hi Sebastian, Indeed, having the scalar pass through `__array_wrap__` would have been useful (_finalize__ is too late, since one cannot change the class any more, just set attributes). But that is water under the bridge, since we're stuck with people not expecting that. I think the slightly larger question, but one somewhat orthogonal to your suggestion of a new dundermethod, is whether one cannot avoid more such methods by the new indexing routines returning array scalars instead of regular ones. Obviously, though, this has larger scope, as it might be part of the merging of the now partially separate code paths for scalar and array arithmetic, etc. All the best, Marten

Sebastian Berg

12:52 a.m.

On Mo, 2016-09-05 at 14:54 -0400, Marten van Kerkwijk wrote:

...

Hi Sebastian,

Indeed, having the scalar pass through `__array_wrap__` would have been useful (_finalize__ is too late, since one cannot change the class any more, just set attributes). But that is water under the bridge, since we're stuck with people not expecting that.

I think the slightly larger question, but one somewhat orthogonal to your suggestion of a new dundermethod, is whether one cannot avoid more such methods by the new indexing routines returning array scalars instead of regular ones.

Obviously, though, this has larger scope, as it might be part of the merging of the now partially separate code paths for scalar and array arithmetic, etc.

Thanks for the input. I am not quite sure about all of the things. Calling array wrap for the scalar returns does not sound like a problem (it would also effect other code paths). Calling it only for the new methods creates a bit of branching, but is not a big deal. Would it help you though? You could avoid implementing all the new indexing methods for many/most subclasses, but how do you tell numpy that you are supporting them? Right now I thought it would make sense to give an error if you try `subclass.vindex[...]` but the subclass has `__getitem__` implemented (and not overwritten vindex). The dundermethod gives a way to tell numpy: you know what to do. For the sake of masked arrays it is also convenient (you can use the indexer also on the mask), but masked arrays are rather special. It would be interesting if there are more complex subclasses out there, which implement `__getitem__` or `__setitem__`. Maybe all we need is some new trick for the scalars and most subclasses can just remove their `__getitem__` methods.... - Sebastian

...

All the best,

Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Marten van Kerkwijk

3:54 a.m.

Hi Sebastian, It would seem to me that any subclass has to keep up to date with new features in ndarray, and while I think ndarray has a responsibility not to break backward compatibility, I do not think it has to protect against new features possibly not working as expected in subclasses. In particular, I think it is overly complicated (and an unnecessary maintenance burden) to error out if a subclass has `__getitem__` overwritten, but not `oindex`. For somewhat similar reasons, I'm not too keen on a new `__numpy_getitem__` method; I realise it might reduce complexity for some ndarray subclasses eventually, but it also is an additional maintenance burden. If you really think it is useful, I think it might be more helpful to define a new mixin class which provides a version of all indexing methods that just call `__numpy_getitem__` if that is provided by the class that uses the mixin. I would *not* put it in `ndarray` proper. Indeed, the above might even be handier for subclasses, since they can choose, if they wish, to implement a similar mixin for older numpy versions, so that all the numpy version stuff can be moved to a single location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.) Overall, my sense would be to keep your PR to just implementing the various new index methods (which are great -- I still don't really like the names, but sadly don't have better suggestions...). But it might be good if others pipe in here too, in particular those maintaining ndarray subclasses! All the best, Marten

Marten van Kerkwijk

4:01 a.m.

Actually, on those names: an alternative to your proposal would be to introduce only one new method which can do all types of indexing, depending on a keyword argument, i.e., something like ``` def getitem(self, item, mode='outer'): ... ``` -- Marten

Sebastian Berg

12:23 p.m.

On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:

...

Actually, on those names: an alternative to your proposal would be to introduce only one new method which can do all types of indexing, depending on a keyword argument, i.e., something like ``` def getitem(self, item, mode='outer'): ... ```

Yeah we can do that easily. The only disadvantage is losing the square bracket notation.

...

-- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Sebastian Berg

1:07 p.m.

On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:

...

Actually, on those names: an alternative to your proposal would be to introduce only one new method which can do all types of indexing, depending on a keyword argument, i.e., something like ``` def getitem(self, item, mode='outer'): ... ```

Have I been overthinking this, eh? Just making it `__getitem__(self, index, mode=...)` and then from `vindex` calling the subclasses `__getitem__(self, index, mode="vector")` or so would already solve the issue almost fully? Only thing I am not quite sure about: 1. Is `__getitem__` in some way special to make this difficult (also considering some new ideas like allowing object[a=4]? 2. Do we currently have serious deficiencies we want to fix, and could maybe not fix like that.

...

-- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Sebastian Berg

1:16 p.m.

On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:

...

On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:

...
Actually, on those names: an alternative to your proposal would be to introduce only one new method which can do all types of indexing, depending on a keyword argument, i.e., something like ``` def getitem(self, item, mode='outer'): ... ```

Have I been overthinking this, eh? Just making it `__getitem__(self, index, mode=...)` and then from `vindex` calling the subclasses `__getitem__(self, index, mode="vector")` or so would already solve the issue almost fully? Only thing I am not quite sure about:

1. Is `__getitem__` in some way special to make this difficult (also considering some new ideas like allowing object[a=4]?

OK; I think the C-side slot cannot get the kwarg likely, but probably you can find a solution for that....

...

2. Do we currently have serious deficiencies we want to fix, and could maybe not fix like that.

...
-- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Robert Kern

3:27 p.m.

On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg wrote:

...

On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:

...
On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:

...
Actually, on those names: an alternative to your proposal would be to introduce only one new method which can do all types of indexing, depending on a keyword argument, i.e., something like ``` def getitem(self, item, mode='outer'): ... ```

Have I been overthinking this, eh? Just making it `__getitem__(self, index, mode=...)` and then from `vindex` calling the subclasses `__getitem__(self, index, mode="vector")` or so would already solve the issue almost fully? Only thing I am not quite sure about:

1. Is `__getitem__` in some way special to make this difficult (also considering some new ideas like allowing object[a=4]?

OK; I think the C-side slot cannot get the kwarg likely, but probably you can find a solution for that....

Well, the solution is to use a different name, I think. -- Robert Kern

Sebastian Berg

9:56 p.m.

On Di, 2016-09-06 at 10:57 +0100, Robert Kern wrote:

...

On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg wrote:

...
On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:

...
On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:

...
Actually, on those names: an alternative to your proposal would

...
...
...
to introduce only one new method which can do all types of indexing, depending on a keyword argument, i.e., something like ``` def getitem(self, item, mode='outer'): ... ``` Have I been overthinking this, eh? Just making it `__getitem__(self, index, mode=...)` and then from `vindex` calling the subclasses `__getitem__(self, index, mode="vector")` or so would already solve the issue almost fully? Only thing I am not quite sure about:

1. Is `__getitem__` in some way special to make this difficult (also considering some new ideas like allowing object[a=4]?

OK; I think the C-side slot cannot get the kwarg likely, but

be probably

...
you can find a solution for that....

Well, the solution is to use a different name, I think.

Yeah :). Which goes back to `__numpy_getitem__` or so, just with a slightly different (simpler) API. Something more along: 1. If subclass has `__numpy_getitem__` call it with the method keyword. Or just add the argument to `__getitem__` which should likely work as well. 2. Implement `ndarray.__numpy_getitem__` which takes the method keyword and subclasses would call it instead of `ndarray.__getitem__` their base class call. The option I first mentioned would be similar but allows to give a bit of extra information to the subclass which may be useful. But if no one actually needs that information (this information would be things available after inspection of the indexing object) it just adds quite a bit of code and thus a maintenance burden). Such a new method could of course do things slightly different (such as the scalar cases, I really have to understand that wrapping thing, I am always worried about the array of array case as well. Or that annoying setitem calls getitem. Or maybe not wrap the array at all.). - Sebastian

...

-- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Nathan Goldbaum

4:49 a.m.

On Monday, September 5, 2016, Marten van Kerkwijk wrote:

...

Hi Sebastian,

It would seem to me that any subclass has to keep up to date with new features in ndarray, and while I think ndarray has a responsibility not to break backward compatibility, I do not think it has to protect against new features possibly not working as expected in subclasses. In particular, I think it is overly complicated (and an unnecessary maintenance burden) to error out if a subclass has `__getitem__` overwritten, but not `oindex`.

For somewhat similar reasons, I'm not too keen on a new `__numpy_getitem__` method; I realise it might reduce complexity for some ndarray subclasses eventually, but it also is an additional maintenance burden. If you really think it is useful, I think it might be more helpful to define a new mixin class which provides a version of all indexing methods that just call `__numpy_getitem__` if that is provided by the class that uses the mixin. I would *not* put it in `ndarray` proper.

I disagree that multiple inheritance (i.e. with your proposed mixin and ndarray) is something that numpy should enshrine in its API for subclasses. As the maintainer of an ndarray subclass, I'd much rather prefer just to implement a new duner method that older numpy versions will ignore.

...

Indeed, the above might even be handier for subclasses, since they can choose, if they wish, to implement a similar mixin for older numpy versions, so that all the numpy version stuff can be moved to a single location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)

Overall, my sense would be to keep your PR to just implementing the various new index methods (which are great -- I still don't really like the names, but sadly don't have better suggestions...).

But it might be good if others pipe in here too, in particular those maintaining ndarray subclasses!

All the best,

Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org javascript:; https://mail.scipy.org/mailman/listinfo/numpy-discussion

Marten van Kerkwijk

6:30 a.m.

Hi Nathan, The question originally posed is whether `ndarray` should provide that single method as a convenience already, even though it doesn't actually use it itself. Do you think that is useful, i.e., a big advantage over overwriting the new oindex, vindex, and another that I forget? My own feeling is that while it is good to provide some hooks for subclasses (__array_prepare__, wrap, finalize, numpy_ufunc), this one is too fine-grained and the benefits do not outweigh the cost, especially since it could easily be done with a mixin (unlike those other cases, which are not used to cover ndarray methods, but rather numpy functions, i.e., they provide subclasses with hooks into those functions, which no mixin could possibly do). All the best, Marten

Marten van Kerkwijk

6:32 a.m.

p.s. Just to be clear: personally, I think we should have neither `__numpy_getitem__` nor a mixin; we should just get the quite wonderful new indexing methods!

Sebastian Berg

12:22 p.m.

On Mo, 2016-09-05 at 21:02 -0400, Marten van Kerkwijk wrote:

...

p.s. Just to be clear: personally, I think we should have neither `__numpy_getitem__` nor a mixin; we should just get the quite wonderful new indexing methods!

Hehe, yes but see MaskedArrays. They need logic to also index the mask, so `__getitem__`, etc. actually do a lot of things. Without any complex changes (implementing a unified method within that class or similar). The new indexing attributes simply cannot work right. And I think at least a warning might be in order (from the numpy side) for such a subclass. But maybe MaskedArrays are pretty much the most complex subclass available in that regard....

...

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Stephan Hoyer

10:40 p.m.

On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:

...

p.s. Just to be clear: personally, I think we should have neither `__numpy_getitem__` nor a mixin; we should just get the quite wonderful new indexing methods!

+1 I don't maintain ndarray subclasses (I prefer composition), but I don't think it's too difficult to require implementing vindex and oindex properties from scratch. Side note: I would prefer the more verbose "legacy_index" to "lindex". We really want to discourage this one, and two new abbreviations are bad enough.

Nathan Goldbaum

10:41 p.m.

On Tuesday, September 6, 2016, Stephan Hoyer wrote:

...

On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com javascript:_e(%7B%7D,'cvml','m.h.vankerkwijk@gmail.com');> wrote:

...
p.s. Just to be clear: personally, I think we should have neither `__numpy_getitem__` nor a mixin; we should just get the quite wonderful new indexing methods!

+1

I don't maintain ndarray subclasses (I prefer composition), but I don't think it's too difficult to require implementing vindex and oindex properties from scratch.

Side note: I would prefer the more verbose "legacy_index" to "lindex". We really want to discourage this one, and two new abbreviations are bad enough.

Very much agreed.

CJ Carey

10:53 p.m.

I'm also in the non-subclass array-like camp, and I'd love to just write vindex and oindex methods, then have: def __getitem__(self, idx): return np.dispatch_getitem(self, idx) Where "dispatch_getitem" does some basic argument checking and calls either vindex or oindex as appropriate. Maybe that fits better as a mixin; I don't really mind either way. On Tue, Sep 6, 2016 at 12:11 PM, Nathan Goldbaum wrote:

...

On Tuesday, September 6, 2016, Stephan Hoyer wrote:

...
On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:

...
p.s. Just to be clear: personally, I think we should have neither `__numpy_getitem__` nor a mixin; we should just get the quite wonderful new indexing methods!

+1

I don't maintain ndarray subclasses (I prefer composition), but I don't think it's too difficult to require implementing vindex and oindex properties from scratch.

Side note: I would prefer the more verbose "legacy_index" to "lindex". We really want to discourage this one, and two new abbreviations are bad enough.

Very much agreed.

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Marten van Kerkwijk

11:26 p.m.

I'd love to solve it with `__getitem__`... Since most subclasses will have that defined that with just a single argument, calling it from `oindex` with an extra mode argument set will properly fail, so good in that sense (although one better ensure a useful error message...). Another option would be to store the mode as an additional part in the index tuple (perhaps as a dict), i.e., in python form do something like: ``` def __getitem__(self, index): if isinstance(index, tuple) and isinstance(index[-1], dict): *index, kwargs = index else: kwargs = {} return self._internal_indexer(index, **kwargs) ``` This way, if all a subclass does is to call `super(SubClass, self).__getitem__(index)` (as does `Quantity`; it does not look at the index at all), it would work automagically. Indeed, one could them decide the type of index even by regular slicing in the following way ``` array[:, 10, {'mode': 'vector'}] ``` which one could of course have a special token for (like `np.newaxis` for `None`), so that it would be something like ``` array[:, 10, np.vector_index] ``` However, looking at the above, I fear this is too baroque even by my standards! -- Marten

Marten van Kerkwijk

11:29 p.m.

In a separate message, since perhaps a little less looney: yet another option would be work by analogy with np.ix_ and define pre-dispatch index preparation routines np.ox_ and np.vx_ (say), which would be used as in: ``` array[np.ox_[:, 10]] -- or -- array[np.vx_[:, 10]] ``` This could work if those functions each return something appropriate for the legacy indexer, or, if that is not possible, a specific subclass of tuple as a marker that gets interpreted further up. In the end, though, probably also too complicated. It may remain best to simply implement the new methods instead and keep it at that! -- Marten

Sebastian Berg

7 Sep 7 Sep

2:59 a.m.

On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:

...

In a separate message, since perhaps a little less looney: yet another option would be work by analogy with np.ix_ and define pre-dispatch index preparation routines np.ox_ and np.vx_ (say), which would be used as in: ``` array[np.ox_[:, 10]] -- or -- array[np.vx_[:, 10]] ``` This could work if those functions each return something appropriate for the legacy indexer, or, if that is not possible, a specific subclass of tuple as a marker that gets interpreted further up.

Sure, it would be a solution, but not sure it is any better implementation wise then just passing an extra argument. As for the syntax for plain arrays, I am not convinced to be honest. - Sebastian

...

In the end, though, probably also too complicated. It may remain best to simply implement the new methods instead and keep it at that!

-- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Sebastian Berg

11 Sep 11 Sep

6:37 p.m.

On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:

...

In a separate message, since perhaps a little less looney: yet another option would be work by analogy with np.ix_ and define pre-dispatch index preparation routines np.ox_ and np.vx_ (say), which would be used as in: ``` array[np.ox_[:, 10]] -- or -- array[np.vx_[:, 10]] ``` This could work if those functions each return something appropriate for the legacy indexer, or, if that is not possible, a specific subclass of tuple as a marker that gets interpreted further up.

A specific subclass of tuple.... Part of me thinks this is horrifying, but it actually would solve some of the subclassing issues if `arr.vindex[...]` could end up calling `__getitem__` with a bit special indexing tuple value. I simply can't quite find the end of subclassing issues. We have tests for things like masked array correctly calling the `_data` subclass, but if the `_data` subclass does not implement the new method, numpy would have to run in circles (or something).... - Sebastian

...

In the end, though, probably also too complicated. It may remain best to simply implement the new methods instead and keep it at that!

-- Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Marten van Kerkwijk

8:49 p.m.

There remains the option to just let subclasses deal with new ndarray features... Certainly, for `Quantity`, I'll quite happily do that. And if it alllows the ndarray code to remain simple and efficient, it is probably the best solution. -- Marten

Sebastian Berg

9:58 p.m.

On So, 2016-09-11 at 11:19 -0400, Marten van Kerkwijk wrote:

...

There remains the option to just let subclasses deal with new ndarray features... Certainly, for `Quantity`, I'll quite happily do that. And if it alllows the ndarray code to remain simple and efficient, it is probably the best solution. -- Marten

Maybe, but I can't quite shake the feeling that we would see a lot of annoying bugs for subclasses that don't adept very quickely.

...

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Sebastian Berg

7 Sep 7 Sep

2:53 a.m.

On Di, 2016-09-06 at 10:10 -0700, Stephan Hoyer wrote:

...

On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk wrote:

...
p.s. Just to be clear: personally, I think we should have neither `__numpy_getitem__` nor a mixin; we should just get the quite wonderful new indexing methods! +1

I don't maintain ndarray subclasses (I prefer composition), but I don't think it's too difficult to require implementing vindex and oindex properties from scratch.

Well, in some sense why I brought it up is masked arrays. They have quite a bit of code in `__getitem__` including doing an identical indexing operation to the mask. I suppose you can always solve it with such code: def __special_getitem__(self, index, method=None): # do magic if method is None: # (not sure if this gets passed the base class res = super()[index] elif method == "outer": res = super().oindex[index] # ... # more magic. def __getitem__(self, index): self.__special_getitem__(index) @property def oindex(self): # define class elsewhere I guess class _indexer(object): def __init__(self, arr): self.arr = arr def __getitem__(self, index) arr.__special_getitem__(index, method='oter') return _indexer(self) Though I am not 100% sure without testing, a superclass method that understands the `method` kwarg might work better. We can teach numpy to pass in that `method` to getitem so that you don't have to implement that `_indexer` class for the special attribute. I first started to do that for MaskedArrays, and while it is not hard, it seemed a bit tedious. If we move this to a method with a new name, a slight advantage would be that other oddities could be removed maybe. By now it seems to me that nobody really needs the extra information (i.e. preprocessing information of the indexing tuple)...? I thought it could be useful to know things about the result, but I suppose you can check most things (view vs. no view; field access; scalar access) also afterwards?

...

Side note: I would prefer the more verbose "legacy_index" to "lindex". We really want to discourage this one, and two new abbreviations are bad enough.

Sounds good to me.

...

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Sebastian Berg

6 Sep 6 Sep

12:19 p.m.

On Mo, 2016-09-05 at 18:19 -0500, Nathan Goldbaum wrote:

...

...
Hi Sebastian,

It would seem to me that any subclass has to keep up to date with new features in ndarray, and while I think ndarray has a responsibility not to break backward compatibility, I do not think it has to protect against new features possibly not working as expected in subclasses. In particular, I think it is overly complicated (and an unnecessary maintenance burden) to error out if a subclass has `__getitem__` overwritten, but not `oindex`.

For somewhat similar reasons, I'm not too keen on a new `__numpy_getitem__` method; I realise it might reduce complexity for some ndarray subclasses eventually, but it also is an additional maintenance burden. If you really think it is useful, I think it might be more helpful to define a new mixin class which provides a version of all indexing methods that just call `__numpy_getitem__` if that is provided by the class that uses the mixin. I would *not* put it in `ndarray` proper. I disagree that multiple inheritance (i.e. with your proposed mixin and ndarray) is something that numpy should enshrine in its API for subclasses. As the maintainer of an ndarray subclass, I'd much rather

On Monday, September 5, 2016, Marten van Kerkwijk wrote: prefer just to implement a new duner method that older numpy versions will ignore.

Hmm, OK, so that would be a + for the method solution even without the need of any of the extra capabilities that may be possible.

...

...
Indeed, the above might even be handier for subclasses, since they can choose, if they wish, to implement a similar mixin for older numpy versions, so that all the numpy version stuff can be moved to a single location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)

Overall, my sense would be to keep your PR to just implementing the various new index methods (which are great -- I still don't really like the names, but sadly don't have better suggestions...).

But it might be good if others pipe in here too, in particular those maintaining ndarray subclasses!

All the best,

Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Sebastian Berg

12:18 p.m.

On Mo, 2016-09-05 at 18:24 -0400, Marten van Kerkwijk wrote:

...

Hi Sebastian,

It would seem to me that any subclass has to keep up to date with new features in ndarray, and while I think ndarray has a responsibility not to break backward compatibility, I do not think it has to protect against new features possibly not working as expected in subclasses. In particular, I think it is overly complicated (and an unnecessary maintenance burden) to error out if a subclass has `__getitem__` overwritten, but not `oindex`.

It is not complicated implementation wise to check for `__getitem__` existence. However, I start to agree that a warning icould be the better option. It might work after all.

...

For somewhat similar reasons, I'm not too keen on a new `__numpy_getitem__` method; I realise it might reduce complexity for some ndarray subclasses eventually, but it also is an additional maintenance burden. If you really think it is useful, I think it might be more helpful to define a new mixin class which provides a version of all indexing methods that just call `__numpy_getitem__` if that is provided by the class that uses the mixin. I would *not* put it in `ndarray` proper.

Yes, that is maybe a simplier option (in the sense of maintainability), the other would have a bit of extra information available. If this extra information is unnecessary, a MixIn is probably a bit simpler.

...

Indeed, the above might even be handier for subclasses, since they can choose, if they wish, to implement a similar mixin for older numpy versions, so that all the numpy version stuff can be moved to a single location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)

You can always implement a mixin for older verions if you do all the logic yourself, but I would prefer not to duplicate that logic (Jaime wrote a python class that does it for normal arrays -- not sure if its 100% the same as I did, but you could probably use it in a subclass). So that a numpy provided mixin would not help with that supporting it in old numpy versions, I think.

...

Overall, my sense would be to keep your PR to just implementing the various new index methods (which are great -- I still don't really like the names, but sadly don't have better suggestions...).

Well... The thing is that we have to fix the subclasses within numpy as well (most importantly MaskedArrays). Of course you could delay things a bit, but in the end whatever we use internally could likely also be whatever subclasses might end up using.

...

But it might be good if others pipe in here too, in particular those maintaining ndarray subclasses!

Yeah :).

...

All the best,

Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

2777

Age (days ago)

2785

Last active (days ago)

List overview

Download

29 comments

6 participants

participants (6)

CJ Carey
Marten van Kerkwijk
Nathan Goldbaum
Robert Kern
Sebastian Berg
Stephan Hoyer

New Indexing Methods Revival #N (subclasses!)

Marten van Kerkwijk

Marten van Kerkwijk

Marten van Kerkwijk

Marten van Kerkwijk

Marten van Kerkwijk

Marten van Kerkwijk

Marten van Kerkwijk

Marten van Kerkwijk

Marten van Kerkwijk

tags

participants (6)