[Numpy-discussion] Allow __getitem__ to support custom objects
Sebastian Berg
sebastian at sipsolutions.net
Thu Oct 29 20:03:52 EDT 2020
On Tue, 2020-10-27 at 17:15 -0600, Aaron Meurer wrote:
> For ndindex (https://quansight.github.io/ndindex/), the biggest issue
> with the API is that to use an ndindex object to actually index an
> array, you have to use a[idx.raw] instead of a[idx]. This is because
> for NumPy arrays, you cannot allow custom objects to be indices. The
> exception is objects that define __index__, but this only works for
> integer indices. If __index__ returns anything other than an integer,
> you get an IndexError. This is annoying because it's easy to forget to
> do this when working with the ndindex API, and the error message from
> NumPy isn't informative about what went wrong unless you know to
> expect it.
>
> I'd like to propose an API that would allow custom objects to define
> how they should be converted to a standard NumPy index, similar to
> __index__ but that supports all index types. I think there are two
> options here:
>
> - Allow __index__ to return any index type, not just integers. This is
> the simplest because it reuses an existing API, and __index__ is the
> best possible name for this API. However, I'm not sure, but this may
> actually conflict with the text of PEP 357
> (https://www.python.org/dev/peps/pep-0357/). Also, some other APIs use
> __index__ to check if something is an indexable integer, which
> wouldn't accept a generic index. For example, elements of a slice can
> be any object that defines __index__.
>
`__index__` converts to an integer (safely). There is an assumption that
the integer is good for indexing, but I think the name shouldn't be taken
to mean it is specific to indexing (even if that was the main motivation).
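
To make the point about `__index__` concrete, here is a minimal sketch. The class `Three` is a hypothetical example, not part of NumPy or ndindex; the calls themselves (`operator.index`, `hex`, and NumPy integer and slice indexing) are standard.

```python
import operator
import numpy as np

class Three:
    """Hypothetical integer-like object that defines only __index__."""
    def __index__(self):
        return 3

a = np.arange(10)
three = Three()

as_int = operator.index(three)   # lossless conversion to a Python int
element = a[three]               # accepted by NumPy as an integer index
tail = a[three:]                 # slice bounds may define __index__ too
as_hex = hex(three)              # a use that has nothing to do with indexing
```

The last line is the one that matters for the naming argument: `hex()` relies on `__index__` even though no indexing takes place.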
> - Add a new __numpy_index__ API that works like
>
>     def __numpy_index__(self):
>         return <tuple, integer, slice, newaxis, ellipsis, or integer or
>                 boolean array>
>
> In NumPy, __getitem__ and __setitem__ on ndarray would first check if
> the input index type is one of the known types as it currently does,
> then it would try __index__, and if both of those fail, it would
> call __numpy_index__(index) and use that.
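
For readers following along, the lookup order described above can be emulated in pure Python. This is only a sketch under stated assumptions: `__numpy_index__` is the proposed hook and does not exist in NumPy, and `RawSlice` and `ProtoArray` are hypothetical names invented for the illustration. It covers the single-index `arr[index]` case only.

```python
import numpy as np

class RawSlice:
    """Hypothetical index object implementing the proposed hook."""
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def __numpy_index__(self):
        # Convert to an index type NumPy already understands.
        return slice(self.start, self.stop)

class ProtoArray(np.ndarray):
    """Hypothetical ndarray subclass emulating the proposed fallback."""
    def __getitem__(self, index):
        try:
            # Known index types and __index__ are handled by NumPy itself.
            return super().__getitem__(index)
        except (IndexError, TypeError):
            convert = getattr(type(index), "__numpy_index__", None)
            if convert is None:
                raise  # not a protocol object: keep NumPy's original error
            return super().__getitem__(convert(index))

a = np.arange(10).view(ProtoArray)
picked = a[RawSlice(2, 5)]
```

Catching the exception and retrying keeps the common path untouched, which is one possible way to avoid extra cost for ordinary indices; a real implementation in NumPy's C code might be structured quite differently.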
Do you anticipate just:

    arr[index]

or also:

    arr[index1, index2]

Would you expect pandas or array-like objects to support this as well?
If we only do `arr[index]` might subclassing tuple be sufficient? Do
you have any thought on how this might play out with a potential
`arr.oindex[...]`?
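
A minimal sketch of the tuple-subclass idea, assuming only the `arr[index]` form is needed. `TupleIndex` and its `describe` method are hypothetical names for illustration; NumPy treats an instance of a tuple subclass like a plain tuple when it is used as the whole index.

```python
import numpy as np

class TupleIndex(tuple):
    """Hypothetical index type; being a tuple, it needs no new protocol."""
    def describe(self):
        return f"index with {len(self)} entries"

a = np.arange(12).reshape(3, 4)
idx = TupleIndex((slice(0, 2), 1))

selected = a[idx]        # same result as a[0:2, 1]
label = idx.describe()   # extra behaviour lives on the subclass
```

This does not cover `arr[index1, index2]`, where each object is only one element of the index tuple rather than the whole index.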
Adding either to NumPy is probably fairly straightforward, although I
would prefer either not to slow down every single indexing operation for
an extremely niche use-case (which is likely possible) or to have timings
showing that the slowdown is insignificant.
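
One way such timings could be gathered is with the standard-library `timeit` module, run once before and once after a change. This is only a rough sketch: the helper `per_call_ns` and the chosen statements are assumptions made for illustration, not an agreed benchmark.

```python
import timeit
import numpy as np

a = np.arange(1000)

def per_call_ns(stmt, number=20_000, repeat=3):
    """Best-of-`repeat` time for one evaluation of `stmt`, in nanoseconds."""
    best = min(timeit.repeat(stmt, globals={"a": a}, number=number, repeat=repeat))
    return best / number * 1e9

# Common indexing paths that a new fallback should not slow down.
baseline_ns = per_call_ns("a[5]")
slice_ns = per_call_ns("a[2:7]")
print(f"integer index: {baseline_ns:.0f} ns, slice: {slice_ns:.0f} ns")
```

Comparing these numbers across two builds would show whether the extra check is measurable on the fast paths.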
What might help me is understanding `ndindex` itself better, since this
seems like asking to add a protocol that may very well be used by only
this one project.
>
> Note: there is a more general way that NumPy arrays could allow
> __getitem__ to be defined on custom objects, which I am NOT proposing.
> Instead of an API that returns one of the current predefined index
> types (tuple, integer, slice, newaxis, ellipsis, or integer or boolean
> array), there could instead be an API that takes the array as input
> and returns another array (or view) as an output. This would allow an
> object to define itself as an index in arbitrary ways, even if such an
> index would not actually be possible via traditional indexing. There
> are definitely some interesting ideas that could be done with this,
> but this idea would be much more complicated, and isn't something that
> I need. Unless the community feels that a more general API like this
> would be preferred, I would suggest deferring something like it to a
> later discussion.
>
> What would be the best way to go about getting something like this
> implemented? Is it simple enough that we can just work out the details
> here and on a pull request, or should I write a NEP?
A short NEP may make sense, at least if this is supposed to be a
generic protocol for general array-likes, which I guess it would have
to be ready for.
Cheers,
Sebastian
>
> Aaron Meurer