Numpy helper function for __getitem__?
![](https://secure.gravatar.com/avatar/2ae98b91fa25b0220f7f804541996ce4.jpg?s=120&d=mm&r=g)
Folks,

My search engine was not able to help me on this one, possibly because I don't know exactly *what* I am looking for.

I need to override __getitem__ for a class that wraps a NumPy array. I know the dimensions of my array (which can vary from instance to instance), and I know what I want to do: for one preselected dimension, I need to select a different slice than the one requested by the user, do something with the data, and return the variable.

I am looking for a function that helps me to "clean" the input of __getitem__. There are so many possible cases: the user may use [:] or [..., 1:2] or [0, ..., :] and so forth. But all these cases have an equivalent index array of len(ndimensions) with only valid slice() objects in it. This array would be much easier for me to work with.

In pseudo code:

    def __getitem__(self, item):
        # clean input
        item = np.clean_item(item, ndimensions=4)
        # Ok now item is guaranteed to be of len 4
        item[2] = slice()
        # Continue
        etc.

Is there such a function in numpy?

I hope I have been clear enough... Thanks a lot!

Fabien
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
I don't think NumPy has a function like this (at least, not exposed to Python), but I wrote one for xray, "expanded_indexer", that you are welcome to borrow: https://github.com/xray/xray/blob/v0.6.0/xray/core/indexing.py#L10

Stephan

On Sunday, Aug 23, 2015 at 7:54 PM, Fabien <fabien.maussion@gmail.com> wrote:

> Is there such a function in numpy?
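For readers who want to see the idea without following the link, here is a minimal sketch of such a helper. It is not the xray implementation: the name `expand_key` and its error messages are made up for illustration, and it assumes the key contains only integers, slices and at most one Ellipsis.

```python
def expand_key(key, ndim):
    """Return a tuple of length ``ndim`` that is equivalent to ``key``.

    Hypothetical helper (not part of NumPy or xray): dimensions that the
    key does not mention are filled with full slices.
    """
    if not isinstance(key, tuple):
        key = (key,)
    # Compare by identity so that array-like entries are never compared with ==.
    n_ellipsis = sum(1 for k in key if k is Ellipsis)
    if n_ellipsis > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    n_given = len(key) - n_ellipsis
    if n_given > ndim:
        raise IndexError("too many indices for a %d-dimensional object" % ndim)
    # Full slices for every dimension that was not indexed explicitly.
    fill = (slice(None),) * (ndim - n_given)
    if n_ellipsis:
        pos = next(i for i, k in enumerate(key) if k is Ellipsis)
        return key[:pos] + fill + key[pos + 1:]
    return key + fill


# The three cases from the original question, for a 4-dimensional object.
print(expand_key(slice(None), 4))
print(expand_key((Ellipsis, slice(1, 2)), 4))
print(expand_key((0, Ellipsis, slice(None)), 4))
```

The result is a tuple, so a wrapper that wants to swap out one dimension would convert it to a list first (or rebuild the tuple) before replacing, say, the entry at position 2.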
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Sun, 2015-08-23 at 11:08 -0700, Stephan Hoyer wrote:
Yeah, we have no such functionality. We do have a function which does all of this in C, but it is somewhat more complex and not exposed in any case. That function seems nice, though on its own not complete? It does not seem to handle `np.newaxis`/`None` or boolean indexing arrays well. One other thing, which is not really important: we are deprecating the use of multiple ellipses.

Fabien, just to make sure you are aware: if you are overriding `__getitem__`, you should also implement `__setitem__`. NumPy does some magic if you do not. That will seem to make `__setitem__` work fine, but breaks down if you have advanced indexing involved (or if you return copies, though it spits warnings in that case).

- Sebastian
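To illustrate the `__getitem__`/`__setitem__` pairing, here is a minimal sketch for a plain wrapper class (not an ndarray subclass). The class name `Wrapped` and its `_data` attribute are hypothetical; the point is only that writes are forwarded to the wrapped array explicitly, because reading with an advanced index returns a copy.

```python
import numpy as np


class Wrapped:
    """Hypothetical duck-typed container that delegates indexing to an ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __getitem__(self, key):
        # Reads go straight to the wrapped array.
        return self._data[key]

    def __setitem__(self, key, value):
        # Writes are applied to the wrapped array itself, so they persist
        # even when ``key`` is an advanced (list or array) index.
        self._data[key] = value


w = Wrapped(np.arange(4))
w[[0, 2]] = -1          # advanced index, handled by __setitem__
print(w[:])             # the wrapped data was modified in place

tmp = w[[0, 2]]         # advanced indexing on read returns a copy
tmp[:] = 99             # so this does not touch the wrapped data
print(w[:])
```

Without the explicit `__setitem__`, an assignment such as `w[[0, 2]] = -1` on this plain class would raise a `TypeError`, and writing into the result of `w[[0, 2]]` would only change a temporary copy.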
![](https://secure.gravatar.com/avatar/2ae98b91fa25b0220f7f804541996ce4.jpg?s=120&d=mm&r=g)
On 08/24/2015 10:23 AM, Sebastian Berg wrote:
Hi Sebastian,

thanks for the info. I am writing a duck NetCDF4 Variable object, and therefore I am not trying to override NumPy arrays.

I think that Stephan's function for xray is very useful. A possible improvement (probably at a certain performance cost) would be to be able to provide a shape instead of a number of dimensions. The output would then be slices with valid starts and ends.

Current behavior:

    In[9]: expanded_indexer(slice(None), 2)
    Out[9]: (slice(None, None, None), slice(None, None, None))

With shape:

    In[9]: expanded_indexer(slice(None), (3, 4))
    Out[9]: (slice(0, 3, 1), slice(0, 4, 1))

But if nobody needed something like this before me, I think that I might have a design problem in my code (still quite new to Python).

Cheers and thanks,

Fabien
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
Indeed, the helper function I wrote for xray was not designed to handle None/np.newaxis or non-1d Boolean indexers, because those are not valid indexers for xray objects. I think it could be straightforwardly extended to handle None simply by not counting them towards the total number of dimensions.

On Tue, Aug 25, 2015 at 8:41 AM, Fabien <fabien.maussion@gmail.com> wrote:
Glad you found it helpful!

Python's slice object has the indices method which implements this logic, e.g.,

    In [15]: s = slice(None, 10)

    In [16]: s.indices(100)
    Out[16]: (0, 10, 1)

Cheers,

Stephan
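Putting the two suggestions in this message together (do not count None towards the number of dimensions, and use `slice.indices` to resolve each slice against a dimension size), a shape-aware helper could look roughly like the sketch below. The name `normalize_key` is hypothetical and this is not code from xray or NumPy. It assumes integers, slices, None and at most one Ellipsis, and it deliberately rejects negative steps, because rebuilding a slice from `indices()` is not a faithful round trip in that case (a stop of -1 would be read as "last element").

```python
def normalize_key(key, shape):
    """Expand ``key`` to cover every dimension of ``shape`` and resolve slices.

    Hypothetical helper. ``None`` entries are kept in place and do not count
    towards the number of dimensions; integers pass through unchanged; only
    slices with a positive step are resolved.
    """
    if not isinstance(key, tuple):
        key = (key,)
    if sum(1 for k in key if k is Ellipsis) > 1:
        raise IndexError("only a single ellipsis is supported")
    # None (np.newaxis) adds an axis, so it consumes no existing dimension.
    n_consumed = sum(1 for k in key if k is not None and k is not Ellipsis)
    if n_consumed > len(shape):
        raise IndexError("too many indices")
    fill = (slice(None),) * (len(shape) - n_consumed)
    for pos, k in enumerate(key):
        if k is Ellipsis:
            key = key[:pos] + fill + key[pos + 1:]
            break
    else:
        # No ellipsis: missing dimensions are appended at the end.
        key = key + fill

    sizes = iter(shape)
    out = []
    for k in key:
        if k is None:
            out.append(None)  # new axis: no dimension size to consume
            continue
        n = next(sizes)
        if isinstance(k, slice):
            start, stop, step = k.indices(n)
            if step < 0:
                raise ValueError("negative steps are not handled in this sketch")
            k = slice(start, stop, step)
        out.append(k)
    return tuple(out)


print(normalize_key(slice(None), (3, 4)))
print(normalize_key((None, Ellipsis, slice(None, 10)), (5, 100)))
```

For a negative step, working with the raw `(start, stop, step)` triple from `indices()` (for example through `range(*s.indices(n))`) is safer than turning it back into a slice.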
![](https://secure.gravatar.com/avatar/a3a8307ffcd780e908d042bcc97cd2f8.jpg?s=120&d=mm&r=g)
Biggus also has such a function: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L2878

It handles newaxis outside of that function in: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L537.

Again, it only aims to deal with orthogonal array indexing, not NumPy fancy indexing. I'd be surprised if Dask.array didn't have a similar function too.

HTH

On 26 August 2015 at 18:59, Stephan Hoyer <shoyer@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Sat, Aug 29, 2015 at 12:55 AM, Phil Elson <pelson.pub@gmail.com> wrote:
This all indicates to me that this would be a great thing to have as a standalone project, or a utility shipped with NumPy.

It's been said that you really don't want to subclass ndarray, and should rather wrap and delegate (or duck-type) -- maybe this is a good time to provide utilities to make it easier to do so.

-Chris

-----

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
participants (5)

- Chris Barker
- Fabien
- Phil Elson
- Sebastian Berg
- Stephan Hoyer