Numpy helper function for __getitem__?
Folks,
My search engine was not able to help me on this one, possibly because I don't know exactly *what* I am looking for.
I need to override __getitem__ for a class that wraps a numpy array. I know the dimensions of my array (which can vary from instance to instance), and I know what I want to do: for one preselected dimension, I need to select a different slice than the one requested by the user, do something with the data, and return the variable.
I am looking for a function that helps me to "clean" the input of __getitem__. There are so many possible cases, when the user uses [:] or [..., 1:2] or [0, ..., :] and so forth. But all these cases have an equivalent index array of len(ndimensions) with only valid slice() objects in it. This array would be much easier for me to work with.
In pseudo code:
def __getitem__(self, item):
    # clean input
    item = np.clean_item(item, ndimensions=4)
    # Ok now item is guaranteed to be of len 4
    item[2] = slice()
    # Continue
    etc.
Is there such a function in numpy?
I hope I have been clear enough... Thanks a lot!
Fabien
I don't think NumPy has a function like this (at least, not exposed to Python), but I wrote one for xray, "expanded_indexer", that you are welcome to borrow: https://github.com/xray/xray/blob/v0.6.0/xray/core/indexing.py#L10
Stephan
On Sunday, Aug 23, 2015 at 7:54 PM, Fabien fabien.maussion@gmail.com wrote:
Is there such a function in numpy?
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
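For reference, the kind of helper asked for above could be sketched roughly as follows. This is neither xray's expanded_indexer nor a NumPy function: expand_indexer is a hypothetical name, and the sketch assumes the key holds only integers, slices and at most one Ellipsis (no None/np.newaxis, no array indexers).

```python
def expand_indexer(key, ndim):
    """Return a tuple of length ``ndim`` that is equivalent to ``key``.

    Assumption: ``key`` contains only integers, slices and at most one
    Ellipsis. None/np.newaxis and array indexers are out of scope here.
    """
    if not isinstance(key, tuple):
        key = (key,)
    if sum(k is Ellipsis for k in key) > 1:
        raise IndexError("only a single ellipsis ('...') is supported")
    # Every entry except the Ellipsis addresses exactly one dimension.
    n_given = sum(k is not Ellipsis for k in key)
    if n_given > ndim:
        raise IndexError("too many indices")
    out = []
    for k in key:
        if k is Ellipsis:
            # The Ellipsis stands for all dimensions not addressed otherwise.
            out.extend([slice(None)] * (ndim - n_given))
        else:
            out.append(k)
    # Without an Ellipsis, the missing trailing dimensions are taken in full.
    out.extend([slice(None)] * (ndim - len(out)))
    return tuple(out)
```

With such a helper, the pseudo code from the question would become item = list(expand_indexer(item, 4)) followed by item[2] = slice(None). Note that slice() without arguments raises a TypeError; slice(None) is the "take everything" slice.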
On Sun, 2015-08-23 at 11:08 -0700, Stephan Hoyer wrote:
I don't think NumPy has a function like this (at least, not exposed to Python), but I wrote one for xray, "expanded_indexer", that you are welcome to borrow: https://github.com/xray/xray/blob/v0.6.0/xray/core/indexing.py#L10
Yeah, we have no such functionality. We do have a function which does all of this in C, but it is somewhat more complex and not exposed in any case. That function seems nice, though on its own it is not complete? It does not seem to handle `np.newaxis`/`None` or boolean indexing arrays well. One other thing, which is not really important: we are deprecating the use of multiple ellipses.
Fabien, just to make sure you are aware. If you are overriding `__getitem__`, you should also implement `__setitem__`. NumPy does some magic if you do not. That will seem to make `__setitem__` work fine, but breaks down if you have advanced indexing involved (or if you return copies, though it spits warnings in that case).
 Sebastian
On 08/24/2015 10:23 AM, Sebastian Berg wrote:
Fabien, just to make sure you are aware. If you are overriding `__getitem__`, you should also implement `__setitem__`. NumPy does some magic if you do not. That will seem to make `__setitem__` work fine, but breaks down if you have advanced indexing involved (or if you return copies, though it spits warnings in that case).
Hi Sebastian,
Thanks for the info. I am writing a duck-typed NetCDF4 Variable object, so I am not trying to override NumPy arrays.
I think that Stephan's function for xray is very useful. A possible improvement (probably at a certain performance cost) would be to accept a shape instead of a number of dimensions. The output would then be slices with valid start and stop values.
Current behavior:
In [9]: expanded_indexer(slice(None), 2)
Out[9]: (slice(None, None, None), slice(None, None, None))
With shape:
In [9]: expanded_indexer(slice(None), (3, 4))
Out[9]: (slice(0, 3, 1), slice(0, 4, 1))
But if nobody needed something like this before me, I think that I might have a design problem in my code (I am still quite new to Python).
Cheers and thanks,
Fabien
Indeed, the helper function I wrote for xray was not designed to handle None/np.newaxis or non-1d Boolean indexers, because those are not valid indexers for xray objects. I think it could be straightforwardly extended to handle None simply by not counting them towards the total number of dimensions.
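The extension described here might look like the sketch below. It is an illustration under assumptions, not code from xray: the name expand_indexer_newaxis is made up, and only integers, slices, None and a single Ellipsis are considered.

```python
def expand_indexer_newaxis(key, ndim):
    """Expand ``key`` so that all ``ndim`` dimensions are addressed explicitly.

    None (np.newaxis) entries stay where they are and are not counted
    towards ``ndim``, so the result is longer than ``ndim`` if any are present.
    """
    if not isinstance(key, tuple):
        key = (key,)
    has_ellipsis = any(k is Ellipsis for k in key)
    if sum(k is Ellipsis for k in key) > 1:
        raise IndexError("only a single ellipsis ('...') is supported")
    # Only entries that are neither None nor Ellipsis consume a dimension.
    n_consumed = sum(k is not None and k is not Ellipsis for k in key)
    if n_consumed > ndim:
        raise IndexError("too many indices")
    fill = [slice(None)] * (ndim - n_consumed)
    out = []
    for k in key:
        if k is Ellipsis:
            out.extend(fill)  # the Ellipsis covers all remaining dimensions
        else:
            out.append(k)     # integers, slices and None pass through
    if not has_ellipsis:
        out.extend(fill)      # missing trailing dimensions are taken in full
    return tuple(out)
```

For a 3-d array, the key (None, 0) would expand to (None, 0, slice(None), slice(None)), which selects the same data as the short form.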
On Tue, Aug 25, 2015 at 8:41 AM, Fabien fabien.maussion@gmail.com wrote:
I think that Stephan's function for xray is very useful. A possible improvement (probably at a certain performance cost) would be to be able to provide a shape instead of a number of dimensions. The output would then be slices with valid start and ends.
Glad you found it helpful!
Python's slice object has the indices method which implements this logic, e.g.,
In [15]: s = slice(None, 10)
In [16]: s.indices(100)
Out[16]: (0, 10, 1)
Cheers, Stephan
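Putting slice.indices to work for the shape-aware variant suggested earlier in the thread, a minimal sketch could look like this. The name concretize_slices is hypothetical, and the key is assumed to be already expanded to exactly one integer or slice per dimension.

```python
def concretize_slices(key, shape):
    """Give every slice in ``key`` explicit start, stop and step values.

    Assumption: ``key`` already holds one entry (int or slice) per dimension.
    """
    if len(key) != len(shape):
        raise ValueError("key needs exactly one entry per dimension")
    return tuple(
        # slice.indices(n) resolves the slice against a dimension of length n
        slice(*k.indices(n)) if isinstance(k, slice) else k
        for k, n in zip(key, shape)
    )
```

Here concretize_slices((slice(None), slice(None)), (3, 4)) returns (slice(0, 3, 1), slice(0, 4, 1)). One caveat: for negative steps, indices() can return a stop of -1 (slice(None, None, -1).indices(3) is (2, -1, -1)), and a slice rebuilt from those numbers would be read as "up to the last element", so a real implementation would need to special-case that.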
Biggus also has such a function: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L2878. It handles newaxis outside of that function, in: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L537.
Again, it only aims to deal with orthogonal array indexing, not numpy fancy indexing.
I'd be surprised if Dask.array didn't have a similar function too.
HTH
On 26 August 2015 at 18:59, Stephan Hoyer shoyer@gmail.com wrote:
Indeed, the helper function I wrote for xray was not designed to handle None/np.newaxis or non-1d Boolean indexers, because those are not valid indexers for xray objects. I think it could be straightforwardly extended to handle None simply by not counting them towards the total number of dimensions.
On Sat, Aug 29, 2015 at 12:55 AM, Phil Elson pelson.pub@gmail.com wrote:
Biggus also has such a function: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L2878. It handles newaxis outside of that function, in: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L537.
Again, it only aims to deal with orthogonal array indexing, not numpy fancy indexing.
I'd be surprised if Dask.array didn't have a similar function too.
This all indicates to me that this would be a great thing to have as a standalone project, or as a utility shipped with numpy.
It's been said that you really don't want to subclass ndarray, and should rather wrap and delegate (or duck-type). Maybe this is a good time to provide utilities that make it easier to do so.
Chris

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception
Chris.Barker@noaa.gov
participants (5): Chris Barker, Fabien, Phil Elson, Sebastian Berg, Stephan Hoyer