np.nonzero behavior with multidimensional arrays
This was raised on Stack Overflow today:
http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-onl...
np.nonzero (and np.where for boolean arrays) behaves differently for 1-D and higher-dimensional arrays.
In the first case, a tuple with a single well-behaved base ndarray is returned:
>>> import numpy as np
>>> a = np.ma.array(range(6))
>>> np.where(a > 3)
(array([4, 5]),)
>>> np.where(a > 3)[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
In the second case, a tuple with as many arrays as the passed array has dimensions is returned, but the arrays are not base ndarrays; they are of the same subtype as the array passed to the function. These arrays are also set as non-writeable:
>>> np.where(a.reshape(2, 3) > 3)
(masked_array(data = [1 1],
             mask = False,
       fill_value = 999999)
, masked_array(data = [1 2],
             mask = False,
       fill_value = 999999)
)
>>> np.where(a.reshape(2, 3) > 3)[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
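The comparison above can be reproduced with a short script. This is only a sketch: the helper name describe_indices is made up for illustration, and what it prints depends on the installed NumPy version, so no expected output is given here.

```python
import numpy as np


def describe_indices(arr):
    """Return (type name, WRITEABLE, OWNDATA) for each index array from np.where.

    describe_indices is a hypothetical helper, not part of NumPy.
    """
    result = np.where(arr > 3)
    return [
        (type(ix).__name__, ix.flags.writeable, ix.flags.owndata)
        for ix in result
    ]


a = np.ma.array(range(6))

# Compare the 1-D case with the 2-D case reported in this thread.
for label, arr in (("1-D", a), ("2-D", a.reshape(2, 3))):
    for axis, info in enumerate(describe_indices(arr)):
        print(label, "axis", axis, info)
```

If the type names or flags differ between the two cases, the installed version still shows the inconsistency described here.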
I can't think of any reason that justifies this difference, and I believe the two cases should return similar results. My feeling is that the 1-D behavior is the proper one, and that the behavior for multidimensional arrays should match it.
Can anyone think of a reason that justifies the current behavior?
Jaime
--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.
On 23.02.2015 08:52, Jaime Fernández del Río wrote:
> This was raised in SO today:
> http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-onl...
> np.nonzero (and np.where for boolean arrays) behave differently for 1-D and higher dimensional arrays:
> In the first case, a tuple with a single well-behaved base ndarray is returned:
> In the second, a tuple with as many arrays as dimensions in the passed array is returned, but the arrays are not base ndarrays, but of the same subtype as was passed to the function. These arrays are also set as non-writeable:
The non-writeable flag looks like a bug to me; it should probably just use PyArray_FLAGS(self) instead of 0. We had a similar one with the new indexing; it's easy to forget this.
Concerning subtypes, I don't think there is a good reason to preserve them here, and it should just return an ndarray. np.where with one argument returns a new object that indexes the input object, so it is no longer really related to what it indexes, and there is no information that numpy could reasonably propagate.
(np.where with three arguments makes sense with subtypes, and fixing that is on my todo list.)
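Until the library itself changes, the result suggested above (writeable base ndarrays that can still be used to index the input) can be approximated on the caller's side. A minimal sketch, assuming a hypothetical wrapper named nonzero_as_ndarrays; it relies only on np.array copying its input and dropping the subclass by default:

```python
import numpy as np


def nonzero_as_ndarrays(x):
    """Return np.nonzero(x) as a tuple of writeable base ndarrays.

    nonzero_as_ndarrays is a hypothetical helper, not a NumPy function.
    np.array() copies its input and returns a base ndarray (subok=False),
    so each result owns its data and is writeable.
    """
    return tuple(np.array(ix) for ix in np.nonzero(x))


m = np.ma.array(range(6)).reshape(2, 3)
idx = nonzero_as_ndarrays(m > 3)

# The index tuple still selects the matching elements of the input.
print(m[idx])
```

The copy costs one extra allocation per dimension, which is usually small next to the array being indexed.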
On Mon, Feb 23, 2015 at 12:12 PM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
> The non-writeable looks like a bug to me, it should probably just use PyArray_FLAGS(self) instead of 0. We had a similar one with the new indexing, it's easy to forget this.
> Concerning subtypes, I don't think there is a good reason to preserve them here and it should just return an ndarray. where with one argument returns a new object that indexes the input object so it is not really related anymore to what it indexes and there is no information that numpy could reasonably propagate.
That was my thinking when I sent that message last night: add the PyArray_FLAGS argument, and pass the type of the return array rather than that of the input array when creating the views.
I tried to put that in a PR, but it fails a number of tests, because the return of np.nonzero is specifically checked to be of the subtype of the passed-in array, both in matrixlib and in core/test_regression.py, related to Trac #791:
https://github.com/numpy/numpy/issues/1389
So it seems that 7 years ago they had a different view on this. Perhaps Chuck remembers what the rationale was, but this seems like a weird requirement for index-returning functions: nonzero, argmin/max, argsort, argpartition and the like.
Jaime
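To see what the index-returning functions listed above do with a subclass on a given installation, one can inspect the result types directly. This is a sketch under stated assumptions: Tagged is a made-up do-nothing subclass and index_result_types is a hypothetical helper. The printed types vary between NumPy versions, so none are claimed here.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical do-nothing ndarray subclass, used only for this check."""


def index_result_types(x):
    """Map each index-returning function name to the type name of its result."""
    results = {
        "nonzero": np.nonzero(x)[0],
        "argmax": np.argmax(x, axis=0),
        "argsort": np.argsort(x),
        "argpartition": np.argpartition(x, 1),
    }
    return {name: type(value).__name__ for name, value in results.items()}


x = np.array([[3, 1, 2], [0, 5, 4]]).view(Tagged)

for name, type_name in index_result_types(x).items():
    print(name, type_name)
```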
On Mon, Feb 23, 2015 at 2:29 PM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
> That was my thinking when I sent that message last night: add the PyArray_FLAGS argument, and pass the type of the return array rather than the input array when creating the views.
> I tried to put that in a PR, but it fails a number of tests, as the return of np.nonzero is specifically checked to return the subtype of the passed in array, both in matrixlib, as well as in core/test_regression.py, related to Trac #791:
> https://github.com/numpy/numpy/issues/1389
> So it seems that 7 years ago they had a different view on this, perhaps Chuck remembers what the rationale was, but this seems like a weird requirement for index returning functions: nonzero, argmin/max, argsort, argpartition and the like.
That would be, what, 2008? That was a long time ago, back around 1.1, and before I was much involved. I don't know what the rationale was at that time, but it may have been inherited from Numeric or Numarray, or it may just have seemed like the right thing to do.
Chuck
participants (3)
- Charles R Harris
- Jaime Fernández del Río
- Julian Taylor