Mailman 3 Bug in np.nonzero / Should index returning functions return ndarray subclasses? - NumPy-Discussion


Bug in np.nonzero / Should index returning functions return ndarray subclasses?


Jaime Fernández del Río

9 May 2015

5:48 p.m.

There is a reported bug (issue #5837 https://github.com/numpy/numpy/issues/5837) regarding different returns from np.nonzero with 1-D vs higher dimensional arrays. A full summary of the differences can be seen from the following output:

>>> class C(np.ndarray): pass
...
>>> a = np.arange(6).view(C)
>>> b = np.arange(6).reshape(2, 3).view(C)
>>> anz = a.nonzero()
>>> bnz = b.nonzero()

>>> type(anz[0])
>>> anz[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> anz[0].base

>>> type(bnz[0])
>>> bnz[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> bnz[0].base
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

The original bug report was only concerned with the non-writeability of higher dimensional array returns, but there are more differences: 1-D always returns an ndarray that owns its memory and is writeable, but higher dimensional arrays return views, of the type of the original array, that are non-writeable.

I have a branch that attempts to fix this by making both 1-D and n-D arrays:

1. return a view, never the base array,
2. return an ndarray, never a subclass, and
3. return a writeable view.

I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions.

Since we are changing the returns of a few other functions in 1.10 (diagonal, diag, ravel), it may be a good moment to revisit the behavior for these other functions. Any thoughts?

Jaime

--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.
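For readers who want to try this themselves, the three goals listed above (plain ndarray, writeable, independent of the input's subclass) can be checked, or enforced defensively, from user code. This is only a sketch and not the code in the branch: `describe` and `plain_nonzero` are hypothetical helper names, and the flags that `nonzero()` itself reports may differ between NumPy versions.

```python
import numpy as np


class C(np.ndarray):
    """Trivial ndarray subclass, as in the session above."""


def describe(arr):
    """Summarize the properties discussed in this thread for one array."""
    return {
        "type": type(arr).__name__,
        "writeable": bool(arr.flags.writeable),
        "owndata": bool(arr.flags.owndata),
    }


def plain_nonzero(a):
    """Hypothetical helper: nonzero indices as plain, writeable ndarrays.

    np.array(..., subok=False, copy=True) drops any subclass and always
    allocates fresh memory, so the result does not depend on how the
    installed NumPy builds the arrays returned by nonzero().
    """
    return tuple(np.array(ix, subok=False, copy=True) for ix in np.nonzero(a))


a = np.arange(6).view(C)
b = np.arange(6).reshape(2, 3).view(C)

for name, arr in (("1-D", a), ("2-D", b)):
    for ix in plain_nonzero(arr):
        print(name, describe(ix), ix)
```

The explicit copy costs one extra allocation per index array, which is the price of not relying on version-specific behavior.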


Nathaniel Smith

9 May

6:42 p.m.


On May 9, 2015 10:48 AM, "Jaime Fernández del Río" wrote:


> I have a branch that attempts to fix this by making both 1-D and n-D arrays:
>
> 1. return a view, never the base array,

This doesn't matter, does it? "View" isn't a thing, only "view of" is meaningful. And in this case, none of the returned arrays share any memory with any other arrays that the user has access to... so whether they were created as a view or not should be an implementation detail that's transparent to the user?

> 2. return an ndarray, never a subclass, and
> 3. return a writeable view.
>
> I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions.

I also can't see any logical reason why the return type of these functions has anything to do with the type of the inputs. You can index me with my phone number but my phone number is not a person. OTOH logic and ndarray subclassing don't have much to do with each other; the practical effect is probably more important. Looking at the subclasses I know about (masked arrays, np.matrix, and astropy quantities), though, I also can't see much benefit in copying the subclass of the input, and the fact that we were never consistent about this suggests that people probably aren't depending on it too much.

So in summary my feeling is: +1 to making them writable, no objection to the view thing (though I don't see how it matters), and provisional +1 to consistently returning ndarray (to be revised if the people who use the subclassing functionality disagree).

-n
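The two points made in this reply can be illustrated with a short sketch: the index arrays do not alias the input, and the subclass of a selection comes from the array being indexed rather than from the indices. The class `C` and the variable names are illustrative only; `np.shares_memory` is the standard NumPy overlap check.

```python
import numpy as np


class C(np.ndarray):
    """Trivial ndarray subclass used only for illustration."""


b = np.arange(6).reshape(2, 3).view(C)
rows, cols = np.nonzero(b)

# The index arrays are freshly computed; they do not alias the input.
aliases_input = any(np.shares_memory(b, ix) for ix in (rows, cols))
print("index arrays share memory with input:", aliases_input)

# The indices are just integers; the subclass comes from the indexed array.
selected = b[rows, cols]
print(type(selected).__name__, np.asarray(selected))
```

Whether `rows` and `cols` are internally views of one shared buffer is not checked here, since that is exactly the implementation detail the reply argues should not matter to users.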

Benjamin Root

7:53 p.m.


Absolutely, it should be writable. As for subclassing, that might be messy. Consider the following:

    inds = np.where(data > 5)

In that case, I'd expect a normal, bog-standard ndarray because that is what you use for indexing (although pandas might have a good argument for having it return one of their special indexing types if "data" was a pandas array...).

Next:

    foobar = np.where(data > 5, 1, 2)

Again, I'd expect a normal, bog-standard ndarray because the scalar elements are very simple. This question gets very complicated when considering array arguments. Consider:

    merged_data = np.where(data > 5, data, data2)

So, what should "merged_data" be? If both "data" and "data2" are the same types, then it would be reasonable to return the same type, if possible. But what if they aren't the same? Maybe use array_priority to determine the return type? Or, perhaps it does make sense to say "sod it all" and always return an ndarray?

I don't know the answer.

I do find it interesting that the result from a multi-dimensional array is not writable. I don't know why I have never encountered that.

Ben Root
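The three call forms of `np.where` mentioned in this message behave quite differently, which a small runnable sketch makes concrete. The sample values for `data` and `data2` are made up for illustration; only plain ndarrays are used, so the subclass question raised above is deliberately left open.

```python
import numpy as np

data = np.array([3, 7, 5, 9])
data2 = np.array([10, 20, 30, 40])
mask = data > 5

# One-argument form: same result as np.nonzero(mask), a tuple of index arrays.
inds = np.where(mask)
same_as_nonzero = all(
    np.array_equal(w, n) for w, n in zip(inds, np.nonzero(mask))
)

# Three-argument form with scalars: element-wise choice between 1 and 2.
foobar = np.where(mask, 1, 2)

# Three-argument form with arrays: take from data where mask holds, else data2.
merged_data = np.where(mask, data, data2)

print(inds, same_as_nonzero)
print(foobar)
print(merged_data)
```

Only the one-argument form is an index-returning function in the sense of this thread; the three-argument form builds a new data array.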

Nathaniel Smith

8:03 p.m.


On May 9, 2015 12:54 PM, "Benjamin Root" wrote:

> Absolutely, it should be writable. As for subclassing, that might be messy. Consider the following:
>
>     inds = np.where(data > 5)
>
> In that case, I'd expect a normal, bog-standard ndarray because that is what you use for indexing (although pandas might have a good argument for having it return one of their special indexing types if "data" was a pandas array...).

Pandas doesn't subclass ndarray (anymore), so they're irrelevant to this particular discussion :-). Of course they're an argument for having a cleaner more general way of allowing non-ndarray array-like objects, but the legacy subclassing system will never be that.

> Next:
>
>     foobar = np.where(data > 5, 1, 2)
>
> Again, I'd expect a normal, bog-standard ndarray because the scalar elements are very simple. This question gets very complicated when considering array arguments. Consider:
>
>     merged_data = np.where(data > 5, data, data2)
>
> So, what should "merged_data" be? If both "data" and "data2" are the same types, then it would be reasonable to return the same type, if possible. But what if they aren't the same? Maybe use array_priority to determine the return type? Or, perhaps it does make sense to say "sod it all" and always return an ndarray?

Not sure what this has to do with Jaime's post about nonzero? There is indeed a potential question about what 3-argument where() should do with subclasses, but that's effectively a different operation entirely and to discuss it we'd need to know things like what it historically has done and why that was causing problems.

-n

Benjamin Root

8:27 p.m.


On Sat, May 9, 2015 at 4:03 PM, Nathaniel Smith wrote:

> Not sure what this has to do with Jaime's post about nonzero? There is indeed a potential question about what 3-argument where() should do with subclasses, but that's effectively a different operation entirely and to discuss it we'd need to know things like what it historically has done and why that was causing problems.

Because my train of thought started at np.nonzero(), which I have always just mentally mapped to np.where(), and then... squirrel!

Indeed, np.where() has no bearing here.

Ben Root

Nathaniel Smith

8:56 p.m.


On Sat, May 9, 2015 at 1:27 PM, Benjamin Root wrote:

> Because my train of thought started at np.nonzero(), which I have always just mentally mapped to np.where(), and then... squirrel!
>
> Indeed, np.where() has no bearing here.

Ah, gotcha :-). There is an argument that we should try to reduce this confusion by nudging people to use np.nonzero() consistently instead of np.where(), via the documentation and/or a warning message...

--
Nathaniel J. Smith -- http://vorpus.org

Stephan Hoyer

10 May

1:53 a.m.


With regard to np.where -- shouldn't where be a ufunc, so subclasses or other array-likes can control its behavior with __numpy_ufunc__?

As for the other indexing functions, I don't have a strong opinion about how they should handle subclasses. But it is certainly tricky to attempt to handle arbitrary subclasses. I would agree that the least error-prone thing to do is usually to return base ndarrays. Better to force subclasses to override methods explicitly.
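As one possible reading of "override methods explicitly", a subclass that really wants subclass-typed indices could re-wrap them itself. The sketch below is an assumption-laden illustration: `Tagged` is a hypothetical class, and nothing here reflects what any real subclass (masked arrays, np.matrix, astropy quantities) actually does.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical subclass that opts in to subclass-typed indices."""

    def nonzero(self):
        # Compute indices on the base-class view, then re-wrap explicitly.
        plain = np.asarray(self).nonzero()
        return tuple(ix.view(type(self)) for ix in plain)


t = np.arange(6).reshape(2, 3).view(Tagged)

own = t.nonzero()               # explicit override: Tagged instances
base = np.asarray(t).nonzero()  # base-class behavior: plain ndarrays

print([type(ix).__name__ for ix in own])
print([type(ix).__name__ for ix in base])
```

With this arrangement the base library never has to guess what a subclass wants; the subclass author states it in one place.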

Marten van Kerkwijk

12 May

3:49 p.m.


Agreed that indexing functions should return bare `ndarray`. Note that in Jaime's PR one can override it anyway by defining __nonzero__.

-- Marten
