[Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

Nathaniel Smith njs at pobox.com
Sat May 9 14:42:50 EDT 2015

On May 9, 2015 10:48 AM, "Jaime Fernández del Río" <jaime.frio at gmail.com> wrote:
> There is a reported bug (issue #5837) regarding different returns from
> np.nonzero with 1-D vs higher dimensional arrays. A full summary of the
> differences can be seen from the following output:
> >>> class C(np.ndarray): pass
> ...
> >>> a = np.arange(6).view(C)
> >>> b = np.arange(6).reshape(2, 3).view(C)
> >>> anz = a.nonzero()
> >>> bnz = b.nonzero()
> >>> type(anz[0])
> <type 'numpy.ndarray'>
> >>> anz[0].flags
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
> >>> anz[0].base
> >>> type(bnz[0])
> <class '__main__.C'>
> >>> bnz[0].flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : False
>   OWNDATA : False
>   WRITEABLE : False
>   ALIGNED : True
> >>> bnz[0].base
> array([[0, 1],
>        [0, 2],
>        [1, 0],
>        [1, 1],
>        [1, 2]])
> The original bug report was only concerned with the non-writeability of
> higher dimensional array returns, but there are more differences: 1-D
> always returns an ndarray that owns its memory and is writeable, but higher
> dimensional arrays return views, of the type of the original array, that
> are non-writeable.
> I have a branch that attempts to fix this by making both 1-D and n-D:
> 1. return a view, never the base array,

This doesn't matter, does it? "View" isn't a thing, only "view of" is
meaningful. And in this case, none of the returned arrays share any memory
with any other arrays that the user has access to... so whether they were
created as a view or not should be an implementation detail that's
transparent to the user?
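
To make the memory point concrete, here is a minimal sketch. It assumes a NumPy recent enough to provide np.shares_memory (np.may_share_memory is the older, more conservative check). The subclass C is the one from the quoted session; the helper name shares_with_input is made up for illustration.

```python
import numpy as np


class C(np.ndarray):
    """Trivial ndarray subclass, as in the quoted session."""


def shares_with_input(arr):
    """Return one bool per index array from arr.nonzero().

    True would mean that index array overlaps the memory of arr itself.
    """
    return [np.shares_memory(arr, idx) for idx in arr.nonzero()]


a = np.arange(6).view(C)
b = np.arange(6).reshape(2, 3).view(C)

print(shares_with_input(a))
print(shares_with_input(b))
```

Whether the index arrays were built as views of some private buffer or not, none of them overlaps the input, which is the only array the caller already holds.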

> 2. return an ndarray, never a subclass, and
> 3. return a writeable view.
> I guess the most controversial choice is #2, and in fact making that
> change breaks a few tests. I nevertheless think that all of the index
> returning functions (nonzero, argsort, argmin, argmax, argpartition) should
> always return a bare ndarray, not a subclass. I'd be happy to be corrected,
> but I can't think of any situation in which preserving the subclass would
> be needed for these functions.

I also can't see any logical reason why the return type of these functions
has anything to do with the type of the inputs. You can index me with my
phone number but my phone number is not a person. OTOH logic and ndarray
subclassing don't have much to do with each other; the practical effect is
probably more important. Looking at the subclasses I know about (masked
arrays, np.matrix, and astropy quantities), though, I also can't see much
benefit in copying the subclass of the input, and the fact that we were
never consistent about this suggests that people probably aren't depending
on it too much.
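
For anyone who wants to see how consistent their installed NumPy is, a rough sketch follows. It feeds a 2-D instance of the trivial subclass C from the quoted session to each of the index returning functions named above and records the type of what comes back. The helper name index_result_types is hypothetical, and the output depends on the NumPy version, so none is shown.

```python
import numpy as np


class C(np.ndarray):
    """Trivial ndarray subclass, as in the quoted session."""


def index_result_types(arr):
    """Map each index returning function to the type name(s) of its result.

    arr is assumed to be 2-D with at least two rows, so that axis=0
    reductions return arrays and kth=1 is valid for argpartition.
    """
    return {
        "nonzero": [type(idx).__name__ for idx in np.nonzero(arr)],
        "argsort": type(np.argsort(arr, axis=0)).__name__,
        "argmin": type(np.argmin(arr, axis=0)).__name__,
        "argmax": type(np.argmax(arr, axis=0)).__name__,
        "argpartition": type(np.argpartition(arr, 1, axis=0)).__name__,
    }


b = np.arange(6).reshape(2, 3).view(C)
for name, result_type in index_result_types(b).items():
    print(name, result_type)
```

Swapping C for a masked array, np.matrix, or another subclass of interest would show whether the mix of return types differs there.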

So in summary my feeling is: +1 to making them writable, no objection to
the view thing (though I don't see how it matters), and provisional +1 to
consistently returning ndarray (to be revised if the people who use the
subclassing functionality disagree).
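
The properties being voted on can be spelled out as a small user-side sketch. The helper as_plain_index is hypothetical and not part of NumPy; it always copies, which is an assumption made for simplicity, not something the proposed branch is known to do.

```python
import numpy as np


class C(np.ndarray):
    """Trivial ndarray subclass, as in the quoted session."""


def as_plain_index(idx):
    """Return idx as a base-class ndarray that is writable and owns its data.

    subok=False drops any subclass; copy=True guarantees fresh, writable memory.
    """
    return np.array(idx, copy=True, subok=False)


b = np.arange(6).reshape(2, 3).view(C)
rows, cols = (as_plain_index(idx) for idx in b.nonzero())

print(type(rows).__name__, rows.flags.writeable, rows.flags.owndata)
```

The result still works as an index into the original array, whatever subclass that array happens to be.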

