[Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

Benjamin Root ben.root at ou.edu
Sat May 9 15:53:31 EDT 2015


Absolutely, it should be writable. As for subclassing, that might be messy.
Consider the following:

inds = np.where(data > 5)

In that case, I'd expect a normal, bog-standard ndarray because that is
what you use for indexing (although pandas might have a good argument for
having it return one of their special indexing types if "data" was a pandas
array...). Next:
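A minimal sketch of the single-argument case. The sample `data` array is hypothetical. With only a condition, `np.where` behaves like `np.nonzero`: it returns a tuple holding one integer index array per dimension, and that tuple can index the original array directly.

```python
import numpy as np

# Hypothetical sample data; any 2-D array behaves the same way.
data = np.array([[1, 7, 3],
                 [9, 2, 6]])

# With a single argument, np.where is shorthand for np.nonzero:
# one integer index array per dimension, packed in a tuple.
inds = np.where(data > 5)
rows, cols = inds

# The tuple can be used directly to index the original array.
selected = data[inds]
print(rows, cols, selected)  # prints: [0 1 1] [1 0 2] [7 9 6]
```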

foobar = np.where(data > 5, 1, 2)

Again, I'd expect a normal, bog-standard ndarray because the scalar
elements are very simple. This question gets very complicated when
considering array arguments. Consider:
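A short sketch of the scalar-branch case, again using hypothetical sample data. Each output element is one of the two scalars, so the only input that could carry a subclass is the condition, and the result is naturally a plain ndarray with the condition's shape.

```python
import numpy as np

# Hypothetical sample data.
data = np.array([[1, 7, 3],
                 [9, 2, 6]])

# Both branches are scalars: every element of the result is 1 or 2.
foobar = np.where(data > 5, 1, 2)
print(type(foobar).__name__, foobar.tolist())
# prints: ndarray [[2, 1, 2], [1, 2, 1]]
```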

merged_data = np.where(data > 5, data, data2)

So, what should "merged_data" be? If both "data" and "data2" are the same
types, then it would be reasonable to return the same type, if possible.
But what if they aren't the same? Maybe use __array_priority__ to determine the
return type? Or, perhaps it does make sense to say "sod it all" and always
return an ndarray?
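To make the `__array_priority__` idea concrete, here is a sketch of one possible policy. It is not what `np.where` actually does; `where_by_priority`, `Meters` and `Feet` are hypothetical names invented for this illustration. The result is computed on plain ndarrays and then viewed as the type of whichever branch has the highest priority, with ties going to the first branch.

```python
import numpy as np


class Meters(np.ndarray):
    """Hypothetical subclass, used only for this illustration."""
    __array_priority__ = 5.0


class Feet(np.ndarray):
    """A second hypothetical subclass with a higher priority."""
    __array_priority__ = 10.0


def where_by_priority(cond, x, y):
    """Hypothetical policy, NOT np.where's real behavior: compute the
    result on plain ndarrays, then view it as the type of whichever
    branch has the highest __array_priority__ (ties go to x)."""
    # Objects without the attribute (e.g. Python scalars) count as 0.0.
    winner = max((x, y), key=lambda a: getattr(a, "__array_priority__", 0.0))
    result = np.where(np.asarray(cond), np.asarray(x), np.asarray(y))
    if isinstance(winner, np.ndarray):
        return result.view(type(winner))
    return result


data = np.arange(6).view(Meters)
data2 = (np.arange(6) * 10).view(Feet)
merged_data = where_by_priority(data > 2, data, data2)
print(type(merged_data).__name__, merged_data.tolist())
# prints: Feet [0, 10, 20, 3, 4, 5]
```

Even this simple rule shows the problem raised above: viewing the values of `data` as `Feet` silently relabels them, which is exactly why a blanket "always return ndarray" rule is tempting.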

I don't know the answer. I do find it interesting that the result from a
multi-dimensional array is not writable. I don't know why I have never
encountered that.


Ben Root


On Sat, May 9, 2015 at 2:42 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On May 9, 2015 10:48 AM, "Jaime Fernández del Río" <jaime.frio at gmail.com>
> wrote:
> >
> > There is a reported bug (issue #5837) regarding different returns from
> > np.nonzero with 1-D vs higher dimensional arrays. A full summary of the
> > differences can be seen from the following output:
> >
> > >>> class C(np.ndarray): pass
> > ...
> > >>> a = np.arange(6).view(C)
> > >>> b = np.arange(6).reshape(2, 3).view(C)
> > >>> anz = a.nonzero()
> > >>> bnz = b.nonzero()
> >
> > >>> type(anz[0])
> > <type 'numpy.ndarray'>
> > >>> anz[0].flags
> >   C_CONTIGUOUS : True
> >   F_CONTIGUOUS : True
> >   OWNDATA : True
> >   WRITEABLE : True
> >   ALIGNED : True
> >   UPDATEIFCOPY : False
> > >>> anz[0].base
> >
> > >>> type(bnz[0])
> > <class '__main__.C'>
> > >>> bnz[0].flags
> >   C_CONTIGUOUS : False
> >   F_CONTIGUOUS : False
> >   OWNDATA : False
> >   WRITEABLE : False
> >   ALIGNED : True
> >   UPDATEIFCOPY : False
> > >>> bnz[0].base
> > array([[0, 1],
> >        [0, 2],
> >        [1, 0],
> >        [1, 1],
> >        [1, 2]])
> >
> > The original bug report was only concerned with the non-writeability of
> > higher dimensional array returns, but there are more differences: 1-D
> > always returns an ndarray that owns its memory and is writeable, but higher
> > dimensional arrays return views, of the type of the original array, that
> > are non-writeable.
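The comparison in the session above can be repeated on whatever NumPy version is installed. The sketch below reuses the subclass `C` from the session; `describe_indices` is a hypothetical helper that collects the three properties under discussion (type, OWNDATA, WRITEABLE) for each index array. No expected output is given because the result depends on whether the installed release predates the fix.

```python
import numpy as np


class C(np.ndarray):
    pass


def describe_indices(arr):
    """Hypothetical helper: report type, OWNDATA and WRITEABLE for each
    index array returned by arr.nonzero()."""
    return [
        {
            "type": type(idx).__name__,
            "owndata": bool(idx.flags.owndata),
            "writeable": bool(idx.flags.writeable),
        }
        for idx in arr.nonzero()
    ]


a = np.arange(6).view(C)
b = np.arange(6).reshape(2, 3).view(C)

# On releases affected by issue #5837, the 1-D and 2-D reports differ.
print(describe_indices(a))
print(describe_indices(b))
```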
> >
> > I have a branch that attempts to fix this by making both 1-D and n-D
> > arrays:
> > 1. return a view, never the base array,
>
> This doesn't matter, does it? "View" isn't a thing, only "view of" is
> meaningful. And in this case, none of the returned arrays share any memory
> with any other arrays that the user has access to... so whether they were
> created as a view or not should be an implementation detail that's
> transparent to the user?
>
> > 2. return an ndarray, never a subclass, and
> > 3. return a writeable view.
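Nathaniel's point that the index arrays share no memory with the input can be checked with `np.shares_memory`. A minimal sketch on a plain 2-D array follows. One caveat: in the affected releases, the shared buffer was still reachable through `.base`, as the `bnz[0].base` output in the session shows.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
rows, cols = a.nonzero()

# The index arrays are freshly computed, so they never alias the input.
print(np.shares_memory(rows, a), np.shares_memory(cols, a))
# prints: False False
```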
> > I guess the most controversial choice is #2, and in fact making that
> > change breaks a few tests. I nevertheless think that all of the index
> > returning functions (nonzero, argsort, argmin, argmax, argpartition) should
> > always return a bare ndarray, not a subclass. I'd be happy to be corrected,
> > but I can't think of any situation in which preserving the subclass would
> > be needed for these functions.
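Until every NumPy release in use behaves consistently, callers can normalize the results themselves. The sketch below is a hypothetical helper, `plain_writeable_indices`, not part of NumPy. It mirrors the second and third goals of Jaime's branch: it drops any ndarray subclass with `np.asarray` and copies only when an index array is read-only.

```python
import numpy as np


def plain_writeable_indices(indices):
    """Hypothetical helper: coerce each index array to a base-class,
    writeable ndarray."""
    out = []
    for idx in indices:
        idx = np.asarray(idx)       # drops any ndarray subclass
        if not idx.flags.writeable:
            idx = idx.copy()        # a fresh copy is always writeable
        out.append(idx)
    return tuple(out)


class C(np.ndarray):
    pass


b = np.arange(6).reshape(2, 3).view(C)
rows, cols = plain_writeable_indices(b.nonzero())
rows[0] = 1  # succeeds regardless of what the installed NumPy returned
```

Copying only when needed avoids an extra allocation on releases that already return writeable arrays.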
>
> I also can't see any logical reason why the return type of these functions
> has anything to do with the type of the inputs. You can index me with my
> phone number but my phone number is not a person. OTOH logic and ndarray
> subclassing don't have much to do with each other; the practical effect is
> probably more important. Looking at the subclasses I know about (masked
> arrays, np.matrix, and astropy quantities), though, I also can't see much
> benefit in copying the subclass of the input, and the fact that we were
> never consistent about this suggests that people probably aren't depending
> on it too much.
>
> So in summary my feeling is: +1 to making them writable, no objection to
> the view thing (though I don't see how it matters), and provisional +1 to
> consistently returning ndarray (to be revised if the people who use the
> subclassing functionality disagree).
>
> -n
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion