[Numpy-discussion] What should np.ndarray.__contains__ do

Tue Feb 26 05:21:29 EST 2013

On Mon, 2013-02-25 at 16:33 +0000, Nathaniel Smith wrote:
> On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> > Hello all,
> >
> > currently the `__contains__` method (i.e. the `in` operator) on arrays
> > does not return what the user would expect when, in the operation
> > `a in b`, `a` is not a single element (see "In [3]-[4]" below).
> 
> True, I did not expect that!
> 
<snip>

> The two approaches that I can see, and which generalize the behaviour
> of simple Python lists in natural ways, are:
> 
> a) the left argument is coerced to a scalar of the appropriate type,
> then we check if that value appears anywhere in the array (basically
> raveling the right argument).
> 

How did I misread that? I guess you mean element and never subarray
matching. Actually I am starting to think that is best. Subarray
matching may be useful, but would probably be better off inside its own
function.
That might also be best for object arrays, since it is difficult to
know whether the user means a tuple as an element or as a two-element
subarray, unless you say "input is array-like", which is possible (or
more sensible) for a function.
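
To illustrate what such a separate function could look like, here is a
minimal sketch. The name `contains_subarray` is hypothetical (nothing
like it exists in numpy), and it assumes the needle must have exactly
the shape of one entry along the first axis:

```python
import numpy as np

def contains_subarray(haystack, needle):
    """Hypothetical helper: True if needle equals haystack[i] for some i."""
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)  # "input is array-like"
    # Assumption: the needle must match the shape of one entry along axis 0.
    if haystack.ndim == 0 or needle.shape != haystack.shape[1:]:
        raise ValueError("needle shape must equal haystack.shape[1:]")
    if haystack.shape[0] == 0:
        return False
    # Compare every haystack[i] with the needle, then require all
    # elements of at least one entry to match.
    eq = (haystack == needle).reshape(haystack.shape[0], -1)
    return bool(eq.all(axis=1).any())
```

With `b = np.arange(10).reshape(5, 2)` this gives False for `[0, 2]`
and True for `[2, 3]`, i.e. whole-row matching rather than the
any-element match that `in` currently performs.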

That would mean simply turning the use cases that currently give weird
results into errors. Those errors could hint at np.in1d and, if numpy
ever gets one, at a dedicated subarray matching function.
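
As a rough sketch of that behaviour (the function name `strict_contains`
and the error wording are made up for illustration; this is not how
numpy implements `__contains__`):

```python
import numpy as np

def strict_contains(arr, item):
    """Hypothetical element-only membership test that rejects non-scalars."""
    arr = np.asarray(arr)
    # Anything with dimensions (list, tuple, array) is not a single element.
    # Note: for object arrays a tuple could be meant as an element; this
    # sketch does not try to resolve that ambiguity and simply rejects it.
    if np.ndim(item) != 0:
        raise ValueError(
            "left operand of `in` must be a single element; "
            "see np.in1d for testing several values at once"
        )
    # Element matching over the whole (conceptually raveled) array.
    return bool((arr == item).any())
```

Here `strict_contains(b, 3)` is True for the `b` from the session
below, while `strict_contains(b, [0, 2])` raises instead of returning
a surprising True.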

-- Sebastian

> b) for an array with shape (n1, n2, n3, ...), the left argument is
> treated as an array of shape (n2, n3, ...), and we check if that
> subarray (as a whole) appears anywhere in the array. Or in other
> words, 'A in B' is true iff there is some i such that
> np.array_equal(B[i], A).
> 
> Question 1: are there any other sensible options that aren't on this list?
> 
> Question 2: if not, then which should we choose? (Or we could choose
> both, I suppose, depending on what the left argument looks like.)
> 
> Between these two options, I like (a) and don't like (b). The
> pretending-to-be-a-list-of-lists special case behaviour for
> multidimensional arrays is already weird and confusing, and besides,
> I'd expect equality comparison on arrays to use ==, not array_equal.
> So (b) feels pretty inconsistent with other numpy conventions to me.
> 
> -n
> 
> > I have opened an issue for it:
> > https://github.com/numpy/numpy/issues/3016#issuecomment-14045545
> >
> >
> > Regards,
> >
> > Sebastian
> >
> > In [1]: a = np.array([0, 2])
> >
> > In [2]: b = np.arange(10).reshape(5,2)
> >
> > In [3]: b
> > Out[3]:
> > array([[0, 1],
> >        [2, 3],
> >        [4, 5],
> >        [6, 7],
> >        [8, 9]])
> >
> > In [4]: a in b
> > Out[4]: True
> >
> > In [5]: (b == a).any()
> > Out[5]: True
> >
> > In [6]: (b == a).all(0).any() # the 0 could be multiple axes
> > Out[6]: False
> >
> > In [7]: a_2d = a[None,:]
> >
> > In [8]: a_2d in b # broadcast dimension means "any" -> True
> > Out[8]: True
> >
> > In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
> > Out[9]: True
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
