[Numpy-discussion] What should np.ndarray.__contains__ do

Mon Feb 25 12:09:48 EST 2013

On Mon, 2013-02-25 at 16:33 +0000, Nathaniel Smith wrote:
> On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> > Hello all,
> >
> > currently the `__contains__` method (i.e. the `in` operator) on arrays
> > does not return what the user would expect when, in the operation
> > `a in b`, `a` is not a single element (see "In [3]-[4]" below).
> 
> True, I did not expect that!
> 

<snip>

> The two approaches that I can see, and which generalize the behaviour
> of simple Python lists in natural ways, are:
> 
> a) the left argument is coerced to a scalar of the appropriate type,
> then we check if that value appears anywhere in the array (basically
> raveling the right argument).
> 
> b) for an array with shape (n1, n2, n3, ...), the left argument is
> treated as an array of shape (n2, n3, ...), and we check if that
> subarray (as a whole) appears anywhere in the array. Or in other
> words, 'A in B' is true iff there is some i such that
> np.array_equal(B[i], A).
> 
> Question 1: are there any other sensible options that aren't on this list?
> 
> Question 2: if not, then which should we choose? (Or we could choose
> both, I suppose, depending on what the left argument looks like.)
> 
> Between these two options, I like (a) and don't like (b). The
> pretending-to-be-a-list-of-lists special case behaviour for
> multidimensional arrays is already weird and confusing, and besides,
> I'd expect equality comparison on arrays to use ==, not array_equal.
> So (b) feels pretty inconsistent with other numpy conventions to me.
> 
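To make the two options concrete, here is a rough sketch of each. The
helper names `contains_scalar` and `contains_subarray` are made up for
illustration and are not NumPy API; only `np.array_equal` is a real NumPy
function. How exactly (a) would coerce dtypes is left open here.

```python
import numpy as np

def contains_scalar(b, a):
    """Option (a): treat `a` as one scalar and look for it anywhere in `b`."""
    a = np.asarray(a)
    if a.size != 1:
        # Assumption: anything that is not a single value is rejected.
        raise ValueError("left argument cannot be coerced to a scalar")
    value = a.item()  # dtype coercion details are glossed over here
    return bool((np.asarray(b).ravel() == value).any())

def contains_subarray(b, a):
    """Option (b): true iff np.array_equal(b[i], a) for some i."""
    b = np.asarray(b)
    return any(np.array_equal(b[i], a) for i in range(len(b)))

b = np.arange(10).reshape(5, 2)
print(contains_scalar(b, 3))         # 3 occurs somewhere in b
print(contains_subarray(b, [2, 3]))  # [2, 3] is a row of b
print(contains_subarray(b, [0, 2]))  # [0, 2] is not a row of b
```

Under (a), a multi-element left argument such as `[0, 2]` is simply an
error, whereas under (b) it is compared against whole rows.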

I agree with rejecting (b). (a) seems a good way to think about the
problem, and I don't see other sensible options. The question is: say
you have an array b = [[0, 1], [2, 3]] and a = [[0, 1]]. Since both are
2-d, should b be interpreted as containing two 2-d elements? Another way
of seeing this would be to ignore size-one dimensions in `a` for the
sake of defining its "element". This would allow:

In [1]: b = np.arange(10).reshape(5,2)

In [2]: b
Out[2]: 
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [3]: a = np.array([[0, 1]]) # extra dimensions at the start

In [4]: a in b
Out[4]: True

# But it would also allow the transpose, since now the last axis is a dummy:
In [5]: a.T in b.T
Out[5]: True

Those two examples could also be a shape mismatch error. I tend to think
they are reasonable enough to work, but then again the user could just
reshape/transpose to achieve the same.

I also wondered whether b with e.g. b.shape = (5,1) and a.shape = (1,2)
is sensible enough not to be an error, but this element thinking is a
good reason for rejecting it IMO.
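
The "element" reading can be sketched roughly as follows. This is only an
illustration under stated assumptions: `contains_element` is a
hypothetical helper, not NumPy API, and raising `ValueError` for a
mismatch is just one way the error case could surface.

```python
import numpy as np

def contains_element(b, a):
    """Hypothetical sketch: is `a`, ignoring its size-one axes, an element of `b`?"""
    b = np.asarray(b)
    a = np.asarray(a)
    if a.ndim > b.ndim:
        raise ValueError("a has more dimensions than b")
    # Pad a's shape with leading ones, as broadcasting would.
    shape = (1,) * (b.ndim - a.ndim) + a.shape
    a = a.reshape(shape)
    # Axes where a is longer than one define the "element"; b must match there.
    element_axes = tuple(i for i, n in enumerate(shape) if n != 1)
    if any(b.shape[i] != shape[i] for i in element_axes):
        raise ValueError("shape mismatch between a and b")
    eq = (b == a)
    if element_axes:
        # The whole element has to match ...
        eq = eq.all(axis=element_axes)
    # ... at any position along the remaining (dummy) axes.
    return bool(eq.any())

b = np.arange(10).reshape(5, 2)
a = np.array([[0, 1]])
print(contains_element(b, a))       # True: [0, 1] is a row of b
print(contains_element(b.T, a.T))   # True: same element, transposed
print(contains_element(b, [0, 2]))  # False: no row equals [0, 2]
```

With this definition b.shape = (5,1) against a.shape = (1,2) raises,
because the element axis of `a` has length 2 but `b` has length 1 there.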

Maybe this is clearer,

Sebastian

> -n
> 
> > I have opened an issue for it:
> > https://github.com/numpy/numpy/issues/3016#issuecomment-14045545
> >
> >
> > Regards,
> >
> > Sebastian
> >
> > In [1]: a = np.array([0, 2])
> >
> > In [2]: b = np.arange(10).reshape(5,2)
> >
> > In [3]: b
> > Out[3]:
> > array([[0, 1],
> >        [2, 3],
> >        [4, 5],
> >        [6, 7],
> >        [8, 9]])
> >
> > In [4]: a in b
> > Out[4]: True
> >
> > In [5]: (b == a).any()
> > Out[5]: True
> >
> > In [6]: (b == a).all(0).any() # the 0 could be multiple axes
> > Out[6]: False
> >
> > In [7]: a_2d = a[None,:]
> >
> > In [8]: a_2d in b # broadcast dimension means "any" -> True
> > Out[8]: True
> >
> > In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
> > Out[9]: True
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
