[Numpy-discussion] what does "in" do with numpy arrays?

Robert Kern robert.kern at gmail.com
Tue May 31 20:31:21 EDT 2011


On Tue, May 31, 2011 at 11:25, Christopher Barker <Chris.Barker at noaa.gov> wrote:
> Hi folks,
>
> I've re-titled this thread, as it's about a new question, now:
>
> What does:
>
> something in a_numpy_array
>
> mean? i.e. how has __contains__ been defined?
>
> A couple of us have played with it, and can't make sense of it:
>
>> In [24]: a
>> Out[24]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>
>> In [25]: 3 in a
>> Out[25]: True
>>
>> So the simple case works just like a list. But what if I look for an array in another array?
>
>> In [26]: b
>> Out[26]: array([3, 6, 4])
>>
>> In [27]: b in a
>> Out[27]: False
>>
>> OK, so the full b array is not in a, and it doesn't "vectorize" it,
>> either. But:
>>
>> In [29]: a
>> Out[29]:
>> array([[ 0,  1,  2],
>>          [ 3,  4,  5],
>>          [ 6,  7,  8],
>>          [ 9, 10, 11]])
>>
>> In [30]: b in a
>> Out[30]: True
>>
>> HUH?
>>
>> I'm not sure by what definition we would say that b is contained in a.
>>
>> but maybe..
>>
>> In [41]: b
>> Out[41]: array([  4,   2, 345])
>>
>> In [42]: b in a
>> Out[42]: False
>>
>> so it's "are all of the elements in b in a somewhere?" but only for 2-d
>> arrays?

It dates back to Numeric's semantics for bool(some_array), which would
be True if any of the elements were nonzero. Just like any other
iterable container in Python, `x in y` will essentially do

  for row in y:
    if x == row:
      return True
  return False

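That is, it iterates along the first axis of y and takes the truth
value of each equality comparison. In Numeric/numpy's case, this
comparison is broadcast, and under Numeric's semantics the resulting
boolean array counted as True if any of its elements were True. That's
why [3,6,4] works: there is one row where 3 is in the first column.
[4,2,345] doesn't work because 4 never appears in the first column, 2
never appears in the second, and 345 never appears in the third.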
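
To make the broadcasting point concrete, here is a minimal sketch. It
assumes a reasonably recent NumPy, where `x in y` on an array behaves
like `(y == x).any()`. The helper name `contains_like_in` is
hypothetical and exists only for this illustration.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)   # the 4x3 array from the question
b1 = np.array([3, 6, 4])
b2 = np.array([4, 2, 345])


def contains_like_in(y, x):
    """Hypothetical helper: broadcast the equality, then ask 'any match?'."""
    return bool((y == x).any())


# b1 is compared against every row; only the column-wise matches count.
print(a == b1)
# [[False False False]
#  [ True False False]    <- 3 sits in the first column of this row
#  [False False False]
#  [False False False]]

print(contains_like_in(a, b1), b1 in a)   # True True
print(contains_like_in(a, b2), b2 in a)   # False False
```

Note that this is a positional test: each element of the 1-d array is
only compared against its own column, which is why the result looks so
odd when read as "is b contained in a".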

This should probably be considered a mistake left over from the
transition to numpy's semantics, in which bool(some_array) raises an
exception.
`scalar in array` should probably work as-is for an ND array, but
there are several different possible semantics for `array in array`
that should be explicitly spelled out, much like bool(some_array).
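
As a rough sketch of what "explicitly spelled out" could look like,
here are three candidate meanings of `array in array` written as
separate functions. The function names are hypothetical, the list is
not exhaustive, and `np.isin` assumes NumPy 1.13 or later.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)


def is_row_of(y, x):
    """Is x equal to one complete row of the 2-d array y?"""
    return bool((y == x).all(axis=-1).any())


def all_elements_in(y, x):
    """Does every element of x occur somewhere in y?"""
    return bool(np.isin(x, y).all())


def elementwise_in(y, x):
    """'Vectorized' membership: one boolean per element of x."""
    return np.isin(x, y)


print(is_row_of(a, np.array([3, 4, 5])))         # True
print(is_row_of(a, np.array([3, 6, 4])))         # False
print(all_elements_in(a, np.array([3, 6, 4])))   # True
print(elementwise_in(a, np.array([4, 2, 345])))  # [ True  True False]
```

Each of these gives a different answer for the arrays in the original
question, which is the reason for wanting the caller to choose one by
name rather than relying on `in`.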

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


