[Numpy-discussion] indices of values contained in a list

Keith Goodman kwgoodman at gmail.com
Sat Dec 12 11:16:42 EST 2009


2009/12/12 Ernest Adrogué <eadrogue at gmx.net>:
> Hi,
>
> Suppose I have a flat array, and I want to know the
> indices corresponding to values contained in a list
> of arbitrary length.
>
> Intuitively I would have done:
>
> a = np.array([1,2,3,4])
> np.nonzero(a in (0,2,4))
>
> However the "in" operator doesn't work element-wise,
> instead it compares the whole array with each member
> of the list.
>
> I have found that this does the trick:
>
> b = (0,2,4)
> reduce(np.logical_or, [a == i for i in b])
>
> then pass the result to np.nonzero to get the indices,
> but, is there a numpy function that can handle this
> situation?

If a and b are as short as in your example, which I doubt, here's a faster way:

>> timeit np.nonzero(reduce(np.logical_or, [a == i for i in b]))
100000 loops, best of 3: 14 µs per loop
>> timeit [i for i, z in enumerate(a) if z in b]
100000 loops, best of 3: 3.43 µs per loop

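Which one wins depends on the sizes. Looping over a instead of b is faster
only if len(a) is much less than len(b); with a long a and a short b, looping
over b (the reduce version) wins: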

>> a = np.random.randint(0,100,10000)
>> b = tuple(set(a[:50].tolist()))
>> len(b)
   41
>> timeit np.nonzero(reduce(np.logical_or, [a == i for i in b]))
100 loops, best of 3: 2.65 ms per loop
>> timeit [i for i, z in enumerate(a) if z in b]
10 loops, best of 3: 37.7 ms per loop

>> b, a = a, b
>> timeit np.nonzero(reduce(np.logical_or, [a == i for i in b]))
10 loops, best of 3: 165 ms per loop
>> timeit [i for i, z in enumerate(a) if z in b]
1000 loops, best of 3: 597 µs per loop
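
On the original question of a built-in: the sketch below assumes a NumPy recent enough to provide np.isin (it is not used anywhere in this thread, and older releases may lack it). It builds the element-wise membership mask in one call, which can then be passed to np.nonzero as before. The second half is a variant of the list comprehension above that converts b to a set first, so each membership test is a hash lookup rather than a scan of b. No timings are claimed for either.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = (0, 2, 4)

# Element-wise "a in b": True where the value of a occurs anywhere in b.
mask = np.isin(a, b)
idx = np.nonzero(mask)[0]
print(idx)  # [1 3]

# Variant of the pure-Python loop: look values up in a set instead of a tuple.
# a.tolist() converts the elements to plain Python ints before the lookup.
b_set = set(b)
idx_loop = [i for i, z in enumerate(a.tolist()) if z in b_set]
print(idx_loop)  # [1, 3]
```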


