[Numpy-discussion] Get the index of a comparison of two lists

Fri Feb 11 09:48:06 EST 2011

On 11 February 2011 09:01, FRENK Andreas <Andreas.FRENK at 3ds.com> wrote:
> Hi,
>
> I need to create a construct that returns the indices of the entries of the
> first list whose values also appear in the second list.
>
> Take
>
> valA = [1,2,3,4,20,21,22,23,24]
> valB = [1,2,3,4,  5,21,22,23]
> The correct solution is: [0,1,2,3,5,6,7]
>
> A potential loop can be:
> takeList=[]
> for j,a in enumerate(valA):
>     if a in valB:
>         takeList.append(j)
>
> Please note, valA can have entries like [1,10000000,1000000001,…..], i.e. it
> can be very sparse.
> I also thought about using bincount, but because of the sparse values the
> array returned by bincount would take up too much memory.
>
> Any idea how to do it fast using numpy?
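For reference, the quoted loop is slow mainly because `a in valB` scans all of `valB` for every element of `valA`, so the work grows with len(valA) * len(valB). A minimal pure-Python sketch (the function name `take_indices` is hypothetical, not from the thread) builds a set from `valB` once. Its memory use depends on how many distinct values there are, not on how large they are, so sparse values such as 1000000001 are no problem:

```python
def take_indices(val_a, val_b):
    """Return the indices j for which val_a[j] occurs anywhere in val_b.

    Hypothetical helper: a set-based version of the loop in the question.
    """
    # Build the lookup set once; each membership test is then O(1) on average
    # instead of a full scan of val_b.
    lookup = set(val_b)
    return [j for j, a in enumerate(val_a) if a in lookup]


valA = [1, 2, 3, 4, 20, 21, 22, 23, 24]
valB = [1, 2, 3, 4, 5, 21, 22, 23]
print(take_indices(valA, valB))  # [0, 1, 2, 3, 5, 6, 7]
```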

This probably isn't optimal yet, but it seems to perform better than your
for loop for large arrays, though it is slower for very small ones.

In [11]: def test(a, b):
   ....:     takeList = []
   ....:     for j, A in enumerate(a):
   ....:         if A in b:
   ....:             takeList.append(j)
   ....:     return takeList

In [24]: a = np.random.randint(10, size=10)
In [25]: b = np.random.randint(10, size=10)

In [26]: %timeit test(a,b)
10000 loops, best of 3: 55.4 µs per loop

In [27]: %timeit np.arange(a.size)[np.lib.setmember1d(a,b)]
10000 loops, best of 3: 92.9 µs per loop

In [19]: a = np.random.randint(10000, size=10000)
In [20]: b = np.random.randint(10000, size=10000)

In [21]: %timeit np.arange(a.size)[np.lib.setmember1d(a,b)]
100 loops, best of 3: 7.99 ms per loop

In [22]: %timeit test(a,b)
10 loops, best of 3: 787 ms per loop
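`np.lib.setmember1d`, used in the session above, is no longer available in current NumPy releases. A minimal sketch of the same idea with today's API, assuming NumPy 1.13 or later for `np.isin`, is shown below. `np.flatnonzero` turns the boolean membership mask directly into indices, so the `np.arange(a.size)[mask]` step is not needed. The second function is a hypothetical sort-based variant included to show why sparse values need not cost memory: it only sorts `b` and binary-searches it, so memory grows with len(b) rather than with the largest value (unlike `np.bincount`). Both function names are illustrative, and no timings are claimed for them here.

```python
import numpy as np


def take_indices_isin(a, b):
    # np.isin gives a boolean mask marking the elements of a that occur in b;
    # np.flatnonzero converts that mask into integer positions.
    return np.flatnonzero(np.isin(a, b))


def take_indices_sorted(a, b):
    # Hypothetical sort-and-search variant: no array sized by max(value).
    a = np.asarray(a)
    b_sorted = np.sort(np.asarray(b))
    if b_sorted.size == 0:
        return np.empty(0, dtype=np.intp)
    # Position of the first element of b_sorted that is >= each value of a.
    pos = np.searchsorted(b_sorted, a)
    # Values larger than everything in b get pos == len(b); clip to a valid
    # slot, where the equality test below will then correctly fail.
    pos = np.clip(pos, 0, b_sorted.size - 1)
    return np.flatnonzero(b_sorted[pos] == a)


valA = [1, 2, 3, 4, 20, 21, 22, 23, 24]
valB = [1, 2, 3, 4, 5, 21, 22, 23]
print(take_indices_isin(valA, valB))    # [0 1 2 3 5 6 7]
print(take_indices_sorted(valA, valB))  # [0 1 2 3 5 6 7]
```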

Hope that's useful,

Angus
-- 
AJC McMorland
Post-doctoral research fellow
Neurobiology, University of Pittsburgh