[Numpy-discussion] Get the index of a comparison of two lists

josef.pktd at gmail.com
Fri Feb 11 09:59:13 EST 2011


On Fri, Feb 11, 2011 at 9:48 AM, Angus McMorland <amcmorl at gmail.com> wrote:
> On 11 February 2011 09:01, FRENK Andreas <Andreas.FRENK at 3ds.com> wrote:
>> Hi,
>>
>> I need a construct that returns the indices of the entries in the first
>> list whose values also appear in the second list.
>>
>> Take
>>
>> valA = [1,2,3,4,20,21,22,23,24]
>> valB = [1,2,3,4,  5,21,22,23]
>> The correct solution is: [0,1,2,3,5,6,7]
>>
>> A potential loop can be:
>> takeList=[]
>> for j,a in enumerate(valA):
>>     if a in valB:
>>         takeList.append(j)
>>
>> Please note, valA can have entries like [1,10000000,1000000001,…..], i.e. it
>> can be very sparse.
>> I also thought about using bincount, but due to the sparse nature the return
>> values from bincount would allocate too much memory.
>>
>> Any idea how to do it fast using numpy?
>
> This probably isn't optimal yet, but it seems to perform better than your
> for loop for large arrays, though it is slower for very small ones.
>
> In [11]: def test(a, b):
>   ....:     takeList = []
>   ....:     for j, A in enumerate(a):
>   ....:         if A in b:
>   ....:             takeList.append(j)
>   ....:     return takeList
>
> In [24]: a = np.random.randint(10, size=10)
> In [25]: b = np.random.randint(10, size=10)
>
> In [26]: %timeit test(a,b)
> 10000 loops, best of 3: 55.4 µs per loop
>
> In [27]: %timeit np.arange(a.size)[np.lib.setmember1d(a,b)]
> 10000 loops, best of 3: 92.9 µs per loop
>
> In [19]: a = np.random.randint(10000, size=10000)
> In [20]: b = np.random.randint(10000, size=10000)
>
> In [21]: %timeit np.arange(a.size)[np.lib.setmember1d(a,b)]
> 100 loops, best of 3: 7.99 ms per loop
>
> In [22]: %timeit test(a,b)
> 10 loops, best of 3: 787 ms per loop
>
> Hope that's useful,
>
> Angus
> --
> AJC McMorland
> Post-doctoral research fellow
> Neurobiology, University of Pittsburgh
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
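Even without NumPy, most of the cost of the original loop comes from `a in valB`, which scans the whole list valB for every element of valA. A minimal pure-Python sketch (the helper name `matching_indices` is hypothetical) converts valB to a set once, so each membership test becomes an average constant-time hash lookup. The set stores only the values that actually occur, so sparse values such as 1000000001 cost no extra memory.

```python
def matching_indices(val_a, val_b):
    """Return the indices j for which val_a[j] occurs anywhere in val_b.

    matching_indices is a hypothetical helper, not part of NumPy.
    """
    # Build the lookup set once: each membership test is then an average
    # O(1) hash lookup instead of a linear scan of the list val_b.
    lookup = set(val_b)
    return [j for j, a in enumerate(val_a) if a in lookup]


valA = [1, 2, 3, 4, 20, 21, 22, 23, 24]
valB = [1, 2, 3, 4, 5, 21, 22, 23]
print(matching_indices(valA, valB))  # [0, 1, 2, 3, 5, 6, 7]
```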

Maybe this does what you want:

>>> help(np.in1d)

>>> valA = [1,2,3,4,20,21,22,23,24]
>>> valB = [1,2,3,4,  5,21,22,23]
>>> np.in1d(valA, valB)
array([ True,  True,  True,  True, False,  True,  True,  True, False],
      dtype=bool)
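The indices can be taken straight from this boolean mask. As far as I know, the membership test does not by default build a dense table sized by the largest value the way bincount does, so large, sparse values like those in the original question are not a memory problem. A minimal sketch, assuming a NumPy recent enough (1.13 or later) to provide np.isin, the newer name for this function; the sample arrays are made up for illustration.

```python
import numpy as np

# Made-up sparse sample data: a bincount over these values would need
# an array with more than 10**9 entries.
valA = np.array([1, 10000000, 1000000001, 7, 42])
valB = np.array([42, 1000000001, 3])

mask = np.isin(valA, valB)   # True where valA[j] occurs in valB
idx = np.flatnonzero(mask)   # same as np.nonzero(mask)[0] for a 1-D mask
print(mask.tolist())  # [False, False, True, False, True]
print(idx.tolist())   # [2, 4]
```

For 1-D input, np.flatnonzero(mask), np.nonzero(mask)[0] and the np.arange(valA.size)[mask] indexing used in the timings above all return the same indices.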

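Both of the next two give the same indices for this test case, but in general
they differ: the first returns indices into valA of values found in valB (what
you asked for), the second returns indices into valB of values found in valA.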

>>> np.nonzero(np.in1d(valA, valB))[0]
array([0, 1, 2, 3, 5, 6, 7])
>>> np.nonzero(np.in1d(valB, valA))[0]
array([0, 1, 2, 3, 5, 6, 7])
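To show why no bincount-style table is needed, here is one sort-based way to do the same membership test with plain NumPy: sort the unique values of valB once, then binary-search every element of valA with np.searchsorted. This is an illustrative sketch (the helper name `indices_in` is hypothetical), not necessarily how np.in1d is implemented; its memory use depends only on the array lengths, and the cost is roughly O((n + m) log m) for n = len(valA), m = len(valB).

```python
import numpy as np


def indices_in(val_a, val_b):
    """Indices j such that val_a[j] occurs in val_b, via sort + binary search.

    indices_in is a hypothetical helper for illustration; np.in1d / np.isin
    is the library routine to use in practice.
    """
    a = np.asarray(val_a)
    b = np.unique(val_b)  # sorted copy of val_b with duplicates removed
    if b.size == 0:
        return np.array([], dtype=np.intp)
    # Position at which each element of a would be inserted into b.
    pos = np.searchsorted(b, a)
    # Elements larger than every entry of b get pos == b.size; clip so the
    # lookup below stays in bounds (the equality test still fails for them).
    pos = np.clip(pos, 0, b.size - 1)
    # a[j] is in b exactly when the entry found at its position equals it.
    return np.flatnonzero(b[pos] == a)


valA = [1, 2, 3, 4, 20, 21, 22, 23, 24]
valB = [1, 2, 3, 4, 5, 21, 22, 23]
print(indices_in(valA, valB).tolist())  # [0, 1, 2, 3, 5, 6, 7]
```

As with np.in1d, swapping the arguments gives indices into valB instead.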

Josef