Hi all,

I have 2 arrays - A with the dimensions of 1000x4 and B with the dimensions of 5000x4. B doesn't (hopefully) contain any rows that are not in A. I need to create a lookup array C, the i-th value of which will be the index of B[i] in A. In the (very rare) case when B[i] is not in A, C[i] should be equal to -1. Now I'm dealing with it using which:

    ...
    import numpy as np
    for i, row in enumerate(B):
        try:
            C[i] = np.which(np.all(A == row, axis = 1))[0][0]
        except (IndexError, ):
            C[i] = -1
    ...

but that's very slow (it consumes 70% of the CPU time needed by the whole program). I guess that it's because of the slow Python loop, but I just can't figure out how to get rid of it. Any suggestions would be appreciated.

Thanks in advance,
Andrey

--
Researcher, General and theoretical physics dept., South Ural State University
454080, Pr. Lenina, 76, Chelyabinsk, Russia
Tel: +7 351 2654713
andrey@physics.susu.ac.ru
On 10/10/11 09:53, Andrey N. Sobolev wrote:
I have 2 arrays - A with the dimensions of 1000x4 and B with the dimensions of 5000x4. B doesn't (hopefully) contain any rows that are not in A. I need to create a lookup array C, the i-th value of which will be the index of B[i] in A. In the (very rare) case when B[i] is not in A, C[i] should be equal to -1.

May we assume that there are no repeats in A? (i.e. no cases where two different indices are both valid?)

--
Bob Dowling
On Mon, 10 Oct 2011 10:03:48 +0100
Bob Dowling wrote:
May we assume that there are no repeats in A? (i.e. no cases where two different indices are both valid?)
Yes, rows in A are unique and sorted. One more typo found: instead of np.which in the previous email it has to be np.where, I don't know what I was thinking :)

Thanks in advance!
Andrey
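For reference, the loop from the first message with the np.where correction applied might look like the sketch below. The function name lookup_loop and the missing parameter (the value stored when a row of B is not found in A) are hypothetical names introduced here for illustration; they do not appear in the thread. An explicit size check replaces the try/except, which should give the same result.

```python
import numpy as np

def lookup_loop(A, B, missing=-1):
    """Baseline: one full scan of A per row of B (hypothetical helper)."""
    C = np.empty(len(B), dtype=int)
    for i, row in enumerate(B):
        # Boolean mask of rows of A that equal `row` in every column,
        # converted to an array of matching indices.
        matches = np.where(np.all(A == row, axis=1))[0]
        # Rows of A are stated to be unique, so at most one match is expected.
        C[i] = matches[0] if matches.size else missing
    return C

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
B = np.array([[5, 6, 7, 8], [1, 2, 3, 4], [0, 0, 0, 0], [9, 10, 11, 12]])
print(lookup_loop(A, B))
```

Each iteration compares one row of B against all of A, so the work grows with len(A) * len(B) and the loop itself runs in the interpreter.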
2011/10/10 Andrey N. Sobolev
Yes, rows in A are unique and sorted. One more typo found: instead of np.which in the previous email it has to be np.where, I don't know what I was thinking :)
Thanks in advance! Andrey
The following doesn't use numpy but seems to be about 20x faster:

    A_rows = {}
    for i, row in enumerate(A):
        A_rows[tuple(row)] = i
    for i, row in enumerate(B):
        C[i] = A_rows.get(tuple(row), -1)

-=- Olivier
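A self-contained version of the dictionary approach above could be sketched as follows. The function name lookup_dict and the missing parameter are hypothetical, and converting the arrays with .tolist() before building tuples is a small variation on the posted code (an assumption that plain Python scalars hash and compare at least as cheaply as NumPy scalars), not something stated in the thread.

```python
import numpy as np

def lookup_dict(A, B, missing=-1):
    """Row lookup through a dict keyed by row tuples (hypothetical helper)."""
    # Map every row of A, as a hashable tuple, to its index in A.
    index_of = {tuple(row): i for i, row in enumerate(A.tolist())}
    # One constant-time dict lookup per row of B; rows absent from A
    # receive the `missing` value.
    return np.array([index_of.get(tuple(row), missing) for row in B.tolist()],
                    dtype=int)

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
B = np.array([[5, 6, 7, 8], [1, 2, 3, 4], [0, 0, 0, 0], [9, 10, 11, 12]])
print(lookup_dict(A, B))
```

The dictionary is built once from A, so each row of B costs one hash lookup instead of a full scan of A. As with the original comparison A == row, matching is by exact equality, which matters if the arrays hold floats.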
On Mon, 10 Oct 2011 11:20:08 -0400
Olivier Delalleau wrote:
The following doesn't use numpy but seems to be about 20x faster:
    A_rows = {}
    for i, row in enumerate(A):
        A_rows[tuple(row)] = i
    for i, row in enumerate(B):
        C[i] = A_rows.get(tuple(row), -1)

-=- Olivier
Thanks a lot, Olivier, that makes my program about 3x faster. One lesson I can draw from this: don't try to use NumPy in situations where it doesn't fit :)

WBR,
Andrey
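For completeness, the same lookup can also be written without a Python-level loop. The sketch below is not from the thread: it relies on np.unique with the axis and return_inverse arguments, which were added to NumPy well after this discussion, and the names lookup_unique and missing are hypothetical. It assumes, as stated earlier in the thread, that the rows of A are unique; it does not require A to be sorted. Whether it is actually faster than the dictionary version for arrays of this size has not been measured here.

```python
import numpy as np

def lookup_unique(A, B, missing=-1):
    """Loop-free row lookup via np.unique (hypothetical helper)."""
    # Give every distinct row in A and B a shared integer id.
    stacked = np.concatenate([A, B], axis=0)
    unique_rows, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # the returned shape differs between NumPy versions
    a_ids, b_ids = inverse[:len(A)], inverse[len(A):]
    # Table from row id to position in A; ids that never occur in A
    # keep the `missing` value.
    position = np.full(unique_rows.shape[0], missing, dtype=np.intp)
    position[a_ids] = np.arange(len(A))
    return position[b_ids]

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
B = np.array([[5, 6, 7, 8], [1, 2, 3, 4], [0, 0, 0, 0], [9, 10, 11, 12]])
print(lookup_unique(A, B))
```

The sort inside np.unique dominates the cost, so this trades the per-row Python overhead for one sort over len(A) + len(B) rows.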