
Hi all,

I have 2 arrays: A with dimensions 1000x4 and B with dimensions 5000x4. B doesn't (hopefully) contain any rows that are not in A. I need to create a lookup array C, the i-th value of which is the index of B[i] in A. In the (very rare) case when B[i] is not in A, C[i] should be equal to -1.

Currently I do this using which:

    ...
    import numpy as np
    for i, row in enumerate(B):
        try:
            C[i] = np.which(np.all(A == row, axis=1))[0][0]
        except (IndexError, ):
            C[i] = -1
    ...

but that's very slow (it consumes 70% of the CPU time needed by the whole program). I guess that's because of the slow Python loop, but I can't see how to get rid of it.

Any suggestions would be appreciated.

Thanks in advance,
Andrey

--
Researcher, General and theoretical physics dept., South Ural State University
454080, Pr. Lenina, 76, Chelyabinsk, Russia
Tel: +7 351 265-47-13
andrey@physics.susu.ac.ru

On 10/10/11 09:53, Andrey N. Sobolev wrote:
> I have 2 arrays - A with the dimensions of 1000x4 and B with the dimensions of 5000x4. B doesn't (hopefully) contain any rows that are not in A. I need to create a lookup array C, the i-th value of which will be the index of B[i] in A. In the (very rare) case when B[i] is not in A, C[i] should be equal to -1.

May we assume that there are no repeats in A? (i.e. no cases where two different indices are both valid?)

--
Bob Dowling

On Mon, 10 Oct 2011 10:03:48 +0100, Bob Dowling <rjd4+numpy@cam.ac.uk> wrote:
> May we assume that there are no repeats in A? (i.e. no cases where two different indices are both valid?)

Yes, rows in A are unique and sorted.

One more typo found: instead of np.which in the previous e-mail it should be np.where. I don't know what I was thinking :)

Thanks in advance!
Andrey
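
With the np.where correction applied, the loop from the original message can be written as a runnable function. This is only a sketch of that baseline: the function name lookup_loop and the small sample arrays are illustrative and do not come from the thread.

```python
import numpy as np


def lookup_loop(A, B):
    """For each row of B, return its row index in A, or -1 if absent.

    Hypothetical helper name; same logic as the loop in the original
    message, with np.which replaced by np.where.
    """
    C = np.empty(len(B), dtype=np.intp)
    for i, row in enumerate(B):
        try:
            # np.where returns a tuple of index arrays; [0] is the array of
            # matching row indices, and [0][0] is the first match.
            C[i] = np.where(np.all(A == row, axis=1))[0][0]
        except IndexError:
            # No row of A equals this row of B.
            C[i] = -1
    return C


# Illustrative data only (the thread's arrays are 1000x4 and 5000x4).
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
B = np.array([[5, 6, 7, 8], [0, 0, 0, 0], [1, 2, 3, 4]])
C = lookup_loop(A, B)
```

Each iteration compares one row of B against all of A, so the work grows with len(A) * len(B), on top of the per-iteration overhead of the Python loop.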

2011/10/10 Andrey N. Sobolev <inconnu@list.ru>
> Yes, rows in A are unique and sorted. One more typo found - instead of np.which in the previous e-mail it has to be np.where, I don't know what I thought about :)
> Thanks in advance! Andrey

The following doesn't use numpy but seems to be about 20x faster:

    A_rows = {}
    for i, row in enumerate(A):
        A_rows[tuple(row)] = i
    for i, row in enumerate(B):
        C[i] = A_rows.get(tuple(row), -1)

-=- Olivier
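
A self-contained version of the dictionary approach might look like the sketch below. The function name lookup_dict and the sample arrays are illustrative, and converting with tolist() before building the tuples is an added choice, not something from the message above.

```python
import numpy as np


def lookup_dict(A, B):
    """Map each row of B to its row index in A via a dict, -1 if absent.

    Hypothetical helper name; assumes rows of A are unique, as stated
    in the thread.
    """
    # tolist() converts to plain Python numbers, so the tuples used as
    # dict keys are ordinary hashable Python tuples.
    A_rows = {tuple(row): i for i, row in enumerate(A.tolist())}
    return np.array(
        [A_rows.get(tuple(row), -1) for row in B.tolist()],
        dtype=np.intp,
    )


# Illustrative data only.
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
B = np.array([[5, 6, 7, 8], [0, 0, 0, 0], [1, 2, 3, 4]])
C = lookup_dict(A, B)
```

The dictionary is built once from A, after which each row of B costs a single hash lookup instead of a comparison against every row of A.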

On Mon, 10 Oct 2011 11:20:08 -0400, Olivier Delalleau <shish@keba.be> wrote:
> The following doesn't use numpy but seems to be about 20x faster:
>
>     A_rows = {}
>     for i, row in enumerate(A):
>         A_rows[tuple(row)] = i
>     for i, row in enumerate(B):
>         C[i] = A_rows.get(tuple(row), -1)
>
> -=- Olivier

Thanks a lot, Olivier, that makes my program about 3x faster.

One lesson I can draw from this: don't try to use NumPy in situations where it doesn't fit :)

WBR,
Andrey
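
For comparison, the same lookup can also be written without an explicit Python loop. The sketch below is not from the thread: it labels the rows of A and B together with np.unique(..., axis=0) (available in NumPy 1.13 and later) and then maps those labels back to positions in A. The name lookup_unique is hypothetical, and no timing claim is made for it.

```python
import numpy as np


def lookup_unique(A, B):
    """Loop-free row lookup: index of each row of B in A, or -1.

    Hypothetical helper; assumes rows of A are unique, as stated in the
    thread, and compares rows by exact equality.
    """
    stacked = np.concatenate([A, B], axis=0)
    # inverse[k] is an integer label identifying the distinct row that
    # stacked[k] equals; equal rows share the same label.
    _, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against a 2-D inverse array

    # Table from label to position in A; labels seen only in B stay -1.
    label_to_index = np.full(inverse.max() + 1, -1, dtype=np.intp)
    label_to_index[inverse[:len(A)]] = np.arange(len(A))

    return label_to_index[inverse[len(A):]]


# Illustrative data only.
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
B = np.array([[5, 6, 7, 8], [0, 0, 0, 0], [1, 2, 3, 4],
              [9, 10, 11, 12], [5, 6, 7, 8]])
C = lookup_unique(A, B)
```

Whether this beats the dictionary version depends on the array sizes and dtype, so it is worth timing both on real data before choosing.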
participants (3)
- Andrey N. Sobolev
- Bob Dowling
- Olivier Delalleau