I'm scratching my head over a small problem and can't find a vectorized solution. I have two arrays, A and B, and I would like to get the indices (relative to B) of the elements of A that are in B:

    A = np.array([2,0,1,4])
    B = np.array([1,2,0])
    print(some_function(A,B))
    [1,2,0]

    # A[0] == 2 is in B and 2 == B[1] -> 1
    # A[1] == 0 is in B and 0 == B[2] -> 2
    # A[2] == 1 is in B and 1 == B[0] -> 0

Any ideas? I tried numpy.in1d with no luck.
Nicolas
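Using pandas one can do:

    import numpy as np
    import pandas as pd

    A = np.array([2,0,1,4])
    B = np.array([1,2,0])
    s = pd.Series(range(len(B)), index=B)
    s.reindex(A).values  # plain s[A] raises KeyError in recent pandas because 4 is not in the index
    array([ 1., 2., 0., nan])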
Maybe use searchsorted()? I will note that I have needed to do something like this once before, and I found that the list comprehension form of calling .index() for each item was faster than jumping through hoops to vectorize it using searchsorted (needing to sort and then map the sorted indices to the original indices), and was certainly clearer, but that might depend upon the problem size.
Cheers! Ben Root
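The list comprehension form mentioned above could look roughly like the following. This is only a minimal sketch using the arrays from the original question; converting B to a plain list once (the name `B_list` is just for illustration) is an assumption made here so that `.index()` is available, and elements of A that are missing from B are simply skipped.

```python
import numpy as np

A = np.array([2, 0, 1, 4])
B = np.array([1, 2, 0])

# Convert once so that list.index() can be used for every lookup.
B_list = B.tolist()

# For each element of A that occurs in B, record its position in B.
indices = [B_list.index(item) for item in A.tolist() if item in B_list]
print(indices)  # [1, 2, 0]
```

Each lookup scans B from the start, so this is probably best suited to small arrays.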
Thanks for the quick answers. I think I will go with the .index and list comprehension. But if someone comes up with a vectorised solution for the numpy 100 exercises...
Nicolas
Yeah, I doubt you can get it very pretty, though maybe there is some great trick. This is one way:

    In [67]: A = np.array([2,0,1,4])
    In [68]: B = np.array([1,2,0])
    In [69]: B_sorter = np.argsort(B)
    In [70]: B_index = np.searchsorted(B, A, sorter=B_sorter)
    In [71]: invalid = B[B_sorter].take(B_index, mode='clip') != A
    In [72]: B_index[invalid] = -1  # mark invalids with -1
    In [73]: B_index
    Out[73]: array([ 2, 0, 1, -1])
Anyway, I guess the arrays would likely have to be quite large for this to beat list comprehension. And maybe doing the searchsorted the other way around could be faster, no idea.
- Sebastian
Thanks, I will run some benchmarks and post the results.
I'm not 100% sure that I get the question, but does this help at all?

    a = numpy.array([3,2,8,7])
    b = numpy.array([1,3,2,4,5,7,6,8,9])
    c = set(a) & set(b)
    c  # contains elements of a that are in b (and vice versa)
    set([8, 2, 3, 7])
    indices = numpy.where([x in c for x in b])[0]
    indices  # indices of b where the elements of a in b occur
    array([1, 2, 5, 7], dtype=int64)

-Mark
Yes, it is the expected result. Thanks. Maybe the set(a) & set(b) can be replaced by np.where(np.in1d(a,b)), no?
I was not familiar with the .in1d function. That's pretty handy.
Yes...it looks like numpy.where(numpy.in1d(b, a)) does what you need.

    numpy.where(numpy.in1d(b, a))
    (array([1, 2, 5, 7], dtype=int64),)

It would be interesting to see the benchmarks.
Unfortunately, this does not handle repeated entries in a.
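To illustrate the point about repeated entries, here is a small sketch. It uses `np.isin` in place of `numpy.in1d` (a substitution made here, since `np.isin` is the name recommended in newer NumPy releases), and the array `a` with a repeated 3 is an assumed example, not data from the thread.

```python
import numpy as np

a = np.array([3, 3, 8])  # note the repeated 3
b = np.array([1, 3, 2, 4, 5, 7, 6, 8, 9])

# The mask is computed over b, so each matching position of b appears once,
# in the order of b, regardless of how often or where the value occurs in a.
mask = np.isin(b, a)
positions = np.where(mask)[0]
print(positions.tolist())  # [1, 7]
```

A per-element lookup for this `a` would be expected to give [1, 1, 7], so the mask-based result loses both the repetition and the ordering of `a`.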
In the end, I've only gotten the list comprehension to work as expected:

    A = [0,0,1,3]
    B = np.arange(8)
    np.random.shuffle(B)
    I = [list(B).index(item) for item in A if item in B]

But Mark's and Sebastian's methods do not seem to work...
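If the repeated scans of B in the list comprehension become a concern, one possible variation is to build a value-to-position dictionary first. This is only a sketch and not something proposed in the thread: the `position` dictionary and the seeded `rng` are assumptions introduced here for illustration.

```python
import numpy as np

A = [0, 0, 1, 3]
B = np.arange(8)
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
rng.shuffle(B)

# Map each value of B to the index of its first occurrence,
# which mirrors what list.index() would return.
position = {}
for i, value in enumerate(B.tolist()):
    position.setdefault(value, i)

# Repeated entries in A are handled naturally; values missing from B are skipped.
I = [position[item] for item in A if item in position]
print(I)
```

The dictionary is built in a single pass over B, after which each lookup is a constant-time operation on average.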
Yeah, sorry, I had a mind slip with the sorter, since it returns indices into the sorted version. I think this should do the correct thing (it throws away invalid ones by default, though I think that is a bad idea in general).

    def index(A, B, fill_invalid=None):
        B_sorter = np.argsort(B)
        B_sorted = B[B_sorter]
        B_sorted_index = np.searchsorted(B_sorted, A)
        # Go back into the original index (clip so values above max(B) stay in range):
        B_index = B_sorter.take(B_sorted_index, mode='clip')

        if fill_invalid is None:
            valid = B.take(B_index, mode='clip') == A
            return B_index[valid]
        else:
            invalid = B.take(B_index, mode='clip') != A
            B_index[invalid] = fill_invalid
            return B_index

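As a quick check of the searchsorted idea on the cases discussed in this thread, here is a compact sketch along the same lines. The function name `index_in`, the default fill value of -1, and the second test array are assumptions made for illustration; it also assumes B is non-empty.

```python
import numpy as np

def index_in(A, B, fill_invalid=-1):
    """For each element of A, return its index in B, or fill_invalid if absent."""
    A = np.asarray(A)
    B = np.asarray(B)
    sorter = np.argsort(B, kind="stable")
    # Positions of A's elements within the sorted view of B.
    pos = np.searchsorted(B, A, sorter=sorter)
    # Values larger than max(B) land at len(B); clip them into range.
    pos = np.clip(pos, 0, len(B) - 1)
    # Translate positions in the sorted view back to indices of the original B.
    idx = sorter[pos]
    # Anything that does not actually match is not in B.
    idx[B[idx] != A] = fill_invalid
    return idx

print(index_in([2, 0, 1, 4], [1, 2, 0]).tolist())  # [1, 2, 0, -1]
print(index_in([0, 0, 1, 3], [3, 1, 0]).tolist())  # [2, 2, 1, 0]
```

The second call shows that repeated entries in A each get their own index, which is the case the mask-based approaches could not handle.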