[Numpy-discussion] intersect1d for N input arrays
Martin Spacek
numpy at mspacek.mm.st
Fri Oct 16 18:01:54 EDT 2009
Robert Cimrman <cimrman3 <at> ntc.zcu.cz> writes:
>
> Hi Martin,
>
> thanks for your ideas and contribution.
>
> A few notes: I would leave intersect1d as it is and create a new function
> with another name for that (any proposals?). Considering that most of the
> arraysetops functions are based on sort, and in particular here that an
> intersection array is (usually) smaller than each of the input arrays, it
> might be better just to call intersect1d repeatedly for each array and the
> result of the previous call, accumulating the intersection.
>
> r.
Hi Robert,

Yeah, I suppose sorting will get progressively slower the more input arrays
there are, and the longer each one gets. There's probably some crossover point
where the cost of doing a Python loop over the input arrays to accumulate the
intersection is less than the cost of doing a big sort. That would take some
benchmarking...
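For comparison, the accumulate-by-repeated-calls approach could look roughly like the sketch below. The name intersect_many is hypothetical (no new name has been settled on), and the sketch simply leans on the existing two-array np.intersect1d:

```python
from functools import reduce

import numpy as np


def intersect_many(arrays):
    """Hypothetical helper: intersect any number of 1D arrays pairwise."""
    arrays = list(arrays)
    if not arrays:
        return np.array([])
    # Start from the sorted, unique values of the first array so that the
    # single-array case also returns sorted, unique values.
    first = np.unique(arrays[0])
    # Each step intersects the running result with the next array, so the
    # accumulated array can only stay the same size or shrink.
    return reduce(np.intersect1d, arrays[1:], first)


result = intersect_many([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 9]])
print(result)
```

Each call sorts only the running intersection plus one input, which is where the possible advantage over one big sort would come from.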
I forgot to handle the cases where the number of arrays passed is 0 or 1. Here's
an updated version:
import numpy as np

def intersect1d(arrays, assume_unique=False):
    """Find the intersection of any number of 1D arrays.

    Return the sorted, unique values that are in all of the input arrays.
    Adapted from numpy.lib.arraysetops.intersect1d"""
    N = len(arrays)
    if N == 0:
        return np.asarray(arrays)
    arrays = list(arrays)  # allow assignment
    if not assume_unique:
        for i, arr in enumerate(arrays):
            arrays[i] = np.unique(arr)
    aux = np.concatenate(arrays)  # one long 1D array
    aux.sort()  # sorted
    if N == 1:
        return aux
    shift = N - 1
    # A value present in all N arrays fills N consecutive slots of aux.
    # Index the trimmed array so the mask and the data have equal length.
    return aux[:-shift][aux[shift:] == aux[:-shift]]
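To see why the shifted comparison picks out the common values, it may help to trace it on a small input. This is only an illustrative sketch with made-up arrays; it indexes the trimmed array aux[:-shift] so that the boolean mask and the indexed array have the same length:

```python
import numpy as np

# Three made-up input arrays; 2 and 3 are the only values present in all.
arrays = [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 2, 7])]
N = len(arrays)

# After np.unique, each value occurs at most once per array, so it occurs
# at most N times in the concatenation.
aux = np.concatenate([np.unique(a) for a in arrays])
aux.sort()
print(aux)     # [1 2 2 2 3 3 3 4 7]

# aux[i] == aux[i + N - 1] holds only when a value fills N consecutive
# slots, i.e. when it came from every one of the N arrays.
shift = N - 1
mask = aux[shift:] == aux[:-shift]
common = aux[:-shift][mask]
print(common)  # [2 3]
```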