[Numpy-discussion] intersect1d for N input arrays

Fri Oct 16 18:01:54 EDT 2009

Robert Cimrman <cimrman3 <at> ntc.zcu.cz> writes:

> 
> Hi Martin,
> 
> thanks for your ideas and contribution.
> 
> A few notes: I would let intersect1d as it is, and created a new function with
another name for that (any
> proposals?). Considering that most of arraysetops functions are based on sort,
and in particular here
> that an intersection array is (usually) smaller than each of the input arrays,
it might be better just to
> call intersect1d repeatedly for each array and the result of the previous
call, accumulating the intersection.
> 
> r.

Hi Robert,

Yeah, I suppose sorting will get progressively slower the more input arrays
there are, and the longer each one gets. There's probably some crossover point
where the cost of doing a Python loop over the input arrays to accumulate the
intersection is less than the cost of doing a big sort. That would take some
benchmarking...

I forgot to handle the cases where the number of arrays passed is 0 or 1. Here's
an updated version:

def intersect1d(arrays, assume_unique=False):
    """Find the intersection of any number of 1D arrays.
    Return the sorted, unique values that are in all of the input arrays.
    Adapted from numpy.lib.arraysetops.intersect1d"""
    N = len(arrays)
    if N == 0:
        return np.asarray(arrays)
    arrays = list(arrays) # allow assignment
    if not assume_unique:
        for i, arr in enumerate(arrays):
            arrays[i] = np.unique(arr)
    aux = np.concatenate(arrays) # one long 1D array
    aux.sort() # sorted
    if N == 1:
        return aux
    shift = N-1
    return aux[aux[shift:] == aux[:-shift]]