[Numpy-discussion] intersect1d and setmember1d

Fri Feb 27 04:15:59 EST 2009

Zachary Pincus wrote:
> Hi,
> 
>> intersect1d and setmember1d don't give expected results in case
>>  there are duplicate values in either array because they work by 
>> sorting data and subtracting the previous value. Is there an 
>> alternative in numpy to get indices of intersected values?
> 
> From the docstring for setmember1d (and other set operations), you 
> are only supposed to pass it arrays with unique values (i.e. arrays
>  that represent sets in the mathematical sense):
> 
> >>> print numpy.setmember1d.__doc__
> Return a boolean array set True where first element is in second 
> array.
> 
> Boolean array is the shape of `ar1` containing True where the 
> elements of `ar1` are in `ar2` and False otherwise.
> 
> Use unique1d() to generate arrays with only unique elements to use as
>  inputs to this function. [...]
> 
> As stated, use unique1d to generate set-arrays from your input.
> 
> On the other hand, intersect1d is supposed to work with repeated 
> elements:
> >>> print numpy.intersect1d.__doc__
> Intersection returning repeated or unique elements common to both 
> arrays.
> 
> Parameters
> ----------
> ar1, ar2 : array_like
>     Input arrays.
> 
> Returns
> -------
> out : ndarray, shape(N,)
>     Sorted 1D array of common elements with repeating elements.
> 
> See Also
> --------
> intersect1d_nu : Returns only unique common elements.
> [...]
> 
> Do you have an example of intersect1d not working right? If so, what
>  version of numpy are you using (print numpy.version.version)?
> 
> Zach

Hi,

yes, many functions in arraysetops.py ('intersect1d', 'setxor1d',
'setmember1d', 'union1d', 'setdiff1d') were originally meant to work 
with arrays of unique elements as inputs. I have just noticed that the 
docstring of intersect1d says that it works for non-unique arrays and 
contains the following example:

 >>> np.intersect1d([1,3,3],[3,1,1])
     array([1, 1, 3, 3])

I am not sure whether this is useful behaviour - does anybody use this 
"feature" (or better, side-effect)?
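
To illustrate where the repeated values come from, here is a minimal sketch of the sort-and-compare approach described in the original question. The helper name sorted_neighbour_intersect is hypothetical, and this is only an assumed reconstruction of the technique, not a copy of the code in arraysetops.py:

```python
import numpy as np

def sorted_neighbour_intersect(ar1, ar2):
    """Hypothetical helper: intersect by sorting and comparing neighbours."""
    # Pool both inputs and sort, so equal values end up side by side.
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    # Keep every element that equals its right-hand neighbour. With unique
    # inputs a value can appear at most twice, so each common value is kept
    # once; with duplicates the same value is kept several times.
    return aux[:-1][aux[1:] == aux[:-1]]

with_duplicates = sorted_neighbour_intersect([1, 3, 3], [3, 1, 1])
unique_inputs = sorted_neighbour_intersect([1, 2, 4, 3], [3, 1, 5])
print(with_duplicates)  # [1 1 3 3]
print(unique_inputs)    # [1 3]
```

Under this assumption the repeated output is a by-product of the neighbour comparison rather than a designed multiset intersection.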

I would like to change the example to the usual use case:
In [9]: np.intersect1d([1,2,4,3],[3,1,5])
Out[9]: array([1, 3])

For arrays with non-unique elements, there is:

In [11]: np.intersect1d_nu([1,3,3],[3,1,1])
Out[11]: array([1, 3])

which just calls unique1d() for its arguments.
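
For the original question about indices, a rough sketch follows. It assumes a recent NumPy in which np.unique, np.isin and the assume_unique argument of np.intersect1d are available (np.isin did not exist when this thread was written), and the name intersect_unique_sketch is hypothetical:

```python
import numpy as np

def intersect_unique_sketch(ar1, ar2):
    """Hypothetical stand-in for the dedupe-then-intersect idea."""
    # Reduce both inputs to sorted arrays of unique values first.
    u1 = np.unique(ar1)
    u2 = np.unique(ar2)
    # The inputs are now proper sets, so no further deduplication is needed.
    return np.intersect1d(u1, u2, assume_unique=True)

a = np.array([1, 3, 3, 7])
b = np.array([3, 1, 1])

common = intersect_unique_sketch(a, b)  # values present in both arrays
mask = np.isin(a, common)               # True where an element of a is common
indices = np.nonzero(mask)[0]           # positions in a of the common values

print(common)   # [1 3]
print(indices)  # [0 1 2]
```

The mask is computed against the original array a, so repeated values in a each get their own index.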

cheers,
r.