[Numpy-discussion] member1d and unique elements

Robert Cimrman cimrman3 at ntc.zcu.cz
Tue Aug 5 12:00:08 EDT 2008


Greg Novak wrote:
> I have two arrays of integers, and would like to know _where_ they
> have elements in common, not just _which_ elements are in common.
> This is because the entries in the integer array are aligned with
> other arrays.  This seems very close to what member1d advertises as
> its function.  However, member1d says that it expects arrays with only
> unique elements.
> 
> First of all, my desired operation is well-posed:  I'd like f(ar1,
> ar2) to return something in the shape of ar1 with True if the value at
> that position appears anywhere in ar2 (regardless of duplication) and
> False otherwise.
> 
> So I looked at the code and have two questions:
> 1) What is this code trying to achieve?
>     aux = perm[ii+1]
>     perm[ii+1] = perm[ii]
>     perm[ii] = aux
> 
> Here perm is the stable argsort of the two concatenated arguments:
> perm = concatenate((ar1, ar2)).argsort(kind='mergesort').
> arr is the array of combined inputs in sorted order:
> arr = concatenate((ar1, ar2))[perm]
> and ii is a list of indices into arr where the value of arr is equal
> to the next value in the array (arr[ii] == arr[ii+1]) _and_ arr[ii]
> came from the _second_ input (ar2).
> 
> Now, this last bit (looking for elements of arr that are equal and
> both came from the second array) is clearly trying to deal with
> duplication, which is why I'm interested...
> 
> So, the code snippet is trying to swap perm[ii+1] with perm[ii], but I
> don't see why.  Furthermore, there are funny results if a value is
> duplicated three times, not just twice -- perm is no longer a
> permutation vector.  Eg, member1d([1], [2,2,2]) results perm=[0,1,2,3]
> and ii=[1,2] before the above snippet, and the above snippet makes
> perm into [0,2,3,2]
> 
> I've commented those three lines, and I've never seen any changes to
> the output of member1d.  The new value of perm is used to compute the
> expression: perm.argsort(kind='mergesort')[:len( ar1 )], but the
> changes to that expression as a result of the above three lines are
> always at the high end of the array, which is sliced off by the last
> [:len(ar1)].
> 
> Finally, my second question is:
> 2) Does anyone have a test case where member1d fails as a result of
> duplicates in the input?  So far I haven't found any, with the above
> lines commented or not.
> 
> Upon reflection and review of the changelog, another theory occurs to
> me: member1d did not originally use a stable sort.  What I've written
> above for interpretation of the value ii (indicates duplication within
> ar2) is true for a stable sort, but for an unstable sort the same
> condition has the interpretation that ii holds the values where the
> sorting algorithm swapped the order of equal values unstably.  Then
> the code snippet in question 1) looks like an attempt to swap those
> values in the permutation array to make the sort stable again.  The
> attempt would fail if there was duplication in either array.
> 
> So, I would propose deleting those three lines (since they seem to be
> a non-functional relic) and declaring in the docstring that member1d
> doesn't require unique elements.
> 
> Also, if this is correct, then the function simplifies considerably
> since several values don't need to be computed anymore:
> 
> def setmember1d( ar1, ar2 ):
>     ar = nm.concatenate( (ar1, ar2 ) )
>     perm = ar.argsort(kind='mergesort')
>     aux = ar[perm]
>     flag = nm.concatenate( (aux[1:] == aux[:-1], [False] ) )
>     indx = perm.argsort(kind='mergesort')[:len( ar1 )]
>     return flag[indx]
> 
> Corrections to the above are welcome since I'm going to start using
> member1d without regard for uniqueness, and I'd like to know if I'm
> making a big mistake...

Hi Greg,

I do not have much time to investigate it in detail right now, but it 
does not work for repeated entries in ar1:

In [14]: nm.setmember1d( [1,2,3,2], [1, 3] )
Out[14]: array([ True,  True,  True, False], dtype=bool)

thanks for trying to enhance arraysetops!
r.



More information about the NumPy-Discussion mailing list