[Numpy-discussion] member1d and unique elements
Robert Cimrman
cimrman3 at ntc.zcu.cz
Tue Aug 5 12:00:08 EDT 2008
Greg Novak wrote:
> I have two arrays of integers, and would like to know _where_ they
> have elements in common, not just _which_ elements are in common.
> This is because the entries in the integer array are aligned with
> other arrays. This seems very close to what member1d advertises as
> its function. However, member1d says that it expects arrays with only
> unique elements.
>
> First of all, my desired operation is well-posed: I'd like f(ar1,
> ar2) to return something in the shape of ar1 with True if the value at
> that position appears anywhere in ar2 (regardless of duplication) and
> False otherwise.
>
> So I looked at the code and have two questions:
> 1) What is this code trying to achieve?
> aux = perm[ii+1]
> perm[ii+1] = perm[ii]
> perm[ii] = aux
>
> Here perm is the stable argsort of the two concatenated arguments:
> perm = concatenate((ar1, ar2)).argsort(kind='mergesort').
> arr is the array of combined inputs in sorted order:
> arr = concatenate((ar1, ar2))[perm]
> and ii is a list of indices into arr where the value of arr is equal
> to the next value in the array (arr[ii] == arr[ii+1]) _and_ arr[ii]
> came from the _second_ input (ar2).
>
> Now, this last bit (looking for elements of arr that are equal and
> both came from the second array) is clearly trying to deal with
> duplication, which is why I'm interested...
>
> So, the code snippet is trying to swap perm[ii+1] with perm[ii], but I
> don't see why. Furthermore, there are funny results if a value is
> duplicated three times, not just twice -- perm is no longer a
> permutation vector. Eg, member1d([1], [2,2,2]) results perm=[0,1,2,3]
> and ii=[1,2] before the above snippet, and the above snippet makes
> perm into [0,2,3,2]
>
> I've commented those three lines, and I've never seen any changes to
> the output of member1d. The new value of perm is used to compute the
> expression: perm.argsort(kind='mergesort')[:len( ar1 )], but the
> changes to that expression as a result of the above three lines are
> always at the high end of the array, which is sliced off by the last
> [:len(ar1)].
>
> Finally, my second question is:
> 2) Does anyone have a test case where member1d fails as a result of
> duplicates in the input? So far I haven't found any, with the above
> lines commented or not.
>
> Upon reflection and review of the changelog, another theory occurs to
> me: member1d did not originally use a stable sort. What I've written
> above for interpretation of the value ii (indicates duplication within
> ar2) is true for a stable sort, but for an unstable sort the same
> condition has the interpretation that ii holds the values where the
> sorting algorithm swapped the order of equal values unstably. Then
> the code snippet in question 1) looks like an attempt to swap those
> values in the permutation array to make the sort stable again. The
> attempt would fail if there was duplication in either array.
>
> So, I would propose deleting those three lines (since they seem to be
> a non-functional relic) and declaring in the docstring that member1d
> doesn't require unique elements.
>
> Also, if this is correct, then the function simplifies considerably
> since several values don't need to be computed anymore:
>
> def setmember1d( ar1, ar2 ):
> ar = nm.concatenate( (ar1, ar2 ) )
> perm = ar.argsort(kind='mergesort')
> aux = ar[perm]
> flag = nm.concatenate( (aux[1:] == aux[:-1], [False] ) )
> indx = perm.argsort(kind='mergesort')[:len( ar1 )]
> return flag[indx]
>
> Corrections to the above are welcome since I'm going to start using
> member1d without regard for uniqueness, and I'd like to know if I'm
> making a big mistake...
Hi Greg,
I do not have much time to investigate it in detail right now, but it
does not work for repeated entries in ar1:
In [14]: nm.setmember1d( [1,2,3,2], [1, 3] )
Out[14]: array([ True, True, True, False], dtype=bool)
thanks for trying to enhance arraysetops!
r.
More information about the NumPy-Discussion
mailing list