member1d and unique elements

I have two arrays of integers, and would like to know _where_ they have elements in common, not just _which_ elements are in common. This is because the entries in the integer array are aligned with other arrays. This seems very close to what member1d advertises as its function. However, member1d says that it expects arrays with only unique elements. First of all, my desired operation is well-posed: I'd like f(ar1, ar2) to return something in the shape of ar1 with True if the value at that position appears anywhere in ar2 (regardless of duplication) and False otherwise. So I looked at the code and have two questions: 1) What is this code trying to achieve? aux = perm[ii+1] perm[ii+1] = perm[ii] perm[ii] = aux Here perm is the stable argsort of the two concatenated arguments: perm = concatenate((ar1, ar2)).argsort(kind='mergesort'). arr is the array of combined inputs in sorted order: arr = concatenate((ar1, ar2))[perm] and ii is a list of indices into arr where the value of arr is equal to the next value in the array (arr[ii] == arr[ii+1]) _and_ arr[ii] came from the _second_ input (ar2). Now, this last bit (looking for elements of arr that are equal and both came from the second array) is clearly trying to deal with duplication, which is why I'm interested... So, the code snippet is trying to swap perm[ii+1] with perm[ii], but I don't see why. Furthermore, there are funny results if a value is duplicated three times, not just twice -- perm is no longer a permutation vector. Eg, member1d([1], [2,2,2]) results perm=[0,1,2,3] and ii=[1,2] before the above snippet, and the above snippet makes perm into [0,2,3,2] I've commented those three lines, and I've never seen any changes to the output of member1d. The new value of perm is used to compute the expression: perm.argsort(kind='mergesort')[:len( ar1 )], but the changes to that expression as a result of the above three lines are always at the high end of the array, which is sliced off by the last [:len(ar1)]. Finally, my second question is: 2) Does anyone have a test case where member1d fails as a result of duplicates in the input? So far I haven't found any, with the above lines commented or not. Upon reflection and review of the changelog, another theory occurs to me: member1d did not originally use a stable sort. What I've written above for interpretation of the value ii (indicates duplication within ar2) is true for a stable sort, but for an unstable sort the same condition has the interpretation that ii holds the values where the sorting algorithm swapped the order of equal values unstably. Then the code snippet in question 1) looks like an attempt to swap those values in the permutation array to make the sort stable again. The attempt would fail if there was duplication in either array. So, I would propose deleting those three lines (since they seem to be a non-functional relic) and declaring in the docstring that member1d doesn't require unique elements. Also, if this is correct, then the function simplifies considerably since several values don't need to be computed anymore: def setmember1d( ar1, ar2 ): ar = nm.concatenate( (ar1, ar2 ) ) perm = ar.argsort(kind='mergesort') aux = ar[perm] flag = nm.concatenate( (aux[1:] == aux[:-1], [False] ) ) indx = perm.argsort(kind='mergesort')[:len( ar1 )] return flag[indx] Corrections to the above are welcome since I'm going to start using member1d without regard for uniqueness, and I'd like to know if I'm making a big mistake... Thanks, Greg

2008/8/3 Greg Novak <novak@ucolick.org>:
First of all, my desired operation is well-posed: I'd like f(ar1, ar2) to return something in the shape of ar1 with True if the value at that position appears anywhere in ar2 (regardless of duplication) and False otherwise.
Just because one-liners are so irresistable:
x array([1, 1, 2, 5, 7])
x2 array([1, 7])
np.logical_or.reduce(x[:,None] == x2, axis=1) array([ True, True, False, False, True], dtype=bool)
(x[:,None] == x2).sum(axis=1) > 0 array([ True, True, False, False, True], dtype=bool)
Creates an MnX temporary, but oh well. Thanks for taking a look at member1d. I'm swamped, so I cannot take part in the conversation, but I hope someone takes note. If no one does, please file a ticket so that we don't lose track of this conversation. Cheers Stéfan

On Sun, Aug 03, 2008 at 07:16:42PM +0200, Stéfan van der Walt wrote:
2008/8/3 Greg Novak <novak@ucolick.org>:
First of all, my desired operation is well-posed: I'd like f(ar1, ar2) to return something in the shape of ar1 with True if the value at that position appears anywhere in ar2 (regardless of duplication) and False otherwise.
Just because one-liners are so irresistable:
Damn, a perl programmer has sneeked in this mailing-list. Please call security. Gaël

Greg Novak wrote:
I have two arrays of integers, and would like to know _where_ they have elements in common, not just _which_ elements are in common. This is because the entries in the integer array are aligned with other arrays. This seems very close to what member1d advertises as its function. However, member1d says that it expects arrays with only unique elements.
First of all, my desired operation is well-posed: I'd like f(ar1, ar2) to return something in the shape of ar1 with True if the value at that position appears anywhere in ar2 (regardless of duplication) and False otherwise.
So I looked at the code and have two questions: 1) What is this code trying to achieve? aux = perm[ii+1] perm[ii+1] = perm[ii] perm[ii] = aux
Here perm is the stable argsort of the two concatenated arguments: perm = concatenate((ar1, ar2)).argsort(kind='mergesort'). arr is the array of combined inputs in sorted order: arr = concatenate((ar1, ar2))[perm] and ii is a list of indices into arr where the value of arr is equal to the next value in the array (arr[ii] == arr[ii+1]) _and_ arr[ii] came from the _second_ input (ar2).
Now, this last bit (looking for elements of arr that are equal and both came from the second array) is clearly trying to deal with duplication, which is why I'm interested...
So, the code snippet is trying to swap perm[ii+1] with perm[ii], but I don't see why. Furthermore, there are funny results if a value is duplicated three times, not just twice -- perm is no longer a permutation vector. Eg, member1d([1], [2,2,2]) results perm=[0,1,2,3] and ii=[1,2] before the above snippet, and the above snippet makes perm into [0,2,3,2]
I've commented those three lines, and I've never seen any changes to the output of member1d. The new value of perm is used to compute the expression: perm.argsort(kind='mergesort')[:len( ar1 )], but the changes to that expression as a result of the above three lines are always at the high end of the array, which is sliced off by the last [:len(ar1)].
Finally, my second question is: 2) Does anyone have a test case where member1d fails as a result of duplicates in the input? So far I haven't found any, with the above lines commented or not.
Upon reflection and review of the changelog, another theory occurs to me: member1d did not originally use a stable sort. What I've written above for interpretation of the value ii (indicates duplication within ar2) is true for a stable sort, but for an unstable sort the same condition has the interpretation that ii holds the values where the sorting algorithm swapped the order of equal values unstably. Then the code snippet in question 1) looks like an attempt to swap those values in the permutation array to make the sort stable again. The attempt would fail if there was duplication in either array.
So, I would propose deleting those three lines (since they seem to be a non-functional relic) and declaring in the docstring that member1d doesn't require unique elements.
Also, if this is correct, then the function simplifies considerably since several values don't need to be computed anymore:
def setmember1d( ar1, ar2 ): ar = nm.concatenate( (ar1, ar2 ) ) perm = ar.argsort(kind='mergesort') aux = ar[perm] flag = nm.concatenate( (aux[1:] == aux[:-1], [False] ) ) indx = perm.argsort(kind='mergesort')[:len( ar1 )] return flag[indx]
Corrections to the above are welcome since I'm going to start using member1d without regard for uniqueness, and I'd like to know if I'm making a big mistake...
Hi Greg, I do not have much time to investigate it in detail right now, but it does not work for repeated entries in ar1: In [14]: nm.setmember1d( [1,2,3,2], [1, 3] ) Out[14]: array([ True, True, True, False], dtype=bool) thanks for trying to enhance arraysetops! r.

Argh. I could swear that yesterday I typed test cases just like the one you provide, and it behaved correctly. Nevertheless, it clearly fails in spite of my memory, so attached is a version which I believe gives the correct behavior. Greg On Tue, Aug 5, 2008 at 9:00 AM, Robert Cimrman <cimrman3@ntc.zcu.cz> wrote:
I do not have much time to investigate it in detail right now, but it does not work for repeated entries in ar1:
In [14]: nm.setmember1d( [1,2,3,2], [1, 3] ) Out[14]: array([ True, True, True, False], dtype=bool)

Hi Greg, Greg Novak wrote:
Argh. I could swear that yesterday I typed test cases just like the one you provide, and it behaved correctly. Nevertheless, it clearly fails in spite of my memory, so attached is a version which I believe gives the correct behavior.
It looks ok now, although I did not check very carefully. Just make sure that it works for empty input arrays (all combinations). I would also set the default to handle_dupes to False to preserve previous functionality. I suggest you refresh your numpy repository from SVN - now np. prefix is used instead of nm. within the module. Thank you! r.
participants (4)
-
Gael Varoquaux
-
Greg Novak
-
Robert Cimrman
-
Stéfan van der Walt