Tim Hochberg wrote:

Norbert Nemec wrote:

unique1d is based on ediff1d, so it really calculates many differences and compares those to 0.0.

This is inefficient, even though the cost is hidden by the general overhead of Python (it might be the reason for the two milliseconds, though).

What is more: subtraction works only for numbers, while the various proposed versions use only comparison, which works for any data type (as long as it can be sorted).
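[A minimal sketch of the two approaches being contrasted; the function names are mine, not from numpy. Both sort first and then drop repeated neighbors, but the first needs subtraction while the second needs only `!=`:]

```python
import numpy as np

def unique_via_diff(a):
    # ediff1d-style: sort, subtract adjacent elements, keep where the
    # difference is nonzero -- works only for subtractable dtypes
    s = np.sort(a)
    return np.concatenate(([s[0]], s[1:][s[1:] - s[:-1] != 0]))

def unique_via_compare(a):
    # comparison-only: sort, keep where the adjacent element differs --
    # works for any dtype that can be sorted, no subtraction needed
    s = np.sort(a)
    return np.concatenate(([s[0]], s[1:][s[1:] != s[:-1]]))

print(unique_via_diff(np.array([3, 1, 2, 3, 1])))
print(unique_via_compare(np.array([3, 1, 2, 3, 1])))
```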

My first question is: why? What's the attraction of returning a sorted answer here? Returning an unsorted array is potentially faster, depending on the algorithm chosen, and sorting after the fact is trivial. If one were going to spend extra complexity on something, I'd think it would be better spent on preserving the input order.

One issue is that uniquifying numpy arrays using Python dicts, while O(n) in expected complexity, is really slow compared to sorting because of the overhead of getting the elements out of the numpy array and into Python objects. For the cases where sorting works (your caveat below is quite correct), it's really quite good for numpy arrays. OTOH, if one were to implement a hash table in C, that might be both faster and more robust, but that would spend a *large* amount of extra complexity.
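[A sketch of the trade-off, with names of my own invention. The dict version does expected O(n) work but boxes every element into a Python object; the sort version does O(n log n) work that stays in compiled loops, which in practice tends to win for numpy arrays:]

```python
import numpy as np

def unique_via_dict(a):
    # Expected O(n), but each element is converted to a Python object;
    # dict keys keep first-occurrence order (Python 3.7+ dicts are ordered)
    return np.array(list(dict.fromkeys(a.tolist())))

def unique_via_sort(a):
    # O(n log n), but the sort and the comparisons run in compiled code
    s = np.sort(a)
    return np.concatenate(([s[0]], s[1:][s[1:] != s[:-1]]))

a = np.array([3, 1, 2, 3, 1])
print(unique_via_dict(a))   # preserves input order
print(unique_via_sort(a))   # returns sorted order
```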

Second, some objects can be compared for equality and hashed, but not sorted (Python's complex numbers come to mind). If one is going to worry about subtraction so as to keep things general, it makes sense to avoid sorting as well, Sasha's slick algorithm notwithstanding.
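[A quick illustration of the complex-number case. Python-level complex values hash and compare for equality, so a dict-based uniquify works, but Python defines no ordering for them, so `sorted()` raises. (numpy's own `np.sort` does impose a lexicographic order on complex arrays, so the limitation bites at the Python-object level.)]

```python
a = [1 + 2j, 1 + 2j, 3 + 0j]

# Complex values hash and compare for equality, so a dict-based
# uniquify works fine, preserving first-occurrence order:
uniq = list(dict.fromkeys(a))
print(uniq)

# ...but Python defines no ordering for complex, so sorting fails:
try:
    sorted(a)
except TypeError:
    print("complex values cannot be sorted")
```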

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco