Re: [Numpy-discussion] unique() should return a sorted array

July 11, 2006


Norbert Nemec wrote:
...
unique1d is based on ediff1d, so it really calculates many differences
and compares those to 0.0.
This is inefficient, even though this is hidden by the general
inefficiency of Python (it might be the reason for the two milliseconds,
though).
What is more: subtraction works only for numbers, while the various
proposed versions use only comparison, which works for any data type (as
long as it can be sorted).
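To illustrate the point about subtraction versus comparison, here is a minimal sketch. It is not the actual unique1d source; the names unique_by_difference and unique_by_comparison are made up for this example, and it assumes a reasonably recent NumPy.

```python
import numpy as np

def unique_by_difference(arr):
    # Difference-based: subtract neighbours and test for zero.
    # Only works for dtypes that support subtraction.
    tmp = np.sort(np.asarray(arr).ravel())
    if tmp.size == 0:
        return tmp
    keep = np.concatenate(([True], np.ediff1d(tmp) != 0))
    return tmp[keep]

def unique_by_comparison(arr):
    # Comparison-based: compare neighbours directly with !=.
    # Works for any dtype that can be sorted and compared.
    tmp = np.sort(np.asarray(arr).ravel())
    if tmp.size == 0:
        return tmp
    keep = np.concatenate(([True], tmp[1:] != tmp[:-1]))
    return tmp[keep]

numbers = np.array([3, 1, 2, 3, 1])
words = np.array(["b", "a", "b", "c"])

ints_diff = unique_by_difference(numbers)
ints_cmp = unique_by_comparison(numbers)
strs_cmp = unique_by_comparison(words)

# Subtraction is not defined for strings, so the difference-based
# variant is expected to fail here.
try:
    unique_by_difference(words)
    diff_works_on_strings = True
except TypeError:
    diff_works_on_strings = False
```

Both variants agree on the integer input; only the comparison-based one handles the string array.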
I agree that unique1d works only for numbers, but that is what it was
meant for... well, for integers only, in fact - everyone here surely
knows that comparing floats with != does not work well.
Note also that it was written before logical indexing and other neat
stuff were possible in numpy - every improvement is welcome!
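A small sketch of why exact float comparison is a problem for a sort-and-compare unique. The tolerance-based variant is only an illustration (it relies on the default tolerances of np.isclose and is not something proposed in this thread):

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
values = np.array([0.3, 0.1 + 0.2, 0.3])
tmp = np.sort(values)

# Exact comparison: the two "0.3" values that differ in the last bit
# are both kept.
exact_keep = np.concatenate(([True], tmp[1:] != tmp[:-1]))
exact_unique = tmp[exact_keep]

# Tolerance-based comparison of neighbours (assumption: default
# rtol/atol of np.isclose are acceptable). Note that chains of nearly
# equal values can collapse, since closeness is not transitive.
close_keep = np.concatenate(([True], ~np.isclose(tmp[1:], tmp[:-1])))
tolerant_unique = tmp[close_keep]
```

With exact comparison two entries survive; with the tolerance only one does.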

(btw. I cannot recall why I used subtraction and testing for zero
instead of just comparisons - maybe remnants from my old MATLAB days and
its logical arrays. ediff1d should disappear from the other functions
in arraysetops.)
...
My own version tried to capture all possible cases that the current
unique captures.
Sasha's version only works for numpy arrays and has a problem for arrays
with all identical entries.
David's version only works for numpy arrays of types that can be
converted to float.
comparing floats...
...
I would once more propose to use my own version as given before:
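from numpy import asarray, concatenate

def unique(arr, sort=True):
    if hasattr(arr, 'flatten'):
        tmp = arr.flatten()  # flatten() returns a copy, so arr is untouched
        tmp.sort()
        if tmp.size == 0:  # nothing to compare in an empty array
            return tmp
        # keep the first element and every element that differs
        # from its predecessor in the sorted array
        idx = concatenate(([True], tmp[1:] != tmp[:-1]))
        return tmp[idx]
    else:  # for compatibility: plain Python sequences
        seen = {}  # dict used as a set; keys are the unique items
        for item in arr:
            seen[item] = None
        if sort:
            return asarray(sorted(seen.keys()))
        else:
            return asarray(list(seen.keys()))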
Have you considered using set instead of dict? Just curious :-)
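For the record, a minimal sketch of what that compatibility branch might look like with the builtin set. The name unique_fallback is hypothetical, and the items are assumed to be hashable (and mutually comparable when sort is true):

```python
import numpy as np

def unique_fallback(seq, sort=True):
    # The builtin set removes duplicates directly; no dummy dict
    # values are needed.
    seen = set(seq)
    if sort:
        return np.asarray(sorted(seen))
    # Without sorting, the order of the result is arbitrary.
    return np.asarray(list(seen))

result = unique_fallback([3, 1, 2, 3, 1])
unsorted_result = unique_fallback("abca", sort=False)
```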

r.

Robert Cimrman