[Numpy-discussion] unique() should return a sorted array

Robert Cimrman cimrman3 at ntc.zcu.cz
Tue Jul 11 12:49:41 EDT 2006


Norbert Nemec wrote:
> unique1d is based on ediff1d, so it really calculates many differences
> and compares those to 0.0
> 
> This is inefficient, even though this is hidden by the general
> inefficiency of Python (It might be the reason for the two milliseconds,
> though)
> 
> What is more: subtraction works only for numbers, while the various
> proposed versions use only comparison which works for any data type (as
> long as it can be sorted)

I agree that unique1d works only for numbers, but that is what it was
meant for... well, for integers only, in fact - everyone here surely
knows that comparing floats with != does not work well.
Note also that it was written before logical indexing and other neat
stuff were possible in numpy - every improvement is welcome!
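
To illustrate the float caveat, here is a small sketch (not taken from
arraysetops; the variable names are made up for illustration, and it
assumes a numpy recent enough to provide np.isclose with its default
tolerances):

```python
import numpy as np

# 0.1 + 0.2 is not bit-identical to 0.3 in binary floating point.
values = np.array([0.1 + 0.2, 0.3, 0.3])
values.sort()

# Exact comparison: sorted neighbours that differ at all count as distinct.
exact_mask = np.concatenate(([True], values[1:] != values[:-1]))
exact_unique = values[exact_mask]

# Tolerance-based comparison: neighbours within np.isclose's default
# tolerances are treated as duplicates (the tolerance is an assumption).
close_mask = np.concatenate(([True], ~np.isclose(values[1:], values[:-1])))
close_unique = values[close_mask]
```

With exact comparison the two "0.3" values survive as separate entries,
while the tolerance-based mask merges them. A tolerance is not a general
fix either, since closeness is not transitive along a sorted run.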

(btw. I cannot recall why I used subtraction and testing for zero
instead of just comparisons - maybe remnants from my old MATLAB days and
its logical arrays. ediff1d should disappear from the other functions
in arraysetops.)
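
For comparison, the two approaches might look roughly like this. This is
a sketch only, not the actual unique1d source; the function names are
hypothetical and non-empty input is assumed:

```python
import numpy as np

def unique_by_difference(ar):
    # Subtraction-based idea: keep sorted elements whose difference to
    # the previous element is nonzero. Needs a type that supports "-".
    s = np.sort(ar, axis=None)  # axis=None flattens before sorting
    keep = np.concatenate(([True], np.ediff1d(s) != 0))
    return s[keep]

def unique_by_comparison(ar):
    # Comparison-based idea: keep sorted elements that differ from the
    # previous element. Needs only sorting and "!=".
    s = np.sort(ar, axis=None)
    keep = np.concatenate(([True], s[1:] != s[:-1]))
    return s[keep]
```

Both give the same result for integers, but only the comparison-based
version is expected to work for types such as strings, where
subtraction is not defined.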

> My own version tried to capture all possible cases that the current
> unique captures.
> 
> Sasha's version only works for numpy arrays and has a problem for arrays
> with all identical entries.
> 
> David's version only works for numpy arrays of types that can be
> converted to float.

comparing floats...

> I would once more propose to use my own version as given before:
> 
> from numpy import asarray, concatenate
> 
> def unique(arr, sort=True):
>     if hasattr(arr, 'flatten'):
>         tmp = arr.flatten()
>         tmp.sort()
>         # keep the first element and every element that differs from its predecessor
>         idx = concatenate(([True], tmp[1:] != tmp[:-1]))
>         return tmp[idx]
>     else:  # for compatibility with plain sequences
>         seen = {}
>         for item in arr:
>             seen[item] = None
>         if sort:
>             return asarray(sorted(seen.keys()))
>         else:
>             return asarray(list(seen.keys()))

Have you considered using set instead of dict? Just curious :-)
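
Something along these lines, as a rough sketch of the fallback branch
only (the name unique_fallback is made up here, and the items are
assumed to be hashable):

```python
def unique_fallback(seq, sort=True):
    # A set drops duplicates directly; no dummy None values are needed.
    seen = set(seq)
    # sorted() returns a new list; without sorting the order is arbitrary.
    return sorted(seen) if sort else list(seen)
```

The result is a plain list here; wrapping it in asarray() would give an
array as in the version quoted above.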

r.




