[Numpy-discussion] unique() should return a sorted array
Tim Hochberg
tim.hochberg at cox.net
Tue Jul 11 12:02:21 EDT 2006
Norbert Nemec wrote:
> unique1d is based on ediff1d, so it really calculates many differences
> and compares those to 0.0
>
> This is inefficient, even though this is hidden by the general
> inefficiency of Python (It might be the reason for the two milliseconds,
> though)
>
> What is more: subtraction works only for numbers, while the various
> proposed versions use only comparison which works for any data type (as
> long as it can be sorted)
>
My first question is: why? What's the attraction in returning a sorted
answer here? Returning an unsorted array is potentially faster,
depending on the algorithm chosen, and sorting after the fact is
trivial. If one was going to spend extra complexity on something, I'd
think it would be better spent on preserving the input order.
Second, some objects can be compared for equality and hashed, but not
sorted (Python's complex numbers come to mind). If one is going to
worry about subtraction so as to keep things general, it makes sense to
avoid sorting as well, Sasha's slick algorithm notwithstanding.
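A small sketch of that case, using plain Python objects rather than NumPy arrays. The helper name unique_hashable is hypothetical, and the order-preserving behaviour relies on dicts keeping insertion order (Python 3.7 or later, an assumption here). It needs only hashing and equality, whereas sorted() fails on the same input.

```python
def unique_hashable(seq):
    """Drop duplicates using only hashing and equality (hypothetical helper)."""
    seen = {}
    for item in seq:
        # Dict keys are unique; insertion order is kept in Python 3.7+.
        seen.setdefault(item, None)
    return list(seen)

values = [1 + 2j, 3j, 1 + 2j, 2 + 0j, 3j]
uniq = unique_hashable(values)

# Python's complex numbers define == and hash() but no ordering,
# so a sort-based approach cannot handle them.
try:
    sorted(values)
    sortable = True
except TypeError:
    sortable = False
```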
Third, whatever the outcome of the sorting issue, I would
propose that unique have the same interface as the other structural
array operations. That is:
    def unique(anarray, axis=0):
        ...
The default axis=0 is for compatibility with the other, somewhat similar
functions. axis=None would return the flattened, uniquified data, while
an integer axis would uniquify the data along that axis.
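As a rough illustration of those semantics, the sketch below wraps np.unique, assuming a NumPy version that supports its axis argument. The wrapper name unique_along is hypothetical. Note that np.unique itself defaults to axis=None and returns sorted results, so only the axis handling here mirrors the proposal.

```python
import numpy as np

def unique_along(anarray, axis=0):
    """Hypothetical wrapper using the proposed default of axis=0."""
    # axis=None flattens the input first; an integer axis treats each
    # slice along that axis (e.g. each row for axis=0) as one item.
    return np.unique(np.asarray(anarray), axis=axis)

a = np.array([[1, 2], [1, 2], [3, 4]])
rows = unique_along(a)              # duplicate rows removed
flat = unique_along(a, axis=None)   # unique scalars of the flattened data
```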
Regards,
-tim
> My own version tried to capture all possible cases that the current
> unique captures.
>
> Sasha's version only works for numpy arrays and has a problem for arrays
> with all identical entries.
>
> David's version only works for numpy arrays of types that can be
> converted to float.
>
> I would once more propose to use my own version as given before:
>
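> from numpy import asarray, concatenate
>
> def unique(arr, sort=True):
>     if hasattr(arr, 'flatten'):
>         # NumPy array: sort a flattened copy, then keep each value
>         # that differs from its predecessor
>         tmp = arr.flatten()
>         tmp.sort()
>         idx = concatenate(([True], tmp[1:] != tmp[:-1]))
>         return tmp[idx]
>     else:  # for compatibility:
>         # generic sequence: use dict keys to drop duplicates
>         seen = {}
>         for item in arr:
>             seen[item] = None
>         if sort:
>             return asarray(sorted(seen.keys()))
>         else:
>             return asarray(list(seen.keys()))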
>
>
> Greetings,
> Norbert