Tim Hochberg wrote:
Norbert Nemec wrote:
unique1d is based on ediff1d, so it actually computes the differences between consecutive elements and compares them to 0.0.
This is inefficient, even if the cost is hidden by the general inefficiency of Python. (It might be the reason for the two milliseconds, though.)
What is more, subtraction works only for numbers, while the various proposed versions use only comparison, which works for any data type (as long as it can be sorted).
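To illustrate the point about comparison versus subtraction, here is a minimal sketch. The function name `unique_by_comparison` and the sample data are made up for this example and are not taken from any of the proposed versions. It sorts a flattened copy and keeps each element that differs from its predecessor, so it never needs to subtract anything.

```python
import numpy as np

def unique_by_comparison(values):
    """Sort a flattened copy, then keep each element that differs from its predecessor."""
    tmp = np.asarray(values).flatten()   # flatten() returns a copy, the input is untouched
    tmp.sort()
    if tmp.size == 0:
        return tmp
    # The first element is always kept; the rest are kept where a new value starts.
    keep = np.concatenate(([True], tmp[1:] != tmp[:-1]))
    return tmp[keep]

words = np.array(["pear", "apple", "pear", "fig"])
print(unique_by_comparison(words))

# Subtraction is not defined for string arrays, so a difference-based
# approach cannot handle this input at all.
try:
    words[1:] - words[:-1]
except TypeError as exc:
    print("subtraction failed:", type(exc).__name__)
```

The same function works unchanged on numeric arrays, since `!=` is defined for them as well.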
My first question is: why? What's the attraction in returning a sorted answer here? Returning an unsorted array is potentially faster, depending on the algorithm chosen, and sorting after the fact is trivial. If one was going to spend extra complexity on something, I'd think it would be better spent on preserving the input order.
Second, some objects can be compared for equality and hashed, but not sorted (Python's complex numbers come to mind). If one is going to worry about subtraction so as to keep things general, it makes sense to avoid sorting as well, Sasha's slick algorithm notwithstanding.
Third, whatever the outcome of the sorting issue, I propose that unique have the same interface as the other structural array operations. That is:
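A rough sketch of what a non-sorting variant could look like, assuming the items are hashable. The name `unique_unsorted` is hypothetical and this is not one of the versions posted in the thread. It relies only on hashing and equality, and it preserves the order of first appearance.

```python
def unique_unsorted(items):
    """Return the distinct items in order of first appearance (hash and equality only)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

values = [2 + 1j, 1 - 1j, 2 + 1j, 3j, 1 - 1j]
print(unique_unsorted(values))

# Complex numbers have no ordering, so a sort-based unique fails on them.
try:
    sorted(values)
except TypeError:
    print("complex numbers cannot be sorted")
```

If a sorted result is wanted and the items support ordering, the caller can sort the returned list afterwards.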
unique(anarray, axis=0): ...
The default axis=0 is for compatibility with the other, somewhat similar functions. axis=None would return the flattened, uniquified data; axis=# would uniquify the result along that axis.
Hmmm. Of course that precludes it returning an actual array for axis!=None. That might be considered suboptimal...
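The problem with returning an array can be seen in a small sketch of the proposed interface as read here. The function name `unique_along_axis` is hypothetical, and `np.unique` is used only as a convenient 1-D building block. Each 1-D slice along the chosen axis may have a different number of distinct values, so the natural return type is a list of arrays rather than one rectangular array.

```python
import numpy as np

def unique_along_axis(arr, axis=0):
    """Uniquify each 1-D slice taken along `axis`; axis=None flattens first."""
    arr = np.asarray(arr)
    if axis is None:
        return np.unique(arr.ravel())
    # Move the chosen axis last so that each row of `lanes` is one slice along it.
    lanes = np.moveaxis(arr, axis, -1).reshape(-1, arr.shape[axis])
    return [np.unique(lane) for lane in lanes]

a = np.array([[1, 1, 2],
              [3, 3, 3]])
print(unique_along_axis(a, axis=1))     # slices of different lengths
print(unique_along_axis(a, axis=None))  # a single flat array
```

For `axis=1` the two rows reduce to two and one distinct values respectively, which cannot be packed into a regular 2-D array without padding.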
Regards,
-tim
My own version tried to capture all possible cases that the current unique captures.
Sasha's version only works for numpy arrays and has a problem for arrays with all identical entries.
David's version only works for numpy arrays of types that can be converted to float.
I would once more propose to use my own version as given before:
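    from numpy import asarray, concatenate

    def unique(arr, sort=True):
        if hasattr(arr, 'flatten'):
            # numpy array: sort a flattened copy and keep each element
            # that differs from its predecessor
            tmp = arr.flatten()
            tmp.sort()
            idx = concatenate(([True], tmp[1:] != tmp[:-1]))
            return tmp[idx]
        else:
            # for compatibility: any sequence of hashable items
            seen = {}
            for item in arr:
                seen[item] = None
            if sort:
                return asarray(sorted(seen.keys()))
            else:
                return asarray(list(seen.keys()))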
Greetings, Norbert
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion