Here is a quick benchmark comparing numpy's unique and unique1d with Sasha's unique:
x = rand(100000)*100
x = x.astype('i')
%timeit unique(x)
10 loops, best of 3: 525 ms per loop
%timeit unique_sasha(x)
100 loops, best of 3: 10.7 ms per loop
%timeit unique1d(x)
100 loops, best of 3: 12.6 ms per loop
So I wonder: what is the added value of unique?
Could unique1d simply become unique?
Cheers,
David
P.S.
I modified Sasha's version to handle the case where all elements are identical; the original returned an empty array for such input.
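from numpy import empty, nan, sort

def unique_sasha(x):
    s = sort(x)
    # Shift the sorted values left by one; a float buffer is used so the
    # last slot can hold NaN, which never compares equal to s[-1].
    r = empty(s.shape, float)
    r[:-1] = s[1:]
    r[-1] = nan
    # Keep each element that differs from its successor.
    return s[r != s]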
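
To make the edge case concrete, here is a small sketch that puts Sasha's draft next to the modified version. The names unique_sasha_original and unique_mask are only illustrative and not part of numpy. unique_mask is one possible alternative that compares each element with its predecessor instead of writing NaN into a float buffer, so nothing is cast to float. It uses the np. namespace rather than the bare names used elsewhere in this thread.

```python
import numpy as np

def unique_sasha_original(x):
    # Sasha's draft as posted: the last slot wraps around to s[0].
    s = np.sort(x)
    r = np.empty_like(s)
    r[:-1] = s[1:]
    r[-1] = s[0]
    return s[r != s]

def unique_sasha(x):
    # The modified version: NaN in the last slot never equals s[-1].
    s = np.sort(x)
    r = np.empty(s.shape, float)
    r[:-1] = s[1:]
    r[-1] = np.nan
    return s[r != s]

def unique_mask(x):
    # Illustrative variant (an assumption, not from the thread): keep s[0],
    # then every element that differs from its predecessor.
    s = np.sort(x)
    keep = np.ones(s.shape, dtype=bool)
    keep[1:] = s[1:] != s[:-1]
    return s[keep]

same = np.array([7, 7, 7])
mixed = np.array([3, 1, 3, 2, 1])

print(unique_sasha_original(same))  # empty: every comparison is False
print(unique_sasha(same))           # one element survives
print(unique_mask(same))
print(unique_mask(mixed))
```

With identical input, the wrap-around in the draft makes r equal to s everywhere, so nothing is selected. Either the NaN sentinel or the predecessor comparison avoids that.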
Sasha wrote:
> On 7/2/06, Norbert Nemec <Norbert.Nemec.list@gmx.de> wrote:
>> ...
>> Does anybody know about the internals of the python "set"? How is
>> .keys() implemented? I somehow have real doubts about the efficiency
>> of this method.
>>
> Set implementation (Objects/setobject.c) is a copy and paste job from
> dictobject with values removed. As a result it is heavily optimized
> for the case of string valued keys - a case that is almost irrelevant
> for numpy.
>
> I think something like the following (untested, 1d only) will probably
> be much faster and sorted:
>
> def unique(x):
>     s = sort(x)
>     r = empty_like(s)
>     r[:-1] = s[1:]
>     r[-1] = s[0]
>     return s[r != s]
There are 1d array set operations like this already in numpy
(numpy/lib/arraysetops.py - unique1d, ...)
r.
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion