[Numpy-discussion] Faster

Hoyt Koepke hoytak at gmail.com
Sat May 3 20:56:15 EDT 2008


You could also try complete linkage, where you merge two clusters
based on the largest distance between points in the two clusters instead
of the smallest.  This tends to produce clusters of similar size (which
isn't always ideal, either).  However, it also uses sufficient
statistics, so it should be trivial to change your code to use that
merge criterion if you want to try it.
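
As a rough sketch of what I mean (I haven't seen your code, so the
function name merge_clusters and the assumption that you keep a full
symmetric cluster-to-cluster distance matrix are mine): when clusters i
and j merge, the merged cluster's distance to every other cluster is the
elementwise max of rows i and j for complete linkage, where single
linkage uses the elementwise min.

```python
import numpy as np

def merge_clusters(dist, i, j, linkage="complete"):
    """Merge clusters i and j in a symmetric distance matrix.

    Hypothetical helper for illustration only. Returns a new matrix
    with one fewer row and column; the merged cluster keeps index i
    (shifted down by one if j < i).
    """
    if i == j:
        raise ValueError("i and j must be different clusters")
    if linkage == "complete":
        # Farthest pair of points decides the cluster distance.
        merged = np.maximum(dist[i], dist[j])
    elif linkage == "single":
        # Closest pair of points decides the cluster distance.
        merged = np.minimum(dist[i], dist[j])
    else:
        raise ValueError("unknown linkage: %r" % (linkage,))

    out = dist.copy()
    out[i, :] = merged
    out[:, i] = merged
    out[i, i] = 0  # a cluster is at distance zero from itself

    # Drop row and column j, which now belong to the merged cluster.
    keep = np.arange(dist.shape[0]) != j
    return out[np.ix_(keep, keep)]
```

Only the two old rows are needed for the update, which is why switching
the criterion is a one-line change.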

--Hoyt





On Sat, May 3, 2008 at 5:31 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Sat, May 3, 2008 at 5:05 PM, Christopher Barker
>  <Chris.Barker at noaa.gov> wrote:
>
> > Robert Kern wrote:
>  >  > I can get a ~20% improvement with the following:
>  >
>  >
>  > > In [9]: def mycut(x, i):
>  >  >    ...:     A = x[:i,:i]
>  >  >    ...:     B = x[:i,i+1:]
>  >  >    ...:     C = x[i+1:,:i]
>  >  >    ...:     D = x[i+1:,i+1:]
>  >  >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>  >
>  >  Might it be a touch faster to build the final array first, then fill it:
>  >
>  >  def mycut(x, i):
>  >      r,c = x.shape
>  >      out = np.empty((r-1, c-1), dtype=x.dtype)
>  >      out[:i,:i] = x[:i,:i]
>  >      out[:i,i:] = x[:i,i+1:]
>  >      out[i:,:i] = x[i+1:,:i]
>  >      out[i:,i:] = x[i+1:,i+1:]
>  >      return out
>  >
>  >  totally untested.
>  >
>  >  That should save the creation of two temporaries.
>
>  Initializing the array makes sense. And it is super fast:
>
>  >> timeit mycut(x, 6)
>  100 loops, best of 3: 7.48 ms per loop
>  >> timeit mycut2(x, 6)
>  1000 loops, best of 3: 1.5 ms per loop
>
>  The time it takes to cluster went from about 1.9 seconds to 0.7
>  seconds! Thank you.
>
>  When I run the single linkage clustering on my data I get one big
>  cluster and a bunch of tiny clusters. So I need to try a different
>  linkage method. Average linkage sounds good, but it sounds hard to
>  code.
>
>
>
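
For reference, here is a small self-contained sketch of the
preallocate-and-fill idea quoted above, checked against np.delete.  The
name mycut_fill is mine, not from the thread.  Note that the last block
has to be written to out[i:, i:] so that its shape matches
x[i+1:, i+1:].

```python
import numpy as np

def mycut_fill(x, i):
    """Return a copy of 2-D array x with row i and column i removed."""
    r, c = x.shape
    # Allocate the result once, then copy the four surviving blocks.
    out = np.empty((r - 1, c - 1), dtype=x.dtype)
    out[:i, :i] = x[:i, :i]          # upper-left block
    out[:i, i:] = x[:i, i + 1:]      # upper-right block
    out[i:, :i] = x[i + 1:, :i]      # lower-left block
    out[i:, i:] = x[i + 1:, i + 1:]  # lower-right block
    return out

# Sanity check against the slower but obviously correct np.delete.
x = np.arange(25).reshape(5, 5)
expected = np.delete(np.delete(x, 2, axis=0), 2, axis=1)
assert np.array_equal(mycut_fill(x, 2), expected)
```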



-- 
+++++++++++++++++++++++++++++++++++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
hoytak at gmail.com
+++++++++++++++++++++++++++++++++++


