[Numpy-discussion] Faster

Hoyt Koepke hoytak at gmail.com
Sun May 4 11:11:42 EDT 2008


Another realization after looking at your code again -- you are doing
extra work by recomputing the minimum distance over the whole distance
matrix d at each step.  Since a merge only changes one row and one
column, you can keep a length-n array holding the minimum (or max)
distance for each column.  After merging, you update the entry for the
new column, and rescan the other columns only when the row you just
changed held their cached minimum.  Finding the next closest pair is
then an O(n) scan of that array instead of an O(n^2) scan of the
matrix, which improves the runtime of your algorithm from O(n^3) to
O(n^2).  You will definitely notice that.
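
In case it helps, here is a rough, untested sketch of that bookkeeping
(the names are mine; it assumes d is the symmetric distance matrix
with np.inf on the diagonal, and that the merge step writes the new
cluster's distances into row/column merged and sets row/column
removed to np.inf):

import numpy as np

def nearest_pair(col_min, col_argmin):
    # O(n) scan of the cached per-column minima instead of an
    # O(n^2) scan of the full distance matrix.
    j = int(np.argmin(col_min))
    return int(col_argmin[j]), j

def update_after_merge(d, col_min, col_argmin, merged, removed):
    # col_min[k] / col_argmin[k] cache the minimum of column k and
    # the row where it occurs; refresh them after a merge.
    col_min[merged] = d[:, merged].min()
    col_argmin[merged] = d[:, merged].argmin()
    col_min[removed] = np.inf
    for k in range(len(col_min)):
        if k == merged or k == removed:
            continue
        if col_argmin[k] == merged or col_argmin[k] == removed:
            # The cached minimum lived in a row we just changed,
            # so this column needs a full O(n) rescan.
            col_min[k] = d[:, k].min()
            col_argmin[k] = d[:, k].argmin()
        elif d[merged, k] < col_min[k]:
            # The new cluster moved closer to k; O(1) update.
            col_min[k] = d[merged, k]
            col_argmin[k] = merged

Usually the rescan branch only fires for a few columns, so finding
the next closest pair stays O(n) per merge in practice.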

--Hoyt

On Sun, May 4, 2008 at 7:40 AM, Timothy Hochberg <tim.hochberg at ieee.org> wrote:
>
> On Sat, May 3, 2008 at 5:31 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> >
> > On Sat, May 3, 2008 at 5:05 PM, Christopher Barker
> > <Chris.Barker at noaa.gov> wrote:
> >
> > > Robert Kern wrote:
> > > > I can get a ~20% improvement with the following:
> > > >
> > > > In [9]: def mycut(x, i):
> > > >    ...:     A = x[:i,:i]
> > > >    ...:     B = x[:i,i+1:]
> > > >    ...:     C = x[i+1:,:i]
> > > >    ...:     D = x[i+1:,i+1:]
> > > >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
> > >
> > > Might it be a touch faster to build the final array first, then
> > > fill it:
> > >
> > > def mycut(x, i):
> > >     r,c = x.shape
> > >     out = np.empty((r-1, c-1), dtype=x.dtype)
> > >     out[:i,:i] = x[:i,:i]
> > >     out[:i,i:] = x[:i,i+1:]
> > >     out[i:,:i] = x[i+1:,:i]
> > >     out[i:,i:] = x[i+1:,i+1:]
> > >     return out
> > >
> > > totally untested.
> > >
> > > That should save the creation of two temporaries.
> >
> > Initializing the array makes sense. And it is super fast:
> >
> > >> timeit mycut(x, 6)
> > 100 loops, best of 3: 7.48 ms per loop
> > >> timeit mycut2(x, 6)
> > 1000 loops, best of 3: 1.5 ms per loop
>
> If you don't need the old array after the cut, I think that you could use
> the input array as the output array and then take a slice, saving a
> temporary and one-quarter of your assignments (on average). Something like:
>
> def destructive_cut(x, i): # Untested
>     out = x[:-1,:-1]
>     out[:i,i:] = x[:i,i+1:]
>     out[i:,:i] = x[i+1:,:i]
>     out[i:,i:] = x[i+1:,i+1:]
>     return out
>
> If you were really clever, you could take different initial slices based on
> i so that you always skipped at least one-quarter of the assignments, but
> that's probably not worth the effort.
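
For the record, a rough and untested sketch of that "clever" variant
(destructive_cut2 is my name for it): anchor the surviving view at
whichever corner keeps the larger diagonal block in place.  One
caveat for all of these destructive versions: the assignments copy
between overlapping views of the same buffer, so check that your
numpy handles overlapping copies the way you expect before trusting
the result.

def destructive_cut2(x, i):  # untested; assumes x is square
    n = x.shape[0]
    if i >= n - 1 - i:
        # Top-left block is the larger one; anchor there so it
        # needs no copy (this is destructive_cut above).
        out = x[:-1, :-1]
        out[:i, i:] = x[:i, i+1:]
        out[i:, :i] = x[i+1:, :i]
        out[i:, i:] = x[i+1:, i+1:]
    else:
        # Bottom-right block is the larger one; anchor there so
        # x[i+1:, i+1:] is already in place.
        out = x[1:, 1:]
        out[:i, :i] = x[:i, :i]
        out[:i, i:] = x[:i, i+1:]
        out[i:, :i] = x[i+1:, :i]
    return out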
>
> >
> > The time it takes to cluster went from about 1.9 seconds to 0.7
> > seconds! Thank you.
> >
> > When I run the single linkage clustering on my data I get one big
> > cluster and a bunch of tiny clusters. So I need to try a different
> > linkage method. Average linkage sounds good, but it sounds hard to
> > code.
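
Average linkage is less work than it sounds if you carry a count of
the points in each cluster: the group-average (Lance-Williams) update
for the merged distances is just a weighted mean of the two old rows.
A rough, untested sketch (merge_average and size are my names; d is
the symmetric distance matrix with np.inf on the diagonal):

import numpy as np

def merge_average(d, size, i, j):
    # Overwrite row/column i with the distances from the merged
    # cluster i+j to everything else:
    #   d(i+j, k) = (n_i*d(i,k) + n_j*d(j,k)) / (n_i + n_j)
    ni, nj = size[i], size[j]
    new = (ni * d[i, :] + nj * d[j, :]) / (ni + nj)
    d[i, :] = new
    d[:, i] = new
    d[i, i] = np.inf   # keep the diagonal out of the min scans
    size[i] = ni + nj
    # ...then cut row/column j out of d, e.g. with mycut above.

The rest of the loop carries over unchanged, including the cached
minimum trick above, since a merge still only touches one row and
one column.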
> >
>
> --
> . __
> . |-\
> .
> . tim.hochberg at ieee.org



-- 
+++++++++++++++++++++++++++++++++++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
hoytak at gmail.com
+++++++++++++++++++++++++++++++++++


