[Numpy-discussion] Faster

Charles R Harris charlesr.harris at gmail.com
Fri May 2 21:29:51 EDT 2008


On Fri, May 2, 2008 at 7:16 PM, Keith Goodman <kwgoodman at gmail.com> wrote:

> On Fri, May 2, 2008 at 6:05 PM, Robert Kern <robert.kern at gmail.com> wrote:
> > On Fri, May 2, 2008 at 7:24 PM, Keith Goodman <kwgoodman at gmail.com>
> wrote:
> >
> >
> > > How can I make this function faster? It removes the i-th row and
> >  >  column from an array.
> >  >
> >  >  def cut(x, i):
> >  >     idx = range(x.shape[0])
> >  >     idx.remove(i)
> >  >     y = x[idx,:]
> >  >     y = y[:,idx]
> >  >     return y
> >  >
> >  >  >> import numpy as np
> >  >  >> x = np.random.rand(500,500)
> >  >  >> timeit cut(x, 100)
> >  >  100 loops, best of 3: 16.8 ms per loop
> >
> >  I can get a ~20% improvement with the following:
> >
> >  In [8]: %timeit cut(x, 100)
> >  10 loops, best of 3: 21.6 ms per loop
> >
> >  In [9]: def mycut(x, i):
> >    ...:     A = x[:i,:i]
> >    ...:     B = x[:i,i+1:]
> >    ...:     C = x[i+1:,:i]
> >    ...:     D = x[i+1:,i+1:]
> >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
> >    ...:
> >
> >  In [10]: %timeit mycut(x, 100)
> >  10 loops, best of 3: 17.3 ms per loop
> >
> >  The hstack(vstack, vstack) seems to be somewhat better than
> >  vstack(hstack, hstack), at least for these sizes.
>
> Wow. I never would have come up with that. And I probably never will.
>
> Original code:
> >> cluster.test2(500)
> n = 500 took 5.28 seconds
>
> Your improvement:
> >> cluster.test2(500)
> n = 500 took 3.52 seconds
>
> Much more than a 20% improvement when used in the larger program. Thank
> you.


Isn't the lengthy part finding the distance between clusters?  I can think
of several ways to do that, but I think you will get a real speedup by doing
that in c or c++. I have a module made in boost python that holds clusters
and returns a list of lists containing their elements. Clusters are joined
by joining any two elements, one from each. It wouldn't take much to add a
distance function, but you could use the list of indices in each cluster to
pull a subset out of the distance matrix and then find the minimum function
in that. This also reminds me of Huffman codes.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080502/fff8937c/attachment.html>


More information about the NumPy-Discussion mailing list