[Numpy-discussion] Vectorizing code, for loops, and all that

David Cournapeau david at ar.media.kyoto-u.ac.jp
Tue Oct 3 03:33:01 EDT 2006


Albert Strasheim wrote:
> Hello all
>
> I recently started looking at David Cournapeau's PyEM package, specifically
> his implementation of the K-Means algorithm. He implemented part of this
> algorithm in pure Python and also provided a Pyrex alternative that is
> significantly faster (about 10 times faster with the data I tested with). I
> tried to figure out why this is so.
>
>   
For what it's worth, PyEM has been quietly updated in the past weeks: 
most responses I got initially were people reporting errors, almost 
always related to changes in the numpy API, so I decided to keep 
quiet for a while, waiting for an API freeze on the numpy side. The last 
versions use vq from scipy.cluster, for example, the plotting has been 
much improved, and a fast ctypes version of the diagonal Gaussian kernel 
makes it possible to run EM on hundreds of thousands of frames, with 
several dozen dimensions and several dozen Gaussians in the mixture, in 
reasonable time.
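
For reference, the assignment step then reduces to a single call to vq 
(a minimal sketch; the array shapes are just an example, not PyEM's):

import numpy as N
from scipy.cluster.vq import vq

data = N.random.randn(2000, 39)    # n observations, d features
code = data[:64]                   # k codewords

# vq returns, for each observation, the index of the nearest
# codeword and the distance to it
labels, dists = vq(data, code)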
> The job of this part of the algorithm is pretty simple: from a bunch of
> cluster means (the codebook) find the nearest cluster mean for each data
> point. The first attempt at implementing this algorithm might use two for
> loops, one over the data points and one over the cluster means, computing a
> Euclidean distance at each step. Slightly better is to use one loop and some
> broadcasting.
>
> By randomly trying some code, I arrived at the following one-liner:
>
> N.sum((data[:,N.newaxis,:]-code)**2, 2).argmin(axis=1)
>
> where
>
> data = N.random.randn(2000, 39)
> nclusters = 64
> code = data[0:nclusters,:]
>
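
For comparison, the two-for-loop version mentioned above would look 
roughly like this (an illustrative sketch, not code from PyEM; it 
computes the same labels, only much more slowly in pure Python):

import numpy as N

def assign_loops(data, code):
    # nearest-codeword index for each data point, using squared
    # Euclidean distance and two explicit Python loops
    n, k = data.shape[0], code.shape[0]
    dist = N.empty((n, k))
    for i in xrange(n):
        for j in xrange(k):
            d = data[i] - code[j]
            dist[i, j] = N.dot(d, d)
    return dist.argmin(axis=1)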
> This code was still slow, so I mangled it a bit so that I could profile it
> using cProfile under Python 2.5 (use profile if you have an older version of
> Python):
>
> import numpy as N
>
> def kmean():
>     data = N.random.randn(2000, 39)
>     nclusters = 64
>     code = data[0:nclusters,:]
>     for i in xrange(10):
>         # each step is wrapped in a tiny function so that cProfile
>         # reports the time spent in each step separately
>         def a(): return data[:,N.newaxis,:]
>         z = a()
>         def b(): return z-code
>         z = b()
>         def c(): return z**2
>         z = c()
>         def d(): return N.sum(z, 2)
>         z = d()
>         def e(): return z.argmin(axis=1)
>         z = e()
>
> if __name__ == '__main__':
>     import cProfile
>     cProfile.run("kmean()")
>
> Profiler output:
>   
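As a side note, the raw cProfile output is easier to digest once sorted 
by cumulative time; with just the standard library, something along 
these lines (the filename is arbitrary):

import cProfile, pstats

cProfile.run("kmean()", "kmean.prof")        # save stats to a file
p = pstats.Stats("kmean.prof")
p.sort_stats("cumulative").print_stats(10)   # ten most expensive entries
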
I got exactly the same kind of weird behaviour in other places of PyEM: 
apparently simple subtractions taking a lot of time compared to other 
parts, although they should theoretically be negligible. E.g. X - mu, 
where X is (n, d) and mu is (1, d), taking almost as much time as exp(X**2)!
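
A micro-benchmark along these lines shows the two costs side by side (a 
sketch; the shapes n and d are illustrative, not taken from PyEM):

import timeit

setup = """
import numpy as N
n, d = 100000, 30
X = N.random.randn(n, d)
mu = N.random.randn(1, d)    # broadcasts against each row of X
"""

for stmt in ("X - mu", "N.exp(X**2)"):
    t = timeit.Timer(stmt, setup)
    print stmt, ":", min(t.repeat(3, 10)), "s for 10 runs"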

It would be great to at least know the origin of this non-intuitive result.

David



