[Numpy-discussion] timing results (was: record arrays initialization)
Moroney, Catherine M (388D)
Catherine.M.Moroney at jpl.nasa.gov
Thu May 3 13:38:48 EDT 2012
On May 3, 2012, at 10:33 AM, Moroney, Catherine M (388D) wrote:
> A quick recap of the problem: a 128x512 array of 7-element vectors (element), and a 5000-vector
> training dataset (targets). For each vector in element, I want to find the best-match in targets,
> defined as minimizing the Euclidean distance.
>
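The best-match criterion can be written out as below, where e is one 7-element vector from element and t_j is the j-th of the 5000 target vectors (this notation is introduced here for illustration, not taken from the post). The square root is monotonic and ||e||^2 is the same for every j, so neither changes which j wins:

```latex
j^{*}(\mathbf{e})
  = \arg\min_{1 \le j \le 5000} \lVert \mathbf{e} - \mathbf{t}_j \rVert_2
  = \arg\min_{j} \sum_{k=1}^{7} (e_k - t_{jk})^2
  = \arg\min_{j} \left( \lVert \mathbf{t}_j \rVert_2^2 - 2\, \mathbf{e} \cdot \mathbf{t}_j \right)
```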
> I coded it up three ways: (a) looping through each vector in element individually, (b) vectorizing
> the function in the previous step, and (c) coding it up in Fortran. The heart of the "find-best-match"
> code in Python looks like this, so I'm not doing an individual loop through all 5000 vectors in targets:
>
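> import numpy
>
> def find_best_match(xelement, targets):   # function name is illustrative
>     nlen = xelement.shape[0]              # length of one vector (7)
>     nvec = targets.data.shape[0]          # number of training vectors (5000)
>     # Tile xelement into an (nvec, nlen) array so it lines up with targets.data
>     x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
>
>     # Euclidean distance from xelement to every target vector
>     diffs = ((x - targets.data)**2).sum(axis=1)
>     diffs = numpy.sqrt(diffs)
>     return int(numpy.argmin(diffs, axis=0))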
>
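A minimal sketch of the same per-vector search without the explicit repeat, assuming xelement has shape (7,) and the target vectors are available as a plain (5000, 7) float array (called target_data here; that name and the function name are hypothetical). NumPy broadcasting subtracts xelement from every row directly, and the square root is skipped because it does not change the argmin:

```python
import numpy as np

def find_best_match_broadcast(xelement, target_data):
    """Return the row index of target_data closest to xelement (Euclidean).

    xelement has shape (nlen,); target_data has shape (nvec, nlen).
    """
    # Broadcasting subtracts xelement from every row; no repeated copy of xelement
    diffs = target_data - xelement
    # Row-wise sum of squares; sqrt is monotonic, so it is skipped
    sq_dist = (diffs * diffs).sum(axis=1)
    return int(np.argmin(sq_dist))
```

This trims the work inside each call, but it still leaves one Python-level call per element vector, 128 x 512 = 65,536 calls in total.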
> Here are the results:
>
> (a) looping through each vector: 68 seconds
> (b) vectorizing this: 58 seconds
> (c) raw Fortran with loops: 26 seconds
>
> I was surprised to see that vectorizing didn't gain me much time, and that the Fortran
> was so much faster than both Python alternatives. So there's a lot that I don't know about
> how the internals of NumPy and Python work.
>
> Why does the loop through the 128x512 elements in Python take only an additional 10 seconds? What
> is the main purpose of vectorizing: is it optimization by moving the loop out of Python
> and into the underlying C code, or something different?
>
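If (b) was done with numpy.vectorize (the post doesn't say, so this is an assumption), that would explain the small gain: the NumPy documentation describes vectorize as provided for convenience rather than performance, since it is essentially a Python for loop. What vectorizing usually buys is moving both loops into compiled code. The sketch below assumes element has shape (128, 512, 7) with the 7 components on the last axis, and that the targets are a (5000, 7) float array; it uses the expansion ||e - t||^2 = ||e||^2 - 2 e.t + ||t||^2 so each chunk of element vectors needs one matrix product. The function name and chunk size are illustrative; chunking matters because a full 65,536 x 5,000 float64 score matrix would take about 2.6 GB.

```python
import numpy as np

def best_matches(element, target_data, chunk=4096):
    """Nearest-target index for every vector in element.

    element: shape (..., nlen), e.g. (128, 512, 7); target_data: (ntarget, nlen).
    Returns an integer array of shape element.shape[:-1].
    """
    flat = element.reshape(-1, element.shape[-1])             # (nelem, nlen)
    t_sq = np.einsum('ij,ij->i', target_data, target_data)    # |t_j|^2 per target
    out = np.empty(flat.shape[0], dtype=np.intp)
    # Process element vectors in blocks so the (block, ntarget) score array stays small
    for start in range(0, flat.shape[0], chunk):
        block = flat[start:start + chunk]
        # |e - t|^2 = |e|^2 - 2 e.t + |t|^2; |e|^2 is constant per row, so it is
        # dropped without changing which target has the smallest score
        scores = t_sq - 2.0 * (block @ target_data.T)
        out[start:start + chunk] = np.argmin(scores, axis=1)
    return out.reshape(element.shape[:-1])
```

With arrays shaped as in the post this would be called as best_matches(element, target_data). One caveat of the expansion: when a vector sits extremely close to a target relative to the vectors' magnitudes, floating-point cancellation can make near-ties resolve differently than the direct difference-and-sum formula.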
> And why is the Fortran so much faster (even without optimization)?
>
> It looks like I'll be switching to Fortran after all.
>
> Catherine
>
Actually, the Fortran with the correct array ordering takes 13 seconds! What horrible Python/NumPy
mistake am I making to cause such a slowdown?
Catherine
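The Fortran result is a reminder that memory layout matters, and NumPy arrays carry a layout too: NumPy defaults to C (row-major) order, while Fortran arrays are column-major. Whether layout explains the Python timings here isn't established, but checking is cheap. A minimal sketch, using a hypothetical structured array (the field names vec and label and the helper describe_layout are made up) to show a strided view, a Fortran-ordered copy, and a contiguous row-major copy made once with np.ascontiguousarray before the search:

```python
import numpy as np

def describe_layout(a):
    """Return (strides, C-contiguous?, Fortran-contiguous?) for an array."""
    return a.strides, a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS']

# Hypothetical structured array: a 7-vector field plus an extra int32 field
rec = np.zeros(5000, dtype=[('vec', np.float64, (7,)), ('label', np.int32)])
field_view = rec['vec']                        # shape (5000, 7), rows 60 bytes apart
fortran_copy = np.asfortranarray(field_view)   # column-major copy

# One contiguous, row-major copy made before the search keeps each 7-vector
# adjacent in memory, matching the row-wise access in the distance code
target_data = np.ascontiguousarray(field_view)

for name, arr in [('field view', field_view), ('Fortran order', fortran_copy),
                  ('C copy', target_data)]:
    print(name, describe_layout(arr))
```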