A quick recap of the problem: I have a 128x512 array of 7-element
vectors (element) and a 5000-vector training dataset (targets). For
each vector in element, I want to find the best match in targets,
defined as the one that minimizes the Euclidean distance.
I coded it up three ways: (a) looping through each vector in element
individually, (b) vectorizing the function in the previous step, and
(c) coding it up in Fortran. The heart of the "find-best-match" code
in Python looks like this, so that I'm not doing an individual loop
through all 5000 vectors in targets:
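    import numpy

    # Function name and signature are assumed; only the body was posted.
    def find_best_match(xelement, targets):
        nlen = xelement.shape[0]        # length of one vector (7)
        nvec = targets.data.shape[0]    # number of training vectors (5000)
        # Tile the single vector so it lines up with every target row.
        x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
        # Euclidean distance from xelement to each target vector.
        diffs = ((x - targets.data)**2).sum(axis=1)
        diffs = numpy.sqrt(diffs)
        return int(numpy.argmin(diffs, axis=0))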
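For comparison, here is a minimal sketch of the same per-vector search that leans on numpy broadcasting instead of an explicit repeat(), and skips the square root (it is monotonic, so it does not change which index is the minimum). The function name `best_match`, the random test data, and the assumption that `targets` is a plain (nvec, nlen) array rather than an object with a `.data` attribute are all mine, not from the code above.

```python
import numpy as np

def best_match(xelement, targets):
    """Index of the row of `targets` closest to `xelement` (Euclidean)."""
    # Broadcasting stretches the (nlen,) vector across all (nvec, nlen) rows,
    # so no explicit repeat() copy is needed.
    diffs = targets - xelement
    # Squared distance per target; sqrt is monotonic, so argmin is unchanged.
    sq_dist = (diffs * diffs).sum(axis=1)
    return int(np.argmin(sq_dist))

# Hypothetical data with the shapes from the post: 5000 targets of length 7.
rng = np.random.default_rng(0)
targets = rng.random((5000, 7))
xelement = rng.random(7)
idx = best_match(xelement, targets)
```

Whether this is noticeably faster than the repeat() version will depend on the numpy version and the machine; the main saving is one fewer temporary array per call.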
Here are the results:
(a) looping through each vector:  68 seconds
(b) vectorizing this:             58 seconds
(c) raw Fortran with loops:       26 seconds
I was surprised to see that vectorizing didn't gain me that much time,
and that the Fortran was so much faster than both Python alternatives.
So, there's a lot that I don't know about how the internals of numpy
and Python work.
Why does the loop through 128x512 elements in Python only take an
extra 10 seconds? What is the main purpose of vectorizing - is it
optimization by taking the looping step out of Python and into the
C base, or something different?

Because for the size of the arrays being manipulated inside the loop,
the Python/numpy loop overhead isn't all that big. If you were only
doing 100 vectors in targets, you would see a big difference.
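One rough way to check this is to time a single best-match call against 100 targets and against 5000 targets. The sketch below is only an illustration: the helper names `best_match` and `seconds_per_call`, the random data, and the repeat count are assumptions, and the actual numbers will vary by machine.

```python
import timeit
import numpy as np

def best_match(xelement, targets):
    # Squared Euclidean distance to every target row, then the closest index.
    sq_dist = ((targets - xelement) ** 2).sum(axis=1)
    return int(np.argmin(sq_dist))

def seconds_per_call(nvec, nlen=7, repeats=200):
    """Average wall-clock time of one best_match call against nvec targets."""
    rng = np.random.default_rng(0)
    targets = rng.random((nvec, nlen))   # hypothetical training data
    xelement = rng.random(nlen)          # hypothetical query vector
    total = timeit.timeit(lambda: best_match(xelement, targets), number=repeats)
    return total / repeats

small = seconds_per_call(100)    # small arrays: fixed per-call cost matters more
large = seconds_per_call(5000)   # large arrays: array arithmetic matters more
print(f"100 targets:  {small * 1e6:.1f} us per call")
print(f"5000 targets: {large * 1e6:.1f} us per call")
```

If the time per call grows by much less than 50x when going from 100 to 5000 targets, that suggests the fixed per-call overhead is a large share of the cost at the small size, which is where removing the outer Python loop would be expected to pay off most.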

Perry

