# [Numpy-discussion] timing results (was: record arrays initialization)

Moroney, Catherine M (388D) Catherine.M.Moroney at jpl.nasa.gov
Thu May 3 18:12:25 EDT 2012

```On May 3, 2012, at 1:00 PM, <numpy-discussion-request at scipy.org>
wrote:

>> A quick recap of the problem:  a 128x512 array of 7-element vectors (element), and a 5000-vector
>> training dataset (targets).  For each vector in element, I want to find the best-match in targets,
>> defined as minimizing the Euclidean distance.
>>
>> I coded it up three ways: (a) looping through each vector in element individually, (b) vectorizing
>> the function in the previous step, and coding it up in Fortran.  The heart of the "find-best-match"
>> code in Python looks like so I'm not doing an individual loop through all 5000 vectors in targets:
>>
>>   nlen = xelement.shape
>>   nvec = targets.data.shape
>>   x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
>>
>>   diffs = ((x - targets.data)**2).sum(axis=1)
>>   diffs = numpy.sqrt(diffs)
>>   return int(numpy.argmin(diffs, axis=0))
>>
>> Here are the results:
>>
>> (a) looping through each vector:  68 seconds
>> (b) vectorizing this:             58 seconds
>> (c) raw Fortran with loops:       26 seconds
>>
>> I was surprised to see that vectorizing didn't gain me that much time, and that the Fortran
>> was so much faster than both python alternatives.  So, there's a lot that I don't know about
>> how the internals of numpy and python work.
>>
>> Why does the loop through 128x512 elements in python only take an additional 10 seconds?  What
>> is the main purpose of vectorizing - is it optimization by taking the looping step out of the
>> Python and into the C-base or something different?
>
> If you're doing loops in python, python does all sort of magic for you. Type checking is one thing, and one of the simplest things to optimize away if you use cython. If you're writing an expression similar to this
>> ((x - targets.data)**2)
> where x and targets.data are vectors, the elements are subtracted and squared elementwise in C instead of in python. So yes, you've got the idea.
>
>> And, why is the Fortran so much faster (even without optimization)?
>
> Could you show us the code? It's hard to tell otherwise. As Keith Goodman pointed out, if he gets 7.5x with cython, it could be that the Fortran code could be improved as well. Fortran has a reputation of being the gold standard for performance in numerical computation, although one can often be surprised. Picking good algorithms is always more important than the language.
>
> Paul
>

I'd be very curious to know if the Fortran can be improved on a bit further.  The full scale of the
problem dwarfs what I describe here, so any additional speed I can wring out of this would be
very welcome.

Can somebody give me the "numpy for dummies" explanation as to what the main source of the timing
differences is?  What portion of the calculations is being done in "slow" python that's the
determining factor for the slowdown?

Here is the python code:

def single(element, targets):

if (isinstance(element, tuple)):
xelement = element
elif (isinstance(element, numpy.ndarray)):
xelement = element
else:
return FILL

nlen = xelement.shape
nvec = targets.data.shape
x = xelement.reshape(1, nlen).repeat(nvec, axis=0)

diffs = ((x - targets.data)**2).sum(axis=1)
diffs = numpy.sqrt(diffs)
return int(numpy.argmin(diffs, axis=0))

multiple = numpy.vectorize(single)
python_results = multiple(vectors, Target)

where vectors is a 65536*7 array and Target is 5000x7.  So I'm evaluating 5000 potential choices for each of 65536 vectors.

And here is the Fortran:

matches(:) = -9999

do iv = 1, nvectors
min_dist = 9999999999999.0
min_idx  = -9999
do it = 1, ntargets
dvector = targets(:,it) - vectors(:,iv)
dist = sqrt(sum(dvector*dvector))
if (dist < min_dist) then
min_dist = dist
min_idx  = it
endif
end do
matches(iv) = min_idx
end do

Catherine

```