[Numpy-discussion] timing results (was: record arrays initialization)

Paul Anton Letnes paul.anton.letnes at gmail.com
Fri May 4 02:19:23 EDT 2012


On Fri, May 4, 2012 at 12:49 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Thu, May 3, 2012 at 3:12 PM, Moroney, Catherine M (388D)
> <Catherine.M.Moroney at jpl.nasa.gov> wrote:
>
>> Here is the python code:
>>
>> def single(element, targets):
>>
>>    if (isinstance(element, tuple)):
>>        xelement = element[0]
>>    elif (isinstance(element, numpy.ndarray)):
>>        xelement = element
>>    else:
>>        return FILL
>>
>>    nlen = xelement.shape[0]
>>    nvec = targets.data.shape[0]
>>    x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
>
> repeat is slow. I don't think you need it since broadcasting should
> take care of things. (But maybe I misunderstand the data.)
>
>>    diffs = ((x - targets.data)**2).sum(axis=1)
>
> You could try np.dot instead of sum: one = np.ones(7); diff =
> np.dot(diff, one). You could even pass one in.
>
>>    diffs = numpy.sqrt(diffs)
>
> Since you don't return the distance, no need for the sqrt. If you do
> need the sqrt only take the sqrt of one element instead of all
> elements.
>
>>    return int(numpy.argmin(diffs, axis=0))

As Keith says, you don't have to take the sqrt of everything: sqrt is
monotonically increasing, so the index of the minimum squared distance
is the same as the index of the minimum distance.

I'd probably initialize min_dist to a negative value instead. You may
well encounter large distances at some point, and a negative value is
more obviously incorrect when you're debugging.

If you're 100% sure that every element of the matches array gets set,
you can skip this line:
     matches(:) = -9999
It won't buy you much, but it's free.

Two suggestions. First, the two lines
     dvector = targets(:,it) - vectors(:,iv)
     dist = sqrt(sum(dvector*dvector))
can be rewritten as a single loop:
     dist = 0
     do itemp = 1, size(targets, 1)
          dist = dist + (targets(itemp, it) - vectors(itemp, iv))**2
     end do
(I am skipping the sqrt, as it is not needed.) That way the
difference, the squaring, and the sum all happen in one pass over the
array instead of three. Moving memory around is typically more
expensive than computing. A good Fortran compiler should also optimize
**2 into a multiplication by itself rather than a call to a library
routine.
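
To make this concrete, here is a minimal sketch of the whole search
with the fused loop in place. I haven't seen your full Fortran
routine, so the array names, shapes and the assumed-shape interface
below are guesses based on the snippets in this thread, not your
actual code:

     subroutine find_matches(vectors, targets, matches)
         ! Sketch only: for each column of vectors, store in matches
         ! the index of the closest column of targets.  Assumed-shape
         ! arrays need an explicit interface (e.g. put this in a module).
         implicit none
         real, intent(in)     :: vectors(:,:)   ! (ndim, nvectors)
         real, intent(in)     :: targets(:,:)   ! (ndim, ntargets)
         integer, intent(out) :: matches(:)     ! (nvectors)
         integer :: iv, it, itemp
         real    :: dist, min_dist

         matches(:) = -9999
         do iv = 1, size(vectors, 2)
             min_dist = huge(min_dist)   ! any real distance is smaller
             do it = 1, size(targets, 2)
                 ! difference, squaring and summing fused in one pass,
                 ! and no sqrt since we only compare distances
                 dist = 0.0
                 do itemp = 1, size(targets, 1)
                     dist = dist + (targets(itemp, it) - vectors(itemp, iv))**2
                 end do
                 if (dist < min_dist) then
                     min_dist = dist
                     matches(iv) = it
                 end if
             end do
         end do
     end subroutine find_matches

Whether this actually beats the array-expression version depends on
your compiler, so time both.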

Alternatively,
     dist = sqrt(sum(dvector*dvector))
can be written (again skipping the sqrt) as
     dist = dot_product(dvector, dvector)
or as
     dist = DDOT(size(dvector), dvector, 1, dvector, 1)
dot_product is a Fortran intrinsic. If this really is one of your
bottlenecks, you should also try calling DDOT (the double precision
dot product) from an optimized BLAS library. That is what numpy.dot
does, too, but in Fortran you avoid the Python call overhead.
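
For illustration, here is a small self-contained comparison of the two
forms. DDOT is the standard double precision BLAS dot product, so the
arrays must be double precision and you have to link against a BLAS
library (e.g. -lblas, or an optimized one such as OpenBLAS or MKL):

     program dot_example
         ! Compare the dot_product intrinsic with BLAS DDOT.
         ! Build with something like: gfortran dot_example.f90 -lblas
         implicit none
         integer, parameter :: n = 7
         double precision :: dvector(n), d_intrinsic, d_blas
         double precision, external :: ddot   ! BLAS dot product

         call random_number(dvector)

         ! Intrinsic: no external library needed.
         d_intrinsic = dot_product(dvector, dvector)

         ! BLAS: the unit increments mean the elements are contiguous.
         d_blas = ddot(n, dvector, 1, dvector, 1)

         print *, d_intrinsic, d_blas
     end program dot_example

For very short vectors (the np.ones(7) above suggests length 7) the
call overhead may well swallow any gain, so again, time it.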

Hope this helps - it's hard to know exactly what will work without
playing with the code yourself :) Also, your compiler may already
perform some of these optimizations, in which case you gain nothing by
hand-optimizing.

Paul


