[Numpy-discussion] odd performance of sum?

Sturla Molden sturla at molden.no
Sun Feb 13 12:46:54 EST 2011

Den 13.02.2011 04:30, skrev Travis Oliphant:
> One of the advantages of an open source community is that different 
> needs get addressed by the people who need them.   So, if you really 
> need a faster sum and are willing to do the hard work to make it 
> happen (or convince someone else to do it for you), then there will be 
> many to say thank you when you are finished.    You should know though 
> that in my experience It is harder than you might think at first to 
> "convince someone else to do it for you."

For things like sum (and may other ufuncs), one could just use Fortran 
instrincs and let the Fortran compiler do the rest. (Obviously NumPy 
should not depend on a Fortran compiler, unlike SciPy, but that is 
another matter.)

Assuming a wrapper takes care of C and Fortran ordering, we can just 
f2py this:

subroutine fortran_sum_1d( m, arr, out )
     use, intrinsic :: sum
     integer :: m
     real :: arr(m), out
     out = sum(arr)
end subroutine

subroutine fortran_sum_2d( m, n, arr, k, out, dim )
     use, intrinsic :: sum
     integer :: m, n, k, dim
     real :: arr(m,n), out(k)
     out = sum(arr, dim+1) ! fortran dims start at 1
end subroutine

Similar things can be done for other ufuncs. We can also use Fortran's 
"elemental" keyword to create our own. The Fortran compilers from Absoft 
and Intel (and possibly Portland) can even take code like this and make 
it efficient for multicore CPUs. (I don't know if GNU gfortran can do 
this too, but I don't expect it too.)

One limitation is that the Fortran compiler must know the number of 
dimensions, while NumPy arrays are flexible in this respect. Another 
limitation is strides, but we can deal with them if they are a multiple 
of the dtype size, but not arbitrary number of bytes like NumPy.

subroutine fortran_sum_1d_with_strides( m, s, arr, out )
     use, intrinsic :: sum
     integer :: m, s
     real, target :: arr(m)
     real, pointer, dimension(:) :: parr
     real :: out
     parr => arr(::s)
     out = sum(parr)
end subroutine

I think for most people, Fortran is much easier than using NumPy's C 
API. Creating fast "ufunc" like functions is very easy with Fortran 95.

If we e.g. want the L2 norm of a vector, it looks like this in Fortran:

subroutine l2norm( m, arr, out )
     use, intrinsic :: sum, sqrt
     integer :: m
     real :: arr(m), out
     out = sqrt(sum(arr**2)) ! looks quite like Python...
end subroutine

Just notice the messyness of C in comparison (there are much worse 

#include <Python.h>
#include <numpy/arrayobject.h//>
#include <math.h>

double l2norm(PyArrayObject *arr)
      double acc = 0.0, *y;
      char *addr = arr->data;
      for (int i=0; i<arr->dimensions[0]; i++) {
          y = (double *)addr;
          acc += y*y;
          addr +=  arr->strides[0];
      return sqrt(acc);


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110213/47f6d5f7/attachment.html>

More information about the NumPy-Discussion mailing list