[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

Thu Nov 8 11:55:13 EST 2007

On Thursday, 08 November 2007 17:31:40, David Cournapeau wrote:
> This is because the current implementation of at least some of the
> operations you are talking about uses PyArray_GenericReduce and
> other similar functions, which are really high level (they use Python
> callables, etc.). This is easier, because you don't have to care about
> anything (type, etc.), but it means that the Python runtime is
> handling everything.

I suspected that after your last post, but that's really bad for pointwise 
operations on a contiguous, aligned array.  A simple transpose() should 
really not make any difference here.
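To illustrate why a transpose need not matter for a pointwise operation, here is a minimal sketch. It assumes a hypothetical 2D descriptor (`Array2D`, with strides counted in elements); none of these names come from NumPy. If the memory is dense in either C or Fortran order, the operation can simply run over the flat buffer:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2D array descriptor (not a NumPy type).
// Strides are counted in elements, not bytes.
struct Array2D {
    double* data;
    std::size_t rows, cols;
    std::ptrdiff_t row_stride, col_stride;
};

// Dense row-major layout: the last index varies fastest.
inline bool is_c_contiguous(const Array2D& a) {
    return a.col_stride == 1 &&
           a.row_stride == static_cast<std::ptrdiff_t>(a.cols);
}

// Dense column-major layout: the first index varies fastest.
inline bool is_f_contiguous(const Array2D& a) {
    return a.row_stride == 1 &&
           a.col_stride == static_cast<std::ptrdiff_t>(a.rows);
}

// Multiplies every element by `factor` in place.
inline void scale(Array2D a, double factor) {
    assert(a.data != nullptr);
    if (is_c_contiguous(a) || is_f_contiguous(a)) {
        // Element order is irrelevant for a pointwise operation,
        // so both dense layouts share the same flat loop.
        const std::size_t n = a.rows * a.cols;
        for (std::size_t i = 0; i < n; ++i)
            a.data[i] *= factor;
        return;
    }
    // General strided fallback.
    for (std::size_t r = 0; r < a.rows; ++r)
        for (std::size_t c = 0; c < a.cols; ++c)
            a.data[static_cast<std::ptrdiff_t>(r) * a.row_stride +
                   static_cast<std::ptrdiff_t>(c) * a.col_stride] *= factor;
}
```

With this scheme, a 2x3 C-ordered array and its 3x2 transposed view take exactly the same code path.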

> Instead, you should use a pure C
> implementation (by pure C, I mean a C function totally independent of
> Python, only dealing with standard C types). This would already lead to a
> significant performance leap.

AFAICS, it would be much more elegant and easier to implement this using C++ 
templates.  We have a lot of experience with such a design from our VIGRA 
library ( http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/ ), which 
is an imaging library based on the STL concepts (and some necessary and 
convenient extensions for higher-dimensional arrays and a more flexible API).

I am not very keen on writing hundreds of lines of C code for things that can 
easily be handled with C++ functors.  But I don't think that I am the first 
to propose this, and I know that C has some advantages (faster compilation; 
are there more? ;-) ) - what is the opinion on this in the SciPy community?
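To make the functor idea concrete, here is a rough sketch of what I mean. The helper `elementwise` is made up for this mail and is not taken from VIGRA or NumPy; it only assumes the standard library:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: applies a binary functor element by element
// to two equally sized buffers and returns the result.
template <class T, class BinaryFunctor>
std::vector<T> elementwise(const std::vector<T>& a,
                           const std::vector<T>& b,
                           BinaryFunctor f)
{
    assert(a.size() == b.size());
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = f(a[i], b[i]);  // the functor call can be inlined by the compiler
    return out;
}
```

One template then covers every element type and every operator, e.g. `elementwise(a, b, std::plus<double>())` or `elementwise(x, y, std::multiplies<int>())`, where plain C would need a separate loop (or macro expansion) per combination.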

> If you have segmented addresses, I don't think the ordering matters
> much anymore, for memory access, no ?

Yes, I think it does.  It probably depends on the sizes of the segments 
though.  If you have a multi-segment box-sub-range of a large dataset (3D 
volume or even only 2D), processing each contiguous "row" (column/...) at 
once within the inner loop definitely makes a difference.  I.e. as long as 
one dimension is not strided (and the data's extent in this dimension is not 
too small), it should be handled in the inner loop.  The other loops  
probably don't make a big difference.
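As a sketch of the loop ordering I have in mind: the following assumes a hypothetical 2D view type (`View2D`, strides in elements, non-negative) and picks the axis with the smaller stride for the inner loop, so that a box-shaped sub-range of a larger dataset is still processed one contiguous run at a time:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2D view into a larger buffer (not a NumPy or VIGRA type).
// Strides are counted in elements and assumed to be non-negative.
struct View2D {
    double* data;
    std::size_t shape[2];
    std::ptrdiff_t stride[2];
};

// Adds `value` to every element of the view. The axis with the smaller
// stride is iterated innermost, so an unstrided dimension is walked
// sequentially in memory regardless of C or Fortran ordering.
inline void add_scalar(View2D v, double value)
{
    assert(v.stride[0] >= 0 && v.stride[1] >= 0);
    const int inner = (v.stride[0] < v.stride[1]) ? 0 : 1;
    const int outer = 1 - inner;
    for (std::size_t i = 0; i < v.shape[outer]; ++i) {
        double* run = v.data + static_cast<std::ptrdiff_t>(i) * v.stride[outer];
        for (std::size_t j = 0; j < v.shape[inner]; ++j)
            run[static_cast<std::ptrdiff_t>(j) * v.stride[inner]] += value;
    }
}
```

For a 2x3 sub-box of a row-major 4x5 array the strides are {5, 1}, so axis 1 ends up in the inner loop; for the same box in a column-major parent the strides are {1, 5} and axis 0 does.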

-- 
Ciao, /  /
     /--/
    /  / ANS