[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

David Cournapeau cournape at gmail.com
Thu Nov 8 11:31:40 EST 2007


On Nov 9, 2007 12:50 AM, Hans Meine <meine at informatik.uni-hamburg.de> wrote:
> On Thursday, 08 November 2007 at 16:37:06, David Cournapeau wrote:
> > The problem is not F vs C storage: for element-wise operations, it does
> > not matter at all; you just apply the same function
> > (perform_operation) over and over on every element of the array. The
> > order does not matter at all.
>
> Yet Fortran order leads to operations that are several times slower; see the
> figures in my original post. :-(
This is because the current implementation of at least some of the
operations you are talking about uses PyArray_GenericReduce and other
similar functions, which are really high level (they go through a
Python callable, etc.). This is easier, because you don't have to care
about anything (types, etc.), but it means the Python runtime is
handling everything. Instead, you should use a pure C implementation
(by pure C, I mean a C function totally independent of Python, dealing
only with standard C types). That alone would already give a
significant performance improvement.

Once you do that, you should normally not see any difference between F
and C storage.
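
As a rough illustration of the point above, here is a small benchmark
sketch that uses only the public NumPy API (np.frompyfunc stands in for
the "Python callable per element" path; exact timings will of course
vary with the machine and the operation):

    # Illustrative benchmark: elementwise exp() on C- and Fortran-ordered
    # arrays, and the same operation routed through a Python callable.
    import timeit
    import numpy as np

    a_c = np.random.rand(1000, 1000)     # C-contiguous
    a_f = np.asfortranarray(a_c)         # same values, Fortran order

    t_c = timeit.timeit(lambda: np.exp(a_c), number=10)
    t_f = timeit.timeit(lambda: np.exp(a_f), number=10)

    # dispatching through a Python callable for every single element
    py_exp = np.frompyfunc(np.exp, 1, 1)
    t_py = timeit.timeit(lambda: py_exp(a_c), number=1)

    print("C order:         %.3f s" % t_c)
    print("Fortran order:   %.3f s" % t_f)
    print("Python callable: %.3f s" % t_py)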
>
> > But what if you have segmented buffers, non-aligned arrays, etc.?
>
> The code I posted should deal with it - by sorting the indices by decreasing
> stride, I simply ensure that all (source and target) segments are traversed
> in order of increasing memory addresses.  It does not affect segments or
> alignment.
If you have segmented buffers, I don't think the ordering matters much
for memory access anymore, does it? For example,
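
For reference, here is a minimal sketch of the stride-sorting idea
described above (a hypothetical helper written for illustration, not the
code from the original post):

    # Reorder the axes of an array so that strides decrease from the
    # outermost to the innermost axis; a plain nested loop over the
    # result then visits memory in order of increasing addresses.
    import numpy as np

    def sort_axes_by_stride(arr):
        """Return a view whose axes are ordered by decreasing |stride|."""
        order = np.argsort([-abs(s) for s in arr.strides])
        return arr.transpose(order)

    a = np.asfortranarray(np.random.rand(4, 3))  # strides (8, 32) for float64
    b = sort_axes_by_stride(a)                   # strides (32, 8): C-like traversal
    print(a.strides, b.strides)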
>
> > All this has to be taken care of,
>
> Right - my "perform_operation(aprime)" step would need to apply the operation
> to every memory segment.
>
> > and this means handling reference
> > counts and other things that are always delicate to handle well...
>
> I am not sure that I understand where refcounting comes into play here.
>
> > Or
> > you just use the current situation, which lets Python handle it
> > (through PyObject_Call and a callable, at the C level).
>
> I need to look at the code to see what you mean here.  Probably, I have a
> wrong picture of where the slowness comes from (I thought that the order of
> the loops was wrong).
>
> > > As I wrote above, I don't think this is good.  A Fortran-order contiguous
> > > array is still contiguous, and not inferior in any way to C-order arrays,
> > > so I actually expect copy() to return an array of unchanged order.
> >
> > Maybe this behaviour was kept for compatibility with numarray? If you
> > look at the docstring, it says that copy() may not return the same
> > order as its input. That kind of makes sense to me: C order is the
> > default for many numpy operations.
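
A quick check of the copy() behaviour being discussed, using the order
argument of ndarray.copy() (illustrative only; this is my reading of the
API, not a quote from the docstring):

    import numpy as np

    f = np.asfortranarray(np.ones((3, 2)))
    print(f.flags['F_CONTIGUOUS'])   # True: the source is Fortran-ordered

    c = f.copy()                     # default order='C' discards that
    print(c.flags['C_CONTIGUOUS'])   # True

    k = f.copy(order='A')            # 'A' keeps Fortran order when the
    print(k.flags['F_CONTIGUOUS'])   # input is Fortran-contiguous -> True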
>
> That is very sad, because Fortran order is much more natural for handling
> images, where you are absolutely used to indexing with [x, y], with x being
> the fastest-changing index in memory.
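
(A small sketch of the [x, y] indexing point above: with a Fortran-ordered
array, the first index is the one that steps fastest through memory.)

    import numpy as np

    img = np.zeros((640, 480), order='F')   # indexed as img[x, y]
    print(img.strides)                      # (8, 5120): incrementing x moves
                                            # only 8 bytes, so x is fastest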
>
>
> --
> Ciao, /  /
>      /--/
>     /  / ANS


