[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

David Cournapeau cournape at gmail.com
Thu Nov 8 11:31:54 EST 2007


On Nov 9, 2007 1:31 AM, David Cournapeau <cournape at gmail.com> wrote:
> On Nov 9, 2007 12:50 AM, Hans Meine <meine at informatik.uni-hamburg.de> wrote:
> > On Thursday, 08 November 2007 at 16:37:06, David Cournapeau wrote:
> > > The problem is not F vs C storage: for element-wise operation, it does
> > > not matter at all; you just apply the same function
> > > (perform_operation) over and over on every element of the array. The
> > > order does not matter at all.
> >
> > Yet Fortran order leads to several times slower operations, see the figures in
> > my original post. :-(
> This is because the current implementation of at least some of the
> operations you are talking about uses PyArray_GenericReduce and
> other similar functions, which are really high level (they use Python
> callables, etc.). This is easier, because you don't have to care about
> anything (type, etc.), but it means the Python runtime is
> handling everything. Instead, you should use a pure C
> implementation (by pure C, I mean a C function totally independent of
> Python, only dealing with standard C types). That alone would give a
> significant performance improvement.
>
> Once you do that, you should not see a difference between F and C
> storage, normally.
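As a sketch of the effect being discussed (this code is not from the original thread; whether any gap actually appears depends on the NumPy version), one can time the same elementwise operation on C- and Fortran-ordered copies of the same data:

```python
import numpy as np
import timeit

# Same data in both memory layouts.
c = np.random.rand(2000, 2000)   # C-contiguous by default
f = np.asfortranarray(c)         # Fortran-contiguous copy

# Time an elementwise operation on each layout.
t_c = timeit.timeit(lambda: c + 1.0, number=20)
t_f = timeit.timeit(lambda: f + 1.0, number=20)
print(f"C order: {t_c:.4f}s, F order: {t_f:.4f}s")

# The elementwise result is identical regardless of storage order.
assert np.array_equal(c + 1.0, f + 1.0)
```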
> >
> > > But what if you have segmented buffers, non-aligned arrays, etc.?
> >
> > The code I posted should deal with it - by sorting the indices by decreasing
> > stride, I simply ensure that all (source and target) segments are traversed
> > in order of increasing memory addresses.  It does not affect segments or
> > alignment.
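The stride-sorting idea can be sketched as follows (a hypothetical helper, not the code Hans posted): permuting the axes so that strides decrease yields a view whose simple nested loop touches memory in increasing address order.

```python
import numpy as np

def ascending_memory_view(a):
    """Return a view of `a` with axes permuted so strides decrease,
    i.e. the last axis is the fastest-varying one in memory.
    Hypothetical helper illustrating the approach described above."""
    order = np.argsort(a.strides)[::-1]  # axes by decreasing stride
    return a.transpose(order)

f = np.asfortranarray(np.arange(12.0).reshape(3, 4))
v = ascending_memory_view(f)

# After the permutation, the Fortran array is seen as C-contiguous:
# iterating over it in the usual row-major way is a sequential sweep
# through memory.
assert v.flags['C_CONTIGUOUS']
```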
> If you have segmented buffers, I don't think the ordering matters
> much for memory access anymore, does it?
>
> >
> > > All this has to be taken care of,
> >
> > Right - my "perform_operation(aprime)" step would need to apply the operation
> > on every memory segment.
> >
> > > and this means handling reference
> > > count and other things which are always delicate to handle well...
> >
> > I am not sure that I understand where refcounting comes into play here.
> >
> > > Or
> > > you just use the current situation, which let python handle it
> > > (through PyObject_Call and a callable, at a C level).
> >
> > I need to look at the code to see what you mean here.  Probably, I have a
> > wrong picture of where the slowness comes from (I thought that the order of
> > the loops was wrong).
> >
> > > > As I wrote above, I don't think this is good.  A fortran-order-contiguous
> > > > array is still contiguous, and not inferior in any way to C-order arrays.
> > > > So I actually expect copy() to return an array of unchanged order.
> > >
> > > Maybe this behaviour was kept for compatibility with numarray? If you
> > > look at the docstring, it says that copy may not return the same
> > > order as its input. It kind of makes sense to me: C order is the
> > > default for many numpy operations.
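For illustration, the copy behavior under discussion can be checked directly (exact defaults may differ across NumPy versions; this reflects the documented behavior of `ndarray.copy`):

```python
import numpy as np

f = np.asfortranarray(np.ones((3, 4)))
assert f.flags['F_CONTIGUOUS']

# ndarray.copy() defaults to C order, discarding the Fortran layout...
c_copy = f.copy()
assert c_copy.flags['C_CONTIGUOUS']

# ...but the order can be preserved explicitly.
f_copy = f.copy(order='F')   # or order='K' to keep the input layout
assert f_copy.flags['F_CONTIGUOUS']
```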
> >
> > That is very sad, because Fortran order is much more natural for handling
> > images, where you're absolutely used to indexing with [x, y], x being the
> > fastest-changing index in memory.
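A minimal sketch of the image convention described above (the names are illustrative): with `order='F'` and a `(width, height)` shape, the first index `x` varies fastest in memory.

```python
import numpy as np

# Hypothetical (width, height) image indexed as [x, y], stored in
# Fortran order so that x is the fastest-changing index in memory.
width, height = 4, 3
img = np.zeros((width, height), order='F')

# In Fortran order the first axis has the smallest stride: stepping
# along x moves by exactly one element in memory.
assert img.strides[0] == img.itemsize
assert img.strides[0] < img.strides[1]
```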
> >
> >
> > --
> > Ciao, /  /
> >      /--/
> >     /  / ANS
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion at scipy.org
> > http://projects.scipy.org/mailman/listinfo/numpy-discussion
> >
>

