[Numpy-discussion] VIGRA, NumPy and Fortran-order (again)

Wed Jul 22 09:12:37 EDT 2009

On Friday 17 July 2009 22:15:31 Pauli Virtanen wrote:
> On 2009-07-17, Hans Meine <meine at informatik.uni-hamburg.de> wrote:
> > If I understood Travis' comments in the above-mentioned thread [1]
> > correctly, this would already fix some of the performance issues along
> > the way (since it would suddenly allow the use of special, optimized code
> > paths).
>
> I was wondering about this too, when working on improving the
> cache coherency of the reduction operations. Also these would be
> more efficient if the striding of the output array could be
> chosen freely.
>
> I wonder if it would be OK to make this change...

It looks to me as if it was mostly a matter of "who's going to implement it".  

For example, David Cournapeau wrote [1]:  "Because it is not trivial to do so 
[be efficient] in all cases, I guess." and then [2]
"It [a transposed array] should not [make a difference], but in practise, it 
is not so easy to do. AFAIK, even matlab has the same problem, with less 
difference, though. And they have much more ressources than numpy. Not to say 
that we cannot do better, but this takes time."

Travis simply wrote "this [the creation of c-order output arrays] would need 
to change in order to tak advantage of the [existing] special case" [3], 
without mentioning whether he'd be in favor of this change.  Later he wrote 
there were "relatively smooth pathways for optimization" [4] which sounded 
promising.  

Anne Archibald clearly saw no reason "not to let ufuncs pick whichever order 
for newly-allocated arrays they want" [5] either.

Finally, Travis wrote [6]:
> C-order is "special" in NumPy due to the history.  I agree that it doesn't
> need to be and we have taken significant steps in that direction.  [...]
> Nobody is opposed to having faster code as long as we don't in the process
> break code bases.   There is also the matter of implementation.

So, AFAICS, it's only a matter of how to do it and who actually does it; 
nobody seemed to be opposed (except that the default order in copy should stay 
"C" of course, which is absolutely acceptable for me).

BTW: Here's pseude-code in Python which I proposed back then:

[elementwise operation on array 'a']
1) ii = [i for i, s in sorted(enumerate(ia.strides), key = lambda (i, s): -s)]
2) aprime = a.transpose(ii) # aprime will be "as C-contiguous as it gets"
3) bprime = perform_operation(aprime) # fast elementwise operation
4) jj = [ii.index(i) for i in range(bprime.ndim)] # inverse ii
5) b = bprime.transpose(jj) # let result have the same order as the input

I think this could/should be added to the ufunc machinery, maybe directly 
after broadcasting the inputs.

Hmm, actually, the above might be an unnecessary implementation trick, i.e. I 
proposed to transpose before and after the operation, while Travis suggested 
to create the output array with the right order and then try with the existing 
ufunc machinery.

I really hope that we can improve NumPy here.

Have a nice day,
  Hans

[1] http://mail.scipy.org/pipermail/numpy-discussion/2007-November/029838.html
[2] http://mail.scipy.org/pipermail/numpy-discussion/2007-November/029849.html
[3] http://mail.scipy.org/pipermail/numpy-discussion/2007-November/029859.html
[4] http://mail.scipy.org/pipermail/numpy-discussion/2007-November/029863.html
[5] http://mail.scipy.org/pipermail/numpy-discussion/2007-November/029866.html