[Numpy-discussion] Ufunc memory access optimizations (Was: ufuncs on funny strides ...)

Thu Apr 1 14:46:52 EDT 2010

On Thu, 2010-04-01 at 11:30 -0700, M Trumpis wrote:
[clip]
> Actually I realized later that the main slow-down comes from the fact
> that my array was strided in fortran order (ascending strides). But
> from the point of view of a ufunc that is operating independently at
> each value, why would it need to respect striding?

Correct. There has been discussion about improving ufunc performance by
optimizing the memory access pattern.

The main issue in your case is that the output array is in C order, so
that it is *not* possible to access both the input and the output arrays
in the optimal order. Fixing this issue requires allowing ufuncs to
allocate arrays that are in non-C order. This needs a design decision
that has not been made so far. I'd be in favor of it; I don't think it
can break anything.
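
A minimal sketch of the layout mismatch, and of a workaround that does not
depend on what the ufunc allocates by default: preallocate the output in the
same memory order as the input and pass it via `out`. The array shape and the
choice of `np.sin` are only illustrative assumptions.

```python
import numpy as np

# Fortran-ordered input: strides ascend, so the first axis is contiguous.
a = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(3, 4))

# Preallocate an output with the same layout and hand it to the ufunc,
# so input and output can be walked through memory in the same order.
out = np.empty(a.shape, dtype=a.dtype, order='F')
np.sin(a, out=out)

# Both arrays now share the same strides.
print(a.strides, out.strides)
```

Whether this is actually faster depends on the NumPy version and on the
array sizes, so it is worth timing on the real data.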

The second issue is that no single access pattern is optimal for every
case on all processor cache layouts. This forces us to use heuristics to
choose the access pattern, which is not so simple to get right and would
usually require some information about the processor's cache
architecture.
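
To illustrate what such a heuristic might look like, here is a rough sketch
that orders the loop axes by the combined absolute strides of all operands,
putting the smallest stride innermost. The function `loop_order` is
hypothetical and is not what NumPy does internally; it ignores cache sizes
entirely, which is exactly the part that is hard to get right.

```python
import numpy as np

def loop_order(*arrays):
    """Return axes outermost-first (hypothetical heuristic, not NumPy's)."""
    ndim = arrays[0].ndim
    # Combined cost of stepping along each axis, summed over all operands.
    cost = [sum(abs(arr.strides[ax]) for arr in arrays) for ax in range(ndim)]
    # Largest combined stride goes outermost, smallest innermost.
    return sorted(range(ndim), key=lambda ax: -cost[ax])

f_arr = np.zeros((3, 4), order='F')
c_arr = np.zeros((3, 4), order='C')

print(loop_order(f_arr))         # Fortran input alone
print(loop_order(c_arr))         # C input alone
print(loop_order(f_arr, c_arr))  # mixed: no order suits both operands
```

In the mixed case the heuristic has to pick one order, and whichever it picks
is a poor fit for one of the two arrays. That is the situation described
above for a Fortran-ordered input with a C-ordered output.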

(Some code has even been written along these lines, though mostly
addressing reductions:
http://github.com/pv/numpy-work/tree/ticket/1143-speedup-reduce
http://projects.scipy.org/numpy/ticket/1143
It is not production quality so far, and non-C output order would
definitely help here as well.)

-- 
Pauli Virtanen