[Numpy-discussion] NumPy re-factoring project

Hans Meine meine at informatik.uni-hamburg.de
Fri Jun 11 04:29:28 EDT 2010


On Thursday 10 June 2010 22:28:28 Pauli Virtanen wrote:
> Some places where Openmp could probably help are in the inner ufunc
> loops. However, improving the memory efficiency of the data access
> pattern is another low-hanging fruit for multidimensional arrays.

I was about to mention this when the thread started.. at least I want to raise 
awareness of the current problems again, so they can be taken into account 
while refactoring.  Theoretically, this should be really low-hanging fruit, 
but I have no idea how much work it is.

The problem is: NumPy uses C-order indexing by default.  While it supports F-
order arrays, too, using F-oder arrays leads to unnecessary slow-downs, since 
the dimensions are walked through from left to right, and not in memory order.

Ideally, algorithms would get wrapped in between two additional 
pre-/postprocessing steps:

1) Preprocessing: After broadcasting, transpose the input arrays such that 
they become C order.  More specifically, sort the strides of one (e.g. the 
first/the largest) input array decreasingly, and apply the same transposition 
to the other arrays (in + out if given).

2) Perform the actual algorithm, producing a C-order output array.

3) Apply the reverse transposition to the output array.

This would
- fix the bug that operations (e.g. addition) on F-order arrays are slower 
than on C-order arrays,
- fix the bug that the order of the output arrays differs from that of the 
input arrays (this is very annoying when dealing with 3rd party libraries that 
use and expect F-order everywhere), and
- be easy for elementwise operations, but
- need further thinking / special handling of operations where the number & 
mapping of dimensions of the input & output arrays is more complex.

Last, but not least, here is a pointer to our previous discussion of the topic 
in the thread "Unnecessarily bad performance of elementwise operators with 
Fortran-arrays":
  http://www.mail-archive.com/numpy-discussion@scipy.org/msg05100.html

Have a nice day,
  Hans



More information about the NumPy-Discussion mailing list