[Numpy-discussion] Vectorizing code, for loops, and all that

A. M. Archibald peridot.faceted at gmail.com
Tue Oct 3 02:06:42 EDT 2006


On 02/10/06, Travis Oliphant <oliphant at ee.byu.edu> wrote:

> Perhaps those inner 1-d loops could be optimized (using prefetch or
> something) to reduce the number of cache misses on the inner
> computation, and the concept of looping over the largest dimension
> (instead of the last dimension) should be re-considered.

Cache control seems to be the main factor deciding the speed of many
algorithms. Prefetching could make a huge difference, particularly on
NUMA machines (like a dual Opteron). I think GCC has a moderately
portable way to request it (though it may be only in beta versions as
yet).
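
As a rough illustration of the prefetch idea, here is a minimal sketch of
a strided 1-d inner loop that hints the cache a few elements ahead. It
assumes GCC's `__builtin_prefetch(addr, rw, locality)` builtin; the
function name `strided_sum` and the distance `PREFETCH_AHEAD` are
hypothetical, and a useful distance would have to be found by measurement
on the target machine. This is not NumPy's actual inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 16

/* Sum n doubles spaced `stride` elements apart, starting at data[0].
   On GCC-compatible compilers, ask for the element PREFETCH_AHEAD
   iterations ahead to be loaded for reading (rw = 0) with low temporal
   locality (locality = 1). Elsewhere the hint is simply omitted. */
double strided_sum(const double *data, size_t n, size_t stride)
{
    double total = 0.0;
    size_t i;

    assert(data != NULL || n == 0);

    for (i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[(i + PREFETCH_AHEAD) * stride], 0, 1);
#endif
        total += data[i * stride];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without
it; whether it helps at all depends on the stride, the hardware
prefetcher, and the memory layout.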

More generally, all the tricks that ATLAS uses to accelerate BLAS
routines would (in principle) be applicable here. The implementation
would be extremely difficult, though, even if all the basic loops
could be expressed in a few primitives.
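
One ATLAS-style trick is cache blocking: working through arrays in small
tiles so that data is reused while it is still in cache. The sketch below
applies it to an elementwise operation that reads one operand transposed,
where a plain row-by-row loop would stride badly through memory. The
function name `add_transposed_blocked` and the tile size `BLOCK` are
assumptions for illustration; ATLAS chooses such sizes by empirical
tuning, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile edge; a tuned library would pick this by timing runs. */
#define BLOCK 32

/* For n x n row-major arrays, compute out[i][j] = a[i][j] + b[j][i].
   The loops visit the arrays one BLOCK x BLOCK tile at a time, so the
   transposed reads from b stay within a small region per tile. */
void add_transposed_blocked(const double *a, const double *b,
                            double *out, size_t n)
{
    size_t ii, jj, i, j;

    assert(a != NULL && b != NULL && out != NULL);

    for (ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
        for (jj = 0; jj < n; jj += BLOCK) {
            size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
            for (i = ii; i < i_end; i++)
                for (j = jj; j < j_end; j++)
                    out[i * n + j] = a[i * n + j] + b[j * n + i];
        }
    }
}
```

Even this single case needs edge handling for sizes that are not a
multiple of the tile, which hints at why a general implementation
covering arbitrary strides, dtypes, and dimensions would be hard.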

A. M. Archibald