[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

Pauli Virtanen pav+sp at iki.fi
Wed Jul 8 19:02:47 EDT 2009


On 2009-07-08, Stéfan van der Walt <stefan at sun.ac.za> wrote:
> I know very little about cache optimality, so excuse the triviality of
> this question: Is it possible to design this loop optimally (taking
> into account certain build-time measurable parameters), or is it the
> kind of thing that can only be discovered by tuning at compile-time?
> ATNumPy... scary :-)

I'm still kind of hoping that it's possible to make some minimal 
assumptions about CPU caches in general, and have a rule that 
decides a code path that is good enough, if not optimal.
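
As a rough illustration of the kind of rule meant here, the sketch below
picks the innermost loop axis for a reduction from the array's strides alone.
It is not what NumPy actually does. The function name `choose_inner_axis` and
the "smallest stride goes innermost" heuristic are assumptions made for
illustration. The only cache assumption is that walking memory with a small
stride is cheaper than walking it with a large one.

```python
def choose_inner_axis(shape, strides, reduce_axis):
    """Pick the axis to iterate over in the innermost loop.

    Hypothetical heuristic: the axis with the smallest absolute stride
    goes innermost, so consecutive loop iterations touch nearby memory.
    Ties are broken in favor of the reduction axis.
    """
    # Axes of length 1 contribute no iterations, so ignore them.
    candidates = [ax for ax, n in enumerate(shape) if n > 1]
    if not candidates:
        return reduce_axis
    return min(
        candidates,
        key=lambda ax: (abs(strides[ax]), ax != reduce_axis),
    )


# C-ordered 1000x1000 array of 8-byte floats: strides are (8000, 8).
c_shape, c_strides = (1000, 1000), (8000, 8)
inner_for_axis0 = choose_inner_axis(c_shape, c_strides, reduce_axis=0)
inner_for_axis1 = choose_inner_axis(c_shape, c_strides, reduce_axis=1)

# Same shape in Fortran order: strides are (8, 8000).
f_shape, f_strides = (1000, 1000), (8, 8000)
inner_fortran = choose_inner_axis(f_shape, f_strides, reduce_axis=1)
```

With these inputs, reducing a C-ordered array over axis 0 still keeps axis 1
innermost (accumulating into a whole output row at a time), rather than
striding down each column separately.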

I don't think we want to go the ATNumPy route, or even have 
tunable parameters chosen at build or compile time. (Unless, of 
course, we want to bring a monster into the world -- think about 
cross-breeding distutils with the ATLAS build system :)

-- 
Pauli Virtanen



