[Numpy-discussion] help using np.correlate to produce correlograms.

Julian Taylor jtaylor.debian at googlemail.com
Thu Dec 11 09:39:11 EST 2014

On 12/11/2014 03:24 PM, Pierre Haessig wrote:
> On 11/12/2014 11:19, Julian Taylor wrote:
>> Also on a side note, in 1.10 np.convolve/correlate has been
>> significantly sped up if one of the sequences has fewer than 12 elements
> Interesting! What is the origin of this speed up, and why a magic number 12?

Previously numpy called dot for the convolution part. This is fine for
large convolutions, as dot goes out to BLAS, which is very fast.
For small convolutions it is unfortunately terrible, as the generic dot in
BLAS libraries has an enormous overhead that only amortizes on large data.
So the first part was computing the dot in a simple numpy-internal loop if
the data is small.
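
To make the "dot per output element" structure concrete, here is a rough
pure-Python sketch. It is not numpy's actual C code; the function names
dot_small and correlate_valid are made up for illustration. Each output
value is one dot product of the kernel with a window of the signal, and for
short inputs that dot is just a plain loop instead of a library call.

```python
def dot_small(x, y):
    """Plain-loop dot product: no library call, so no per-call overhead."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total


def correlate_valid(a, v):
    """Slide kernel v over signal a and take one dot product per position.

    Only positions where the kernel fully overlaps the signal are computed.
    """
    n_out = len(a) - len(v) + 1
    return [dot_small(a[i:i + len(v)], v) for i in range(n_out)]


result = correlate_valid([1.0, 2.0, 3.0, 4.0], [1.0, 0.5])
print(result)
```

For real-valued input this should match np.correlate(a, v, mode="valid").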

The second part is the number of registers typical machines have, e.g.
amd64 has 16 floating point registers. If you can put all elements of a
convolution kernel into these registers, you save reloading them from the
stack on each iteration.
11 is the largest number of elements I could reliably use without the
compiler spilling them to the stack, hence the cutoff at fewer than 12
elements.
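
The loop shape behind the register trick looks roughly like the sketch
below, written for a hypothetical 3-element kernel (correlate3_valid is an
illustrative name, not a numpy function). The kernel elements are loaded
into locals once, outside the loop. Python itself does not keep these in
CPU registers; in compiled C code with the same structure the compiler can,
as long as the kernel is small enough.

```python
def correlate3_valid(a, v):
    """Sliding dot product for a kernel of exactly 3 elements."""
    # Load the kernel once, before the loop. In compiled code these three
    # values are candidates for staying in registers across all iterations.
    k0, k1, k2 = v
    out = []
    for i in range(len(a) - 2):
        # The kernel is never re-read from memory inside the loop.
        out.append(k0 * a[i] + k1 * a[i + 1] + k2 * a[i + 2])
    return out


smoothed = correlate3_valid([1.0, 2.0, 3.0, 4.0, 5.0], [0.25, 0.5, 0.25])
print(smoothed)
```

A larger kernel needs one local per element, which is where the register
count becomes the limit.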
