[SciPy-User] fftconvolve performance depends on padding to a power of two

Mon May 27 19:11:36 EDT 2013

I'm using scipy.signal.fftconvolve in a performance-sensitive context, and
I've found that its default behavior (padding array size to the next power
of two) does not always give the best speed, and can consume a lot more
memory. I made a modified version that lets the user choose whether or not
to pad to the nearest power of two, and in some contexts, it goes faster by
a reasonable amount:

Shape of input data: (10L, 100L, 100L)
Time elapsed with 2**n padding: 0.144182774132
Time elapsed without 2**n padding: 0.112595033085

Shape of input data: (10L, 200L, 200L)
Time elapsed with 2**n padding: 0.636056829348
Time elapsed without 2**n padding: 0.713067174562

Shape of input data: (10L, 300L, 300L)
Time elapsed with 2**n padding: 3.10080130933
Time elapsed without 2**n padding: 1.0824417215

Assuming I'm not doing something really dumb, this might be a useful option
to expose to the user. Here's the modified code, very similar to the
existing fftconvolve code:

import numpy as np
from numpy import array, product
from scipy.fftpack import fftn, ifftn
from scipy.signal.signaltools import _centered

def fftconvolve(in1, in2, mode="full", pad_to_power_of_two=True):
    """Convolve two N-dimensional arrays using FFT. See convolve.

    """
    s1 = array(in1.shape)
    s2 = array(in2.shape)
    complex_result = (np.issubdtype(in1.dtype, np.complex) or
                      np.issubdtype(in2.dtype, np.complex))
    size = s1 + s2 - 1

    if pad_to_power_of_two:
        # Use 2**n-sized FFT; it might improve performance
        fsize = 2 ** np.ceil(np.log2(size))
    else:
        # Padding to a power of two might degrade performance, too
        fsize = size
    IN1 = fftn(in1, fsize)
    IN1 *= fftn(in2, fsize)
    fslice = tuple([slice(0, int(sz)) for sz in size])
    ret = ifftn(IN1)[fslice].copy()
    del IN1
    if not complex_result:
        ret = ret.real
    if mode == "full":
        return ret
    elif mode == "same":
        if product(s1, axis=0) > product(s2, axis=0):
            osize = s1
        else:
            osize = s2
        return _centered(ret, osize)
    elif mode == "valid":
        return _centered(ret, abs(s2 - s1) + 1)

if __name__ == '__main__':
    import time

    for sz in (100, 200, 300):
        a = np.zeros((10, sz, sz), dtype=np.float64)
        b = np.zeros((10, 20, 20), dtype=np.float64)
        print "Shape of input data:", a.shape
        start = time.clock()
        fftconvolve(a, b, mode='same', pad_to_power_of_two=True)
        end = time.clock()
        print "Time elapsed with 2**n padding:", end - start
        start = time.clock()
        fftconvolve(a, b, mode='same', pad_to_power_of_two=False)
        end = time.clock()
        print "Time elapsed without 2**n padding:", end - start
        print
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20130527/920215cc/attachment.html>