[Numpy-discussion] speeding up the following expression

Sat Nov 12 12:51:10 EST 2011

On Sat, Nov 12, 2011 at 11:16 AM, <josef.pktd at gmail.com> wrote:

> On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser
> <warren.weckesser at enthought.com> wrote:
> >
> >
> > On Sat, Nov 12, 2011 at 9:59 AM, <josef.pktd at gmail.com> wrote:
> >>
> >> On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
> >> <warren.weckesser at enthought.com> wrote:
> >> >
> >> >
> >> > On Sat, Nov 12, 2011 at 6:43 AM, <josef.pktd at gmail.com> wrote:
> >> >>
> >> >> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu <zyzhu2000 at gmail.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I am playing with multiple ways to speed up the following
> expression
> >> >> > (it is in the inner loop):
> >> >> >
> >> >> >
> >> >> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
> >> >> >
> >> >> > where C is an array of about 200-300 elements, M=len(C), a, b, c
> are
> >> >> > scalars.
> >> >> >
> >> >> > I played with numexpr, but it was way slower than directly using
> >> >> > numpy. It would be nice if I could create a Mx3 matrix without
> >> >> > copying
> >> >> > memory and so I can use dot() to calculate the whole thing.
> >> >> >
> >> >> > Can anyone help with giving some advices to make this faster?
> >> >>
> >> >> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
> >> >> conditions.
> >> >
> >> >
> >> >
> >> > As Josef pointed out, this is a convolution.  There are (at least)
> >> > three convolution functions in numpy+scipy that you could use:
> >> > numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
> >> > Of these, scipy.ndimage.convolve1d is the fastest.  However, it
> doesn't
> >> > beat the simple expression
> >> >    a*x[2:] + b*x[1:-1] + c*x[:-2]
> >> > Your idea of forming a matrix without copying memory can be done using
> >> > "stride tricks", and for arrays of the size you are interested in, it
> >> > computes the result faster than the simple expression (see below).
> >> >
> >> > Another fast alternative is to use one of the inline code generators.
> >> > This example is a easy to implement with scipy.weave.blitz, and it
> gives
> >> > a big speedup.
> >> >
> >> > Here's a test script:
> >> >
> >> > #----- convolve1dtest.py -----
> >> >
> >> >
> >> > import numpy as np
> >> > from numpy.lib.stride_tricks import as_strided
> >> > from scipy.ndimage import convolve1d
> >> > from scipy.weave import blitz
> >> >
> >> > # weighting coefficients
> >> > a = -0.5
> >> > b = 1.0
> >> > c = -0.25
> >> > w = np.array((a,b,c))
> >> > # Reversed w:
> >> > rw = w[::-1]
> >> >
> >> > # Length of C
> >> > n = 250
> >> >
> >> > # The original version of the calculation:
> >> > # Some dummy data
> >> > C = np.arange(float(n))
> >> > C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> >> > # Save for comparison.
> >> > C0 = C.copy()
> >> >
> >> > # Do it again using a matrix multiplication.
> >> > C = np.arange(float(n))
> >> > # The "virtual" matrix view of C.
> >> > V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0],
> >> > C.strides[0]))
> >> > C[1:-1] = np.dot(V, rw)
> >> > C1 = C.copy()
> >> >
> >> > # Again, with convolve1d this time.
> >> > C = np.arange(float(n))
> >> > C[1:-1] = convolve1d(C, w)[1:-1]
> >> > C2 = C.copy()
> >> >
> >> > # scipy.weave.blitz
> >> > C = np.arange(float(n))
> >> > # Must work with a copy, D, in the formula, because blitz does not use
> >> > # a temporary variable.
> >> > D = C.copy()
> >> > expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
> >> > blitz(expr, check_size=0)
> >> > C3 = C.copy()
> >> >
> >> >
> >> > # Verify that all the methods give the same result.
> >> > print np.all(C0 == C1)
> >> > print np.all(C0 == C2)
> >> > print np.all(C0 == C3)
> >> >
> >> > #-----
> >> >
> >> > And here's a snippet from an ipython session:
> >> >
> >> > In [51]: run convolve1dtest.py
> >> > True
> >> > True
> >> > True
> >> >
> >> > In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> >> > 100000 loops, best of 3: 16.5 us per loop
> >> >
> >> > In [53]: %timeit C[1:-1] = np.dot(V, rw)
> >> > 100000 loops, best of 3: 9.84 us per loop
> >> >
> >> > In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> >> > 100000 loops, best of 3: 18.7 us per loop
> >> >
> >> > In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
> >> > 100000 loops, best of 3: 4.91 us per loop
> >> >
> >> >
> >> >
> >> > scipy.weave.blitz is fastest (but note that blitz has already been
> >> > called
> >> > once, so the time shown does not include the compilation required in
> >> > the first call).  You could also try scipy.weave.inline,
> cython.inline,
> >> > or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
> >> >
> >> > Also note that C[-1:1] = np.dot(V, rw) is faster than either the
> simple
> >> > expression or convolve1d.  However, if you also have to set up V
> inside
> >> > your inner loop, the speed gain will be lost.  The relative speeds
> also
> >> > depend on the size of C.  For large C, the simple expression is faster
> >> > than the matrix multiplication by V (but blitz is still faster).  In
> >> > the following, I have changed n to 2500 before running
> >> > convolve1dtest.py:
> >> >
> >> > In [56]: run convolve1dtest.py
> >> > True
> >> > True
> >> > True
> >> >
> >> > In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> >> > 10000 loops, best of 3: 29.5 us per loop
> >> >
> >> > In [58]: %timeit C[1:-1] = np.dot(V, rw)
> >> > 10000 loops, best of 3: 56.4 us per loop
> >> >
> >> > In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> >> > 10000 loops, best of 3: 37.3 us per loop
> >> >
> >> > In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
> >> > 100000 loops, best of 3: 10.3 us per loop
> >> >
> >> >
> >> > blitz wins, the simple numpy expression is a distant second, and now
> >> > the matrix multiplication is slowest.
> >> >
> >> > I hope that helps--I know I learned quite a bit. :)
> >>
> >> Interesting, two questions
> >>
> >> does scipy.signal convolve have a similar overhead as np.convolve1d ?
> >
> >
> > Did you mean np.convolve?  There is no np.convolve1d.   Some of the tests
> > that I've done with convolution functions are here:
> >     http://www.scipy.org/Cookbook/ApplyFIRFilter
> > I should add np.convolve to that page.  For the case considered here,
> > np.convolve is a bit slower than scipy.ndimage.convolve1d, but for larger
> > arrays, it looks like np.convolve can be much faster.
> >
> >
> >>
> >> memory:
> >> the blitz code doesn't include the array copy (D), so the timing might
> >> be a bit misleading?
> >
> >
> > Look again at my %timeit calls in the ipython snippets.  :)
> >
>
> Sorry, I was reading way too fast or selectively (skipping the
> imports, namespaces for example).
> (I thought convolve1d was numpy not ndimage)
>
> I tried out your cookbook script a while ago, it's nice, I had only a
> rough idea about the timing before.
> Compared to the current case the x is much longer, (unless I don't
> remember or read it correctly.)
>

I just updated the page with np.convolve:
    http://www.scipy.org/Cookbook/ApplyFIRFilter
A big part of the overhead in using np.convolve in that script is that the
function only accepts 1-d arrays, so a python loop is required to apply it
along an axis of a 2-d array.

Warren
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20111112/c81cea0f/attachment.html>