Mailman 3 speeding up the following expression - NumPy-Discussion

speeding up the following expression

older
Re: [Numpy-discussion] np.dot and...

Geoffrey Zhu

12 Nov 2011 12 Nov '11

5:36 p.m.

Hi, I am playing with multiple ways to speed up the following expression (it is in the inner loop): C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)]) where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars. I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing. Can anyone help with giving some advices to make this faster? Thanks, G

Show replies by date

josef.pktd＠gmail.com

12 Nov 12 Nov

9:43 p.m.

On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu wrote:

...

Hi,

I am playing with multiple ways to speed up the following expression (it is in the inner loop):

C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c]) to me except for the boundary conditions. Josef

...

Thanks, G _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Warren Weckesser

13 Nov 13 Nov

12:31 a.m.

On Sat, Nov 12, 2011 at 6:43 AM, wrote:

...

On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu wrote:

...
Hi,

I am playing with multiple ways to speed up the following expression (it is in the inner loop):

C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c]) to me except for the boundary conditions.

As Josef pointed out, this is a convolution. There are (at least) three convolution functions in numpy+scipy that you could use: numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d. Of these, scipy.ndimage.convolve1d is the fastest. However, it doesn't beat the simple expression a*x[2:] + b*x[1:-1] + c*x[:-2] Your idea of forming a matrix without copying memory can be done using "stride tricks", and for arrays of the size you are interested in, it computes the result faster than the simple expression (see below). Another fast alternative is to use one of the inline code generators. This example is a easy to implement with scipy.weave.blitz, and it gives a big speedup. Here's a test script: #----- convolve1dtest.py ----- import numpy as np from numpy.lib.stride_tricks import as_strided from scipy.ndimage import convolve1d from scipy.weave import blitz # weighting coefficients a = -0.5 b = 1.0 c = -0.25 w = np.array((a,b,c)) # Reversed w: rw = w[::-1] # Length of C n = 250 # The original version of the calculation: # Some dummy data C = np.arange(float(n)) C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] # Save for comparison. C0 = C.copy() # Do it again using a matrix multiplication. C = np.arange(float(n)) # The "virtual" matrix view of C. V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0])) C[1:-1] = np.dot(V, rw) C1 = C.copy() # Again, with convolve1d this time. C = np.arange(float(n)) C[1:-1] = convolve1d(C, w)[1:-1] C2 = C.copy() # scipy.weave.blitz C = np.arange(float(n)) # Must work with a copy, D, in the formula, because blitz does not use # a temporary variable. D = C.copy() expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]" blitz(expr, check_size=0) C3 = C.copy() # Verify that all the methods give the same result. print np.all(C0 == C1) print np.all(C0 == C2) print np.all(C0 == C3) #----- And here's a snippet from an ipython session: In [51]: run convolve1dtest.py True True True In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 100000 loops, best of 3: 16.5 us per loop In [53]: %timeit C[1:-1] = np.dot(V, rw) 100000 loops, best of 3: 9.84 us per loop In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 100000 loops, best of 3: 18.7 us per loop In [55]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 4.91 us per loop scipy.weave.blitz is fastest (but note that blitz has already been called once, so the time shown does not include the compilation required in the first call). You could also try scipy.weave.inline, cython.inline, or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/). Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple expression or convolve1d. However, if you also have to set up V inside your inner loop, the speed gain will be lost. The relative speeds also depend on the size of C. For large C, the simple expression is faster than the matrix multiplication by V (but blitz is still faster). In the following, I have changed n to 2500 before running convolve1dtest.py: In [56]: run convolve1dtest.py True True True In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 10000 loops, best of 3: 29.5 us per loop In [58]: %timeit C[1:-1] = np.dot(V, rw) 10000 loops, best of 3: 56.4 us per loop In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 10000 loops, best of 3: 37.3 us per loop In [60]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 10.3 us per loop blitz wins, the simple numpy expression is a distant second, and now the matrix multiplication is slowest. I hope that helps--I know I learned quite a bit. :) Warren

...

Josef

...
Thanks, G _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

josef.pktd＠gmail.com

12:59 a.m.

On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser wrote:

...

On Sat, Nov 12, 2011 at 6:43 AM, wrote:

...
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu wrote:

...
Hi,

I am playing with multiple ways to speed up the following expression (it is in the inner loop):

C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c]) to me except for the boundary conditions.

As Josef pointed out, this is a convolution. There are (at least) three convolution functions in numpy+scipy that you could use: numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d. Of these, scipy.ndimage.convolve1d is the fastest. However, it doesn't beat the simple expression a*x[2:] + b*x[1:-1] + c*x[:-2] Your idea of forming a matrix without copying memory can be done using "stride tricks", and for arrays of the size you are interested in, it computes the result faster than the simple expression (see below).

Another fast alternative is to use one of the inline code generators. This example is a easy to implement with scipy.weave.blitz, and it gives a big speedup.

Here's a test script:

#----- convolve1dtest.py -----

import numpy as np from numpy.lib.stride_tricks import as_strided from scipy.ndimage import convolve1d from scipy.weave import blitz

# weighting coefficients a = -0.5 b = 1.0 c = -0.25 w = np.array((a,b,c)) # Reversed w: rw = w[::-1]

# Length of C n = 250

# The original version of the calculation: # Some dummy data C = np.arange(float(n)) C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] # Save for comparison. C0 = C.copy()

# Do it again using a matrix multiplication. C = np.arange(float(n)) # The "virtual" matrix view of C. V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0])) C[1:-1] = np.dot(V, rw) C1 = C.copy()

# Again, with convolve1d this time. C = np.arange(float(n)) C[1:-1] = convolve1d(C, w)[1:-1] C2 = C.copy()

# scipy.weave.blitz C = np.arange(float(n)) # Must work with a copy, D, in the formula, because blitz does not use # a temporary variable. D = C.copy() expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]" blitz(expr, check_size=0) C3 = C.copy()

# Verify that all the methods give the same result. print np.all(C0 == C1) print np.all(C0 == C2) print np.all(C0 == C3)

#-----

And here's a snippet from an ipython session:

In [51]: run convolve1dtest.py True True True

In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 100000 loops, best of 3: 16.5 us per loop

In [53]: %timeit C[1:-1] = np.dot(V, rw) 100000 loops, best of 3: 9.84 us per loop

In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 100000 loops, best of 3: 18.7 us per loop

In [55]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 4.91 us per loop

scipy.weave.blitz is fastest (but note that blitz has already been called once, so the time shown does not include the compilation required in the first call). You could also try scipy.weave.inline, cython.inline, or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).

Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple expression or convolve1d. However, if you also have to set up V inside your inner loop, the speed gain will be lost. The relative speeds also depend on the size of C. For large C, the simple expression is faster than the matrix multiplication by V (but blitz is still faster). In the following, I have changed n to 2500 before running convolve1dtest.py:

In [56]: run convolve1dtest.py True True True

In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 10000 loops, best of 3: 29.5 us per loop

In [58]: %timeit C[1:-1] = np.dot(V, rw) 10000 loops, best of 3: 56.4 us per loop

In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 10000 loops, best of 3: 37.3 us per loop

In [60]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 10.3 us per loop

blitz wins, the simple numpy expression is a distant second, and now the matrix multiplication is slowest.

I hope that helps--I know I learned quite a bit. :)

Interesting, two questions does scipy.signal convolve have a similar overhead as np.convolve1d ? memory: the blitz code doesn't include the array copy (D), so the timing might be a bit misleading? I assume the as_strided call doesn't allocate any memory yet, so the timing should be correct. (or is this your comment about setting up V in the inner loop) Josef

...

Warren

...
Josef

...
Thanks, G _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Warren Weckesser

1:32 a.m.

On Sat, Nov 12, 2011 at 9:59 AM, wrote:

...

On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser wrote:

...
On Sat, Nov 12, 2011 at 6:43 AM, wrote:

...
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu

wrote:

...
...
...
Hi,

I am playing with multiple ways to speed up the following expression (it is in the inner loop):

C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c]) to me except for the boundary conditions.

As Josef pointed out, this is a convolution. There are (at least) three convolution functions in numpy+scipy that you could use: numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d. Of these, scipy.ndimage.convolve1d is the fastest. However, it doesn't beat the simple expression a*x[2:] + b*x[1:-1] + c*x[:-2] Your idea of forming a matrix without copying memory can be done using "stride tricks", and for arrays of the size you are interested in, it computes the result faster than the simple expression (see below).

Another fast alternative is to use one of the inline code generators. This example is a easy to implement with scipy.weave.blitz, and it gives a big speedup.

Here's a test script:

#----- convolve1dtest.py -----

import numpy as np from numpy.lib.stride_tricks import as_strided from scipy.ndimage import convolve1d from scipy.weave import blitz

# weighting coefficients a = -0.5 b = 1.0 c = -0.25 w = np.array((a,b,c)) # Reversed w: rw = w[::-1]

# Length of C n = 250

# The original version of the calculation: # Some dummy data C = np.arange(float(n)) C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] # Save for comparison. C0 = C.copy()

# Do it again using a matrix multiplication. C = np.arange(float(n)) # The "virtual" matrix view of C. V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0])) C[1:-1] = np.dot(V, rw) C1 = C.copy()

# Again, with convolve1d this time. C = np.arange(float(n)) C[1:-1] = convolve1d(C, w)[1:-1] C2 = C.copy()

# scipy.weave.blitz C = np.arange(float(n)) # Must work with a copy, D, in the formula, because blitz does not use # a temporary variable. D = C.copy() expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]" blitz(expr, check_size=0) C3 = C.copy()

# Verify that all the methods give the same result. print np.all(C0 == C1) print np.all(C0 == C2) print np.all(C0 == C3)

#-----

And here's a snippet from an ipython session:

In [51]: run convolve1dtest.py True True True

In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 100000 loops, best of 3: 16.5 us per loop

In [53]: %timeit C[1:-1] = np.dot(V, rw) 100000 loops, best of 3: 9.84 us per loop

In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 100000 loops, best of 3: 18.7 us per loop

In [55]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 4.91 us per loop

scipy.weave.blitz is fastest (but note that blitz has already been called once, so the time shown does not include the compilation required in the first call). You could also try scipy.weave.inline, cython.inline, or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).

Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple expression or convolve1d. However, if you also have to set up V inside your inner loop, the speed gain will be lost. The relative speeds also depend on the size of C. For large C, the simple expression is faster than the matrix multiplication by V (but blitz is still faster). In the following, I have changed n to 2500 before running convolve1dtest.py:

In [56]: run convolve1dtest.py True True True

In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 10000 loops, best of 3: 29.5 us per loop

In [58]: %timeit C[1:-1] = np.dot(V, rw) 10000 loops, best of 3: 56.4 us per loop

In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 10000 loops, best of 3: 37.3 us per loop

In [60]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 10.3 us per loop

blitz wins, the simple numpy expression is a distant second, and now the matrix multiplication is slowest.

I hope that helps--I know I learned quite a bit. :)

Interesting, two questions

does scipy.signal convolve have a similar overhead as np.convolve1d ?

Did you mean np.convolve? There is no np.convolve1d. Some of the tests that I've done with convolution functions are here: http://www.scipy.org/Cookbook/ApplyFIRFilter I should add np.convolve to that page. For the case considered here, np.convolve is a bit slower than scipy.ndimage.convolve1d, but for larger arrays, it looks like np.convolve can be much faster.

...

memory: the blitz code doesn't include the array copy (D), so the timing might be a bit misleading?

Look again at my %timeit calls in the ipython snippets. :)

...

I assume the as_strided call doesn't allocate any memory yet, so the timing should be correct. (or is this your comment about setting up V in the inner loop)

Yes, that's what I meant; if V has to be created inside the inner loop (so as_strided is called in the loop), the time it takes to create V eliminates the benefit of using the matrix approach. Warren

...

Josef

...
Warren

...
Josef

...
Thanks, G _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

josef.pktd＠gmail.com

2:16 a.m.

On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser wrote:

...

On Sat, Nov 12, 2011 at 9:59 AM, wrote:

...
On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser wrote:

...
On Sat, Nov 12, 2011 at 6:43 AM, wrote:

...
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu wrote:

...
Hi,

I am playing with multiple ways to speed up the following expression (it is in the inner loop):

C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c]) to me except for the boundary conditions.

As Josef pointed out, this is a convolution. There are (at least) three convolution functions in numpy+scipy that you could use: numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d. Of these, scipy.ndimage.convolve1d is the fastest. However, it doesn't beat the simple expression a*x[2:] + b*x[1:-1] + c*x[:-2] Your idea of forming a matrix without copying memory can be done using "stride tricks", and for arrays of the size you are interested in, it computes the result faster than the simple expression (see below).

Another fast alternative is to use one of the inline code generators. This example is a easy to implement with scipy.weave.blitz, and it gives a big speedup.

Here's a test script:

#----- convolve1dtest.py -----

import numpy as np from numpy.lib.stride_tricks import as_strided from scipy.ndimage import convolve1d from scipy.weave import blitz

# weighting coefficients a = -0.5 b = 1.0 c = -0.25 w = np.array((a,b,c)) # Reversed w: rw = w[::-1]

# Length of C n = 250

# The original version of the calculation: # Some dummy data C = np.arange(float(n)) C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] # Save for comparison. C0 = C.copy()

# Do it again using a matrix multiplication. C = np.arange(float(n)) # The "virtual" matrix view of C. V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0])) C[1:-1] = np.dot(V, rw) C1 = C.copy()

# Again, with convolve1d this time. C = np.arange(float(n)) C[1:-1] = convolve1d(C, w)[1:-1] C2 = C.copy()

# scipy.weave.blitz C = np.arange(float(n)) # Must work with a copy, D, in the formula, because blitz does not use # a temporary variable. D = C.copy() expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]" blitz(expr, check_size=0) C3 = C.copy()

# Verify that all the methods give the same result. print np.all(C0 == C1) print np.all(C0 == C2) print np.all(C0 == C3)

#-----

And here's a snippet from an ipython session:

In [51]: run convolve1dtest.py True True True

In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 100000 loops, best of 3: 16.5 us per loop

In [53]: %timeit C[1:-1] = np.dot(V, rw) 100000 loops, best of 3: 9.84 us per loop

In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 100000 loops, best of 3: 18.7 us per loop

In [55]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 4.91 us per loop

scipy.weave.blitz is fastest (but note that blitz has already been called once, so the time shown does not include the compilation required in the first call). You could also try scipy.weave.inline, cython.inline, or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).

Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple expression or convolve1d. However, if you also have to set up V inside your inner loop, the speed gain will be lost. The relative speeds also depend on the size of C. For large C, the simple expression is faster than the matrix multiplication by V (but blitz is still faster). In the following, I have changed n to 2500 before running convolve1dtest.py:

In [56]: run convolve1dtest.py True True True

In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 10000 loops, best of 3: 29.5 us per loop

In [58]: %timeit C[1:-1] = np.dot(V, rw) 10000 loops, best of 3: 56.4 us per loop

In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 10000 loops, best of 3: 37.3 us per loop

In [60]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 10.3 us per loop

blitz wins, the simple numpy expression is a distant second, and now the matrix multiplication is slowest.

I hope that helps--I know I learned quite a bit. :)

Interesting, two questions

does scipy.signal convolve have a similar overhead as np.convolve1d ?

Did you mean np.convolve? There is no np.convolve1d. Some of the tests that I've done with convolution functions are here: http://www.scipy.org/Cookbook/ApplyFIRFilter I should add np.convolve to that page. For the case considered here, np.convolve is a bit slower than scipy.ndimage.convolve1d, but for larger arrays, it looks like np.convolve can be much faster.

...
memory: the blitz code doesn't include the array copy (D), so the timing might be a bit misleading?

Look again at my %timeit calls in the ipython snippets. :)

Sorry, I was reading way too fast or selectively (skipping the imports, namespaces for example). (I thought convolve1d was numpy not ndimage) I tried out your cookbook script a while ago, it's nice, I had only a rough idea about the timing before. Compared to the current case the x is much longer, (unless I don't remember or read it correctly.) Thanks, Josef

...

...
I assume the as_strided call doesn't allocate any memory yet, so the timing should be correct. (or is this your comment about setting up V in the inner loop)

Yes, that's what I meant; if V has to be created inside the inner loop (so as_strided is called in the loop), the time it takes to create V eliminates the benefit of using the matrix approach.

Warren

...
Josef

...
Warren

...
Josef

...
Thanks, G _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Warren Weckesser

2:51 a.m.

On Sat, Nov 12, 2011 at 11:16 AM, wrote:

...

On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser wrote:

...
On Sat, Nov 12, 2011 at 9:59 AM, wrote:

...
On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser wrote:

...
On Sat, Nov 12, 2011 at 6:43 AM, wrote:

...
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu wrote:

...
Hi,

I am playing with multiple ways to speed up the following

expression

...
...
...
...
...
(it is in the inner loop):

C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c]) to me except for the boundary conditions.

As Josef pointed out, this is a convolution. There are (at least) three convolution functions in numpy+scipy that you could use: numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d. Of these, scipy.ndimage.convolve1d is the fastest. However, it doesn't beat the simple expression a*x[2:] + b*x[1:-1] + c*x[:-2] Your idea of forming a matrix without copying memory can be done using "stride tricks", and for arrays of the size you are interested in, it computes the result faster than the simple expression (see below).

Another fast alternative is to use one of the inline code generators. This example is a easy to implement with scipy.weave.blitz, and it gives a big speedup.

Here's a test script:

#----- convolve1dtest.py -----

import numpy as np from numpy.lib.stride_tricks import as_strided from scipy.ndimage import convolve1d from scipy.weave import blitz

# weighting coefficients a = -0.5 b = 1.0 c = -0.25 w = np.array((a,b,c)) # Reversed w: rw = w[::-1]

# Length of C n = 250

# The original version of the calculation: # Some dummy data C = np.arange(float(n)) C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] # Save for comparison. C0 = C.copy()

# Do it again using a matrix multiplication. C = np.arange(float(n)) # The "virtual" matrix view of C. V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0])) C[1:-1] = np.dot(V, rw) C1 = C.copy()

# Again, with convolve1d this time. C = np.arange(float(n)) C[1:-1] = convolve1d(C, w)[1:-1] C2 = C.copy()

# scipy.weave.blitz C = np.arange(float(n)) # Must work with a copy, D, in the formula, because blitz does not use # a temporary variable. D = C.copy() expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]" blitz(expr, check_size=0) C3 = C.copy()

# Verify that all the methods give the same result. print np.all(C0 == C1) print np.all(C0 == C2) print np.all(C0 == C3)

#-----

And here's a snippet from an ipython session:

In [51]: run convolve1dtest.py True True True

In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 100000 loops, best of 3: 16.5 us per loop

In [53]: %timeit C[1:-1] = np.dot(V, rw) 100000 loops, best of 3: 9.84 us per loop

In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 100000 loops, best of 3: 18.7 us per loop

In [55]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 4.91 us per loop

scipy.weave.blitz is fastest (but note that blitz has already been called once, so the time shown does not include the compilation required in the first call). You could also try scipy.weave.inline, cython.inline, or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).

Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple expression or convolve1d. However, if you also have to set up V inside your inner loop, the speed gain will be lost. The relative speeds also depend on the size of C. For large C, the simple expression is faster than the matrix multiplication by V (but blitz is still faster). In the following, I have changed n to 2500 before running convolve1dtest.py:

In [56]: run convolve1dtest.py True True True

In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 10000 loops, best of 3: 29.5 us per loop

In [58]: %timeit C[1:-1] = np.dot(V, rw) 10000 loops, best of 3: 56.4 us per loop

In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 10000 loops, best of 3: 37.3 us per loop

In [60]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 10.3 us per loop

blitz wins, the simple numpy expression is a distant second, and now the matrix multiplication is slowest.

I hope that helps--I know I learned quite a bit. :)

Interesting, two questions

does scipy.signal convolve have a similar overhead as np.convolve1d ?

Did you mean np.convolve? There is no np.convolve1d. Some of the tests that I've done with convolution functions are here: http://www.scipy.org/Cookbook/ApplyFIRFilter I should add np.convolve to that page. For the case considered here, np.convolve is a bit slower than scipy.ndimage.convolve1d, but for larger arrays, it looks like np.convolve can be much faster.

...
memory: the blitz code doesn't include the array copy (D), so the timing might be a bit misleading?

Look again at my %timeit calls in the ipython snippets. :)

Sorry, I was reading way too fast or selectively (skipping the imports, namespaces for example). (I thought convolve1d was numpy not ndimage)

I tried out your cookbook script a while ago, it's nice, I had only a rough idea about the timing before. Compared to the current case the x is much longer, (unless I don't remember or read it correctly.)

I just updated the page with np.convolve: http://www.scipy.org/Cookbook/ApplyFIRFilter A big part of the overhead in using np.convolve in that script is that the function only accepts 1-d arrays, so a python loop is required to apply it along an axis of a 2-d array. Warren

Geoffrey Zhu

1:07 p.m.

Hi Warren, On Sat, Nov 12, 2011 at 9:31 AM, Warren Weckesser wrote:

...

On Sat, Nov 12, 2011 at 6:43 AM, wrote:

...
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu wrote:

...
Hi,

I am playing with multiple ways to speed up the following expression (it is in the inner loop):

C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using numpy. It would be nice if I could create a Mx3 matrix without copying memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c]) to me except for the boundary conditions.

As Josef pointed out, this is a convolution. There are (at least) three convolution functions in numpy+scipy that you could use: numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d. Of these, scipy.ndimage.convolve1d is the fastest. However, it doesn't beat the simple expression a*x[2:] + b*x[1:-1] + c*x[:-2] Your idea of forming a matrix without copying memory can be done using "stride tricks", and for arrays of the size you are interested in, it computes the result faster than the simple expression (see below).

Another fast alternative is to use one of the inline code generators. This example is a easy to implement with scipy.weave.blitz, and it gives a big speedup.

Here's a test script:

#----- convolve1dtest.py -----

import numpy as np from numpy.lib.stride_tricks import as_strided from scipy.ndimage import convolve1d from scipy.weave import blitz

# weighting coefficients a = -0.5 b = 1.0 c = -0.25 w = np.array((a,b,c)) # Reversed w: rw = w[::-1]

# Length of C n = 250

# The original version of the calculation: # Some dummy data C = np.arange(float(n)) C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] # Save for comparison. C0 = C.copy()

# Do it again using a matrix multiplication. C = np.arange(float(n)) # The "virtual" matrix view of C. V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0])) C[1:-1] = np.dot(V, rw) C1 = C.copy()

# Again, with convolve1d this time. C = np.arange(float(n)) C[1:-1] = convolve1d(C, w)[1:-1] C2 = C.copy()

# scipy.weave.blitz C = np.arange(float(n)) # Must work with a copy, D, in the formula, because blitz does not use # a temporary variable. D = C.copy() expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]" blitz(expr, check_size=0) C3 = C.copy()

# Verify that all the methods give the same result. print np.all(C0 == C1) print np.all(C0 == C2) print np.all(C0 == C3)

#-----

And here's a snippet from an ipython session:

In [51]: run convolve1dtest.py True True True

In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 100000 loops, best of 3: 16.5 us per loop

In [53]: %timeit C[1:-1] = np.dot(V, rw) 100000 loops, best of 3: 9.84 us per loop

In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 100000 loops, best of 3: 18.7 us per loop

In [55]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 4.91 us per loop

scipy.weave.blitz is fastest (but note that blitz has already been called once, so the time shown does not include the compilation required in the first call). You could also try scipy.weave.inline, cython.inline, or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).

Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple expression or convolve1d. However, if you also have to set up V inside your inner loop, the speed gain will be lost. The relative speeds also depend on the size of C. For large C, the simple expression is faster than the matrix multiplication by V (but blitz is still faster). In the following, I have changed n to 2500 before running convolve1dtest.py:

In [56]: run convolve1dtest.py True True True

In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] 10000 loops, best of 3: 29.5 us per loop

In [58]: %timeit C[1:-1] = np.dot(V, rw) 10000 loops, best of 3: 56.4 us per loop

In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] 10000 loops, best of 3: 37.3 us per loop

In [60]: %timeit D = C.copy(); blitz(expr, check_size=0) 100000 loops, best of 3: 10.3 us per loop

blitz wins, the simple numpy expression is a distant second, and now the matrix multiplication is slowest.

I hope that helps--I know I learned quite a bit. :)

Warren

Thanks for the very helpful response. Fortunately I don't have to set up the matrix every time as I can work on the same C over and over again inside the loop. It's counterintuitive to see that the performance of dot() degrades as n goes up. If I didn't see the results, I would have guessed otherwise. Thanks again! Geoffrey

4545

Age (days ago)

4546

Last active (days ago)

List overview

Download

7 comments

3 participants

participants (3)

Geoffrey Zhu
josef.pktd＠gmail.com
Warren Weckesser

speeding up the following expression

tags

participants (3)