On Sun, May 16, 2010 at 1:18 PM, Eric Firing <efiring@hawaii.edu> wrote:

On 05/16/2010 09:24 AM, Keith Goodman wrote:

On Sun, May 16, 2010 at 12:14 PM, Davide Lasagna <lasagnadavide@gmail.com> wrote:

Hi all,

What is the fastest and lowest-memory way to compute this?

y = np.arange(2**24)
bases = y[1:] + y[:-1]

It is actually already quite fast, but I'm not sure whether the summation allocates temporary memory. Any help is appreciated.

Is it OK to modify y? If so:

y = np.arange(2**24)
z = y[1:] + y[:-1]   #<--- Slow way
y[:-1] += y[1:]      #<--- Fast way
(y[:-1] == z).all()
True
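(A small self-contained check of the in-place trick above; names here are just for illustration. Note the slices y[:-1] and y[1:] overlap, but elements are updated before they are read again, so the forward in-place add gives the same result as the plain sum.)

```python
import numpy as np

y = np.arange(2**24)
z = y[1:] + y[:-1]   # slow way: allocates a new array for the result

y2 = np.arange(2**24)
y2[:-1] += y2[1:]    # fast way: writes into y2's existing buffer

assert (y2[:-1] == z).all()
```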

It's not faster on my machine, as timed with ipython:

In [8]:y = np.arange(2**24)

In [9]:b = np.array([1,1], dtype=int)

In [10]:timeit np.convolve(y, b, 'valid')
1 loops, best of 3: 484 ms per loop

In [11]:timeit y[1:] + y[:-1]
10 loops, best of 3: 181 ms per loop

In [12]:timeit y[:-1] += y[1:]
10 loops, best of 3: 183 ms per loop
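(As a sanity check, not part of the original timings: convolving with the kernel [1, 1] in 'valid' mode does compute the same pairwise sums, so the three timed expressions all produce equal results. A smaller array is used here just to keep the check quick.)

```python
import numpy as np

y = np.arange(2**10)
b = np.array([1, 1], dtype=int)

# 'valid' mode with kernel [1, 1] yields y[i] + y[i+1] for each i.
assert (np.convolve(y, b, 'valid') == y[1:] + y[:-1]).all()
```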

If we include the fake data generation in the timing, to reduce cache bias in the repeated runs, the += method is noticeably slower.

In [13]:timeit y = np.arange(2**24); z = y[1:] + y[:-1]
1 loops, best of 3: 297 ms per loop

In [14]:timeit y = np.arange(2**24); y[:-1] += y[1:]; z = y[:-1]
1 loops, best of 3: 322 ms per loop

That's interesting. On my computer it is faster:

timeit y = np.arange(2**24); z = y[1:] + y[:-1]
10 loops, best of 3: 144 ms per loop

timeit y = np.arange(2**24); y[:-1] += y[1:]; z = y[:-1]
10 loops, best of 3: 114 ms per loop
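(The %timeit comparisons above can be reproduced outside ipython with the stdlib timeit module; a sketch, and the absolute numbers will of course differ per machine. Including the arange call in the timed statement, as in the thread, keeps the data fresh on each run.)

```python
import timeit

setup = "import numpy as np"
slow = "y = np.arange(2**24); z = y[1:] + y[:-1]"
fast = "y = np.arange(2**24); y[:-1] += y[1:]; z = y[:-1]"

# Best of 3 repeats, like ipython's %timeit reports.
t_slow = min(timeit.repeat(slow, setup, number=3, repeat=3))
t_fast = min(timeit.repeat(fast, setup, number=3, repeat=3))
print("slow: %.3f s  fast: %.3f s" % (t_slow, t_fast))
```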

What accounts for the performance difference? Cache size? I assume the in-place version uses less memory. Neat if timeit reported memory usage. I haven't tried numexp, that might be something to try too.