[Numpy-discussion] Array vectorization in numpy

Chad Netzer chad.netzer at gmail.com
Tue Jul 19 18:15:47 EDT 2011


On Tue, Jul 19, 2011 at 3:35 PM, Carlos Becker <carlosbecker at gmail.com> wrote:
> Thanks Chad for the explanation on those details. I am new to python and I

> However, if I don't, I obtain this 4x penalty with numpy, even with the
> 8092x8092 array. Would it be possible to do k = m - 0.5 and pre-allocate k
> such that python does not have to waste time on that?

I suspect the 4x penalty is related to expression evaluation overhead
(allocating and filling a new result array on every evaluation), so
hopefully the numexpr module will help, or just remembering to use the
in-place operators whenever appropriate.
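For illustration, here is a small-array sketch of the difference between
the two forms (the same pattern applies at 8092x8092):

```python
import numpy as np

m = np.ones((4, 4))

# 'k = m - 0.5' binds k to a brand-new result array:
k = m - 0.5

# 'm -= 0.5' writes the result back into m's existing buffer,
# so no new array is allocated:
m -= 0.5

# k and m now hold the same values but live in separate storage:
print(np.shares_memory(k, m))  # prints False
```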

To answer your question, though: you can allocate an array without
initializing it using the empty() function.  Note that if you aren't
absolutely sure you are going to overwrite every single element of the
array, empty() can leave you with uninitialized values.  To be safe,
I'd just use the zeros() function instead (either way, the allocation
happens outside the timeit() timing loop):
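If you do want the subtraction to land in a preallocated array, note that
plain 'k = m - 0.5' won't do it (it rebinds the name k to a new array);
NumPy ufuncs take an out= argument for exactly this.  A minimal sketch:

```python
import numpy as np

m = np.ones((4, 4))
k = np.empty(m.shape, m.dtype)  # allocated once, contents uninitialized

# Write the result directly into k's existing buffer; no new
# array is created by this call, and the out array is returned:
result = np.subtract(m, 0.5, out=k)

print(result is k)  # prints True
```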

%python
>>> import timeit
>>> import numpy as np

>>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float); k = np.zeros(m.shape, m.dtype)')
>>> np.mean(t.repeat(repeat=10, number=1))
0.58557529449462886

>>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)')
>>> np.mean(t.repeat(repeat=10, number=1))
0.53153839111328127

>>> t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)')
>>> np.mean(t.repeat(repeat=10, number=1))
0.038648796081542966

As you can see, preallocation doesn't affect the results much.  That's
expected: 'k = m - 0.5' rebinds the name k to a freshly allocated
result array, so the preallocated buffer is never actually written to
(you'd need something like np.subtract(m, 0.5, out=k) for that).  The
cost here is allocating and filling a new array on every evaluation;
the in-place operation, which reuses m's buffer, was much faster.

Here we see that just copying m to k takes more time than the 'k =
m - 0.5' operation:

>>> t=timeit.Timer('k = m.copy()', setup='import numpy as np;m = np.ones([8092,8092],float)')
>>> np.mean(t.repeat(repeat=10, number=1))
0.63301105499267574

Possibly that is because 8K x 8K matrices are a bit too big for this
kind of benchmark; I recommend also trying 4K x 4K, and your original
2K x 2K, to see whether the results are consistent.  Remember that the
timeit() setup hides the initial allocation time of m from the
results, but that cost still exists, and should be accounted for when
judging the overall execution time of the in-place approach.
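To put a number on that hidden allocation cost, you can time the ones()
call itself (a sketch using a smaller 1024x1024 array so it runs quickly):

```python
import timeit

# Time just the allocation and fill of the source array; in the
# benchmarks above this cost is hidden inside the setup= string.
t = timeit.Timer('np.ones((1024, 1024), float)', setup='import numpy as np')
alloc_time = min(t.repeat(repeat=5, number=1))
print('allocation time: %.6f s' % alloc_time)
```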

Also, with arrays this large, make sure these tests run in a fresh
python instance, so that the process address space isn't cluttered
with old object allocations (which may cause your OS to 'swap' the
now-unused memory and ruin your timing values).

-Chad
