[Numpy-discussion] Array vectorization in numpy

Carlos Becker carlosbecker at gmail.com
Wed Jul 20 02:49:21 EDT 2011


Those are very interesting examples. I think pre-allocation is very
important, and something similar happens in Matlab when no pre-allocation is
done: the computation takes 3-4x longer than with pre-allocation.
The main difference is that Matlab can take a pre-allocated array/matrix
into account, presumably avoiding the creation of a temporary and writing
the results directly into the pre-allocated array.

I think this is essential to speeding up numpy. Maybe numexpr could handle
this in the future? Right now the usual numexpr idiom is result =
numexpr.evaluate("whatever"), so the same problem seems to be there.
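For what it's worth, numpy's own ufuncs already let you write into a
pre-allocated array via their out= argument, which is the pattern I'd hope
numexpr could adopt. A minimal sketch (array sizes here are just for
illustration):

```python
import numpy as np

m = np.ones((4, 4))
k = np.empty_like(m)  # pre-allocate the result buffer once

# The ufunc writes directly into k; no temporary result array is created.
res = np.subtract(m, 0.5, out=k)

assert res is k          # the ufunc hands back the very buffer we supplied
assert (k == 0.5).all()  # and the results landed in it
```

In a loop, k can be allocated once outside and reused on every iteration,
which is exactly the pre-allocation win I'm describing above.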

None of this is to say that numpy is not worth using; only that for many
applications (especially those with huge matrices/arrays), pre-allocation
makes a huge difference, and supporting it well matters if we want to
attract more people to numpy.
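A related point that comes up below: in-place augmented assignment also
avoids the allocation, and it is easy to typo. A small sketch of the
difference between "m -= 0.5" and the "m =- 0.5" slip:

```python
import numpy as np

m = np.ones((3, 3))
buf = m            # keep a second reference so we can check identity

m -= 0.5           # in-place subtraction: modifies the existing buffer
assert m is buf    # no new array was allocated
assert float(m[0, 0]) == 0.5

m = -0.5           # the "=-" typo: silently rebinds m to the scalar -0.5
assert m is not buf
```

So "m =- 0.5" runs without error but throws the array away entirely, which
is why the timing for it below is so suspiciously fast.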

----------------------
Carlos Becker


On Wed, Jul 20, 2011 at 1:42 AM, Chad Netzer <chad.netzer at gmail.com> wrote:

> On Tue, Jul 19, 2011 at 6:10 PM, Pauli Virtanen <pav at iki.fi> wrote:
>
> >        k = m - 0.5
> >
> > does here the same thing as
> >
> >        k = np.empty_like(m)
> >        np.subtract(m, 0.5, out=k)
> >
> > The memory allocation (empty_like and the subsequent deallocation)
> > costs essentially nothing, and there are no temporaries or copying
> > in `subtract`.
>
> As verification:
>
> >>> import timeit
> >>> import numpy as np
> >>> t = timeit.Timer('k = m - 0.5', setup='import numpy as np; m = np.ones([8092, 8092], float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.53904647827148433
>
> >>> t = timeit.Timer('k = np.empty_like(m); np.subtract(m, 0.5, out=k)', setup='import numpy as np; m = np.ones([8092, 8092], float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.54006035327911373
>
> The trivial difference is, I think, just extra Python parsing overhead.
>
> Which leads me to apologize, since in my previous post I clearly meant
> to type "m -= 0.5", not "m =- 0.5", which is *quite* a different
> operation...  Carlos, and Lutz, please take heed. :)  In fact, as Lutz
> pointed out, that example was not at all what I intended to show
> anyway.
>
>
> So, just to demonstrate how it was wrong:
>
> >>> t = timeit.Timer('m =- 0.5', setup='import numpy as np; m = np.ones([8092, 8092], float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.058299207687377931
>
> >>> t = timeit.Timer('m -= 0.5', setup='import numpy as np; m = np.ones([8092, 8092], float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.28192551136016847
>
> >>> t = timeit.Timer('np.subtract(m, 0.5, m)', setup='import numpy as np; m = np.ones([8092, 8092], float)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.27014491558074949
>
> >>> t = timeit.Timer('np.subtract(m, 0.5, k)', setup='import numpy as np; m = np.ones([8092, 8092], float); k = np.empty_like(m)')
> >>> np.mean(t.repeat(repeat=10, number=1))
> 0.54962997436523442
>
> Perhaps the difference in the last two simply comes down to cache
> effects (having to iterate over two different large memory blocks,
> rather than one)?
>
> -Chad
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

