[Numpy-discussion] Array vectorization in numpy

Chad Netzer chad.netzer at gmail.com
Wed Jul 20 07:02:52 EDT 2011


On Tue, Jul 19, 2011 at 11:49 PM, Carlos Becker <carlosbecker at gmail.com> wrote:
> Those are very interesting examples.

Cool.

> I think that pre-allocation is very
> important, and something similar happens in Matlab if no pre-allocation is
> done: it takes 3-4x longer than with pre-allocation.

Can you provide a simple example of this in Matlab code?  I'd like to
see the code you are testing with, and the numbers you are reporting,
all in one post (please).  So far we've seen some code in your first
post and some numbers in your follow-up, but with them spread out it is
hard to know exactly what you are asserting.

> The main difference is that Matlab is able to take into account a
> pre-allocated array/matrix, probably avoiding the creation of a temporary
> and writing the results directly in the pre-allocated array.

Now I believe you are guessing.  My last example showed the effect of
only using a pre-allocated result array in numpy; it was still slower
than an in-place operation (i.e., overwriting the array used to
calculate the result), which may be due to machine memory
considerations.  The simple operation you are testing (an array
operated on by a scalar) is dominated by the memory access speeds of
reading and writing to the large arrays.  With a separate,
pre-allocated array, there is twice the memory to read and write to,
and hence twice the time.  At least that's my guess, are you saying
Matlab does this 3-4 times faster than numpy?  I'd really like to see
the *exact* code you are testing, with the specific numbers you are
getting for that code, if it's not too much trouble.
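To make the comparison concrete, here is a minimal numpy sketch of the
three variants discussed above (fresh allocation, pre-allocated result
array, and in-place).  The array size and scalar are arbitrary choices
for illustration; actual timings will depend on the machine and cache
sizes.

```python
import numpy as np

n = 2000
a = np.random.rand(n, n)

# 1) Fresh allocation: a brand-new result array is created each time.
b = a * 1.1

# 2) Pre-allocated result: write into an existing array.  numpy still
#    reads from `a` and writes to `out`, so two large buffers are touched.
out = np.empty_like(a)
np.multiply(a, 1.1, out=out)

# 3) In-place: read and write the same buffer, touching roughly half
#    the memory of variant 2.
a *= 1.1
```

All three produce the same values; the difference is purely in how much
memory traffic each one generates, which is what dominates for a simple
array-times-scalar operation.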

> With this I am not saying that numpy is not worth it, just that for many
> applications (specially with huge matrices/arrays), pre-allocation does make
> a huge difference, especially if we want to attract more people to using
> numpy.

What do you mean by 'pre-allocated'?  It is certainly perfectly
feasible to pre-allocate numpy arrays and use them as the target of
operations, as my examples showed.  And you can also easily do sums
and multiplies using in-place array operations, with Python's own
operator syntax (e.g. a += b).  It's true that you have to do some work optimizing
some expressions if you wish to avoid temporary array objects being
created during multi-term expression evaluations, but the manual
discusses this and gives the reasons why.  Is this what you mean by
pre-allocation?

I'm still not sure where exactly you are seeing a problem; can you
show us exactly what Matlab code cannot be made to run as efficiently
with numpy?

-Chad
