[Numpy-discussion] Array vectorization in numpy

Chad Netzer chad.netzer at gmail.com
Tue Jul 19 15:51:51 EDT 2011


On Tue, Jul 19, 2011 at 2:27 PM, Carlos Becker <carlosbecker at gmail.com> wrote:
> Hi, everything was run on linux.

> Placing parentheses around the scalar multipliers shows that it seems to
> have to do with how expressions are handled. Is there something that can be
> done about this so that numpy can deal with expressions rather than single
> operations chained by python itself?

Numpy is constrained (when using scalars) to Python's normal
expression evaluation rules, under which operators of equal precedence,
such as + and -, are evaluated left to right.  So once a subexpression
has produced an array, every further scalar operation is carried out as
a full array operation instead of being folded into one scalar first.
To extend my example from before:

>>> t=timeit.Timer('k = m - 0.5 + 0.4 - 0.3 + 0.2 - 0.1', setup='import numpy as np;m = np.ones([8092,8092],float)')
>>> np.mean(t.repeat(repeat=10, number=1))
2.9083001852035522

>>> t=timeit.Timer('k = 0.5 + 0.4 - 0.3 + 0.2 - 0.1 + m', setup='import numpy as np;m = np.ones([8092,8092],float)')
>>> np.mean(t.repeat(repeat=10, number=1))
0.52074816226959231

In the second case, the first four additions and subtractions are done
in scalar math, effectively collapsing the work down to '0.7 + m'.  In
the first case, however, 'm - 0.5' already produces an array, so each
of the remaining four operations is another full pass over an
8092x8092 array (five array operations instead of one), which makes the
total amount of work much greater.
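Python's own `ast` module makes this grouping visible. The sketch below (assuming Python 3.9+ for `ast.unparse`; `first_op` is a hypothetical helper written only for this illustration) parses both expressions without evaluating them and reports which operation Python performs first: in the slow form it already involves `m`, while in the fast form it is pure scalar arithmetic.

```python
import ast

# Parse (but do not evaluate) the two expressions from the timings above.
slow = ast.parse("m - 0.5 + 0.4 - 0.3 + 0.2 - 0.1", mode="eval").body
fast = ast.parse("0.5 + 0.4 - 0.3 + 0.2 - 0.1 + m", mode="eval").body


def first_op(expr):
    """Hypothetical helper: return the innermost (first-evaluated) operation
    of a left-nested chain of binary operators."""
    node = expr
    while isinstance(node.left, ast.BinOp):
        node = node.left
    return ast.unparse(node)


print(first_op(slow))          # m - 0.5     (array math from the very start)
print(first_op(fast))          # 0.5 + 0.4   (scalar math)
print(ast.unparse(fast.right)) # m           (touched only by the last operation)
```

Because `+` and `-` are left-associative, the root of each tree is the *last* operation, and everything to its left has already been evaluated by the time it runs.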

If Python had a way of exposing its expression tree to objects
*during execution* and allowed for delayed expression evaluation, such
quirks might be avoidable.  But it's a complex issue, and not always a
problem in practice.  In general, as with all numerical computation,
you have to be aware of the underlying evaluation order and
associativity to fully understand your results, and if you understand
that, you can optimize it yourself.
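As a rough illustration of what delayed evaluation could buy, here is a minimal sketch of a hypothetical `LazyAffine` wrapper (not part of NumPy or any library) that records scalar `+`, `-` and `*` and only touches the array when `evaluate()` is called. It assumes every chained operand is a Python scalar; `c - x` and all other operators are deliberately left unsupported.

```python
import numpy as np


class LazyAffine:
    """Hypothetical illustration, not a NumPy API: keeps scalar +, - and *
    applied to an array as one pending `array * scale + shift`."""

    def __init__(self, array, scale=1.0, shift=0.0):
        self.array = array
        self.scale = scale
        self.shift = shift

    def __add__(self, c):
        # (a*s + t) + c == a*s + (t + c): only scalar work happens here
        return LazyAffine(self.array, self.scale, self.shift + c)

    __radd__ = __add__  # addition is commutative

    def __sub__(self, c):
        # no __rsub__: `c - lazy` is not supported in this sketch
        return LazyAffine(self.array, self.scale, self.shift - c)

    def __mul__(self, c):
        # (a*s + t) * c == a*(s*c) + t*c
        return LazyAffine(self.array, self.scale * c, self.shift * c)

    __rmul__ = __mul__  # multiplication is commutative

    def evaluate(self):
        # Two full passes over the array, however many scalars were chained.
        result = self.array * self.scale
        result += self.shift
        return result


m = np.ones((1000, 1000))
lazy = (LazyAffine(m) - 0.5 + 0.4 - 0.3 + 0.2 - 0.1).evaluate()
eager = m - 0.5 + 0.4 - 0.3 + 0.2 - 0.1
print(np.allclose(lazy, eager))  # True
```

Because the wrapper reassociates the scalar arithmetic, its result can differ from the eager expression in the last few bits, which is exactly the associativity caveat above; that is why the comparison uses `np.allclose` rather than `==`.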

So, to show the difference with your example:

>>> t=timeit.Timer('k = (m - 0.5)*0.3*0.2', setup='import numpy as np;m = np.ones([8092,8092],float)')
>>> np.mean(t.repeat(repeat=10, number=1))
1.6823677778244019

>>> t=timeit.Timer('k = 0.2*0.3*(m - 0.5)', setup='import numpy as np;m = np.ones([8092,8092],float)')
>>> np.mean(t.repeat(repeat=10, number=1))
1.1084311008453369
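One way to see why the reordered forms are faster is to count how many whole-array operations each expression triggers. The sketch below uses a hypothetical `ndarray` subclass, `CountingArray`, whose `__array_ufunc__` override increments a counter every time a ufunc runs on it. It is a diagnostic toy, not a NumPy facility, and it does not handle in-place operators or `out=` arguments.

```python
import numpy as np


class CountingArray(np.ndarray):
    """Hypothetical ndarray subclass for illustration: counts every ufunc call
    (each one is a full pass over the data that produces a new array)."""

    calls = 0

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        CountingArray.calls += 1
        # Unwrap to plain ndarrays so the real ufunc does not recurse back here.
        inputs = tuple(np.asarray(x) if isinstance(x, CountingArray) else x
                       for x in inputs)
        return getattr(ufunc, method)(*inputs, **kwargs).view(CountingArray)


def count_array_ops(expr, m):
    """Evaluate `expr` with `m` bound to a CountingArray; return the ufunc count."""
    CountingArray.calls = 0
    eval(expr, {"m": m.view(CountingArray)})
    return CountingArray.calls


m = np.ones((100, 100))
exprs = [
    "m - 0.5 + 0.4 - 0.3 + 0.2 - 0.1",  # 5 array operations
    "0.5 + 0.4 - 0.3 + 0.2 - 0.1 + m",  # 1
    "(m - 0.5)*0.3*0.2",                # 3
    "0.2*0.3*(m - 0.5)",                # 2
]
for expr in exprs:
    print(expr, "->", count_array_ops(expr, m))
```

Under these assumptions it reports 5 vs. 1 for the first pair of timings and 3 vs. 2 for the second, which is roughly in line with the measured ratios.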


-C


