[Numpy-discussion] the difference between "+" and np.add?

Chris Barker - NOAA Federal chris.barker at noaa.gov
Fri Nov 23 14:00:35 EST 2012


On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted <francesc at continuum.io> wrote:
> As Nathaniel said, there is not a difference in terms of *what* is
> computed.  However, the methods that you suggested actually differ on
> *how* they are computed, and that has dramatic effects on the time
> used.  For example:
>
> In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]
>
> In []: %time arr1 + arr2 + arr3 + arr4 + arr5
> CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
> Wall time: 0.15 s

> There are also ways to minimize the size of temporaries, and numexpr is
> one of the simplest:

But you can also use np.add (and friends) with the out= argument to
reduce the number of temporaries. It can make a difference:

In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
   ....:     result = arr1 + arr2
   ....:     np.add(result, arr3, out=result)
   ....:     np.add(result, arr4, out=result)
   ....:     np.add(result, arr5, out=result)
   ....:     return result

In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
1 loops, best of 3: 528 ms per loop

In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
1 loops, best of 3: 293 ms per loop

(don't have numexpr on this machine for a comparison)
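A minimal sketch of the same idea in a plain script. For an ndarray, the augmented assignment `result += arr` updates the array in place, just like np.add(result, arr, out=result). A ufunc called with out= writes into the buffer you pass and returns that same object rather than a new array. The helper name add_arrays_inplace is hypothetical, and it assumes all inputs have the same shape.

```python
import numpy as np

def add_arrays_inplace(*arrays):
    """Hypothetical helper: sum two or more same-shape arrays with one new allocation."""
    result = arrays[0] + arrays[1]   # the only new array created here
    for arr in arrays[2:]:
        result += arr                # in place, like np.add(result, arr, out=result)
    return result

arrays = [np.arange(1e6) for _ in range(5)]
total = add_arrays_inplace(*arrays)

# Same values as the chained expression, which may allocate an
# intermediate array for each "+"
assert np.array_equal(total, arrays[0] + arrays[1] + arrays[2] + arrays[3] + arrays[4])

# With out= given, the ufunc writes into and returns the buffer you passed
buf = np.zeros(3)
assert np.add(buf, 1.0, out=buf) is buf
```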

NOTE: no point in going through all this unless this operation is
really a bottleneck in your code -- profile, profile, profile!
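Once profiling shows that this sum really is a hotspot, you can reproduce the IPython timeit comparison above in an ordinary script with the standard-library timeit module. This is a minimal sketch. The function name compare_timings and the default array size (smaller than the 1e7 used above, to keep memory modest) are assumptions. It reports whatever your machine measures, not the numbers quoted in this thread.

```python
import timeit
import numpy as np

def compare_timings(n=10**6, repeat=3):
    """Hypothetical helper: best-of-`repeat` time for two ways to sum 5 arrays."""
    arrays = [np.arange(float(n)) for _ in range(5)]

    def chained():
        # One expression; NumPy may allocate an intermediate array per "+"
        return arrays[0] + arrays[1] + arrays[2] + arrays[3] + arrays[4]

    def inplace():
        # One new array, then accumulate into it with out=
        result = arrays[0] + arrays[1]
        for arr in arrays[2:]:
            np.add(result, arr, out=result)
        return result

    # number=1 times a single call per repetition; keep the fastest,
    # similar to IPython's "best of 3"
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeat))
        for name, fn in (("chained", chained), ("inplace", inplace))
    }

if __name__ == "__main__":
    for name, seconds in compare_timings().items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```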

-Chris

PS: you can put a loop in the function so it works for any number of arrays:

In [18]: def add_n_arrays(*args):
   ....:     result = args[0] + args[1]
   ....:     for arr in args[2:]:
   ....:         np.add(result, arr, out=result)
   ....:     return result

In [21]: timeit add_n_arrays(arr1, arr2, arr3, arr4, arr5)
1 loops, best of 3: 317 ms per loop
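np.add is only one of NumPy's binary ufuncs (the "friends" mentioned earlier), so the same loop can take the ufunc as a parameter. This sketch rests on a few assumptions. The name combine_n_arrays is hypothetical, and the function needs NumPy 1.20 or later for np.broadcast_shapes. It allocates the output up front with np.result_type, so integer and float inputs can be mixed. The add_n_arrays version above would raise a casting error if the first two arrays were integers and a later one was float.

```python
import numpy as np

def combine_n_arrays(ufunc, *arrays):
    """Hypothetical helper: fold a binary ufunc over arrays using one output buffer.

    np.result_type matches the output dtype of ufuncs such as np.add,
    np.multiply or np.maximum; ufuncs that change dtype (for example
    np.true_divide on integers) are not handled here.
    """
    if len(arrays) < 2:
        raise ValueError("need at least two arrays")
    # Allocate the single output buffer with the broadcast shape and a
    # dtype that can hold the combination of all inputs.
    shape = np.broadcast_shapes(*(np.shape(a) for a in arrays))
    dtype = np.result_type(*arrays)
    result = np.empty(shape, dtype=dtype)
    # The first operation fills the buffer; the rest update it in place.
    ufunc(arrays[0], arrays[1], out=result)
    for arr in arrays[2:]:
        ufunc(result, arr, out=result)
    return result
```

For example, combine_n_arrays(np.multiply, a, b, c) computes a * b * c while creating only one new array.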




> In []: import numexpr as ne
>
> In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5')
> CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s
> Wall time: 0.04 s
> Out[]:
> array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
>           4.99999850e+07,   4.99999900e+07,   4.99999950e+07])
>
> Again, the computations are the same, but how you manage memory is critical.
>
> --
> Francesc Alted



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


