[Numpy-discussion] the difference between "+" and np.add?

Thu Nov 22 13:09:02 EST 2012

Thanks for the explanations. Yes, that is basically what I had in mind,
but I hadn't timed it.

I have never tried numexpr, but it would be nice to try it.

Chao

On Thu, Nov 22, 2012 at 3:20 PM, Francesc Alted <francesc at continuum.io> wrote:

> On 11/22/12 1:41 PM, Chao YUE wrote:
> > Dear all,
> >
> > if I have two ndarrays arr1 and arr2 (with the same shape), is there
> > any difference when I do:
> >
> > arr = arr1 + arr2
> >
> > and
> >
> > arr = np.add(arr1, arr2),
> >
> > and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5,
> > then I cannot use np.add anymore as it only takes 2 input arrays.
> > Then what's the best practice to add these arrays? Should I do
> >
> > arr = arr1 + arr2 + arr3 + arr4 + arr5
> >
> > or I do
> >
> > arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)?
> >
> > because I just noticed recently that there are functions like np.add,
> > np.divide, np.subtract... Until now I have been writing
> > arr1/arr2 directly, rather than np.divide(arr1,arr2).
>
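
On the "more than 2 arrays" part of the question, here is a minimal sketch
of two ways np.add can still be used. The small test arrays and the names
total_chained / total_inplace are only for illustration; I am assuming
equal-shaped float arrays as in the question.

```python
from functools import reduce

import numpy as np

# Small stand-ins for arr1 ... arr5 (illustrative sizes only).
arrays = [np.arange(5, dtype=np.float64) for _ in range(5)]

# For ndarrays the + operator dispatches to np.add, so these agree.
assert np.array_equal(arrays[0] + arrays[1], np.add(arrays[0], arrays[1]))

# np.add takes two input arrays, but it can be folded over a list.
# This is equivalent to arr1 + arr2 + arr3 + arr4 + arr5.
total_chained = reduce(np.add, arrays)

# Accumulate in place: one result buffer, reused at every step via out=.
total_inplace = arrays[0].copy()
for a in arrays[1:]:
    np.add(total_inplace, a, out=total_inplace)
```

The in-place loop allocates only the one result buffer, which is the reason
to prefer it when the arrays are large.
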
> As Nathaniel said, there is not a difference in terms of *what* is
> computed.  However, the methods that you suggested actually differ on
> *how* they are computed, and that has dramatic effects on the time
> used.  For example:
>
> In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]
>
> In []: %time arr1 + arr2 + arr3 + arr4 + arr5
> CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
> Wall time: 0.15 s
> Out[]:
> array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
>           4.99999850e+07,   4.99999900e+07,   4.99999950e+07])
>
> In []: %time np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)
> CPU times: user 2.98 s, sys: 0.15 s, total: 3.13 s
> Wall time: 3.14 s
> Out[]:
> array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
>           4.99999850e+07,   4.99999900e+07,   4.99999950e+07])
>
> The difference is how memory is used.  In the first case, the additional
> memory is just a temporary the size of one operand, while in the
> second case a big temporary holding all five operands has to be created,
> so the difference in speed is pretty large.
>
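
To make the size of the temporaries concrete, a rough sketch that only
measures bytes and does no timing. I use n = 1e6 rather than the 1e7 above
to keep it light; the variable names are mine.

```python
import numpy as np

n = 1_000_000  # smaller than the 1e7 used in the timings, for illustration
arr1, arr2, arr3, arr4, arr5 = [np.arange(n, dtype=np.float64) for _ in range(5)]

# Chained +: each intermediate result has the size of a single operand.
one_operand_bytes = arr1.nbytes

# np.array([...]) first copies all five inputs into one (5, n) array,
# and only then does np.sum reduce it along axis 0.
stacked = np.array([arr1, arr2, arr3, arr4, arr5])
stacked_bytes = stacked.nbytes

print(stacked.shape)                       # (5, 1000000)
print(stacked_bytes // one_operand_bytes)  # 5
```
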
> There are also ways to minimize the size of temporaries, and numexpr is
> one of the simplest:
>
> In []: import numexpr as ne
>
> In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5')
> CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s
> Wall time: 0.04 s
> Out[]:
> array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
>           4.99999850e+07,   4.99999900e+07,   4.99999950e+07])
>
> Again, the computations are the same, but how you manage memory is
> critical.
>
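
As I understand it, numexpr keeps temporaries small by evaluating the
expression over small blocks of the operands. Below is a pure-NumPy sketch
of that idea only; it is not how numexpr is implemented, and blocked_sum
and block_size are hypothetical names chosen for this example.

```python
import numpy as np


def blocked_sum(arrays, block_size=4096):
    """Add equal-shaped 1-D arrays block by block (hypothetical helper)."""
    out = np.empty_like(arrays[0])
    for start in range(0, out.shape[0], block_size):
        stop = start + block_size      # slicing clamps stop to the array end
        block = out[start:stop]        # a view into the result, not a copy
        np.copyto(block, arrays[0][start:stop])
        for a in arrays[1:]:
            # Accumulate directly into the result block; no full-size temporary.
            np.add(block, a[start:stop], out=block)
    return out


arrays = [np.arange(10_000, dtype=np.float64) for _ in range(5)]
result = blocked_sum(arrays)
```

Apart from the output array, the only working memory touched at a time is
one block, so it can stay in cache. Whether this is faster than plain
chained + in practice would need to be timed.
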
> --
> Francesc Alted
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
************************************************************************************