The difference between "+" and np.add?

Dear all,

if I have two ndarrays arr1 and arr2 (with the same shape), is there some difference when I do:

arr = arr1 + arr2

and

arr = np.add(arr1, arr2)?

And if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5, then I cannot use np.add anymore, as it only receives 2 arguments. What's the best practice to add these arrays? Should I do

arr = arr1 + arr2 + arr3 + arr4 + arr5

or

arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)?

I ask because I only recently noticed that there are functions like np.add, np.divide, np.subtract... Before, I was using the operators directly, e.g. arr1/arr2 rather than np.divide(arr1, arr2).

best regards,

Chao

--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
************************************************************************************

On Thu, Nov 22, 2012 at 12:41 PM, Chao YUE <chaoyuejoy@gmail.com> wrote:
For numpy arrays, a + b just calls np.add(a, b) internally. You can use whichever looks nicer to you; usually people just use +.

np.add can be more flexible, though. For instance, you can write

np.add(a, b, out=c)

but there's no way to pass extra arguments to the "+" operator.

In fact np.add is not just a function, it's a "ufunc object" (see the numpy documentation for more details). So it also provides methods like:

np.add.reduce(a)             # the same as np.sum (except, sadly, with different defaults)
np.add.accumulate(a)         # like np.cumsum
np.add.reduceat(a, indices)  # complicated, see docs

And there are lots of these ufunc objects, all of which provide these same interfaces. Some of them have associated operators like "+", but others don't.

-n
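Nathaniel's points can be checked in a few lines (a sketch, with array values chosen purely for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# "+" and np.add compute the same thing for ndarrays
assert np.array_equal(a + b, np.add(a, b))

# np.add can write into a preallocated output array; "+" cannot
out = np.empty_like(a)
np.add(a, b, out=out)
assert np.array_equal(out, a + b)

# ufunc methods on the np.add object
assert np.add.reduce(a) == np.sum(a)                       # like np.sum
assert np.array_equal(np.add.accumulate(a), np.cumsum(a))  # like np.cumsum
```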

On 11/22/12 1:41 PM, Chao YUE wrote:
As Nathaniel said, there is no difference in terms of *what* is computed. However, the methods that you suggested differ in *how* it is computed, and that has dramatic effects on the time used. For example:

In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]

In []: %time arr1 + arr2 + arr3 + arr4 + arr5
CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
Wall time: 0.15 s
Out[]:
array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
         4.99999850e+07,   4.99999900e+07,   4.99999950e+07])

In []: %time np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)
CPU times: user 2.98 s, sys: 0.15 s, total: 3.13 s
Wall time: 3.14 s
Out[]:
array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
         4.99999850e+07,   4.99999900e+07,   4.99999950e+07])

The difference is in how memory is used. In the first case, the additional memory was just a temporary the size of the operands, while in the second case a big temporary has to be created first, so the difference in speed is pretty large.

There are also ways to minimize the size of temporaries, and numexpr is one of the simplest:

In []: import numexpr as ne

In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5')
CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s
Wall time: 0.04 s
Out[]:
array([  0.00000000e+00,   5.00000000e+00,   1.00000000e+01, ...,
         4.99999850e+07,   4.99999900e+07,   4.99999950e+07])

Again, the computations are the same, but how you manage memory is critical.

--
Francesc Alted
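Both recipes really do produce identical values; only the memory traffic differs. A quick check (a sketch using smaller arrays than in the timing above, same idea):

```python
import numpy as np

arrs = [np.arange(1e5) for _ in range(5)]
arr1, arr2, arr3, arr4, arr5 = arrs

chained = arr1 + arr2 + arr3 + arr4 + arr5  # pairwise sums, operand-sized temporaries
stacked = np.sum(np.array(arrs), axis=0)    # first builds one big 5 x N temporary

assert np.array_equal(chained, stacked)
```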

Thanks for the explanations. Yes, what I was thinking is basically the same, but I didn't test the time. I have never tried numexpr, but it would be nice to try it.

Chao

On Thu, Nov 22, 2012 at 3:20 PM, Francesc Alted <francesc@continuum.io> wrote:
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted <francesc@continuum.io> wrote:
There are also ways to minimize the size of temporaries, and numexpr is one of the simplest:
but you can also use np.add (and friends) to reduce the number of temporaries. It can make a difference:

In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
   ....:     result = arr1 + arr2
   ....:     np.add(result, arr3, out=result)
   ....:     np.add(result, arr4, out=result)
   ....:     np.add(result, arr5, out=result)
   ....:     return result

In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
1 loops, best of 3: 528 ms per loop

In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
1 loops, best of 3: 293 ms per loop

(don't have numexpr on this machine for a comparison)

NOTE: there is no point in going through all this unless this operation is really a bottleneck in your code -- profile, profile, profile!

-Chris

PS: you can put a loop in the function to make it more generic:

In [18]: def add_n_arrays(*args):
   ....:     result = args[0] + args[1]
   ....:     for arr in args[2:]:
   ....:         np.add(result, arr, result)
   ....:     return result

In [21]: timeit add_n_arrays(arr1, arr2, arr3, arr4, arr5)
1 loops, best of 3: 317 ms per loop
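A quick sanity check (not from the thread) that this in-place accumulation pattern matches the plain chained `+`, and that it leaves the input arrays untouched:

```python
import numpy as np

def add_n_arrays(*args):
    # one temporary for the first pair, then accumulate in place
    result = args[0] + args[1]
    for arr in args[2:]:
        np.add(result, arr, out=result)
    return result

arrays = [np.full(8, float(i)) for i in range(5)]  # constant arrays 0.0 .. 4.0
expected = arrays[0] + arrays[1] + arrays[2] + arrays[3] + arrays[4]

assert np.array_equal(add_n_arrays(*arrays), expected)
assert np.array_equal(arrays[0], np.zeros(8))  # inputs are not modified
```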
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov

On 11/23/12 8:00 PM, Chris Barker - NOAA Federal wrote:
In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
1 loops, best of 3: 528 ms per loop

In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
1 loops, best of 3: 293 ms per loop
(don't have numexpr on this machine for a comparison)
Yes, you are right. However, numexpr can still beat this:

In [8]: timeit arr1 + arr2 + arr3 + arr4 + arr5
10 loops, best of 3: 138 ms per loop

In [9]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
10 loops, best of 3: 74.3 ms per loop

In [10]: timeit ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5")
10 loops, best of 3: 20.8 ms per loop

The reason is that numexpr is multithreaded (using 6 cores above), and for memory-bounded problems like this one, fetching data in different threads is more efficient than using a single thread:

In [12]: timeit arr1.copy()
10 loops, best of 3: 41 ms per loop

In [13]: ne.set_num_threads(1)
Out[13]: 6

In [14]: timeit ne.evaluate("arr1")
10 loops, best of 3: 30.7 ms per loop

In [15]: ne.set_num_threads(6)
Out[15]: 1

In [16]: timeit ne.evaluate("arr1")
100 loops, best of 3: 13.4 ms per loop

I.e., the joy of multi-threading is that it not only buys you CPU speed, but can also bring your data from memory faster. So yeah, modern applications *do* need multi-threading to get good performance.

--
Francesc Alted
participants (4)
- Chao YUE
- Chris Barker - NOAA Federal
- Francesc Alted
- Nathaniel Smith