Casting to np.byte before clearing values
Hi all,

I'm trying to understand why viewing an array as bytes before clearing makes the whole operation faster. I imagine there is some kind of special treatment for byte arrays but I've no clue.

# Native float
Z_float = np.ones(1000000, float)
Z_int = np.ones(1000000, int)

%timeit Z_float[...] = 0
1000 loops, best of 3: 361 µs per loop

%timeit Z_int[...] = 0
1000 loops, best of 3: 366 µs per loop

%timeit Z_float.view(np.byte)[...] = 0
1000 loops, best of 3: 267 µs per loop

%timeit Z_int.view(np.byte)[...] = 0
1000 loops, best of 3: 266 µs per loop

Nicolas
On Mon, 2016-12-26 at 10:34 +0100, Nicolas P. Rougier wrote:
Hi all,
I'm trying to understand why viewing an array as bytes before clearing makes the whole operation faster. I imagine there is some kind of special treatment for byte arrays but I've no clue.
Sure, if it's a type with a 1-byte width, the code will end up calling `memset`. If it is not, it will end up calling a loop like:

    while (N > 0) {
        *dst = output;
        dst += 8;  /* or whatever element size/stride is */
        N--;
    }

Now, why this gives such a difference I don't really know, but I guess it is not too surprising and may depend on other things as well.

Sebastian
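To make the byte-view route concrete, here is a minimal sketch of what the faster variant does from the Python side. It assumes a C-contiguous array, and the helper name `zero_via_byte_view` is hypothetical (it is not a NumPy function). It only shows that the view shares memory with the original array and has a 1-byte item size; it does not prove which C routine NumPy ends up calling.

```python
import numpy as np

def zero_via_byte_view(arr):
    """Zero an array by writing 0 to every byte of its buffer.

    Hypothetical helper for illustration; assumes `arr` is C-contiguous.
    """
    # view() reinterprets the same buffer as 1-byte items; no copy is made.
    byte_view = arr.view(np.byte)
    byte_view[...] = 0
    return arr

Z_float = np.ones(1000, float)
Z_int = np.ones(1000, int)

byte_view = Z_float.view(np.byte)
itemsize = byte_view.itemsize                   # size of one element in bytes
shared = np.shares_memory(Z_float, byte_view)   # same underlying buffer?

zero_via_byte_view(Z_float)
zero_via_byte_view(Z_int)
print(itemsize, shared, Z_float.sum(), Z_int.sum())
```

Because the view and the array share one buffer, zeroing every byte leaves both arrays holding zeros.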
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Thanks for the explanation Sebastian, makes sense.

Nicolas
On Mon, Dec 26, 2016 at 1:34 AM, Nicolas P. Rougier <Nicolas.Rougier@inria.fr> wrote:
I'm trying to understand why viewing an array as bytes before clearing makes the whole operation faster. I imagine there is some kind of special treatment for byte arrays but I've no clue.
I notice that the code is simply setting a value using broadcasting; I don't think there is anything special about zero in that case. But your subject refers to "clearing" an array.

So I wonder if you have a use case where the performance difference matters, in which case _maybe_ it would be worth having an ndarray.zero() method that efficiently zeros out an array.

Actually, there is ndarray.fill():

In [7]: %timeit Z_float[...] = 0
1000 loops, best of 3: 380 µs per loop

In [8]: %timeit Z_float.view(np.byte)[...] = 0
1000 loops, best of 3: 271 µs per loop

In [9]: %timeit Z_float.fill(0)
1000 loops, best of 3: 363 µs per loop

which seems to take an insignificantly shorter time than assignment, probably because it's doing exactly the same loop, whereas a .zero() could use a memset, like it does with bytes.

I can't say I have a use case that would justify this, though.

CHB
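For anyone who wants to repeat this comparison outside IPython, a rough sketch using the standard `timeit` module follows. The function name `time_clearing` and its parameters are hypothetical, chosen only for this example. The numbers it reports will depend on the machine, the NumPy version, and the array size, so no expected timings are given.

```python
import timeit
import numpy as np

def assign_zero(Z):
    Z[...] = 0                  # plain broadcast assignment

def byte_view_zero(Z):
    Z.view(np.byte)[...] = 0    # assignment through a 1-byte view

def fill_zero(Z):
    Z.fill(0)                   # ndarray.fill()

def time_clearing(n=1_000_000, repeat=5, number=100):
    """Return the best per-call time in microseconds for each approach."""
    Z = np.ones(n, float)
    results = {}
    for func in (assign_zero, byte_view_zero, fill_zero):
        # timeit.repeat returns the total time of `number` calls, `repeat` times.
        totals = timeit.repeat(lambda: func(Z), repeat=repeat, number=number)
        results[func.__name__] = min(totals) / number * 1e6
    return results

if __name__ == "__main__":
    for name, microseconds in time_clearing().items():
        print(f"{name}: {microseconds:.1f} µs per loop")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that `%timeit` printed in the sessions above.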
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
Yes, "clearing" is not the proper word, but the "trick" only works for 0 (I'll get the same result in both cases).

Nicolas
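A small sketch of why the byte-view trick is limited to 0: the value is written to every byte, so it only matches ordinary assignment when the target value has the same bit pattern in all of its bytes, as 0 does for both integers and floats. The arrays below use explicit `np.float64` and `np.int64` dtypes as an assumption, so that the item size is 8 bytes on every platform.

```python
import numpy as np

# Writing 0 through the byte view gives the same result as Z[...] = 0,
# because int 0 and float 0.0 are both represented by all-zero bytes.
Z_float = np.ones(4, np.float64)
Z_int = np.ones(4, np.int64)
Z_float.view(np.byte)[...] = 0
Z_int.view(np.byte)[...] = 0
print(Z_float, Z_int)

# Writing 1 through the byte view sets every byte to 0x01, which is not
# the representation of 1.0 or of the integer 1.
W_float = np.zeros(4, np.float64)
W_int = np.zeros(4, np.int64)
W_float.view(np.byte)[...] = 1
W_int.view(np.byte)[...] = 1
print(W_float)                        # some tiny float, not 1.0
print(W_int == 0x0101010101010101)    # each int64 holds eight 0x01 bytes
```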
participants (3)
Chris Barker
Nicolas P. Rougier
Sebastian Berg