Hi,
A friend of mine just pointed out that dividing by int64 is considerably slower than multiplying in numpy:
<script>
from timeit import timeit

import numpy as np
import numpy.random as npr

sz = (1024,)
a32 = npr.randint(1, 5001, sz).astype(np.int32)
b32 = npr.randint(1, 5001, sz).astype(np.int32)
a64 = a32.astype(np.int64)
b64 = b32.astype(np.int64)

print 'Mul32', timeit('d = a32 * b32', 'from __main__ import a32, b32')
print 'Div32', timeit('d = a32 / b32', 'from __main__ import a32, b32')
print 'Mul64', timeit('d = a64 * b64', 'from __main__ import a64, b64')
print 'Div64', timeit('d = a64 / b64', 'from __main__ import a64, b64')
</script>
gives (64 bit Debian Intel system, numpy trunk):
Mul32 2.71295905113
Div32 6.61985301971
Mul64 2.78101611137
Div64 22.8217148781
with similar values for numpy 1.5.1.
Crude testing with Matlab and Octave suggests they do not show the same difference:
divtest
Mul32 4.300662
Div32 5.638622
Mul64 7.894490
Div64 18.121182

octave:2> divtest
Mul32 3.960577
Div32 6.553704
Mul64 7.268324
Div64 13.670760
(files attached)
Is there something specific about division in numpy that would cause this slowdown?
Cheers,
Matthew
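
For readers on Python 3, the benchmark above can be sketched roughly as follows. This is a minimal rendering, not the attached script: the function name `time_int_ops` and its `size`, `number` and `seed` parameters are made up for illustration, `//` is used because `/` on integer arrays is true division in Python 3, and the default repeat count is far lower than the 1,000,000 iterations `timeit` used in the original, so absolute numbers are not comparable.

```python
from timeit import timeit

import numpy as np


def time_int_ops(size=1024, number=10000, seed=0):
    """Time elementwise multiply and floor-divide for int32 and int64 arrays."""
    rng = np.random.default_rng(seed)
    # Operands in [1, 5000], as in the original script, so no division by zero.
    a32 = rng.integers(1, 5001, size=size).astype(np.int32)
    b32 = rng.integers(1, 5001, size=size).astype(np.int32)
    a64 = a32.astype(np.int64)
    b64 = b32.astype(np.int64)
    cases = {
        "Mul32": lambda: a32 * b32,
        "Div32": lambda: a32 // b32,  # `//` is integer (floor) division in Python 3
        "Mul64": lambda: a64 * b64,
        "Div64": lambda: a64 // b64,
    }
    # timeit accepts a zero-argument callable and returns total seconds.
    return {name: timeit(fn, number=number) for name, fn in cases.items()}


if __name__ == "__main__":
    for name, seconds in time_int_ops().items():
        print(name, seconds)
```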
On Sat, Aug 11, 2012 at 6:36 PM, Matthew Brett matthew.brett@gmail.com wrote:
Is there something specific about division in numpy that would cause this slowdown?
Numpy is doing an integer divide unless you are using Python 3.x. The np.true_divide ufunc will speed things up a bit. I'm not sure what Matlab/Octave are doing for division in this case.
Chuck
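
To make the distinction concrete, here is a small sketch of the two ufuncs on int64 input. The arrays `a` and `b` are made-up values for illustration only; the point is that `np.floor_divide` keeps the integer dtype while `np.true_divide` returns floating point.

```python
import numpy as np

a = np.array([7, 8, 9], dtype=np.int64)
b = np.array([2, 2, 2], dtype=np.int64)

floor_q = np.floor_divide(a, b)  # integer result, same as a // b
true_q = np.true_divide(a, b)    # floating-point result, same as a / b on Python 3

print(floor_q.dtype, floor_q)
print(true_q.dtype, true_q)
```

On Python 2 without `from __future__ import division`, `a / b` on integer arrays takes the integer path, which is the case timed in the original script.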
On Mon, Aug 13, 2012 at 10:32 PM, Charles R Harris charlesr.harris@gmail.com wrote:
Numpy is doing an integer divide unless you are using Python 3.x. The np.true_divide ufunc will speed things up a bit. I'm not sure what Matlab/Octave are doing for division in this case.
For int64:
In [23]: timeit multiply(a, b)
100000 loops, best of 3: 3.31 us per loop

In [24]: timeit true_divide(a, b)
100000 loops, best of 3: 9.35 us per loop
Chuck
Hi,
On Mon, Aug 13, 2012 at 9:49 PM, Charles R Harris charlesr.harris@gmail.com wrote:
For int64:
In [23]: timeit multiply(a, b)
100000 loops, best of 3: 3.31 us per loop

In [24]: timeit true_divide(a, b)
100000 loops, best of 3: 9.35 us per loop
Thanks for looking into this. It does look like int64 division is particularly slow on the systems I'm testing on. Here's a Cython version using C pointers, compared to the numpy version:
Numpy versions as above:

Mul32 3.15036797523
Div32 6.68296504021
Mul64 4.50731801987
Div64 22.9649209976

Cython versions using pointers into contiguous array:

Mul32cy 1.21214485168
Div32cy 6.75360918045
Mul64cy 3.98143696785
Div64cy 31.3645660877

# Timing using double
Multfcy 4.11406683922
Divfcy 12.603869915
(code attached).
Matlab certainly returns integers from its int64 division, so I'm not sure why it does not have such an extreme slowdown for int64 division.
Cheers,
Matthew
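
Since the double timings above come out lower than the int64 ones, one possible workaround is to divide in float64 and convert back. The sketch below only checks that this gives the same integers for the benchmark's value range; it is an illustration, not something measured or proposed in this thread. It assumes positive operands far below 2**53 (here 1 to 5000), where float64 represents every value exactly, and whether it is actually faster will depend on hardware and numpy version.

```python
import numpy as np

rng = np.random.default_rng(0)
# Same value range as the benchmark: positive integers in [1, 5000].
a = rng.integers(1, 5001, size=1024).astype(np.int64)
b = rng.integers(1, 5001, size=1024).astype(np.int64)

exact = a // b  # integer floor division
# Divide as doubles, floor, then convert back to int64.
via_double = np.floor(a.astype(np.float64) / b.astype(np.float64)).astype(np.int64)

matches = bool(np.array_equal(exact, via_double))
print(matches)
```

For negative operands or magnitudes approaching 2**53 the two paths are not guaranteed to agree, so this should not be treated as a general replacement for integer division.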
Just to be sure everybody knows: hardware division is always slower than hardware multiplication. Division is much more complex, so it needs more circuitry and generally can't be fully pipelined, which means parts of the circuitry can't be reused in parallel. So hardware division will always be slower than hardware multiplication.

About Matlab: this could mean they generate unoptimized code (code that is not bound by the hardware division/multiplication speed). That could explain what you saw.
Fred
On Thu, Aug 16, 2012 at 5:45 PM, Matthew Brett matthew.brett@gmail.com wrote:
Matlab certainly returns integers from its int64 division, so I'm not sure why it does not have such an extreme slowdown for int64 division.
Cheers,
Matthew
_______________________________________________
NumPyDiscussion mailing list
NumPyDiscussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpydiscussion
participants (3)
Charles R Harris
Frédéric Bastien
Matthew Brett