# [Numpy-discussion] Slow divide of int64?

Frédéric Bastien nouiz at nouiz.org
Fri Aug 17 09:03:45 EDT 2012

```
Just to make sure everybody knows: hardware division is always
slower than hardware multiplication. Division is a much more
complex operation, so it needs more circuitry and can't be pipelined,
which means we can't reuse parts of the circuitry in parallel. So
hardware division will always be slower than hardware multiplication.

About Matlab, this could mean they generate unoptimized code (code
that is not bound by the hardware division/multiplication speed). That
could explain what you saw.

Fred

On Thu, Aug 16, 2012 at 5:45 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Mon, Aug 13, 2012 at 9:49 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>> On Mon, Aug 13, 2012 at 10:32 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>>>
>>>
>>>
>>> On Sat, Aug 11, 2012 at 6:36 PM, Matthew Brett <matthew.brett at gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> A friend of mine just pointed out that dividing by int64 is
>>>> considerably slower than multiplying in numpy:
>>>>
>>>> <script>
>>>> from timeit import timeit
>>>>
>>>> import numpy as np
>>>> import numpy.random as npr
>>>>
>>>> sz = (1024,)
>>>> a32 = npr.randint(1, 5001, sz).astype(np.int32)
>>>> b32 = npr.randint(1, 5001, sz).astype(np.int32)
>>>> a64 = a32.astype(np.int64)
>>>> b64 = b32.astype(np.int64)
>>>>
>>>> print 'Mul32', timeit('d = a32 * b32', 'from __main__ import a32, b32')
>>>> print 'Div32', timeit('d = a32 / b32', 'from __main__ import a32, b32')
>>>> print 'Mul64', timeit('d = a64 * b64', 'from __main__ import a64, b64')
>>>> print 'Div64', timeit('d = a64 / b64', 'from __main__ import a64, b64')
>>>> </script>
>>>>
>>>> gives (64 bit Debian Intel system, numpy trunk):
>>>>
>>>> Mul32 2.71295905113
>>>> Div32 6.61985301971
>>>> Mul64 2.78101611137
>>>> Div64 22.8217148781
>>>>
>>>> with similar values for numpy 1.5.1.
>>>>
>>>> Crude testing with Matlab and Octave suggests they do not seem to have
>>>> this same difference:
>>>>
>>>> >> divtest
>>>> Mul32 4.300662
>>>> Div32 5.638622
>>>> Mul64 7.894490
>>>> Div64 18.121182
>>>>
>>>> octave:2> divtest
>>>> Mul32 3.960577
>>>> Div32 6.553704
>>>> Mul64 7.268324
>>>> Div64 13.670760
>>>>
>>>> (files attached)
>>>>
>>>> Is there something specific about division in numpy that would cause
>>>> this slowdown?
>>>>
>>>
>>> Numpy is doing an integer divide unless you are using Python 3.x. The
>>> np.true_divide ufunc will speed things up a bit. I'm not sure what
>>> Matlab/Octave are doing for division in this case.
>>>
>>
>> For int64:
>>
>> In [23]: timeit multiply(a, b)
>> 100000 loops, best of 3: 3.31 us per loop
>>
>> In [24]: timeit true_divide(a, b)
>> 100000 loops, best of 3: 9.35 us per loop
>
> Thanks for looking into this.  It does look like int64 division is
> particularly slow for the systems I'm testing on.  Here's a cython
> c-pointer version compared to the numpy version:
>
> Numpy versions as above:
>
> Mul32 3.15036797523
> Div32 6.68296504021
> Mul64 4.50731801987
> Div64 22.9649209976
>
> Cython versions using pointers into contiguous array
>
> Mul32-cy 1.21214485168
> Div32-cy 6.75360918045
> Mul64-cy 3.98143696785
> Div64-cy 31.3645660877
>
> # Timing using double
> Multf-cy 4.11406683922
> Divf-cy 12.603869915
>
> (code attached).
>
> Matlab certainly returns integers from its int64 division, so I'm not
> sure why it does not have such an extreme slowdown for int64 division.
>
> Cheers,
>
> Matthew

```
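For anyone rerunning the comparison quoted above on a current setup, here is a minimal sketch of the same benchmark in Python 3. It is not the original attached script: the `bench` helper, the seeded `default_rng` generator and the loop count are assumptions made for illustration. Under Python 3 the `/` operator on integer arrays is true division and returns floats, so the integer divide that the thread measures has to be written with `//`; the `np.true_divide` case mentioned in the thread is included for comparison. Absolute timings will depend on the hardware and the NumPy version.

```python
from timeit import timeit

import numpy as np

# Same setup as the quoted script: 1024 random values in [1, 5000],
# so the divisor is never zero.
rng = np.random.default_rng(0)  # seeded generator (an assumption, for repeatability)
a32 = rng.integers(1, 5001, size=1024).astype(np.int32)
b32 = rng.integers(1, 5001, size=1024).astype(np.int32)
a64 = a32.astype(np.int64)
b64 = b32.astype(np.int64)


def bench(number=10000):
    """Return total seconds for `number` runs of each operation (hypothetical helper)."""
    cases = {
        "Mul32": lambda: a32 * b32,
        "Div32": lambda: a32 // b32,   # integer (floor) division
        "Mul64": lambda: a64 * b64,
        "Div64": lambda: a64 // b64,   # integer (floor) division
        "TrueDiv64": lambda: np.true_divide(a64, b64),  # float result
    }
    return {name: timeit(fn, number=number) for name, fn in cases.items()}


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(name, seconds)
```

The quoted script used timeit's default of one million loops per case; the smaller `number` here only shortens the run, so compare ratios between cases rather than absolute values.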