# [Numpy-discussion] Slow divide of int64?

Matthew Brett matthew.brett at gmail.com
Thu Aug 16 17:45:32 EDT 2012

```
Hi,

On Mon, Aug 13, 2012 at 9:49 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Mon, Aug 13, 2012 at 10:32 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>>
>> On Sat, Aug 11, 2012 at 6:36 PM, Matthew Brett <matthew.brett at gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> A friend of mine just pointed out that dividing by int64 is
>>> considerably slower than multiplying in numpy:
>>>
>>> <script>
>>> from timeit import timeit
>>>
>>> import numpy as np
>>> import numpy.random as npr
>>>
>>> sz = (1024,)
>>> a32 = npr.randint(1, 5001, sz).astype(np.int32)
>>> b32 = npr.randint(1, 5001, sz).astype(np.int32)
>>> a64 = a32.astype(np.int64)
>>> b64 = b32.astype(np.int64)
>>>
>>> print 'Mul32', timeit('d = a32 * b32', 'from __main__ import a32, b32')
>>> print 'Div32', timeit('d = a32 / b32', 'from __main__ import a32, b32')
>>> print 'Mul64', timeit('d = a64 * b64', 'from __main__ import a64, b64')
>>> print 'Div64', timeit('d = a64 / b64', 'from __main__ import a64, b64')
>>> </script>
>>>
>>> gives (64 bit Debian Intel system, numpy trunk):
>>>
>>> Mul32 2.71295905113
>>> Div32 6.61985301971
>>> Mul64 2.78101611137
>>> Div64 22.8217148781
>>>
>>> with similar values for numpy 1.5.1.
>>>
>>> Crude testing with Matlab and Octave suggests they do not seem to have
>>> this same difference:
>>>
>>> >> divtest
>>> Mul32 4.300662
>>> Div32 5.638622
>>> Mul64 7.894490
>>> Div64 18.121182
>>>
>>> octave:2> divtest
>>> Mul32 3.960577
>>> Div32 6.553704
>>> Mul64 7.268324
>>> Div64 13.670760
>>>
>>> (files attached)
>>>
>>> Is there something specific about division in numpy that would cause
>>> this slowdown?
>>>
>>
>> Numpy is doing an integer divide unless you are using Python 3.x. The
>> np.true_divide ufunc will speed things up a bit. I'm not sure what
>> Matlab/Octave are doing for division in this case.
>>
>
> For int64:
>
> In [23]: timeit multiply(a, b)
> 100000 loops, best of 3: 3.31 us per loop
>
> In [24]: timeit true_divide(a, b)
> 100000 loops, best of 3: 9.35 us per loop

Thanks for looking into this.  It does look like int64 division is
particularly slow on the systems I'm testing on.  Here's a Cython
C-pointer version compared to the numpy version:

Numpy versions as above:

Mul32 3.15036797523
Div32 6.68296504021
Mul64 4.50731801987
Div64 22.9649209976

Cython versions using pointers into contiguous arrays:

Mul32-cy 1.21214485168
Div32-cy 6.75360918045
Mul64-cy 3.98143696785
Div64-cy 31.3645660877

# Timing using double (float64) arrays
Multf-cy 4.11406683922
Divf-cy 12.603869915

(code attached).

Matlab certainly returns integers from its int64 division, so I'm not
sure why it does not have such an extreme slowdown for int64 division.

Cheers,

Matthew

```
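The original script is Python 2. Under Python 3, `/` on integer arrays calls `np.true_divide` and returns float64, which is the distinction Charles points out above. To time the same integer division as the original, the sketch below therefore calls `np.floor_divide` (the `//` operator) explicitly. It also times `np.true_divide` and a hypothetical helper, `float_floor_divide`, that does the division in float64 and casts back. That helper is an illustration, not something proposed in the thread, and it is only exact while the operands are small enough to be represented exactly in float64; the 1..5000 values used here are well within that range. The sketch assumes NumPy 1.17 or later for `np.random.default_rng`. Absolute timings will differ from the numbers quoted above, because `number` is set much lower than `timeit`'s default of 1,000,000.

```python
# Python 3 sketch of the benchmark from the thread (assumes NumPy >= 1.17).
# In Python 3, `/` on integer arrays is np.true_divide (float64 result), so the
# integer division timed by the original Python 2 script is called explicitly
# here as np.floor_divide, i.e. the `//` operator.
from timeit import timeit

import numpy as np

rng = np.random.default_rng(0)
sz = (1024,)
# Same value range as the original script: integers in [1, 5000], so no zero divisors.
a32 = rng.integers(1, 5001, sz).astype(np.int32)
b32 = rng.integers(1, 5001, sz).astype(np.int32)
a64 = a32.astype(np.int64)
b64 = b32.astype(np.int64)


def float_floor_divide(a, b):
    """Hypothetical helper: floor division done in float64, cast back to a's dtype.

    Only exact while the operands are exactly representable in float64 and the
    rounded quotient cannot cross an integer boundary; small values such as the
    1..5000 range used here are safe.
    """
    return np.floor(a / b).astype(a.dtype)


if __name__ == "__main__":
    cases = {
        "Mul32": lambda: np.multiply(a32, b32),
        "Div32": lambda: np.floor_divide(a32, b32),
        "Mul64": lambda: np.multiply(a64, b64),
        "Div64": lambda: np.floor_divide(a64, b64),
        "TrueDiv64": lambda: np.true_divide(a64, b64),
        "FloatDiv64": lambda: float_floor_divide(a64, b64),
    }
    for name, func in cases.items():
        # Total seconds for 10,000 calls; compare cases with each other, not with the thread.
        print(name, timeit(func, number=10_000))
```

Comparing `Div64` with `FloatDiv64` on a given machine gives a rough sense of whether routing small int64 values through float64 division avoids the slowdown reported above. Whether it does depends on the CPU and the NumPy build.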