[Numpy-discussion] Default type for functions that accumulate integers
Antoine Pitrou
solipsis at pitrou.net
Tue Jan 3 14:59:47 EST 2017
On Mon, 2 Jan 2017 18:46:08 -0800
Nathaniel Smith <njs at pobox.com> wrote:
>
> So some options include:
> - make the default integer precision 64-bits everywhere
> - make the default integer precision 32-bits on 32-bit systems, and
> 64-bits on 64-bit systems (including Windows)
Either of those two would be best, IMO.
Intuitively, I think people would expect 32-bit ints in 32-bit processes
by default, and likewise 64-bit ints in 64-bit processes. So I would
slightly favour the latter option.
> - leave the default integer precision the same, but make accumulators
> 64-bits everywhere
> - leave the default integer precision the same, but make accumulators
> 64-bits on 64-bit systems (including Windows)
Both of these options introduce a confusing discrepancy: the dtype of an
accumulation such as sum() would no longer match the dtype of
element-wise results on the same inputs.
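A sketch of what user code would see under those accumulator-only
options (today, the result dtype of sum() on an int32 array depends on
the platform's default int, which is the problem under discussion):
>>> v = np.ones(10, dtype='int32')
>>> (v + v).dtype    # element-wise results keep the input dtype...
dtype('int32')
>>> v.sum().dtype    # ...while accumulations silently come back 64-bit
dtype('int64')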
> - speed: there's probably some cost to using 64-bit integers on 32-bit
> systems; how big is the penalty in practice?
Ok, I have fired up a Windows VM to compare 32-bit and 64-bit builds.
Numpy version is 1.11.2, Python version is 3.5.2. Keep in mind those
are Anaconda builds of Numpy, with MKL enabled for linear algebra;
YMMV.
For each benchmark, the first number is the result on the 32-bit build,
the second number on the 64-bit build.
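All snippets were run under IPython (which provides the %timeit magic)
and assume the usual import:
>>> import numpy as np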
Simple arithmetic
-----------------
>>> v = np.ones(1024**2, dtype='int32')
>>> %timeit v + v # 1.73 ms per loop | 1.78 ms per loop
>>> %timeit v * v # 1.77 ms per loop | 1.79 ms per loop
>>> %timeit v // v # 5.89 ms per loop | 5.39 ms per loop
>>> v = np.ones(1024**2, dtype='int64')
>>> %timeit v + v # 3.54 ms per loop | 3.54 ms per loop
>>> %timeit v * v # 5.61 ms per loop | 3.52 ms per loop
>>> %timeit v // v # 17.1 ms per loop | 13.9 ms per loop
Linear algebra
--------------
>>> m = np.ones((1024,1024), dtype='int32')
>>> %timeit m @ m # 556 ms per loop | 569 ms per loop
>>> m = np.ones((1024,1024), dtype='int64')
>>> %timeit m @ m # 3.81 s per loop | 1.01 s per loop
Sorting
-------
>>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int32')
>>> %timeit np.sort(v) # 43.4 ms per loop | 44 ms per loop
>>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int64')
>>> %timeit np.sort(v) # 61.5 ms per loop | 45.5 ms per loop
Indexing
--------
>>> v = np.ones(1024**2, dtype='int32')
>>> %timeit v[v[::-1]] # 2.38 ms per loop | 4.63 ms per loop
>>> v = np.ones(1024**2, dtype='int64')
>>> %timeit v[v[::-1]] # 6.9 ms per loop | 3.63 ms per loop
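A plausible explanation for that last pair of numbers: fancy indexing
converts the index array to the platform's native np.intp internally,
so indices that already have that width skip a conversion pass. A quick
sketch (using only the public API) to test this on either build:
>>> idx = v[::-1].astype(np.intp)   # pre-convert indices to native width
>>> %timeit v[idx]                  # no per-call dtype conversion needed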
Quick summary:
- for very simple operations, the 32-bit and 64-bit builds perform about
the same at any given integer width (though 64-bit integers are
uniformly about half as fast when the operation is SIMD-vectorized,
since half as many elements fit in a vector register)
- for more sophisticated operations (element-wise multiplication or
division, quicksort, and especially the matrix product), 32-bit builds
are competitive with 64-bit builds on 32-bit ints, but lag behind on
64-bit ints
- for indexing, it's desirable to use a "native"-width integer,
regardless of whether that means 32 or 64 bits
Of course, the numbers will vary depending on the platform (read:
compiler), but some aspects of this comparison will probably translate
to other platforms.
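Finally, for completeness, a sketch of the failure mode that motivates
widening the accumulators in the first place (whether the first call
below wraps around depends on the build's default int, which is exactly
the issue at hand):
>>> v = np.full(10**6, 10**4, dtype='int32')
>>> v.sum()                # wraps around silently if accumulated in int32
>>> v.sum(dtype='int64')   # explicit 64-bit accumulator: the true total
10000000000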
Regards
Antoine.