[Python-Dev] Optimize Python long integers

Victor Stinner victor.stinner at haypocalc.com
Tue Nov 11 14:25:18 CET 2008



There are some very interesting propositions (with patches!) to optimize 
Python int and long types (especially the long integers).
haypo: Macros for PyLong: sign, number of digits, fits in an int

marketdickinson: Use 30-bit digits instead of 15-bit digits for integers

pernici: faster long multiplication

haypo: Victor Stinner's GMP patch for longs

fredrikj: Asymptotically faster divmod and str(long)

See also:
fredrikj: create a numbits() method for int and long types
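For context, the proposed numbits() would report how many bits an integer 
needs. A minimal pure-Python sketch of the idea (my own illustration, not 
the code from the patch; the name and loop are assumptions):

```python
def numbits(n):
    """Number of bits needed to represent abs(n), ignoring the sign.

    Convention: numbits(0) == 0, matching what int.bit_length()
    later did when it was added to Python.
    """
    n = abs(n)
    bits = 0
    while n:
        n >>= 1
        bits += 1
    return bits

print(numbits(255))  # 8
print(numbits(256))  # 9
```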


I tried to benchmark all these patches using pystone or pybench, but the 
results are inaccurate. Pystone results vary by +/- 20% between runs of the 
same code. I tried more loops (pystone 250000), but it doesn't change 
anything. Pybench is better, but it is also too inaccurate to benchmark 
operations on integers.

That's why I started to write a *basic* benchmark tool to compare the 
different patches: see the file bench_int.py of issue #4294.
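bench_int.py itself is attached to the issue; as a rough idea of the 
approach, a micro-benchmark of integer operations can be built on the 
standard timeit module (this sketch is mine, not the actual tool):

```python
import timeit

def bench(stmt, loops=100000, repeat=3):
    """Time a statement; return the best of several runs, in ms.

    Taking the minimum over repeats reduces noise from other
    processes, which is what makes pystone/pybench so unstable.
    """
    timer = timeit.Timer(stmt)
    return min(timer.repeat(repeat, loops)) * 1000.0

for stmt in ("3 + 5", "12345 * 6789", "x = 1 << 100"):
    print("%-16s %.1f ms" % (stmt, bench(stmt)))
```

Best-of-N timing is the same trick timeit's command line uses to get 
repeatable numbers from short statements.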

Benchmark on a 64-bit CPU @ 2.5 GHz:
original     : 896.5 ms
 + macros    : 891.0 ms (+0.6%) | 
 ++ optimize : 884.3 ms (+1.4%) |-- issue #4294
 +++ shift   : 880.8 ms (+1.7%) |
fast long    : 746.8 ms (+16%)  -- issue #3944 
GMP          : 700.9 ms (+22%)  -- issue #1814
30 bits      : 659.9 ms (+26%)  -- issue #4258

Benchmark on a 32-bit CPU @ 3.0 GHz:
original  : 1564.3 ms
fast long : 1591.7 ms (-2%)
GMP       : 1564.4 ms (=)
30 bits   : 1497.3 ms (+4%)

Don't trust these benchmark results too much: my tool is young and may also 
be inaccurate, but it's a good start for comparing the patches.

 - my macro patch isn't meant to optimize anything; it's just
   a preparation for new patches. I also attached "optimization"
   patches, but they are almost useless (only +1.7% with all patches)
 - 30 bits is always faster
 - GMP is faster on 64 bits and at the same speed on 32 bits
 - fast long is slower on a 32-bit CPU

Python integers

I tried to understand why the "30 bits" patch is not as fast as I expected, 
so I introduced statistics in the long type: see long_stat.patch of issue 
#4258. Summary (on a 32-bit CPU):

- PyLong_FromLong(), long_add() and long_compare() are the 3 most common
  operations on integers. 
- Most integers are in the range [-255; 255]; large integers are rare. With 
  the make and pystone programs, the largest integer has 1321 bits, but 
  there is just one such integer. Another has 33 bits, but all other 
  integers fit in 32 bits (without the sign, so 33 bits with the sign). I 
  saw that the symbol table uses memory addresses as dictionary keys (so 
  32 bits on a 32-bit CPU and 64 bits on a 64-bit CPU).
  => we have to focus on small integers
- pystone is inaccurate for benchmarking integers: it only uses one big 
  integer (more than 1 digit in base 2^30) out of 1,000,000 integers
  => don't use pystone!
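The 15-bit vs 30-bit trade-off can be made concrete: the number of base-2^k 
digits an integer occupies follows from its bit length, and for the small 
values that dominate real programs a single 30-bit digit is enough. A small 
sketch of mine (not code from any of the patches):

```python
def ndigits(n, shift):
    """Number of base-2**shift digits needed to store abs(n)."""
    n = abs(n)
    digits = 0
    while n:
        n >>= shift
        digits += 1
    # Count zero as one digit slot for simplicity (CPython actually
    # stores zero with no digits at all).
    return max(digits, 1)

# Typical integers, range [-255; 255]: one digit either way.
print(ndigits(255, 15), ndigits(255, 30))          # 1 1
# The rare 1321-bit integer needs far fewer 30-bit digits.
print(ndigits(2**1320, 15), ndigits(2**1320, 30))  # 89 45
```

So for the common case the digit size barely matters, which is consistent 
with the statistics above: the real win has to come from the fast paths for 
small integers.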

I don't have a solution that works on any CPU with all numbers; this email 
is just a summary of the interesting integer patches.

Victor Stinner aka haypo
