[Python-Dev] Performance (non)optimization: 31-bit ints in pointers

Guido van Rossum guido@python.org
Mon, 12 Aug 2002 17:07:31 -0400


> Many Lisp interpreters use 'tagged types' to, among other things, let
> small ints reside directly in the machine registers.
> 
> Python might wish to take advantage of this by designating that pointers to
> odd addresses stand for integers, according to the following relationship:
>     p = (i<<1) | 1
>     i = (p>>1)
> (Due to alignment requirements on all common machines, all valid
> pointers-to-struct have 0 in their low bit.)  This means that all integers
> which fit in 31 bits can be stored without actually allocating or
> deallocating anything.
> 
> I modified a Python interpreter to the point where it could run simple
> programs.  The changes are unfortunately very invasive, because they
> make any C code which simply executes
>     o->ob_type
> or otherwise dereferences a PyObject* invalid when presented with a
> small int.  This would obviously affect a huge amount of existing code in
> extensions, and is probably enough to stop this from being implemented
> before Python 3000.
> 
> This also introduces another conditional branch in many pieces of code, such
> as any call to PyObject_TypeCheck().
> 
> Performance results are mixed.  A small program designed to test the
> speed of all-integer arithmetic comes out faster by 14% (3.38 vs 2.90
> "user" time on my machine) but pystone comes out 5% slower (14124 vs 13358
> "pystones/second").
> 
> I don't know if anybody's barked up this tree before, but I think
> these results show that it's almost certainly not worth the effort to
> incorporate this "performance" hack in Python.  I'll keep my tree around
> for a while, in case anybody else wants to see it, but beware that it
> still has serious issues even in the core:
>     >>> 0+0j
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     TypeError: unsupported operand types for +: 'int' and 'complex'
>     >>> (0).__class__
>     Segmentation fault
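
For readers who want to see what the scheme above looks like in C, here is a
minimal sketch of the tagging arithmetic and of the guard that every ob_type
dereference would then need.  The macro and type names are hypothetical (they
are not taken from the patch being discussed), and the sketch assumes an
arithmetic right shift on signed integers:

    /* Sketch of the tagging scheme described above.  Names are
       hypothetical, not taken from the patch. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct object {          /* stand-in for PyObject */
        const char *ob_type_name;    /* only the field relevant here */
    } object;

    /* p = (i << 1) | 1 encodes a 31-bit int as an odd "pointer";
       real object pointers are word-aligned, so their low bit is 0. */
    #define IS_SMALL_INT(p)    (((uintptr_t)(p)) & 1)
    #define BOX_SMALL_INT(i)   ((object *)((((uintptr_t)(intptr_t)(i)) << 1) | 1))
    #define UNBOX_SMALL_INT(p) ((intptr_t)(p) >> 1)  /* assumes arithmetic shift */

    /* Every place that used to read p->ob_type_name directly now needs a
       tag test first -- the pervasive change described above. */
    static const char *type_name(object *p)
    {
        if (IS_SMALL_INT(p))
            return "int";
        return p->ob_type_name;
    }

    int main(void)
    {
        object *p = BOX_SMALL_INT(-42);
        printf("%s %ld\n", type_name(p), (long)UNBOX_SMALL_INT(p));  /* int -42 */
        return 0;
    }

The guard in type_name() is exactly the extra conditional branch the post
complains about: it runs on every dispatch, not just when a small int shows up.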

We used *exactly* this approach in ABC.  I decided not to go with it
in Python, for two reasons that are essentially the ones you write up
here: (1) the changes are very pervasive (in ABC, we kept finding
places where pointer-manipulating code had to be fixed to deal with
the small ints), and (2) it wasn't at all clear whether it was a
performance win in the end (all the extra tests and special cases may
cost as much as you gain).
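
To make reason (2) concrete, here is a hypothetical before/after of a
PyObject_TypeCheck-style helper under the same assumptions as the sketch
above; the function and type names are illustrative, not CPython's:

    #include <stdint.h>
    #include <stdio.h>

    typedef struct type_object { const char *name; } type_object;  /* stand-in for PyTypeObject */
    typedef struct object { type_object *ob_type; } object;        /* stand-in for PyObject */

    /* Without tagged ints: one load and one compare. */
    static int type_check(object *op, type_object *tp)
    {
        return op->ob_type == tp;
    }

    /* With tagged ints: an extra test-and-branch on every call, even on
       paths that can never see a small int. */
    static int type_check_tagged(object *op, type_object *tp, type_object *int_type)
    {
        if (((uintptr_t)op) & 1)      /* low bit set: a boxed 31-bit int */
            return tp == int_type;
        return op->ob_type == tp;
    }

    int main(void)
    {
        type_object int_type = { "int" }, str_type = { "str" };
        object *small = (object *)(((uintptr_t)7 << 1) | 1);  /* tagged 7 */
        object s = { &str_type };

        printf("%d %d %d\n",
               type_check(&s, &str_type),                       /* 1 */
               type_check_tagged(small, &int_type, &int_type),  /* 1 */
               type_check_tagged(&s, &str_type, &int_type));    /* 1 */
        return 0;
    }

Whether the allocations saved by unboxed ints pay for these extra branches is
exactly what the mixed benchmark numbers quoted above call into question.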

In general, ABC tried to use many tricks from the book (e.g. it used
asymptotically optimal B-tree algorithms to represent dicts, lists and
strings, to guarantee performance of slicing and dicing operations for
strings of absurd lengths).  In Python I decided to stay away from
cleverness except when extensive performance analysis showed there was
a real need to speed something up.  That got us super-fast dicts, for
example, and .pyc files to cache the work of the (slow, but
trick-free) parser.

--Guido van Rossum (home page: http://www.python.org/~guido/)