[Python-Dev] Performance (non)optimization: 31-bit ints in pointers

Mon, 12 Aug 2002 15:51:08 -0500

Many Lisp interpreters use 'tagged types' to, among other things, let
small ints reside directly in the machine registers.

Python might wish to take advantage of this by designating pointers to odd
addresses stand for integers according to the following relationship:
    p = (i<<1) | 1
    i = (p>>1)
(due to alignment requirements on all common machines, all valid
pointers-to-struct have 0 in their low bit)  This means that all integers
which fit in 31 bits can be stored without actually allocating or deallocating
anything.

I modified a Python interpreter to the point where it could run simple
programs.  The changes are unfortunately very invasive, because they
make any C code which simply executes
    o->ob_type
or otherwise dereferences a PyObject* invalid when presented with a
small int.  This would obviously affect a huge amount of existing code in
extensions, and is probably enough to stop this from being implemented
before Python 3000.

This also introduces another conditional branch in many pieces of code, such
as any call to PyObject_TypeCheck().

Performance results are mixed.  A small program designed to test the
speed of all-integer arithmetic comes out faster by 14% (3.38 vs 2.90
"user" time on my machine) but pystone comes out 5% slower (14124 vs 13358
"pystones/second").

I don't know if anybody's barked up this tree before, but I think
these results show that it's almost certainly not worth the effort to
incorporate this "performance" hack in Python.  I'll keep my tree around
for awhile, in case anybody else wants to see it, but beware that it
still has serious issues even in the core:
    >>> 0+0j
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: unsupported operand types for +: 'int' and 'complex'
    >>> (0).__class__
    Segmentation fault

Jeff
jepler@unpythonic.net
PS The program that shows the advantage of this optimization is as follows:

j = 0
for k in range(10):
    for i in range(100000) + range(1<<30, 1<<30 + 100000):
	j = j ^ i
print j