[Python-3000] long/int unification
martin at v.loewis.de
Fri Aug 25 03:49:55 CEST 2006
Here is a quick status of the int_unification branch,
summarizing what I did at the Google sprint in NYC.
- the int type has been dropped; the builtins int and long
now both refer to long type
- the entire PyInt_* API is forwarded to the PyLong_* API. Few
changes to the C code are necessary; the most common offender
is PyInt_AS_LONG((PyIntObject*)v) since I completely removed
- Much of the test suite passes, although it still has a number
- There are timing tests for allocation and for addition.
On allocation, the current implementation is about a factor
of 2 slower; the integer addition is about 1.5 times slower;
the initial slowdown was a factor of 3. The pystones
dropped about 10% (pybench fails to run on p3yk).
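The message does not include the timing tests themselves. A minimal sketch of a comparable microbenchmark, using the standard timeit module, might look like the following. The timed statement, the operand values, and the function name time_addition are assumptions for illustration, not the tests that were actually run. It contrasts an addition whose result comes from CPython's small-int cache (-5..256 in current CPython) with one whose result must be allocated as a fresh integer object.

```python
import timeit


def time_addition(number=200_000, repeat=3):
    """Best-of-`repeat` timings (seconds) for `number` evaluations of a + b.

    'cached': the sum (3) is served from the small-int cache, so no new
    object is allocated. 'allocating': the sum (13023) lies outside the
    cache, so each addition creates a fresh integer object.
    """
    cases = {
        "cached": "a = 1; b = 2",
        "allocating": "a = 12345; b = 678",
    }
    # Operands are set up as variables so the compiler cannot constant-fold a + b.
    return {
        name: min(timeit.repeat("a + b", setup=setup, number=number, repeat=repeat))
        for name, setup in cases.items()
    }


if __name__ == "__main__":
    for name, seconds in time_addition().items():
        print(f"{name:>10}: {seconds:.4f} s")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; comparing the two cases on the old and new builds would show how much of the slowdown comes from allocation.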
A couple of interesting observations:
- bool was a subtype of int, and is now a subtype of long. In
order to avoid knowing the internal representation of long,
the bool type compares addresses against Py_True and Py_False,
instead of looking at ob_ival.
- the small ints cache is implemented as an array of statically
allocated longs, rather than heap-allocating them.
- after adding the small ints cache, a lot of things broke, e.g.
for code like
py> x = 4
py> x = -4
This happened because the long methods simply toggled the sign
of the object they were given in place, corrupting the small ints cache.
- to further speed up the implementation, I added special
casing for one-digit numbers. As they are always in
range(-32767,32768), the arithmetic operations don't
need overflow checking anymore (even multiplication
won't overflow 32-bit int).
- I found that in 2.x, long objects overallocate 2 bytes
on a 32-bit machine, and 6 bytes on a 64-bit machine,
because sizeof(PyLongObject) rounds up.
- pickle and marshal have been changed to deal with
the loss of int; pickle generates INT codes even
for longs now, provided the value is in the range
for the code.
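The small ints cache failure described above can be reproduced with a toy model. The class ToyLong, the helpers make_cache, make_long, neg_in_place and neg_copy, and the cache range -5..256 are all hypothetical; the model keeps the sign in a separate field, whereas the real PyLongObject stores it in the sign of ob_size. The message does not say exactly how the branch fixed the problem; neg_copy only shows one safe alternative: never mutate an object you were handed, and return a new (or cached) one instead.

```python
class ToyLong:
    """Hypothetical mutable sign-magnitude integer, standing in for PyLongObject."""

    __slots__ = ("sign", "magnitude")

    def __init__(self, value: int) -> None:
        self.sign = -1 if value < 0 else 1
        self.magnitude = abs(value)

    def value(self) -> int:
        return self.sign * self.magnitude


def make_cache() -> dict:
    # Preallocated objects shared by every caller, like the small ints cache.
    return {v: ToyLong(v) for v in range(-5, 257)}


def make_long(cache: dict, value: int) -> ToyLong:
    # Hand out the shared cached object when one exists, else a fresh object.
    return cache[value] if value in cache else ToyLong(value)


def neg_in_place(x: ToyLong) -> ToyLong:
    # Buggy: flips the sign of the object it received, even if that object is shared.
    x.sign = -x.sign
    return x


def neg_copy(cache: dict, x: ToyLong) -> ToyLong:
    # Safe: leaves x untouched and returns a new (or cached) object.
    return make_long(cache, -x.value())
```

With neg_in_place, computing -4 from the cached 4 turns every later make_long(cache, 4) into -4, mirroring the `x = 4; x = -4` session; with neg_copy, the cached 4 stays 4.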
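The bound behind the single-digit fast path can be checked directly. The 2.x long implementation uses 15-bit digits, which is what makes range(-32767, 32768) the single-digit range, so the largest product of two single-digit values is 32767**2 = 1073676289, below the 32-bit limit of 2147483647. The helpers is_single_digit, fast_add and fast_mul are hypothetical illustrations, and returning None stands in for falling back to the general multi-digit routine.

```python
SHIFT = 15                          # bits per digit in the 2.x long implementation
MAX_DIGIT = (1 << SHIFT) - 1        # 32767
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def is_single_digit(v: int) -> bool:
    # Same set as range(-32767, 32768): a magnitude that fits in one digit.
    return -MAX_DIGIT <= v <= MAX_DIGIT


def fast_add(a: int, b: int):
    """Add two single-digit values; None means 'use the general routine'."""
    if not (is_single_digit(a) and is_single_digit(b)):
        return None
    return a + b  # |a + b| <= 65534, far inside the int32 range


def fast_mul(a: int, b: int):
    """Multiply two single-digit values; None means 'use the general routine'."""
    if not (is_single_digit(a) and is_single_digit(b)):
        return None
    return a * b  # |a * b| <= 32767**2 = 1073676289 < 2**31 - 1


# Worst cases over the whole single-digit range stay within a 32-bit signed int.
WORST_PRODUCT = MAX_DIGIT * MAX_DIGIT
assert INT32_MIN <= -WORST_PRODUCT and WORST_PRODUCT <= INT32_MAX
assert 2 * MAX_DIGIT <= INT32_MAX
```

Since neither the sum nor the product of two one-digit values can leave the 32-bit range, the C code for this case can use plain int arithmetic with no overflow test.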
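The 2.x overallocation can be reproduced with a ctypes mirror of the object layout. The structure below is a hypothetical model, assuming the variable-size header of 2.5 and later (reference count, type pointer, Py_ssize_t ob_size) followed by a one-element array of 16-bit digits; it is not the actual C header. The digit array ends 2 bytes after its offset, but sizeof rounds the structure up to pointer alignment, which accounts for the 2 and 6 bytes mentioned above.

```python
import ctypes


class PyLongObject2x(ctypes.Structure):
    """Hypothetical ctypes model of a 2.x PyLongObject holding one digit."""

    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),    # reference count
        ("ob_type", ctypes.c_void_p),       # type pointer
        ("ob_size", ctypes.c_ssize_t),      # digit count; its sign is the number's sign
        ("ob_digit", ctypes.c_ushort * 1),  # 15-bit digits stored in 16-bit slots
    ]


def trailing_padding() -> int:
    # Bytes actually needed: everything up to and including the single digit.
    used = PyLongObject2x.ob_digit.offset + ctypes.sizeof(ctypes.c_ushort)
    # sizeof() rounds up to the struct's alignment; the rest is padding.
    return ctypes.sizeof(PyLongObject2x) - used


if __name__ == "__main__":
    bits = 8 * ctypes.sizeof(ctypes.c_void_p)
    print(f"{bits}-bit build: {trailing_padding()} bytes of padding")
```

On a 64-bit build the header is 24 bytes, the digit brings it to 26, and alignment to 8 rounds it to 32; on a 32-bit build 14 bytes round to 16.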
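Current CPython 3, which has only the one integer type, behaves the same way in pickle's text protocol 0: values that fit in a signed 32-bit int are written with the INT opcode, and only larger ones fall back to LONG. The helper first_opcode is hypothetical; pickle.dumps and pickletools.genops are standard library functions. This illustrates present-day CPython, not the branch's own code.

```python
import pickle
import pickletools


def first_opcode(value: int) -> str:
    """Return the name of the opcode pickle protocol 0 uses to encode `value`."""
    data = pickle.dumps(value, protocol=0)
    # Protocol 0 has no PROTO header, so the first opcode encodes the integer.
    opcode, _arg, _pos = next(pickletools.genops(data))
    return opcode.name


if __name__ == "__main__":
    for v in (4, -4, 2**31 - 1, 2**31, 10**20):
        print(v, first_opcode(v))
```

Emitting INT whenever the value fits keeps such pickles readable by unpicklers that only understand the 32-bit INT code.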
I'm not sure whether this performance change is
acceptable; at this point, I'm running out of ideas
for how to further improve the performance. Using a plain
32-bit int as the representation could be another
try, but I somewhat doubt it would help, given that the
supposedly-simpler single-digit case is so