[Python-3000] long/int unification

Fri Aug 25 03:49:55 CEST 2006

Here is a quick status of the int_unification branch,
summarizing what I did at the Google sprint in NYC.

- the int type has been dropped; the builtins int and long
  now both refer to the long type
- the entire PyInt_* API is forwarded to the PyLong_* API. Few
  changes to the C code are necessary; the most common offender
  is PyInt_AS_LONG((PyIntObject*)v) since I completely removed
  PyIntObject.
- Much of the test suite passes, although the branch still has a
  number of bugs.
- There are timing tests for allocation and for addition.
  On allocation, the current implementation is about a factor
  of 2 slower; integer addition is about 1.5 times slower;
  the initial slowdown was by a factor of 3. The pystones score
  dropped about 10% (pybench fails to run on p3yk).
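
A minimal sketch of how timings of this kind could be taken with
the standard timeit module. This is not the script used on the
branch; the statements, the helper names time_statement and
run_timings, and the loop count are all assumptions made for
illustration.

```python
import timeit


def time_statement(stmt, setup, number=200000):
    # Best of three runs, in seconds, to damp scheduling noise.
    return min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))


def run_timings():
    # Hypothetical micro-benchmarks; names and operands are illustrative.
    return {
        # The result is a small value, so a small ints cache can satisfy it.
        "add_small": time_statement("a + b", "a = 1; b = 2"),
        # The result is large, so each iteration has to create a new object.
        "add_alloc": time_statement("a + b", "a = 10**6; b = 1"),
    }


if __name__ == "__main__":
    for name, seconds in run_timings().items():
        print(name, seconds)
```

Running the same script on the trunk and on the branch and dividing
the numbers gives slowdown factors comparable to the ones above.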

A couple of interesting observations:
- bool was a subtype of int, and is now a subtype of long. In
  order to avoid knowing the internal representation of long,
  the bool type compares addresses against Py_True and Py_False,
  instead of looking at ob_ival.
- to add the small ints cache, an array of statically allocated
  longs is used, rather than heap-allocating them.
- after adding the small ints cache, a lot of things broke, e.g.
  for code like
  py> x = 4
  py> x = -4
  py> x
  -4
  py> 4
  -4
  This happened because long methods just toggle the sign
  of the object they got, messing up the small ints cache.
- to further speed up the implementation, I added special
  casing for one-digit numbers. As they are always in
  range(-32767,32768), the arithmetic operations don't
  need overflow checking anymore (even multiplication
  won't overflow a 32-bit int).
- I found that in 2.x, long objects overallocate 2 bytes
  on a 32-bit machine, and 6 bytes on a 64-bit machine,
  because sizeof(PyLongObject) rounds up.
- pickle and marshal have been changed to deal with
  the loss of int; pickle now generates INT codes even
  for longs, provided the value is in the range
  for the code.
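
The small ints cache breakage described above can be imitated in
pure Python. This is only a toy model, not the C code: ToyLong,
SMALL_INTS, make, negate_in_place and negate_copy are hypothetical
names, and the cached range is chosen arbitrarily.

```python
class ToyLong:
    """Hypothetical sign-magnitude integer, loosely modelled on a long."""

    def __init__(self, sign, magnitude):
        self.sign = sign
        self.magnitude = magnitude

    def value(self):
        return self.sign * self.magnitude


def _new(v):
    return ToyLong(-1 if v < 0 else 1, abs(v))


# Cache built once up front; every user of a small value shares one object.
# The range is an arbitrary choice for this sketch.
SMALL_INTS = {v: _new(v) for v in range(-5, 257)}


def make(v):
    # Hand out the shared object when the value is cached.
    cached = SMALL_INTS.get(v)
    return cached if cached is not None else _new(v)


def negate_in_place(obj):
    # Buggy: toggles the sign of the object it was given.
    obj.sign = -obj.sign
    return obj


def negate_copy(obj):
    # Safe: leaves the argument alone and goes through the factory.
    return make(-obj.value())


if __name__ == "__main__":
    four = make(4)
    print(negate_copy(four).value(), make(4).value())
    negate_in_place(four)
    print(make(4).value())
```

Once negate_in_place has touched the cached object, every later
request for 4 sees the flipped sign, which is the effect shown in
the session above.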

I'm not sure whether this performance change is
acceptable; at this point, I'm running out of ideas
on how to further improve the performance. Using a plain
32-bit int as the representation could be another
try, but I somewhat doubt it helps given that the
supposedly-simpler single-digit case is so
slow.
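
For reference, the no-overflow claim for the single-digit case can
be checked with a few lines of Python. The sketch assumes 15-bit
digits, which is what range(-32767,32768) implies; DIGIT_BITS,
fits_int32 and one_digit_extremes are names made up for this check.

```python
DIGIT_BITS = 15                     # assumed digit width
MAX_DIGIT = (1 << DIGIT_BITS) - 1   # largest one-digit magnitude
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def fits_int32(v):
    return INT32_MIN <= v <= INT32_MAX


def one_digit_extremes():
    # Sum, difference and product take their extreme values at the
    # endpoints of the operand range, so checking the corners is enough.
    ends = (-MAX_DIGIT, MAX_DIGIT)
    results = []
    for a in ends:
        for b in ends:
            results.extend([a + b, a - b, a * b])
    return min(results), max(results)


if __name__ == "__main__":
    lo, hi = one_digit_extremes()
    print(lo, hi, fits_int32(lo) and fits_int32(hi))
```

The largest magnitude comes from the product of the two extremes,
which stays well inside a signed 32-bit int.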

Regards,
Martin