[Python-Dev] 64-bit process optimization 1

Guido van Rossum guido@python.org
Mon, 09 Sep 2002 13:55:21 -0400


>     I am suggesting that Python 2.3 should use a different layout of
>     all Python objects than is defined in Python 2.2.1.
>     Specifically, I have found that changing lines 63-74 of
>     Include/object.h from:
> 
> #ifdef Py_TRACE_REFS
> #define PyObject_HEAD \
> 	struct _object *_ob_next, *_ob_prev; \
> 	int ob_refcnt; \
> 	struct _typeobject *ob_type;
> #define PyObject_HEAD_INIT(type) 0, 0, 1, type,
> #else /* !Py_TRACE_REFS */
> #define PyObject_HEAD \
> 	int ob_refcnt; \
> 	struct _typeobject *ob_type;
> #define PyObject_HEAD_INIT(type) 1, type,
> #endif /* !Py_TRACE_REFS */
> 
>     to:
> 
> #ifdef Py_TRACE_REFS
> #define PyObject_HEAD \
> 	struct _object *_ob_next, *_ob_prev; \
> 	struct _typeobject *ob_type; \
> 	int ob_refcnt;
> #define PyObject_HEAD_INIT(type) 0, 0, type, 1,
> #else /* !Py_TRACE_REFS */
> #define PyObject_HEAD \
> 	struct _typeobject *ob_type; \
> 	int ob_refcnt;
> #define PyObject_HEAD_INIT(type) type, 1,
> #endif /* !Py_TRACE_REFS */
> 
>     significantly improved the performance of my 64-bit processes.
> 
>     Basically, I have just changed the order of the items in
>     PyObject and PyVarObject to avoid gaps due to "int" being a
>     4-byte type with 4-byte alignment, while "long" and pointers are
>     8-byte types with 8-byte alignment (on 64-bit platforms that
>     conform to the LP64 guideline).  On ILP32 platforms, such as
>     Intel x86 and AMD CPUs, this should have no effect.  On the Sun
>     platform on which I live, the changes work for both ILP32 and
>     LP64.  For the very large programs I run, the modification saved
>     me 40% execution time.  This was probably due to the increased
>     number of Python objects that would fit into the L2 cache, so I
>     don't believe that others would necessarily see as large a
>     difference with this coding change.

Interesting!  I can see why this makes sense.  Strings, lists and
tuples all have an int (ob_size) directly following the standard HEAD,
and after that something that requires pointer alignment, so that
these object types would all save 8 bytes!  To wit:

string
	int refcnt, ptr type, int size, long hash, ...
                   ^gap                ^gap
list
	int refcnt, ptr type, int size, ptr item*
                   ^gap                ^gap
tuple
	int refcnt, ptr type, int size, ptr item[]
                   ^gap                ^gap

By swapping the first two fields, these gaps would all disappear.  The
dict object doesn't use ob_size, but starts with an odd number of
ints, so the same reasoning shows it would also save 8 bytes.

I don't have access to a 64-bit platform to experiment with this.

Unfortunately, one problem is binary compatibility.  We try to make it
possible to link newer Python versions with extension modules (like
Numeric, which you use) compiled for older versions.  This requires
that the binary layout of objects remains the same, and swapping
ob_refcnt and ob_type would cause immediate crashes in this case.

It may be that other binary incompatibilities between 2.2 and 2.3
already make such linking impractical, so perhaps I'm being too
conservative here.

Another issue is that at least theoretically, on a 64-bit platform,
there could be more than 2 billion references to a particular object.
E.g. if you have enough memory, the following allocates 3 lists each
containing a billion references to None, causing the reference count
of None to go negative:

A = []
for i in range(3):
    A.append([None]*1000000000)

So perhaps the refcnt should have been a long in the first place.  A
similar argument may hold for the length of e.g. strings and lists:
one could wish to have a list of more than 2 billion elements, or a
string containing more than 2 gigabytes (that much RAM is easily found
on the larger 64-bit servers, I believe).

Opinions?

--Guido van Rossum (home page: http://www.python.org/~guido/)