[Python-Dev] 64-bit process optimization

rledwith@cas.org rledwith@cas.org
Mon, 9 Sep 2002 12:34:18 -0400 (EDT)


    This is my first post to Python-Dev.  As requested by the list manager
    I am supplying a brief personal introduction before getting to the topic of
    this message:

	I am a Senior Research Scientist at CAS (http://www.cas.org),
	a branch of the American Chemical Society (http://www.acs.org).
	I have used Python as my programming language of choice for the last
	four years.  I typically work with large collections of text documents
	performing analyses of text, computer indexing of text, and information
	retrieval.  I use Python as (1) a general purpose programming
	language, and (2) a high-level programming language to invoke
	high-performance C and C++ modules (including Numeric).
	If I examine my programs by data structures, I would find that they
	contain mostly:

	    1.  Very large dictionaries using tuples and strings as keys.
		Guido's essay on Implementing Graphs
		(http://www.python.org/doc/essays/graphs.html)
		was the inspiration for my using dictionaries to create very large
		directed acyclic graphs.

	    2.  Specialized C++ objects to represent inverted lists.

	    3.  Numeric objects for representing vectors and tables of floating point values.

	My primary computing platforms are four dedicated Sun servers,
	containing 30 processors, 88GB of RAM and 2TB of DASD.  Most of the
	programs I write require between 1 hour and 27 days to complete.
	(Obviously, I am an atypical Python user!)
	During the last three months, I have been forced to migrate from
	32-bit python processes to 64-bit processes due to the large number
	of data points I am analyzing within a single program run.
	It is my experiences while migrating from 32-bit to 64-bit code
	that prompted this message.

    It is with some trepidation that, as the subject of my first posting,
    I am suggesting that Python 2.3 use a different layout for all Python
    objects than is defined in Python 2.2.1.  Specifically, I have found that
    changing lines 63-74 of Include/object.h from:

#ifdef Py_TRACE_REFS
#define PyObject_HEAD \
	struct _object *_ob_next, *_ob_prev; \
	int ob_refcnt; \
	struct _typeobject *ob_type;
#define PyObject_HEAD_INIT(type) 0, 0, 1, type,
#else /* !Py_TRACE_REFS */
#define PyObject_HEAD \
	int ob_refcnt; \
	struct _typeobject *ob_type;
#define PyObject_HEAD_INIT(type) 1, type,
#endif /* !Py_TRACE_REFS */

    to:

#ifdef Py_TRACE_REFS
#define PyObject_HEAD \
	struct _object *_ob_next, *_ob_prev; \
	struct _typeobject *ob_type; \
	int ob_refcnt;
#define PyObject_HEAD_INIT(type) 0, 0, type, 1,
#else /* !Py_TRACE_REFS */
#define PyObject_HEAD \
	struct _typeobject *ob_type; \
	int ob_refcnt;
#define PyObject_HEAD_INIT(type) type, 1,
#endif /* !Py_TRACE_REFS */

    significantly improved the performance of my 64-bit processes.

    Basically, I have just changed the order of the items in PyObject and
    PyVarObject to avoid padding: an "int" is a 4-byte-sized, 4-byte-aligned
    type, while "long" and pointers are 8-byte-sized, 8-byte-aligned types on
    64-bit platforms that conform to the LP64 guideline.  Under the ILP32
    guideline, used by Intel x86 and AMD CPUs, the change should have no
    effect.  On the Sun platform on which I live, the changes work for both
    ILP32 and LP64.  For the very large programs I run, the modification saved
    me 40% in execution time.  This was probably due to the increased number
    of Python objects that fit into the L2 cache, so I don't believe that
    others would necessarily see as large a difference from this coding
    change.
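    To make the padding argument concrete, here is a small stand-alone C
    sketch.  The two structs mirror the field order of PyVarObject (which
    appends an "int ob_size" to PyObject_HEAD) before and after the proposed
    change; the names and the use of "void *" for the type pointer are
    illustrative only, not the real CPython definitions.

#include <assert.h>
#include <stdio.h>

/* Old (2.2.1) order: the 4-byte int before the 8-byte pointer forces
 * 4 bytes of padding on LP64, and the trailing int forces 4 more. */
struct var_old {
    int ob_refcnt;    /* 4 bytes + 4 bytes padding on LP64 */
    void *ob_type;    /* 8 bytes, 8-byte aligned */
    int ob_size;      /* 4 bytes + 4 bytes tail padding */
};                    /* 24 bytes on LP64, 12 on ILP32 */

/* Proposed order: the pointer comes first, and the two ints pack
 * together into a single 8-byte slot. */
struct var_new {
    void *ob_type;    /* 8 bytes, 8-byte aligned */
    int ob_refcnt;    /* 4 bytes */
    int ob_size;      /* 4 bytes -- packs with ob_refcnt */
};                    /* 16 bytes on LP64, 12 on ILP32 */

int main(void)
{
    printf("old order: %zu bytes, new order: %zu bytes\n",
           sizeof(struct var_old), sizeof(struct var_new));
    /* The new order is never larger, and is 8 bytes smaller on LP64. */
    assert(sizeof(struct var_new) <= sizeof(struct var_old));
    return 0;
}

    On an LP64 build this prints 24 versus 16 bytes; on an ILP32 build both
    are 12, which is why the reordering is a no-op on 32-bit platforms.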

    Please consider this change for inclusion in the upcoming Python release.

					    - Bob