[Python-Dev] 64-bit process optimization
rledwith@cas.org
Mon, 9 Sep 2002 12:34:18 -0400 (EDT)
Hello,
This is my first post to Python-Dev. As requested by the list manager
I am supplying a brief personal introduction before getting to the topic of
this message:
I am a Senior Research Scientist at CAS (http://www.cas.org),
a branch of the American Chemical Society (http://www.acs.org).
I have used Python as my programming language of choice for the last
four years. I typically work with large collections of text documents
performing analyses of text, computer indexing of text, and information
retrieval. I use Python as (1) a general purpose programming
language, and (2) a high-level programming language to invoke
high-performance C and C++ modules (including Numeric).
If I examined my programs by data structure, I would find that they
contain mostly:
1. Very large dictionaries using tuples and strings as keys.
Guido's essay on Implementing Graphs
(http://www.python.org/doc/essays/graphs.html)
was the inspiration for my using dictionaries to create very large
directed acyclic graphs.
2. Specialized C++ objects to represent inverted lists.
3. Numeric objects for representing vectors and tables of floating point values.
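The dictionary-as-graph idiom from that essay, which I use for the DAGs in
item 1, can be sketched as follows (the graph data and path search here are
just an illustration along the lines of the essay, not my production code):

```python
# A directed acyclic graph as a dict: each key maps to the list of
# its successor nodes (the representation from "Implementing Graphs").
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}

def find_path(graph, start, end, path=None):
    """Return one path from start to end as a list, or None if none exists."""
    path = (path or []) + [start]
    if start == end:
        return path
    for node in graph.get(start, []):
        if node not in path:          # avoid revisiting nodes
            newpath = find_path(graph, node, end, path)
            if newpath:
                return newpath
    return None

print(find_path(graph, "A", "D"))     # ['A', 'B', 'C', 'D']
```

In my programs the keys are tuples and strings rather than single letters,
but the dictionary shape is the same.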
My primary computing platforms are four dedicated Sun servers,
containing 30 processors, 88GB of RAM and 2TB of DASD. Most of the
programs I write require between 1 hour and 27 days to complete.
(Obviously, I am an atypical Python user!)
During the last three months, I have been forced to migrate from
32-bit Python processes to 64-bit processes due to the large number
of data points I am analyzing within a single program run.
It was my experience while migrating from 32-bit to 64-bit code
that prompted this message.
It is with some trepidation that, as the subject of my first posting,
I am suggesting that Python 2.3 use a different layout for all Python objects
than the one defined in Python 2.2.1. Specifically, I have found that changing
lines 63-74 of Include/object.h from:
#ifdef Py_TRACE_REFS
#define PyObject_HEAD \
struct _object *_ob_next, *_ob_prev; \
int ob_refcnt; \
struct _typeobject *ob_type;
#define PyObject_HEAD_INIT(type) 0, 0, 1, type,
#else /* !Py_TRACE_REFS */
#define PyObject_HEAD \
int ob_refcnt; \
struct _typeobject *ob_type;
#define PyObject_HEAD_INIT(type) 1, type,
#endif /* !Py_TRACE_REFS */
to:
#ifdef Py_TRACE_REFS
#define PyObject_HEAD \
struct _object *_ob_next, *_ob_prev; \
struct _typeobject *ob_type; \
int ob_refcnt;
#define PyObject_HEAD_INIT(type) 0, 0, type, 1,
#else /* !Py_TRACE_REFS */
#define PyObject_HEAD \
struct _typeobject *ob_type; \
int ob_refcnt;
#define PyObject_HEAD_INIT(type) type, 1,
#endif /* !Py_TRACE_REFS */
significantly improved the performance of my 64-bit processes.
Basically, I have just changed the order of the fields in PyObject and
PyVarObject to avoid padding gaps: under the LP64 data model used by most
64-bit platforms, an "int" is a 4-byte type with 4-byte alignment, while
"long" and pointers are 8-byte types with 8-byte alignment. Under the ILP32
data model, used by Intel x86 and AMD CPUs, all of these types are 4 bytes,
so the change should have no effect there. On the Sun platform on which I
live, the change works for both ILP32 and LP64.
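The padding that the reordering removes is easiest to see in PyVarObject,
where ob_refcnt is followed by a second int (ob_size). A small ctypes sketch
(my own illustration of the two layouts, not CPython code) makes the gap
visible:

```python
import ctypes

# Model of the PyVarObject head with the 2.2.1 field order:
# a 4-byte int, then an 8-byte pointer, then another 4-byte int.
class VarHeadBefore(ctypes.Structure):
    _fields_ = [("ob_refcnt", ctypes.c_int),    # 4 bytes of padding follow on LP64
                ("ob_type", ctypes.c_void_p),
                ("ob_size", ctypes.c_int)]      # 4 bytes of tail padding follow on LP64

# Model with the proposed order: the pointer first, then the two
# ints packed together, leaving no gaps.
class VarHeadAfter(ctypes.Structure):
    _fields_ = [("ob_type", ctypes.c_void_p),
                ("ob_refcnt", ctypes.c_int),
                ("ob_size", ctypes.c_int)]

print(ctypes.sizeof(VarHeadBefore))  # 24 on LP64, 12 on ILP32
print(ctypes.sizeof(VarHeadAfter))   # 16 on LP64, 12 on ILP32
```

Eight bytes saved per variable-sized object, before any of the object's own
data is counted.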
For the very large programs I run, the modification saved me 40% in execution
time. This was probably due to the increased number of Python objects that
fit into the L2 cache, so I don't believe that others would necessarily see
as large a difference with this coding change.
Please consider this change for inclusion in the upcoming Python release.
- Bob