64-bit process optimization
Hello,

This is my first post to Python-Dev. As requested by the list manager, I am supplying a brief personal introduction before getting to the topic of this message:

I am a Senior Research Scientist at CAS (http://www.cas.org), a branch of the American Chemical Society (http://www.acs.org). I have used Python as my programming language of choice for the last four years. I typically work with large collections of text documents, performing analyses of text, computer indexing of text, and information retrieval. I use Python as (1) a general-purpose programming language, and (2) a high-level language for invoking high-performance C and C++ modules (including Numeric).

If I examine my programs by data structure, I find that they contain mostly:

1. Very large dictionaries using tuples and strings as keys. Guido's essay on Implementing Graphs (http://www.python.org/doc/essays/graphs.html) was the inspiration for my using dictionaries to create very large directed acyclic graphs.

2. Specialized C++ objects to represent inverted lists.

3. Numeric objects representing vectors and tables of floating-point values.

My primary computing platforms are four dedicated Sun servers, containing 30 processors, 88GB of RAM, and 2TB of DASD. Most of the programs I write require between 1 hour and 27 days to complete. (Obviously, I am an atypical Python user!)

During the last three months, I have been forced to migrate from 32-bit Python processes to 64-bit processes due to the large number of data points I am analyzing within a single program run. It is my experience while migrating from 32-bit to 64-bit code that prompted this message.

It is with some trepidation that, as the subject of my first posting, I am suggesting that Python 2.3 should use a different layout for all Python objects than is defined in Python 2.2.1. Specifically, I have found that changing lines 63-74 of Include/object.h from:

    #ifdef Py_TRACE_REFS
    #define PyObject_HEAD \
        struct _object *_ob_next, *_ob_prev; \
        int ob_refcnt; \
        struct _typeobject *ob_type;
    #define PyObject_HEAD_INIT(type) 0, 0, 1, type,
    #else /* !Py_TRACE_REFS */
    #define PyObject_HEAD \
        int ob_refcnt; \
        struct _typeobject *ob_type;
    #define PyObject_HEAD_INIT(type) 1, type,
    #endif /* !Py_TRACE_REFS */

to:

    #ifdef Py_TRACE_REFS
    #define PyObject_HEAD \
        struct _object *_ob_next, *_ob_prev; \
        struct _typeobject *ob_type; \
        int ob_refcnt;
    #define PyObject_HEAD_INIT(type) 0, 0, type, 1,
    #else /* !Py_TRACE_REFS */
    #define PyObject_HEAD \
        struct _typeobject *ob_type; \
        int ob_refcnt;
    #define PyObject_HEAD_INIT(type) type, 1,
    #endif /* !Py_TRACE_REFS */

significantly improved the performance of my 64-bit processes.

Basically, I have just changed the order of the items in PyObject and PyVarObject to avoid gaps: "int" is a 4-byte, 4-byte-aligned type, while "long" and pointers are 8-byte, 8-byte-aligned types on 64-bit platforms that follow the LP64 model. On ILP32 platforms, such as Intel x86 and AMD CPUs, this should have no effect. On the Sun platform where I live, the change works for both ILP32 and LP64. For the very large programs I run, the modification saved me 40% in execution time. This was probably due to the increased number of Python objects fitting into the L2 cache, so I don't believe that others would necessarily see as large a difference from this change.

Please consider this change for inclusion in the upcoming Python release.

- Bob
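To see the padding effect concretely, here is a minimal standalone C sketch, with illustrative struct and field names rather than CPython's actual definitions, that an LP64 compiler will lay out as described:

    #include <stdio.h>

    /* Mimics the 2.2.1 ordering: a 4-byte int before an 8-byte
     * pointer forces 4 bytes of padding on LP64 platforms. */
    struct head_old {
        int   refcnt;   /* 4 bytes + 4 bytes padding */
        void *type;     /* 8 bytes, 8-byte aligned */
        int   size;     /* 4 bytes + 4 bytes padding */
        void *data;     /* 8 bytes */
    };

    /* The proposed ordering: pointer first, so the two ints pack. */
    struct head_new {
        void *type;     /* 8 bytes */
        int   refcnt;   /* 4 bytes */
        int   size;     /* 4 bytes, no padding needed */
        void *data;     /* 8 bytes */
    };

    int main(void)
    {
        /* Typically prints 32 and 24 on LP64: 8 bytes saved per
         * variable-sized object. On ILP32 both sizes are equal. */
        printf("old: %zu bytes\n", sizeof(struct head_old));
        printf("new: %zu bytes\n", sizeof(struct head_new));
        return 0;
    }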
Without commenting on the merits of your proposal, I can tell you that it'll get lost unless you file a bug report on SourceForge. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
I am suggesting that Python 2.3 should use a different layout for all Python objects than is defined in Python 2.2.1. Specifically, I have found that changing lines 63-74 of Include/object.h from:
    #ifdef Py_TRACE_REFS
    #define PyObject_HEAD \
        struct _object *_ob_next, *_ob_prev; \
        int ob_refcnt; \
        struct _typeobject *ob_type;
    #define PyObject_HEAD_INIT(type) 0, 0, 1, type,
    #else /* !Py_TRACE_REFS */
    #define PyObject_HEAD \
        int ob_refcnt; \
        struct _typeobject *ob_type;
    #define PyObject_HEAD_INIT(type) 1, type,
    #endif /* !Py_TRACE_REFS */
to:
    #ifdef Py_TRACE_REFS
    #define PyObject_HEAD \
        struct _object *_ob_next, *_ob_prev; \
        struct _typeobject *ob_type; \
        int ob_refcnt;
    #define PyObject_HEAD_INIT(type) 0, 0, type, 1,
    #else /* !Py_TRACE_REFS */
    #define PyObject_HEAD \
        struct _typeobject *ob_type; \
        int ob_refcnt;
    #define PyObject_HEAD_INIT(type) type, 1,
    #endif /* !Py_TRACE_REFS */
significantly improved the performance of my 64-bit processes.
Basically, I have just changed the order of the items in PyObject and PyVarObject to avoid gaps: "int" is a 4-byte, 4-byte-aligned type, while "long" and pointers are 8-byte, 8-byte-aligned types on 64-bit platforms that follow the LP64 model. On ILP32 platforms, such as Intel x86 and AMD CPUs, this should have no effect. On the Sun platform where I live, the change works for both ILP32 and LP64. For the very large programs I run, the modification saved me 40% in execution time. This was probably due to the increased number of Python objects fitting into the L2 cache, so I don't believe that others would necessarily see as large a difference from this change.
Interesting! I can see why this makes sense. Strings, lists and tuples all have an int (ob_size) directly following the standard HEAD, and after that something that requires pointer alignment, so these object types would all save 8 bytes! To wit:

    string:  int refcnt [gap], ptr type, int size [gap], long hash, ...
    list:    int refcnt [gap], ptr type, int size [gap], ptr item*
    tuple:   int refcnt [gap], ptr type, int size [gap], ptr item[]

By swapping the first two fields, these gaps would all disappear. The dict object doesn't use ob_size, but starts with an odd number of ints, so the same reasoning shows it would also save 8 bytes. I don't have access to a 64-bit platform to experiment with this.

Unfortunately, one problem is binary compatibility. We try to make it possible to link newer Python versions with extension modules (like Numeric, which you use) compiled for older versions. This requires that the binary layout of objects remain the same, and swapping ob_refcnt and ob_type would cause immediate crashes in this case. It may be that there are other reasons why binary incompatibilities exist between 2.2 and 2.3 that make this impractical, so perhaps I'm being too conservative here.

Another issue is that, at least theoretically, on a 64-bit platform there could be more than 2 billion references to a particular object. E.g. if you have enough memory, the following allocates 3 lists, each containing a billion references to None, causing the reference count of None to go negative:

    A = []
    for i in range(3):
        A.append([None]*1000000000)

So perhaps the refcnt should have been a long in the first place. A similar argument may hold for the length of e.g. strings and lists: one could wish to have a list of more than 2 billion elements, or a string containing more than 2 gigabytes (that much RAM is easily found on the larger 64-bit servers, I believe).

Opinions?

--Guido van Rossum (home page: http://www.python.org/~guido/)
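Guido's overflow scenario can be sketched in C without actually allocating three billion list slots; the values below only illustrate what would happen to a 32-bit ob_refcnt:

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* Simulate a 32-bit refcount at INT_MAX being incremented
         * once more, as one additional reference to None would do.
         * (The unsigned detour avoids undefined signed overflow;
         * the resulting negative value is the failure mode Guido
         * describes.) */
        int narrow = INT_MAX;
        narrow = (int)((unsigned int)narrow + 1u);  /* wraps negative */
        printf("narrow refcnt: %d\n", narrow);

        /* A 64-bit count absorbs the same increment harmlessly. */
        long long wide = (long long)INT_MAX + 1;
        printf("wide refcnt:   %lld\n", wide);
        return 0;
    }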
[Guido]
... So perhaps the refcnt should have been a long in the first place.
We agreed to that years ago, but never bothered to change it. In fact, you used to tell people it *was* a long until I beat that out of you <wink>. Do note that a long is still only 4 bytes on Win64. The type we really want here is what pyport.h calls Py_intptr_t (a Python spelling of the appropriate C99 type; C99 introduced ways to say what you really mean in these cases).
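For readers without pyport.h at hand: on C99 compilers the type Tim means is available as intptr_t from <stdint.h>. A small sketch of why a plain long is not enough (CPython's real pyport.h derives Py_intptr_t through configure checks instead):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* On Win64 (LLP64) long is 4 bytes while intptr_t is 8;
         * on LP64 Unix both are 8. Only the pointer-sized type
         * is wide enough for a refcount on every 64-bit ABI. */
        printf("sizeof(long)     = %zu\n", sizeof(long));
        printf("sizeof(intptr_t) = %zu\n", sizeof(intptr_t));
        return 0;
    }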
A similar argument may hold for the length of e.g. strings and lists: one could wish to have a list of more than 2 billion elements, or a string containing more than 2 gigabytes (that much RAM is easily found on the larger 64-bit servers, I believe).
Opinions?
Those are more naturally addressed by size_t, since strlen and malloc are constrained to that type. I generally write string-slinging code using size_t vars now, and endure the pain of casting back and forth to int to talk with Python's idea of a string size. Whether it's worth the pain to change this stuff depends on whether we think 64-bit boxes are just another passing fad like the Internet <wink>.
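The cast dance Tim mentions looks roughly like this; the helper name is hypothetical, but the size_t-to-int squeeze is the real constraint:

    #include <limits.h>
    #include <string.h>

    /* Hypothetical string-slinging helper: lengths live naturally
     * in size_t (the type strlen and malloc speak), but must be
     * range-checked and cast down to the int that Python 2.x uses
     * for string sizes. */
    static int length_for_python(const char *s)
    {
        size_t n = strlen(s);
        if (n > (size_t)INT_MAX)
            return -1;          /* does not fit in an int */
        return (int)n;
    }

    int main(void)
    {
        return length_for_python("hello") == 5 ? 0 : 1;
    }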
Guido van Rossum <guido@python.org> writes:
So perhaps the refcnt should have been a long in the first place. A similar argument may hold for the length of e.g. strings and lists: one could wish to have a list of more than 2 billion elements, or a string containing more than 2 gigabytes (that much RAM is easily found on the larger 64-bit servers, I believe).
Opinions?
I agree with that position, and Tim's, that those fields should widen to 64 bits on a 64-bit system. I disagree that size_t is suitable for ob_size, since some types put negative values into ob_size. The signed version of that, ssize_t, is not universally available, so we'd need to add Py_ssize_t. Regards, Martin
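One way the fallback Martin describes could be spelled; the macro and the shape of the test are illustrative, not actual configure machinery:

    /* HAVE_SSIZE_T would be set by configure where <sys/types.h>
     * provides ssize_t; elsewhere fall back to a signed type of
     * pointer width, so negative ob_size values remain legal. */
    #ifdef HAVE_SSIZE_T
    #include <sys/types.h>
    typedef ssize_t Py_ssize_t;
    #else
    typedef long Py_ssize_t;   /* wide enough on LP64; Win64 would
                                  need the compiler's 64-bit int */
    #endif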
guido wrote:
Unfortunately, one problem is binary compatibility. We try to make it possible to link newer Python versions with extension modules (like Numeric, which you use) compiled for older versions. This requires that the binary layout of objects remain the same, and swapping ob_refcnt and ob_type would cause immediate crashes in this case.
a compromise could be to make the swap in 2.3, but only on 64-bit platforms.

it's obvious that most people are stuck on 32-bit platforms today, and I think it's safe to say that users on 64-bit platforms might be a bit more willing to build everything they need on their local platform.

another alternative would be to make it a configuration option, with a platform-dependent default.

</F>
Unfortunately, one problem is binary compatibility. We try to make it possible to link newer Python versions with extension modules (like Numeric, which you use) compiled for older versions. This requires that the binary layout of objects remain the same, and swapping ob_refcnt and ob_type would cause immediate crashes in this case.
a compromise could be to make the swap in 2.3, but only on 64-bit platforms.
it's obvious that most people are stuck on 32-bit platforms today, and I think it's safe to say that users on 64-bit platforms might be a bit more willing to build everything they need on their local platform.
another alternative would be to make it a configuration option, with a platform-dependent default.
I like all of that. Maybe it should also be a config option whether refcount, sizes etc. should be 32 or 64 bit quantities on 64 bit platforms. --Guido van Rossum (home page: http://www.python.org/~guido/)
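A sketch of what such a configuration option might look like in Include/object.h; the macro name is hypothetical, and the Py_TRACE_REFS branch is omitted for brevity:

    /* Py_SWAP_HEAD would be set by configure, defaulting on for
     * LP64 builds and off elsewhere, preserving the old binary
     * layout for existing 32-bit extension modules. */
    #ifdef Py_SWAP_HEAD
    #define PyObject_HEAD \
        struct _typeobject *ob_type; \
        int ob_refcnt;
    #define PyObject_HEAD_INIT(type) type, 1,
    #else
    #define PyObject_HEAD \
        int ob_refcnt; \
        struct _typeobject *ob_type;
    #define PyObject_HEAD_INIT(type) 1, type,
    #endif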
--- Guido wrote:
a compromise could be to make the swap in 2.3, but only on 64-bit platforms.
it's obvious that most people are stuck on 32-bit platforms today, and I think it's safe to say that users on 64-bit platforms might be a bit more willing to build everything they need on their local platform.
another alternative would be to make it a configuration option, with a platform-dependent default.
I like all of that. Maybe it should also be a config option whether refcount, sizes etc. should be 32 or 64 bit quantities on 64 bit platforms.
+1 from this 64-bit user.
participants (7)
- Aahz
- Fredrik Lundh
- Guido van Rossum
- ledwith@cas.org
- martin@v.loewis.de
- Scott Gilbert
- Tim Peters