
After thinking more about Py_ssize_t, I'm surprised that we're not hearing about 64-bit users having a couple of major problems.

If I'm understanding what was done for dictionaries, the hash table can grow larger than the range of hash values. Accordingly, I would expect large dictionaries to have an unacceptably large number of collisions. OTOH, we haven't heard a single complaint, so perhaps my understanding is off.

The other area where I expected to hear wailing and gnashing of teeth is users compiling with third-party extensions that haven't been updated to a Py_ssize_t API and still use longs. I would have expected some instability due to the size mismatches in function signatures -- the difference would only show up with giant-sized data structures -- the bigger they are, the harder they fall. OTOH, there have not been any complaints either -- I would have expected someone to submit a patch to pyport.h that allowed a #define to force Py_ssize_t back to a long so that the poster could make a reliable build that included non-updated third-party extensions.

In the absence of a bug report, it's hard to know whether there is a real problem. Have all major third-party extensions adopted Py_ssize_t or is some divine force helping unconverted extensions work with converted Python code? Maybe the datasets just haven't gotten big enough yet.

Raymond
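As a minimal sketch of the escape hatch Raymond imagines someone asking for (the PY_FORCE_SSIZE_T_AS_LONG macro and the my_ prefix are invented for illustration; CPython's real pyport.h has no such switch):

    /* Sketch only: CPython's pyport.h has no such switch.  It shows the
     * kind of build-time escape hatch described above: a #define that
     * forces the ssize type back to long so extensions still written
     * against the old long-based API keep matching argument sizes. */
    #include <stddef.h>
    #include <stdio.h>

    #ifdef PY_FORCE_SSIZE_T_AS_LONG        /* invented name */
    typedef long my_Py_ssize_t;
    #else
    typedef ptrdiff_t my_Py_ssize_t;       /* stand-in for the real typedef */
    #endif

    int main(void)
    {
        printf("sizeof(my_Py_ssize_t) = %zu, sizeof(long) = %zu\n",
               sizeof(my_Py_ssize_t), sizeof(long));
        return 0;
    }

On an LP64 system both branches of the sketch yield an 8-byte type, which is part of why the mismatch rarely bites in practice.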

On 20 Feb, 2007, at 10:47, Raymond Hettinger wrote:
The other area where I expected to hear wailing and gnashing of teeth is users compiling with third-party extensions that haven't been updated to a Py_ssize_t API and still use longs. I would have expected some instability due to the size mismatches in function signatures -- the difference would only show up with giant-sized data structures -- the bigger they are, the harder they fall. OTOH, there have not been any complaints either -- I would have expected someone to submit a patch to pyport.h that allowed a #define to force Py_ssize_t back to a long so that the poster could make a reliable build that included non-updated third-party extensions.
Maybe that's because most sane 64-bit systems use LP64 and therefore don't have any problems with mixing Py_ssize_t and long. AFAIK Windows is the only major platform that doesn't use the LP64 model, and 64-bit Windows isn't used a lot. Ronald
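A small, self-contained check of the data-model point (not code from the thread): under LP64, long is pointer-sized, so an extension that still passes long where the 2.5 API expects Py_ssize_t happens to use the right width; under LLP64, as on 64-bit Windows, it does not.

    /* Illustration of the LP64 vs. LLP64 difference; not from the thread. */
    #include <stdio.h>

    int main(void)
    {
        if (sizeof(long) == sizeof(void *))
            printf("LP64-style model: long is %zu bytes, same as a pointer;"
                   " a stale 'long' in an extension signature still matches"
                   " Py_ssize_t.\n", sizeof(long));
        else
            printf("LLP64-style model: long is %zu bytes but pointers are"
                   " %zu bytes; a stale 'long' no longer matches"
                   " Py_ssize_t.\n", sizeof(long), sizeof(void *));
        return 0;
    }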

Raymond Hettinger wrote:
If I'm understanding what was done for dictionaries, the hash table can grow larger than the range of hash values. Accordingly, I would expect large dictionaries to have an unacceptably large number of collisions. OTOH, we haven't heard a single complaint, so perhaps my understanding is off.
I think this would happen, but users don't have enough memory to notice it. For a dictionary with more than 4G entries, you need 72GiB of memory just for the hash table (8 bytes each for the key, value, and cached hash) -- plus, in that dictionary, you would also need space for the key and value objects themselves. So you only start to see massive collisions once you have that much memory, and very few people have machines with 128+GiB of main memory, hence no complaints yet. But you are right: extending the hash value to be a 64-bit quantity was "forgotten", mainly because it isn't a count of something -- and being a "count of something" was the primary criterion for the 2.5 changes.
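A toy sketch of the collision effect Martin describes, under the assumption of a 32-bit hash indexing a table with more than 2**32 slots (illustration only, not CPython code):

    /* Sketch, not CPython code: a 32-bit hash indexing a table whose mask
     * is wider than 32 bits can reach at most 2**32 distinct slots on the
     * first probe, so initial collisions pile up in a huge table. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t mask = (1ULL << 33) - 1;   /* a table of 2**33 slots */
        uint32_t hash = 0xDEADBEEFu;        /* any 32-bit hash value  */
        uint64_t first_slot = (uint64_t)hash & mask;

        /* first_slot is always < 2**32, whatever the 32-bit hash is */
        printf("first probe slot: %llu (the table's upper half is never"
               " a first probe)\n", (unsigned long long)first_slot);
        return 0;
    }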
The other area where I expected to hear wailing and gnashing of teeth is users compiling with third-party extensions that haven't been updated to a Py_ssize_t API and still use longs. I would have expected some instability due to the size mismatches in function signatures -- the difference would only show up with giant-sized data structures -- the bigger they are, the harder they fall. OTOH, there have not been any complaints either -- I would have expected someone to submit a patch to pyport.h that allowed a #define to force Py_ssize_t back to a long so that the poster could make a reliable build that included non-updated third-party extensions.
On most 64-bit systems, there is also an option to run 32-bit programs (at least on AMD64, Sparc-64, and PPC64 there is). So people are more likely to do that when they run into problems, rather than recompiling the 64-bit Python.
In the absence of a bug report, it's hard to know whether there is a real problem. Have all major third-party extensions adopted Py_ssize_t or is some divine force helping unconverted extensions work with converted Python code?
I know Matthias Klose has fixed all extension modules in the entire Debian source to compile without warnings on 64-bit machines. They may not all work yet, but yes, for all modules in Debian, it has been fixed. Not sure whether Matthias is a divine force, but working for Canonical comes fairly close :-)
Maybe the datasets just haven't gotten big enough yet.
Primarily that. We still have a few years ahead of us to find all the bugs before people start complaining that Python is unstable on 64-bit systems. By the time people actually see problems, hopefully they will all have been resolved. Regards, Martin

On Feb 20, 2007, at 4:47 AM, Raymond Hettinger wrote:
The other area where I expected to hear wailing and gnashing of teeth is users compiling with third-party extensions that haven't been updated to a Py_ssize_t API and still use longs. I would have expected some instability due to the size mismatches in function signatures -- the difference would only show up with giant-sized data structures -- the bigger they are, the harder they fall. OTOH, there have not been any complaints either -- I would have expected someone to submit a patch to pyport.h that allowed a #define to force Py_ssize_t back to a long so that the poster could make a reliable build that included non-updated third-party extensions.
When I did an experimental port of our big embedded app to Python 2.5, that's (almost) exactly what I did. I didn't add the #define to a Python header file, but to our own, and it worked pretty well, IIRC. I never went farther than the experimental phase, though. -Barry

On 2/20/07, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
After thinking more about Py_ssize_t, I'm surprised that we're not hearing about 64-bit users having a couple of major problems.
If I'm understanding what was done for dictionaries, the hash table can grow larger than the range of hash values. Accordingly, I would expect large dictionaries to have an unacceptably large number of collisions. OTOH, we haven't heard a single complaint, so perhaps my understanding is off.
Not until the hash table has 4 billion entries. I believe that would be 96 GB just for the hash table, plus probably at least that much again for that many unique key strings -- not to mention the values (though those needn't be unique). I think the benefits of a 64-bit architecture start above 2 or 3 GB of RAM, so there's quite a bit of expansion space for 64-bit users before they run into this theoretical problem.
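Spelling out the arithmetic behind that 96 GB figure, assuming the 2.5 dict layout of three 8-byte fields per slot (cached hash, key pointer, value pointer) and counting nothing else:

    /* Back-of-the-envelope arithmetic for the 96 GB estimate above;
     * assumes 24 bytes of table per entry and counts nothing else. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t entries = 4000000000ULL;     /* "4 billion entries"      */
        uint64_t bytes_per_slot = 8 + 8 + 8;  /* hash, key ptr, value ptr */
        printf("hash table alone: %.0f GB\n",
               (double)(entries * bytes_per_slot) / 1e9);   /* ~96 GB */
        return 0;
    }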
The other area where I expected to hear wailing and gnashing of teeth is users compiling with third-party extensions that haven't been updated to a Py_ssize_t API and still use longs. I would have expected some instability due to the size mismatches in function signatures -- the difference would only show up with giant-sized data structures -- the bigger they are, the harder they fall. OTOH, there have not been any complaints either -- I would have expected someone to submit a patch to pyport.h that allowed a #define to force Py_ssize_t back to a long so that the poster could make a reliable build that included non-updated third-party extensions.
In the absence of a bug report, it's hard to know whether there is a real problem. Have all major third-party extensions adopted Py_ssize_t or is some divine force helping unconverted extensions work with converted Python code? Maybe the datasets just haven't gotten big enough yet.
My suspicion is that building Python for a 64-bit address space is still a somewhat academic exercise. I know we don't do this at Google (we switch to other languages long before the datasets become so large that we'd need a 64-bit address space for Python). What's your experience at EWT? -- --Guido van Rossum (home page: http://www.python.org/~guido/)

My suspicion is that building Python for a 64-bit address space is still a somewhat academic exercise. I know we don't do this at Google (we switch to other languages long before the datasets become so large that we'd need a 64-bit address space for Python). What's your experience at EWT?
Two people had some difficulty building non-upgraded third-party modules with Py2.5 on 64-bit machines (I think wxPython was one of the problems) but they either gave up or switched machines before we could isolate the problem and say for sure whether Py_ssize_t was the culprit. I had remembered the PEP saying that there might be some issues for non-upgraded third-party modules and have wondered whether others were similarly affected. Raymond

Raymond Hettinger writes:
Two people had some difficulty building non-upgraded third-party modules with Py2.5 on 64-bit machines (I think wxPython was one of the problems)
In my experience wxPython is problematic, period. It's extremely tightly bound to internal details of everything around it. In particular, on every package system where I've (tried to) build it, the first thing the package does is check for its own version of Python and pull it in if it's not there.

Guido van Rossum wrote:
My suspicion is that building Python for a 64-bit address space is still a somewhat academic exercise.
arbitrary 64-bit systems, perhaps. the first Python system I ever built was deployed on an LP64 system back in 1995. it's still running, and is still being maintained. </F>

Fredrik Lundh wrote:
My suspicion is that building Python for a 64-bit address space is still a somewhat academic exercise.
arbitrary 64-bit systems, perhaps. the first Python system I ever built was deployed on an LP64 system back in 1995. it's still running, and is still being maintained.
I can see that you would run a 64-bit version of Python on a system where no 32-bit mode is available, or where certain libraries are only available in 64-bit mode, or where the performance of the 32-bit mode is inadequate. However, I would expect that you "normally" cannot exercise the "large collections" feature of Python 2.5 on such an installation, because you don't have enough memory (in particular if the system was built in 1995). Regards, Martin

[Raymond Hettinger]
After thinking more about Py_ssize_t, I'm surprised that we're not hearing about 64-bit users having a couple of major problems.
If I'm understanding what was done for dictionaries, the hash table can grow larger than the range of hash values. Accordingly, I would expect large dictionaries to have an unacceptably large number of collisions. OTOH, we haven't heard a single complaint, so perhaps my understanding is off. ...
As others have noted, it would require a truly gigantic dict for anyone to notice, and nobody yet has enough RAM to build something that large. I added this comment to dictobject.c for 2.5:

    Theoretical Python 2.5 headache: hash codes are only C "long", but
    sizeof(Py_ssize_t) > sizeof(long) may be possible. In that case, and
    if a dict is genuinely huge, then only the slots directly reachable
    via indexing by a C long can be the first slot in a probe sequence.
    The probe sequence will still eventually reach every slot in the
    table, but the collision rate on initial probes may be much higher
    than this scheme was designed for. Getting a hash code as fat as
    Py_ssize_t is the only real cure. But in practice, this probably
    won't make a lick of difference for many years (at which point
    everyone will have terabytes of RAM on 64-bit boxes).

Ironically, IIRC we /have/ had a complaint in the other direction: someone on SF claims to have a box where sizeof(Py_ssize_t) < sizeof(long), and something else breaks as a result of that. I think I always implicitly assumed sizeof(Py_ssize_t) >= sizeof(long) would hold.

In any case, hash codes are defined to be of type "long" in the C API, so there appears to be no painless way to boost their size on boxes where sizeof(Py_ssize_t) > sizeof(long).
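For readers who haven't looked at dictobject.c, a simplified sketch of the probe recurrence that comment refers to (only the index arithmetic, not the real lookup function): the first slot comes from hash & mask, and later slots fold in the remaining hash bits via perturb, which is why the full sequence eventually reaches every slot even when first probes are confined to the long-reachable ones.

    /* Simplified sketch of the dictobject.c probe recurrence; only the
     * index arithmetic is shown, not the real lookup function. */
    #include <stdio.h>
    #include <stddef.h>

    #define PERTURB_SHIFT 5

    static void show_probes(long hash, size_t mask, int count)
    {
        size_t i = (size_t)hash & mask;       /* first probe: hash & mask */
        size_t perturb = (size_t)hash;
        printf("%zu", i);
        while (--count > 0) {
            i = (i << 2) + i + perturb + 1;   /* 5*i + perturb + 1 */
            perturb >>= PERTURB_SHIFT;        /* fold in more hash bits */
            printf(" -> %zu", i & mask);
        }
        printf("\n");
    }

    int main(void)
    {
        show_probes(0x12345678L, 7, 6);       /* toy table of 8 slots */
        return 0;
    }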

On 2/20/07, Tim Peters <tim.peters@gmail.com> wrote:
In any case, hash codes are defined to be of type "long" in the C API, so there appears no painless way to boost their size on boxes where sizeof(Py_ssize_t) > sizeof(long).
But that would only be on Windows; I believe other vendors have a 64-bit long on 64-bit machines. I suppose the pain wouldn't be any greater than the pain of turning int into Py_ssize_t. Perhaps less so in Py3k since there the issue that PyInt only holds a C long is solved (by equating it to PyLong :). -- --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (9)
- "Martin v. Löwis"
- Barry Warsaw
- Fredrik Lundh
- Guido van Rossum
- Raymond Hettinger
- Raymond Hettinger
- Ronald Oussoren
- Stephen J. Turnbull
- Tim Peters