Re: [Python-Dev] cpython: Using 'long double' to force this structure to be worst case aligned is no
On Tue, 11 Dec 2012 03:05:19 +0100 (CET) gregory.p.smith <python-checkins@python.org> wrote:
Using 'long double' to force this structure to be worst case aligned is no longer required as of Python 2.5+ when the gc_refs changed from an int (4 bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's Undefined-Behavior Sanitizer as on many platforms a long double requires 16-byte alignment but the Python memory allocator only guarantees 8 byte alignment.
So our code would allocate and use these structures with technically improper alignment. Though it didn't matter since the 'dummy' field is never used. This silences that warning.
Spelunking into code history, the double was added in 2001 to force better alignment on some platforms and changed to a long double in 2002 to appease Tru64. That issue should no loner be present since the upgrade from int to Py_ssize_t where the minimum structure size increased to 16 (unless anyone knows of a platform where ssize_t is 4 bytes?)
What?? Every 32-bit platform has a 4 bytes ssize_t (and size_t).
We can probably get rid of the double and this union hack all together today. That is a slightly more invasive change that can be left for later.
How do you suggest to get rid of it? Some platforms still have strict alignment rules and we must enforce that PyObjects (*) are always aligned to the largest possible alignment, since a PyObject-derived struct can hold arbitrary C types. (*) GC-enabled PyObjects, anyway. Others will be naturally aligned thanks to the memory allocator. What's more, I think you shouldn't be doing this kind of change in a bugfix release. It might break compiled C extensions since you are changing some characteristics of object layout (although you would probably only break those extensions which access the GC header, which is probably not many of them). Resource consumption improvements generally go only into the next feature release. Regards Antoine.
On Tue, 11 Dec 2012 08:16:27 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 11 Dec 2012 03:05:19 +0100 (CET) gregory.p.smith <python-checkins@python.org> wrote:
Using 'long double' to force this structure to be worst case aligned is no longer required as of Python 2.5+ when the gc_refs changed from an int (4 bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's Undefined-Behavior Sanitizer as on many platforms a long double requires 16-byte alignment but the Python memory allocator only guarantees 8 byte alignment.
So our code would allocate and use these structures with technically improper alignment. Though it didn't matter since the 'dummy' field is never used. This silences that warning.
Spelunking into code history, the double was added in 2001 to force better alignment on some platforms and changed to a long double in 2002 to appease Tru64. That issue should no loner be present since the upgrade from int to Py_ssize_t where the minimum structure size increased to 16 (unless anyone knows of a platform where ssize_t is 4 bytes?)
What?? Every 32-bit platform has a 4 bytes ssize_t (and size_t).
We can probably get rid of the double and this union hack all together today. That is a slightly more invasive change that can be left for later.
How do you suggest to get rid of it? Some platforms still have strict alignment rules and we must enforce that PyObjects (*) are always aligned to the largest possible alignment, since a PyObject-derived struct can hold arbitrary C types.
Ok, I hadn't seen your proposal. I find it reasonable: “A more correct non-hacky alternative if any alignment issues are still found would be to use a compiler specific alignment declaration on the structure and determine which value to use at configure time.” However, the commit is still problematic, and I think it should be reverted. We can't remove the alignment hack just because it seems to be useless on x86(-64). Regards Antoine.
On Mon, Dec 10, 2012 at 11:21 PM, Antoine Pitrou <solipsis@pitrou.net>wrote:
On Tue, 11 Dec 2012 08:16:27 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 11 Dec 2012 03:05:19 +0100 (CET) gregory.p.smith <python-checkins@python.org> wrote:
Using 'long double' to force this structure to be worst case aligned is no longer required as of Python 2.5+ when the gc_refs changed from an int (4 bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's Undefined-Behavior Sanitizer as on many platforms a long double requires 16-byte alignment but the Python memory allocator only guarantees 8 byte alignment.
So our code would allocate and use these structures with technically improper alignment. Though it didn't matter since the 'dummy' field is never used. This silences that warning.
Spelunking into code history, the double was added in 2001 to force better alignment on some platforms and changed to a long double in 2002 to appease Tru64. That issue should no loner be present since the upgrade from int to Py_ssize_t where the minimum structure size increased to 16 (unless anyone knows of a platform where ssize_t is 4 bytes?)
What?? Every 32-bit platform has a 4 bytes ssize_t (and size_t).
We can probably get rid of the double and this union hack all together today. That is a slightly more invasive change that can be left for later.
How do you suggest to get rid of it? Some platforms still have strict alignment rules and we must enforce that PyObjects (*) are always aligned to the largest possible alignment, since a PyObject-derived struct can hold arbitrary C types.
Ok, I hadn't seen your proposal. I find it reasonable:
“A more correct non-hacky alternative if any alignment issues are still found would be to use a compiler specific alignment declaration on the structure and determine which value to use at configure time.”
However, the commit is still problematic, and I think it should be reverted. We can't remove the alignment hack just because it seems to be useless on x86(-64).
I didn't remove it. I made it match what our memory allocator is already doing. Thanks for reviewing commits in such detail BTW. I do appreciate it. BTW, I didn't notice your replies until now because you didn't include me in the to/cc list on the responses. Please do that if you want a faster response. :) -gps
On Mon, Dec 10, 2012 at 11:16 PM, Antoine Pitrou <solipsis@pitrou.net>wrote:
On Tue, 11 Dec 2012 03:05:19 +0100 (CET) gregory.p.smith <python-checkins@python.org> wrote:
Using 'long double' to force this structure to be worst case aligned is no longer required as of Python 2.5+ when the gc_refs changed from an int (4 bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's Undefined-Behavior Sanitizer as on many platforms a long double requires 16-byte alignment but the Python memory allocator only guarantees 8 byte alignment.
So our code would allocate and use these structures with technically improper alignment. Though it didn't matter since the 'dummy' field is never used. This silences that warning.
Spelunking into code history, the double was added in 2001 to force better alignment on some platforms and changed to a long double in 2002 to appease Tru64. That issue should no loner be present since the upgrade from int to Py_ssize_t where the minimum structure size increased to 16 (unless anyone knows of a platform where ssize_t is 4 bytes?)
What?? Every 32-bit platform has a 4 bytes ssize_t (and size_t).
No they don't. size_t and ssize_t exist in large part because they are often larger than an int or long on 32bit platforms. They are 64-bit on Linux regardless of platform (i think there is a way to force a compile in ancient mode that forces them and the APIs being used to be 32-bit size_t variants but nobody does that).
We can probably get rid of the double and this union hack all together today. That is a slightly more invasive change that can be left for later.
How do you suggest to get rid of it? Some platforms still have strict alignment rules and we must enforce that PyObjects (*) are always aligned to the largest possible alignment, since a PyObject-derived struct can hold arbitrary C types.
(*) GC-enabled PyObjects, anyway. Others will be naturally aligned thanks to the memory allocator.
What's more, I think you shouldn't be doing this kind of change in a bugfix release. It might break compiled C extensions since you are changing some characteristics of object layout (although you would probably only break those extensions which access the GC header, which is probably not many of them). Resource consumption improvements generally go only into the next feature release.
This isn't a resource consumption improvement. It is a compilation correctness change with zero impact on the generated code or ABI compatibility before and after. The structure, as defined, is was flagged as problematic by Clang's undefined behavior sanitizer because it contains a 'long double' which requires 16-byte alignment but Python's own memory allocator was using an 8 byte boundary. So changing the definition of the dummy side of the union makes zero difference to already compiled code as it (a) doesn't change the structure's size and (b) all existing implementations already align these on an 8 byte boundary. -gps
On Thu, Dec 13, 2012 at 11:27 PM, Gregory P. Smith <greg@krypto.org> wrote:
On Mon, Dec 10, 2012 at 11:16 PM, Antoine Pitrou <solipsis@pitrou.net>wrote:
On Tue, 11 Dec 2012 03:05:19 +0100 (CET) gregory.p.smith <python-checkins@python.org> wrote:
Using 'long double' to force this structure to be worst case aligned is no longer required as of Python 2.5+ when the gc_refs changed from an int (4 bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's Undefined-Behavior Sanitizer as on many platforms a long double requires 16-byte alignment but the Python memory allocator only guarantees 8 byte alignment.
So our code would allocate and use these structures with technically improper alignment. Though it didn't matter since the 'dummy' field is never used. This silences that warning.
Spelunking into code history, the double was added in 2001 to force better alignment on some platforms and changed to a long double in 2002 to appease Tru64. That issue should no loner be present since the upgrade from int to Py_ssize_t where the minimum structure size increased to 16 (unless anyone knows of a platform where ssize_t is 4 bytes?)
What?? Every 32-bit platform has a 4 bytes ssize_t (and size_t).
No they don't.
size_t and ssize_t exist in large part because they are often larger than an int or long on 32bit platforms. They are 64-bit on Linux regardless of platform (i think there is a way to force a compile in ancient mode that forces them and the APIs being used to be 32-bit size_t variants but nobody does that).
We can probably get rid of the double and this union hack all together today. That is a slightly more invasive change that can be left for later.
How do you suggest to get rid of it? Some platforms still have strict alignment rules and we must enforce that PyObjects (*) are always aligned to the largest possible alignment, since a PyObject-derived struct can hold arbitrary C types.
(*) GC-enabled PyObjects, anyway. Others will be naturally aligned thanks to the memory allocator.
What's more, I think you shouldn't be doing this kind of change in a bugfix release. It might break compiled C extensions since you are changing some characteristics of object layout (although you would probably only break those extensions which access the GC header, which is probably not many of them). Resource consumption improvements generally go only into the next feature release.
BTW - This change was done on tip only. The comment about this being 'in a bugfix release' is wrong. While I personally believe this is needed in all of the release branches I didn't commit this one there *just in case* there is some weird platform where this change actually makes a difference. I don't believe such a thing exists in 2012, but as there is no way that is worth my time for me to find that out, I didn't put it in a bugfix branch. -gps
This isn't a resource consumption improvement. It is a compilation correctness change with zero impact on the generated code or ABI compatibility before and after. The structure, as defined, is was flagged as problematic by Clang's undefined behavior sanitizer because it contains a 'long double' which requires 16-byte alignment but Python's own memory allocator was using an 8 byte boundary.
So changing the definition of the dummy side of the union makes zero difference to already compiled code as it (a) doesn't change the structure's size and (b) all existing implementations already align these on an 8 byte boundary.
-gps
On 14 Dec, 2012, at 8:27, "Gregory P. Smith" <greg@krypto.org> wrote:
On Mon, Dec 10, 2012 at 11:16 PM, Antoine Pitrou <solipsis@pitrou.net> wrote: On Tue, 11 Dec 2012 03:05:19 +0100 (CET) gregory.p.smith <python-checkins@python.org> wrote:
Using 'long double' to force this structure to be worst case aligned is no longer required as of Python 2.5+ when the gc_refs changed from an int (4 bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's Undefined-Behavior Sanitizer as on many platforms a long double requires 16-byte alignment but the Python memory allocator only guarantees 8 byte alignment.
So our code would allocate and use these structures with technically improper alignment. Though it didn't matter since the 'dummy' field is never used. This silences that warning.
Spelunking into code history, the double was added in 2001 to force better alignment on some platforms and changed to a long double in 2002 to appease Tru64. That issue should no loner be present since the upgrade from int to Py_ssize_t where the minimum structure size increased to 16 (unless anyone knows of a platform where ssize_t is 4 bytes?)
What?? Every 32-bit platform has a 4 bytes ssize_t (and size_t).
No they don't.
size_t and ssize_t exist in large part because they are often larger than an int or long on 32bit platforms. They are 64-bit on Linux regardless of platform (i think there is a way to force a compile in ancient mode that forces them and the APIs being used to be 32-bit size_t variants but nobody does that).
Are you sure about this, what you describe seem to be loff_t (the typedef for file offsets), not size_t (the typedef for sizes of memory blocks). The size of memory blocks is limited to a 32-bit number on 32-bit systems (for the obvious reason). Size_t is 32-bit on at least some platforms: $ cat t.c #include <stdlib.h> #include <stdio.h> int main(void) { printf("sizeof(size_t): %d\n", (int)sizeof(size_t)); return 0; } $ cc -o t t.c -arch i386 $ ./t sizeof(size_t): 4 This session is on an OSX system, but you'll get the same output on a 32-bit linux system with default compiler settings (I've tested this on a SLES10 system).
We can probably get rid of the double and this union hack all together today. That is a slightly more invasive change that can be left for later.
How do you suggest to get rid of it? Some platforms still have strict alignment rules and we must enforce that PyObjects (*) are always aligned to the largest possible alignment, since a PyObject-derived struct can hold arbitrary C types.
(*) GC-enabled PyObjects, anyway. Others will be naturally aligned thanks to the memory allocator.
What's more, I think you shouldn't be doing this kind of change in a bugfix release. It might break compiled C extensions since you are changing some characteristics of object layout (although you would probably only break those extensions which access the GC header, which is probably not many of them). Resource consumption improvements generally go only into the next feature release.
This isn't a resource consumption improvement. It is a compilation correctness change with zero impact on the generated code or ABI compatibility before and after. The structure, as defined, is was flagged as problematic by Clang's undefined behavior sanitizer because it contains a 'long double' which requires 16-byte alignment but Python's own memory allocator was using an 8 byte boundary.
So changing the definition of the dummy side of the union makes zero difference to already compiled code as it (a) doesn't change the structure's size and (b) all existing implementations already align these on an 8 byte boundary.
-gps
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ronaldoussoren%40mac.com
On Fri, Dec 14, 2012 at 12:02 AM, Ronald Oussoren <ronaldoussoren@mac.com>wrote:
On 14 Dec, 2012, at 8:27, "Gregory P. Smith" <greg@krypto.org> wrote:
On Mon, Dec 10, 2012 at 11:16 PM, Antoine Pitrou <solipsis@pitrou.net>wrote:
On Tue, 11 Dec 2012 03:05:19 +0100 (CET) gregory.p.smith <python-checkins@python.org> wrote:
Using 'long double' to force this structure to be worst case aligned is no longer required as of Python 2.5+ when the gc_refs changed from an int (4 bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's Undefined-Behavior Sanitizer as on many platforms a long double requires 16-byte alignment but the Python memory allocator only guarantees 8 byte alignment.
So our code would allocate and use these structures with technically improper alignment. Though it didn't matter since the 'dummy' field is never used. This silences that warning.
Spelunking into code history, the double was added in 2001 to force better alignment on some platforms and changed to a long double in 2002 to appease Tru64. That issue should no loner be present since the upgrade from int to Py_ssize_t where the minimum structure size increased to 16 (unless anyone knows of a platform where ssize_t is 4 bytes?)
What?? Every 32-bit platform has a 4 bytes ssize_t (and size_t).
No they don't.
size_t and ssize_t exist in large part because they are often larger than an int or long on 32bit platforms. They are 64-bit on Linux regardless of platform (i think there is a way to force a compile in ancient mode that forces them and the APIs being used to be 32-bit size_t variants but nobody does that).
Are you sure about this, what you describe seem to be loff_t (the typedef for file offsets), not size_t (the typedef for sizes of memory blocks). The size of memory blocks is limited to a 32-bit number on 32-bit systems (for the obvious reason).
Size_t is 32-bit on at least some platforms:
$ cat t.c #include <stdlib.h> #include <stdio.h>
int main(void) { printf("sizeof(size_t): %d\n", (int)sizeof(size_t)); return 0; }
$ cc -o t t.c -arch i386 $ ./t sizeof(size_t): 4
This session is on an OSX system, but you'll get the same output on a 32-bit linux system with default compiler settings (I've tested this on a SLES10 system).
You are correct. My bad. size_t vs off_t vs my full brain strikes again. Regardless it doesn't change the correctness of my change. Though I'd love it if someone would figure out the cross platform compiler macro based struct alignment incantations to get rid of the need for the union with dummy all together. It wasn't even clear from the 2002 change description if changing double to long double over 10 years ago was actually _fixing_ a bug or was done for no good reason? Our allocator (obmalloc.c) doesn't allocate memory at many platform's required 16 byte long double alignment, it uses 8 byte alignment so the change couldn't have had any impact unless someone compiled --without-pymalloc using a system allocator that guaranteed a larger value. I wouldn't expect that and have seen no indication of that anywhere. The history of the long double and double additions: http://hg.python.org/cpython/rev/7065135f9202 long double - http://hg.python.org/cpython/rev/b4f829941f3d double - http://bugs.python.org/issue467145 why double was added, it fixed the HPUX 11 build. -gps
We can probably get rid of the double and this union hack all together today. That is a slightly more invasive change that can be left for later.
How do you suggest to get rid of it? Some platforms still have strict alignment rules and we must enforce that PyObjects (*) are always aligned to the largest possible alignment, since a PyObject-derived struct can hold arbitrary C types.
(*) GC-enabled PyObjects, anyway. Others will be naturally aligned thanks to the memory allocator.
What's more, I think you shouldn't be doing this kind of change in a bugfix release. It might break compiled C extensions since you are changing some characteristics of object layout (although you would probably only break those extensions which access the GC header, which is probably not many of them). Resource consumption improvements generally go only into the next feature release.
This isn't a resource consumption improvement. It is a compilation correctness change with zero impact on the generated code or ABI compatibility before and after. The structure, as defined, is was flagged as problematic by Clang's undefined behavior sanitizer because it contains a 'long double' which requires 16-byte alignment but Python's own memory allocator was using an 8 byte boundary.
So changing the definition of the dummy side of the union makes zero difference to already compiled code as it (a) doesn't change the structure's size and (b) all existing implementations already align these on an 8 byte boundary.
-gps
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ronaldoussoren%40mac.com
On Fri, Dec 14, 2012 at 7:27 AM, Gregory P. Smith <greg@krypto.org> wrote:
So changing the definition of the dummy side of the union makes zero difference to already compiled code as it (a) doesn't change the structure's size and (b) all existing implementations already align these on an 8 byte boundary.
It looks to me as though the struct size *is* changed, at least on some platforms. Before this commit, I get (OS X 10.6, 64-bit non-debug build): Python 3.4.0a0 (default:b4c383f31881+, Dec 14 2012, 08:30:39) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information.
class A(object): pass ... a = A() import sys sys.getsizeof(a) 64
After it: Python 3.4.0a0 (default:76bc92fb90c1+, Dec 14 2012, 08:33:48) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information.
class A(object): pass ... a = A() import sys sys.getsizeof(a) 56
-- Mark
Yes, see the followup. My comments before were all misinterpreting size_t. Same result on x86_64 linux. On a 64-bit platform the 24 byte structure now occupies 24 bytes instead of being padded to 32. Nice. On a 32-bit platform it should remain 16 bytes. The PyGC_Head union structure is NOT part of the ABI laid out in http://www.python.org/dev/peps/pep-0384/ and is accurately excluded from the .h file when Py_LIMITED_API is defined so changing this in 3.4 should not be a problem. This structure occupies the space gc tracked PyObject* pointers. -gps On Fri, Dec 14, 2012 at 12:42 AM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Fri, Dec 14, 2012 at 7:27 AM, Gregory P. Smith <greg@krypto.org> wrote:
So changing the definition of the dummy side of the union makes zero difference to already compiled code as it (a) doesn't change the structure's size and (b) all existing implementations already align these on an 8 byte boundary.
It looks to me as though the struct size *is* changed, at least on some platforms. Before this commit, I get (OS X 10.6, 64-bit non-debug build):
Python 3.4.0a0 (default:b4c383f31881+, Dec 14 2012, 08:30:39) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information.
class A(object): pass ... a = A() import sys sys.getsizeof(a) 64
After it:
Python 3.4.0a0 (default:76bc92fb90c1+, Dec 14 2012, 08:33:48) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information.
class A(object): pass ... a = A() import sys sys.getsizeof(a) 56
-- Mark
Le Fri, 14 Dec 2012 01:14:04 -0800, "Gregory P. Smith" <greg@krypto.org> a écrit :
Yes, see the followup. My comments before were all misinterpreting size_t.
Same result on x86_64 linux. On a 64-bit platform the 24 byte structure now occupies 24 bytes instead of being padded to 32. Nice. On a 32-bit platform it should remain 16 bytes.
But you are losing the 16-byte alignment that the union was precisely designed to enforce.
The PyGC_Head union structure is NOT part of the ABI laid out in http://www.python.org/dev/peps/pep-0384/ and is accurately excluded from the .h file when Py_LIMITED_API is defined so changing this in 3.4 should not be a problem.
Not an ABI problem in 3.4 indeed (except that it might break platforms with strict alignment requirements). It should be noted that the GC head isn't part of standard atomic types (int, float, str...), so the memory gain will not be very noticeable IMO. Regards Antoine.
participants (4)
-
Antoine Pitrou
-
Gregory P. Smith
-
Mark Dickinson
-
Ronald Oussoren