How can we use 48-bit pointers safely?

Hi,

As far as I know, most amd64 and arm64 systems use only 48-bit address spaces (with exceptions such as [1]).

[1] https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_...

That means there is some room to compact certain data structures. I give two examples below.

My question is: can we use 48-bit pointers safely? It depends on the CPU architecture and the OS memory map. Maybe it could be a configure option that is available only on (amd64, arm64) * (Linux, Windows, macOS)?

# Possible optimizations with 48-bit pointers

## PyASCIIObject

    [snip]
            unsigned int ready:1;
            /* Padding to ensure that PyUnicode_DATA() is always aligned to
               4 bytes (see issue #19537 on m68k). */
            unsigned int :24;
        } state;
        wchar_t *wstr;  /* wchar_t representation (null-terminated) */
    } PyASCIIObject;

Currently, state is 8 bits plus 24 bits of padding. I think we can pack state and wstr into 64 bits.

## PyDictKeyEntry

    typedef struct {
        /* Cached hash code of me_key. */
        Py_hash_t me_hash;
        PyObject *me_key;
        PyObject *me_value; /* This field is only meaningful for combined tables */
    } PyDictKeyEntry;

There is a chance to compact it: use only 32 bits for the hash and 48 bits each for the key and the value. A CompactEntry could be 16 bytes instead of 24 bytes.

Regards,
-- 
INADA Naoki <songofacandy@gmail.com>
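To make the PyDictKeyEntry proposal concrete, here is a minimal sketch of a 16-byte entry holding a 32-bit hash and two 48-bit pointers. It is not CPython code: `CompactEntry`'s layout, `ptr_fits_48`, `compact_pack` and the accessor names are all hypothetical, and the sketch assumes a 64-bit platform whose user-space pointers have bits 48-63 equal to zero.

```c
#include <assert.h>
#include <stdint.h>

#define PTR_BITS 48
#define PTR_MASK ((UINT64_C(1) << PTR_BITS) - 1)

/* Hypothetical layout, not CPython's:
 *   w0: bits 0-47 = key pointer,   bits 48-63 = low 16 bits of the hash
 *   w1: bits 0-47 = value pointer, bits 48-63 = high 16 bits of the hash */
typedef struct {
    uint64_t w0;
    uint64_t w1;
} CompactEntry;

_Static_assert(sizeof(CompactEntry) == 16, "entry must be 16 bytes");

/* Assumption: only pointers whose bits 48-63 are zero can be stored. */
int ptr_fits_48(const void *p)
{
    return ((uint64_t)(uintptr_t)p >> PTR_BITS) == 0;
}

CompactEntry compact_pack(uint32_t hash, const void *key, const void *value)
{
    CompactEntry e;
    assert(ptr_fits_48(key) && ptr_fits_48(value));
    e.w0 = ((uint64_t)(uintptr_t)key & PTR_MASK)
         | ((uint64_t)(hash & 0xFFFFu) << PTR_BITS);
    e.w1 = ((uint64_t)(uintptr_t)value & PTR_MASK)
         | ((uint64_t)(hash >> 16) << PTR_BITS);
    return e;
}

uint32_t compact_hash(CompactEntry e)
{
    /* Reassemble the two 16-bit halves stored above the pointers. */
    return (uint32_t)(e.w0 >> PTR_BITS)
         | ((uint32_t)(e.w1 >> PTR_BITS) << 16);
}

void *compact_key(CompactEntry e)
{
    /* Zero-extension: valid only under the assumption stated above. */
    return (void *)(uintptr_t)(e.w0 & PTR_MASK);
}

void *compact_value(CompactEntry e)
{
    return (void *)(uintptr_t)(e.w1 & PTR_MASK);
}
```

Every read of the key or value now costs a mask, and every read of the hash costs two shifts and an OR, so the memory saving would have to be weighed against that extra work.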

30.03.18 09:28, INADA Naoki wrote:
If size is the main problem, we could use these 8 bits to encode the type and the size of the value for some types, and even encode the value itself in the other 48 bits for some types. For example, 48-bit integers, 0-, 1- and 2-character Unicode strings, ASCII strings of up to 6 characters (or even longer with a base-64 encoding for ASCII identifiers), and singletons like None, True, False, Ellipsis and NotImplemented could be encoded in a 64-bit word without using additional memory. But this would significantly complicate and slow down the code.
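The encoding described above could look roughly like the following sketch. The tag layout (kind in bits 60-63, length in bits 56-59, payload in bits 0-47), the `KIND_*` constants and all function names are assumptions made for illustration; nothing here is an existing CPython API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Word;

/* Hypothetical kinds; kind 0 leaves a plain 48-bit pointer unchanged. */
enum { KIND_POINTER = 0, KIND_SMALL_INT = 1, KIND_SHORT_ASCII = 2 };

#define PAYLOAD_MASK ((UINT64_C(1) << 48) - 1)

Word make_tag(unsigned kind, unsigned len)
{
    /* 8-bit tag in bits 56-63: kind in the high nibble, length in the low. */
    return (Word)(((kind & 0xFu) << 4) | (len & 0xFu)) << 56;
}

unsigned word_kind(Word w) { return (unsigned)(w >> 60); }
unsigned word_len(Word w)  { return (unsigned)(w >> 56) & 0xFu; }

/* Encode an ASCII string of at most 6 characters; returns 0 if it
 * does not fit, in which case *out is left untouched. */
int encode_short_ascii(const char *s, Word *out)
{
    size_t n = strlen(s);
    Word payload = 0;
    if (n > 6)
        return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c > 0x7F)
            return 0;               /* not ASCII */
        payload |= (Word)c << (8 * i);
    }
    *out = make_tag(KIND_SHORT_ASCII, (unsigned)n) | payload;
    return 1;
}

void decode_short_ascii(Word w, char buf[7])
{
    unsigned n = word_len(w);
    if (n > 6)
        n = 6;                      /* defensive clamp for malformed words */
    for (unsigned i = 0; i < n; i++)
        buf[i] = (char)((w >> (8 * i)) & 0xFF);
    buf[n] = '\0';
}

/* Encode a signed integer that fits in 48 bits; returns 0 otherwise. */
int encode_small_int(int64_t v, Word *out)
{
    if (v < -(INT64_C(1) << 47) || v >= (INT64_C(1) << 47))
        return 0;
    *out = make_tag(KIND_SMALL_INT, 0) | ((Word)v & PAYLOAD_MASK);
    return 1;
}

int64_t decode_small_int(Word w)
{
    const Word sign = UINT64_C(1) << 47;
    Word low = w & PAYLOAD_MASK;
    /* Sign-extend from bit 47 without relying on implementation-defined casts. */
    return (int64_t)(low ^ sign) - (int64_t)sign;
}
```

Every consumer of such a word has to branch on `word_kind()` before it can dereference anything, which is one source of the extra complexity and slowdown mentioned above.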

On Fri, 30 Mar 2018 15:28:47 +0900 INADA Naoki <songofacandy@gmail.com> wrote:
As that paper shows, effective virtual address width tends to increase over time to accommodate growing needs. Bigger systems like IBM POWER systems may already have larger virtual address spaces. So we can't safely assume that bits 48-63 are available for us.

Another issue is the cost of the associated bit-twiddling. It will all depend on how often it needs to be done. Note that pointers can be "negative", i.e. some of them will have all 1s in their upper bits, and you need to reproduce that when reconstituting the original pointer.

A safer alternative is to use the *lower* bits of pointers. The bottom 3 bits are always available for storing ancillary information, since typically all heap-allocated data will be at least 8-byte aligned (probably 16-byte aligned in 64-bit processes). However, you also get fewer bits :-)

Regards

Antoine.
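Both points above can be sketched in a few lines of C. `sign_extend_48` shows how the upper bits of a "negative" address would be rebuilt from a stored 48-bit value, and the `tag_pointer` family shows the low-bit alternative. All names are hypothetical, and the low-bit helpers assume the pointer is at least 8-byte aligned, which they check with an assertion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Rebuild a 64-bit address from its low 48 bits by copying bit 47
 * into bits 48-63 ("negative" pointers get all 1s in the upper bits). */
uint64_t sign_extend_48(uint64_t low48)
{
    const uint64_t sign = UINT64_C(1) << 47;
    low48 &= (UINT64_C(1) << 48) - 1;
    return (low48 ^ sign) - sign;
}

/* Low-bit tagging: 3 spare bits, assuming 8-byte alignment. */
#define LOW_TAG_MASK ((uintptr_t)0x7)

uintptr_t tag_pointer(void *p, unsigned tag)
{
    uintptr_t u = (uintptr_t)p;
    assert((u & LOW_TAG_MASK) == 0);   /* pointer must be 8-byte aligned */
    assert(tag <= LOW_TAG_MASK);       /* only 3 bits are available */
    return u | (uintptr_t)tag;
}

void *untag_pointer(uintptr_t tagged)
{
    return (void *)(tagged & ~LOW_TAG_MASK);
}

unsigned pointer_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & LOW_TAG_MASK);
}
```

The low-bit variant does not depend on the width of the virtual address space at all, only on the allocator's alignment guarantee, which is why it is the more portable of the two.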

On Fri, 30 Mar 2018 15:28:47 +0900 INADA Naoki <songofacandy@gmail.com> wrote:
We could also simply nuke wstr. I frankly don't think it's very important. It's only used when calling system functions taking a wchar_t argument, as an « optimization ». I'd be willing to guess that modern workloads aren't bottlenecked by the cost overhead of those system functions... Of course, the question is whether all this matters. Is it important to save 8 bytes on each unicode object? Only testing would tell. Regards Antoine.

30.03.18 16:54, Antoine Pitrou wrote:
This is possible only after removing all of the Py_UNICODE-related C API. It has been deprecated since 3.3, but only in the documentation, and should stay until the EOL of 2.7. Only in 3.7 did most of these functions start emitting deprecation warnings at compile time (GCC only). [1] It would be good to emit them with other compilers too. In future versions we could make them emit user-visible runtime deprecation warnings, and finally make them always fail after 2020. [1] https://bugs.python.org/issue19569

On Fri, 30 Mar 2018 21:40:21 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
It should be possible with MSVC: https://stackoverflow.com/a/295229/10194 and clang as well: http://releases.llvm.org/3.9.1/tools/clang/docs/AttributeReference.html#depr... Regards Antoine.
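A compile-time deprecation marker that covers GCC, clang and MSVC could be sketched as follows. `MY_DEPRECATED`, `old_length` and `new_length` are made-up names for illustration and are not CPython's actual macro or functions; the sketch assumes a compiler version recent enough to accept a message argument.

```c
#include <assert.h>

/* Hypothetical portable deprecation macro. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(_MSC_VER)
#  define MY_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#  define MY_DEPRECATED(msg)   /* unknown compiler: no warning */
#endif

/* Replacement function that callers should migrate to. */
int new_length(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Any call to old_length() should now produce a compile-time warning. */
MY_DEPRECATED("use new_length() instead")
int old_length(const char *s);

int old_length(const char *s)
{
    return new_length(s);
}
```

Defining the deprecated function does not itself warn; only call sites do, so existing extension code keeps compiling while its authors are nudged toward the replacement.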

Some of the APIs are documented as "Deprecated since version 3.3, will be removed in version 4.0:", e.g. https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AS_UNICODE So we will remove them (and wstr) in Python 4.0.

Of course, the question is whether all this matters. Is it important to save 8 bytes on each unicode object? Only testing would tell.
Last year, I tried to profile the memory usage of a web application at my company. https://gist.github.com/methane/ce723adb9a4d32d32dc7525b738d3c31#investigati... Without the -OO option, str is the biggest memory consumer and its average size is about 109 bytes. (Note: SQLAlchemy uses docstrings very heavily.) With the -OO option, str is the third-biggest memory consumer, and its average size was about 73 bytes. So I think 8 bytes per string object is not negligible. But, of course, it varies between applications and libraries. -- INADA Naoki <songofacandy@gmail.com>
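As a rough back-of-the-envelope check, assuming the averages above are total per-object sizes, saving 8 bytes would be about 7% of a 109-byte string and about 11% of a 73-byte string. The helper name `saving_percent` is made up for this illustration.

```c
#include <assert.h>

/* Percentage of an object's average size that a fixed saving represents. */
double saving_percent(double avg_size_bytes, double saved_bytes)
{
    assert(avg_size_bytes > 0.0);
    return 100.0 * saved_bytes / avg_size_bytes;
}
```

Calling `saving_percent(109.0, 8.0)` and `saving_percent(73.0, 8.0)` reproduces the two figures for the profiled application; other workloads would need their own measurements.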

participants (3): Antoine Pitrou, INADA Naoki, Serhiy Storchaka