PEP 539 (second round): A new C API for Thread-Local Storage in CPython
Hi python-dev, Since Erik started the PEP 539 thread on python-ideas, I've collected feedbacks in the discussion and pull-request, and tried improvement for the API specification and reference implementation, as the result I think resolved issues which pointed out by feedbacks. Well, it's probably not finish yet, there is one which bothers me. I'm not sure the CPython startup sequence design (PEP 432 Restructuring the CPython startup sequence, it might be a conflict with the draft specification [1]), please let me know what you think about the new API specification. In any case, I start a new thread of the updated draft. Summary of technical changes: - Two functions which correspond PyThread_delete_key_value and PyThread_ReInitTLS are omitted, because these are for the removed CPython's own TLS implementation. - Add an internal field "_is_initialized" and a constant default value "Py_tss_NEEDS_INIT" to Py_tss_t type to indicate the thread key's initialization state independent of the underlying implementation. - Then, define behaviors for functions which uses the "_is_initialized" field. - Change the key argument to pass a pointer, allow to use in the limited API that does not know the key type size. - Add three functions which dynamic (de-)allocation and the key's initialization state checking, because handle opaque struct. - Change platform support in the case of enabling thread support, all platforms are required at least one of native thread implementations. Also the draft has been added explanations and rationales for above changes, moreover, additional annotations for information. Regards, Masayuki [1]: The specifications of thread key creation and deletion refer how to use in the API clients (Modules/_tracemalloc.c and Python/pystate.c). One of those, Py_Initialize function that is a caller's origin of PyThread_tss_create is the flow "no-op when called for a second time" until CPython 3.6 [2]. However, an internal function _Py_InitializeCore that has been added newly in the current master branch is the flow "fatal error when called for a second time" [3]. [2]: https://docs.python.org/3.6/c-api/init.html#c.Py_Initialize [3]: https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L508 First round for PEP 539: https://mail.python.org/pipermail/python-ideas/2016-December/043983.html Discussion for the issue: https://bugs.python.org/issue25658 HTML version for PEP 539 draft: https://www.python.org/dev/peps/pep-0539/ Diff between first round and second round: https://gist.github.com/ma8ma/624f9e4435ebdb26230130b11ce12d20/revisions And the pull-request for reference implementation (work in progress): https://github.com/python/cpython/pull/1362 ======================================== PEP: 539 Title: A New C-API for Thread-Local Storage in CPython Version: $Revision$ Last-Modified: $Date$ Author: Erik M. Bray, Masayuki Yamamoto BDFL-Delegate: Nick Coghlan Status: Draft Type: Informational Content-Type: text/x-rst Created: 20-Dec-2016 Post-History: 16-Dec-2016 Abstract ======== The proposal is to add a new Thread Local Storage (TLS) API to CPython which would supersede use of the existing TLS API within the CPython interpreter, while deprecating the existing API. The new API is named "Thread Specific Storage (TSS) API" (see `Rationale for Proposed Solution`_ for the origin of the name). Because the existing TLS API is only used internally (it is not mentioned in the documentation, and the header that defines it, ``pythread.h``, is not included in ``Python.h`` either directly or indirectly), this proposal probably only affects CPython, but might also affect other interpreter implementations (PyPy?) that implement parts of the CPython API. This is motivated primarily by the fact that the old API uses ``int`` to represent TLS keys across all platforms, which is neither POSIX-compliant, nor portable in any practical sense [1]_. .. note:: Throughout this document the acronym "TLS" refers to Thread Local Storage and should not be confused with "Transportation Layer Security" protocols. Specification ============= The current API for TLS used inside the CPython interpreter consists of 6 functions:: PyAPI_FUNC(int) PyThread_create_key(void) PyAPI_FUNC(void) PyThread_delete_key(int key) PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value) PyAPI_FUNC(void *) PyThread_get_key_value(int key) PyAPI_FUNC(void) PyThread_delete_key_value(int key) PyAPI_FUNC(void) PyThread_ReInitTLS(void) These would be superseded by a new set of analogous functions:: PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key) PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t *key) PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t *key, void *value) PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t *key) The specification also adds a few new features: * A new type ``Py_tss_t``--an opaque type the definition of which may depend on the underlying TLS implementation. It is defined:: typedef struct { bool _is_initialized; NATIVE_TSS_KEY_T _key; } Py_tss_t; where ``NATIVE_TSS_KEY_T`` is a macro whose value depends on the underlying native TLS implementation (e.g. ``pthread_key_t``). * A constant default value for ``Py_tss_t`` variables, ``Py_tss_NEEDS_INIT``. * Three new functions:: PyAPI_FUNC(Py_tss_t *) PyThread_tss_alloc(void) PyAPI_FUNC(void) PyThread_tss_free(Py_tss_t *key) PyAPI_FUNC(bool) PyThread_tss_is_created(Py_tss_t *key) The first two are needed for dynamic (de-)allocation of a ``Py_tss_t``, particularly in extension modules built with ``Py_LIMITED_API``, where static allocation of this type is not possible due to its implementation being opaque at build time. A value returned by ``PyThread_tss_alloc`` is the same state as initialized by ``Py_tss_NEEDS_INIT``, or ``NULL`` for dynamic allocation failure. The behavior of ``PyThread_tss_free`` involves calling ``PyThread_tss_delete`` preventively, or is no-op if the value pointed to by the ``key`` argument is ``NULL``. ``PyThread_tss_is_created`` returns ``true`` if the given ``Py_tss_t`` has been initialized (i.e. by ``PyThread_tss_create``). The new TSS API does not provide functions which correspond to ``PyThread_delete_key_value`` and ``PyThread_ReInitTLS``, because these functions are for the removed CPython's own TLS implementation, that is the existing API behavior has become as follows: ``PyThread_delete_key_value(key)`` is equal to ``PyThread_set_key_value(key, NULL)``, and ``PyThread_ReInitTLS()`` is no-op [8]_. The new ``PyThread_tss_`` functions are almost exactly analogous to their original counterparts with a few minor differences: Whereas ``PyThread_create_key`` takes no arguments and returns a TLS key as an ``int``, ``PyThread_tss_create`` takes a ``Py_tss_t*`` as an argument and returns an ``int`` status code. The behavior of ``PyThread_tss_create`` is undefined if the value pointed to by the ``key`` argument is not initialized by ``Py_tss_NEEDS_INIT``. The returned status code is zero on success and non-zero on failure. The meanings of non-zero status codes are not otherwise defined by this specification. Similarly the other ``PyThread_tss_`` functions are passed a ``Py_tss_t*`` whereas previously the key was passed by value. This change is necessary, as being an opaque type, the ``Py_tss_t`` type could hypothetically be almost any size. This is especially necessary for extension modules built with ``Py_LIMITED_API``, where the size of the type is not known. Except for ``PyThread_tss_free``, the behaviors of ``PyThread_tss_`` are undefined if the value pointed to by the ``key`` argument is ``NULL``. Moreover, because of the use of ``Py_tss_t`` instead of ``int``, there are additional behaviors which the existing API design would be carried over into new API: The TSS key creation and deletion are parts of "do-if-needed" flow and these features are silently skipped if already done--Calling ``PyThread_tss_create`` with an initialized key does nothing and returns success soon. This is also the case of calling ``PyThread_tss_delete`` with an uninitialized key. The behavior of ``PyThread_tss_delete`` is defined to change the key's initialization state to "uninitialized" in order to restart the CPython interpreter without terminating the process (e.g. embedding Python in an application) [12]_. The old ``PyThread_*_key*`` functions will be marked as deprecated in the documentation, but will not generate runtime deprecation warnings. Additionally, on platforms where ``sizeof(pthread_key_t) != sizeof(int)``, ``PyThread_create_key`` will return immediately with a failure status, and the other TLS functions will all be no-ops on such platforms. Comparison of API Specification ------------------------------- ================= ============================= ============================= API Thread Local Storage (TLS) Thread Specific Storage (TSS) ================= ============================= ============================= Version Existing New Key Type ``int`` ``Py_tss_t`` (opaque type) Handle Native Key cast to ``int`` conceal into internal field Function Argument ``int`` ``Py_tss_t *`` Features - create key - create key - delete key - delete key - set value - set value - get value - get value - delete value - (set ``NULL`` instead) [8]_ - reinitialize keys (for - (unnecessary) [8]_ after fork) - dynamically (de-)allocate key - check key's initialization state Default Value (``-1`` as key creation ``Py_tss_NEEDS_INIT`` failure) Requirement native thread native thread (since CPython 3.7 [9]_) Restriction Not support platform where Unable to statically allocate native TLS key is defined in key when ``Py_LIMITED_API`` a way that cannot be safely is defined. cast to ``int``. ================= ============================= ============================= Example ------- With the proposed changes, a TSS key is initialized like:: static Py_tss_t tss_key = Py_tss_NEEDS_INIT; if (PyThread_tss_create(&tss_key)) { /* ... handle key creation failure ... */ } The initialization state of the key can then be checked like:: assert(PyThread_tss_is_created(&tss_key)); The rest of the API is used analogously to the old API:: int the_value = 1; if (PyThread_tss_get(&tss_key) == NULL) { PyThread_tss_set(&tss_key, (void *)&the_value); assert(PyThread_tss_get(&tss_key) != NULL); } /* ... once done with the key ... */ PyThread_tss_delete(&tss_key); assert(!PyThread_tss_is_created(&tss_key)); When ``Py_LIMITED_API`` is defined, a TSS key must be dynamically allocated:: static Py_tss_t *ptr_key = PyThread_tss_alloc(); if (ptr_key == NULL) { /* ... handle key allocation failure ... */ } assert(!PyThread_tss_is_created(ptr_key)); /* ... once done with the key ... */ PyThread_tss_free(ptr_key); ptr_key = NULL; Platform Support Changes ======================== A new "Native Thread Implementation" section will be added to PEP 11 that states: * As of CPython 3.7, in the case of enabling thread support, all platforms are required to provide at least one of native thread implementation (as of pthreads or Windows) to implement TSS API. Any TSS API problems that occur in the implementation without native thread will be closed as "won't fix". Motivation ========== The primary problem at issue here is the type of the keys (``int``) used for TLS values, as defined by the original PyThread TLS API. The original TLS API was added to Python by GvR back in 1997, and at the time the key used to represent a TLS value was an ``int``, and so it has been to the time of writing. This used CPython's own TLS implementation, but the current generation of which hasn't been used, largely unchanged, in Python/thread.c. Support for implementation of the API on top of native thread implementations (pthreads and Windows) was added much later, and the own implementation has been no longer necessary and removed [9]_. The problem with the choice of ``int`` to represent a TLS key, is that while it was fine for CPython's own TLS implementation, and happens to be compatible with Windows (which uses ``DWORD`` for the analogous data), it is not compatible with the POSIX standard for the pthreads API, which defines ``pthread_key_t`` as an opaque type not further defined by the standard (as with ``Py_tss_t`` described above) [14]_. This leaves it up to the underlying implementation how a ``pthread_key_t`` value is used to look up thread-specific data. This has not generally been a problem for Python's API, as it just happens that on Linux ``pthread_key_t`` is defined as an ``unsigned int``, and so is fully compatible with Python's TLS API--``pthread_key_t``'s created by ``pthread_create_key`` can be freely cast to ``int`` and back (well, not exactly, even this has some limitations as pointed out by issue #22206). However, as issue #25658 points out, there are at least some platforms (namely Cygwin, CloudABI, but likely others as well) which have otherwise modern and POSIX-compliant pthreads implementations, but are not compatible with Python's API because their ``pthread_key_t`` is defined in a way that cannot be safely cast to ``int``. In fact, the possibility of running into this problem was raised by MvL at the time pthreads TLS was added [2]_. It could be argued that PEP-11 makes specific requirements for supporting a new, not otherwise officially-support platform (such as CloudABI), and that the status of Cygwin support is currently dubious. However, this creates a very high barrier to supporting platforms that are otherwise Linux- and/or POSIX-compatible and where CPython might otherwise "just work" except for this one hurdle. CPython itself imposes this implementation barrier by way of an API that is not compatible with POSIX (and in fact makes invalid assumptions about pthreads). Rationale for Proposed Solution =============================== The use of an opaque type (``Py_tss_t``) to key TLS values allows the API to be compatible, with all present (POSIX and Windows) and future (C11?) native TLS implementations supported by CPython, as it allows the definition of ``Py_tss_t`` to depend on the underlying implementation. Since the existing TLS API has been available in *the limited API* [13]_ for some platforms (e.g. Linux), CPython makes an effort to provide the new TSS API at that level likewise. Note, however, that ``Py_tss_t`` definition becomes to be an opaque struct when ``Py_LIMITED_API`` is defined, because exposing ``NATIVE_TSS_KEY_T`` as part of the limited API would prevent us from switching native thread implementation without rebuilding extension module. A new API must be introduced, rather than changing the function signatures of the current API, in order to maintain backwards compatibility. The new API also more clearly groups together these related functions under a single name prefix, ``PyThread_tss_``. The "tss" in the name stands for "thread-specific storage", and was influenced by the naming and design of the "tss" API that is part of the C11 threads API [15]_. However, this is in no way meant to imply compatibility with or support for the C11 threads API, or signal any future intention of supporting C11--it's just the influence for the naming and design. The inclusion of the special default value ``Py_tss_NEEDS_INIT`` is required by the fact that not all native TLS implementations define a sentinel value for uninitialized TLS keys. For example, on Windows a TLS key is represented by a ``DWORD`` (``unsigned int``) and its value must be treated as opaque [3]_. So there is no unsigned integer value that can be safely used to represent an uninitialized TLS key on Windows. Likewise, POSIX does not specify a sentinel for an uninitialized ``pthread_key_t``, instead relying on the ``pthread_once`` interface to ensure that a given TLS key is initialized only once per-process. Therefore, the ``Py_tss_t`` type contains an explicit ``._is_initialized`` that can indicate the key's initialization state independent of the underlying implementation. Changing ``PyThread_create_key`` to immediately return a failure status on systems using pthreads where ``sizeof(int) != sizeof(pthread_key_t)`` is intended as a sanity check: Currently, ``PyThread_create_key`` may report initial success on such systems, but attempts to use the returned key are likely to fail. Although in practice this failure occurs earlier in the interpreter initialization, it's better to fail immediately at the source of problem (``PyThread_create_key``) rather than sometime later when use of an invalid key is attempted. In other words, this indicates clearly that the old API is not supported on platforms where it cannot be used reliably, and that no effort will be made to add such support. Rejected Ideas ============== * Do nothing: The status quo is fine because it works on Linux, and platforms wishing to be supported by CPython should follow the requirements of PEP-11. As explained above, while this would be a fair argument if CPython were being to asked to make changes to support particular quirks or features of a specific platform, in this case it is quirk of CPython that prevents it from being used to its full potential on otherwise POSIX-compliant platforms. The fact that the current implementation happens to work on Linux is a happy accident, and there's no guarantee that this will never change. * Affected platforms should just configure Python ``--without-threads``: This is a possible temporary workaround to the issue, but only that. Python should not be hobbled on affected platforms despite them being otherwise perfectly capable of running multi-threaded Python. * Affected platforms should use CPython's own TLS implementation instead of native TLS implementation: This is a more acceptable alternative to the previous idea, and in fact there had been a patch to do just that [4]_. However, the own implementation being "slower and clunkier" in general than native implementations still needlessly hobbles performance on affected platforms. At least one other module (``tracemalloc``) is also broken if Python is built without native implementation. And this idea cannot be adopted because the own implementation was removed. * Keep the existing API, but work around the issue by providing a mapping from ``pthread_key_t`` values to ``int`` values. A couple attempts were made at this ([5]_, [6]_), but this only injects needless complexity and overhead into performance-critical code on platforms that are not currently affected by this issue (such as Linux). Even if use of this workaround were made conditional on platform compatibility, it introduces platform-specific code to maintain, and still has the problem of the previous rejected ideas of needlessly hobbling performance on affected platforms. Implementation ============== An initial version of a patch [7]_ is available on the bug tracker for this issue. Since the migration to Github, it's being developed in the ``pep539-tss-api`` feature branch [10]_ in Masayuki Yamamoto's fork of the CPython repository on Github. A work-in-progress PR is available at [11]_. This reference implementation covers not only the enhancement request in API features, but also the client codes fix needed to replace the existing TLS API with the new TSS API. Copyright ========= This document has been placed in the public domain. References and Footnotes ======================== .. [1] http://bugs.python.org/issue25658 .. [2] https://bugs.python.org/msg116292 .. [3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).a... .. [4] http://bugs.python.org/file45548/configure-pthread_key_t.patch .. [5] http://bugs.python.org/file44269/issue25658-1.patch .. [6] http://bugs.python.org/file44303/key-constant-time.diff .. [7] http://bugs.python.org/file46379/pythread-tss-3.patch .. [8] https://bugs.python.org/msg298342 .. [9] http://bugs.python.org/issue30832 .. [10] https://github.com/python/cpython/compare/master...ma8ma:pep539-tss-api .. [11] https://github.com/python/cpython/pull/1362 .. [12] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx .. [13] It is also called as "stable ABI" (https://www.python.org/dev/peps/pep-0384/) .. [14] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.... .. [15] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=404
On 31 August 2017 at 18:16, Masayuki YAMAMOTO <ma3yuki.8mamo10@gmail.com> wrote:
Hi python-dev,
Since Erik started the PEP 539 thread on python-ideas, I've collected feedbacks in the discussion and pull-request, and tried improvement for the API specification and reference implementation, as the result I think resolved issues which pointed out by feedbacks.
Well, it's probably not finish yet, there is one which bothers me. I'm not sure the CPython startup sequence design (PEP 432 Restructuring the CPython startup sequence, it might be a conflict with the draft specification [1]),
I think that's just a bug in the startup refactoring - we don't currently test the "Py_Initialize()/Py_Initialize()/Py_Finalize()" sequence anywhere, and I'd missed that it's explicitly documented as being permitted. I'd still want to keep the "multiple calls without an intervening finalize are prohibited" behaviour for the new more granular APIs (since it's simpler and easier to explain if it just always fails rather than attempting to check that the previous initialization supplied the same config settings), but the documented Py_Initialize() behaviour can be reinstated by restoring the early return in _Py_InitializeEx_Private. It's also worth noting that we *do* test repeated Py_Initialize()/Py_Finalize() cycles - Py_Finalize() explicitly clears the internal flags that would otherwise lead to a fatal error in _Py_InitializeCore. As far as the PEP itself goes, this version looks good to me, so if there aren't any other significant comments between now and then, I'm likely to accept it at the core development sprint next week. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
2017-08-31 18:51 GMT+09:00 Nick Coghlan <ncoghlan@gmail.com>:
[...] I think that's just a bug in the startup refactoring - we don't currently test the "Py_Initialize()/Py_Initialize()/Py_Finalize()" sequence anywhere, and I'd missed that it's explicitly documented as being permitted. I'd still want to keep the "multiple calls without an intervening finalize are prohibited" behaviour for the new more granular APIs (since it's simpler and easier to explain if it just always fails rather than attempting to check that the previous initialization supplied the same config settings), but the documented Py_Initialize() behaviour can be reinstated by restoring the early return in _Py_InitializeEx_Private.
I get the point of difference between document and current code, I don't mind :) As far as the PEP itself goes, this version looks good to me, so if
there aren't any other significant comments between now and then, I'm likely to accept it at the core development sprint next week.
I took time, but I'm happy to near to finish. Thanks you for helping! Masayuki
On Thu, Aug 31, 2017 at 10:16 AM, Masayuki YAMAMOTO <ma3yuki.8mamo10@gmail.com> wrote:
Hi python-dev,
Since Erik started the PEP 539 thread on python-ideas, I've collected feedbacks in the discussion and pull-request, and tried improvement for the API specification and reference implementation, as the result I think resolved issues which pointed out by feedbacks.
Thanks Masayuki for taking the lead on updating the PEP. I've been off the ball with it for a while. In particular the table summarizing the changes is nice. I just have a few minor changes to suggest (typos and such) that I'll make in a pull request.
Because I couldn't go to there by myself, I'm really grateful to you and your first draft. Masayuki 2017-08-31 19:40 GMT+09:00 Erik Bray <erik.m.bray@gmail.com>:
Hi python-dev,
Since Erik started the PEP 539 thread on python-ideas, I've collected feedbacks in the discussion and pull-request, and tried improvement for
On Thu, Aug 31, 2017 at 10:16 AM, Masayuki YAMAMOTO <ma3yuki.8mamo10@gmail.com> wrote: the
API specification and reference implementation, as the result I think resolved issues which pointed out by feedbacks.
Thanks Masayuki for taking the lead on updating the PEP. I've been off the ball with it for a while. In particular the table summarizing the changes is nice. I just have a few minor changes to suggest (typos and such) that I'll make in a pull request.
2017-08-31 19:40 GMT+09:00 Erik Bray <erik.m.bray@gmail.com>:
[...] the changes is nice. I just have a few minor changes to suggest (typos and such) that I'll make in a pull request.
Steve Dower points out which avoids the use of bool in header declarations [*]. I'd change PyThread_tss_is_created declaration's bool to replace with int. Erik, would you change the specification at present PEP PR? (I comment also the PR) Thanks, Masayuki [*]: https://github.com/python/cpython/pull/1362#pullrequestreview-59884901
participants (3)
-
Erik Bray
-
Masayuki YAMAMOTO
-
Nick Coghlan