New PyThread_tss_ C-API for CPython

Greetings all, I wanted to bring attention to an issue that's been languishing on the bug tracker since last year, which I think would best be addressed by changes to CPython's C-API. The original issue is at http://bugs.python.org/issue25658, but I have made an effort below in a sort of proto-PEP to summarize the problem and the proposed solution. I haven't written this up in the proper PEP format because I want to see if the idea has some broader support first, and it's also not clear to me whether C-API changes (especially to undocumented APIs) even require their own PEP. Abstract ======== The proposal is to add a new Thread Local Storage (TLS) API to CPython which would supersede use of the existing TLS API within the CPython interpreter, while deprecating the existing API. Because the existing TLS API is only used internally (it is not mentioned in the documentation, and the header that defines it, pythread.h, is not included in Python.h either directly or indirectly), this proposal probably only affects CPython, but might also affect other interpreter implementations (PyPy?) that implement parts of the CPython API. Specification ============= The current API for TLS used inside the CPython interpreter consists of 5 functions: PyAPI_FUNC(int) PyThread_create_key(void) PyAPI_FUNC(void) PyThread_delete_key(int key) PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value) PyAPI_FUNC(void *) PyThread_get_key_value(int key) PyAPI_FUNC(void) PyThread_delete_key_value(int key) These would be superseded with a new set of analogous functions: PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key) PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t key) PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t key, void *value) PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t key) PyAPI_FUNC(void) PyThread_tss_delete_value(Py_tss_t key) and includes the definition of a new type Py_tss_t--any opaque type the specification of which is not given here, and may depend on the underlying TLS implementation. The new PyThread_tss_ functions are almost exactly analogous to their original counterparts with a minor difference: Whereas PyThread_create_key takes no arguments and returns a TLS key as an int, PyThread_tss_create takes a Py_tss_t* as an argument, and returns a Py_tss_t by pointer--the int return value is a status, returning zero on success and non-zero on failure. Further, the old PyThread_*_key* functions will be marked as deprecated. Additionally, the pthread implementations of the old PyThread_*_key* functions will either fail or be no-ops on platforms where sizeof(pythead_t) != sizeof(int). Motivation ========== The primary problem at issue here is the type of the keys (int) used for TLS values, as defined by the original PyThread TLS API. The original TLS API was added to Python by GvR back in 1997, and at the time the key used to represent a TLS value was an int, and so it has been to this day. This used CPython's own TLS implementation, the current generation of which can still be found, largely unchanged, in Python/thread.c. Support for implementation of the API on top of native thread implementations (NT and pthreads) was added much later, and the built-in implementation may still be used on other platforms. The problem with the choice of int to represent a TLS key, is that while it was fine for CPython's internal TLS implementation, and happens to be fine for NT (which uses DWORD), it is not compatible the POSIX standard for the pthreads API, which defines pthread_key_t as an opaque type not further designed by the standard (as with Py_tss_t described above). This leaves it up to the underlying implementation how a pthread_key_t value is used to look thread-specific data. This has not generally been a problem for Python's API, as it just happens that on Linux pthread_key_t is just defined as an unsigned int, and so is fully compatible with Python's TLS API--pthread_key_t's created by pthread_create_key can be freely cast to ints and back (well, not really, even this has issues as pointed out by issue #22206). However, as issue #25658 points out there are at least some platforms (namely Cygwin, CloudABI, but likely others as well) which have otherwise modern and POSIX-compliant pthreads implementations, but are not compatible with Python's API because their pthread_key_t is defined in a way that cannot be safely cast to int. In fact, the possibility of running into this problem was raised by MvL at the time pthreads TLS was added [1]. It could be argued that PEP-11 makes specific requirements for supporting a new, not otherwise officially-support platform (such as CloudABI), and that the status of Cygwin support is currently dubious. However, this places a very barrier to supporting platforms that are otherwise Linux- and/or POSIX-compatible and where CPython might otherwise "just work" except for this one hurdle which Python itself imposes by way of an API that is not compatible with POSIX (and in fact makes invalid assumptions about pthreads). Rationale for Proposed Solution =============================== The use of an opaque type (Py_tss_t) to key TLS values allows the API to be compatible, at least in this regard, with CPython's internal TLS implementation, as well as all present (NT and posix) and future (C11?) native TLS implementations supported by CPython, as it allows the definition of Py_tss_t to depend on the underlying implementation. A new API must be introduced, rather than changing the function signatures of the current API, in order to maintain backwards compatibility. The new API also more clearly groups together these related functions under a single name prefix, "PyThread_tss_". The "tss" in the name stands for "thread-specific storage", and was influenced by the naming and design of the "tss" API that is part of the C11 threads API. However, this is in no way meant to imply compatibility with or support for the C11 threads API, or signal any future intention of supporting C11--it's just the influence for the naming and design. Changing PyThread_create_key to immediately return a failure status on systems using pthreads where sizeof(int) != sizeof(pthread_key_t) is intended as a sanity check: Currently, PyThread_create_key will report initial success on such systems, but attempts to use the returned key are likely to fail. Although in practice this failure occurs quickly during interpreter startup, it's better to fail immediately at the source of failure (PyThread_create_key) rather than sometime later when use of an invalid key is attempted. Rejected Ideas ============== * Do nothing: The status quo is fine because it works on Linux, and platforms wishing to be supported by CPython should follow the requirements of PEP-11. As explained above, while this would be a fair argument if CPython were being to asked to make changes to support particular quirks of a specific platform, in this case the platforms in question are only asking to fix a quirk of CPython that prevents it from being used to its full potential on those platforms. The fact that the current implementation happens to work on Linux is a happy accident, and there's no guarantee that will never change. * Affected platforms should just configure Python --without-threads: This is a possible temporary workaround to the issue, but only that. Python should not be hobbled on affected platforms despite them being otherwise perfectly capable of running multi-threaded Python. * Affected platforms should not define Py_HAVE_NATIVE_TLS: This is a more acceptable alternative to the previous idea, and in fact there is a patch to do just that [2]. However, CPython's internal TLS implementation being "slower and clunkier" in general than native implementations still needlessly hobbles performance on affected platforms. At least one other module (tracemalloc) is also broken if Python is built without Py_HAVE_NATIVE_TLS. * Keep the existing API, but work around the issue by providing a mapping from pthread_key_t values to ints. A couple attempts were made at this [3] [4], but this only injects needless complexity and overhead into performance-critical code on platforms that are not currently affected by this issue (such as Linux). Even if use of this workaround were made conditional on platform compatibility, it introduces platform-specific code to maintain, and still has the problem of the previous rejected ideas of needlessly hobbling performance on affected platforms. Implementation ============== An initial version of a patch [5] is available on the bug tracker for this issue. The patch is proposed and written by Masayuki Yamamoto, who should be considered a co-author of this proto-PEP, though I have not consulted directly with him in writing this. If he's reading, he should chime in in case I've misrepresented anything. If you've made it this far, thanks for reading and thank you for your consideration, Erik [1] https://bugs.python.org/msg116292 [2] http://bugs.python.org/file45548/configure-pthread_key_t.patch [3] http://bugs.python.org/file44269/issue25658-1.patch [4] http://bugs.python.org/file44303/key-constant-time.diff [5] http://bugs.python.org/file45763/pythread-tss.patch

On Fri, Dec 16, 2016 at 6:07 AM, Erik Bray <erik.m.bray@gmail.com> wrote:
I am not familiar enough with the threading implementation to be anything more than moral support, but I am in favor of making some change here. This is a significant blocker to Cygwin support, which is actually fairly close to being supportable. -- Zach

On 17 December 2016 at 03:51, Antoine Pitrou <solipsis@pitrou.net> wrote:
Likewise - we know the status quo isn't right, and the proposed change addresses that. In reviewing the patch on the tracker, the one downside I've found is that due to "pthread_key_t" being an opaque type with no defined sentinel, the consuming code in _tracemalloc.c and pystate.c needed to add separate boolean flag variables to track whether or not the key had been created. (The pthread examples at http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.... use pthread_once for a similar effect) I don't see any obvious way around that either, as even using a small struct for native pthread TLS keys would still face the problem of how to initialise the pthread_key_t field. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Dec 19, 2016 at 1:11 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hmm...fair point that it's not pretty. One way around it, albeit requiring more work/complexity, would be to extend this proposal to add a new function analogous to pthread_once--say--PyThread_call_once, and an associated Py_once_flag_t

On Mon, Dec 19, 2016 at 3:45 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
Oops--fat-fingered a 'send' command before I finished. So workaround would be to add a PyThread_call_once function, analogous to pthread_once. Yet another interface one needs to implement for a native thread implementation, but not too hard either. For pthreads there's already an obvious analogue that can be wrapped directly. For other platforms that don't have a direct analogue a (naive) implementation is still fairly simple: All you need in Py_once_flag_t is a boolean flag with an associated mutex, and a sentinel value analogous to PTHREAD_ONCE_INIT. Best, Erik

On 20 December 2016 at 00:53, Erik Bray <erik.m.bray@gmail.com> wrote:
Yeah, I think I'd prefer that - it aligns nicely with the way pthreads are defined, and means we can be more prescriptive about how to use the new API correctly for key declarations (we're currently a bit vague about exactly how to handle that in the current TLS API). With that addition, I think it will be worth turning your initial post here into a PR to the peps repo, though - not to resolve any particular controversy, but rather as an easier to find reference for the design rationale than a mailing list thread or a tracker issue. (I'd also be happy to volunteer as BDFL-Delegate, since I'm already reviewing the patch on the tracker) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Dec 20, 2016 at 9:26 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Okay, thanks. I will work on a PR to the PEPs repo, and update the proposal to add the PyThread_call_once idea, which some prescription for how it should be used. Of course, an updated patch will have to follow as well. This is probably an implementation detail, but ISTM that even with PyThread_call_once, it will be necessary to reset any used once_flags manually in PyOS_AfterFork, essentially for the same reason the autoTLSkey is reset there currently... Erik

2016-12-20 22:30 GMT+09:00 Erik Bray <erik.m.bray@gmail.com>:
Deleting threads key is executed on *_Fini functions, but Py_FinalizeEx function that calls *_Fini functions doesn't terminate CPython interpreter. Furthermore, source comment and document have said description about reinitialization after calling Py_FinalizeEx. [1] [2] That is to say there is an implicit possible that is reinitialization contrary to name "call_once" on a process level. Therefore, if CPython interpreter continues to allow reinitialization, I'd suggest to rename the call_once API to avoid misreading semantics. (for example, safe_init, check_init) Best regards, Masayuki [1] https://hg.python.org/cpython/file/default/Python/pylifecycle.c#l170 [2] https://docs.python.org/dev/c-api/init.html#c.Py_FinalizeEx

On 21 December 2016 at 01:35, Masayuki YAMAMOTO <ma3yuki.8mamo10@gmail.com> wrote:
Ouch, I'd missed that, and I agree it's not a negligible implementation detail - there are definitely applications embedding CPython out there that rely on being able to run multiple Initialize/Finalize cycles in the same process and have everything "just work". It also means using the "PyThread_*" prefix for the initialisation tracking aspect would be misleading, since the life cycle details are: 1. Create the key for the first time if it has never been previously set in the process 2. Destroy and reinit if Py_Finalize gets called 3. Destroy and reinit if a new subprocess is forked It also means we can't use pthread_once even in the pthread TLS implementation, since it doesn't provide those semantics. So I see two main alternatives here. Option 1: Modify the proposed PyThread_tss_create and PyThread_tss_delete APIs to accept a "bool *init_flag" pointer in addition to their current arguments. If *init_flag is true, then PyThread_tss_create is a no-op, otherwise it sets the flag to true after creating the key. If *init_flag is false, then PyThread_tss_delete is a no-op, otherwise it sets the flag to false after deleting the key. Option 2: Similar to option 1, but using a custom type alias, rather than using a C99 bool directly The closest API we have to these semantics at the moment would be PyGILState_Ensure, so the following API naming might work for option 2: Py_ensure_t Py_ENSURE_NEEDS_INIT Py_ENSURE_INITIALIZED Respectively, these would just be aliases for bool, false, and true. And then modify the proposed PyThread_tss_create and PyThread_tss_delete APIs to accept a "Py_ensure_t *init_flag" in addition to their current arguments. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That all sounds good--between the two option 2 looks a bit more explicit. Though what about this? Rather than adding another type, the original proposal could be changed slightly so that Py_tss_t *is* partially defined as a struct consisting of a bool, with whatever the native TLS key is. E.g. typedef struct { bool init_flag; #if defined(_POSIX_THREADS) pthreat_key_t key; #elif defined (NT_THREADS) DWORD key; /* etc... */ } Py_tss_t; Then it's just taking Masayuki's original patch, with the global bool variables, and formalizing that by combining the initialized flag with the key, and requiring the semantics you described above for PyThread_tss_create/delete. For Python's purposes it seems like this might be good enough, with the more general purpose pthread_once-like functionality not required. Best, Erik

On Wed, Dec 21, 2016 at 11:01 AM, Erik Bray <erik.m.bray@gmail.com> wrote:
*pthread_key_t* of course, though I wonder if that was a Freudian slip :)
Of course, that's not to say it might not be useful for some other purpose, but then it's outside the scope of this discussion as long as it isn't needed for TLS key initialization.

On 21 December 2016 at 20:01, Erik Bray <erik.m.bray@gmail.com> wrote:
Aye, I also thought of that approach, but talked myself out of it since there's no definable default value for pthread_key_t. However, C99 partial initialisation may deal with that for us (by zeroing the memory without actually assigning a typed value to it), and if it does, I agree it would be better to handle the initialisation flag automatically rather than requiring callers to do it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Dec 21, 2016 at 5:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think I understand what you're saying here... To be clear, let me enumerate the three currently supported cases and how they're affected: 1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the implementation--that the keys are integers starting from zero) 2) pthreads: Does not definite an uninitialized default value for keys, for reasons described at [1] under "Non-Idempotent Data Key Creation". I understand their reasoning, though I can't claim to know specifically what they mean when they say that some implementations would require the mutual-exclusion to be performed on pthread_getspecific() as well. I don't know that it applies here. 3) windows: The return value of TlsAlloc() is a DWORD (unsigned int) and [2] states that its value should be opaque. So in principle we can cover all cases with an opaque struct that contains, as its first member, an is_initialized flag. The tricky part is how to initialize the rest of the struct (containing the underlying implementation-specific key). For 1) and 3) it doesn't matter--it can just be zero. For 2) it's trickier because there's no defined constant value to initialize a pthread_key_t to. Per Nick's suggestion this can be worked around by relying on C99's initialization semantics. Per [3] section 6.7.8, clause 21: """ If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration. """ How objects with static storage are initialized is described in the previous page under clause 10, but in practice it boils down to what you would expect: Everything is initialized to zero, including nested structs and arrays. So as long as we can use this feature of C99 then I think that's the best approach. [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.... [2] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).a... [3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

On 29 December 2016 at 22:12, Erik Bray <erik.m.bray@gmail.com> wrote:
That section is a little weird, as they describe two requests (one for a known-NULL default value, the other for implicit synchronisation of key creation to prevent race conditions), and only provide the justification for rejecting one of them (the second one). If I've understood correctly, the situation they're worried about there is that pthread_key_create() has to be called at least once-per-process, but must be called before *any* call to pthread_getspecific or pthread_setspecific for a given key. If you do "implicit init" rather than requiring the use of an explicit mechanism like pthread_once (or our own Py_Initialize and module import locks), then you may take a small performance hit as either *every* thread then has to call pthread_key_create() to ensure the key exists before using it, or else pthread_getspecific() and pthread_setspecific() have to become potentially blocking calls. Neither of those is desirable, so it makes sense to leave that part of the problem to the API client. In our case, we don't want the implicit synchronisation, we just want the known-NULL default value so the "Is it already set?" check can be moved inside the library function.
I checked PEP 7 to see exactly which features we've added to the approved C dialect, and designated initialisers are already on the list: https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html So I believe that would allow the initializer to be declared as something like: #define Py_tss_NEEDS_INIT {.is_initialized = false} Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Dec 30, 2016 at 5:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Right, that is confusing to me as well. I'm guessing the reason for rejecting the first is in part a way to force us to recognize the second issue.
Okay, we're on the same page here then. I just wanted to make sure there wasn't anything else I was missing in Python's case.
Great! One could argue about whether or not the designated initializer syntax also incorporates omitted fields, but it would seem strange to insist that it doesn't. Have a happy new year, Erik

On Fri, Dec 16, 2016 at 11:07 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
You're pretty close to proper PEP format. Like others, I don't have enough knowledge of threading internals to speak to the technical side of it, but this is a well-written proposal and I agree in principle with tightening this up. The need for a PEP basically comes down to whether or not it's going to be controversial; a PEP allows you to hash out the details and then present a coherent proposal to Guido (or his delegate) for final approval. ChrisA

Hi, I'm patch author, I don't need to say anything for Erik's draft. I feel awesome that it has been clearly to explain, especially for history of API and against PEP. Thanks for great job, Erik! Cheers, Masayuki

Erik Bray writes:
Thank you for the analysis! Question:
Further, the old PyThread_*_key* functions will be marked as deprecated.
Of course, but:
Typo "pythead_t" in last line. I don't understand this. I assume that there are no such platforms supported at present. I would think that when such a platform becomes supported, code supporting "key" functions becomes unsupportable without #ifdefs on that platform, at least directly. So you should either (1) raise UnimplementedError, or (2) provide the API as a wrapper over the new API by making the integer keys indexes into a table of TSS'es, or some such device. I don't understand how (3) "make it a no-op" can be implemented for PyThread_create_key -- return 0 or -1? That would only work if there's a failure return status like 0 or -1, and it seems really dangerous to me since in general a lot of code doesn't check status even though it should. Even for code checking the status, the error message will be suboptimal ("creation failed" vs. "unimplemented"). I gather from references to casting pthread_key_t to unsigned int and back that there's probably code that does this in ways making (2) too dangerous to support. If true, perhaps that should be mentioned here.

On Sat, Dec 17, 2016 at 8:21 AM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
And thank *you* for the feedback!
Thanks, yes, that was suppose to be pthread_key_t of course. I think I had a few other typos too.
Masayuki already explained this downthread I think, but I could have probably made that section more precise. The point was that PyThread_create_key should immediately return -1 in this case. This is just a subtle difference over the current situation, which is that PyThread_create_key succeeds, but the key is corrupted by being cast to an int, so that later calls to PyThread_set_key_value and the like fail unexpectedly. The point is that PyThread_create_key (and we're only talking about the pthread implementation thereof, to be clear) must fail immediately if it can't work correctly. #ifdefs on the platform would not be necessary--instead, Masayuki's patch adds a feature check in configure.ac for sizeof(int) == sizeof(pthread_key_t). It should be noted that even this check is not 100% perfect, as on Linux pthread_key_t is an unsigned int, and so technically can cause Python's signed int key to overflow, but there's already an explicit check for that (which would be kept), and it's also a very unlikely scenario.
It's not necessarily too dangerous, so much as not worth the trouble, IMO. Simpler to just provide, and immediately use the new API and make the old one deprecated and explicitly not supported on those platforms where it can't work. Thanks, Erik

Erik Bray writes:
Thank you for the analysis.
Typo "pythead_t" in last line. I don't understand this. I assume that there are no such platforms supported at present. I would think that when such a platform becomes supported, code supporting "key" functions becomes unsupportable without #ifdefs on that platform, at least directly. So you should either (1) raise UnimplementedError, or (2) provide the API as a wrapper over the new API by making the integer keys indexes into a table of TSS'es, or some such device. I don't understand how (3) "make it a no-op" can be implemented for PyThread_create_key -- return 0 or -1? That would only work if there's a failure return status like 0 or -1, and it seems really dangerous to me since in general a lot of code doesn't check status even though it should. Even for code checking the status, the error message will be suboptimal ("creation failed" vs. "unimplemented"). I gather from references to casting pthread_key_t to unsigned int and back that there's probably code that does this in ways making (2) too dangerous to support. If true, perhaps that should be mentioned here.

On Fri, Dec 16, 2016 at 6:07 AM, Erik Bray <erik.m.bray@gmail.com> wrote:
I am not familiar enough with the threading implementation to be anything more than moral support, but I am in favor of making some change here. This is a significant blocker to Cygwin support, which is actually fairly close to being supportable. -- Zach

On 17 December 2016 at 03:51, Antoine Pitrou <solipsis@pitrou.net> wrote:
Likewise - we know the status quo isn't right, and the proposed change addresses that. In reviewing the patch on the tracker, the one downside I've found is that due to "pthread_key_t" being an opaque type with no defined sentinel, the consuming code in _tracemalloc.c and pystate.c needed to add separate boolean flag variables to track whether or not the key had been created. (The pthread examples at http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.... use pthread_once for a similar effect) I don't see any obvious way around that either, as even using a small struct for native pthread TLS keys would still face the problem of how to initialise the pthread_key_t field. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Dec 19, 2016 at 1:11 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hmm...fair point that it's not pretty. One way around it, albeit requiring more work/complexity, would be to extend this proposal to add a new function analogous to pthread_once--say--PyThread_call_once, and an associated Py_once_flag_t

On Mon, Dec 19, 2016 at 3:45 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
Oops--fat-fingered a 'send' command before I finished. So workaround would be to add a PyThread_call_once function, analogous to pthread_once. Yet another interface one needs to implement for a native thread implementation, but not too hard either. For pthreads there's already an obvious analogue that can be wrapped directly. For other platforms that don't have a direct analogue a (naive) implementation is still fairly simple: All you need in Py_once_flag_t is a boolean flag with an associated mutex, and a sentinel value analogous to PTHREAD_ONCE_INIT. Best, Erik

On 20 December 2016 at 00:53, Erik Bray <erik.m.bray@gmail.com> wrote:
Yeah, I think I'd prefer that - it aligns nicely with the way pthreads are defined, and means we can be more prescriptive about how to use the new API correctly for key declarations (we're currently a bit vague about exactly how to handle that in the current TLS API). With that addition, I think it will be worth turning your initial post here into a PR to the peps repo, though - not to resolve any particular controversy, but rather as an easier to find reference for the design rationale than a mailing list thread or a tracker issue. (I'd also be happy to volunteer as BDFL-Delegate, since I'm already reviewing the patch on the tracker) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Dec 20, 2016 at 9:26 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Okay, thanks. I will work on a PR to the PEPs repo, and update the proposal to add the PyThread_call_once idea, which some prescription for how it should be used. Of course, an updated patch will have to follow as well. This is probably an implementation detail, but ISTM that even with PyThread_call_once, it will be necessary to reset any used once_flags manually in PyOS_AfterFork, essentially for the same reason the autoTLSkey is reset there currently... Erik

2016-12-20 22:30 GMT+09:00 Erik Bray <erik.m.bray@gmail.com>:
Deleting threads key is executed on *_Fini functions, but Py_FinalizeEx function that calls *_Fini functions doesn't terminate CPython interpreter. Furthermore, source comment and document have said description about reinitialization after calling Py_FinalizeEx. [1] [2] That is to say there is an implicit possible that is reinitialization contrary to name "call_once" on a process level. Therefore, if CPython interpreter continues to allow reinitialization, I'd suggest to rename the call_once API to avoid misreading semantics. (for example, safe_init, check_init) Best regards, Masayuki [1] https://hg.python.org/cpython/file/default/Python/pylifecycle.c#l170 [2] https://docs.python.org/dev/c-api/init.html#c.Py_FinalizeEx

On 21 December 2016 at 01:35, Masayuki YAMAMOTO <ma3yuki.8mamo10@gmail.com> wrote:
Ouch, I'd missed that, and I agree it's not a negligible implementation detail - there are definitely applications embedding CPython out there that rely on being able to run multiple Initialize/Finalize cycles in the same process and have everything "just work". It also means using the "PyThread_*" prefix for the initialisation tracking aspect would be misleading, since the life cycle details are: 1. Create the key for the first time if it has never been previously set in the process 2. Destroy and reinit if Py_Finalize gets called 3. Destroy and reinit if a new subprocess is forked It also means we can't use pthread_once even in the pthread TLS implementation, since it doesn't provide those semantics. So I see two main alternatives here. Option 1: Modify the proposed PyThread_tss_create and PyThread_tss_delete APIs to accept a "bool *init_flag" pointer in addition to their current arguments. If *init_flag is true, then PyThread_tss_create is a no-op, otherwise it sets the flag to true after creating the key. If *init_flag is false, then PyThread_tss_delete is a no-op, otherwise it sets the flag to false after deleting the key. Option 2: Similar to option 1, but using a custom type alias, rather than using a C99 bool directly The closest API we have to these semantics at the moment would be PyGILState_Ensure, so the following API naming might work for option 2: Py_ensure_t Py_ENSURE_NEEDS_INIT Py_ENSURE_INITIALIZED Respectively, these would just be aliases for bool, false, and true. And then modify the proposed PyThread_tss_create and PyThread_tss_delete APIs to accept a "Py_ensure_t *init_flag" in addition to their current arguments. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That all sounds good--between the two option 2 looks a bit more explicit. Though what about this? Rather than adding another type, the original proposal could be changed slightly so that Py_tss_t *is* partially defined as a struct consisting of a bool, with whatever the native TLS key is. E.g. typedef struct { bool init_flag; #if defined(_POSIX_THREADS) pthreat_key_t key; #elif defined (NT_THREADS) DWORD key; /* etc... */ } Py_tss_t; Then it's just taking Masayuki's original patch, with the global bool variables, and formalizing that by combining the initialized flag with the key, and requiring the semantics you described above for PyThread_tss_create/delete. For Python's purposes it seems like this might be good enough, with the more general purpose pthread_once-like functionality not required. Best, Erik

On Wed, Dec 21, 2016 at 11:01 AM, Erik Bray <erik.m.bray@gmail.com> wrote:
*pthread_key_t* of course, though I wonder if that was a Freudian slip :)
Of course, that's not to say it might not be useful for some other purpose, but then it's outside the scope of this discussion as long as it isn't needed for TLS key initialization.

On 21 December 2016 at 20:01, Erik Bray <erik.m.bray@gmail.com> wrote:
Aye, I also thought of that approach, but talked myself out of it since there's no definable default value for pthread_key_t. However, C99 partial initialisation may deal with that for us (by zeroing the memory without actually assigning a typed value to it), and if it does, I agree it would be better to handle the initialisation flag automatically rather than requiring callers to do it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Dec 21, 2016 at 5:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think I understand what you're saying here... To be clear, let me enumerate the three currently supported cases and how they're affected: 1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the implementation--that the keys are integers starting from zero) 2) pthreads: Does not definite an uninitialized default value for keys, for reasons described at [1] under "Non-Idempotent Data Key Creation". I understand their reasoning, though I can't claim to know specifically what they mean when they say that some implementations would require the mutual-exclusion to be performed on pthread_getspecific() as well. I don't know that it applies here. 3) windows: The return value of TlsAlloc() is a DWORD (unsigned int) and [2] states that its value should be opaque. So in principle we can cover all cases with an opaque struct that contains, as its first member, an is_initialized flag. The tricky part is how to initialize the rest of the struct (containing the underlying implementation-specific key). For 1) and 3) it doesn't matter--it can just be zero. For 2) it's trickier because there's no defined constant value to initialize a pthread_key_t to. Per Nick's suggestion this can be worked around by relying on C99's initialization semantics. Per [3] section 6.7.8, clause 21: """ If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration. """ How objects with static storage are initialized is described in the previous page under clause 10, but in practice it boils down to what you would expect: Everything is initialized to zero, including nested structs and arrays. So as long as we can use this feature of C99 then I think that's the best approach. [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.... [2] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).a... [3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

On 29 December 2016 at 22:12, Erik Bray <erik.m.bray@gmail.com> wrote:
That section is a little weird, as they describe two requests (one for a known-NULL default value, the other for implicit synchronisation of key creation to prevent race conditions), and only provide the justification for rejecting one of them (the second one). If I've understood correctly, the situation they're worried about there is that pthread_key_create() has to be called at least once-per-process, but must be called before *any* call to pthread_getspecific or pthread_setspecific for a given key. If you do "implicit init" rather than requiring the use of an explicit mechanism like pthread_once (or our own Py_Initialize and module import locks), then you may take a small performance hit as either *every* thread then has to call pthread_key_create() to ensure the key exists before using it, or else pthread_getspecific() and pthread_setspecific() have to become potentially blocking calls. Neither of those is desirable, so it makes sense to leave that part of the problem to the API client. In our case, we don't want the implicit synchronisation, we just want the known-NULL default value so the "Is it already set?" check can be moved inside the library function.
I checked PEP 7 to see exactly which features we've added to the approved C dialect, and designated initialisers are already on the list: https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html So I believe that would allow the initializer to be declared as something like: #define Py_tss_NEEDS_INIT {.is_initialized = false} Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Dec 30, 2016 at 5:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Right, that is confusing to me as well. I'm guessing the reason for rejecting the first is in part a way to force us to recognize the second issue.
Okay, we're on the same page here then. I just wanted to make sure there wasn't anything else I was missing in Python's case.
Great! One could argue about whether or not the designated initializer syntax also incorporates omitted fields, but it would seem strange to insist that it doesn't. Have a happy new year, Erik

On Fri, Dec 16, 2016 at 11:07 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
You're pretty close to proper PEP format. Like others, I don't have enough knowledge of threading internals to speak to the technical side of it, but this is a well-written proposal and I agree in principle with tightening this up. The need for a PEP basically comes down to whether or not it's going to be controversial; a PEP allows you to hash out the details and then present a coherent proposal to Guido (or his delegate) for final approval. ChrisA

Hi, I'm patch author, I don't need to say anything for Erik's draft. I feel awesome that it has been clearly to explain, especially for history of API and against PEP. Thanks for great job, Erik! Cheers, Masayuki

Erik Bray writes:
Thank you for the analysis! Question:
Further, the old PyThread_*_key* functions will be marked as deprecated.
Of course, but:
Typo "pythead_t" in last line. I don't understand this. I assume that there are no such platforms supported at present. I would think that when such a platform becomes supported, code supporting "key" functions becomes unsupportable without #ifdefs on that platform, at least directly. So you should either (1) raise UnimplementedError, or (2) provide the API as a wrapper over the new API by making the integer keys indexes into a table of TSS'es, or some such device. I don't understand how (3) "make it a no-op" can be implemented for PyThread_create_key -- return 0 or -1? That would only work if there's a failure return status like 0 or -1, and it seems really dangerous to me since in general a lot of code doesn't check status even though it should. Even for code checking the status, the error message will be suboptimal ("creation failed" vs. "unimplemented"). I gather from references to casting pthread_key_t to unsigned int and back that there's probably code that does this in ways making (2) too dangerous to support. If true, perhaps that should be mentioned here.

On Sat, Dec 17, 2016 at 8:21 AM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
And thank *you* for the feedback!
Thanks, yes, that was suppose to be pthread_key_t of course. I think I had a few other typos too.
Masayuki already explained this downthread I think, but I could have probably made that section more precise. The point was that PyThread_create_key should immediately return -1 in this case. This is just a subtle difference over the current situation, which is that PyThread_create_key succeeds, but the key is corrupted by being cast to an int, so that later calls to PyThread_set_key_value and the like fail unexpectedly. The point is that PyThread_create_key (and we're only talking about the pthread implementation thereof, to be clear) must fail immediately if it can't work correctly. #ifdefs on the platform would not be necessary--instead, Masayuki's patch adds a feature check in configure.ac for sizeof(int) == sizeof(pthread_key_t). It should be noted that even this check is not 100% perfect, as on Linux pthread_key_t is an unsigned int, and so technically can cause Python's signed int key to overflow, but there's already an explicit check for that (which would be kept), and it's also a very unlikely scenario.
It's not necessarily too dangerous, so much as not worth the trouble, IMO. Simpler to just provide, and immediately use the new API and make the old one deprecated and explicitly not supported on those platforms where it can't work. Thanks, Erik

Erik Bray writes:
Thank you for the analysis.
Typo "pythead_t" in last line. I don't understand this. I assume that there are no such platforms supported at present. I would think that when such a platform becomes supported, code supporting "key" functions becomes unsupportable without #ifdefs on that platform, at least directly. So you should either (1) raise UnimplementedError, or (2) provide the API as a wrapper over the new API by making the integer keys indexes into a table of TSS'es, or some such device. I don't understand how (3) "make it a no-op" can be implemented for PyThread_create_key -- return 0 or -1? That would only work if there's a failure return status like 0 or -1, and it seems really dangerous to me since in general a lot of code doesn't check status even though it should. Even for code checking the status, the error message will be suboptimal ("creation failed" vs. "unimplemented"). I gather from references to casting pthread_key_t to unsigned int and back that there's probably code that does this in ways making (2) too dangerous to support. If true, perhaps that should be mentioned here.
participants (7)
-
Antoine Pitrou
-
Chris Angelico
-
Erik Bray
-
Masayuki YAMAMOTO
-
Nick Coghlan
-
Stephen J. Turnbull
-
Zachary Ware