baby steps for free-threading

A couple months ago, I exchanged a few emails with Guido about doing the free-threading work. In particular, for the 1.6 release. At that point (and now), I said that I wouldn't be starting on it until this summer, which means it would miss the 1.6 release. However, there are some items that could go into 1.6 *today* that would make it easier down the road to add free-threading to Python. I said that I'd post those in the hope that somebody might want to look at developing the necessary patches. It fell off my plate, so I'm getting back to that now...

Python needs a number of basic things to support free threading. None of these should impact its performance or reliability. For the most part, they just provide a platform for the later addition.

1) Create a portable abstraction for using the platform's per-thread state mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. This mechanism will be used to store PyThreadState structure pointers, rather than _PyThreadState_Current. The latter variable must go away.

Rationale: two threads will be operating simultaneously. An inherent conflict arises if _PyThreadState_Current is used. The TLS-like mechanism is used by the threads to look up "their" state.

There will be a ripple effect on PyThreadState_Swap(); dunno offhand what. It may become empty.

2) Python needs a lightweight, short-duration, internally-used critical section type. The current lock type is used at the Python level and internally. For internal operations, it is rather heavyweight, has unnecessary semantics, and is slower than a plain crit section. Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's mutex type. A spinlock mechanism would be coolness.

Rationale: Python needs critical sections to protect data from being trashed by multiple, simultaneous access. These crit sections need to be as fast as possible since they'll execute at all key points where data is manipulated.

3) Python needs an atomic increment/decrement (internal) operation.

Rationale: these are used in INCREF/DECREF to correctly increment or decrement the refcount in the face of multiple threads trying to do this. Win32: InterlockedIncrement/Decrement. pthreads would use the lightweight crit section above (on every INC/DEC!!). Some other platforms may have specific capabilities to keep this fast. Note that platforms (outside of their threading libraries) may have functions to do this.

4) Python's configuration system needs to be updated to include a --with-free-thread option since this will not be enabled by default. Related changes to acconfig.h would be needed. Compiling in the above pieces based on the flag would be nice (although Python could switch to the crit section in some cases where it uses the heavy lock today)

Rationale: duh

5) An analysis of Python's globals needs to be performed. Any global that can safely be made "const" should. If a global is write-once (such as classobject.c::getattrstr), then these are marginally okay (there is a race condition, with an acceptable outcome, but a mem leak occurs). Personally, I would prefer a general mechanism in Python for creating "constants" which can be tracked by the runtime and freed. I would also like to see a generalized "object pool" mechanism be built and used for tuples, ints, floats, frames, etc.

Rationale: any globals which are mutable must be made thread-safe. The fewer non-const globals to examine, the fewer to analyze for race conditions and thread-safety requirements.

Note: making some globals "const" has a ripple effect through Python. This is sometimes known as "const poisoning". Guido has stated an acceptance to adding "const" throughout the interpreter, but would prefer a complete (rather than ripple-based, partial) overhaul.

I think that is all for now. Achieving these five steps within the 1.6 timeframe means that the free-threading patches will be *much* smaller. It also creates much more visibility and testing for these sections.

Post 1.6, a patch set to add critical sections to lists and dicts would be built. In addition, a new analysis would be done to examine the globals that are available along with possible race conditions in other mutable types and structures. Not all structures will be made thread-safe; for example, frame objects are used by a single thread at a time (I'm sure somebody could find a way to have multiple threads use or look at them, but that person can take a leap, too :-)

Depending upon Guido's desire, the various schedules, and how well the development goes, Python 1.6.1 could incorporate the free-threading option in the base distribution.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

A couple months ago, I exchanged a few emails with Guido about doing the free-threading work. In particular, for the 1.6 release. At that point (and now), I said that I wouldn't be starting on it until this summer, which means it would miss the 1.6 release. However, there are some items that could go into 1.6 *today* that would make it easier down the road to add free-threading to Python. I said that I'd post those in the hope that somebody might want to look at developing the necessary patches. It fell off my plate, so I'm getting back to that now...
Python needs a number of basic things to support free threading. None of these should impact its performance or reliability. For the most part, they just provide a platform for the later addition.
I agree with the general design sketched below.
1) Create a portable abstraction for using the platform's per-thread state mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*.
There are at least 7 other platform-specific thread implementations -- probably an 8th for the Mac. These all need to support this. (One solution would be to have a portable implementation that uses the thread-ID to index an array.)
This mechanism will be used to store PyThreadState structure pointers, rather than _PyThreadState_Current. The latter variable must go away.
Rationale: two threads will be operating simultaneously. An inherent conflict arises if _PyThreadState_Current is used. The TLS-like mechanism is used by the threads to look up "their" state.
There will be a ripple effect on PyThreadState_Swap(); dunno offhand what. It may become empty.
Cool.
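To make the replacement concrete, here is a minimal sketch, assuming pthreads, of how the _PyThreadState_Current lookup could become a per-thread TLS lookup. The PyThreadState_GetCurrent/SetCurrent functions and the Py_tls_key name are hypothetical, not an agreed API:

    /* Sketch only: per-thread PyThreadState lookup via a pthread key.
     * PyThreadState_GetCurrent/SetCurrent and Py_tls_key are hypothetical. */
    #include <pthread.h>
    #include "Python.h"            /* for PyThreadState */

    static pthread_key_t Py_tls_key;
    static pthread_once_t Py_tls_once = PTHREAD_ONCE_INIT;

    static void Py_tls_init(void)
    {
        /* No destructor: thread states are torn down explicitly. */
        (void) pthread_key_create(&Py_tls_key, NULL);
    }

    /* Replaces reads of _PyThreadState_Current. */
    PyThreadState *PyThreadState_GetCurrent(void)
    {
        (void) pthread_once(&Py_tls_once, Py_tls_init);
        return (PyThreadState *) pthread_getspecific(Py_tls_key);
    }

    /* Replaces writes; PyThreadState_Swap() could reduce to get-then-set. */
    void PyThreadState_SetCurrent(PyThreadState *tstate)
    {
        (void) pthread_once(&Py_tls_once, Py_tls_init);
        (void) pthread_setspecific(Py_tls_key, tstate);
    }

A Win32 build would put TlsAlloc()/TlsGetValue()/TlsSetValue() behind the same two functions.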
2) Python needs a lightweight, short-duration, internally-used critical section type. The current lock type is used at the Python level and internally. For internal operations, it is rather heavyweight, has unnecessary semantics, and is slower than a plain crit section.
Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's mutex type. A spinlock mechanism would be coolness.
Rationale: Python needs critical sections to protect data from being trashed by multiple, simultaneous access. These crit sections need to be as fast as possible since they'll execute at all key points where data is manipulated.
Agreed.
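For illustration, a minimal sketch of what such a type could look like over Win32 and pthreads. The Py_crit_* names are hypothetical, and error handling and any spinlock optimization are omitted:

    /* Sketch only: a thin, internal critical-section type.
     * The Py_crit_* names are hypothetical. */
    #ifdef _WIN32
    #include <windows.h>
    typedef CRITICAL_SECTION Py_crit_section;
    #define Py_crit_init(cs)   InitializeCriticalSection(cs)
    #define Py_crit_enter(cs)  EnterCriticalSection(cs)
    #define Py_crit_leave(cs)  LeaveCriticalSection(cs)
    #define Py_crit_fini(cs)   DeleteCriticalSection(cs)
    #else
    #include <pthread.h>
    typedef pthread_mutex_t Py_crit_section;
    #define Py_crit_init(cs)   pthread_mutex_init((cs), NULL)
    #define Py_crit_enter(cs)  pthread_mutex_lock(cs)
    #define Py_crit_leave(cs)  pthread_mutex_unlock(cs)
    #define Py_crit_fini(cs)   pthread_mutex_destroy(cs)
    #endif

Usage would be a plain enter/leave pair bracketing the few statements that touch a shared structure, with one (hypothetical) lock per protected object or type.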
3) Python needs an atomic increment/decrement (internal) operation.
Rationale: these are used in INCREF/DECREF to correctly increment or decrement the refcount in the face of multiple threads trying to do this.
Win32: InterlockedIncrement/Decrement. pthreads would use the lightweight crit section above (on every INC/DEC!!). Some other platforms may have specific capabilities to keep this fast. Note that platforms (outside of their threading libraries) may have functions to do this.
I'm worried here that since INCREF/DECREF are used so much this will slow down significantly, especially on platforms that don't have safe hardware instructions for this. So it should only be enabled when free threading is turned on.
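A sketch of the primitive under discussion, with hypothetical Py_atomic_* names; the pthreads branch shows the mutex fallback, which is exactly the per-INCREF/DECREF cost being worried about here:

    /* Sketch only: atomic refcount adjustment; Py_atomic_* are hypothetical.
     * Both variants return the new value so DECREF can test for zero. */
    #ifdef _WIN32
    #include <windows.h>
    /* Assumes the refcount field is (or can be treated as) a LONG. */
    #define Py_atomic_incr(p)  InterlockedIncrement((LONG *)(p))
    #define Py_atomic_decr(p)  InterlockedDecrement((LONG *)(p))
    #else
    #include <pthread.h>
    /* Fallback: one process-wide mutex around every adjustment (slow!). */
    static pthread_mutex_t refcnt_mutex = PTHREAD_MUTEX_INITIALIZER;
    static long Py_atomic_incr(long *p)
    {
        long v;
        pthread_mutex_lock(&refcnt_mutex);
        v = ++*p;
        pthread_mutex_unlock(&refcnt_mutex);
        return v;
    }
    static long Py_atomic_decr(long *p)
    {
        long v;
        pthread_mutex_lock(&refcnt_mutex);
        v = --*p;
        pthread_mutex_unlock(&refcnt_mutex);
        return v;
    }
    #endif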
4) Python's configuration system needs to be updated to include a --with-free-thread option since this will not be enabled by default. Related changes to acconfig.h would be needed. Compiling in the above pieces based on the flag would be nice (although Python could switch to the crit section in some cases where it uses the heavy lock today)
Rationale: duh
Maybe there should be more fine-grained choices? As you say, some stuff could be used without this flag. But in any case this is trivial to add.
5) An analysis of Python's globals needs to be performed. Any global that can safely be made "const" should. If a global is write-once (such as classobject.c::getattrstr), then these are marginally okay (there is a race condition, with an acceptable outcome, but a mem leak occurs). Personally, I would prefer a general mechanism in Python for creating "constants" which can be tracked by the runtime and freed.
They are almost all string constants, right? How about a macro Py_CONSTSTROBJ("value", variable)?
I would also like to see a generalized "object pool" mechanism be built and used for tuples, ints, floats, frames, etc.
Careful though -- generalizing this will slow it down. (Here I find myself almost wishing for C++ templates :-)
Rationale: any globals which are mutable must be made thread-safe. The fewer non-const globals to examine, the fewer to analyze for race conditions and thread-safety requirements.
Note: making some globals "const" has a ripple effect through Python. This is sometimes known as "const poisoning". Guido has stated an acceptance to adding "const" throughout the interpreter, but would prefer a complete (rather than ripple-based, partial) overhaul.
Actually, it's okay to do this on an "as-needed" basis. I'm also in favor of changing all the K&R code to ANSI, and getting rid of Py_PROTO and friends. Cleaner code!
I think that is all for now. Achieving these five steps within the 1.6 timeframe means that the free-threading patches will be *much* smaller. It also creates much more visibility and testing for these sections.
Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough testing of some of these changes, the extensive nature of some of the changes, and my other obligations during those 6 weeks, I don't see how it can be done for 1.6. I would prefer to do an accelerated 1.7 or 1.6.1 release that incorporates all this. (It could be called 1.6.1 only if it's nearly identical to 1.6 for the Python user and not too different for the extension writer.)
Post 1.6, a patch set to add critical sections to lists and dicts would be built. In addition, a new analysis would be done to examine the globals that are available along with possible race conditions in other mutable types and structures. Not all structures will be made thread-safe; for example, frame objects are used by a single thread at a time (I'm sure somebody could find a way to have multiple threads use or look at them, but that person can take a leap, too :-)
It is unacceptable to have thread-unsafe structures that can be accessed in a thread-unsafe way using pure Python code only.
Depending upon Guido's desire, the various schedules, and how well the development goes, Python 1.6.1 could incorporate the free-threading option in the base distribution.
Indeed.

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Tue, 18 Apr 2000, Guido van Rossum wrote:
...
1) Create a portable abstraction for using the platform's per-thread state mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*.
There are at least 7 other platform-specific thread implementations -- probably an 8th for the Mac. These all need to support this. (One solution would be to have a portable implementation that uses the thread-ID to index an array.)
Yes. As the platforms "come up to speed", they can replace the fallback, portable implementation. "Users" of the TLS mechanism would allocate indices into the per-thread arrays.

Another alternative is to only manage a mapping of thread-ID to ThreadState structures. The TLS code can then get the ThreadState and access the per-thread dict. Of course, the initial impetus is to solve the lookup of the ThreadState rather than a general TLS mechanism :-)

Hmm. I'd say that we stick with defining a Python TLS API (in terms of the platform when possible). The fallback code would be the per-thread arrays design. "thread dict" would still exist, but is deprecated.
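As an illustration of the portable fallback, here is a minimal sketch of a thread-ID-keyed table holding per-thread state. All names are hypothetical, the thread identifier is assumed to come from Python's existing thread support code, and a real version would need locking around the table plus something faster than a linear scan:

    /* Sketch only: portable fallback mapping thread IDs to thread states.
     * All names here are hypothetical. */
    #include "Python.h"

    #define MAX_TRACKED_THREADS 256

    struct tls_slot {
        long thread_id;            /* identifier of the owning thread */
        PyThreadState *tstate;     /* that thread's state, or NULL */
    };
    static struct tls_slot tls_table[MAX_TRACKED_THREADS];
    static int tls_count;

    /* Linear scan: fine as a fallback, replaced per-platform later.
     * A real version must guard tls_table with a critical section. */
    static PyThreadState *fallback_tls_get(long thread_id)
    {
        int i;
        for (i = 0; i < tls_count; i++) {
            if (tls_table[i].thread_id == thread_id)
                return tls_table[i].tstate;
        }
        return NULL;
    }

    static void fallback_tls_set(long thread_id, PyThreadState *tstate)
    {
        int i;
        for (i = 0; i < tls_count; i++) {
            if (tls_table[i].thread_id == thread_id) {
                tls_table[i].tstate = tstate;
                return;
            }
        }
        if (tls_count < MAX_TRACKED_THREADS) {
            tls_table[tls_count].thread_id = thread_id;
            tls_table[tls_count].tstate = tstate;
            tls_count++;
        }
    }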
...
3) Python needs an atomic increment/decrement (internal) operation.
Rationale: these are used in INCREF/DECREF to correctly increment or decrement the refcount in the face of multiple threads trying to do this.
Win32: InterlockedIncrement/Decrement. pthreads would use the lightweight crit section above (on every INC/DEC!!). Some other platforms may have specific capabilities to keep this fast. Note that platforms (outside of their threading libraries) may have functions to do this.
I'm worried here that since INCREF/DECREF are used so much this will slow down significantly, especially on platforms that don't have safe hardware instructions for this.
This definitely slows Python down. If an object is known to be visible to only one thread, then you can avoid the atomic inc/dec. But that leads to madness :-)
So it should only be enabled when free threading is turned on.
Absolutely. No question. Note to readers: the different definitions of INCREF/DECREF have an impact on mixing modules, in the same way Py_TRACE_REFS does.
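For concreteness, a sketch of how the object.h macros could be switched on the build flag. WITH_FREE_THREAD and Py_atomic_incr/decr are hypothetical names; the unconditional branch mirrors the existing definitions:

    /* Sketch only: conditional INCREF/DECREF in object.h.
     * WITH_FREE_THREAD and Py_atomic_incr/decr are hypothetical; the atomic
     * macros are assumed to return the new refcount and to accept a pointer
     * of the refcount field's width. */
    #ifdef WITH_FREE_THREAD
    #define Py_INCREF(op)  (Py_atomic_incr(&(op)->ob_refcnt))
    #define Py_DECREF(op) \
            if (Py_atomic_decr(&(op)->ob_refcnt) != 0) \
                    ; \
            else \
                    _Py_Dealloc((PyObject *)(op))
    #else
    #define Py_INCREF(op)  ((op)->ob_refcnt++)
    #define Py_DECREF(op) \
            if (--(op)->ob_refcnt != 0) \
                    ; \
            else \
                    _Py_Dealloc((PyObject *)(op))
    #endif

Since extensions compiled with one definition cannot safely be mixed with a core compiled with the other, this carries the same ABI caveat as Py_TRACE_REFS.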
4) Python's configuration system needs to be updated to include a --with-free-thread option since this will not be enabled by default. Related changes to acconfig.h would be needed. Compiling in the above pieces based on the flag would be nice (although Python could switch to the crit section in some cases where it uses the heavy lock today)
Rationale: duh
Maybe there should be more fine-grained choices? As you say, some stuff could be used without this flag. But in any case this is trivial to add.
Sure. For example, something like the Python TLS API could be keyed off --with-threads. Replacing _PyThreadState_Current with a TLS-based mechanism should be keyed on free threads. The "critical section" stuff could be keyed on threading -- they would be nice for Python to use internally for its standard threading operation.
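A sketch of how that keying could look in the headers; WITH_FREE_THREAD is a hypothetical symbol, WITH_THREAD is the existing one:

    /* Sketch only: acconfig.h would grow a template entry such as
     *
     *     #undef WITH_FREE_THREAD
     *
     * which configure's --with-free-thread option would define.  The headers
     * could then key the pieces roughly as follows. */

    #ifdef WITH_THREAD
    /* Python TLS API and lightweight critical sections: useful for the
     * normal threading build, so keyed on the existing symbol. */
    #endif

    #ifdef WITH_FREE_THREAD
    #ifndef WITH_THREAD
    #error "--with-free-thread requires thread support"
    #endif
    /* Atomic INCREF/DECREF and the TLS-based replacement for
     * _PyThreadState_Current: only when free threading is selected. */
    #endif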
5) An analysis of Python's globals needs to be performed. Any global that can safely be made "const" should. If a global is write-once (such as classobject.c::getattrstr), then these are marginally okay (there is a race condition, with an acceptable outcome, but a mem leak occurs). Personally, I would prefer a general mechanism in Python for creating "constants" which can be tracked by the runtime and freed.
They are almost all string constants, right?
Yes, I believe so. (Analysis needed)
How about a macro Py_CONSTSTROBJ("value", variable)?
Sure. Note that the variable name can usually be constructed from the value.
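A sketch of how the macro could build the variable name from the value via token pasting. Py_CONSTSTROBJ is the name proposed above; the expansion and the _DECL companion shown here are purely illustrative:

    /* Sketch only: a write-once, interned string constant whose variable
     * name is pasted from the value.  The expansion is illustrative. */
    #include "Python.h"

    /* File scope: declare the cached object for a given identifier. */
    #define Py_CONSTSTROBJ_DECL(name)  static PyObject *str_##name = NULL

    /* Use site: return the interned string, creating it on first use.
     * This keeps the benign write-once race discussed above: two threads
     * may both intern, and one small object leaks. */
    #define Py_CONSTSTROBJ(name) \
            (str_##name != NULL ? str_##name \
                                : (str_##name = PyString_InternFromString(#name)))

    Py_CONSTSTROBJ_DECL(__getattr__);

    /* e.g.:  v = PyObject_GetAttr(obj, Py_CONSTSTROBJ(__getattr__)); */

A runtime-tracked variant could additionally register each cached pointer in a table so the constants can be freed at finalization, which is the "tracked by the runtime and freed" part of the proposal.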
I would also like to see a generalized "object pool" mechanism be built and used for tuples, ints, floats, frames, etc.
Careful though -- generalizing this will slow it down. (Here I find myself almost wishing for C++ templates :-)
:-) This is a desire, but not a requirement. Same with the write-once stuff. A general pool mechanism would reduce code duplication for lock management, and possibly clarify some operation.
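For illustration, a sketch of the shape such a pool could take, with hypothetical names; each per-type free list (ints, tuples, frames, ...) would become an instance of this, guarded by one of the lightweight critical sections from item 2 under free threading:

    /* Sketch only: a generic fixed-size free-list pool; all names are
     * hypothetical.  item_size must be at least sizeof(pool_item). */
    #include <stddef.h>
    #include <stdlib.h>

    typedef struct pool_item {
        struct pool_item *next;    /* link used only while on the free list */
    } pool_item;

    typedef struct {
        pool_item *free_list;      /* LIFO of recycled blocks */
        size_t item_size;          /* fixed allocation size for this pool */
        /* under free threading, a crit section would guard both fields */
    } object_pool;

    static void *pool_alloc(object_pool *pool)
    {
        pool_item *item = pool->free_list;
        if (item != NULL) {
            pool->free_list = item->next;
            return item;
        }
        return malloc(pool->item_size);
    }

    static void pool_free(object_pool *pool, void *p)
    {
        pool_item *item = (pool_item *) p;
        item->next = pool->free_list;
        pool->free_list = item;
    }

The extra indirection is where the generalization cost Guido mentions comes in, compared with today's open-coded per-type free lists.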
...
Note: making some globals "const" has a ripple effect through Python. This is sometimes known as "const poisoning". Guido has stated an acceptance to adding "const" throughout the interpreter, but would prefer a complete (rather than ripple-based, partial) overhaul.
Actually, it's okay to do this on an "as-needed" basis. I'm also in favor of changing all the K&R code to ANSI, and getting rid of Py_PROTO and friends. Cleaner code!
Yay! :-)
I think that is all for now. Achieving these five steps within the 1.6 timeframe means that the free-threading patches will be *much* smaller. It also creates much more visibility and testing for these sections.
Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough testing of some of these changes, the extensive nature of some of the
[ aside: most of these changes are specified with the intent of reducing the impact on Python. most are additional behavior rather than changing behavior. ]
changes, and my other obligations during those 6 weeks, I don't see how it can be done for 1.6. I would prefer to do an accelerated 1.7 or 1.6.1 release that incorporates all this. (It could be called 1.6.1 only if it's nearly identical to 1.6 for the Python user and not too different for the extension writer.)
Ah. That would be nice. It also provides some focus on what would need to occur for the extension writer:

*) Python TLS API
*) critical sections
*) WITH_FREE_THREAD from the configure process

The INCREF/DECREF and const-ness changes are hidden from the extension writer. Adding integrity locks to list/dict/etc. is also hidden.
Post 1.6, a patch set to add critical sections to lists and dicts would be built. In addition, a new analysis would be done to examine the globals that are available along with possible race conditions in other mutable types and structures. Not all structures will be made thread-safe; for example, frame objects are used by a single thread at a time (I'm sure somebody could find a way to have multiple threads use or look at them, but that person can take a leap, too :-)
It is unacceptable to have thread-unsafe structures that can be accessed in a thread-unsafe way using pure Python code only.
Hmm. I guess that I can grab a frame object reference via a traceback object. The frame and traceback objects can then be shared between threads. Now the question arises: if the original thread resumes execution and starts modifying these objects (inside the interpreter since both are readonly to Python), then the passed-to thread might see invalid data.

I'm not sure whether these objects have multi-field integrity constraints. Conversely: if they don't, then changing a single field will simply create a race condition with the passed-to thread. Oh, and assuming that we remove a value from the structure before DECREF'ing it.

By your "pure Python" statement, I'm presuming that you aren't worried about PyTuple_SET_ITEM() and similar. However, do you really want to start locking up the frame and traceback objects? (and code objects and ...)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/