[Python-Dev] baby steps for free-threading
Greg Stein
gstein@lyra.org
Tue, 18 Apr 2000 23:46:56 -0700 (PDT)
On Tue, 18 Apr 2000, Guido van Rossum wrote:
>...
> > 1) Create a portable abstraction for using the platform's per-thread state
> > mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*.
>
> There are at least 7 other platform specific thread implementations --
> probably an 8th for the Mac. These all need to support this. (One
> solution would be to have a portable implementation that uses the
> thread-ID to index an array.)
Yes. As the platforms "come up to speed", they can replace the fallback,
portable implementation. "Users" of the TLS mechanism would allocate
indices into the per-thread arrays.
Another alternative is to only manage a mapping of thread-ID to
ThreadState structures. The TLS code can then get the ThreadState and
access the per-thread dict. Of course, the initial impetus is to solve the
lookup of the ThreadState rather than a general TLS mechanism :-)
Hmm. I'd say that we stick with defining a Python TLS API (in terms of the
platform when possible). The fallback code would be the per-thread arrays
design. "thread dict" would still exist, but is deprecated.
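A minimal sketch of the fallback "per-thread arrays" design, for illustration only. Every name here (tls_key_create, tls_set, tls_get, the fixed limits) is hypothetical and not an existing Python API. The thread ID is passed in explicitly; real code would presumably get it from the platform's thread layer. Locking around the table is omitted, so this shows the data layout only and is not thread-safe as written.

```c
#include <stddef.h>

#define MAX_THREADS 64   /* assumed limits, chosen only for the sketch */
#define MAX_KEYS    16

/* One row per thread: the thread's ID plus its array of slots. */
typedef struct {
    long  tid;
    int   in_use;
    void *slots[MAX_KEYS];
} thread_row;

static thread_row rows[MAX_THREADS];
static int next_key = 0;

/* "Users" of the mechanism allocate an index into every per-thread
   array. Returns -1 when no indices are left. */
int tls_key_create(void)
{
    return next_key < MAX_KEYS ? next_key++ : -1;
}

/* Find the row for tid; claim a free row when create is nonzero.
   A real implementation would hold a lock while scanning. */
static thread_row *find_row(long tid, int create)
{
    thread_row *free_row = NULL;
    for (int i = 0; i < MAX_THREADS; i++) {
        if (rows[i].in_use && rows[i].tid == tid)
            return &rows[i];
        if (!rows[i].in_use && free_row == NULL)
            free_row = &rows[i];
    }
    if (create && free_row != NULL) {
        free_row->in_use = 1;
        free_row->tid = tid;
    }
    return create ? free_row : NULL;
}

/* Store value in this thread's slot for key. Returns 0 on success. */
int tls_set(long tid, int key, void *value)
{
    if (key < 0 || key >= next_key)
        return -1;
    thread_row *row = find_row(tid, 1);
    if (row == NULL)
        return -1;
    row->slots[key] = value;
    return 0;
}

/* Fetch this thread's slot for key, or NULL if never set. */
void *tls_get(long tid, int key)
{
    if (key < 0 || key >= next_key)
        return NULL;
    thread_row *row = find_row(tid, 0);
    return row != NULL ? row->slots[key] : NULL;
}
```

A platform with native support (TLS on Win32, pthread_key_* on pthreads) would replace the table lookup and keep the same three entry points.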
>...
> > 3) Python needs an atomic increment/decrement (internal) operation.
> >
> > Rationale: these are used in INCREF/DECREF to correctly increment or
> > decrement the refcount in the face of multiple threads trying to do
> > this.
> >
> > Win32: InterlockedIncrement/Decrement. pthreads would use the
> > lightweight crit section above (on every INC/DEC!!). Some other
> > platforms may have specific capabilities to keep this fast. Note that
> > platforms (outside of their threading libraries) may have functions to
> > do this.
>
> I'm worried here that since INCREF/DECREF are used so much this will
> slow down significantly, especially on platforms that don't have safe
> hardware instructions for this.
This definitely slows Python down. If an object is known to be visible to
only one thread, then you can avoid the atomic inc/dec. But that leads to
madness :-)
> So it should only be enabled when free threading is turned on.
Absolutely. No question.
Note to readers: the different definitions of INCREF/DECREF have an impact
on mixing modules in the same way Py_TRACE_REFS does.
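A rough sketch of how the two INCREF/DECREF definitions could be selected at compile time. It uses C11 <stdatomic.h>, which postdates this thread, as a stand-in for InterlockedIncrement/Decrement or a critical section. The toy_object type and the TOY_* macros are hypothetical; only the WITH_FREE_THREAD name comes from the discussion.

```c
#include <stdatomic.h>

/* Assumption: in a real build this would come from configure
   (--with-free-thread), not be hard-coded. */
#define WITH_FREE_THREAD 1

typedef struct {
#ifdef WITH_FREE_THREAD
    atomic_long refcnt;
#else
    long refcnt;
#endif
} toy_object;

#ifdef WITH_FREE_THREAD
/* Atomic read-modify-write: safe when several threads touch the
   same object, but slower than a plain increment. */
#define TOY_INCREF(op) ((void)atomic_fetch_add(&(op)->refcnt, 1))
/* Evaluates to nonzero when this call dropped the count to zero. */
#define TOY_DECREF(op) (atomic_fetch_sub(&(op)->refcnt, 1) == 1)
#else
/* Plain versions: correct only while a single thread at a time
   can run interpreter code. */
#define TOY_INCREF(op) ((void)++(op)->refcnt)
#define TOY_DECREF(op) (--(op)->refcnt == 0)
#endif
```

Because the macros expand inline, an extension compiled without the flag would keep using the plain versions against a free-threaded core, which is the module-mixing problem noted above.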
> > 4) Python's configuration system needs to be updated to include a
> > --with-free-thread option since this will not be enabled by default.
> > Related changes to acconfig.h would be needed. Compiling in the above
> > pieces based on the flag would be nice (although Python could switch to
> > the crit section in some cases where it uses the heavy lock today)
> >
> > Rationale: duh
>
> Maybe there should be more fine-grained choices? As you say, some
> stuff could be used without this flag. But in any case this is
> trivial to add.
Sure.
For example, something like the Python TLS API could be keyed off
--with-threads. Replacing _PyThreadState_Current with a TLS-based
mechanism should be keyed on free threads.
The "critical section" stuff could be keyed on threading -- they would be
nice for Python to use internally for its standard threading operation.
> > 5) An analysis of Python's globals needs to be performed. Any global that
> > can safely be made "const" should. If a global is write-once (such as
> > classobject.c::getattrstr), then these are marginally okay (there is a
> > race condition, with an acceptable outcome, but a mem leak occurs).
> > Personally, I would prefer a general mechanism in Python for creating
> > "constants" which can be tracked by the runtime and freed.
>
> They are almost all string constants, right?
Yes, I believe so. (Analysis needed)
> How about a macro Py_CONSTSTROBJ("value", variable)?
Sure. Note that the variable name can usually be constructed from the
value.
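To illustrate constructing the variable name from the value, here is a small sketch using token pasting and stringification. CONST_STR and the const_ prefix are made up for this example, and it uses plain C strings rather than Python string objects; it only works when the value is a valid C identifier, which is the "usually" above.

```c
#include <string.h>

/* Hypothetical stand-in for Py_CONSTSTROBJ: the single argument
   supplies both the string value (#value) and the variable name
   (const_##value). */
#define CONST_STR(value) \
    static const char *const const_##value = #value

CONST_STR(__getattr__);   /* declares const___getattr__ = "__getattr__" */
CONST_STR(__name__);      /* declares const___name__    = "__name__"    */

/* Example use: compare an attribute name against a constant. */
int is_getattr_hook(const char *name)
{
    return strcmp(name, const___getattr__) == 0;
}
```

A runtime-tracked version would presumably also register each constant somewhere so it can be freed at shutdown; that part is not shown.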
> > I would also like to see a generalized "object pool" mechanism be built
> > and used for tuples, ints, floats, frames, etc.
>
> Careful though -- generalizing this will slow it down. (Here I find
> myself almost wishing for C++ templates :-)
:-)
This is a desire, but not a requirement. Same with the write-once stuff. A
general pool mechanism would reduce code duplication for lock management,
and possibly clarify some operations.
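As a sketch of what a shared pool might look like, here is a size-parameterized free list. The names (object_pool, pool_init, pool_alloc, pool_free) are hypothetical, and the lock is only marked in comments; the point is that the locking would live in one place instead of in each type's private free list.

```c
#include <stdlib.h>

/* A freed item is reused as a link in the free list. */
typedef struct pool_node {
    struct pool_node *next;
} pool_node;

typedef struct {
    size_t     item_size;   /* always >= sizeof(pool_node) */
    pool_node *free_list;
    /* a free-threaded build would also keep its lock here */
} object_pool;

void pool_init(object_pool *p, size_t item_size)
{
    p->item_size = item_size < sizeof(pool_node)
                       ? sizeof(pool_node) : item_size;
    p->free_list = NULL;
}

/* Reuse a cached item when one exists, else fall back to malloc. */
void *pool_alloc(object_pool *p)
{
    /* lock would be taken here */
    pool_node *n = p->free_list;
    if (n != NULL)
        p->free_list = n->next;
    /* lock would be released here */
    return n != NULL ? (void *)n : malloc(p->item_size);
}

/* Return an item to the pool instead of freeing it. */
void pool_free(object_pool *p, void *item)
{
    pool_node *n = item;
    /* lock would be taken here */
    n->next = p->free_list;
    p->free_list = n;
    /* lock would be released here */
}
```

The extra indirection through item_size is the kind of generalization cost raised in the quoted reply; a per-type free list can hard-code the size.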
>...
> > Note: making some globals "const" has a ripple effect through Python.
> > This is sometimes known as "const poisoning". Guido has stated an
> > acceptance to adding "const" throughout the interpreter, but would
> > prefer a complete (rather than ripple-based, partial) overhaul.
>
> Actually, it's okay to do this on an "as-needed" basis. I'm also in
> favor of changing all the K&R code to ANSI, and getting rid of
> Py_PROTO and friends. Cleaner code!
Yay! :-)
> > I think that is all for now. Achieving these five steps within the 1.6
> > timeframe means that the free-threading patches will be *much* smaller. It
> > also creates much more visibility and testing for these sections.
>
> Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough
> testing of some of these changes, the extensive nature of some of the
[ aside: most of these changes are specified with the intent of reducing
the impact on Python. most are additional behavior rather than changing
behavior. ]
> changes, and my other obligations during those 6 weeks, I don't see
> how it can be done for 1.6. I would prefer to do an accelerated 1.7
> or 1.6.1 release that incorporates all this. (It could be called
> 1.6.1 only if it's nearly identical to 1.6 for the Python user and not
> too different for the extension writer.)
Ah. That would be nice.
It also provides some focus on what would need to occur for the extension
writer:
*) Python TLS API
*) critical sections
*) WITH_FREE_THREAD from the configure process
The INCREF/DECREF and const-ness changes are hidden from the extension writer.
Adding integrity locks to list/dict/etc is also hidden.
> > Post 1.6, a patch set to add critical sections to lists and dicts would be
> > built. In addition, a new analysis would be done to examine the globals
> > that are available along with possible race conditions in other mutable
> > types and structures. Not all structures will be made thread-safe; for
> > example, frame objects are used by a single thread at a time (I'm sure
> > somebody could find a way to have multiple threads use or look at them,
> > but that person can take a leap, too :-)
>
> It is unacceptable to have thread-unsafe structures that can be
> accessed in a thread-unsafe way using pure Python code only.
Hmm. I guess that I can grab a frame object reference via a traceback
object. The frame and traceback objects can then be shared between
threads. Now the question arises: if the original thread resumes execution
and starts modifying these objects (inside the interpreter since both are
readonly to Python), then the passed-to thread might see invalid data. I'm
not sure whether these objects have multi-field integrity constraints.
Conversely: if they don't, then changing a single field will simply
create a race condition with the passed-to thread. Oh, and assuming that
we remove a value from the structure before DECREF'ing it.
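The "remove before DECREF" ordering can be sketched as follows. The toy_object and toy_frame types and the f_value field are invented for the example and are not the real frame layout. The sketch only shows the ordering; the store itself is not atomic, so it narrows the window rather than removing the race.

```c
#include <stddef.h>

typedef struct toy_object {
    long refcnt;
    void (*dealloc)(struct toy_object *);   /* may be NULL */
} toy_object;

typedef struct {
    toy_object *f_value;   /* hypothetical field of a frame-like struct */
} toy_frame;

/* Drop one reference; run the destructor when the count hits zero. */
static void toy_decref(toy_object *op)
{
    if (op != NULL && --op->refcnt == 0 && op->dealloc != NULL)
        op->dealloc(op);
}

/* Detach first, then release. Anyone who reads the field during or
   after the DECREF sees NULL rather than a pointer to an object
   that may already have been destroyed. */
void frame_clear_value(toy_frame *f)
{
    toy_object *tmp = f->f_value;
    f->f_value = NULL;
    toy_decref(tmp);
}
```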
By your "pure Python" statement, I'm presuming that you aren't worried
about PyTuple_SET_ITEM() and similar. However, do you really want to start
locking up the frame and traceback objects? (and code objects and ...)
Cheers,
-g
--
Greg Stein, http://www.lyra.org/