[Python-Dev] baby steps for free-threading

Greg Stein gstein@lyra.org
Tue, 18 Apr 2000 23:46:56 -0700 (PDT)


On Tue, 18 Apr 2000, Guido van Rossum wrote:
>...
> > 1) Create a portable abstraction for using the platform's per-thread state
> >    mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*.
> 
> There are at least 7 other platform-specific thread implementations --
> probably an 8th for the Mac.  These all need to support this.  (One
> solution would be to have a portable implementation that uses the
> thread-ID to index an array.)

Yes. As the platforms "come up to speed", they can replace the fallback,
portable implementation. "Users" of the TLS mechanism would allocate
indices into the per-thread arrays.
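Here is a minimal sketch of the per-thread array fallback. It assumes a fixed-size table of rows keyed by thread ID, with keys handed out by a simple counter. All names (tls_alloc_key, tls_set, tls_get, tls_release_thread) are hypothetical. The caller passes in the thread ID instead of the code fetching it from the platform, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback TLS: one row of slots per thread, found by thread ID.
   The caller supplies the thread ID; a real version would ask the platform's
   thread module for it. Locking is omitted: a real version would guard the
   table with a global lock. */
#define TLS_MAX_THREADS 64
#define TLS_MAX_KEYS    32

typedef struct {
    long tid;                     /* owning thread ID; 0 marks a free row */
    void *values[TLS_MAX_KEYS];   /* one slot per allocated key */
} tls_row;

static tls_row tls_rows[TLS_MAX_THREADS];
static int tls_next_key = 0;

/* Allocate an index that is valid in every thread's row; -1 when exhausted. */
int tls_alloc_key(void) {
    return tls_next_key < TLS_MAX_KEYS ? tls_next_key++ : -1;
}

/* Find the row owned by tid; if create is set, claim a free row when none exists. */
static tls_row *tls_find_row(long tid, int create) {
    tls_row *unused = NULL;
    for (int i = 0; i < TLS_MAX_THREADS; i++) {
        if (tls_rows[i].tid == tid)
            return &tls_rows[i];
        if (unused == NULL && tls_rows[i].tid == 0)
            unused = &tls_rows[i];
    }
    if (!create || unused == NULL)
        return NULL;             /* not found, or table full */
    unused->tid = tid;           /* values[] were zeroed at startup or on release */
    return unused;
}

int tls_set(long tid, int key, void *value) {
    assert(tid != 0 && key >= 0 && key < tls_next_key);
    tls_row *row = tls_find_row(tid, 1);
    if (row == NULL)
        return -1;
    row->values[key] = value;
    return 0;
}

void *tls_get(long tid, int key) {
    assert(tid != 0 && key >= 0 && key < tls_next_key);
    tls_row *row = tls_find_row(tid, 0);
    return row != NULL ? row->values[key] : NULL;
}

/* Called when a thread exits so its row can be reused. */
void tls_release_thread(long tid) {
    tls_row *row = tls_find_row(tid, 0);
    if (row != NULL)
        memset(row, 0, sizeof *row);
}
```

A linear scan is adequate for a handful of threads. A platform with native TLS would replace all of this.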

Another alternative is to only manage a mapping of thread-ID to
ThreadState structures. The TLS code can then get the ThreadState and
access the per-thread dict. Of course, the initial impetus is to solve the
lookup of the ThreadState rather than a general TLS mechanism :-)

Hmm. I'd say that we stick with defining a Python TLS API (in terms of the
platform when possible). The fallback code would be the per-thread arrays
design. "thread dict" would still exist, but would be deprecated.
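Where the platform does provide TLS, the API can be a thin wrapper. Below is a minimal sketch over POSIX thread-specific data (pthread_key_create, pthread_setspecific, pthread_getspecific, pthread_key_delete). The py_tls_* names are hypothetical and are not an existing Python API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical "Python TLS API" defined in terms of the platform: here POSIX
   thread-specific data. A Win32 build would map the same four calls onto
   TlsAlloc, TlsSetValue, TlsGetValue and TlsFree; platforms with neither
   would use the per-thread arrays fallback. */
typedef pthread_key_t py_tls_key;

/* Returns 0 on success. No destructor: owners free per-thread values. */
int py_tls_create(py_tls_key *key) {
    assert(key != NULL);
    return pthread_key_create(key, NULL);
}

int py_tls_set(py_tls_key key, void *value) {
    return pthread_setspecific(key, value);
}

/* NULL until the calling thread stores something under key. */
void *py_tls_get(py_tls_key key) {
    return pthread_getspecific(key);
}

void py_tls_delete(py_tls_key key) {
    (void)pthread_key_delete(key);
}

/* Thread body used to show isolation: a fresh thread sees NULL. */
static void *peek_in_new_thread(void *arg) {
    return py_tls_get(*(py_tls_key *)arg);
}
```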

>...
> > 3) Python needs an atomic increment/decrement (internal) operation.
> > 
> >    Rationale: these are used in INCREF/DECREF to correctly increment or
> >    decrement the refcount in the face of multiple threads trying to do
> >    this.
> > 
> >    Win32: InterlockedIncrement/Decrement. pthreads would use the
> >    lightweight crit section above (on every INC/DEC!!). Some other
> >    platforms may have specific capabilities to keep this fast. Note that
> >    platforms (outside of their threading libraries) may have functions to
> >    do this.
> 
> I'm worried here that since INCREF/DECREF are used so much this will
> slow down significantly, especially on platforms that don't have safe
> hardware instructions for this.

This definitely slows Python down. If an object is known to be visible to
only one thread, then you can avoid the atomic inc/dec. But that leads to
madness :-)

> So it should only be enabled when free threading is turned on.

Absolutely. No question.

Note to readers: the different definitions of INCREF/DECREF have an impact
on mixing modules in the same way Py_TRACE_REFS does.

> > 4) Python's configuration system needs to be updated to include a
> >    --with-free-thread option since this will not be enabled by default.
> >    Related changes to acconfig.h would be needed. Compiling in the above
> >    pieces based on the flag would be nice (although Python could switch to
> >    the crit section in some cases where it uses the heavy lock today)
> > 
> >    Rationale: duh
> 
> Maybe there should be more fine-grained choices?  As you say, some
> stuff could be used without this flag.  But in any case this is
> trivial to add.

Sure.

For example, something like the Python TLS API could be keyed off
--with-threads. Replacing _PyThreadState_Current with a TLS-based
mechanism should be keyed on free threads.
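This sketch shows how the current-thread-state lookup might be keyed on the flag. It assumes hypothetical tstate_get/tstate_swap helpers and uses C11 _Thread_local as a stand-in for the Python TLS API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct thread_state {
    int id;
} thread_state;

#ifdef WITH_FREE_THREAD
/* Free threading: several OS threads run Python code at once, so each needs
   its own "current" pointer. _Thread_local stands in for the TLS API. */
static _Thread_local thread_state *current_tstate = NULL;
#else
/* Global interpreter lock: only the lock holder touches this single global,
   the way _PyThreadState_Current works today. */
static thread_state *current_tstate = NULL;
#endif

/* Install ts as the current thread state and return the previous one. */
thread_state *tstate_swap(thread_state *ts) {
    thread_state *old = current_tstate;
    current_tstate = ts;
    return old;
}

/* Callers must have installed a thread state first. */
thread_state *tstate_get(void) {
    assert(current_tstate != NULL);
    return current_tstate;
}
```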

The "critical section" stuff could be keyed on threading -- they would be
nice for Python to use internally for its standard threading operation.
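Here is a minimal sketch of a critical-section abstraction keyed on WITH_THREAD. It is a no-op without threads, uses CRITICAL_SECTION on Win32, and uses a pthread mutex elsewhere. The crit_* and counter_* names are hypothetical.

```c
#include <assert.h>

#if !defined(WITH_THREAD)
typedef int crit_section;            /* no threads: nothing to protect against */
#define crit_init(cs)    ((void)(cs))
#define crit_enter(cs)   ((void)(cs))
#define crit_leave(cs)   ((void)(cs))
#define crit_destroy(cs) ((void)(cs))
#elif defined(_WIN32)
#include <windows.h>
typedef CRITICAL_SECTION crit_section;
#define crit_init(cs)    InitializeCriticalSection(cs)
#define crit_enter(cs)   EnterCriticalSection(cs)
#define crit_leave(cs)   LeaveCriticalSection(cs)
#define crit_destroy(cs) DeleteCriticalSection(cs)
#else
#include <pthread.h>
typedef pthread_mutex_t crit_section;
#define crit_init(cs)    pthread_mutex_init((cs), NULL)
#define crit_enter(cs)   pthread_mutex_lock(cs)
#define crit_leave(cs)   pthread_mutex_unlock(cs)
#define crit_destroy(cs) pthread_mutex_destroy(cs)
#endif

/* Example use: a shared counter guarded by one critical section. */
static crit_section counter_lock;
static long counter = 0;

void counter_setup(void) {
    crit_init(&counter_lock);
}

long counter_bump(void) {
    crit_enter(&counter_lock);
    long v = ++counter;
    crit_leave(&counter_lock);
    return v;
}
```

Building with -DWITH_THREAD selects the mutex version. The calling code stays the same either way.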

> > 5) An analysis of Python's globals needs to be performed. Any global that
> >    can safely be made "const" should. If a global is write-once (such as
> >    classobject.c::getattrstr), then these are marginally okay (there is a 
> >    race condition, with an acceptable outcome, but a mem leak occurs).
> >    Personally, I would prefer a general mechanism in Python for creating
> >    "constants" which can be tracked by the runtime and freed.
> 
> They are almost all string constants, right?

Yes, I believe so. (Analysis needed)

> How about a macro Py_CONSTSTROBJ("value", variable)?

Sure. Note that the variable name can usually be constructed from the
value.
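One way a macro could derive the variable from the value is token pasting plus stringification. The sketch below uses plain C strings in place of string objects and hypothetical names (CONST_STRINGS, init_constants, fini_constants). Creating every constant at startup avoids the write-once race, and the single list lets the runtime free them all at shutdown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list of constant strings; each entry is the string's value,
   and the variable name is built from it. */
#define CONST_STRINGS(X) \
    X(__getattr__)        \
    X(__setattr__)        \
    X(__delattr__)

/* One global per constant, e.g. conststr___getattr__ holds "__getattr__". */
#define DECLARE_CONST(name) static char *conststr_##name = NULL;
CONST_STRINGS(DECLARE_CONST)

static char *make_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Create every constant once, before other threads exist. On failure the
   caller should still call fini_constants() to release the ones made. */
int init_constants(void) {
#define INIT_CONST(name) \
    if ((conststr_##name = make_str(#name)) == NULL) return -1;
    CONST_STRINGS(INIT_CONST)
#undef INIT_CONST
    return 0;
}

/* Free them all at shutdown: the runtime knows every constant. */
void fini_constants(void) {
#define FINI_CONST(name) free(conststr_##name); conststr_##name = NULL;
    CONST_STRINGS(FINI_CONST)
#undef FINI_CONST
}
```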

> >    I would also like to see a generalized "object pool" mechanism be built
> >    and used for tuples, ints, floats, frames, etc.
> 
> Careful though -- generalizing this will slow it down.  (Here I find
> myself almost wishing for C++ templates :-)

:-)

This is a desire, but not a requirement. Same with the write-once stuff. A
general pool mechanism would reduce code duplication for lock management,
and possibly clarify some operations.
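Here is a minimal sketch of a generic free-list pool with all the locking in one place. It assumes hypothetical pool_* names and a pthread mutex. It also shows the cost Guido mentions: the item size is a runtime field, and every call goes through a function and a lock, whereas the existing free lists are hand-written for each type.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Freed items are reused as list nodes, so items must be at least this big. */
typedef struct pool_node { struct pool_node *next; } pool_node;

typedef struct {
    size_t item_size;      /* must be >= sizeof(pool_node) */
    size_t max_free;       /* cap on cached items */
    size_t nfree;
    pool_node *free_list;
    pthread_mutex_t lock;  /* the one lock every pooled type shares */
} pool;

void pool_init(pool *p, size_t item_size, size_t max_free) {
    assert(item_size >= sizeof(pool_node));
    p->item_size = item_size;
    p->max_free = max_free;
    p->nfree = 0;
    p->free_list = NULL;
    pthread_mutex_init(&p->lock, NULL);
}

/* Reuse a cached item if one exists, otherwise fall back to malloc. */
void *pool_alloc(pool *p) {
    pthread_mutex_lock(&p->lock);
    pool_node *n = p->free_list;
    if (n != NULL) {
        p->free_list = n->next;
        p->nfree--;
    }
    pthread_mutex_unlock(&p->lock);
    return n != NULL ? (void *)n : malloc(p->item_size);
}

/* Cache the item while under the cap; free it otherwise. */
void pool_free(pool *p, void *item) {
    pthread_mutex_lock(&p->lock);
    if (p->nfree < p->max_free) {
        pool_node *n = item;
        n->next = p->free_list;
        p->free_list = n;
        p->nfree++;
        item = NULL;               /* cached, so don't free it below */
    }
    pthread_mutex_unlock(&p->lock);
    free(item);                    /* free(NULL) is a no-op */
}
```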

>...
> >    Note: making some globals "const" has a ripple effect through Python.
> >    This is sometimes known as "const poisoning". Guido has stated an
> >    acceptance to adding "const" throughout the interpreter, but would
> >    prefer a complete (rather than ripple-based, partial) overhaul.
> 
> Actually, it's okay to do this on an "as-needed" basis.  I'm also in
> favor of changing all the K&R code to ANSI, and getting rid of
> Py_PROTO and friends.  Cleaner code!
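This small illustration of the ripple uses hypothetical function names. Once a table becomes const, any function that receives its entries needs a const parameter, and so do that function's callers.

```c
#include <assert.h>
#include <string.h>

/* A table made const: neither the pointers nor the characters may change. */
static const char *const special_names[] = { "__init__", "__del__", "__repr__" };

/* Callers now hold const char * values, so this parameter must be const too.
   Had it stayed "char *name", passing a const string would draw a
   discarded-qualifier diagnostic. In K&R style, "int is_special(name) char
   *name;", there is no prototype, so the compiler could not check this at
   all; ANSI prototypes are what make const enforceable. */
int is_special(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof special_names / sizeof special_names[0]; i++) {
        if (strcmp(name, special_names[i]) == 0)
            return 1;
    }
    return 0;
}

/* One level up: forwarding the argument without a cast forces const here
   as well. This is the "ripple". */
int needs_slot(const char *attr) {
    return is_special(attr);
}
```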

Yay! :-)

> > I think that is all for now. Achieving these five steps within the 1.6
> > timeframe means that the free-threading patches will be *much* smaller. It
> > also creates much more visibility and testing for these sections.
> 
> Alas.  Given the timeframe for 1.6 (6 weeks!), the need for thorough
> testing of some of these changes, the extensive nature of some of the

[ Aside: most of these changes are specified with the intent of reducing
  the impact on Python. Most of them add behavior rather than change
  existing behavior. ]

> changes, and my other obligations during those 6 weeks, I don't see
> how it can be done for 1.6.  I would prefer to do an accelerated 1.7
> or 1.6.1 release that incorporates all this.  (It could be called
> 1.6.1 only if it's nearly identical to 1.6 for the Python user and not
> too different for the extension writer.)

Ah. That would be nice.

It also provides some focus on what would need to occur for the extension
writer:

*) Python TLS API
*) critical sections
*) WITH_FREE_THREAD from the configure process

The INCREF/DECREF and const-ness changes are hidden from the extension writer.
Adding integrity locks to list/dict/etc is also hidden.

> > Post 1.6, a patch set to add critical sections to lists and dicts would be
> > built. In addition, a new analysis would be done to examine the globals
> > that are available along with possible race conditions in other mutable
> > types and structures. Not all structures will be made thread-safe; for
> > example, frame objects are used by a single thread at a time (I'm sure
> > somebody could find a way to have multiple threads use or look at them,
> > but that person can take a leap, too :-)
> 
> It is unacceptable to have thread-unsafe structures that can be
> accessed in a thread-unsafe way using pure Python code only.
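Here is a minimal sketch of an integrity lock on a list-like object. It assumes a hypothetical list type with one pthread mutex per object. Append updates items, size and allocated together, and the lock stops another thread from seeing them half-updated (for example, two threads both reallocating and one write landing in freed memory).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    void **items;
    size_t size;           /* slots in use */
    size_t allocated;      /* slots available */
    pthread_mutex_t lock;  /* guards all three fields above */
} list;

int list_init(list *l) {
    l->items = NULL;
    l->size = l->allocated = 0;
    return pthread_mutex_init(&l->lock, NULL);
}

/* Returns 0 on success, -1 if growing the array fails. */
int list_append(list *l, void *item) {
    int rc = 0;
    pthread_mutex_lock(&l->lock);
    if (l->size == l->allocated) {
        size_t n = l->allocated ? l->allocated * 2 : 4;
        void **p = realloc(l->items, n * sizeof *p);
        if (p == NULL) {
            rc = -1;
        } else {
            l->items = p;
            l->allocated = n;
        }
    }
    if (rc == 0)
        l->items[l->size++] = item;
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* Bounds check and read happen under the same lock. */
void *list_get(list *l, size_t i) {
    void *v = NULL;
    pthread_mutex_lock(&l->lock);
    if (i < l->size)
        v = l->items[i];
    pthread_mutex_unlock(&l->lock);
    return v;
}
```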

Hmm. I guess that I can grab a frame object reference via a traceback
object. The frame and traceback objects can then be shared between
threads. Now the question arises: if the original thread resumes execution
and starts modifying these objects (inside the interpreter since both are
readonly to Python), then the passed-to thread might see invalid data. I'm
not sure whether these objects have multi-field integrity constraints.
Conversely, if they don't, then changing a single field will simply
create a race condition with the passed-to thread, provided that we
remove a value from the structure before DECREF'ing it.
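This sketch shows the "remove before DECREF" ordering with a hypothetical toy object and holder type. The destructor hook checks that the field is already NULL when it runs. Under free threading, the read-and-clear would itself need the object's lock or an atomic exchange; the sketch shows only the ordering. Later CPython releases package the same pattern as Py_CLEAR.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object with an optional destructor hook. */
typedef struct obj {
    long refcnt;
    void (*on_dealloc)(struct obj *);
} obj;

/* Something that owns a reference, like a frame or traceback field. */
typedef struct {
    obj *field;
} holder;

static void decref(obj *o) {
    if (--o->refcnt == 0) {
        if (o->on_dealloc != NULL)
            o->on_dealloc(o);
        free(o);
    }
}

/* Detach first, then drop the reference. A destructor (or anyone else)
   reading h->field during decref() sees NULL, not a dying object. */
static void clear_field(holder *h) {
    obj *tmp = h->field;
    if (tmp != NULL) {
        h->field = NULL;
        decref(tmp);
    }
}

/* Demo hook: record whether the watched holder was already cleared. */
static holder *watched;
static int saw_null = -1;

static void check_holder(obj *o) {
    (void)o;
    saw_null = (watched->field == NULL);
}
```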

By your "pure Python" statement, I'm presuming that you aren't worried
about PyTuple_SET_ITEM() and similar. However, do you really want to start
locking up the frame and traceback objects? (and code objects and ...)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/