[Python-Dev] Pre-PEP: Task-local variables
Phillip J. Eby
pje at telecommunity.com
Thu Oct 20 03:40:35 CEST 2005
This is still rather rough, but I figured it's easier to let everybody fill
in the remaining gaps by arguments than it is for me to pick a position I
like and try to convince everybody else that it's right. :) Your feedback
is requested and welcome.
Title: Task-local Variables
Author: Phillip J. Eby <pje at telecommunity.com>
Type: Standards Track
Many Python modules provide some kind of global or thread-local state,
which is relatively easy to implement. With the acceptance of PEP
342, however, co-routines will become more common, and it will be
desirable in many cases to treat each as its own logical thread of
execution. So, many kinds of state that might now be kept as a
thread-specific variable (such as the "current transaction" in ZODB or
the "current database connection" in SQLObject) will not work with
co-routines.
This PEP proposes a simple mechanism akin to thread-local variables,
but which will make it easy and efficient for co-routine schedulers to
switch state between tasks. The mechanism is proposed for the standard
library because its usefulness is dependent on its adoption by
standard library modules, such as the ``decimal`` module. The proposed
features can be implemented as pure Python code, and as such are
suitable for use by other Python implementations (including older
versions of Python, if desired).
PEP 343's new "with" statement makes it very attractive to temporarily
alter some aspect of system state, and then restore it, using a
context manager. Many of PEP 343's examples are of this nature,
whether they are temporarily redirecting ``sys.stdout``, or
temporarily altering decimal precision.
But when this attractive feature is combined with PEP 342-style
co-routines, a new challenge emerges. Consider this code, which may
misbehave if run as a co-routine::

    with opening(filename, "w") as f:
        with redirecting_stdout(f):
            print "Hello world"
            yield pause(5)
            print "Goodbye world"

Problems can arise from this code in two ways. First, the redirection
of output "leaks out" to other coroutines during the pause. Second,
when this coroutine is finished, it resets stdout to whatever it was
at the beginning of the coroutine, regardless of what another
co-routine might have been using.
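
The first problem can be reproduced with plain generators. The following is only a sketch in modern Python 3 syntax, not code from this proposal: ``redirected_task``, ``innocent_task`` and ``run_demo`` are hypothetical names, and the standard library's ``contextlib.redirect_stdout`` stands in for the PEP 343 ``redirecting_stdout`` example.

```python
import contextlib
import io
import sys


def redirected_task(buffer):
    # Redirects the process-wide sys.stdout, then pauses inside the block.
    with contextlib.redirect_stdout(buffer):
        print("Hello world")
        yield  # a scheduler may run another task here
        print("Goodbye world")


def innocent_task():
    # Expects its output to go to the real stdout.
    print("unrelated output")
    yield


def run_demo():
    captured = io.StringIO()
    original = sys.stdout
    first, second = redirected_task(captured), innocent_task()
    next(first)   # runs up to the pause; stdout is now redirected
    next(second)  # runs while the redirection is still in effect
    for task in (first, second):
        for _ in task:  # run each task to completion
            pass
    assert sys.stdout is original
    return captured.getvalue()
```

The string returned by ``run_demo()`` contains the line printed by ``innocent_task``, even though that task never asked for redirection.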
Similar issues can be demonstrated using the decimal context,
transactions, database connections, etc., which are all likely to be
popular contexts for the "with" statement. However, if these new
context managers are written to use global or thread-local state,
coroutines will be locked out of the market, so to speak.
Therefore, this PEP proposes to provide and promote a standard way of
managing per-execution-context state, such that coroutine schedulers
can keep each coroutine's state distinct. If this mechanism is then
used by library modules (such as ``decimal``) to maintain their
current state, then they will be transparently compatible with
co-routines as well as threaded and threadless code.
(Note that for Python 2.x versions, backward compatibility requires
that we continue to allow direct reassignment to e.g. ``sys.stdout``.
So, it will still of course be possible to write code that will
interoperate poorly with co-routines. But for Python 3.x it seems
worth considering making some of the ``sys`` module's contents into
task-local variables rather than assignment targets.)
This PEP proposes to offer a standard library module called
``context``, with the following core contents:

``Variable``
    A class that allows creation of a context variable (see below).

``snapshot()``
    Returns a snapshot of the current execution context.

``swap(ctx)``
    Set the current context to `ctx`, returning a snapshot of the
    context that was current before the call.

The basic idea here is that a co-routine scheduler can switch between
tasks by doing something like::

    last_coroutine.state = context.swap(next_coroutine.state)

Or perhaps more like::

    # ... execute coroutine iteration
    last_coroutine.state = context.snapshot()
    # ... figure out what routine to run next
    context.swap(next_coroutine.state)

Each ``context.Variable`` stores and retrieves its state using the
current execution context, which is thread-specific. (Thus, each
thread may execute any number of concurrent tasks, although most
practical systems today have only one thread that executes coroutines,
the other threads being reserved for operations that would otherwise
block co-routine execution. Nonetheless, such other threads will often
still require context variables of their own.)
Context Variable Objects
A context variable object provides the following methods:

``get(default=None)``
    Return the value of the variable in the current execution context,
    or `default` if not set.

``set(value)``
    Set the value of the variable for the current execution context.

``unset()``
    Delete the value of the variable for the current execution context.

``__call__(*value)``
    If called with an argument, return a context manager that sets the
    variable to the specified value, then restores the old value upon
    ``__exit__``. If called without an argument, return the value of
    the variable for the current execution context, or raise an error
    if no value is set. Thus::

        with some_variable(value):
            foo()

    would be roughly equivalent to::

        old = some_variable()
        some_variable.set(value)
        try:
            foo()
        finally:
            some_variable.set(old)

The simplest possible implementation is for ``Variable`` objects to
use themselves as unique keys into an execution context dictionary.
The context dictionary would be stored in another dictionary, keyed by
the current thread's identifier (e.g. ``thread.get_ident()``). This
approach would work with almost any version or implementation of
Python.
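
As a rough illustration of this simplest approach, here is a minimal sketch in modern Python 3 syntax. It is not the author's prototype. It assumes ``threading.get_ident()`` as the thread key, and the helper names ``_contexts``, ``_NOT_SET``, ``_current``, ``_Setting``, ``run_round_robin``, ``precision`` and ``task`` are all hypothetical. The small round-robin scheduler shows how ``swap()`` might be used to give each generator-based task its own context.

```python
import threading

_contexts = {}        # thread ident -> that thread's context dictionary
_NOT_SET = object()   # sentinel: distinguishes "no value" from None


def _current():
    """Return the calling thread's context dict, creating it on first use."""
    return _contexts.setdefault(threading.get_ident(), {})


def snapshot():
    """Return a copy of the current execution context."""
    return dict(_current())


def swap(ctx):
    """Install a copy of `ctx`; return the context that was current."""
    ident = threading.get_ident()
    old = _contexts.get(ident, {})
    _contexts[ident] = dict(ctx)
    return old


class _Setting:
    """Context manager: set a variable, then restore its previous state."""
    def __init__(self, var, value):
        self.var, self.value = var, value

    def __enter__(self):
        self.old = self.var.get(_NOT_SET)
        self.var.set(self.value)
        return self.value

    def __exit__(self, *exc_info):
        if self.old is _NOT_SET:
            self.var.unset()
        else:
            self.var.set(self.old)
        return False


class Variable:
    """Each instance is its own unique key into the context dictionary."""
    def get(self, default=None):
        return _current().get(self, default)

    def set(self, value):
        _current()[self] = value

    def unset(self):
        _current().pop(self, None)

    def __call__(self, *value):
        if value:
            return _Setting(self, *value)
        result = self.get(_NOT_SET)
        if result is _NOT_SET:
            raise LookupError("context variable has no value")
        return result


def run_round_robin(tasks):
    """Run generator-based tasks in turn, each with a private context."""
    pending = [(t, {}) for t in tasks]
    while pending:
        t, state = pending.pop(0)
        scheduler_state = swap(state)       # switch to the task's context
        try:
            next(t)
            finished = False
        except StopIteration:
            finished = True
        finally:
            state = swap(scheduler_state)   # save it and switch back
        if not finished:
            pending.append((t, state))


precision = Variable()


def task(name, value, log):
    with precision(value):
        yield                               # pause; other tasks run here
        log.append((name, precision()))
```

Running two such tasks with different values through ``run_round_robin`` leaves each one seeing only its own setting after the pause. One caveat of this sketch: thread identifiers can be reused after a thread exits, so a real implementation would need to discard the entry for a dead thread.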
For efficiency's sake, however, CPython could simply store the
execution context dictionary in its "thread state" structure, creating
an empty dictionary at thread initialization time. This would make it
somewhat easier to offer a C API for access to context variables,
especially where efficiency of access is desirable. But the proposal
does not depend on this.
In the PEP author's experiments, a simple copy-on-write optimization
to the ``set()`` and ``unset()`` methods allows for high
performance task switching. By placing a "frozen" flag in the context
dictionary when a snapshot is taken, and then checking for the flag
before making changes, a single snapshot can be shared by multiple
callers, and thus a ``swap()`` operation is little more than two
dictionary writes and a read. This leads to higher performance in the
typical case, because context variables are more likely to be set in
outer loops, but task switches are more likely to occur in inner
loops. A copy-on-write approach thus prevents copying from occurring
during most task switches.
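
The copy-on-write idea could be sketched roughly as follows. This is one guess at the technique described above, in modern Python 3 syntax, and not the author's actual experiment. It assumes a ``threading.local`` object in place of CPython's thread state, and the names ``_FROZEN``, ``_state``, ``_ctx`` and ``_writable`` are hypothetical. Context dictionaries are treated as opaque: ``swap()`` marks whatever it is given as frozen.

```python
import threading

_FROZEN = object()          # key whose presence marks a context as shared
_state = threading.local()  # stand-in for a per-thread "thread state" slot


def _ctx():
    """Return the current thread's context dictionary."""
    try:
        return _state.ctx
    except AttributeError:
        _state.ctx = {}
        return _state.ctx


def _writable():
    """Return a context dict that is safe to mutate, copying if frozen."""
    ctx = _ctx()
    if _FROZEN in ctx:
        ctx = dict(ctx)     # the only place a copy is ever made
        del ctx[_FROZEN]
        _state.ctx = ctx
    return ctx


def snapshot():
    """Freeze and return the current context without copying it."""
    ctx = _ctx()
    ctx[_FROZEN] = True
    return ctx


def swap(new):
    """Install `new` as the current context; return the frozen old one."""
    old = snapshot()
    new[_FROZEN] = True     # guard against later in-place mutation
    _state.ctx = new
    return old


class Variable:
    def get(self, default=None):
        return _ctx().get(self, default)

    def set(self, value):
        _writable()[self] = value

    def unset(self):
        _writable().pop(self, None)
```

In this sketch, repeated calls to ``snapshot()`` with no intervening ``set()`` return the same dictionary object, and a task switch copies nothing unless the incoming task later modifies a variable.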
The core of this proposal is extremely minimalist, as it should be
possible to do almost anything desired using combinations of
``Variable`` objects or by simply using variables whose values are
mutable objects. There are, however, a variety of options for
enhancement:
The ``context`` module could perhaps be the home of the PEP 343
``contextmanager`` decorator, effectively renamed to
``context.manager``. This could be a natural fit, in that it would
remind the creators of new context managers that they should
consider tracking any associated state in a ``context.Variable``.
Sometimes it's useful to have an object that looks like a module
global (e.g. ``sys.stdout``) but which actually delegates its
behavior to a context-specific instance. Thus, you could have one
``sys.stdout``, but its actual output would be directed based on
the current execution context. The simplest form of such a proxy
class might look something like::
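
    class Proxy(object):
        def __init__(self, initial_value):
            var = context.Variable()
            var.set(initial_value)
            # bypass our own __getattribute__ when storing the variable
            object.__setattr__(self, 'var', var)

        def __call__(self, *value):
            return object.__getattribute__(self, 'var')(*value)

        def __getattribute__(self, attr):
            # delegate to the value for the current execution context
            var = object.__getattribute__(self, 'var')
            return getattr(var(), attr)

    sys.stdout = Proxy(sys.stdout)  # make sys.stdout selectable

    with sys.stdout(somefile):  # temporary redirect in current context
        print "hey!"
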
The main open issues in implementing this sort of proxy are in the
precise set of special methods (e.g. ``__getitem__``,
``__setattr__``, etc.) that should be supported, and what API
should be supplied for changing the value, setting a default value
for new threads, etc.
Currently, this PEP does not specify an API for accessing and
modifying the current execution context, nor a C API for such
access. It currently assumes that ``snapshot()``, ``swap()`` and
``Variable`` are the only public means of accessing context
information. It may be desirable to offer finer-grained APIs for
more advanced uses (such as creating an API for management
of proxies). And it may be desirable to have a C API for use by
Python extensions that want convenient access to context
variables.
Different libraries have different uses for maintaining a "current"
state, be it global or local to a specific thread or task. There is
currently no way for task-management code to find and switch all of
these "current" states. And even if it could, task switching
performance would degrade linearly as new libraries were added.
One possible alternative approach to this proposal would be for
explicit task objects to exist, and to provide a way to give them
identities, so that libraries could instead store their own state
as a property of the task, rather than storing their state in a
task-specific mapping. This offers similar potential performance
to a copy-on-write strategy, but would use more memory than this
proposal when only one task is involved. (Because each variable
would have a dictionary mapping from task to the variable's value, but
in this proposal there is simply a single dictionary for the task.)
Some languages offer "dynamically scoped" variables that are somewhat
similar in behavior to the context variables proposed by this PEP.
The principal differences are that:
1. Context variables are objects used to obtain or save a value,
rather than being a syntactic construct of the language.
2. PEP 343 allows for *controlled* manipulation of context variables,
helping to prevent "duelling libraries" from changing state on each
other. Also, a library can potentially ``snapshot()`` a desired
state at startup, and use ``swap()`` to restore that state on
re-entry. (And could even define a simple decorator to wrap its
entry points to ensure this.)
3. The PEP author is not aware of any language that explicitly offers
coroutine-scoped variables, but presumes that they can be modelled
with monads or continuations in functional languages like Haskell.
(And I only mention this to forestall the otherwise-inevitable
response from fans of such techniques, pointing out that it's
possible.)
The author has prototyped an implementation with somewhat fancier
features than shown here, but prefers not to publish it until the
basic features and choices of optional functionality have been
discussed on Python-Dev.
This document has been placed in the public domain.