[Python-Dev] Pre-PEP: Task-local variables

Thu Oct 20 03:40:35 CEST 2005

This is still rather rough, but I figured it's easier to let everybody fill 
in the remaining gaps by arguments than it is for me to pick a position I 
like and try to convince everybody else that it's right.  :)  Your feedback 
is requested and welcome.

PEP: XXX
Title: Task-local Variables
Author: Phillip J. Eby <pje at telecommunity.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Oct-2005
Python-Version: 2.5
Post-History: 19-Oct-2005

Abstract
========

Many Python modules provide some kind of global or thread-local state,
which is relatively easy to implement.  With the acceptance of PEP
342, however, co-routines will become more common, and it will be
desirable in many cases to treat each as its own logical thread of
execution.  As a result, many kinds of state that might now be kept in
a thread-specific variable (such as the "current transaction" in ZODB
or the "current database connection" in SQLObject) will not work
correctly with co-routines, because several co-routines can share a
single thread.

This PEP proposes a simple mechanism akin to thread-local variables,
but which will make it easy and efficient for co-routine schedulers to
switch state between tasks.  The mechanism is proposed for the standard
library because its usefulness is dependent on its adoption by
standard library modules, such as the ``decimal`` module. The proposed
features can be implemented as pure Python code, and as such are
suitable for use by other Python implementations (including older
versions of Python, if desired).

Motivation
==========

PEP 343's new "with" statement makes it very attractive to temporarily
alter some aspect of system state, and then restore it, using a
context manager.  Many of PEP 343's examples are of this nature,
whether they are temporarily redirecting ``sys.stdout``, or
temporarily altering decimal precision.

But when this attractive feature is combined with PEP 342-style
co-routines, a new challenge emerges.  Consider this code, which may
misbehave if run as a co-routine::

         with opening(filename, "w") as f:
             with redirecting_stdout(f):
                 print "Hello world"
                 yield pause(5)
                 print "Goodbye world"

Problems can arise from this code in two ways.  First, the redirection
of output "leaks out" to other coroutines during the pause.  Second,
when this coroutine is finished, it resets stdout to whatever it was
at the beginning of the coroutine, regardless of what another
co-routine might have been using.
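
The leak can be reproduced without any real I/O.  The following is a
minimal sketch, not code from PEP 342 or PEP 343: ``current_target``
is a hypothetical stand-in for a global such as ``sys.stdout``, the
scheduler is a naive round-robin loop, and the syntax is that of a
current Python 3 interpreter so the sketch can be run as-is.

```python
# Hypothetical stand-in for a piece of global state such as sys.stdout.
current_target = "console"
log = []          # records (target, text) pairs in the order they were emitted


def emit(text):
    log.append((current_target, text))


def redirecting_task():
    global current_target
    saved = current_target
    current_target = "file"        # like entering redirecting_stdout(f)
    try:
        emit("Hello world")
        yield                      # pause; the scheduler runs another task
        emit("Goodbye world")
    finally:
        current_target = saved     # like leaving the "with" block


def other_task():
    emit("other task output")      # expects to write to the console
    yield


# Naive round-robin scheduler that knows nothing about the global state.
tasks = [redirecting_task(), other_task()]
while tasks:
    task = tasks.pop(0)
    try:
        next(task)
    except StopIteration:
        continue
    tasks.append(task)
```

After the loop finishes, the second entry in ``log`` is
``("file", "other task output")``: the second task wrote to the first
task's redirected target while the first task was paused.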

Similar issues can be demonstrated using the decimal context,
transactions, database connections, etc., which are all likely to be
popular contexts for the "with" statement.  However, if these new
context managers are written to use global or thread-local state,
coroutines will be locked out of the market, so to speak.

Therefore, this PEP proposes to provide and promote a standard way of
managing per-execution-context state, such that coroutine schedulers
can keep each coroutine's state distinct.  If this mechanism is then
used by library modules (such as ``decimal``) to maintain their
current state, then they will be transparently compatible with
co-routines as well as threaded and threadless code.

(Note that for Python 2.x versions, backward compatibility requires
that we continue to allow direct reassignment to e.g. ``sys.stdout``.
So, it will still of course be possible to write code that will
interoperate poorly with co-routines.  But for Python 3.x it seems
worth considering making some of the ``sys`` module's contents into
task-local variables rather than assignment targets.)

Specification
=============

This PEP proposes to offer a standard library module called
``context``, with the following core contents:

Variable
     A class that allows creation of a context variable (see below).

snapshot()
     Returns a snapshot of the current execution context.

swap(ctx)
     Set the current context to `ctx`, returning a snapshot of the
     context that was current before the call.

The basic idea here is that a co-routine scheduler can switch between
tasks by doing something like::

     last_coroutine.state = context.swap(next_coroutine.state)

Or perhaps more like::

     # ... execute coroutine iteration
     last_coroutine.state = context.snapshot()
     # ... figure out what routine to run next
     context.swap(next_coroutine.state)

Each ``context.Variable`` stores and retrieves its state using the
current execution context, which is thread-specific.  (Thus, each
thread may execute any number of concurrent tasks, although most
practical systems today have only one thread that executes coroutines,
the other threads being reserved for operations that would otherwise
block co-routine execution.  Nonetheless, such other threads will often
still require context variables of their own.)
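
A scheduler built on this idea might look roughly like the sketch
below.  Everything in it is an assumption made for illustration: the
``context`` module does not exist yet, so ``swap()`` and ``Variable``
are reduced to hypothetical stand-ins, ``swap()`` hands back the live
dictionary instead of a copy, and ``run()`` is a simple round-robin
loop written for a current Python 3 interpreter.

```python
import threading

# Hypothetical stand-in for the proposed module: one dictionary of
# variable values per thread, held in a threading.local.
_local = threading.local()


def _current():
    if not hasattr(_local, "ctx"):
        _local.ctx = {}
    return _local.ctx


def swap(ctx):
    """Install ``ctx`` as the current context; return the one it replaced."""
    old = _current()
    _local.ctx = ctx
    return old


class Variable(object):
    def get(self, default=None):
        return _current().get(self, default)

    def set(self, value):
        _current()[self] = value


user = Variable()
seen = []


def task(name):
    user.set(name)                 # private to this task's context
    for step in range(2):
        yield                      # give other tasks a turn
        seen.append((name, step, user.get()))


def run(tasks):
    """Round-robin scheduler that swaps contexts around every step."""
    queue = [(t, {}) for t in tasks]      # each task starts with an empty context
    while queue:
        coroutine, state = queue.pop(0)
        scheduler_state = swap(state)     # enter the task's context
        try:
            next(coroutine)
            finished = False
        except StopIteration:
            finished = True
        state = swap(scheduler_state)     # leave it, keeping whatever it set
        if not finished:
            queue.append((coroutine, state))


run([task("alice"), task("bob")])
```

Each task sees only the value it set itself, even though the two
tasks are interleaved, and the scheduler's own context never sees a
value for ``user``.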

Context Variable Objects
------------------------

A context variable object provides the following methods:

get(default=None)
     Return the value of the variable in the current execution context,
     or `default` if not set.

set(value)
     Set the value of the variable for the current execution context.

unset()
     Delete the value of the variable for the current execution context.

__call__(*value)
     If called with an argument, return a context manager that sets the
     variable to the specified value, then restores the old value upon
     ``__exit__``.  If called without an argument, return the value of
     the variable for the current execution context, or raise an error
     if no value is set.  Thus::

         with some_variable(value):
             foo()

     would be roughly equivalent to::

         old = some_variable()
         some_variable.set(value)
         try:
             foo()
         finally:
             some_variable.set(old)
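
The four methods above could be sketched as follows.  This is an
illustration under stated assumptions rather than the proposed
implementation: ``_context`` is a plain module-level dictionary
standing in for the current execution context, ``LookupError`` is an
arbitrary choice for the unspecified error, ``_setting`` is a
hypothetical helper, and a variable that had no value before the
``with`` block is assumed to go back to having none.

```python
import contextlib

_context = {}          # hypothetical stand-in for the current execution context
_missing = object()    # sentinel meaning "no value set"


class Variable(object):
    def get(self, default=None):
        return _context.get(self, default)

    def set(self, value):
        _context[self] = value

    def unset(self):
        _context.pop(self, None)

    def __call__(self, *value):
        if not value:
            # No argument: return the current value or fail loudly.
            try:
                return _context[self]
            except KeyError:
                raise LookupError("context variable has no value")
        return self._setting(*value)

    @contextlib.contextmanager
    def _setting(self, value):
        old = self.get(_missing)
        self.set(value)
        try:
            yield
        finally:
            # Assumption: a variable that was unset before goes back to unset.
            if old is _missing:
                self.unset()
            else:
                self.set(old)


precision = Variable()
precision.set(28)
with precision(10):
    inside = precision()       # value seen inside the block
after = precision()            # value seen once the block has exited
```

Here ``inside`` is 10 and ``after`` is 28 again.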

Implementation Details
----------------------

The simplest possible implementation is for ``Variable`` objects to
use themselves as unique keys into an execution context dictionary.
The context dictionary would be stored in another dictionary, keyed by
``thread.get_ident()``.  This approach would work with almost any
version or implementation of Python.

For efficiency's sake, however, CPython could simply store the
execution context dictionary in its "thread state" structure, creating
an empty dictionary at thread initialization time.  This would make it
somewhat easier to offer a C API for access to context variables,
especially where efficiency of access is desirable.  But the proposal
does not depend on this.
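
A pure-Python version of the dictionary-of-dictionaries approach
might look roughly like this.  It is a sketch under assumptions: it
targets a current Python 3 interpreter, where the thread identifier
comes from ``threading.get_ident()``; ``_contexts`` and ``_current``
are hypothetical names; snapshots are plain copies; and entries for
finished threads are never cleaned up, which a real implementation
would have to handle because thread identifiers can be reused.

```python
import threading

# Hypothetical module-level storage: thread ident -> that thread's context.
_contexts = {}


def _current():
    # setdefault creates an empty context the first time a thread asks.
    return _contexts.setdefault(threading.get_ident(), {})


def snapshot():
    """Return a copy of the calling thread's current context."""
    return dict(_current())


def swap(ctx):
    """Make a copy of ``ctx`` current; return a snapshot of the old context."""
    old = snapshot()
    _contexts[threading.get_ident()] = dict(ctx)
    return old


class Variable(object):
    # Each Variable instance is its own key into the context dictionary.
    def get(self, default=None):
        return _current().get(self, default)

    def set(self, value):
        _current()[self] = value

    def unset(self):
        _current().pop(self, None)


connection = Variable()
connection.set("main-db")

results = {}


def worker():
    results["before"] = connection.get()   # a new thread starts with no value
    connection.set("worker-db")
    results["after"] = connection.get()


t = threading.Thread(target=worker)
t.start()
t.join()
```

The worker thread starts without a value for ``connection`` and its
assignment does not disturb the main thread, which still sees
``"main-db"``.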

In the PEP author's experiments, a simple copy-on-write optimization
to the ``set()`` and ``unset()`` methods allows for high
performance task switching.  By placing a "frozen" flag in the context
dictionary when a snapshot is taken, and then checking for the flag
before making changes, a single snapshot can be shared by multiple
callers, and thus a ``swap()`` operation is little more than two
dictionary writes and a read.  This leads to higher performance in the
typical case, because context variables are more likely to be set in
outer loops, but task switches are more likely to occur in inner
loops.  A copy-on-write approach thus prevents copying from occurring
during most task switches.

Possible Enhancements
---------------------

The core of this proposal is extremely minimalist, as it should be
possible to do almost anything desired using combinations of
``Variable`` objects or by simply using variables whose values are
mutable objects.  There are, however, a variety of options for
enhancement:

``manager`` decorator
     The ``context`` module could perhaps be the home of the PEP 343
     ``contextmanager`` decorator, effectively renamed to
     ``context.manager``.  This could be a natural fit, in that it would
     remind the creators of new context managers that they should
     consider tracking any associated state in a ``context.Variable``.

Proxy class
     Sometimes it's useful to have an object that looks like a module
     global (e.g. ``sys.stdout``) but which actually delegates its
     behavior to a context-specific instance.  Thus, you could have one
     ``sys.stdout``, but its actual output would be directed based on
     the current execution context. The simplest form of such a proxy
     class might look something like::

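         class Proxy(object):
             def __init__(self, initial_value):
                 var = context.Variable()
                 var.set(initial_value)
                 # Store the variable directly; ``self.var`` lookups
                 # would be delegated by __getattribute__ below.
                 object.__setattr__(self, 'var', var)

             def __call__(self, *value):
                 # With an argument: a context manager that rebinds
                 # the proxied object.  Without: the current object.
                 return object.__getattribute__(self, 'var')(*value)

             def __getattribute__(self, attr):
                 # Delegate to the current value, not the Variable.
                 var = object.__getattribute__(self, 'var')
                 return getattr(var(), attr)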

         sys.stdout = Proxy(sys.stdout)   # make sys.stdout selectable

         with sys.stdout(somefile):  # temporary redirect in current context
             print "hey!"

     The main open issues in implementing this sort of proxy are in the
     precise set of special methods (e.g. ``__getitem__``,
     ``__setattr__``, etc.) that should be supported, and what API
     should be supplied for changing the value, setting a default value
     for new threads, etc.

Low-level API
     Currently, this PEP does not specify a low-level API for directly
     accessing and modifying the current execution context, nor a C
     API for such access.  It currently assumes that ``snapshot()``,
     ``swap()`` and ``Variable`` are the only public means of
     accessing context information.  It may be desirable to offer
     finer-grained APIs for more advanced uses (such as creating an
     API for management of proxies).  And it may be desirable to have
     a C API for use by Python extensions that want convenient access
     to context variables.

Rationale
=========

Different libraries have different uses for maintaining a "current"
state, be it global or local to a specific thread or task.  There is
currently no way for task-management code to find and switch all of
these "current" states.  And even if it could, task switching
performance would degrade linearly as new libraries were added.

One possible alternative to this proposal would be for explicit task
objects to exist, and to provide a way to give them identities, so
that libraries could store their own state as a property of the task,
rather than storing their state in a task-specific mapping.  This
offers similar potential performance to a copy-on-write strategy, but
would use more memory than this proposal when only one task is
involved, because each variable would have its own dictionary mapping
from task to the variable's value, whereas in this proposal there is
simply a single dictionary for the task.

Some languages offer "dynamically scoped" variables that are somewhat
similar in behavior to the context variables proposed by this PEP.
The principal differences are that:

1. Context variables are objects used to obtain or save a value,
    rather than being a syntactic construct of the language.

2. PEP 343 allows for *controlled* manipulation of context variables,
    helping to prevent "duelling libraries" from changing state on each
    other.  Also, a library can potentially ``snapshot()`` a desired
    state at startup, and use ``swap()`` to restore that state on
    re-entry.  (And could even define a simple decorator to wrap its
    entry points to ensure this.)

3. The PEP author is not aware of any language that explicitly offers
    coroutine-scoped variables, but presumes that they can be modelled
    with monads or continuations in functional languages like Haskell.
    (And I only mention this to forestall the otherwise-inevitable
    response from fans of such techniques, pointing out that it's
    possible.)
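
The entry-point decorator mentioned in point 2 might be sketched like
this.  The names ``entry_point``, ``LIBRARY_STATE`` and ``rounding``
are hypothetical, the context machinery is reduced to a
single-threaded stand-in that copies on every snapshot, and any
changes the wrapped function makes to context variables are assumed
to be discarded on return.

```python
import functools

_state = {"ctx": {}}     # hypothetical single-threaded current context


def snapshot():
    return dict(_state["ctx"])


def swap(ctx):
    old = snapshot()
    _state["ctx"] = dict(ctx)
    return old


class Variable(object):
    def get(self, default=None):
        return _state["ctx"].get(self, default)

    def set(self, value):
        _state["ctx"][self] = value


def entry_point(library_state):
    """Run the wrapped function in ``library_state``, then restore the caller's."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            callers_state = swap(library_state)
            try:
                return func(*args, **kwargs)
            finally:
                swap(callers_state)
        return wrapper
    return decorator


rounding = Variable()
rounding.set("half-even")
LIBRARY_STATE = snapshot()        # captured once, "at startup"


@entry_point(LIBRARY_STATE)
def compute():
    return rounding.get()         # always sees the library's own setting


rounding.set("down")              # the application changes the setting later
result = compute()
```

Here ``result`` is ``"half-even"`` even though the caller had switched
``rounding`` to ``"down"``, and the caller's setting is back in place
once ``compute()`` returns.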

Reference Implementation
========================

The author has prototyped an implementation with somewhat fancier
features than shown here, but prefers not to publish it until the
basic features and choices of optional functionality have been
discussed on Python-Dev.

Copyright
=========

This document has been placed in the public domain.

..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    End: