[Python-ideas] PEP draft: context variables

Koos Zevenhoven k7hoven at gmail.com
Sat Oct 7 15:20:02 EDT 2017


Hi all,

Thank you for the feedback so far. FYI, or as a reminder, this is now PEP
555, but the web version is still the same draft that I posted here.

The discussion of this was paused as there was a lot going on at that
moment, but I'm now getting ready to make the next version of the draft.
Below, I'll outline some changes I intend to make so they can already be
discussed.

First of all, I'm considering calling the concept "context arguments"
instead of "context variables", because that describes the concept better.
But see below for some more.

On Tue, Sep 5, 2017 at 12:50 AM, Koos Zevenhoven <k7hoven at gmail.com> wrote:

> Hi all,
>
> as promised, here is a draft PEP for context variable semantics and
> implementation. Apologies for the slight delay; I had a not-so-minor
> autosave accident and had to retype the majority of this first draft.
>
> During the past years, there has been growing interest in something like
> task-local storage or async-local storage. This PEP proposes an alternative
> approach to solving the problems that are typically stated as motivation
> for such concepts.
>
> This proposal is based on sketches of solutions since spring 2015, with
> some minor influences from the recent discussion related to PEP 550. I can
> also see some potential implementation synergy between this PEP and PEP
> 550, even if the proposed semantics are quite different.
>
> So, here it is. This is the first draft and some things are still missing,
> but the essential things should be there.
>
> -- Koos
>
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>
> PEP: 999
> Title: Context-local variables (contextvars)
> Version: $Revision$
> Last-Modified: $Date$
> Author: Koos Zevenhoven
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: DD-Mmm-YYYY
> Post-History: DD-Mmm-YYYY
>
>
> Abstract
> ========
>
> Sometimes, in special cases, it is desired that code can pass information
> down the function call chain to the callees without having to explicitly
> pass the information as arguments to each function in the call chain. This
> proposal describes a construct which allows code to explicitly switch in
> and out of a context where a certain context variable has a given value
> assigned to it. This is a modern alternative to some uses of things like
> global variables in traditional single-threaded (or thread-unsafe) code and
> of thread-local storage in traditional *concurrency-unsafe* code (single-
> or multi-threaded). In particular, the proposed mechanism can also be used
> with more modern concurrent execution mechanisms such as asynchronously
> executed coroutines, without the concurrently executed call chains
> interfering with each other's contexts.
>
> The "call chain" can consist of normal functions, awaited coroutines, or
> generators. The semantics of context variable scope are equivalent in all
> cases, allowing code to be refactored freely into *subroutines* (which here
> refers to functions, sub-generators or sub-coroutines) without affecting
> the semantics of context variables. Regarding implementation, this proposal
> aims at simplicity and minimum changes to the CPython interpreter and to
> other Python interpreters.
>
> Rationale
> =========
>
> Consider a modern Python *call chain* (or call tree), which in this
> proposal refers to any chained (nested) execution of *subroutines*, using
> any possible combinations of normal function calls, or expressions using
> ``await`` or ``yield from``. In some cases, passing necessary *information*
> down the call chain as arguments can substantially complicate the required
> function signatures, or it can even be impossible to achieve in practice.
> In these cases, one may search for another place to store this information.
> Let us look at some historical examples.
>
> The most naive option is to assign the value to a global variable or
> similar, where the code down the call chain can access it. However, this
> immediately makes the code thread-unsafe, because with multiple threads,
> all threads assign to the same global variable, and another thread can
> interfere at any point in the call chain.
>
> A somewhat less naive option is to store the information as per-thread
> information in thread-local storage, where each thread has its own "copy"
> of the variable which other threads cannot interfere with. Although
> non-ideal, this has been the best solution in many cases. However, thanks
> to generators and coroutines, the execution of the call chain can be
> suspended and resumed, allowing code in other contexts to run concurrently.
> Therefore, using thread-local storage is *concurrency-unsafe*, because
> other call chains in other contexts may interfere with the thread-local
> variable.
>
> Note that in the above two historical approaches, the stored information
> has the *widest* available scope without causing problems. For a third
> solution along the same path, one would first define an equivalent of a
> "thread" for asynchronous execution and concurrency. This could be seen as
> the largest amount of code and nested calls that is guaranteed to be
> executed sequentially without ambiguity in execution order. This might be
> referred to as concurrency-local or task-local storage. In this meaning of
> "task", there is no ambiguity in the order of execution of the code within
> one task. (This concept of a task is close to equivalent to a ``Task`` in
> ``asyncio``, but not exactly.) In such concurrency-locals, it is possible
> to pass information down the call chain to callees without another code
> path interfering with the value in the background.
>
> Common to the above approaches is that they indeed use variables with a
> wide but just-narrow-enough scope. Thread-locals could also be called
> thread-wide globals---in single-threaded code, they are indeed truly
> global. And task-locals could be called task-wide globals, because tasks
> can be very big.
>
> The issue here is that neither global variables, thread-locals nor
> task-locals are really meant to be used for this purpose of passing
> information of the execution context down the call chain. Instead of the
> widest possible variable scope, the scope of the variables should be
> controlled by the programmer, typically of a library, to have the desired
> scope---not wider. In other words, task-local variables (and globals and
> thread-locals) have nothing to do with the kind of context-bound
> information passing that this proposal intends to enable, even if
> task-locals can be used to emulate the desired semantics. Therefore, in the
> following, this proposal describes the semantics and the outlines of an
> implementation for *context-local variables* (or context variables,
> contextvars). In fact, as a side effect of this PEP, an async framework can
> use the proposed feature to implement task-local variables.
>
> Proposal
> ========
>
> Because the proposed semantics are not a direct extension to anything
> already available in Python, this proposal is first described in terms of
> semantics and API at a fairly high level. In particular, Python ``with``
> statements are heavily used in the description, as they are a good match
> with the proposed semantics. However, the underlying ``__enter__`` and
> ``__exit__`` methods correspond to functions in the lower-level
> speed-optimized (C) API. For clarity of this document, the lower-level
> functions are not explicitly named in the definition of the semantics.
> After describing the semantics and high-level API, the implementation is
> described, going to a lower level.
>
> Semantics and higher-level API
> ------------------------------
>
> Core concept
> ''''''''''''
>
> A context-local variable is represented by a single instance of
> ``contextvars.Var``, say ``cvar``. Any code that has access to the ``cvar``
> object can ask for its value with respect to the current context. In the
> high-level API, this value is given by the ``cvar.value`` property::
>
>     cvar = contextvars.Var(default="the default value",
>                            description="example context variable")
>
>

Some points related to the arguments and naming:

Indeed, this might change to ``contextvars.Arg``. After all, these are more
like arguments than variables.

But just like with function arguments, you can use a mutable value, which
allows more variable-like semantics. That is, however, not the primarily
intended use. It may also cause more problems at inter-process or
inter-interpreter boundaries etc., where direct mutation of objects may not
be possible.

I might have to remove the ``default`` argument, at least in this form. If
there is a default, it should be more explicit what the scope of the
default is. There could be thread-wide defaults or interpreter-wide
defaults and so on. It is not completely clear what a truly global default
would mean.

One way to deal with this would be to always pass the context on to other
threads and processes etc. when they are created. But there are some
ambiguities here too, so the safest way might be to let the user implement
the desired semantics regarding defaults and thread boundaries etc.


>     assert cvar.value == "the default value"  # default still applies
>
>     # In code examples, all ``assert`` statements should
>     # succeed according to the proposed semantics.
>
>
> No assignments to ``cvar`` have been applied for this context, so
> ``cvar.value`` gives the default value. Assigning new values to contextvars
> is done in a highly scope-aware manner::
>
>     with cvar.assign(new_value):
>         assert cvar.value is new_value
>         # Any code here, or down the call chain from here, sees:
>         #     cvar.value is new_value
>         # unless another value has been assigned in a
>         # nested context
>         assert cvar.value is new_value
>     # the assignment of ``cvar`` to ``new_value`` is no longer visible
>     assert cvar.value == "the default value"
>
>
> Here, ``cvar.assign(new_value)`` returns another object, namely
> ``contextvars.Assignment(cvar, new_value)``. The essential part here is
> that applying a context variable assignment (``Assignment.__enter__``) is
> paired with a de-assignment (``Assignment.__exit__``). These operations set
> the bounds for the scope of the assigned value.
>
> Assignments to the same context variable can be nested to override the
> outer assignment in a narrower context::
>
>     assert cvar.value == "the default value"
>     with cvar.assign("outer"):
>         assert cvar.value == "outer"
>         with cvar.assign("inner"):
>             assert cvar.value == "inner"
>         assert cvar.value == "outer"
>     assert cvar.value == "the default value"
>
>
> Also multiple variables can be assigned to in a nested manner without
> affecting each other::
>
>     cvar1 = contextvars.Var()
>     cvar2 = contextvars.Var()
>
>     assert cvar1.value is None  # the default is None unless specified
>     assert cvar2.value is None
>
>     with cvar1.assign(value1):
>         assert cvar1.value is value1
>         assert cvar2.value is None
>         with cvar2.assign(value2):
>             assert cvar1.value is value1
>             assert cvar2.value is value2
>         assert cvar1.value is value1
>         assert cvar2.value is None
>     assert cvar1.value is None
>     assert cvar2.value is None
>
> Or with more convenient Python syntax::
>
>     with cvar1.assign(value1), cvar2.assign(value2):
>         assert cvar1.value is value1
>         assert cvar2.value is value2
>
> In another *context*, in another thread or otherwise concurrently executed
> task or code path, the context variables can have a completely different
> state. The programmer thus only needs to worry about the context at hand.
>
> Refactoring into subroutines
> ''''''''''''''''''''''''''''
>
> Code using contextvars can be refactored into subroutines without
> affecting the semantics.  For instance::
>
>     assi = cvar.assign(new_value)
>     def apply():
>         assi.__enter__()
>     assert cvar.value == "the default value"
>     apply()
>     assert cvar.value is new_value
>     assi.__exit__()
>     assert cvar.value == "the default value"
>
>
> Or similarly in an asynchronous context where ``await`` expressions are
> used. The subroutine can now be a coroutine::
>
>     assi = cvar.assign(new_value)
>     async def apply():
>         assi.__enter__()
>     assert cvar.value == "the default value"
>     await apply()
>     assert cvar.value is new_value
>     assi.__exit__()
>     assert cvar.value == "the default value"
>
>
> Or when the subroutine is a generator::
>
>     def apply():
>         yield
>         assi.__enter__()
>
>
> which is called using ``yield from apply()`` or with calls to ``next`` or
> ``.send``. This is discussed further in later sections.
>
> Semantics for generators and generator-based coroutines
> '''''''''''''''''''''''''''''''''''''''''''''''''''''''
>
> Generators, coroutines and async generators act as subroutines in much the
> same way that normal functions do. However, they have the additional
> possibility of being suspended by ``yield`` expressions. Assignment
> contexts entered inside a generator are normally preserved across yields::
>
>     def genfunc():
>         with cvar.assign(new_value):
>             assert cvar.value is new_value
>             yield
>             assert cvar.value is new_value
>     g = genfunc()
>     next(g)
>     assert cvar.value == "the default value"
>     with cvar.assign(another_value):
>         next(g)
>
>
> However, the outer context visible to the generator may change state
> across yields::
>
>     def genfunc():
>         assert cvar.value is value2
>         yield
>         assert cvar.value is value1
>         yield
>         with cvar.assign(value3):
>             assert cvar.value is value3
>
>     with cvar.assign(value1):
>         g = genfunc()
>         with cvar.assign(value2):
>             next(g)
>         next(g)
>         next(g)
>         assert cvar.value is value1
>
>
> Similar semantics apply to async generators (defined by ``async def ...
> yield ...``).
>
> By default, values assigned inside a generator do not leak through yields
> to the code that drives the generator. However, the assignment contexts
> entered and left open inside the generator *do* become visible outside the
> generator after the generator has finished with a ``StopIteration`` or
> another exception::
>
>     assi = cvar.assign(new_value)
>     def genfunc():
>         yield
>         assi.__enter__()
>         yield
>
>     g = genfunc()
>     assert cvar.value == "the default value"
>     next(g)
>     assert cvar.value == "the default value"
>     next(g)  # assi.__enter__() is called here
>     assert cvar.value == "the default value"
>     next(g)
>     assert cvar.value is new_value
>     assi.__exit__()
>
>
>
> Special functionality for framework authors
> -------------------------------------------
>
> Frameworks, such as ``asyncio`` or third-party libraries, can use
> additional functionality in ``contextvars`` to achieve the desired
> semantics in cases which are not determined by the Python interpreter. Some
> of the semantics described in this section are also afterwards used to
> describe the internal implementation.
>
> Leaking yields
> ''''''''''''''
>
> Using the ``contextvars.leaking_yields`` decorator, one can choose to leak
> the context through ``yield`` expressions into the outer context that
> drives the generator::
>
>     @contextvars.leaking_yields
>     def genfunc():
>         assert cvar.value == "outer"
>         with cvar.assign("inner"):
>             yield
>             assert cvar.value == "inner"
>         assert cvar.value == "outer"
>
>     g = genfunc()
>     with cvar.assign("outer"):
>         assert cvar.value == "outer"
>         next(g)
>         assert cvar.value == "inner"
>         next(g)
>         assert cvar.value == "outer"
>
>
>

Unfortunately, we actually need a third kind of generator semantics,
something like this:

@contextvars.caller_context
def genfunc():
    assert cvar.value is the_value
    yield
    assert cvar.value is the_value

with cvar.assign(the_value):
    gen = genfunc()

next(gen)

with cvar.assign(1234567890):
    try:
        next(gen)
    except StopIteration:
        pass

Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly
missed the reasons for this in discussions related to PEP 550. Perhaps
because we had mostly been looking at it from an async angle.

[In addition to this, all context changes (Assignment ``__enter__`` or
``__exit__``) would be leaked out when the generator finishes iff there are no
outer context changes. If there are outer context changes, an attempt to
leak changes will fail. (I will probably need to explain this better.)]

> Capturing contextvar assignments
> ''''''''''''''''''''''''''''''''
>
> Using ``contextvars.capture()``, one can capture the assignment contexts
> that are entered by a block of code. The changes applied by the block of
> code can then be reverted and subsequently reapplied, even in another
> context::
>
>     assert cvar1.value is None # default
>     assert cvar2.value is None # default
>     assi1 = cvar1.assign(value1)
>     assi2 = cvar1.assign(value2)
>     with contextvars.capture() as delta:
>         assi1.__enter__()
>         with cvar2.assign("not captured"):
>             assert cvar2.value == "not captured"
>         assi2.__enter__()
>     assert cvar1.value is value2
>     delta.revert()
>     assert cvar1.value is None
>     assert cvar2.value is None
>     ...
>     with cvar1.assign(1), cvar2.assign(2):
>         delta.reapply()
>         assert cvar1.value is value2
>         assert cvar2.value == 2
>
>
> However, reapplying the "delta" if its net contents include deassignments
> may not be possible (see also Implementation and Open Issues).
>
>
> Getting a snapshot of context state
> '''''''''''''''''''''''''''''''''''
>
> The function ``contextvars.get_local_state()`` returns an object
> representing the applied assignments to all context-local variables in the
> context where the function is called. This can be seen as equivalent to
> using ``contextvars.capture()`` to capture all context changes from the
> beginning of execution. The returned object supports methods ``.revert()``
> and ``.reapply()`` as above.
>
>
We will probably also need a ``use()`` method (or another name) here. That
would return a context manager that applies the full context on ``__enter__``
and restores the previous one on ``__exit__``.



>
> Running code in a clean state
> '''''''''''''''''''''''''''''
>
> Although it is possible to revert all applied context changes using the
> above primitives, a more convenient way to run a block of code in a clean
> context is provided::
>
>     with contextvars.clean_context():
>         # here, all context vars start off with their default values
>         ...
>     # here, the state is back to what it was before the with block.
>
>
>

As an additional tool, there could be ``contextvars.callback``:

@contextvars.callback
def some_callback():
    ...  # do stuff

This would provide some of the functionality of this PEP if callbacks are
used, so that the callback would be run with the same context as the code
that creates the callback.

The implementation of this would be essentially:

def callback(func):
    context = contextvars.get_local_state()
    def wrapped(*args, **kwargs):
        with context.use():
            return func(*args, **kwargs)
    return wrapped

With some trickery this might allow an async framework based on callbacks
instead of coroutines to use context arguments. But using this might be a
bit awkward sometimes. A ``contextlib.ExitStack`` might help here.

>
> Implementation
> --------------
>
> This section describes, in varying levels of detail, how the described
> semantics can be implemented. At present, an implementation aimed at
> simplicity but sufficient features is described. More details will be added
> later.
>
> Alternatively, a somewhat more complicated implementation offers minor
> additional features while adding some performance overhead and requiring
> more code in the implementation.
>
> Data structures and implementation of the core concept
> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>
> Each thread of the Python interpreter keeps its own stack of
> ``contextvars.Assignment`` objects, each having a pointer to the previous
> (outer) assignment like in a linked list. The local state (also returned by
> ``contextvars.get_local_state()``) then consists of a reference to the
> top of the stack and a pointer/weak reference to the bottom of the stack.
> This allows efficient stack manipulations. An object produced by
> ``contextvars.capture()`` is similar, but refers to only a part of the
> stack with the bottom reference pointing to the top of the stack as it was
> in the beginning of the capture block.
>
> Now, the stack evolves according to the assignment ``__enter__`` and
> ``__exit__`` methods. For example::
>
>     cvar1 = contextvars.Var()
>     cvar2 = contextvars.Var()
>     # stack: []
>     assert cvar1.value is None
>     assert cvar2.value is None
>
>     with cvar1.assign("outer"):
>         # stack: [Assignment(cvar1, "outer")]
>         assert cvar1.value == "outer"
>
>         with cvar1.assign("inner"):
>             # stack: [Assignment(cvar1, "outer"),
>             #         Assignment(cvar1, "inner")]
>             assert cvar1.value == "inner"
>
>             with cvar2.assign("hello"):
>                 # stack: [Assignment(cvar1, "outer"),
>                 #         Assignment(cvar1, "inner"),
>                 #         Assignment(cvar2, "hello")]
>                 assert cvar2.value == "hello"
>
>             # stack: [Assignment(cvar1, "outer"),
>             #         Assignment(cvar1, "inner")]
>             assert cvar1.value == "inner"
>             assert cvar2.value is None
>
>         # stack: [Assignment(cvar1, "outer")]
>         assert cvar1.value == "outer"
>
>     # stack: []
>     assert cvar1.value is None
>     assert cvar2.value is None
>
>
> Getting a value from the context using ``cvar1.value`` can be implemented
> as finding the topmost occurrence of a ``cvar1`` assignment on the stack
> and returning the value there, or the default value if no assignment is
> found on the stack. However, this can be optimized to instead be an O(1)
> operation in most cases. Still, even searching through the stack may be
> reasonably fast since these stacks are not intended to grow very large.
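
The unoptimized lookup described above can be sketched in plain Python as
follows. This is only an illustration under stated assumptions: a Python list
in a ``threading.local`` stands in for the per-thread linked list, and the
``Var`` and ``Assignment`` classes are simplified stand-ins, not the proposed
C-level implementation.

```python
import threading

_state = threading.local()  # assumption: one assignment stack per thread


def _stack():
    # Lazily create the stack for the current thread.
    if not hasattr(_state, "stack"):
        _state.stack = []
    return _state.stack


class Assignment:
    def __init__(self, var, value):
        self.var = var
        self.value = value

    def __enter__(self):
        _stack().append(self)
        return self

    def __exit__(self, *exc_info):
        # De-assignments must mirror the assignments in reverse order.
        stack = _stack()
        if not stack or stack[-1] is not self:
            raise RuntimeError("out-of-order de-assignment")
        stack.pop()
        return False


class Var:
    def __init__(self, default=None):
        self.default = default

    def assign(self, value):
        return Assignment(self, value)

    @property
    def value(self):
        # Linear search from the top of the stack down to the first match.
        for assignment in reversed(_stack()):
            if assignment.var is self:
                return assignment.value
        return self.default


cvar1, cvar2 = Var(), Var()
with cvar1.assign("outer"), cvar1.assign("inner"), cvar2.assign("hello"):
    assert (cvar1.value, cvar2.value) == ("inner", "hello")
assert cvar1.value is None and cvar2.value is None
```

The lookup cost here grows with the stack depth, which is the case the
paragraph above argues is acceptable for small stacks.
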
>

I will still need to explain the O(1) algorithm, but one nice thing is
that an implementation like MicroPython does not necessarily need to
include that optimization.


>
> The above description is already sufficient for implementing the core
> concept. Suspendable frames require some additional attention, as explained
> in the following.
>
> Implementation of generator and coroutine semantics
> '''''''''''''''''''''''''''''''''''''''''''''''''''
>
> Within generators, coroutines and async generators, assignments and
> deassignments are handled in exactly the same way as anywhere else.
> However, some changes are needed in the builtin generator methods ``send``,
> ``__next__``, ``throw`` and ``close``. Here is the Python equivalent of the
> changes needed in ``send`` for a generator (here ``_old_send`` refers to
> the behavior in Python 3.6)::
>
>     def send(self, value):
>         # if decorated with contextvars.leaking_yields
>         if self.gi_contextvars is LEAK:
>             # nothing needs to be done to leak context through yields :)
>             return self._old_send(value)
>         try:
>             with contextvars.capture() as delta:
>                 if self.gi_contextvars:
>                     # non-zero captured content from previous iteration
>                     self.gi_contextvars.reapply()
>                 ret = self._old_send(value)
>         except Exception:
>             raise
>         else:
>             # suspending, revert context changes but save them for later
>             delta.revert()
>             self.gi_contextvars = delta
>         return ret
>
>
> The corresponding modifications to the other methods are essentially
> identical. The same applies to coroutines and async generators.
>
> For code that does not use ``contextvars``, the additions are O(1) and
> essentially reduce to a couple of pointer comparisons. For code that does
> use ``contextvars``, the additions are still O(1) in most cases.
>
> More on implementation
> ''''''''''''''''''''''
>
> The rest of the functionality, including ``contextvars.leaking_yields``,
> ``contextvars.capture()``, ``contextvars.get_local_state()`` and
> ``contextvars.clean_context()``, is in fact quite straightforward to
> implement, but their implementation will be discussed further in later
> versions of this proposal. Caching of assigned values is somewhat more
> complicated, and will be discussed later, but it seems that most cases
> should achieve O(1) complexity.
>
> Backwards compatibility
> =======================
>
> There are no *direct* backwards-compatibility concerns, since a completely
> new feature is proposed.
>
> However, various traditional uses of thread-local storage may need a
> smooth transition to ``contextvars`` so they can be concurrency-safe. There
> are several approaches to this, including emulating task-local storage with
> a little bit of help from async frameworks. A fully general implementation
> cannot be provided, because the desired semantics may depend on the design
> of the framework.
>
>
I have a preliminary design for this, but it probably doesn't need to be in
this PEP.


> Another way to deal with the transition is for code to first look for a
> context created using ``contextvars``. If that fails because a new-style
> context has not been set or because the code runs on an older Python
> version, a fallback to thread-local storage is used.
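
A minimal sketch of that lookup order, assuming a library that keeps a
"precision" setting. All names here (``lookup_new_style``,
``set_precision_legacy``, ``get_precision``) are hypothetical, and the
new-style lookup is reduced to a hook that reports "not set", as it would on
an interpreter without the proposed feature.

```python
import threading

_MISSING = object()  # sentinel: no new-style context value is set
_thread_state = threading.local()


def lookup_new_style():
    """Hypothetical hook: return the new-style context value or _MISSING."""
    return _MISSING


def set_precision_legacy(value):
    # Traditional thread-local storage, kept for older code paths.
    _thread_state.precision = value


def get_precision(lookup=lookup_new_style, default=28):
    value = lookup()
    if value is not _MISSING:
        return value  # a new-style context takes precedence
    # Fallback: the thread-local value, or the default if none was set.
    return getattr(_thread_state, "precision", default)


assert get_precision() == 28
set_precision_legacy(10)
assert get_precision() == 10
assert get_precision(lookup=lambda: 50) == 50
```
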
>
>
If context variables are renamed context arguments, then there could be a
settable variant called a context variable (this could also be a third-party
thing on top of context arguments, depending on what is done with decimal
contexts).


>
> Open Issues
> ===========
>
> Out-of-order de-assignments
> ---------------------------
>
> In this proposal, all variable deassignments are made in the opposite
> order compared to the preceding assignments. This has two useful
> properties: it encourages using ``with`` statements to define assignment
> scope and has a tendency to catch errors early (forgetting a
> ``.__exit__()`` call often results in a meaningful error). Having this as a
> requirement is also beneficial in terms of implementation
> simplicity and performance. Nevertheless, allowing out-of-order context
> exits is not completely out of the question, and reasonable implementation
> strategies for that do exist.
>
>
> Rejected Ideas
> ==============
>
> Dynamic scoping linked to subroutine scopes
> -------------------------------------------
>
> The scope of value visibility should not be determined by the way the code
> is refactored into subroutines. It is necessary to have per-variable
> control of the assignment scope.
>
>
In fact, in early sketches, my approach was closer to this. The context
variables (or async variables) were stored in frame locals in a namespace
called ``__async__`` and they were propagated through subroutine calls to
callees. But this introduced problems when new scope layers were added,
and ended up being more complicated (and slightly similar to PEP 550).


Anyway, for starters, this was a glimpse of the changes I have planned,
and open for discussion.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +