[Python-Dev] PEP 567 v2

Paul Moore p.f.moore at gmail.com
Wed Jan 3 06:34:08 EST 2018


On 28 December 2017 at 06:08, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> This is a second version of PEP 567.

Overall, I like the proposal. It's relatively straightforward to follow, and
makes sense.

One thing I *don't* see in the PEP is an example of how code using thread-local
storage should be modified to use context variables. My impression is that it's
simply a matter of replacing the TLS calls with equivalent ContextVar calls,
but an example might be helpful.

Some detail points below.

> Rationale
> =========
>
> Thread-local variables are insufficient for asynchronous tasks that
> execute concurrently in the same OS thread.  Any context manager that
> saves and restores a context value using ``threading.local()`` will
> have its context values bleed to other code unexpectedly when used
> in async/await code.

I understand how this could happen, having followed the discussions here, but a
(simple) example of the issue might be useful.
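
For instance, something as small as the following would do. It is a sketch under my own assumptions (the ``handle`` coroutine and the ``user`` attribute are invented for illustration), showing two tasks in one thread overwriting each other's thread-local value.

```python
import asyncio
import threading

_tls = threading.local()

async def handle(user, seen):
    # Each task stores "its" user in thread-local storage...
    _tls.user = user
    # ...then yields to the event loop, letting the other task run.
    await asyncio.sleep(0)
    # Both tasks share one OS thread, so they share one _tls.user.
    seen[user] = _tls.user

async def main():
    seen = {}
    await asyncio.gather(handle("alice", seen), handle("bob", seen))
    return seen

seen = asyncio.run(main())
# The "alice" task reads back "bob": the value has bled across tasks.
print(seen)
```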

> A few examples where having a working context local storage for
> asynchronous code is desirable:
>
> * Context managers like ``decimal`` contexts and ``numpy.errstate``.
>
> * Request-related data, such as security tokens and request
>   data in web applications, language context for ``gettext``, etc.
>
> * Profiling, tracing, and logging in large code bases.
>
>
> Introduction
> ============
>
> The PEP proposes a new mechanism for managing context variables.
> The key classes involved in this mechanism are ``contextvars.Context``
> and ``contextvars.ContextVar``.  The PEP also proposes some policies
> for using the mechanism around asynchronous tasks.
>
> The proposed mechanism for accessing context variables uses the
> ``ContextVar`` class.  A module (such as ``decimal``) that wishes to
> store a context variable should:
>
> * declare a module-global variable holding a ``ContextVar`` to
>   serve as a key;
>
> * access the current value via the ``get()`` method on the
>   key variable;
>
> * modify the current value via the ``set()`` method on the
>   key variable.
>
> The notion of "current value" deserves special consideration:
> different asynchronous tasks that exist and execute concurrently
> may have different values for the same key.  This idea is well-known
> from thread-local storage but in this case the locality of the value is
> not necessarily bound to a thread.  Instead, there is the notion of the
> "current ``Context``" which is stored in thread-local storage, and
> is accessed via ``contextvars.copy_context()`` function.

Accessed by copying it? That seems weird to me. I'd expect either that you'd be
able to access the current Context directly, *or* that you'd say that the
current Context is not directly accessible by the user, but that a copy can be
obtained using copy_context. But given that the Context is immutable, why the
need to copy it?

Also, the references to threads in the above are confusing. It says that this
is a well-known concept in terms of thread-local storage, but this case is
different. It then goes on to say that the current Context is stored in thread
local storage, which gives me the impression that the new idea *is* related to
thread local storage...

I think that the fact that a Context is held in thread-local storage is an
implementation detail. Assuming I'm right, don't bother mentioning it - simply
say that there's a notion of a current Context and leave it at that.

> Manipulation of the current ``Context`` is the responsibility of the
> task framework, e.g. asyncio.
>
> A ``Context`` is conceptually a read-only mapping, implemented using
> an immutable dictionary.  The ``ContextVar.get()`` method does a
> lookup in the current ``Context`` with ``self`` as a key, raising a
> ``LookupError``  or returning a default value specified in
> the constructor.
>
> The ``ContextVar.set(value)`` method clones the current ``Context``,
> assigns the ``value`` to it with ``self`` as a key, and sets the
> new ``Context`` as the new current ``Context``.
>

On first reading, this confused me because I didn't spot that you're saying a
*Context* is read-only, but a *ContextVar* has get and set methods.

Maybe reword this to say that a Context is a read-only mapping from ContextVars
to values. A ContextVar has a get method that looks up its value in the current
Context, and a set method that replaces the current Context with a new one that
associates the specified value with this ContextVar.

(The current version feels confusing to me because it goes into too much detail
on how the implementation does this, rather than sticking to the high-level
specification)
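
The reworded description could be illustrated roughly like this. It is a sketch of my reading, assuming the module behaves as the text describes and that item assignment on a Context raises ``TypeError`` (as it does in CPython); the variable names are arbitrary.

```python
import contextvars

var = contextvars.ContextVar("var")

# A Context is a mapping from ContextVars to values.
before = contextvars.copy_context()

# set() does not mutate a Context; it makes a new one current.
var.set("spam")
after = contextvars.copy_context()

in_before = var in before   # the earlier snapshot is unaffected
value_after = after[var]    # the new current Context has the value

# The mapping itself is read-only.
try:
    before[var] = "eggs"
    mutated = True
except TypeError:
    mutated = False
```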

> Specification
> =============
>
> A new standard library module ``contextvars`` is added with the
> following APIs:
>
> 1. ``copy_context() -> Context`` function is used to get a copy of
>    the current ``Context`` object for the current OS thread.
>
> 2. ``ContextVar`` class to declare and access context variables.
>
> 3. ``Context`` class encapsulates context state.  Every OS thread
>    stores a reference to its current ``Context`` instance.
>    It is not possible to control that reference manually.
>    Instead, the ``Context.run(callable, *args, **kwargs)`` method is
>    used to run Python code in another context.

Context.run() came a bit out of nowhere here. Maybe the part from "It
is not possible..." should be in the introduction above? Something
like the following, covering this and copy_context:

    The current Context cannot be accessed directly by user code. If the
    framework wants to run some code in a different Context, the
    Context.run(callable, *args, **kwargs) method is used to do that. To
    construct a new context for this purpose, the current context can be copied
    via the copy_context function, and manipulated prior to the call to run().
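
A short sketch of that wording in practice, assuming "manipulated" means running ``ContextVar.set`` inside the copy via ``run()``; the ``request_id`` variable and ``handler`` function are hypothetical.

```python
import contextvars

# Hypothetical variable a framework might set per request.
request_id = contextvars.ContextVar("request_id", default=None)

def handler():
    return request_id.get()

# Copy the current context, then change the copy rather than the original.
ctx = contextvars.copy_context()
ctx.run(request_id.set, "req-1")

# Code run in the copy sees the new value...
result = ctx.run(handler)

# ...while the caller's own context is left untouched.
outside = request_id.get()
```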

>
> contextvars.ContextVar
> ----------------------
>
> The ``ContextVar`` class has the following constructor signature:
> ``ContextVar(name, *, default=_NO_DEFAULT)``.  The ``name`` parameter
> is used only for introspection and debug purposes, and is exposed
> as a read-only ``ContextVar.name`` attribute.  The ``default``
> parameter is optional.  Example::
>
>     # Declare a context variable 'var' with the default value 42.
>     var = ContextVar('var', default=42)
>
> (The ``_NO_DEFAULT`` is an internal sentinel object used to
> detect if the default value was provided.)

My first thought was that default was the context variable's initial value. But
if that's what it is, why not call it that? If the default has another effect
as well as being the initial value, maybe clarify here what that is?
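
To make the question concrete, here is a sketch of how the two readings would differ. It assumes the default is a fallback returned by ``get()`` and is never stored in the Context; that is my guess at the intent, not something the text above states.

```python
import contextvars

var = contextvars.ContextVar("var", default=42)

ctx = contextvars.copy_context()
fallback = var.get()     # the default is returned...
stored = var in ctx      # ...but nothing is stored in the Context

token = var.set(1)
explicit = var.get()     # an explicitly set value wins over the default

var.reset(token)
after_reset = var.get()  # back to the fallback once the value is removed
```

If that reading is right, "default" is the better name, since an initial value would show up in the mapping and this does not.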

> ``ContextVar.get()`` returns a value for context variable from the
> current ``Context``::
>
>     # Get the value of `var`.
>     var.get()
>
> ``ContextVar.set(value) -> Token`` is used to set a new value for
> the context variable in the current ``Context``::
>
>     # Set the variable 'var' to 1 in the current context.
>     var.set(1)
>
> ``ContextVar.reset(token)`` is used to reset the variable in the
> current context to the value it had before the ``set()`` operation
> that created the ``token``::
>
>     assert var.get(None) is None

get doesn't take an argument. Typo?

>     token = var.set(1)
>     try:
>         ...
>     finally:
>         var.reset(token)
>
>     assert var.get(None) is None

same typo?

> ``ContextVar.reset()`` method is idempotent and can be called
> multiple times on the same Token object: second and later calls
> will be no-ops.
>
>
> contextvars.Token
> -----------------
>
> ``contextvars.Token`` is an opaque object that should be used to
> restore the ``ContextVar`` to its previous value, or remove it from
> the context if the variable was not set before.  It can be created
> only by calling ``ContextVar.set()``.
>
> For debug and introspection purposes it has:
>
> * a read-only attribute ``Token.var`` pointing to the variable
>   that created the token;
>
> * a read-only attribute ``Token.old_value`` set to the value the
>   variable had before the ``set()`` call, or to ``Token.MISSING``
>   if the variable wasn't set before.
>
> Having the ``ContextVar.set()`` method returning a ``Token`` object
> and the ``ContextVar.reset(token)`` method, allows context variables
> to be removed from the context if they were not in it before the
> ``set()`` call.
>
>
> contextvars.Context
> -------------------
>
> ``Context`` object is a mapping of context variables to values.
>
> ``Context()`` creates an empty context.  To get a copy of the current
> ``Context`` for the current OS thread, use the
> ``contextvars.copy_context()`` method::
>
>     ctx = contextvars.copy_context()
>
> To run Python code in some ``Context``, use ``Context.run()``
> method::
>
>     ctx.run(function)
>
> Any changes to any context variables that ``function`` causes will
> be contained in the ``ctx`` context::
>
>     var = ContextVar('var')
>     var.set('spam')
>
>     def function():
>         assert var.get() == 'spam'
>
>         var.set('ham')
>         assert var.get() == 'ham'
>
>     ctx = copy_context()
>
>     # Any changes that 'function' makes to 'var' will stay
>     # isolated in the 'ctx'.
>     ctx.run(function)
>
>     assert var.get() == 'spam'
>
> Any changes to the context will be contained in the ``Context``
> object on which ``run()`` is called on.
>
> ``Context.run()`` is used to control in which context asyncio
> callbacks and Tasks are executed.  It can also be used to run some
> code in a different thread in the context of the current thread::
>
>     executor = ThreadPoolExecutor()
>     current_context = contextvars.copy_context()
>
>     executor.submit(
>         lambda: current_context.run(some_function))
>
> ``Context`` objects implement the ``collections.abc.Mapping`` ABC.
> This can be used to introspect context objects::
>
>     ctx = contextvars.copy_context()
>
>     # Print all context variables and their values in 'ctx':
>     print(ctx.items())
>
>     # Print the value of 'some_variable' in context 'ctx':
>     print(ctx[some_variable])
>
>
> asyncio
> -------
[...]
>
> C API
> -----
>
[...]

I haven't commented on these as they aren't my area of expertise.

> Implementation
> ==============
>
> This section explains high-level implementation details in
> pseudo-code.  Some optimizations are omitted to keep this section
> short and clear.

Again, I'm ignoring this as I don't really have an interest in how the facility
is implemented.

>
> Implementation Notes
> ====================
>
> * The internal immutable dictionary for ``Context`` is implemented
>   using Hash Array Mapped Tries (HAMT).  They allow for O(log N)
>   ``set`` operation, and for O(1) ``copy_context()`` function, where
>   *N* is the number of items in the dictionary.  For a detailed
>   analysis of HAMT performance please refer to :pep:`550` [1]_.

Would it be worth exposing this data structure elsewhere, in case
other uses for it exist?

> * ``ContextVar.get()`` has an internal cache for the most recent
>   value, which allows to bypass a hash lookup.  This is similar
>   to the optimization the ``decimal`` module implements to
>   retrieve its context from ``PyThreadState_GetDict()``.
>   See :pep:`550` which explains the implementation of the cache
>   in a great detail.
>

Should the cache (or at least the performance guarantees it implies) be part of
the spec? Do we care if other implementations fail to implement a cache?

