[Python-Dev] PEP 567 v2

Chris Jerdonek chris.jerdonek at gmail.com
Thu Dec 28 05:28:28 EST 2017


I have a couple of basic questions about how this API could be used in
practice. Both of my questions are for the Python API as applied to Tasks
in asyncio.

1) Would this API support looking up the value of a context variable for
**another** Task? For example, if you're managing multiple tasks using
asyncio.wait() and there is an exception in some task, you might want to
examine and report the value of a context variable for that task.

2) Would an appropriate use of this API be to assign a unique task id to
each task? Or can that be handled more simply? I'm wondering because I
recently thought this would be useful, and it doesn't seem like asyncio
means for one to subclass Task (though I could be wrong).
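
To make both questions concrete, here is a rough sketch of what I have in mind, assuming the module behaves as the PEP describes. The names ``with_task_id`` and ``context_snapshot`` are made up for illustration and are not part of the proposal. The wrapper gives each task an id and, on failure, attaches a copy of the task's context to the exception so the code calling asyncio.wait() can read it:

```python
import asyncio
import contextvars
import itertools

task_id = contextvars.ContextVar("task_id")
_next_id = itertools.count(1)


async def with_task_id(coro):
    """Give the current task a unique id, then run `coro` inside it."""
    task_id.set(next(_next_id))
    try:
        return await coro
    except Exception as exc:
        # Hypothetical attribute: keep a copy of the failing task's context
        # so that code outside the task can inspect it later.
        exc.context_snapshot = contextvars.copy_context()
        raise


async def job(fail):
    await asyncio.sleep(0)
    if fail:
        raise ValueError(f"task {task_id.get()} failed")
    return task_id.get()


async def main():
    tasks = [asyncio.ensure_future(with_task_id(job(fail)))
             for fail in (False, True, False)]
    done, _pending = await asyncio.wait(tasks)

    failed_ids = []
    for task in done:
        exc = task.exception()
        if exc is not None:
            # Read the variable as the failed task saw it.
            failed_ids.append(exc.context_snapshot[task_id])

    ok_ids = sorted(t.result() for t in tasks if t.exception() is None)
    return ok_ids, failed_ids


ok_ids, failed_ids = asyncio.run(main())
```

This avoids subclassing Task, but it only works if every coroutine is wrapped, which is why I am asking whether there is a simpler way.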

Thanks,
--Chris


On Wed, Dec 27, 2017 at 10:08 PM, Yury Selivanov <yselivanov.ml at gmail.com>
wrote:

> This is a second version of PEP 567.
>
> A few things have changed:
>
> 1. I now have a reference implementation:
> https://github.com/python/cpython/pull/5027
>
> 2. The C API was updated to match the implementation.
>
> 3. The get_context() function was renamed to copy_context() to better
> reflect what it is really doing.
>
> 4. A few clarifications/edits here and there to address earlier feedback.
>
>
> Yury
>
>
> PEP: 567
> Title: Context Variables
> Version: $Revision$
> Last-Modified: $Date$
> Author: Yury Selivanov <yury at magic.io>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 12-Dec-2017
> Python-Version: 3.7
> Post-History: 12-Dec-2017, 28-Dec-2017
>
>
> Abstract
> ========
>
> This PEP proposes a new ``contextvars`` module and a set of new
> CPython C APIs to support context variables.  This concept is
> similar to thread-local storage (TLS), but, unlike TLS, it also allows
> correctly keeping track of values per asynchronous task, e.g.
> ``asyncio.Task``.
>
> This proposal is a simplified version of :pep:`550`.  The key
> difference is that this PEP is concerned only with solving the case
> for asynchronous tasks, not for generators.  There are no proposed
> modifications to any built-in types or to the interpreter.
>
> This proposal is not strictly related to Python Context Managers,
> although it does provide a mechanism that can be used by Context
> Managers to store their state.
>
>
> Rationale
> =========
>
> Thread-local variables are insufficient for asynchronous tasks that
> execute concurrently in the same OS thread.  Any context manager that
> saves and restores a context value using ``threading.local()`` will
> have its context values bleed to other code unexpectedly when used
> in async/await code.
>
> A few examples where having a working context local storage for
> asynchronous code is desirable:
>
> * Context managers like ``decimal`` contexts and ``numpy.errstate``.
>
> * Request-related data, such as security tokens and request
>   data in web applications, language context for ``gettext``, etc.
>
> * Profiling, tracing, and logging in large code bases.
>
>
> Introduction
> ============
>
> The PEP proposes a new mechanism for managing context variables.
> The key classes involved in this mechanism are ``contextvars.Context``
> and ``contextvars.ContextVar``.  The PEP also proposes some policies
> for using the mechanism around asynchronous tasks.
>
> The proposed mechanism for accessing context variables uses the
> ``ContextVar`` class.  A module (such as ``decimal``) that wishes to
> store a context variable should:
>
> * declare a module-global variable holding a ``ContextVar`` to
>   serve as a key;
>
> * access the current value via the ``get()`` method on the
>   key variable;
>
> * modify the current value via the ``set()`` method on the
>   key variable.
>
> The notion of "current value" deserves special consideration:
> different asynchronous tasks that exist and execute concurrently
> may have different values for the same key.  This idea is well-known
> from thread-local storage but in this case the locality of the value is
> not necessarily bound to a thread.  Instead, there is the notion of the
> "current ``Context``" which is stored in thread-local storage, and
> is accessed via the ``contextvars.copy_context()`` function.
> Manipulation of the current ``Context`` is the responsibility of the
> task framework, e.g. asyncio.
>
> A ``Context`` is conceptually a read-only mapping, implemented using
> an immutable dictionary.  The ``ContextVar.get()`` method does a
> lookup in the current ``Context`` with ``self`` as a key.  If the key
> is missing, it returns the default value specified in the constructor
> or, if there is none, raises a ``LookupError``.
>
> The ``ContextVar.set(value)`` method clones the current ``Context``,
> assigns the ``value`` to it with ``self`` as a key, and sets the
> new ``Context`` as the new current ``Context``.
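
A small sketch to check my reading of the paragraphs above, assuming the module behaves as described. The variable name ``precision`` is only an example. A copy taken before ``set()`` should not see the new value, because ``set()`` installs a new ``Context`` rather than mutating the old one:

```python
import contextvars

# Module-global key, as the Introduction recommends.
precision = contextvars.ContextVar("precision", default=28)

# No value has been set yet, so get() falls back to the default.
default_seen = precision.get()

# Snapshot the current context, then change the variable.
before = contextvars.copy_context()
precision.set(10)
after = contextvars.copy_context()

# The earlier snapshot is unaffected by the later set().
in_before = precision in before
value_after = after[precision]
```

Note that the default lives on the ``ContextVar``, not in the mapping, so the variable does not appear in ``before`` at all.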
>
>
> Specification
> =============
>
> A new standard library module ``contextvars`` is added with the
> following APIs:
>
> 1. ``copy_context() -> Context`` function is used to get a copy of
>    the current ``Context`` object for the current OS thread.
>
> 2. ``ContextVar`` class to declare and access context variables.
>
> 3. ``Context`` class encapsulates context state.  Every OS thread
>    stores a reference to its current ``Context`` instance.
>    It is not possible to control that reference manually.
>    Instead, the ``Context.run(callable, *args, **kwargs)`` method is
>    used to run Python code in another context.
>
>
> contextvars.ContextVar
> ----------------------
>
> The ``ContextVar`` class has the following constructor signature:
> ``ContextVar(name, *, default=_NO_DEFAULT)``.  The ``name`` parameter
> is used only for introspection and debug purposes, and is exposed
> as a read-only ``ContextVar.name`` attribute.  The ``default``
> parameter is optional.  Example::
>
>     # Declare a context variable 'var' with the default value 42.
>     var = ContextVar('var', default=42)
>
> (The ``_NO_DEFAULT`` is an internal sentinel object used to
> detect if the default value was provided.)
>
> ``ContextVar.get()`` returns the value of the context variable from the
> current ``Context``::
>
>     # Get the value of `var`.
>     var.get()
>
> ``ContextVar.set(value) -> Token`` is used to set a new value for
> the context variable in the current ``Context``::
>
>     # Set the variable 'var' to 1 in the current context.
>     var.set(1)
>
> ``ContextVar.reset(token)`` is used to reset the variable in the
> current context to the value it had before the ``set()`` operation
> that created the ``token``::
>
>     assert var.get(None) is None
>
>     token = var.set(1)
>     try:
>         ...
>     finally:
>         var.reset(token)
>
>     assert var.get(None) is None
>
> The ``ContextVar.reset()`` method is idempotent and can be called
> multiple times on the same ``Token`` object: second and later calls
> will be no-ops.
>
>
> contextvars.Token
> -----------------
>
> ``contextvars.Token`` is an opaque object that should be used to
> restore the ``ContextVar`` to its previous value, or remove it from
> the context if the variable was not set before.  It can be created
> only by calling ``ContextVar.set()``.
>
> For debug and introspection purposes it has:
>
> * a read-only attribute ``Token.var`` pointing to the variable
>   that created the token;
>
> * a read-only attribute ``Token.old_value`` set to the value the
>   variable had before the ``set()`` call, or to ``Token.MISSING``
>   if the variable wasn't set before.
>
> Having ``ContextVar.set()`` return a ``Token`` object that
> ``ContextVar.reset(token)`` accepts allows context variables
> to be removed from the context if they were not in it before the
> ``set()`` call.
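
A sketch of how I would expect a library to wrap the set/reset pair, assuming the Token semantics described above. The ``override`` helper and the ``locale`` variable are hypothetical names, not part of the PEP:

```python
import contextlib
import contextvars

locale = contextvars.ContextVar("locale")


@contextlib.contextmanager
def override(var, value):
    """Temporarily set `var` to `value`, restoring the prior state on exit."""
    token = var.set(value)
    try:
        yield token
    finally:
        # Restores the old value, or removes the variable from the
        # context if it was not set before.
        var.reset(token)


with override(locale, "de_DE") as outer:
    seen_outer = locale.get()
    with override(locale, "fr_FR") as inner:
        seen_inner = locale.get()
    restored = locale.get()

# After the outermost block the variable is unset again.
unset_again = locale.get(None) is None
```

Here each token is reset exactly once, so the sketch does not depend on what a repeated ``reset()`` call does.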
>
>
> contextvars.Context
> -------------------
>
> A ``Context`` object is a mapping of context variables to values.
>
> ``Context()`` creates an empty context.  To get a copy of the current
> ``Context`` for the current OS thread, use the
> ``contextvars.copy_context()`` function::
>
>     ctx = contextvars.copy_context()
>
> To run Python code in some ``Context``, use the ``Context.run()``
> method::
>
>     ctx.run(function)
>
> Any changes to any context variables that ``function`` causes will
> be contained in the ``ctx`` context::
>
>     var = ContextVar('var')
>     var.set('spam')
>
>     def function():
>         assert var.get() == 'spam'
>
>         var.set('ham')
>         assert var.get() == 'ham'
>
>     ctx = copy_context()
>
>     # Any changes that 'function' makes to 'var' will stay
>     # isolated in the 'ctx'.
>     ctx.run(function)
>
>     assert var.get() == 'spam'
>
> Any changes to the context will be contained in the ``Context``
> object on which ``run()`` is called.
>
> ``Context.run()`` is used to control in which context asyncio
> callbacks and Tasks are executed.  It can also be used to run some
> code in a different thread in the context of the current thread::
>
>     executor = ThreadPoolExecutor()
>     current_context = contextvars.copy_context()
>
>     executor.submit(
>         lambda: current_context.run(some_function))
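
A self-contained variant of the executor example, as I understand it. The names ``user`` and ``whoami`` are placeholders. The worker thread is handed a copy of the submitting thread's context and runs the function inside it:

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

user = contextvars.ContextVar("user", default="anonymous")


def whoami():
    return user.get()


user.set("alice")

# Snapshot the submitting thread's context.
current_context = contextvars.copy_context()

with ThreadPoolExecutor(max_workers=1) as executor:
    # Context.run() accepts the callable directly, so no lambda is needed.
    in_context = executor.submit(current_context.run, whoami).result()
```

Passing ``current_context.run`` as the callable is equivalent to the lambda form in the PEP text.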
>
> ``Context`` objects implement the ``collections.abc.Mapping`` ABC.
> This can be used to introspect context objects::
>
>     ctx = contextvars.copy_context()
>
>     # Print all context variables and their values in 'ctx':
>     print(ctx.items())
>
>     # Print the value of 'some_variable' in context 'ctx':
>     print(ctx[some_variable])
>
>
> asyncio
> -------
>
> ``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``,
> and ``Loop.call_at()`` to schedule the asynchronous execution of a
> function.  ``asyncio.Task`` uses ``call_soon()`` to run the
> wrapped coroutine.
>
> We modify ``Loop.call_{at,later,soon}`` and
> ``Future.add_done_callback()`` to accept the new optional *context*
> keyword-only argument, which defaults to the current context::
>
>     def call_soon(self, callback, *args, context=None):
>         if context is None:
>             context = contextvars.copy_context()
>
>         # ... some time later
>         context.run(callback, *args)
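
A sketch of the two ways of scheduling a callback, assuming ``call_soon()`` accepts the *context* keyword as proposed. The ``var`` and ``callback`` names are illustrative only:

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var", default="unset")
seen = []


def callback():
    seen.append(var.get())


async def main():
    loop = asyncio.get_running_loop()
    var.set("outer")

    # Build a separate context in which the variable has another value.
    custom = contextvars.copy_context()
    custom.run(var.set, "custom")

    # Default: the callback runs in a copy of the current context.
    loop.call_soon(callback)
    # Explicit: the callback runs in the context passed by the caller.
    loop.call_soon(callback, context=custom)

    # Yield to the event loop so both callbacks get a chance to run.
    await asyncio.sleep(0)


asyncio.run(main())
```

The first callback should observe the value set by the scheduling code, and the second the value stored in ``custom``.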
>
> Tasks in asyncio need to maintain their own context that they inherit
> from the point they were created at.  ``asyncio.Task`` is modified
> as follows::
>
>     class Task:
>         def __init__(self, coro):
>             ...
>             # Get the current context snapshot.
>             self._context = contextvars.copy_context()
>             self._loop.call_soon(self._step, context=self._context)
>
>         def _step(self, exc=None):
>             ...
>             # Every advance of the wrapped coroutine is done in
>             # the task's context.
>             self._loop.call_soon(self._step, context=self._context)
>             ...
>
>
> C API
> -----
>
> 1. ``PyContextVar * PyContextVar_New(char *name, PyObject *default)``:
>    create a ``ContextVar`` object.
>
> 2. ``int PyContextVar_Get(PyContextVar *, PyObject *default_value, PyObject **value)``:
>    return ``-1`` if an error occurs during the lookup, ``0`` otherwise.
>    If a value for the context variable is found, it will be set to the
>    ``value`` pointer.  Otherwise, ``value`` will be set to
>    ``default_value`` when it is not ``NULL``.  If ``default_value`` is
>    ``NULL``, ``value`` will be set to the default value of the
>    variable, which can be ``NULL`` too.  ``value`` is always a borrowed
>    reference.
>
> 3. ``PyContextToken * PyContextVar_Set(PyContextVar *, PyObject *)``:
>    set the value of the variable in the current context.
>
> 4. ``PyContextVar_Reset(PyContextVar *, PyContextToken *)``:
>    reset the value of the context variable.
>
> 5. ``PyContext * PyContext_New()``: create a new empty context.
>
> 6. ``PyContext * PyContext_Copy()``: get a copy of the current context.
>
> 7. ``int PyContext_Enter(PyContext *)`` and
>    ``int PyContext_Exit(PyContext *)`` allow setting and restoring
>    the context for the current OS thread.  It is required to always
>    restore the previous context::
>
>       PyContext *old_ctx = PyContext_Copy();
>       if (old_ctx == NULL) goto error;
>
>       if (PyContext_Enter(new_ctx)) goto error;
>
>       // run some code
>
>       if (PyContext_Exit(old_ctx)) goto error;
>
>
> Implementation
> ==============
>
> This section explains high-level implementation details in
> pseudo-code.  Some optimizations are omitted to keep this section
> short and clear.
>
> For the purposes of this section, we implement an immutable dictionary
> using ``dict.copy()``::
>
>     class _ContextData:
>
>         def __init__(self):
>             self._mapping = dict()
>
>         def get(self, key):
>             return self._mapping[key]
>
>         def set(self, key, value):
>             copy = _ContextData()
>             copy._mapping = self._mapping.copy()
>             copy._mapping[key] = value
>             return copy
>
>         def delete(self, key):
>             copy = _ContextData()
>             copy._mapping = self._mapping.copy()
>             del copy._mapping[key]
>             return copy
>
> Every OS thread has a reference to the current ``_ContextData``.
> ``PyThreadState`` is updated with a new ``context_data`` field that
> points to a ``_ContextData`` object::
>
>     class PyThreadState:
>         context_data: _ContextData
>
> ``contextvars.copy_context()`` is implemented as follows::
>
>     def copy_context():
>         ts : PyThreadState = PyThreadState_Get()
>
>         if ts.context_data is None:
>             ts.context_data = _ContextData()
>
>         ctx = Context()
>         ctx._data = ts.context_data
>         return ctx
>
> ``contextvars.Context`` is a wrapper around ``_ContextData``::
>
>     class Context(collections.abc.Mapping):
>
>         def __init__(self):
>             self._data = _ContextData()
>
>         def run(self, callable, *args, **kwargs):
>             ts : PyThreadState = PyThreadState_Get()
>             saved_data : _ContextData = ts.context_data
>
>             try:
>                 ts.context_data = self._data
>                 return callable(*args, **kwargs)
>             finally:
>                 self._data = ts.context_data
>                 ts.context_data = saved_data
>
>         # Mapping API methods are implemented by delegating
>         # `get()` and other Mapping calls to `self._data`.
>
> ``contextvars.ContextVar`` interacts with
> ``PyThreadState.context_data`` directly::
>
>     class ContextVar:
>
>         def __init__(self, name, *, default=_NO_DEFAULT):
>             self._name = name
>             self._default = default
>
>         @property
>         def name(self):
>             return self._name
>
>         def get(self, default=_NO_DEFAULT):
>             ts : PyThreadState = PyThreadState_Get()
>             data : _ContextData = ts.context_data
>
>             try:
>                 return data.get(self)
>             except KeyError:
>                 pass
>
>             if default is not _NO_DEFAULT:
>                 return default
>
>             if self._default is not _NO_DEFAULT:
>                 return self._default
>
>             raise LookupError
>
>         def set(self, value):
>             ts : PyThreadState = PyThreadState_Get()
>             data : _ContextData = ts.context_data
>
>             try:
>                 old_value = data.get(self)
>             except KeyError:
>                 old_value = Token.MISSING
>
>             ts.context_data = data.set(self, value)
>             return Token(self, old_value)
>
>         def reset(self, token):
>             if token._used:
>                 return
>
>             ts : PyThreadState = PyThreadState_Get()
>             data : _ContextData = ts.context_data
>
>             if token._old_value is Token.MISSING:
>                 ts.context_data = data.delete(token._var)
>             else:
>                 ts.context_data = data.set(token._var,
>                                            token._old_value)
>
>             token._used = True
>
>
>     class Token:
>
>         MISSING = object()
>
>         def __init__(self, var, old_value):
>             self._var = var
>             self._old_value = old_value
>             self._used = False
>
>         @property
>         def var(self):
>             return self._var
>
>         @property
>         def old_value(self):
>             return self._old_value
>
>
> Implementation Notes
> ====================
>
> * The internal immutable dictionary for ``Context`` is implemented
>   using Hash Array Mapped Tries (HAMT).  They allow an O(log N)
>   ``set`` operation and an O(1) ``copy_context()`` function, where
>   *N* is the number of items in the dictionary.  For a detailed
>   analysis of HAMT performance please refer to :pep:`550` [1]_.
>
> * ``ContextVar.get()`` has an internal cache for the most recent
>   value, which allows bypassing a hash lookup.  This is similar
>   to the optimization the ``decimal`` module implements to
>   retrieve its context from ``PyThreadState_GetDict()``.
>   See :pep:`550`, which explains the implementation of the cache
>   in great detail.
>
>
> Summary of the New APIs
> =======================
>
> * A new ``contextvars`` module with ``ContextVar``, ``Context``,
>   and ``Token`` classes, and a ``copy_context()`` function.
>
> * ``asyncio.Loop.call_at()``, ``asyncio.Loop.call_later()``,
>   ``asyncio.Loop.call_soon()``, and
>   ``asyncio.Future.add_done_callback()`` run callback functions in
>   the context they were called in.  A new *context* keyword-only
>   parameter can be used to specify a custom context.
>
> * ``asyncio.Task`` is modified internally to maintain its own
>   context.
>
>
> Design Considerations
> =====================
>
> Why contextvars.Token and not ContextVar.unset()?
> -------------------------------------------------
>
> The Token API allows getting around having a ``ContextVar.unset()``
> method, which is incompatible with the chained contexts design of
> :pep:`550`.  Future compatibility with :pep:`550` is desired
> (at least for Python 3.7) in case there is demand to support
> context variables in generators and asynchronous generators.
>
> The Token API also offers better usability: the user does not have
> to special-case absence of a value. Compare::
>
>     token = cv.set(blah)
>     try:
>         # code
>     finally:
>         cv.reset(token)
>
> with::
>
>     _deleted = object()
>     old = cv.get(default=_deleted)
>     try:
>         cv.set(blah)
>         # code
>     finally:
>         if old is _deleted:
>             cv.unset()
>         else:
>             cv.set(old)
>
>
> Rejected Ideas
> ==============
>
> Replication of threading.local() interface
> ------------------------------------------
>
> Please refer to :pep:`550` where this topic is covered in detail: [2]_.
>
>
> Backwards Compatibility
> =======================
>
> This proposal preserves 100% backwards compatibility.
>
> Libraries that use ``threading.local()`` to store context-related
> values currently work correctly only for synchronous code.  Switching
> them to use the proposed API will keep their behavior for synchronous
> code unmodified, but will automatically enable support for
> asynchronous code.
>
>
> Reference Implementation
> ========================
>
> The reference implementation can be found here: [3]_.
>
>
> References
> ==========
>
> .. [1] https://www.python.org/dev/peps/pep-0550/#appendix-hamt-performance-analysis
>
> .. [2] https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-interface
>
> .. [3] https://github.com/python/cpython/pull/5027
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End: