<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Dec 12, 2017 at 12:34 PM Yury Selivanov <<a href="mailto:yselivanov.ml@gmail.com">yselivanov.ml@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
This is a new proposal to implement context storage in Python.<br>
<br>
It's a successor of PEP 550 and builds on some of its API ideas and<br>
datastructures. Contrary to PEP 550 though, this proposal only focuses<br>
on adding new APIs and implementing support for it in asyncio. There<br>
are no changes to the interpreter or to the behaviour of generator or<br>
coroutine objects.<br></blockquote><div><br></div><div>I like this proposal. Tornado has a more general implementation of a similar idea (<a href="https://github.com/tornadoweb/tornado/blob/branch4.5/tornado/stack_context.py">https://github.com/tornadoweb/tornado/blob/branch4.5/tornado/stack_context.py</a>), but it also tried to solve the problem of exception handling in callback-based code, so it had a significant performance cost (it had to interpose try/except blocks all over the place). Limiting the interface to coroutine-local variables should keep the performance impact minimal.</div><div><br></div><div>If the contextvars package were published on PyPI (and backported to older Pythons), I'd deprecate Tornado's stack_context and use it instead (even if there's not an official backport, I'll probably move towards whatever interface is defined in this PEP if it is accepted).</div><div><br></div><div>One caveat based on Tornado's experience with stack_context: There are times when the automatic propagation of contexts won't do the right thing (for example, a database client with a connection pool may end up hanging on to the context from the request that created the connection instead of picking up a new context for each query). Compatibility with this feature will require testing and possible fixes with many libraries in the asyncio ecosystem before it can be relied upon. </div><div><br></div><div>-Ben</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
<br>
PEP: 567<br>
Title: Context Variables<br>
Version: $Revision$<br>
Last-Modified: $Date$<br>
Author: Yury Selivanov <<a href="mailto:yury@magic.io" target="_blank">yury@magic.io</a>><br>
Status: Draft<br>
Type: Standards Track<br>
Content-Type: text/x-rst<br>
Created: 12-Dec-2017<br>
Python-Version: 3.7<br>
Post-History: 12-Dec-2017<br>
<br>
<br>
Abstract<br>
========<br>
<br>
This PEP proposes the new ``contextvars`` module and a set of new<br>
CPython C APIs to support context variables. This concept is<br>
similar to thread-local variables but, unlike TLS, it allows<br>
correctly keeping track of values per asynchronous task, e.g.<br>
``asyncio.Task``.<br>
<br>
This proposal builds directly upon concepts originally introduced<br>
in :pep:`550`. The key difference is that this PEP is only concerned<br>
with solving the case for asynchronous tasks, and not generators.<br>
There are no proposed modifications to any built-in types or to the<br>
interpreter.<br>
<br>
<br>
Rationale<br>
=========<br>
<br>
Thread-local variables are insufficient for asynchronous tasks that<br>
execute concurrently in the same OS thread. Any context manager that<br>
saves and restores a context value using ``threading.local()`` will<br>
have its context values bleed to other code unexpectedly when used in<br>
async/await code.<br>
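<br>
The bleed can be reproduced in a few lines of asyncio code. The<br>
following is a minimal sketch, not part of the proposal; the<br>
``handle()`` coroutine and the ``request_id`` attribute are hypothetical<br>
names chosen for illustration:<br>
<br>
```python
import asyncio
import threading

# Hypothetical per-request state kept in thread-local storage.
_local = threading.local()


async def handle(request_id, seen):
    _local.request_id = request_id
    # Yield to the event loop so the other task runs in the same OS thread.
    await asyncio.sleep(0)
    # Record what this task observes once it is resumed.
    seen[request_id] = _local.request_id


async def main():
    seen = {}
    await asyncio.gather(handle("a", seen), handle("b", seen))
    return seen


seen = asyncio.run(main())
```
<br>
Both tasks share one OS thread, so task "a" resumes after task "b" has<br>
overwritten the thread-local value and observes "b" instead of its own<br>
value.<br>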
<br>
A few examples where having a working context local storage for<br>
asynchronous code is desired:<br>
<br>
* Context managers like decimal contexts and ``numpy.errstate``.<br>
<br>
* Request-related data, such as security tokens and request<br>
 data in web applications, language context for ``gettext`` etc.<br>
<br>
* Profiling, tracing, and logging in large code bases.<br>
<br>
<br>
Introduction<br>
============<br>
<br>
The PEP proposes a new mechanism for managing context variables.<br>
The key classes involved in this mechanism are ``contextvars.Context``<br>
and ``contextvars.ContextVar``. The PEP also proposes some policies<br>
for using the mechanism around asynchronous tasks.<br>
<br>
The proposed mechanism for accessing context variables uses the<br>
``ContextVar`` class. A module (such as decimal) that wishes to<br>
store a context variable should:<br>
<br>
* declare a module-global variable holding a ``ContextVar`` to<br>
 serve as a "key";<br>
<br>
* access the current value via the ``get()`` method on the<br>
 key variable;<br>
<br>
* modify the current value via the ``set()`` method on the<br>
 key variable.<br>
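<br>
The three steps above can be sketched as follows. This assumes the<br>
``contextvars`` module as available in Python 3.7 and later; the<br>
variable name ``precision``, its default and the two helper functions<br>
are illustrative only:<br>
<br>
```python
from contextvars import ContextVar

# Step 1: a module-global ContextVar serves as the "key".
# The name and the default value are illustrative assumptions.
precision = ContextVar("precision", default=28)


def get_precision():
    # Step 2: read the current value through the key.
    return precision.get()


def set_precision(value):
    # Step 3: change the current value through the key.
    precision.set(value)


before = get_precision()
set_precision(10)
after = get_precision()
```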
<br>
The notion of "current value" deserves special consideration:<br>
different asynchronous tasks that exist and execute concurrently<br>
may have different values. This idea is well-known from thread-local<br>
storage, but in this case the value is not necessarily local to a<br>
thread. Instead, there is the notion of the "current ``Context``",<br>
which is stored in thread-local storage and is accessed via the<br>
``contextvars.get_context()`` function.<br>
Manipulation of the current ``Context`` is the responsibility of the<br>
task framework, e.g. asyncio.<br>
<br>
A ``Context`` is conceptually a mapping, implemented using an<br>
immutable dictionary. The ``ContextVar.get()`` method does a<br>
lookup in the current ``Context`` with ``self`` as a key. If the<br>
variable is not set, it returns the default value specified in the<br>
constructor or, if there is none, raises a ``LookupError``.<br>
<br>
The ``ContextVar.set(value)`` method clones the current ``Context``,<br>
assigns the ``value`` to it with ``self`` as a key, and makes the<br>
new ``Context`` the current one. Because ``Context`` uses an<br>
immutable dictionary, cloning it is O(1).<br>
<br>
<br>
Specification<br>
=============<br>
<br>
A new standard library module ``contextvars`` is added with the<br>
following APIs:<br>
<br>
1. ``get_context() -> Context`` function is used to get the current<br>
  ``Context`` object for the current OS thread.<br>
<br>
2. ``ContextVar`` class to declare and access context variables.<br>
<br>
3. ``Context`` class encapsulates context state. Every OS thread<br>
  stores a reference to its current ``Context`` instance.<br>
  It is not possible to control that reference manually.<br>
  Instead, the ``Context.run(callable, *args)`` method is used to run<br>
  Python code in another context.<br>
<br>
<br>
contextvars.ContextVar<br>
----------------------<br>
<br>
The ``ContextVar`` class has the following constructor signature:<br>
``ContextVar(name, *, default=no_default)``. The ``name`` parameter<br>
is used only for introspection and debug purposes. The ``default``<br>
parameter is optional. Example::<br>
<br>
  # Declare a context variable 'var' with the default value 42.<br>
  var = ContextVar('var', default=42)<br>
<br>
``ContextVar.get()`` returns the value of the context variable in the<br>
current ``Context``::<br>
<br>
  # Get the value of `var`.<br>
  var.get()<br>
<br>
``ContextVar.set(value) -> Token`` is used to set a new value for<br>
the context variable in the current ``Context``::<br>
<br>
  # Set the variable 'var' to 1 in the current context.<br>
  var.set(1)<br>
<br>
``contextvars.Token`` is an opaque object that should be used to<br>
restore the ``ContextVar`` to its previous value, or to remove it from<br>
the context if it was not set before. The ``ContextVar.reset(token)``<br>
method is used for that::<br>
<br>
  old = var.set(1)<br>
  try:<br>
    ...<br>
  finally:<br>
    var.reset(old)<br>
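<br>
The same set/reset pairing can be packaged as a context manager. This<br>
is only a sketch, assuming the ``contextvars`` module as available in<br>
Python 3.7 and later; the ``override()`` helper is a hypothetical name<br>
and not part of the proposed API:<br>
<br>
```python
from contextlib import contextmanager
from contextvars import ContextVar

var = ContextVar("var", default=42)


@contextmanager
def override(value):
    # Hypothetical helper: set the variable for the duration of the block.
    token = var.set(value)
    try:
        yield
    finally:
        # Restore the previous state, even if the block raised.
        var.reset(token)


with override(1):
    inside = var.get()

outside = var.get()
```
<br>
Because ``var`` was not set before the block, resetting the token<br>
removes it from the context again and ``get()`` falls back to the<br>
default.<br>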
<br>
The ``Token`` API exists to make the current proposal forward<br>
compatible with :pep:`550`, in case there is demand to support<br>
context variables in generators and asynchronous generators in the<br>
future.<br>
<br>
The ``ContextVar`` design allows for a fast implementation of<br>
``ContextVar.get()``, which is particularly important for modules<br>
like ``decimal`` and ``numpy``.<br>
<br>
<br>
contextvars.Context<br>
-------------------<br>
<br>
``Context`` objects are mappings of ``ContextVar`` to values.<br>
<br>
To get the current ``Context`` for the current OS thread, use the<br>
``contextvars.get_context()`` function::<br>
<br>
  ctx = contextvars.get_context()<br>
<br>
To run Python code in some ``Context``, use the ``Context.run()``<br>
method::<br>
<br>
  ctx.run(function)<br>
<br>
Any changes that ``function`` makes to context variables will<br>
be contained in the ``ctx`` context::<br>
<br>
  var = ContextVar('var')<br>
  var.set('spam')<br>
<br>
  def function():<br>
    assert var.get() == 'spam'<br>
<br>
    var.set('ham')<br>
    assert var.get() == 'ham'<br>
<br>
  ctx = get_context()<br>
  ctx.run(function)<br>
<br>
  assert var.get() == 'spam'<br>
<br>
Any changes to the context will be contained and persisted in the<br>
``Context`` object on which ``run()`` is called.<br>
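<br>
A runnable version of this behaviour, as a sketch only: it assumes the<br>
``contextvars`` module as released in Python 3.7, where the snapshot<br>
function is named ``copy_context()`` rather than the ``get_context()``<br>
used in this draft:<br>
<br>
```python
from contextvars import ContextVar, copy_context

var = ContextVar("var")
var.set("spam")


def function():
    # The change below is visible only inside the context that runs us.
    var.set("ham")
    return var.get()


# Assumption: copy_context() plays the role of the draft's get_context().
ctx = copy_context()
result = ctx.run(function)

# The change persists in 'ctx' but did not leak into the caller's context.
value_in_ctx = ctx[var]
value_outside = var.get()
```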
<br>
``Context`` objects implement the ``collections.abc.Mapping`` ABC.<br>
This can be used to introspect context objects::<br>
<br>
  ctx = contextvars.get_context()<br>
<br>
  # Print all context variables and their values in 'ctx':<br>
  print(ctx.items())<br>
<br>
  # Print the value of 'some_variable' in context 'ctx':<br>
  print(ctx[some_variable])<br>
<br>
<br>
asyncio<br>
-------<br>
<br>
``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``,<br>
and ``Loop.call_at()`` to schedule the asynchronous execution of a<br>
function. ``asyncio.Task`` uses ``call_soon()`` to run the<br>
wrapped coroutine.<br>
<br>
We modify ``Loop.call_{at,later,soon}`` to accept the new<br>
optional *context* keyword-only argument, which defaults to<br>
the current context::<br>
<br>
  def call_soon(self, callback, *args, context=None):<br>
    if context is None:<br>
      context = contextvars.get_context()<br>
<br>
    # ... some time later<br>
    context.run(callback, *args)<br>
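<br>
From the caller's side, the new argument could be used as in the<br>
sketch below. It assumes the asyncio and ``contextvars`` APIs as<br>
released in Python 3.7 (``copy_context()`` instead of the draft's<br>
``get_context()``); the ``callback()`` function and the values are<br>
illustrative:<br>
<br>
```python
import asyncio
from contextvars import ContextVar, copy_context

var = ContextVar("var", default="unset")


def callback(seen):
    # Record the value visible in whichever context runs this callback.
    seen.append(var.get())


async def main():
    seen = []
    loop = asyncio.get_running_loop()

    # Prepare a separate context in which 'var' has a value.
    ctx = copy_context()
    ctx.run(var.set, "in-ctx")

    # Explicit context: the callback sees the value stored in 'ctx'.
    loop.call_soon(callback, seen, context=ctx)
    # Default: the callback runs in a snapshot of the current context.
    loop.call_soon(callback, seen)

    # Yield once so both scheduled callbacks get to run.
    await asyncio.sleep(0)
    return seen


seen = asyncio.run(main())
```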
<br>
Tasks in asyncio need to maintain their own isolated context.<br>
``asyncio.Task`` is modified as follows::<br>
<br>
  class Task:<br>
    def __init__(self, coro):<br>
      ...<br>
      # Get the current context snapshot.<br>
      self._context = contextvars.get_context()<br>
      self._loop.call_soon(self._step, context=self._context)<br>
<br>
    def _step(self, exc=None):<br>
      ...<br>
      # Every advance of the wrapped coroutine is done in<br>
      # the task's context.<br>
      self._loop.call_soon(self._step, context=self._context)<br>
      ...<br>
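<br>
The effect of per-task contexts can be observed with a small sketch,<br>
assuming the asyncio and ``contextvars`` behaviour as released in<br>
Python 3.7; the ``handle()`` coroutine and the ``request_id`` variable<br>
are hypothetical names:<br>
<br>
```python
import asyncio
from contextvars import ContextVar

request_id = ContextVar("request_id", default=None)


async def handle(value, seen):
    request_id.set(value)
    # Yield to the event loop so the other task runs in between.
    await asyncio.sleep(0)
    # Each task still sees the value it set in its own context.
    seen[value] = request_id.get()


async def main():
    seen = {}
    await asyncio.gather(handle("a", seen), handle("b", seen))
    # Changes made inside the child tasks do not leak into this task.
    return seen, request_id.get()


seen, outer = asyncio.run(main())
```
<br>
Unlike a ``threading.local()`` value, the variable keeps a separate<br>
value per task even though both tasks run in the same OS thread.<br>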
<br>
<br>
CPython C API<br>
-------------<br>
<br>
TBD<br>
<br>
<br>
Implementation<br>
==============<br>
<br>
This section explains high-level implementation details in<br>
pseudo-code. Some optimizations are omitted to keep this section<br>
short and clear.<br>
<br>
The internal immutable dictionary for ``Context`` is implemented<br>
using Hash Array Mapped Tries (HAMT). They allow for an O(log N)<br>
``set`` operation and an O(1) ``get_context()`` function. For the purposes<br>
of this section, we implement an immutable dictionary using<br>
``dict.copy()``::<br>
<br>
  class _ContextData:<br>
<br>
    def __init__(self):<br>
      self.__mapping = dict()<br>
<br>
    def get(self, key):<br>
      return self.__mapping[key]<br>
<br>
    def set(self, key, value):<br>
      copy = _ContextData()<br>
      copy.__mapping = self.__mapping.copy()<br>
      copy.__mapping[key] = value<br>
      return copy<br>
<br>
    def delete(self, key):<br>
      copy = _ContextData()<br>
      copy.__mapping = self.__mapping.copy()<br>
      del copy.__mapping[key]<br>
      return copy<br>
<br>
Every OS thread has a reference to the current ``_ContextData``.<br>
``PyThreadState`` is updated with a new ``context_data`` field that<br>
points to a ``_ContextData`` object::<br>
<br>
  PyThreadState:<br>
    context_data : _ContextData<br>
<br>
``contextvars.get_context()`` is implemented as follows::<br>
<br>
  def get_context():<br>
    ts : PyThreadState = PyThreadState_Get()<br>
<br>
    if ts.context_data is None:<br>
      ts.context_data = _ContextData()<br>
<br>
    ctx = Context()<br>
    ctx.__data = ts.context_data<br>
    return ctx<br>
<br>
``contextvars.Context`` is a wrapper around ``_ContextData``::<br>
<br>
  class Context(collections.abc.Mapping):<br>
<br>
    def __init__(self):<br>
      self.__data = _ContextData()<br>
<br>
    def run(self, callable, *args):<br>
      ts : PyThreadState = PyThreadState_Get()<br>
      saved_data : _ContextData = ts.context_data<br>
<br>
      try:<br>
        ts.context_data = self.__data<br>
        return callable(*args)<br>
      finally:<br>
        self.__data = ts.context_data<br>
        ts.context_data = saved_data<br>
<br>
    # Mapping API methods are implemented by delegating<br>
    # `get()` and other Mapping calls to `self.__data`.<br>
<br>
``contextvars.ContextVar`` interacts with<br>
``PyThreadState.context_data`` directly::<br>
<br>
  class ContextVar:<br>
<br>
    def __init__(self, name, *, default=NO_DEFAULT):<br>
      self.__name = name<br>
      self.__default = default<br>
<br>
    @property<br>
    def name(self):<br>
      return self.__name<br>
<br>
    def get(self, default=NO_DEFAULT):<br>
      ts : PyThreadState = PyThreadState_Get()<br>
      data : _ContextData = ts.context_data<br>
<br>
      try:<br>
        return data.get(self)<br>
      except KeyError:<br>
        pass<br>
<br>
      if default is not NO_DEFAULT:<br>
        return default<br>
<br>
      if self.__default is not NO_DEFAULT:<br>
        return self.__default<br>
<br>
      raise LookupError<br>
<br>
    def set(self, value):<br>
      ts : PyThreadState = PyThreadState_Get()<br>
      data : _ContextData = ts.context_data<br>
<br>
      try:<br>
        old_value = data.get(self)<br>
      except KeyError:<br>
        old_value = NO_VALUE<br>
<br>
      ts.context_data = data.set(self, value)<br>
      return Token(self, old_value)<br>
<br>
    def reset(self, token):<br>
      if token.__used:<br>
        return<br>
<br>
      ts : PyThreadState = PyThreadState_Get()<br>
      data : _ContextData = ts.context_data<br>
<br>
      if token.__old_value is NO_VALUE:<br>
        ts.context_data = data.delete(token.__var)<br>
      else:<br>
        ts.context_data = data.set(token.__var,<br>
                                   token.__old_value)<br>
<br>
      token.__used = True<br>
<br>
<br>
  class Token:<br>
<br>
    def __init__(self, var, old_value):<br>
      self.__var = var<br>
      self.__old_value = old_value<br>
      self.__used = False<br>
<br>
<br>
Backwards Compatibility<br>
=======================<br>
<br>
This proposal preserves 100% backwards compatibility.<br>
<br>
Libraries that use ``threading.local()`` to store context-related<br>
values currently work correctly only for synchronous code. Switching<br>
them to use the proposed API will keep their behavior for synchronous<br>
code unmodified, but will automatically enable support for<br>
asynchronous code.<br>
<br>
<br>
Appendix: HAMT Performance Analysis<br>
===================================<br>
<br>
.. figure:: pep-0550-hamt_vs_dict-v2.png<br>
  :align: center<br>
  :width: 100%<br>
<br>
  Figure 1. Benchmark code can be found here: [1]_.<br>
<br>
The above chart demonstrates that:<br>
<br>
* HAMT displays near O(1) performance for all benchmarked<br>
 dictionary sizes.<br>
<br>
* ``dict.copy()`` becomes very slow around 100 items.<br>
<br>
.. figure:: pep-0550-lookup_hamt.png<br>
  :align: center<br>
  :width: 100%<br>
<br>
  Figure 2. Benchmark code can be found here: [2]_.<br>
<br>
Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based<br>
immutable mapping. HAMT lookup time is 30-40% slower than Python dict<br>
lookups on average, which is a very good result, considering that the<br>
latter is very well optimized.<br>
<br>
The reference implementation of HAMT for CPython can be found here:<br>
[3]_.<br>
<br>
<br>
References<br>
==========<br>
<br>
.. [1] <a href="https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd" rel="noreferrer" target="_blank">https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd</a><br>
<br>
.. [2] <a href="https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e" rel="noreferrer" target="_blank">https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e</a><br>
<br>
.. [3] <a href="https://github.com/1st1/cpython/tree/hamt" rel="noreferrer" target="_blank">https://github.com/1st1/cpython/tree/hamt</a><br>
<br>
<br>
Copyright<br>
=========<br>
<br>
This document has been placed in the public domain.<br>
<br>
<br>
..<br>
  Local Variables:<br>
  mode: indented-text<br>
  indent-tabs-mode: nil<br>
  sentence-end-double-space: t<br>
  fill-column: 70<br>
  coding: utf-8<br>
  End:<br>
</blockquote></div></div>