<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2017-08-16 1:55 GMT+02:00 Yury Selivanov <span dir="ltr"><<a href="mailto:yselivanov.ml@gmail.com" target="_blank">yselivanov.ml@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

Here's the PEP 550 version 2.  Thanks to a very active and insightful<br>

discussion here on Python-ideas, we've discovered a number of<br>

problems with the first version of the PEP.  This version is a complete<br>

rewrite (only Abstract, Rationale, and Goals sections were not updated).<br>

<br>

The updated PEP is live on <a href="http://python.org" rel="noreferrer" target="_blank">python.org</a>:<br>

<a href="https://www.python.org/dev/peps/pep-0550/" rel="noreferrer" target="_blank">https://www.python.org/dev/<wbr>peps/pep-0550/</a><br>

<br>

There is no reference implementation at this point, but I'm confident<br>

that this version of the spec will have the same extremely low<br>

runtime overhead as the first version.  Thanks to the new ContextItem<br>

design, accessing values in the context is even faster now.<br>

<br>

Thank you!<br>

<br>

<br>

PEP: 550<br>

Title: Execution Context<br>

Version: $Revision$<br>

Last-Modified: $Date$<br>

Author: Yury Selivanov <<a href="mailto:yury@magic.io">yury@magic.io</a>><br>

Status: Draft<br>

Type: Standards Track<br>

Content-Type: text/x-rst<br>

Created: 11-Aug-2017<br>

Python-Version: 3.7<br>

Post-History: 11-Aug-2017, 15-Aug-2017<br>

<br>

<br>

Abstract<br>

========<br>

<br>

This PEP proposes a new mechanism to manage execution state--the<br>

logical environment in which a function, a thread, a generator,<br>

or a coroutine executes.<br>

<br>

A few examples of where having a reliable state storage is required:<br>

<br>

* Context managers like decimal contexts, ``numpy.errstate``,<br>

  and ``warnings.catch_warnings``;<br>

<br>

* Storing request-related data such as security tokens and request<br>

  data in web applications, implementing i18n;<br>

<br>

* Profiling, tracing, and logging in complex and large code bases.<br>

<br>

The usual solution for storing state is to use a Thread-local Storage<br>

(TLS), implemented in the standard library as ``threading.local()``.<br>

Unfortunately, TLS does not work for the purpose of state isolation<br>

for generators or asynchronous code, because such code executes<br>

concurrently in a single thread.<br>

<br>

<br>

Rationale<br>

=========<br>

<br>

Traditionally, a Thread-local Storage (TLS) is used for storing the<br>

state.  However, the major flaw of using the TLS is that it works only<br>

for multi-threaded code.  It is not possible to reliably contain the<br>

state within a generator or a coroutine.  For example, consider<br>

the following generator::<br>

<br>

    def calculate(precision, ...):<br>

        with decimal.localcontext() as ctx:<br>

            # Set the precision for decimal calculations<br>

            # inside this block<br>

            ctx.prec = precision<br>

<br>

            yield calculate_something()<br>

            yield calculate_something_else()<br>

<br>

Decimal context is using a TLS to store the state, and because TLS is<br>

not aware of generators, the state can leak.  If a user iterates over<br>

the ``calculate()`` generator with different precisions one by one<br>

using a ``zip()`` built-in, the above code will not work correctly.<br>

For example::<br>

<br>

    g1 = calculate(precision=100)<br>

    g2 = calculate(precision=50)<br>

<br>

    items = list(zip(g1, g2))<br>

<br>

    # items[0] will be a tuple of:<br>

    #   first value from g1 calculated with 100 precision,<br>

    #   first value from g2 calculated with 50 precision.<br>

    #<br>

    # items[1] will be a tuple of:<br>

    #   second value from g1 calculated with 50 precision (!!!),<br>

    #   second value from g2 calculated with 50 precision.<br>

<br>
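The leak can be reproduced with nothing but the standard library.<br>
The sketch below is a cut-down variant of ``calculate()``; the division<br>
by three and the precisions 10 and 5 are illustrative choices, not part<br>
of the PEP:<br>
<br>

```python
import decimal


def calculate(precision):
    with decimal.localcontext() as ctx:
        # Set the precision for decimal calculations inside this block.
        ctx.prec = precision
        yield decimal.Decimal(1) / decimal.Decimal(3)
        yield decimal.Decimal(1) / decimal.Decimal(3)


def demo():
    # The outer localcontext() restores the caller's decimal context
    # when the demo finishes, so the leak does not escape this function.
    with decimal.localcontext():
        g1 = calculate(precision=10)
        g2 = calculate(precision=5)
        try:
            return list(zip(g1, g2))
        finally:
            g1.close()
            g2.close()


items = demo()
print(items[0])  # first values: g1 uses 10 digits, g2 uses 5 digits
print(items[1])  # second values: both use 5 digits, g1's setting was lost
```

The second value from ``g1`` is rounded to 5 digits because ``g2``<br>
replaced the thread-wide decimal context while ``g1`` was suspended.<br>
<br>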

An even scarier example would be using decimals to represent money<br>

in an async/await application: decimal calculations can suddenly<br>

lose precision in the middle of processing a request.  Currently,<br>

bugs like this are extremely hard to find and fix.<br>

<br>

Another common need for web applications is to have access to the<br>

current request object, or security context, or, simply, the request<br>

URL for logging or submitting performance tracing data::<br>

<br>

    async def handle_http_request(request):<br>

        context.current_http_request = request<br>

<br>

        await ...<br>

        # Invoke your framework code, render templates,<br>

        # make DB queries, etc, and use the global<br>

        # 'current_http_request' in that code.<br>

<br>

        # This isn't currently possible to do reliably<br>

        # in asyncio out of the box.<br>

<br>

These examples are just a few out of many, where a reliable way to<br>

store context data is absolutely needed.<br>

<br>

The inability to use TLS for asynchronous code has led to<br>

proliferation of ad-hoc solutions, which are limited in scope and<br>

do not support all required use cases.<br>

<br>

The current status quo is that any library, including the standard<br>

library, that uses a TLS, will likely not work as expected in<br>

asynchronous code or with generators (see [3]_ for an example issue).<br>

<br>

Some languages that have coroutines or generators recommend<br>

manually passing a ``context`` object to every function; see [1]_<br>

describing the pattern for Go.  This approach, however, has limited<br>

use for Python, where we have a huge ecosystem that was built to work<br>

with a TLS-like context.  Moreover, passing the context explicitly<br>

does not work at all for libraries like ``decimal`` or ``numpy``,<br>

which use operator overloading.<br>

<br>

The .NET runtime, which has support for async/await, has a generic<br>

solution for this problem, called ``ExecutionContext`` (see [2]_).<br>

On the surface, working with it is very similar to working with a TLS,<br>

but the former explicitly supports asynchronous code.<br>

<br>

<br>

Goals<br>

=====<br>

<br>

The goal of this PEP is to provide a more reliable alternative to<br>

``threading.local()``.  It should be explicitly designed to work with<br>

the Python execution model, equally supporting threads, generators, and<br>

coroutines.<br>

<br>

An acceptable solution for Python should meet the following<br>

requirements:<br>

<br>

* Transparent support for code executing in threads, coroutines,<br>

  and generators with an easy to use API.<br>

<br>

* Negligible impact on the performance of the existing code or the<br>

  code that will be using the new mechanism.<br>

<br>

* Fast C API for packages like ``decimal`` and ``numpy``.<br>

<br>

Explicit is still better than implicit, hence the new APIs should only<br>

be used when there is no acceptable way of passing the state<br>

explicitly.<br>

<br>

<br>

Specification<br>

=============<br>

<br>

Execution Context is a mechanism of storing and accessing data specific<br>

to a logical thread of execution.  We consider OS threads,<br>

generators, and chains of coroutines (such as ``asyncio.Task``)<br>

to be variants of a logical thread.<br>

<br>

In this specification, we will use the following terminology:<br>

<br>

* **Local Context**, or LC, is a key/value mapping that stores the<br>

  context of a logical thread.<br>

<br>

* **Execution Context**, or EC, is an OS-thread-specific dynamic<br>

  stack of Local Contexts.<br>

<br>

* **Context Item**, or CI, is an object used to set and get values<br>

  from the Execution Context.<br>

<br>

Please note that throughout the specification we use simple<br>

pseudo-code to illustrate how the EC machinery works.  The actual<br>

algorithms and data structures that we will use to implement the PEP<br>

are discussed in the `Implementation Strategy`_ section.<br>

<br>

<br>

Context Item Object<br>

-------------------<br>

<br>

The ``sys.new_context_item(<wbr>description)`` function creates a<br>

new ``ContextItem`` object.  The ``description`` parameter is a<br>

``str``, explaining the nature of the context key for introspection<br>

and debugging purposes.<br>

<br>

``ContextItem`` objects have the following methods and attributes:<br>

<br>

* ``.description``: read-only description;<br>

<br>

* ``.set(o)`` method: set the value to ``o`` for the context item<br>

  in the execution context.<br>

<br>

* ``.get()`` method: return the current EC value for the context item.<br>

  Context items are initialized with ``None`` when created, so<br>

  this method call never fails.<br>

<br>

Below is an example of how context items can be used::<br>

<br>

    my_context = sys.new_context_item(<wbr>description='mylib.context')<br>

    my_context.set('spam')<br></blockquote><div><br></div><div>Minor suggestion: Could we allow something like `sys.new_context_item(description='mylib.context', initial_value='spam')`? That would make it easier for type checkers to infer the type of a ContextItem, and it would save a line of code in the common case.</div><div><br></div><div>With this modification, the type of new_context_item would be</div><div><br></div><div>@overload</div><div>def new_context_item(*, description: str, initial_value: T) -> ContextItem[T]: ...</div><div>@overload</div><div>def new_context_item(*, description: str) -> ContextItem[Any]: ...</div><div><br></div><div>If we only allow the second variant, type checkers would need some sort of special casing to figure out that after .set(), .get() will return the same type. </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

    # Later, to access the value of my_context:<br>

    print(my_context.get())<br>

<br>

<br>

Thread State and Multi-threaded code<br>

------------------------------<wbr>------<br>

<br>

Execution Context is implemented on top of Thread-local Storage.<br>

For every thread there is a separate stack of Local Contexts --<br>

mappings of ``ContextItem`` objects to their values in the LC.<br>

New threads always start with an empty EC.<br>

<br>

For CPython::<br>

<br>

    PyThreadState:<br>

        execution_context: ExecutionContext([<br>

            LocalContext({ci1: val1, ci2: val2, ...}),<br>

            ...<br>

        ])<br>

<br>

The ``ContextItem.get()`` and ``.set()`` methods are defined as<br>

follows (in pseudo-code)::<br>

<br>

    class ContextItem:<br>

<br>

        def get(self):<br>

            tstate = PyThreadState_Get()<br>

<br>

            for local_context in reversed(tstate.execution_<wbr>context):<br>

                if self in local_context:<br>

                    return local_context[self]<br>

<br>

        def set(self, value):<br>

            tstate = PyThreadState_Get()<br>

<br>

            if not tstate.execution_context:<br>

                tstate.execution_context = [LocalContext()]<br>

<br>

            tstate.execution_context[-1][<wbr>self] = value<br>

<br>

With the semantics defined so far, the Execution Context can already<br>

be used as an alternative to ``threading.local()``::<br>

<br>

    def print_foo():<br>

        print(ci.get() or 'nothing')<br>

<br>

    ci = sys.new_context_item(<wbr>description='test')<br>

    ci.set('foo')<br>

<br>

    # Will print "foo":<br>

    print_foo()<br>

<br>

    # Will print "nothing":<br>

    threading.Thread(target=print_<wbr>foo).start()<br>

<br>
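The pseudo-code above can be exercised with a small pure-Python model.<br>
This is only a sketch under stated assumptions: ``threading.local()``<br>
stands in for ``PyThreadState``, a plain ``dict`` for ``LocalContext``,<br>
and the helper name ``_execution_context`` is hypothetical:<br>
<br>

```python
import threading

# Assumption: a plain dict models LocalContext and a list models the
# Execution Context stack; none of these names are part of the PEP.
_tstate = threading.local()


def _execution_context():
    # Every thread lazily gets its own, initially empty, stack.
    if not hasattr(_tstate, "ec"):
        _tstate.ec = []
    return _tstate.ec


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        # Search from the innermost Local Context outwards.
        for local_context in reversed(_execution_context()):
            if self in local_context:
                return local_context[self]
        return None

    def set(self, value):
        ec = _execution_context()
        if not ec:
            ec.append({})
        ec[-1][self] = value


ci = ContextItem("test")
ci.set("foo")

seen = []
worker = threading.Thread(target=lambda: seen.append(ci.get()))
worker.start()
worker.join()

print(ci.get())  # the value set in the main thread
print(seen[0])   # the worker thread started with an empty EC
```

<br>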

<br>

Manual Context Management<br>

-------------------------<br>

<br>

Execution Context is generally managed by the Python interpreter,<br>

but sometimes it is desirable for the user to take control<br>

over it.  A few examples of when this is needed:<br>

<br>

* running a computation in ``concurrent.futures.<wbr>ThreadPoolExecutor``<br>

  with the current EC;<br>

<br>

* reimplementing generators with iterators (more on that later);<br>

<br>

* managing contexts in asynchronous frameworks (implement proper<br>

  EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)<br>

<br>

For these purposes we add a set of new APIs (they will be used in<br>

later sections of this specification):<br>

<br>

* ``sys.new_local_context()``: create an empty ``LocalContext``<br>

  object.<br>

<br>

* ``sys.new_execution_context()`<wbr>`: create an empty<br>

  ``ExecutionContext`` object.<br>

<br>

* Both ``LocalContext`` and ``ExecutionContext`` objects are opaque<br>

  to Python code, and there are no APIs to modify them.<br>

<br>

* ``sys.get_execution_context()`<wbr>` function.  The function returns a<br>

  copy of the current EC: an ``ExecutionContext`` instance.<br>

<br>

  The runtime complexity of the actual implementation of this function<br>

  can be O(1), but for the purposes of this section it is equivalent<br>

  to::<br>

<br>

    def get_execution_context():<br>

        tstate = PyThreadState_Get()<br>

        return copy(tstate.execution_context)<br>

<br>

* ``sys.run_with_execution_<wbr>context(ec: ExecutionContext, func, *args,<br>

  **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution<br>

  context::<br>

<br>

    def run_with_execution_context(ec, func, *args, **kwargs):<br>

        tstate = PyThreadState_Get()<br>

<br>

        old_ec = tstate.execution_context<br>

<br>

        tstate.execution_context = ExecutionContext(<br>

            ec.local_contexts + [LocalContext()]<br>

        )<br>

<br>

        try:<br>

            return func(*args, **kwargs)<br>

        finally:<br>

            tstate.execution_context = old_ec<br>

<br>

  Any changes to Local Context by ``func`` will be ignored.<br>

  This allows reusing one ``ExecutionContext`` object for multiple<br>

  invocations of different functions, without them being able to<br>

  affect each other's environment::<br>

<br>

      ci = sys.new_context_item('example'<wbr>)<br>

      ci.set('spam')<br>

<br>

      def func():<br>

          print(ci.get())<br>

          ci.set('ham')<br>

<br>

      ec = sys.get_execution_context()<br>

<br>

      sys.run_with_execution_<wbr>context(ec, func)<br>

      sys.run_with_execution_<wbr>context(ec, func)<br>

<br>

      # Will print:<br>

      #   spam<br>

      #   spam<br>

<br>

* ``sys.run_with_local_context(<wbr>lc: LocalContext, func, *args,<br>

  **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution<br>

  context using the specified local context.<br>

<br>

  Any changes that ``func`` does to the local context will be<br>

  persisted in ``lc``.  This behaviour is different from the<br>

  ``run_with_execution_context()<wbr>`` function, which always creates<br>

  a new throw-away local context.<br>

<br>

  In pseudo-code::<br>

<br>

    def run_with_local_context(lc, func, *args, **kwargs):<br>

        tstate = PyThreadState_Get()<br>

<br>

        old_ec = tstate.execution_context<br>

<br>

        tstate.execution_context = ExecutionContext(<br>

            old_ec.local_contexts + [lc]<br>

        )<br>

<br>

        try:<br>

            return func(*args, **kwargs)<br>

        finally:<br>

            tstate.execution_context = old_ec<br>

<br>

  Using the previous example::<br>

<br>

      ci = sys.new_context_item('example'<wbr>)<br>

      ci.set('spam')<br>

<br>

      def func():<br>

          print(ci.get())<br>

          ci.set('ham')<br>

<br>

      ec = sys.get_execution_context()<br>

      lc = sys.new_local_context()<br>

<br>

      sys.run_with_local_context(lc, func)<br>

      sys.run_with_local_context(lc, func)<br>

<br>

      # Will print:<br>

      #   spam<br>

      #   ham<br>

<br>
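The difference between the two ``run_with_*`` functions can be checked<br>
with a runnable model.  This is a sketch, not the proposed<br>
implementation: dicts and lists stand in for the opaque ``LocalContext``<br>
and ``ExecutionContext`` objects, ``_stack`` is a hypothetical helper, and<br>
``get_execution_context()`` is assumed to snapshot each mapping:<br>
<br>

```python
import threading

_tstate = threading.local()


def _stack():
    if not hasattr(_tstate, "ec"):
        _tstate.ec = []
    return _tstate.ec


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        for lc in reversed(_stack()):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        if not _stack():
            _stack().append({})
        _stack()[-1][self] = value


def get_execution_context():
    # Assumption: a snapshot copies the stack and every mapping in it.
    return [dict(lc) for lc in _stack()]


def run_with_execution_context(ec, func, *args, **kwargs):
    old_ec = _stack()
    _tstate.ec = list(ec) + [{}]  # fresh throw-away Local Context on top
    try:
        return func(*args, **kwargs)
    finally:
        _tstate.ec = old_ec


def run_with_local_context(lc, func, *args, **kwargs):
    old_ec = _stack()
    _tstate.ec = old_ec + [lc]  # the caller's lc receives all changes
    try:
        return func(*args, **kwargs)
    finally:
        _tstate.ec = old_ec


ci = ContextItem("example")
ci.set("spam")
seen = []


def func():
    seen.append(ci.get())
    ci.set("ham")


ec = get_execution_context()
run_with_execution_context(ec, func)
run_with_execution_context(ec, func)

lc = {}
run_with_local_context(lc, func)
run_with_local_context(lc, func)
print(seen)
```

Only the last call observes ``'ham'``, because only ``lc`` keeps the<br>
change made by the previous call.<br>
<br>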

As an example, let's make a subclass of<br>

``concurrent.futures.<wbr>ThreadPoolExecutor`` that preserves the execution<br>

context for scheduled functions::<br>

<br>

    class Executor(concurrent.futures.<wbr>ThreadPoolExecutor):<br>

<br>

        def submit(self, fn, *args, **kwargs):<br>

            context = sys.get_execution_context()<br>

<br>

            fn = functools.partial(<br>

                sys.run_with_execution_<wbr>context, context,<br>

                fn, *args, **kwargs)<br>

<br>

            return super().submit(fn)<br>

<br>
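Since the ``sys`` functions above are only proposed, the executor can be<br>
tried out against a pure-Python model.  In this sketch the model<br>
functions (``ContextItem``, ``get_execution_context``,<br>
``run_with_execution_context``, ``_stack``) are stand-ins written for the<br>
example; only ``concurrent.futures`` and ``functools`` are real APIs:<br>
<br>

```python
import concurrent.futures
import functools
import threading

_tstate = threading.local()


def _stack():
    if not hasattr(_tstate, "ec"):
        _tstate.ec = []
    return _tstate.ec


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        for lc in reversed(_stack()):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        if not _stack():
            _stack().append({})
        _stack()[-1][self] = value


def get_execution_context():
    return [dict(lc) for lc in _stack()]


def run_with_execution_context(ec, func, *args, **kwargs):
    old_ec = _stack()
    _tstate.ec = list(ec) + [{}]
    try:
        return func(*args, **kwargs)
    finally:
        _tstate.ec = old_ec


class Executor(concurrent.futures.ThreadPoolExecutor):

    def submit(self, fn, *args, **kwargs):
        # Capture the submitting thread's EC and replay it in the worker.
        context = get_execution_context()
        fn = functools.partial(
            run_with_execution_context, context, fn, *args, **kwargs)
        return super().submit(fn)


ci = ContextItem("example")
ci.set("spam")

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as plain:
    plain_result = plain.submit(ci.get).result()

with Executor(max_workers=1) as preserving:
    preserved_result = preserving.submit(ci.get).result()

print(plain_result, preserved_result)
```

The plain executor's worker sees no value, while the subclass hands the<br>
submitting thread's context to the worker.<br>
<br>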

<br>

EC Semantics for Coroutines<br>

---------------------------<br>

<br>

Python :pep:`492` coroutines are used to implement cooperative<br>

multitasking.  For a Python end-user they are similar to threads,<br>

especially when it comes to sharing resources or modifying<br>

the global state.<br>

<br>

An event loop is needed to schedule coroutines.  Coroutines that<br>

are explicitly scheduled by the user are usually called Tasks.<br>

When a coroutine is scheduled, it can schedule other coroutines using<br>

an ``await`` expression.  In async/await world, awaiting a coroutine<br>

is equivalent to a regular function call in synchronous code.  Thus,<br>

Tasks are similar to threads.<br>

<br>

By drawing a parallel between regular multithreaded code and<br>

async/await, it becomes apparent that any modification of the<br>

execution context within one Task should be visible to all coroutines<br>

scheduled within it.  Any execution context modifications, however,<br>

must not be visible to other Tasks executing within the same OS<br>

thread.<br>

<br>

<br>

Coroutine Object Modifications<br>

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>

<br>

To achieve this, a small set of modifications to the coroutine object<br>

is needed:<br>

<br>

* New ``cr_local_context`` attribute.  This attribute is readable<br>

  and writable for Python code.<br>

<br>

* When a coroutine object is instantiated, its ``cr_local_context``<br>

  is initialized with an empty Local Context.<br>

<br>

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as<br>

  follows (in pseudo-C)::<br>

<br>

    if coro.cr_local_context is not None:<br>

        tstate = PyThreadState_Get()<br>

<br>

        tstate.execution_context.push(<wbr>coro.cr_local_context)<br>

<br>

        try:<br>

            # Perform the actual `Coroutine.send()` or<br>

            # `Coroutine.throw()` call.<br>

            return coro.send(...)<br>

        finally:<br>

            coro.cr_local_context = tstate.execution_context.pop()<br>

    else:<br>

        # Perform the actual `Coroutine.send()` or<br>

        # `Coroutine.throw()` call.<br>

        return coro.send(...)<br>

<br>

* When the Python interpreter sees an ``await`` instruction, it inspects<br>

  the ``cr_local_context`` attribute of the coroutine that is about<br>

  to be awaited.  For ``await coro``:<br>

<br>

  * If ``coro.cr_local_context`` is an empty ``LocalContext`` object<br>

    that ``coro`` was created with, the interpreter will set<br>

    ``coro.cr_local_context`` to ``None``.<br>

<br>

  * If ``coro.cr_local_context`` was modified by Python code, the<br>

    interpreter will leave it as is.<br>

<br>

  This makes any changes to the execution context made by nested coroutine<br>

  calls within a Task visible throughout the Task::<br>

<br>

      ci = sys.new_context_item('example'<wbr>)<br>

<br>

      async def nested():<br>

          ci.set('nested')<br>

<br>

      async def main():<br>

          ci.set('main')<br>

          print('before:', ci.get())<br>

          await nested()<br>

          print('after:', ci.get())<br>

<br>

      # Will print:<br>

      #   before: main<br>

      #   after: nested<br>

<br>

  Essentially, coroutines work with Execution Context items similarly<br>

  to threads, and ``await`` expression acts like a function call.<br>

<br>

  This mechanism also works for ``yield from`` in generators decorated<br>

  with ``@types.coroutine`` or ``@asyncio.coroutine``, which are<br>

  called "generator-based coroutines" according to :pep:`492`,<br>

  and should be fully compatible with native async/await coroutines.<br>

<br>

<br>

Tasks<br>

^^^^^<br>

<br>

In asynchronous frameworks like asyncio, coroutines are run by<br>

an event loop, and need to be explicitly scheduled (in asyncio<br>

coroutines are run by ``asyncio.Task``.)<br>

<br>

With the currently defined semantics, the interpreter makes<br>

coroutines linked by an ``await`` expression share the same<br>

Local Context.<br>

<br>

The interpreter, however, is not aware of the Task concept, and<br>

cannot ensure that new Tasks started in coroutines<br>

use the correct EC::<br>

<br>

    current_request = sys.new_context_item(<wbr>description='request')<br>

<br>

    async def child():<br>

        print('current request:', repr(current_request.get()))<br>

<br>

    async def handle_request(request):<br>

        current_request.set(request)<br>

        event_loop.create_task(child())<br>

<br>

    run(handle_request(...))<br>

<br>

    # Will print:<br>

    #   current request: None<br>

<br>

To enable correct Execution Context propagation into Tasks, the<br>

asynchronous framework needs to assist the interpreter:<br>

<br>

* When ``create_task`` is called, it should capture the current<br>

  execution context with ``sys.get_execution_context()`<wbr>` and save it<br>

  on the Task object.<br>

<br>

* When the Task object runs its coroutine object, it should execute<br>

  ``.send()`` and ``.throw()`` methods within the captured<br>

  execution context, using the ``sys.run_with_execution_<wbr>context()``<br>

  function.<br>

<br>

With help from the asynchronous framework, the above snippet will<br>

run correctly, and the ``child()`` coroutine will be able to access<br>

the current request object through the ``current_request``<br>

Context Item.<br>

<br>
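The two steps a framework has to perform can be sketched with a toy<br>
scheduler.  Everything here is an assumption made for illustration: the<br>
``Task`` and ``run`` names are hypothetical stand-ins for a real event<br>
loop, a dict kept on the Task plays the role of ``cr_local_context``,<br>
and the context functions are a pure-Python model of the proposed ones:<br>
<br>

```python
import collections
import threading

_tstate = threading.local()


def _stack():
    if not hasattr(_tstate, "ec"):
        _tstate.ec = []
    return _tstate.ec


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        for lc in reversed(_stack()):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        if not _stack():
            _stack().append({})
        _stack()[-1][self] = value


def get_execution_context():
    return [dict(lc) for lc in _stack()]


def run_with_execution_context(ec, func, *args, **kwargs):
    old_ec = _stack()
    _tstate.ec = list(ec) + [{}]
    try:
        return func(*args, **kwargs)
    finally:
        _tstate.ec = old_ec


def run_with_local_context(lc, func, *args, **kwargs):
    old_ec = _stack()
    _tstate.ec = old_ec + [lc]
    try:
        return func(*args, **kwargs)
    finally:
        _tstate.ec = old_ec


class Task:
    def __init__(self, coro, ready):
        self.coro = coro
        self.ec = get_execution_context()  # step 1: capture at creation
        self.lc = {}                       # models cr_local_context
        ready.append(self)

    def step(self):
        # Step 2: run send() inside the captured EC, with the Task's own
        # Local Context on top so its changes persist between steps.
        return run_with_execution_context(
            self.ec, run_with_local_context, self.lc, self.coro.send, None)


def run(ready):
    while ready:
        task = ready.popleft()
        try:
            task.step()
        except StopIteration:
            continue  # the coroutine finished
        ready.append(task)


current_request = ContextItem("request")
seen = []
ready = collections.deque()


async def child():
    seen.append(current_request.get())


async def handle_request(request):
    current_request.set(request)
    Task(child(), ready)


Task(handle_request("GET /index"), ready)
run(ready)
print(seen)
```

The child Task sees the request set by its parent, while code outside<br>
both Tasks still sees no value.<br>
<br>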

<br>

Event Loop Callbacks<br>

^^^^^^^^^^^^^^^^^^^^<br>

<br>

Similarly to Tasks, functions like asyncio's ``loop.call_soon()``<br>

should capture the current execution context with<br>

``sys.get_execution_context()`<wbr>` and execute callbacks<br>

within it with ``sys.run_with_execution_<wbr>context()``.<br>

<br>

This way the following code will work::<br>

<br>

    current_request = sys.new_context_item(<wbr>description='request')<br>

<br>

    def log():<br>

        request = current_request.get()<br>

        print(request)<br>

<br>

    async def request_handler(request):<br>

        current_request.set(request)<br>

        get_event_loop().call_soon(log)<br>

<br>

<br>

Generators<br>

----------<br>

<br>

Generators in Python, while similar to Coroutines, are used in a<br>

fundamentally different way.  They are producers of data, and<br>

they use the ``yield`` expression to suspend/resume their execution.<br>

<br>

A crucial difference between ``await coro`` and ``yield value`` is<br>

that the former expression guarantees that the ``coro`` will be<br>

executed fully, while the latter is producing ``value`` and<br>

suspending the generator until it gets iterated again.<br>

<br>

Generators, similarly to coroutines, have a ``gi_local_context``<br>

attribute, which is set to an empty Local Context when created.<br>

<br>

Contrary to coroutines, though, the ``yield from o`` expression in<br>

generators (that are not generator-based coroutines) is semantically<br>

equivalent to ``for v in o: yield v``, therefore the interpreter does<br>

not attempt to control their ``gi_local_context``.<br>

<br>

<br>

EC Semantics for Generators<br>

^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>

<br>

Every generator object has its own Local Context that stores<br>

only its own local modifications of the context.  When a generator<br>

is being iterated, its local context will be put in the EC stack<br>

of the current thread.  This means that the generator will be able<br>

to access items from the surrounding context::<br>

<br>

    local = sys.new_context_item("local")<br>

    global_ = sys.new_context_item("global")<br>

<br>

    def generator():<br>

        local.set('inside gen:')<br>

        while True:<br>

            print(local.get(), global_.get())<br>

            yield<br>

<br>

    g = generator()<br>

<br>

    local.set('hello')<br>

    global_.set('spam')<br>

    next(g)<br>

<br>

    local.set('world')<br>

    global_.set('ham')<br>

    next(g)<br>

<br>

    # Will print:<br>

    #   inside gen: spam<br>

    #   inside gen: ham<br>

<br>

Any changes to the EC in nested generators are invisible to the outer<br>

generator::<br>

<br>

    local = sys.new_context_item("local")<br>

<br>

    def inner_gen():<br>

        local.set('spam')<br>

        yield<br>

<br>

    def outer_gen():<br>

        local.set('ham')<br>

        yield from inner_gen()<br>

        print(local.get())<br>

<br>

    list(outer_gen())<br>

<br>

    # Will print:<br>

    #   ham<br>

<br>

<br>

Running generators without LC<br>

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>

<br>

Similarly to coroutines, generators with ``gi_local_context``<br>

set to ``None`` simply use the outer Local Context.<br>

<br>

The ``@contextlib.contextmanager`` decorator uses this mechanism to<br>

allow its generator to affect the EC::<br>

<br>

    item = sys.new_context_item('test')<br>

<br>

    @contextmanager<br>

    def context(x):<br>

        old = item.get()<br>

        item.set(x)<br>

        try:<br>

            yield<br>

        finally:<br>

            item.set(old)<br>

<br>

    with context('spam'):<br>

<br>

        with context('ham'):<br>

            print(1, item.get())<br>

<br>

        print(2, item.get())<br>

<br>

    # Will print:<br>

    #   1 ham<br>

    #   2 spam<br>

<br>

<br>

Implementing Generators with Iterators<br>

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<wbr>^^^^^^^^<br>

<br>

The Execution Context API makes it possible to fully replicate the EC behaviour<br>

imposed on generators with a regular Python iterator class::<br>

<br>

    class Gen:<br>

<br>

        def __init__(self):<br>

            self.local_context = sys.new_local_context()<br>

<br>

        def __iter__(self):<br>

            return self<br>

<br>

        def __next__(self):<br>

            return sys.run_with_local_context(<br>

                self.local_context, self._next_impl)<br>

<br>

        def _next_impl(self):<br>

            # Actual __next__ implementation.<br>

            ...<br>

<br>
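To see the iterator class behave like a generator, the sketch below<br>
fills in ``_next_impl`` and runs it against a pure-Python model.  The<br>
model functions, the ``started`` flag, and the use of a dict in place of<br>
``sys.new_local_context()`` are assumptions made for this example:<br>
<br>

```python
import threading

_tstate = threading.local()


def _stack():
    if not hasattr(_tstate, "ec"):
        _tstate.ec = []
    return _tstate.ec


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        for lc in reversed(_stack()):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        if not _stack():
            _stack().append({})
        _stack()[-1][self] = value


def run_with_local_context(lc, func, *args, **kwargs):
    old_ec = _stack()
    _tstate.ec = old_ec + [lc]
    try:
        return func(*args, **kwargs)
    finally:
        _tstate.ec = old_ec


local = ContextItem("local")
global_ = ContextItem("global")


class Gen:

    def __init__(self):
        self.local_context = {}  # models sys.new_local_context()
        self.started = False

    def __iter__(self):
        return self

    def __next__(self):
        return run_with_local_context(
            self.local_context, self._next_impl)

    def _next_impl(self):
        if not self.started:
            # This change lands in the iterator's own Local Context.
            local.set("inside gen:")
            self.started = True
        return (local.get(), global_.get())


g = Gen()

local.set("hello")
global_.set("spam")
first = next(g)

local.set("world")
global_.set("ham")
second = next(g)

print(first, second, local.get())
```

The iterator keeps its own value for ``local`` across calls and still<br>
follows the caller's changes to ``global_``.<br>
<br>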

<br>

Asynchronous Generators<br>

-----------------------<br>

<br>

Asynchronous Generators (AG) interact with the Execution Context<br>

similarly to regular generators.<br>

<br>

They have an ``ag_local_context`` attribute, which, similarly to<br>

regular generators, can be set to ``None`` to make them use the outer<br>

Local Context.  This is used by the new<br>

``contextlib.<wbr>asynccontextmanager`` decorator.<br>

<br>

The EC support of ``await`` expression is implemented using the same<br>

approach as in coroutines, see the `Coroutine Object Modifications`_<br>

section.<br>

<br>

<br>

Greenlets<br>

---------<br>

<br>

Greenlet is an alternative implementation of cooperative<br>

scheduling for Python.  Although the greenlet package is not part of<br>

CPython, popular frameworks like gevent rely on it, and it is<br>

important that greenlet can be modified to support execution<br>

contexts.<br>

<br>

In a nutshell, the greenlet design is very similar to the design of<br>

generators.  The main difference is that for generators, the stack<br>

is managed by the Python interpreter.  Greenlet works outside of the<br>

Python interpreter, and manually saves some ``PyThreadState``<br>

fields and pushes/pops the C-stack.  Thus the ``greenlet`` package<br>

can be easily updated to use the new low-level `C API`_ to enable<br>

full support of EC.<br>

<br>

<br>

New APIs<br>

========<br>

<br>

Python<br>

------<br>

<br>

Python APIs were designed to completely hide the internal<br>

implementation details, but at the same time provide enough control<br>

over EC and LC to re-implement all of Python's built-in objects<br>

in pure Python.<br>

<br>

1. ``sys.new_context_item(<wbr>description='...')``: create a<br>

   ``ContextItem`` object used to access/set values in EC.<br>

<br>

2. ``ContextItem``:<br>

<br>

   * ``.description``: read-only attribute.<br>

   * ``.get()``: return the current value for the item.<br>

   * ``.set(o)``: set the current value in the EC for the item.<br>

<br>

3. ``sys.get_execution_context()`<wbr>`: return the current<br>

   ``ExecutionContext``.<br>

<br>

4. ``sys.new_execution_context()`<wbr>`: create a new empty<br>

   ``ExecutionContext``.<br>

<br>

5. ``sys.new_local_context()``: create a new empty ``LocalContext``.<br>

<br>

6. ``sys.run_with_execution_<wbr>context(ec: ExecutionContext,<br>

   func, *args, **kwargs)``.<br>

<br>

7. ``sys.run_with_local_context(<wbr>lc:LocalContext,<br>

   func, *args, **kwargs)``.<br>

<br>

<br>

C API<br>

-----<br>

<br>

1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a<br>

   ``PyContextItem`` object.<br>

<br>

2. ``PyObject * PyContext_GetItem(<wbr>PyContextItem *)``: get the<br>

   current value for the context item.<br>

<br>

3. ``int PyContext_SetItem(<wbr>PyContextItem *, PyObject *)``: set<br>

   the current value for the context item.<br>

<br>

4. ``PyLocalContext * PyLocalContext_New()``: create a new empty<br>

   ``PyLocalContext``.<br>

<br>

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty<br>

   ``PyExecutionContext``.<br>

<br>

6. ``PyExecutionContext * PyExecutionContext_Get()``: get the<br>

   EC for the active thread state.<br>

<br>

7. ``int PyExecutionContext_Set(<wbr>PyExecutionContext *)``: set the<br>

   passed EC object as the current for the active thread state.<br>

<br>

8. ``int PyExecutionContext_<wbr>SetWithLocalContext(<wbr>PyExecutionContext *,<br>

   PyLocalContext *)``: allows implementing the<br>

   ``sys.run_with_local_context`` Python API.<br>

<br>

<br>

Implementation Strategy<br>

=======================<br>

<br>

LocalContext is a Weak Key Mapping<br>

------------------------------<wbr>----<br>

<br>

Using a weak key mapping for ``LocalContext`` implementation<br>

enables the following properties with regard to garbage<br>

collection:<br>

<br>

* ``ContextItem`` objects are strongly-referenced only from the<br>

  application code, not from any of the Execution Context<br>

  machinery or values they point to.  This means that there<br>

  are no reference cycles that could extend their lifespan<br>

  longer than necessary, or prevent their garbage collection.<br>

<br>

* Values put in the Execution Context are guaranteed to be kept<br>

  alive while there is a ``ContextItem`` key referencing them in<br>

  the thread.<br>

<br>

* If a ``ContextItem`` is garbage collected, all of its values will<br>

  be removed from all contexts, allowing them to be GCed if needed.<br>

<br>

* If a thread has ended its execution, its thread state will be<br>

  cleaned up along with its ``ExecutionContext``, cleaning<br>

  up all values bound to all Context Items in the thread.<br>

<br>
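The second and third properties can be observed with the standard<br>
library's ``weakref.WeakKeyDictionary``.  This is only an illustration<br>
of weak-key behaviour, not the proposed ``LocalContext`` type; the<br>
``ContextItem`` and ``Payload`` classes are hypothetical placeholders:<br>
<br>

```python
import gc
import weakref


class ContextItem:
    def __init__(self, description):
        self.description = description


class Payload:
    """Arbitrary value stored in the context."""


local_context = weakref.WeakKeyDictionary()

item = ContextItem("demo")
value = Payload()
value_ref = weakref.ref(value)

local_context[item] = value
del value

# The mapping holds the only strong reference to the value now.
alive_while_item_exists = value_ref() is not None

# Dropping the last reference to the key removes the entry,
# which in turn releases the value.
del item
gc.collect()

entries_left = len(local_context)
value_collected = value_ref() is None
print(alive_while_item_exists, entries_left, value_collected)
```

<br>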

<br>

ContextItem.get() Cache<br>

-----------------------<br>

<br>

We can add three new fields to ``PyThreadState`` and<br>

``PyInterpreterState`` structs:<br>

<br>

* ``uint64_t PyThreadState->unique_id``: a globally unique<br>

  thread state identifier (we can add a counter to<br>

  ``PyInterpreterState`` and increment it when a new thread state is<br>

  created.)<br>

<br>

* ``uint64_t PyInterpreterState->context_<wbr>item_deallocs``: every time<br>

  a ``ContextItem`` is GCed, all Execution Contexts in all threads<br>

  will lose track of it.  ``context_item_deallocs`` will simply<br>

  count all ``ContextItem`` deallocations.<br>

<br>

* ``uint64_t PyThreadState->execution_<wbr>context_ver``: every time<br>

  a new item is set, or an existing item is updated, or the stack<br>

  of execution contexts is changed in the thread, we increment this<br>

  counter.<br>

<br>

The above three fields allow implementing a fast cache path in<br>

``ContextItem.get()``, in pseudo-code::<br>

<br>

    class ContextItem:<br>

<br>

        def get(self):<br>

            tstate = PyThreadState_Get()<br>

<br>

            if (self.last_tstate_id == tstate.unique_id and<br>

                self.last_ver == tstate.execution_context_ver and<br>

                self.last_deallocs ==<br>

                    tstate.interp.context_item_deallocs):<br>

                return self.last_value<br>

<br>

            value = None<br>

            for mapping in reversed(tstate.execution_context):<br>

                if self in mapping:<br>

                    value = mapping[self]<br>

                    break<br>

<br>

            self.last_value = value<br>

            self.last_tstate_id = tstate.unique_id<br>

            self.last_ver = tstate.execution_context_ver<br>

            self.last_deallocs = tstate.interp.context_item_deallocs<br>

<br>

            return value<br>

<br>

This is similar to the trick that the decimal C implementation uses<br>

for caching the current decimal context, and will have the same<br>

performance characteristics, but will be available to all<br>

Execution Context users.<br>
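<br>
To watch the cache hit and miss, the pseudo-code above can be turned<br>
into a small pure-Python simulation.  This is a sketch under stated<br>
assumptions: ``Interp`` and ``ThreadState`` are hypothetical stand-ins<br>
for the C structs, the thread state is passed to ``get()`` explicitly<br>
instead of being fetched with ``PyThreadState_Get()``, and the<br>
``slow_lookups`` counter exists only for illustration.<br>

```python
import itertools


class Interp:
    """Hypothetical stand-in for PyInterpreterState."""

    def __init__(self):
        self.context_item_deallocs = 0
        self.tstate_ids = itertools.count(1)


class ThreadState:
    """Hypothetical stand-in for PyThreadState."""

    def __init__(self, interp):
        self.interp = interp
        self.unique_id = next(interp.tstate_ids)
        self.execution_context_ver = 0
        self.execution_context = [{}]  # stack of plain dicts (assumption)

    def set(self, item, value):
        self.execution_context[-1][item] = value
        self.execution_context_ver += 1  # invalidates every cached lookup


class ContextItem:
    def __init__(self):
        self.last_tstate_id = None
        self.last_ver = None
        self.last_deallocs = None
        self.last_value = None
        self.slow_lookups = 0  # illustration only

    def get(self, tstate):
        # Fast path: nothing relevant changed since the previous lookup.
        if (self.last_tstate_id == tstate.unique_id and
                self.last_ver == tstate.execution_context_ver and
                self.last_deallocs == tstate.interp.context_item_deallocs):
            return self.last_value

        # Slow path: walk the stack from the innermost mapping outwards.
        self.slow_lookups += 1
        value = None
        for mapping in reversed(tstate.execution_context):
            if self in mapping:
                value = mapping[self]
                break

        self.last_value = value
        self.last_tstate_id = tstate.unique_id
        self.last_ver = tstate.execution_context_ver
        self.last_deallocs = tstate.interp.context_item_deallocs
        return value


tstate = ThreadState(Interp())
item = ContextItem()
tstate.set(item, "a")
first = item.get(tstate)   # slow path
second = item.get(tstate)  # served from the cache
tstate.set(item, "b")      # bumps the version counter
third = item.get(tstate)   # slow path again
```

<br>
Three reads cost only two stack walks here; a read-heavy library that<br>
rarely changes its context would hit the fast path almost every time.<br>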

<br>

<br>

Approach #1: Use a dict for LocalContext<br>

----------------------------------------<br>

<br>

The straightforward way of implementing the proposed EC<br>

mechanisms is to create a ``WeakKeyDict`` on top of the Python<br>

``dict`` type.<br>

<br>

To implement the ``ExecutionContext`` type we can use a Python<br>

``list`` (or a custom stack implementation with some<br>

pre-allocation optimizations).<br>

<br>

This approach will have the following runtime complexity:<br>

<br>

* O(M) for ``ContextItem.get()``, where ``M`` is the number of<br>

  Local Contexts in the stack.<br>

<br>

  It is important to note that ``ContextItem.get()`` will implement<br>

  a cache making the operation O(1) for packages like ``decimal``<br>

  and ``numpy``.<br>

<br>

* O(1) for ``ContextItem.set()``.<br>

<br>

* O(N) for ``sys.get_execution_context()``, where ``N`` is the<br>

  total number of items in the current **execution** context.<br>
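<br>
The two non-constant costs listed above can be sketched with a plain<br>
list of dicts.  The helper names ``lookup`` and ``merged_snapshot``<br>
are hypothetical, string keys replace weakly referenced<br>
``ContextItem`` objects, and treating the snapshot as a full merge of<br>
the stack is an assumption made for illustration.<br>

```python
def lookup(stack, key):
    """Walk the stack from the innermost Local Context outwards: O(M)."""
    for local_context in reversed(stack):
        if key in local_context:
            return local_context[key]
    return None


def merged_snapshot(stack):
    """Squash the stack into one mapping, copying every item: O(N)."""
    snapshot = {}
    for local_context in stack:  # outermost first, so inner values win
        snapshot.update(local_context)
    return snapshot


# Three Local Contexts; the innermost one shadows "locale".
stack = [{"user": "alice", "locale": "en"}, {}, {"locale": "de"}]

found_locale = lookup(stack, "locale")
found_user = lookup(stack, "user")
snapshot = merged_snapshot(stack)
```

<br>
``lookup`` touches at most one mapping per stack level, while<br>
``merged_snapshot`` has to copy every item it encounters.<br>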

<br>

<br>

Approach #2: Use HAMT for LocalContext<br>

--------------------------------------<br>

<br>

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)<br>

to implement high performance immutable collections [5]_, [6]_.<br>

<br>

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)<br>

performance for ``set()``, ``get()``, and ``merge()`` operations,<br>

which is essentially O(1) for relatively small mappings<br>

(read about HAMT performance in CPython in the<br>

`Appendix: HAMT Performance`_ section.)<br>

<br>

In this approach we use the same design of the ``ExecutionContext``<br>

as in Approach #1, but we will use an HAMT-backed weak key Local Context<br>

implementation.  With that we will have the following runtime<br>

complexity:<br>

<br>

* O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``,<br>

  where ``M`` is the number of Local Contexts in the stack,<br>

  and ``N`` is the number of items in the EC.  The operation will<br>

  essentially be O(M), because execution contexts are normally not<br>

  expected to have more than a few dozen items.<br>

<br>

  (``ContextItem.get()`` will have the same caching mechanism as in<br>

  Approach #1.)<br>

<br>

* O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the<br>

  number of items in the current **local** context.  This will<br>

  essentially be an O(1) operation most of the time.<br>

<br>

* O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where<br>

  ``N`` is the total number of items in the current **execution**<br>

  context.<br>

<br>

Essentially, using HAMT for Local Contexts instead of Python dicts<br>

allows us to bring down the complexity of ``sys.get_execution_context()``<br>

from O(N) to O(log\ :sub:`32`\ N) because of the more efficient<br>

merge algorithm.<br>
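<br>
To see why log\ :sub:`32`\ N is treated as effectively constant, the<br>
sketch below counts how many 32-way levels an idealized, perfectly<br>
balanced trie needs for a given number of items.  The ``hamt_levels``<br>
helper is hypothetical, and a real HAMT's depth also depends on how<br>
the key hashes are distributed.<br>

```python
def hamt_levels(n_items, fanout=32):
    """Smallest depth d such that fanout**d >= n_items."""
    levels = 0
    capacity = 1
    while n_items > capacity:
        capacity *= fanout
        levels += 1
    return levels


depths = {n: hamt_levels(n) for n in (10, 32, 1_000, 1_000_000)}
```

<br>
Under these assumptions a mapping with a few dozen items needs one or<br>
two levels, and even a million items needs only four.<br>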

<br>

<br>

Approach #3: Use HAMT and Immutable Linked List<br>

-----------------------------------------------<br>

<br>

We can make an alternative ``ExecutionContext`` design by using<br>

a linked list.  Each ``LocalContext`` in the ``ExecutionContext``<br>

object will be wrapped in a linked-list node.<br>

<br>

``LocalContext`` objects will use the HAMT-backed weak key<br>

implementation described in Approach #2.<br>

<br>

Every modification to the current ``LocalContext`` will produce a<br>

new version of it, which will be wrapped in a **new linked list<br>

node**.  Essentially, this means that ``ExecutionContext`` is an<br>

immutable forest of ``LocalContext`` objects, and can be safely<br>

copied by reference in ``sys.get_execution_context()`` (eliminating<br>

the expensive "merge" operation.)<br>

<br>

With this approach, ``sys.get_execution_context()`` will be an<br>

**O(1) operation**.<br>
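<br>
The copy-by-reference property can be sketched with a parent-linked<br>
chain of nodes.  The names ``Node``, ``push``, ``set_item`` and<br>
``get_item`` are hypothetical, and a read-only view of a plain dict<br>
stands in for the HAMT, so ``set_item`` here still copies the current<br>
Local Context, which a HAMT would avoid.<br>

```python
from types import MappingProxyType


class Node:
    """Linked-list node wrapping one read-only Local Context."""

    __slots__ = ("local_context", "parent")

    def __init__(self, local_context, parent=None):
        self.local_context = MappingProxyType(dict(local_context))
        self.parent = parent


def push(head):
    """Enter a new, empty Local Context on top of the chain."""
    return Node({}, head)


def set_item(head, key, value):
    """Return a NEW head node; the old chain is left untouched."""
    updated = dict(head.local_context)  # a HAMT would share structure instead
    updated[key] = value
    return Node(updated, head.parent)


def get_item(head, key):
    """Walk from the innermost Local Context outwards."""
    node = head
    while node is not None:
        if key in node.local_context:
            return node.local_context[key]
        node = node.parent
    return None


head = Node({"user": "alice"})
head = push(head)
head = set_item(head, "locale", "en")

snapshot = head  # capturing the context is just copying a reference: O(1)

head = set_item(head, "locale", "de")  # later change, invisible to snapshot
```

<br>
The snapshot keeps seeing ``"en"`` while the live chain sees ``"de"``,<br>
and both share the same parent node, so nothing had to be merged.<br>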

<br>

<br>

Summary<br>

-------<br>

<br>

We believe that approach #3 enables an efficient and complete<br>

Execution Context implementation, with excellent runtime performance.<br>

<br>

`ContextItem.get() Cache`_ enables fast retrieval of context items<br>

for performance critical libraries like decimal and numpy.<br>

<br>

Fast ``sys.get_execution_context()`` enables efficient management<br>

of execution contexts in asynchronous libraries like asyncio.<br>

<br>

<br>

Design Considerations<br>

=====================<br>

<br>

Can we fix ``PyThreadState_GetDict()``?<br>

---------------------------------------<br>

<br>

``PyThreadState_GetDict`` is a TLS, and some of its existing users<br>

might depend on it being just a TLS.  Changing its behaviour to follow<br>

the Execution Context semantics would break backwards compatibility.<br>

<br>

<br>

PEP 521<br>

-------<br>

<br>

:pep:`521` proposes an alternative solution to the problem:<br>

enhance the Context Manager Protocol with two new methods: ``__suspend__``<br>

and ``__resume__``.  To make it compatible with async/await,<br>

the Asynchronous Context Manager Protocol will also need to be<br>

extended with ``__asuspend__`` and ``__aresume__``.<br>

<br>

This allows implementing context managers like decimal context and<br>

``numpy.errstate`` for generators and coroutines.<br>

<br>

The following code::<br>

<br>

    class Context:<br>

<br>

        def __enter__(self):<br>

            self.old_x = get_execution_context_item('x')<br>

            set_execution_context_item('x', 'something')<br>

<br>

        def __exit__(self, *err):<br>

            set_execution_context_item('x', self.old_x)<br>

<br>

would become this::<br>

<br>

    local = threading.local()<br>

<br>

    class Context:<br>

<br>

        def __enter__(self):<br>

            self.old_x = getattr(local, 'x', None)<br>

            local.x = 'something'<br>

<br>

        def __suspend__(self):<br>

            local.x = self.old_x<br>

<br>

        def __resume__(self):<br>

            local.x = 'something'<br>

<br>

        def __exit__(self, *err):<br>

            local.x = self.old_x<br>

<br>

Besides complicating the protocol, the implementation will likely<br>

negatively impact performance of coroutines, generators, and any code<br>

that uses context managers, and will notably complicate the<br>

interpreter implementation.<br>

<br>

:pep:`521` also does not provide any mechanism to propagate state<br>

in a local context, like storing a request object in an HTTP request<br>

handler to have better logging.  Nor does it solve the leaking state<br>

problem for greenlet/gevent.<br>

<br>

<br>

Can Execution Context be implemented outside of CPython?<br>

--------------------------------------------------------<br>

<br>

Because async/await code needs an event loop to run it, an EC-like<br>

solution can be implemented in a limited way for coroutines.<br>

<br>

Generators, on the other hand, do not have an event loop or<br>

trampoline, making it impossible to intercept their ``yield`` points<br>

outside of the Python interpreter.<br>
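<br>
The consequence for library code can be observed with the standard<br>
``decimal`` module: no user-level hook runs when a generator<br>
suspends, so state it has set stays visible to the caller.  This is<br>
only an illustrative sketch; the ``scaled`` generator is a<br>
hypothetical example.<br>

```python
import decimal


def scaled(values):
    # The precision change is meant to apply only inside this generator.
    with decimal.localcontext() as ctx:
        ctx.prec = 5
        for v in values:
            yield decimal.Decimal(v) / 3


outer_prec = decimal.getcontext().prec

gen = scaled([1, 2])
first = next(gen)  # the generator is now suspended at its yield

# Nothing ran at the yield point, so the caller sees the generator's precision.
leaked_prec = decimal.getcontext().prec

gen.close()  # unwinds the with block inside the generator
restored_prec = decimal.getcontext().prec
```

<br>
The caller's precision is changed for as long as the generator stays<br>
suspended, and is restored only once the generator is closed.<br>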

<br>

<br>

Backwards Compatibility<br>

=======================<br>

<br>

This proposal preserves 100% backwards compatibility.<br>

<br>

<br>

Appendix: HAMT Performance<br>

==========================<br>

<br>

To assess if HAMT can be used for Execution Context, we implemented<br>

it in CPython [7]_.<br>

<br>

.. figure:: pep-0550-hamt_vs_dict.png<br>

   :align: center<br>

   :width: 100%<br>

<br>

   Figure 1.  Benchmark code can be found here: [9]_.<br>

<br>

Figure 1 shows that HAMT indeed displays O(1) performance for all<br>

benchmarked dictionary sizes.  For dictionaries with fewer than 100<br>

items, HAMT is a bit slower than Python dict/shallow copy.<br>

<br>

.. figure:: pep-0550-lookup_hamt.png<br>

   :align: center<br>

   :width: 100%<br>

<br>

   Figure 2.  Benchmark code can be found here: [10]_.<br>

<br>

Figure 2 shows a comparison of lookup costs between Python dict<br>

and an HAMT immutable mapping.  HAMT lookup time is 30-40% worse<br>

than Python dict lookups on average, which is a very good result,<br>

considering how well Python dicts are optimized.<br>

<br>

Note that, according to [8]_, the HAMT design can be further improved.<br>

<br>

<br>

Acknowledgments<br>

===============<br>

<br>

I thank Elvis Pranskevichus and Victor Petrovykh for countless<br>

discussions around the topic and PEP proofreading and edits.<br>

<br>

Thanks to Nathaniel Smith for proposing the ``ContextItem`` design<br>

[17]_ [18]_, for pushing the PEP towards a more complete design, and<br>

coming up with the idea of having a stack of contexts in the thread<br>

state.<br>

<br>

Thanks to Nick Coghlan for numerous suggestions and ideas on the<br>

mailing list, and for coming up with a case that caused the complete<br>

rewrite of the initial PEP version [19]_.<br>

<br>

<br>

References<br>

==========<br>

<br>

.. [1] <a href="https://blog.golang.org/context" rel="noreferrer" target="_blank">https://blog.golang.org/<wbr>context</a><br>

<br>

.. [2] <a href="https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx" rel="noreferrer" target="_blank">https://msdn.microsoft.com/en-<wbr>us/library/system.threading.<wbr>executioncontext.aspx</a><br>

<br>

.. [3] <a href="https://github.com/numpy/numpy/issues/9444" rel="noreferrer" target="_blank">https://github.com/numpy/<wbr>numpy/issues/9444</a><br>

<br>

.. [4] <a href="http://bugs.python.org/issue31179" rel="noreferrer" target="_blank">http://bugs.python.org/<wbr>issue31179</a><br>

<br>

.. [5] <a href="https://en.wikipedia.org/wiki/Hash_array_mapped_trie" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/<wbr>Hash_array_mapped_trie</a><br>

<br>

.. [6] <a href="http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html" rel="noreferrer" target="_blank">http://blog.higher-order.net/<wbr>2010/08/16/assoc-and-clojures-<wbr>persistenthashmap-part-ii.html</a><br>

<br>

.. [7] <a href="https://github.com/1st1/cpython/tree/hamt" rel="noreferrer" target="_blank">https://github.com/1st1/<wbr>cpython/tree/hamt</a><br>

<br>

.. [8] <a href="https://michael.steindorfer.name/publications/oopsla15.pdf" rel="noreferrer" target="_blank">https://michael.steindorfer.<wbr>name/publications/oopsla15.pdf</a><br>

<br>

.. [9] <a href="https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd" rel="noreferrer" target="_blank">https://gist.github.com/1st1/<wbr>9004813d5576c96529527d44c5457d<wbr>cd</a><br>

<br>

.. [10] <a href="https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e" rel="noreferrer" target="_blank">https://gist.github.com/1st1/<wbr>dbe27f2e14c30cce6f0b5fddfc8c43<wbr>7e</a><br>

<br>

.. [11] <a href="https://github.com/1st1/cpython/tree/pep550" rel="noreferrer" target="_blank">https://github.com/1st1/<wbr>cpython/tree/pep550</a><br>

<br>

.. [12] <a href="https://www.python.org/dev/peps/pep-0492/#async-await" rel="noreferrer" target="_blank">https://www.python.org/dev/<wbr>peps/pep-0492/#async-await</a><br>

<br>

.. [13] <a href="https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py" rel="noreferrer" target="_blank">https://github.com/MagicStack/<wbr>uvloop/blob/master/examples/<wbr>bench/echoserver.py</a><br>

<br>

.. [14] <a href="https://github.com/MagicStack/pgbench" rel="noreferrer" target="_blank">https://github.com/MagicStack/<wbr>pgbench</a><br>

<br>

.. [15] <a href="https://github.com/python/performance" rel="noreferrer" target="_blank">https://github.com/python/<wbr>performance</a><br>

<br>

.. [16] <a href="https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c" rel="noreferrer" target="_blank">https://gist.github.com/1st1/<wbr>6b7a614643f91ead3edf37c4451a6b<wbr>4c</a><br>

<br>

.. [17] <a href="https://mail.python.org/pipermail/python-ideas/2017-August/046752.html" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>pipermail/python-ideas/2017-<wbr>August/046752.html</a><br>

<br>

.. [18] <a href="https://mail.python.org/pipermail/python-ideas/2017-August/046772.html" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>pipermail/python-ideas/2017-<wbr>August/046772.html</a><br>

<br>

.. [19] <a href="https://mail.python.org/pipermail/python-ideas/2017-August/046780.html" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>pipermail/python-ideas/2017-<wbr>August/046780.html</a><br>

<br>

<br>

Copyright<br>

=========<br>

<br>

This document has been placed in the public domain.<br>


</blockquote></div><br></div></div>