[Python-ideas] New PEP 550: Execution Context

Fri Aug 11 18:37:47 EDT 2017

Hi,

This is a new PEP to implement Execution Contexts in Python.

The PEP is in-flight to python.org, and in the meanwhile can
be read on GitHub:

https://github.com/python/peps/blob/master/pep-0550.rst

(it contains a few diagrams and charts, so please read it there.)

Thank you!
Yury

PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury at magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017

Abstract
========

This PEP proposes a new mechanism to manage execution state--the
logical environment in which a function, a thread, a generator,
or a coroutine executes in.

A few examples of where having a reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request
  data in web applications;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use a Thread-local Storage
(TLS), implemented in the standard library as ``threading.local()``.
Unfortunately, TLS does not work for isolating state of generators or
asynchronous code because such code shares a single thread.

Rationale
=========

Traditionally a Thread-local Storage (TLS) is used for storing the
state.  However, the major flaw of using the TLS is that it works only
for multi-threaded code.  It is not possible to reliably contain the
state within a generator or a coroutine.  For example, consider
the following generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

Decimal context is using a TLS to store the state, and because TLS is
not aware of generators, the state can leak.  The above code will
not work correctly, if a user iterates over the ``calculate()``
generator with different precisions in parallel::

    g1 = calculate(100)
    g2 = calculate(50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision,
    #   second value from g2 calculated with 50 precision.

An even scarier example would be using decimals to represent money
in an async/await application: decimal calculations can suddenly
lose precision in the middle of processing a request.  Currently,
bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the
current request object, or security context, or, simply, the request
URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.

These examples are just a few out of many, where a reliable way to
store context data is absolutely needed.

The inability to use TLS for asynchronous code has lead to
proliferation of ad-hoc solutions, limited to be supported only by
code that was explicitly enabled to work with them.

Current status quo is that any library, including the standard
library, that uses a TLS, will likely not work as expected in
asynchronous code or with generators (see [3]_ as an example issue.)

Some languages that have coroutines or generators recommend to
manually pass a ``context`` object to every function, see [1]_
describing the pattern for Go.  This approach, however, has limited
use for Python, where we have a huge ecosystem that was built to work
with a TLS-like context.  Moreover, passing the context explicitly
does not work at all for libraries like ``decimal`` or ``numpy``,
which use operator overloading.

.NET runtime, which has support for async/await, has a generic
solution of this problem, called ``ExecutionContext`` (see [2]_).
On the surface, working with it is very similar to working with a TLS,
but the former explicitly supports asynchronous code.

Goals
=====

The goal of this PEP is to provide a more reliable alternative to
``threading.local()``.  It should be explicitly designed to work with
Python execution model, equally supporting threads, generators, and
coroutines.

An acceptable solution for Python should meet the following
requirements:

* Transparent support for code executing in threads, coroutines,
  and generators with an easy to use API.

* Negligible impact on the performance of the existing code or the
  code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should only
be used when there is no option to pass the state explicitly.

With this PEP implemented, it should be possible to update a context
manager like the below::

    _local = threading.local()

    @contextmanager
    def context(x):
        old_x = getattr(_local, 'x', None)
        _local.x = x
        try:
            yield
        finally:
            _local.x = old_x

to a more robust version that can be reliably used in generators
and async/await code, with a simple transformation::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

Specification
=============

This proposal introduces a new concept called Execution Context (EC),
along with a set of Python APIs and C APIs to interact with it.

EC is implemented using an immutable mapping.  Every modification
of the mapping produces a new copy of it.  To illustrate what it
means let's compare it to how we work with tuples in Python::

    a0 = ()
    a1 = a0 + (1,)
    a2 = a1 + (2,)

    # a0 is an empty tuple
    # a1 is (1,)
    # a2 is (1, 2)

Manipulating an EC object would be similar::

    a0 = EC()
    a1 = a0.set('foo', 'bar')
    a2 = a1.set('spam', 'ham')

    # a0 is an empty mapping
    # a1 is {'foo': 'bar'}
    # a2 is {'foo': 'bar', 'spam': 'ham'}

In CPython, every thread that can execute Python code has a
corresponding ``PyThreadState`` object.  It encapsulates important
runtime information like a pointer to the current frame, and is
being used by the ceval loop extensively.  We add a new field to
``PyThreadState``, called ``exec_context``, which points to the
current EC object.

We also introduce a set of APIs to work with Execution Context.
In this section we will only cover two functions that are needed to
explain how Execution Context works.  See the full list of new APIs
in the `New APIs`_ section.

* ``sys.get_execution_context_item(key, default=None)``: lookup
  ``key`` in the EC of the executing thread.  If not found,
  return ``default``.

* ``sys.set_execution_context_item(key, value)``: get the
  current EC of the executing thread.  Add a ``key``/``value``
  item to it, which will produce a new EC object.  Set the
  new object as the current one for the executing thread.
  In pseudo-code::

      tstate = PyThreadState_GET()
      ec = tstate.exec_context
      ec2 = ec.set(key, value)
      tstate.exec_context = ec2

Note, that some important implementation details and optimizations
are omitted here, and will be covered in later sections of this PEP.

Now let's see how Execution Contexts work with regular multi-threaded
code, generators, and coroutines.

Regular & Multithreaded Code
----------------------------

For regular Python code, EC behaves just like a thread-local.  Any
modification of the EC object produces a new one, which is immediately
set as the current one for the thread state.

.. figure:: pep-0550/functions.png
   :align: center
   :width: 90%

   Figure 1.  Execution Context flow in a thread.

As Figure 1 illustrates, if a function calls
``set_execution_context_item()``, the modification of the execution
context will be visible to all subsequent calls and to the caller::

    def set_foo():
        set_execution_context_item('foo', 'spam')

    set_execution_context_item('foo', 'bar')
    print(get_execution_context_item('foo'))

    set_foo()
    print(get_execution_context_item('foo'))

    # will print:
    #   bar
    #   spam

Coroutines
----------

Python :pep:`492` coroutines are used to implement cooperative
multitasking.  For a Python end-user they are similar to threads,
especially when it comes to sharing resources or modifying
the global state.

An event loop is needed to schedule coroutines.  Coroutines that
are explicitly scheduled by the user are usually called Tasks.
When a coroutine is scheduled, it can schedule other coroutines using
an ``await`` expression.  In async/await world, awaiting a coroutine
can be viewed as a different calling convention: Tasks are similar to
threads, and awaiting on coroutines within a Task is similar to
calling functions within a thread.

By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the
execution context within one Task should be visible to all coroutines
scheduled within it.  Any execution context modifications, however,
must not be visible to other Tasks executing within the same thread.

To achieve this, a small set of modifications to the coroutine object
is needed:

* When a coroutine object is instantiated, it saves a reference to
  the current execution context object to its ``cr_execution_context``
  attribute.

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
  follows (in pseudo-C)::

    if coro->cr_isolated_execution_context:
        # Save a reference to the current execution context
        old_context = tstate->execution_context

        # Set our saved execution context as the current
        # for the current thread.
        tstate->execution_context = coro->cr_execution_context

        try:
            # Perform the actual `Coroutine.send()` or
            # `Coroutine.throw()` call.
            return coro->send(...)
        finally:
            # Save a reference to the updated execution_context.
            # We will need it later, when `.send()` or `.throw()`
            # are called again.
            coro->cr_execution_context = tstate->execution_context

            # Restore thread's execution context to what it was before
            # invoking this coroutine.
            tstate->execution_context = old_context
    else:
        # Perform the actual `Coroutine.send()` or
        # `Coroutine.throw()` call.
        return coro->send(...)

* ``cr_isolated_execution_context`` is a new attribute on coroutine
  objects.  Set to ``True`` by default, it makes any execution context
  modifications performed by coroutine to stay visible only to that
  coroutine.

  When Python interpreter sees an ``await`` instruction, it flips
  ``cr_isolated_execution_context`` to ``False`` for the coroutine
  that is about to be awaited.  This makes any changes to execution
  context made by nested coroutine calls within a Task to be visible
  throughout the Task.

  Because the top-level coroutine (Task) cannot be scheduled with
  ``await`` (in asyncio you need to call ``loop.create_task()`` or
  ``asyncio.ensure_future()`` to schedule a Task), all execution
  context modifications are guaranteed to stay within the Task.

* We always work with ``tstate->exec_context``.  We use
  ``coro->cr_execution_context`` only to store coroutine's execution
  context when it is not executing.

Figure 2 below illustrates how execution context mutations work with
coroutines.

.. figure:: pep-0550/coroutines.png
   :align: center
   :width: 90%

   Figure 2.  Execution Context flow in coroutines.

In the above diagram:

* When "coro1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When it awaits on "coro2", any subsequent changes it does to
  the execution context are visible to "coro1", but not outside
  of it.

In code::

    async def inner_foo():
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', 2)

    async def foo():
        print('foo:', get_execution_context_item('key'))

        set_execution_context_item('key', 1)
        await inner_foo()

        print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    asyncio.get_event_loop().run_until_complete(foo())

    print('main:', get_execution_context_item('key'))

which will output::

    main: spam
    foo: spam
    inner_foo: 1
    foo: 2
    main: spam

Generator-based coroutines (generators decorated with
``types.coroutine`` or ``asyncio.coroutine``) behave exactly as
native coroutines with regards to execution context management:
their ``yield from`` expression is semantically equivalent to
``await``.

Generators
----------

Generators in Python, while similar to Coroutines, are used in a
fundamentally different way.  They are producers of data, and
they use ``yield`` expression to suspend/resume their execution.

A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that the ``coro`` will be
executed to the end, while the latter is producing ``value`` and
suspending the generator until it gets iterated again.

Generators share 99% of their implementation with coroutines, and
thus have similar new attributes ``gi_execution_context`` and
``gi_isolated_execution_context``.  Similar to coroutines, generators
save a reference to the current execution context when they are
instantiated.  The have the same implementation of ``.send()`` and
``.throw()`` methods.

The only difference is that
``gi_isolated_execution_context`` is always set to ``True``, and
is never modified by the interpreter.  ``yield from o`` expression in
regular generators that are not decorated with ``types.coroutine``,
is semantically equivalent to ``for v in o: yield v``.

.. figure:: pep-0550/generators.png
   :align: center
   :width: 90%

   Figure 3.  Execution Context flow in a generator.

In the above diagram:

* When "gen1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When "gen2" is created, it saves a reference to the current
  execution context for it -- "2.1".

* Any subsequent execution context updated in "gen2" will only
  be visible to "gen2".

* Likewise, any context changes that "gen1" will do after it
  created "gen2" will not be visible to "gen2".

In code::

    def inner_foo():
        for i in range(3):
            print('inner_foo:', get_execution_context_item('key'))
            set_execution_context_item('key', i)
            yield i

    def foo():
        set_execution_context_item('key', 'spam')
        print('foo:', get_execution_context_item('key'))

        inner = inner_foo()

        while True:
            val = next(inner, None)
            if val is None:
                break
            yield val
            print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    list(foo())

    print('main:', get_execution_context_item('key'))

which will output::

    main: ham
    foo: spam
    inner_foo: spam
    foo: spam
    inner_foo: 0
    foo: spam
    inner_foo: 1
    foo: spam
    main: ham

As we see, any modification of the execution context in a generator
is visible only to the generator itself.

There is one use-case where it is desired for generators to affect
the surrounding execution context: ``contextlib.contextmanager``
decorator.  To make the following work::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip
``gi_isolated_execution_context`` flag to ``False`` on its generator.

Greenlets
---------

Greenlet is an alternative implementation of cooperative
scheduling for Python.  Although greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

In a nutshell, greenlet design is very similar to design of
generators.  The main difference is that for generators, the stack
is managed by the Python interpreter.  Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState``
fields and pushes/pops the C-stack.  Since Execution Context is
implemented on top of ``PyThreadState``, it's easy to add
transparent support of it to greenlet.

New APIs
========

Even though this PEP adds a number of new APIs, please keep in mind,
that most Python users will likely ever use only two of them:
``sys.get_execution_context_item()`` and
``sys.set_execution_context_item()``.

Python
------

1. ``sys.get_execution_context_item(key, default=None)``: lookup
   ``key`` for the current Execution Context.  If not found,
   return ``default``.

2. ``sys.set_execution_context_item(key, value)``: set
   ``key``/``value`` item for the current Execution Context.
   If ``value`` is ``None``, the item will be removed.

3. ``sys.get_execution_context()``: return the current Execution
   Context object: ``sys.ExecutionContext``.

4. ``sys.set_execution_context(ec)``: set the passed
   ``sys.ExecutionContext`` instance as a current one for the current
   thread.

5. ``sys.ExecutionContext`` object.

   Implementation detail: ``sys.ExecutionContext`` wraps a low-level
   ``PyExecContextData`` object.  ``sys.ExecutionContext`` has a
   mutable mapping API, abstracting away the real immutable
   ``PyExecContextData``.

   * ``ExecutionContext()``: construct a new, empty, execution
     context.

   * ``ec.run(func, *args)`` method: run ``func(*args)`` in the
     ``ec`` execution context.

   * ``ec[key]``: lookup ``key`` in ``ec`` context.

   * ``ec[key] = value``: assign ``key``/``value`` item to the ``ec``.

   * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and
     ``ec.copy()`` are similar to that of ``dict`` object.

C API
-----

C API is different from the Python one because it operates directly
on the low-level immutable ``PyExecContextData`` object.

1. New ``PyThreadState->exec_context`` field, pointing to a
   ``PyExecContextData`` object.

2. ``PyThreadState_SetExecContextItem`` and
   ``PyThreadState_GetExecContextItem`` similar to
   ``sys.set_execution_context_item()`` and
   ``sys.get_execution_context_item()``.

3. ``PyThreadState_GetExecContext``: similar to
   ``sys.get_execution_context()``.  Always returns an
   ``PyExecContextData`` object.  If ``PyThreadState->exec_context``
   is ``NULL`` an new and empty one will be created and assigned
   to ``PyThreadState->exec_context``.

4. ``PyThreadState_SetExecContext``: similar to
   ``sys.set_execution_context()``.

5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
   object.

6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.

The exact layout ``PyExecContextData`` is private, which allows
to switch it to a different implementation later.  More on that
in the `Implementation Details`_ section.

Modifications in Standard Library
=================================

* ``contextlib.contextmanager`` was updated to flip the new
  ``gi_isolated_execution_context`` attribute on the generator.

* ``asyncio.events.Handle`` object now captures the current
  execution context when it is created, and uses the saved
  execution context to run the callback (with
  ``ExecutionContext.run()`` method.)  This makes
  ``loop.call_soon()`` to run callbacks in the execution context
  they were scheduled.

  No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
  necessary.

Some standard library modules like ``warnings`` and ``decimal``
can be updated to use new execution contexts.  This will be considered
in separate issues if this PEP is accepted.

Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.

Performance
===========

Implementation Details
----------------------

The new ``PyExecContextData`` object is wrapping a ``dict`` object.
Any modification requires creating a shallow copy of the dict.

While working on the reference implementation of this PEP, we were
able to optimize ``dict.copy()`` operation **5.5x**, see [4]_ for
details.

.. figure:: pep-0550/dict_copy.png
   :align: center
   :width: 100%

   Figure 4.

Figure 4 shows that the performance of immutable dict implemented
with shallow copying is expectedly O(n) for the ``set()`` operation.
However, this is tolerable until dict has more than 100 items
(1 ``set()`` takes about a microsecond.)

Judging by the number of modules that need EC in Standard Library
it is likely that real world Python applications will use
significantly less than 100 execution context variables.

The important point is that the cost of accessing a key in
Execution Context is always O(1).

If the ``set()`` operation performance is a major concern, we discuss
alternative approaches that have O(1) or close ``set()`` performance
in `Alternative Immutable Dict Implementation`_, `Faster C API`_, and
`Copy-on-write Execution Context`_ sections.

Generators and Coroutines
-------------------------

Using a microbenchmark for generators and coroutines from :pep:`492`
([12]_), it was possible to observe 0.5 to 1% performance degradation.

asyncio echoserver microbechmarks from the uvloop project [13]_
showed 1-1.5% performance degradation for asyncio code.

asyncpg benchmarks [14]_, that execute more code and are closer to a
real-world application did not exhibit any noticeable performance
change.

Overall Performance Impact
--------------------------

The total number of changed lines in the ceval loop is 2 -- in the
``YIELD_FROM`` opcode implementation.  Only performance of generators
and coroutines can be affected by the proposal.

This was confirmed by running Python Performance Benchmark Suite
[15]_, which demonstrated that there is no difference between
3.7 master branch and this PEP reference implementation branch
(full benchmark results can be found here [16]_.)

Design Considerations
=====================

Alternative Immutable Dict Implementation
-----------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for both ``set()`` and ``get()`` operations, which will
be essentially O(1) for relatively small mappings in EC.

To assess if HAMT can be used for Execution Context, we implemented
it in CPython [7]_.

.. figure:: pep-0550/hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 5.  Benchmark code can be found here: [9]_.

Figure 5 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes.  For dictionaries with less than 100
items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550/lookup_hamt.png
   :align: center
   :width: 100%

   Figure 6.  Benchmark code can be found here: [10]_.

Figure 6 below shows comparison of lookup costs between Python dict
and an HAMT immutable mapping.  HAMT lookup time is 30-40% worse
than Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.

Note, that according to [8]_, HAMT design can be further improved.

The bottom line is that the current approach with implementing
an immutable mapping with shallow-copying dict will likely perform
adequately in real-life applications.  The HAMT solution is more
future proof, however.

The proposed API is designed in such a way that the underlying
implementation of the mapping can be changed completely without
affecting the Execution Context `Specification`_, which allows
us to switch to HAMT at some point if necessary.

Copy-on-write Execution Context
-------------------------------

The implementation of Execution Context in .NET is different from
this PEP. .NET uses copy-on-write mechanism and a regular mutable
mapping.

One way to implement this in CPython would be to have two new
fields in ``PyThreadState``:

* ``exec_context`` pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to Coroutine and Generator ``.send()`` and ``.throw()``
methods described in the `Coroutines`_ section will be almost the
same, except that in addition to the ``gi_execution_context`` they
will have a ``gi_exec_context_copy_on_write`` flag.  When a coroutine
or a generator starts, the flag will be set to ``1``.  This will
ensure that any modification of the EC performed within a coroutine
or a generator will be isolated.

This approach has one advantage:

* For Execution Context that contains a large number of items,
  copy-on-write is a more efficient solution than the shallow-copy
  dict approach.

However, we believe that copy-on-write disadvantages are more
important to consider:

* Copy-on-write behaviour for generators and coroutines makes
  EC semantics less predictable.

  With immutable EC approach, generators and coroutines always
  execute in the EC that was current at the moment of their
  creation.  Any modifications to the outer EC while a generator
  or a coroutine is executing are not visible to them::

    def generator():
        yield 1
        print(get_execution_context_item('key'))
        yield 2

    set_execution_context_item('key', 'spam')
    gen = iter(generator())
    next(gen)
    set_execution_context_item('key', 'ham')
    next(gen)

  The above script will always print 'spam' with immutable EC.

  With a copy-on-write approach, the above script will print 'ham'.
  Now, consider that ``generator()`` was refactored to call some
  library function, that uses Execution Context::

    def generator():
        yield 1
        some_function_that_uses_decimal_context()
        print(get_execution_context_item('key'))
        yield 2

  Now, the script will print 'spam', because
  ``some_function_that_uses_decimal_context`` forced the EC to copy,
  and ``set_execution_context_item('key', 'ham')`` line did not
  affect the ``generator()`` code after all.

* Similarly to the previous point, ``sys.ExecutionContext.run()``
  method will also become less predictable, as
  ``sys.get_execution_context()`` would still return a reference to
  the current mutable EC.

  We can't modify ``sys.get_execution_context()`` to return a shallow
  copy of the current EC, because this would seriously harm
  performance of ``asyncio.call_soon()`` and similar places, where
  it is important to propagate the Execution Context.

* Even though copy-on-write requires to shallow copy the execution
  context object less frequently, copying will still take place
  in coroutines and generators.  In which case, HAMT approach will
  perform better for medium to large sized execution contexts.

All in all, we believe that the copy-on-write approach introduces
very subtle corner cases that could lead to bugs that are
exceptionally hard to discover and fix.

The immutable EC solution in comparison is always predictable and
easy to reason about.  Therefore we believe that any slight
performance gain that the copy-on-write solution might offer is not
worth it.

Faster C API
------------

Packages like numpy and standard library modules like decimal need
to frequently query the global state for some local context
configuration.  It is important that the APIs that they use is as
fast as possible.

The proposed ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` functions need to get the
current thread state with ``PyThreadState_GET()`` (fast) and then
perform a hash lookup (relatively slow).  We can eliminate the hash
lookup by adding three additional C API functions:

* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
  a function similar to the existing ``_PyEval_RequestCodeExtraIndex``
  introduced :pep:`523`.  The idea is to request a unique index
  that can later be used to lookup context items.

  The ``key_name`` can later be used by ``sys.ExecutionContext`` to
  introspect items added with this API.

* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
  and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
  to request an item by its index, avoiding the cost of hash lookup.

Why setting a key to None removes the item?
-------------------------------------------

Consider a context manager::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

With ``set_execution_context_item(key, None)`` call removing the
``key``, the user doesn't need to write additional code to remove
the ``key`` if it wasn't in the execution context already.

An alternative design with ``del_execution_context_item()`` method
would look like the following::

    @contextmanager
    def context(x):
        not_there = object()
        old_x = get_execution_context_item('x', not_there)
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            if old_x is not_there:
                del_execution_context_item('x')
            else:
                set_execution_context_item('x', old_x)

Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS.  Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.

PEP 521
-------

:pep:`521` proposes an alternative solution to the problem:
enhance Context Manager Protocol with two new methods: ``__suspend__``
and ``__resume__``.  To make it compatible with async/await,
the Asynchronous Context Manager Protocol will also need to be
extended with ``__asuspend__`` and ``__aresume__``.

This allows to implement context managers like decimal context and
``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __suspend__(self):
            set_execution_context_item('x', self.old_x)

        def __resume__(self):
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

Besides complicating the protocol, the implementation will likely
negatively impact performance of coroutines, generators, and any code
that uses context managers, and will notably complicate the
interpreter implementation.  It also does not solve the leaking state
problem for greenlet/gevent.

:pep:`521` also does not provide any mechanism to propagate state
in a local context, like storing a request object in an HTTP request
handler to have better logging.

Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.

Reference Implementation
========================

The reference implementation can be found here: [11]_.

References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

Copyright
=========

This document has been placed in the public domain.