<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2017-08-16 1:55 GMT+02:00 Yury Selivanov <span dir="ltr"><<a href="mailto:yselivanov.ml@gmail.com" target="_blank">yselivanov.ml@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Here's the PEP 550 version 2. Thanks to a very active and insightful<br>
discussion here on Python-ideas, we've discovered a number of<br>
problems with the first version of the PEP. This version is a complete<br>
rewrite (only Abstract, Rationale, and Goals sections were not updated).<br>
<br>
The updated PEP is live on <a href="http://python.org" rel="noreferrer" target="_blank">python.org</a>:<br>
<a href="https://www.python.org/dev/peps/pep-0550/" rel="noreferrer" target="_blank">https://www.python.org/dev/<wbr>peps/pep-0550/</a><br>
<br>
There is no reference implementation at this point, but I'm confident<br>
that this version of the spec will have the same extremely low<br>
runtime overhead as the first version. Thanks to the new ContextItem<br>
design, accessing values in the context is even faster now.<br>
<br>
Thank you!<br>
<br>
<br>
PEP: 550<br>
Title: Execution Context<br>
Version: $Revision$<br>
Last-Modified: $Date$<br>
Author: Yury Selivanov <<a href="mailto:yury@magic.io">yury@magic.io</a>><br>
Status: Draft<br>
Type: Standards Track<br>
Content-Type: text/x-rst<br>
Created: 11-Aug-2017<br>
Python-Version: 3.7<br>
Post-History: 11-Aug-2017, 15-Aug-2017<br>
<br>
<br>
Abstract<br>
========<br>
<br>
This PEP proposes a new mechanism to manage execution state--the<br>
logical environment in which a function, a thread, a generator,<br>
or a coroutine executes.<br>
<br>
A few examples of where having a reliable state storage is required:<br>
<br>
* Context managers like decimal contexts, ``numpy.errstate``,<br>
and ``warnings.catch_warnings``;<br>
<br>
* Storing request-related data such as security tokens and request<br>
data in web applications, implementing i18n;<br>
<br>
* Profiling, tracing, and logging in complex and large code bases.<br>
<br>
The usual solution for storing state is to use a Thread-local Storage<br>
(TLS), implemented in the standard library as ``threading.local()``.<br>
Unfortunately, TLS does not work for the purpose of state isolation<br>
for generators or asynchronous code, because such code executes<br>
concurrently in a single thread.<br>
<br>
<br>
Rationale<br>
=========<br>
<br>
Traditionally, a Thread-local Storage (TLS) is used for storing the<br>
state. However, the major flaw of using the TLS is that it works only<br>
for multi-threaded code. It is not possible to reliably contain the<br>
state within a generator or a coroutine. For example, consider<br>
the following generator::<br>
<br>
def calculate(precision, ...):<br>
with decimal.localcontext() as ctx:<br>
# Set the precision for decimal calculations<br>
# inside this block<br>
ctx.prec = precision<br>
<br>
yield calculate_something()<br>
yield calculate_something_else()<br>
<br>
The decimal context uses TLS to store its state, and because TLS is<br>
not aware of generators, the state can leak. If a user interleaves<br>
two ``calculate()`` generators with different precisions using the<br>
``zip()`` built-in, the above code will not work correctly.<br>
For example::<br>
<br>
g1 = calculate(precision=100)<br>
g2 = calculate(precision=50)<br>
<br>
items = list(zip(g1, g2))<br>
<br>
# items[0] will be a tuple of:<br>
# first value from g1 calculated with 100 precision,<br>
# first value from g2 calculated with 50 precision.<br>
#<br>
# items[1] will be a tuple of:<br>
# second value from g1 calculated with 50 precision (!!!),<br>
# second value from g2 calculated with 50 precision.<br>
<br>
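The leak described above can be reproduced with nothing but the<br>
standard library. The following is a minimal, runnable variant of the<br>
example; the ``1 / 3`` division and the ``digits()`` helper are<br>
illustrative assumptions, not part of the original snippet:<br>
<br>
```python
import decimal


def calculate(precision):
    with decimal.localcontext() as ctx:
        # Set the precision for decimal calculations inside this block.
        ctx.prec = precision

        yield decimal.Decimal(1) / decimal.Decimal(3)
        yield decimal.Decimal(1) / decimal.Decimal(3)


def digits(value):
    # Hypothetical helper: number of significant digits actually computed.
    return len(value.as_tuple().digits)


g1 = calculate(precision=100)
g2 = calculate(precision=50)

items = list(zip(g1, g2))
precisions = [(digits(a), digits(b)) for a, b in items]

# The second value from g1 is computed with g2's precision.
print(precisions)  # [(100, 50), (50, 50)]
```
<br>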
An even scarier example would be using decimals to represent money<br>
in an async/await application: decimal calculations can suddenly<br>
lose precision in the middle of processing a request. Currently,<br>
bugs like this are extremely hard to find and fix.<br>
<br>
Another common need for web applications is to have access to the<br>
current request object, or security context, or, simply, the request<br>
URL for logging or submitting performance tracing data::<br>
<br>
async def handle_http_request(request):<br>
context.current_http_request = request<br>
<br>
await ...<br>
# Invoke your framework code, render templates,<br>
# make DB queries, etc, and use the global<br>
# 'current_http_request' in that code.<br>
<br>
# This isn't currently possible to do reliably<br>
# in asyncio out of the box.<br>
<br>
These examples are just a few out of many, where a reliable way to<br>
store context data is absolutely needed.<br>
<br>
The inability to use TLS for asynchronous code has led to a<br>
proliferation of ad-hoc solutions, which are limited in scope and<br>
do not support all required use cases.<br>
<br>
The current status quo is that any library, including the standard<br>
library, that uses TLS will likely not work as expected in<br>
asynchronous code or with generators (see [3]_ for an example issue).<br>
<br>
Some languages that have coroutines or generators recommend<br>
manually passing a ``context`` object to every function; see [1]_,<br>
which describes the pattern for Go. This approach, however, has limited<br>
use for Python, where we have a huge ecosystem that was built to work<br>
with a TLS-like context. Moreover, passing the context explicitly<br>
does not work at all for libraries like ``decimal`` or ``numpy``,<br>
which use operator overloading.<br>
<br>
The .NET runtime, which has support for async/await, has a generic<br>
solution to this problem, called ``ExecutionContext`` (see [2]_).<br>
On the surface, working with it is very similar to working with a TLS,<br>
but the former explicitly supports asynchronous code.<br>
<br>
<br>
Goals<br>
=====<br>
<br>
The goal of this PEP is to provide a more reliable alternative to<br>
``threading.local()``. It should be explicitly designed to work with<br>
Python's execution model, equally supporting threads, generators, and<br>
coroutines.<br>
<br>
An acceptable solution for Python should meet the following<br>
requirements:<br>
<br>
* Transparent support for code executing in threads, coroutines,<br>
and generators with an easy to use API.<br>
<br>
* Negligible impact on the performance of the existing code or the<br>
code that will be using the new mechanism.<br>
<br>
* Fast C API for packages like ``decimal`` and ``numpy``.<br>
<br>
Explicit is still better than implicit, hence the new APIs should only<br>
be used when there is no acceptable way of passing the state<br>
explicitly.<br>
<br>
<br>
Specification<br>
=============<br>
<br>
Execution Context is a mechanism for storing and accessing data specific<br>
to a logical thread of execution. We consider OS threads,<br>
generators, and chains of coroutines (such as ``asyncio.Task``)<br>
to be variants of a logical thread.<br>
<br>
In this specification, we will use the following terminology:<br>
<br>
* **Local Context**, or LC, is a key/value mapping that stores the<br>
context of a logical thread.<br>
<br>
* **Execution Context**, or EC, is an OS-thread-specific dynamic<br>
stack of Local Contexts.<br>
<br>
* **Context Item**, or CI, is an object used to set and get values<br>
from the Execution Context.<br>
<br>
Please note that throughout the specification we use simple<br>
pseudo-code to illustrate how the EC machinery works. The actual<br>
algorithms and data structures that we will use to implement the PEP<br>
are discussed in the `Implementation Strategy`_ section.<br>
<br>
<br>
Context Item Object<br>
-------------------<br>
<br>
The ``sys.new_context_item(description)`` function creates a<br>
new ``ContextItem`` object. The ``description`` parameter is a<br>
``str``, explaining the nature of the context key for introspection<br>
and debugging purposes.<br>
<br>
``ContextItem`` objects have the following methods and attributes:<br>
<br>
* ``.description``: read-only description;<br>
<br>
* ``.set(o)`` method: set the value to ``o`` for the context item<br>
in the execution context.<br>
<br>
* ``.get()`` method: return the current EC value for the context item.<br>
Context items are initialized with ``None`` when created, so<br>
this method call never fails.<br>
<br>
Below is an example of how context items can be used::<br>
<br>
my_context = sys.new_context_item(description='mylib.context')<br>
my_context.set('spam')<br></blockquote><div><br></div><div>Minor suggestion: Could we allow something like `sys.new_context_item(description='mylib.context', initial_value='spam')`? That would make it easier for type checkers to infer the type of a ContextItem, and it would save a line of code in the common case.</div><div><br></div><div>With this modification, the type of new_context_item would be:</div><div><br></div><div>@overload</div><div>def new_context_item(*, description: str, initial_value: T) -> ContextItem[T]: ...</div><div>@overload</div><div>def new_context_item(*, description: str) -> ContextItem[Any]: ...</div><div><br></div><div>If we only allow the second variant, type checkers would need some sort of special casing to figure out that after .set(), .get() will return the same type.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
# Later, to access the value of my_context:<br>
print(my_context.get())<br>
<br>
<br>
Thread State and Multi-threaded code<br>
------------------------------------<br>
<br>
Execution Context is implemented on top of Thread-local Storage.<br>
For every thread there is a separate stack of Local Contexts --<br>
mappings of ``ContextItem`` objects to their values in the LC.<br>
New threads always start with an empty EC.<br>
<br>
For CPython::<br>
<br>
PyThreadState:<br>
execution_context: ExecutionContext([<br>
LocalContext({ci1: val1, ci2: val2, ...}),<br>
...<br>
])<br>
<br>
The ``ContextItem.get()`` and ``.set()`` methods are defined as<br>
follows (in pseudo-code)::<br>
<br>
    class ContextItem:<br>
<br>
        def get(self):<br>
            tstate = PyThreadState_Get()<br>
<br>
            for local_context in reversed(tstate.execution_context):<br>
                if self in local_context:<br>
                    return local_context[self]<br>
<br>
        def set(self, value):<br>
            tstate = PyThreadState_Get()<br>
<br>
            if not tstate.execution_context:<br>
                tstate.execution_context = [LocalContext()]<br>
<br>
            tstate.execution_context[-1][self] = value<br>
<br>
With the semantics defined so far, the Execution Context can already<br>
be used as an alternative to ``threading.local()``::<br>
<br>
    def print_foo():<br>
        print(ci.get() or 'nothing')<br>
<br>
    ci = sys.new_context_item(description='test')<br>
    ci.set('foo')<br>
<br>
    # Will print "foo":<br>
    print_foo()<br>
<br>
    # Will print "nothing":<br>
    threading.Thread(target=print_foo).start()<br>
<br>
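The semantics defined so far can be approximated in pure Python today.<br>
The sketch below is only a model, not the proposed implementation: it<br>
assumes plain ``dict`` objects in place of ``LocalContext``, a<br>
``threading.local()`` attribute in place of the ``PyThreadState`` field,<br>
and a module-level ``ContextItem`` class in place of<br>
``sys.new_context_item()``:<br>
<br>
```python
import threading

# Stand-in for PyThreadState: every OS thread gets its own EC stack.
_tls = threading.local()


def _execution_context():
    # Lazily create the per-thread stack of Local Contexts (plain dicts).
    if not hasattr(_tls, "stack"):
        _tls.stack = []
    return _tls.stack


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        # Search from the innermost Local Context outwards.
        for local_context in reversed(_execution_context()):
            if self in local_context:
                return local_context[self]
        return None

    def set(self, value):
        stack = _execution_context()
        if not stack:
            stack.append({})
        # Writes always go to the topmost Local Context.
        stack[-1][self] = value


ci = ContextItem("test")
ci.set("foo")

seen = []


def record_foo():
    seen.append(ci.get() or "nothing")


record_foo()  # runs in the main thread, sees "foo"

worker = threading.Thread(target=record_foo)
worker.start()  # new threads start with an empty EC
worker.join()

print(seen)  # ['foo', 'nothing']
```
<br>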
<br>
Manual Context Management<br>
-------------------------<br>
<br>
Execution Context is generally managed by the Python interpreter,<br>
but sometimes it is desirable for the user to take control<br>
of it. A few examples of when this is needed:<br>
<br>
* running a computation in ``concurrent.futures.ThreadPoolExecutor``<br>
  with the current EC;<br>
<br>
* reimplementing generators with iterators (more on that later);<br>
<br>
* managing contexts in asynchronous frameworks (implement proper<br>
EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)<br>
<br>
For these purposes we add a set of new APIs (they will be used in<br>
later sections of this specification):<br>
<br>
* ``sys.new_local_context()``: create an empty ``LocalContext``<br>
object.<br>
<br>
* ``sys.new_execution_context()``: create an empty<br>
  ``ExecutionContext`` object.<br>
<br>
* Both ``LocalContext`` and ``ExecutionContext`` objects are opaque<br>
  to Python code, and there are no APIs to modify them.<br>
<br>
* ``sys.get_execution_context()`` function. The function returns a<br>
  copy of the current EC: an ``ExecutionContext`` instance.<br>
<br>
The runtime complexity of the actual implementation of this function<br>
can be O(1), but for the purposes of this section it is equivalent<br>
to::<br>
<br>
def get_execution_context():<br>
tstate = PyThreadState_Get()<br>
return copy(tstate.execution_context)<br>
<br>
* ``sys.run_with_execution_context(ec: ExecutionContext, func, *args,<br>
  **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution<br>
  context::<br>
<br>
def run_with_execution_context(ec, func, *args, **kwargs):<br>
tstate = PyThreadState_Get()<br>
<br>
old_ec = tstate.execution_context<br>
<br>
tstate.execution_context = ExecutionContext(<br>
ec.local_contexts + [LocalContext()]<br>
)<br>
<br>
try:<br>
return func(*args, **kwargs)<br>
finally:<br>
tstate.execution_context = old_ec<br>
<br>
  Any changes to Local Context by ``func`` will be ignored.<br>
  This allows reusing one ``ExecutionContext`` object for multiple<br>
  invocations of different functions, without them being able to<br>
  affect each other's environment::<br>
<br>
      ci = sys.new_context_item('example')<br>
      ci.set('spam')<br>
<br>
      def func():<br>
          print(ci.get())<br>
          ci.set('ham')<br>
<br>
      ec = sys.get_execution_context()<br>
<br>
      sys.run_with_execution_context(ec, func)<br>
      sys.run_with_execution_context(ec, func)<br>
<br>
      # Will print:<br>
      # spam<br>
      # spam<br>
<br>
* ``sys.run_with_local_context(lc: LocalContext, func, *args,<br>
  **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution<br>
  context using the specified local context.<br>
<br>
  Any changes that ``func`` does to the local context will be<br>
  persisted in ``lc``. This behaviour is different from the<br>
  ``run_with_execution_context()`` function, which always creates<br>
  a new throw-away local context.<br>
<br>
In pseudo-code::<br>
<br>
def run_with_local_context(lc, func, *args, **kwargs):<br>
tstate = PyThreadState_Get()<br>
<br>
old_ec = tstate.execution_context<br>
<br>
tstate.execution_context = ExecutionContext(<br>
old_ec.local_contexts + [lc]<br>
)<br>
<br>
try:<br>
return func(*args, **kwargs)<br>
finally:<br>
tstate.execution_context = old_ec<br>
<br>
Using the previous example::<br>
<br>
ci = sys.new_context_item('example')<br>
ci.set('spam')<br>
<br>
def func():<br>
print(ci.get())<br>
ci.set('ham')<br>
<br>
ec = sys.get_execution_context()<br>
lc = sys.new_local_context()<br>
<br>
sys.run_with_local_context(lc, func)<br>
sys.run_with_local_context(lc, func)<br>
<br>
# Will print:<br>
# spam<br>
# ham<br>
<br>
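The difference between the two ``run_with_*`` functions above can be<br>
checked with a small single-threaded model. This is a sketch under<br>
stated assumptions: Local Contexts are plain dicts, the current EC is a<br>
module-level list rather than a thread state field, and the function<br>
names simply mirror the proposed ``sys`` APIs:<br>
<br>
```python
# Single-threaded model: the current EC is a list of dicts (Local Contexts).
_current_ec = [{}]


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        for local_context in reversed(_current_ec):
            if self in local_context:
                return local_context[self]
        return None

    def set(self, value):
        _current_ec[-1][self] = value


def get_execution_context():
    # Copy every Local Context so later changes do not affect the snapshot.
    return [dict(lc) for lc in _current_ec]


def run_with_execution_context(ec, func, *args, **kwargs):
    global _current_ec
    old_ec = _current_ec
    # A fresh throw-away Local Context absorbs all writes made by func.
    _current_ec = list(ec) + [{}]
    try:
        return func(*args, **kwargs)
    finally:
        _current_ec = old_ec


def run_with_local_context(lc, func, *args, **kwargs):
    global _current_ec
    old_ec = _current_ec
    # Writes made by func land in the caller-supplied lc and persist there.
    _current_ec = old_ec + [lc]
    try:
        return func(*args, **kwargs)
    finally:
        _current_ec = old_ec


ci = ContextItem('example')
ci.set('spam')

seen = []


def func():
    seen.append(ci.get())
    ci.set('ham')


ec = get_execution_context()
run_with_execution_context(ec, func)
run_with_execution_context(ec, func)

lc = {}
run_with_local_context(lc, func)
run_with_local_context(lc, func)

print(seen)      # ['spam', 'spam', 'spam', 'ham']
print(ci.get())  # spam
```
<br>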
As an example, let's make a subclass of<br>
``concurrent.futures.ThreadPoolExecutor`` that preserves the execution<br>
context for scheduled functions::<br>
<br>
    class Executor(concurrent.futures.ThreadPoolExecutor):<br>
<br>
        def submit(self, fn, *args, **kwargs):<br>
            context = sys.get_execution_context()<br>
<br>
            fn = functools.partial(<br>
                sys.run_with_execution_context, context,<br>
                fn, *args, **kwargs)<br>
<br>
            return super().submit(fn)<br>
<br>
<br>
EC Semantics for Coroutines<br>
---------------------------<br>
<br>
Python :pep:`492` coroutines are used to implement cooperative<br>
multitasking. For a Python end-user they are similar to threads,<br>
especially when it comes to sharing resources or modifying<br>
the global state.<br>
<br>
An event loop is needed to schedule coroutines. Coroutines that<br>
are explicitly scheduled by the user are usually called Tasks.<br>
When a coroutine is scheduled, it can schedule other coroutines using<br>
an ``await`` expression. In async/await world, awaiting a coroutine<br>
is equivalent to a regular function call in synchronous code. Thus,<br>
Tasks are similar to threads.<br>
<br>
By drawing a parallel between regular multithreaded code and<br>
async/await, it becomes apparent that any modification of the<br>
execution context within one Task should be visible to all coroutines<br>
scheduled within it. Any execution context modifications, however,<br>
must not be visible to other Tasks executing within the same OS<br>
thread.<br>
<br>
<br>
Coroutine Object Modifications<br>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
<br>
To achieve this, a small set of modifications to the coroutine object<br>
is needed:<br>
<br>
* New ``cr_local_context`` attribute. This attribute is readable<br>
and writable for Python code.<br>
<br>
* When a coroutine object is instantiated, its ``cr_local_context``<br>
is initialized with an empty Local Context.<br>
<br>
* Coroutine's ``.send()`` and ``.throw()`` methods are modified as<br>
follows (in pseudo-C)::<br>
<br>
      if coro.cr_local_context is not None:<br>
          tstate = PyThreadState_Get()<br>
<br>
          tstate.execution_context.push(coro.cr_local_context)<br>
<br>
          try:<br>
              # Perform the actual `Coroutine.send()` or<br>
              # `Coroutine.throw()` call.<br>
              return coro.send(...)<br>
          finally:<br>
              coro.cr_local_context = tstate.execution_context.pop()<br>
      else:<br>
          # Perform the actual `Coroutine.send()` or<br>
          # `Coroutine.throw()` call.<br>
          return coro.send(...)<br>
<br>
* When the Python interpreter sees an ``await`` instruction, it inspects<br>
  the ``cr_local_context`` attribute of the coroutine that is about<br>
  to be awaited. For ``await coro``:<br>
<br>
* If ``coro.cr_local_context`` is an empty ``LocalContext`` object<br>
that ``coro`` was created with, the interpreter will set<br>
``coro.cr_local_context`` to ``None``.<br>
<br>
* If ``coro.cr_local_context`` was modified by Python code, the<br>
interpreter will leave it as is.<br>
<br>
This makes any changes to the execution context made by nested coroutine<br>
calls within a Task visible throughout the Task::<br>
<br>
    ci = sys.new_context_item('example')<br>
<br>
    async def nested():<br>
        ci.set('nested')<br>
<br>
    async def main():<br>
        ci.set('main')<br>
        print('before:', ci.get())<br>
        await nested()<br>
        print('after:', ci.get())<br>
<br>
    # Will print:<br>
    # before: main<br>
    # after: nested<br>
<br>
Essentially, coroutines work with Execution Context items similarly<br>
to threads, and ``await`` expression acts like a function call.<br>
<br>
This mechanism also works for ``yield from`` in generators decorated<br>
with ``@types.coroutine`` or ``@asyncio.coroutine``, which are<br>
called "generator-based coroutines" according to :pep:`492`,<br>
and should be fully compatible with native async/await coroutines.<br>
<br>
<br>
Tasks<br>
^^^^^<br>
<br>
In asynchronous frameworks like asyncio, coroutines are run by<br>
an event loop, and need to be explicitly scheduled (in asyncio<br>
coroutines are run by ``asyncio.Task``.)<br>
<br>
With the currently defined semantics, the interpreter makes<br>
coroutines linked by an ``await`` expression share the same<br>
Local Context.<br>
<br>
The interpreter, however, is not aware of the Task concept, and<br>
cannot help with ensuring that new Tasks started in coroutines<br>
use the correct EC::<br>
<br>
    current_request = sys.new_context_item(description='request')<br>
<br>
    async def child():<br>
        print('current request:', repr(current_request.get()))<br>
<br>
    async def handle_request(request):<br>
        current_request.set(request)<br>
        event_loop.create_task(child())<br>
<br>
    run(top_coro())<br>
<br>
    # Will print:<br>
    # current request: None<br>
<br>
To enable correct Execution Context propagation into Tasks, the<br>
asynchronous framework needs to assist the interpreter:<br>
<br>
* When ``create_task`` is called, it should capture the current<br>
  execution context with ``sys.get_execution_context()`` and save it<br>
  on the Task object.<br>
<br>
* When the Task object runs its coroutine object, it should execute<br>
  ``.send()`` and ``.throw()`` methods within the captured<br>
  execution context, using the ``sys.run_with_execution_context()``<br>
  function.<br>
<br>
With help from the asynchronous framework, the above snippet will<br>
run correctly, and the ``child()`` coroutine will be able to access<br>
the current request object through the ``current_request``<br>
Context Item.<br>
<br>
<br>
Event Loop Callbacks<br>
^^^^^^^^^^^^^^^^^^^^<br>
<br>
Similarly to Tasks, functions like asyncio's ``loop.call_soon()``<br>
should capture the current execution context with<br>
``sys.get_execution_context()`` and execute callbacks<br>
within it with ``sys.run_with_execution_context()``.<br>
<br>
This way the following code will work::<br>
<br>
    current_request = sys.new_context_item(description='request')<br>
<br>
    def log():<br>
        request = current_request.get()<br>
        print(request)<br>
<br>
    async def request_handler(request):<br>
        current_request.set(request)<br>
        get_event_loop().call_soon(log)<br>
<br>
<br>
Generators<br>
----------<br>
<br>
Generators in Python, while similar to Coroutines, are used in a<br>
fundamentally different way. They are producers of data, and<br>
they use the ``yield`` expression to suspend/resume their execution.<br>
<br>
A crucial difference between ``await coro`` and ``yield value`` is<br>
that the former expression guarantees that the ``coro`` will be<br>
executed fully, while the latter is producing ``value`` and<br>
suspending the generator until it gets iterated again.<br>
<br>
Generators, similarly to coroutines, have a ``gi_local_context``<br>
attribute, which is set to an empty Local Context when created.<br>
<br>
Contrary to coroutines though, ``yield from o`` expression in<br>
generators (that are not generator-based coroutines) is semantically<br>
equivalent to ``for v in o: yield v``, therefore the interpreter does<br>
not attempt to control their ``gi_local_context``.<br>
<br>
<br>
EC Semantics for Generators<br>
^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
<br>
Every generator object has its own Local Context that stores<br>
only its own local modifications of the context. When a generator<br>
is being iterated, its local context will be put in the EC stack<br>
of the current thread. This means that the generator will be able<br>
to access items from the surrounding context::<br>
<br>
    local = sys.new_context_item("local")<br>
    global_ = sys.new_context_item("global")<br>
<br>
    def generator():<br>
        local.set('inside gen:')<br>
        while True:<br>
            print(local.get(), global_.get())<br>
            yield<br>
<br>
    g = generator()<br>
<br>
    local.set('hello')<br>
    global_.set('spam')<br>
    next(g)<br>
<br>
    local.set('world')<br>
    global_.set('ham')<br>
    next(g)<br>
<br>
    # Will print:<br>
    # inside gen: spam<br>
    # inside gen: ham<br>
<br>
Any changes to the EC in nested generators are invisible to the outer<br>
generator::<br>
<br>
    local = sys.new_context_item("local")<br>
<br>
    def inner_gen():<br>
        local.set('spam')<br>
        yield<br>
<br>
    def outer_gen():<br>
        local.set('ham')<br>
        yield from inner_gen()<br>
        print(local.get())<br>
<br>
    list(outer_gen())<br>
<br>
    # Will print:<br>
    # ham<br>
<br>
<br>
Running generators without LC<br>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
<br>
Similarly to coroutines, generators with ``gi_local_context``<br>
set to ``None`` simply use the outer Local Context.<br>
<br>
The ``@contextlib.contextmanager`` decorator uses this mechanism to<br>
allow its generator to affect the EC::<br>
<br>
    item = sys.new_context_item('test')<br>
<br>
    @contextmanager<br>
    def context(x):<br>
        old = item.get()<br>
        item.set(x)<br>
        try:<br>
            yield<br>
        finally:<br>
            item.set(old)<br>
<br>
    with context('spam'):<br>
<br>
        with context('ham'):<br>
            print(1, item.get())<br>
<br>
        print(2, item.get())<br>
<br>
    # Will print:<br>
    # 1 ham<br>
    # 2 spam<br>
<br>
<br>
Implementing Generators with Iterators<br>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
<br>
The Execution Context API makes it possible to fully replicate the EC<br>
behaviour imposed on generators with a regular Python iterator class::<br>
<br>
class Gen:<br>
<br>
def __init__(self):<br>
self.local_context = sys.new_local_context()<br>
<br>
def __iter__(self):<br>
return self<br>
<br>
def __next__(self):<br>
return sys.run_with_local_context(<br>
self.local_context, self._next_impl)<br>
<br>
def _next_impl(self):<br>
# Actual __next__ implementation.<br>
...<br>
<br>
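To see that an iterator written this way really isolates its context,<br>
the ``Gen`` pattern above can be exercised against a small pure-Python<br>
model. Everything here is an assumption made for illustration: dicts<br>
stand in for ``LocalContext``, a module-level list stands in for the<br>
thread's EC, and ``Labelled`` is a hypothetical iterator class:<br>
<br>
```python
# Single-threaded model: the current EC is a list of dicts (Local Contexts).
_current_ec = [{}]


class ContextItem:
    def __init__(self, description):
        self.description = description

    def get(self):
        for local_context in reversed(_current_ec):
            if self in local_context:
                return local_context[self]
        return None

    def set(self, value):
        _current_ec[-1][self] = value


def run_with_local_context(lc, func, *args, **kwargs):
    global _current_ec
    old_ec = _current_ec
    _current_ec = old_ec + [lc]
    try:
        return func(*args, **kwargs)
    finally:
        _current_ec = old_ec


item = ContextItem('label')


class Labelled:
    """Iterator that carries its own Local Context, like a generator."""

    def __init__(self, label, count):
        self.local_context = {}
        self.label = label
        self.remaining = count
        self.started = False

    def __iter__(self):
        return self

    def __next__(self):
        return run_with_local_context(
            self.local_context, self._next_impl)

    def _next_impl(self):
        if self.remaining == 0:
            raise StopIteration
        if not self.started:
            # Only the first step writes; later steps just read it back.
            item.set(self.label)
            self.started = True
        self.remaining -= 1
        return item.get()


item.set('outer')
pairs = list(zip(Labelled('a', 2), Labelled('b', 2)))

print(pairs)       # [('a', 'b'), ('a', 'b')]
print(item.get())  # outer
```
<br>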
<br>
Asynchronous Generators<br>
-----------------------<br>
<br>
Asynchronous Generators (AG) interact with the Execution Context<br>
similarly to regular generators.<br>
<br>
They have an ``ag_local_context`` attribute, which, similarly to<br>
regular generators, can be set to ``None`` to make them use the outer<br>
Local Context. This is used by the new<br>
``contextlib.asynccontextmanager`` decorator.<br>
<br>
The EC support of ``await`` expression is implemented using the same<br>
approach as in coroutines, see the `Coroutine Object Modifications`_<br>
section.<br>
<br>
<br>
Greenlets<br>
---------<br>
<br>
Greenlet is an alternative implementation of cooperative<br>
scheduling for Python. Although the greenlet package is not part of<br>
CPython, popular frameworks like gevent rely on it, and it is<br>
important that greenlet can be modified to support execution<br>
contexts.<br>
<br>
In a nutshell, the greenlet design is very similar to the design of<br>
generators. The main difference is that for generators, the stack<br>
is managed by the Python interpreter. Greenlet works outside of the<br>
Python interpreter, and manually saves some ``PyThreadState``<br>
fields and pushes/pops the C-stack. Thus the ``greenlet`` package<br>
can be easily updated to use the new low-level `C API`_ to enable<br>
full support of EC.<br>
<br>
<br>
New APIs<br>
========<br>
<br>
Python<br>
------<br>
<br>
Python APIs were designed to completely hide the internal<br>
implementation details, but at the same time provide enough control<br>
over EC and LC to re-implement all of Python built-in objects<br>
in pure Python.<br>
<br>
1. ``sys.new_context_item(description='...')``: create a<br>
   ``ContextItem`` object used to access/set values in EC.<br>
<br>
2. ``ContextItem``:<br>
<br>
   * ``.description``: read-only attribute.<br>
   * ``.get()``: return the current value for the item.<br>
   * ``.set(o)``: set the current value in the EC for the item.<br>
<br>
3. ``sys.get_execution_context()``: return the current<br>
   ``ExecutionContext``.<br>
<br>
4. ``sys.new_execution_context()``: create a new empty<br>
   ``ExecutionContext``.<br>
<br>
5. ``sys.new_local_context()``: create a new empty ``LocalContext``.<br>
<br>
6. ``sys.run_with_execution_context(ec: ExecutionContext,<br>
   func, *args, **kwargs)``.<br>
<br>
7. ``sys.run_with_local_context(lc: LocalContext,<br>
   func, *args, **kwargs)``.<br>
<br>
<br>
C API<br>
-----<br>
<br>
1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a<br>
   ``PyContextItem`` object.<br>
<br>
2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the<br>
   current value for the context item.<br>
<br>
3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set<br>
   the current value for the context item.<br>
<br>
4. ``PyLocalContext * PyLocalContext_New()``: create a new empty<br>
   ``PyLocalContext``.<br>
<br>
5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty<br>
   ``PyExecutionContext``.<br>
<br>
6. ``PyExecutionContext * PyExecutionContext_Get()``: get the<br>
   EC for the active thread state.<br>
<br>
7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the<br>
   passed EC object as the current one for the active thread state.<br>
<br>
8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *,<br>
   PyLocalContext *)``: allows implementing the<br>
   ``sys.run_with_local_context`` Python API.<br>
<br>
<br>
Implementation Strategy<br>
=======================<br>
<br>
LocalContext is a Weak Key Mapping<br>
----------------------------------<br>
<br>
Using a weak key mapping for ``LocalContext`` implementation<br>
enables the following properties with regards to garbage<br>
collection:<br>
<br>
* ``ContextItem`` objects are strongly-referenced only from the<br>
application code, not from any of the Execution Context<br>
machinery or values they point to. This means that there<br>
are no reference cycles that could extend their lifespan<br>
longer than necessary, or prevent their garbage collection.<br>
<br>
* Values put in the Execution Context are guaranteed to be kept<br>
alive while there is a ``ContextItem`` key referencing them in<br>
the thread.<br>
<br>
* If a ``ContextItem`` is garbage collected, all of its values will<br>
be removed from all contexts, allowing them to be GCed if needed.<br>
<br>
* If a thread has ended its execution, its thread state will be<br>
cleaned up along with its ``ExecutionContext``, cleaning<br>
up all values bound to all Context Items in the thread.<br>
<br>
<br>
ContextItem.get() Cache<br>
-----------------------<br>
<br>
We can add three new fields to ``PyThreadState`` and<br>
``PyInterpreterState`` structs:<br>
<br>
* ``uint64_t PyThreadState->unique_id``: a globally unique<br>
thread state identifier (we can add a counter to<br>
``PyInterpreterState`` and increment it when a new thread state is<br>
created.)<br>
<br>
* ``uint64_t PyInterpreterState->context_item_deallocs``: every time<br>
  a ``ContextItem`` is GCed, all Execution Contexts in all threads<br>
  will lose track of it. ``context_item_deallocs`` will simply<br>
  count all ``ContextItem`` deallocations.<br>
<br>
* ``uint64_t PyThreadState->execution_context_ver``: every time<br>
  a new item is set, or an existing item is updated, or the stack<br>
  of execution contexts is changed in the thread, we increment this<br>
  counter.<br>
<br>
The above three fields allow implementing a fast cache path in<br>
``ContextItem.get()``, in pseudo-code::<br>
<br>
    class ContextItem:<br>
<br>
        def get(self):<br>
            tstate = PyThreadState_Get()<br>
<br>
            if (self.last_tstate_id == tstate.unique_id and<br>
                    self.last_ver == tstate.execution_context_ver and<br>
                    self.last_deallocs ==<br>
                        tstate.interp.context_item_deallocs):<br>
                return self.last_value<br>
<br>
            value = None<br>
            for mapping in reversed(tstate.execution_context):<br>
                if self in mapping:<br>
                    value = mapping[self]<br>
                    break<br>
<br>
            self.last_value = value<br>
            self.last_tstate_id = tstate.unique_id<br>
            self.last_ver = tstate.execution_context_ver<br>
            self.last_deallocs = tstate.interp.context_item_deallocs<br>
<br>
            return value<br>
<br>
This is similar to the trick that decimal C implementation uses<br>
for caching the current decimal context, and will have the same<br>
performance characteristics, but available to all<br>
Execution Context users.<br>
<br>
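The effect of the cache can be observed with a runnable model of the<br>
pseudo-code above. It is a sketch only: ``SimpleNamespace`` objects<br>
stand in for ``PyThreadState`` and ``PyInterpreterState``, dicts stand in<br>
for Local Contexts, and the ``slow_lookups`` counter is a hypothetical<br>
addition used purely to make cache hits visible:<br>
<br>
```python
from types import SimpleNamespace

# Stand-ins for PyInterpreterState and PyThreadState.
interp = SimpleNamespace(context_item_deallocs=0)
tstate = SimpleNamespace(
    unique_id=1,
    interp=interp,
    execution_context=[{}],
    execution_context_ver=0,
)


class ContextItem:
    def __init__(self, description):
        self.description = description
        self.last_tstate_id = None
        self.last_ver = None
        self.last_deallocs = None
        self.last_value = None
        self.slow_lookups = 0  # instrumentation only, not in the PEP

    def get(self):
        # Fast path: nothing relevant changed since the last lookup.
        if (self.last_tstate_id == tstate.unique_id and
                self.last_ver == tstate.execution_context_ver and
                self.last_deallocs == tstate.interp.context_item_deallocs):
            return self.last_value

        # Slow path: walk the stack of Local Contexts.
        self.slow_lookups += 1
        value = None
        for mapping in reversed(tstate.execution_context):
            if self in mapping:
                value = mapping[self]
                break

        self.last_value = value
        self.last_tstate_id = tstate.unique_id
        self.last_ver = tstate.execution_context_ver
        self.last_deallocs = tstate.interp.context_item_deallocs
        return value

    def set(self, value):
        tstate.execution_context[-1][self] = value
        # Any write invalidates every cached lookup in this thread.
        tstate.execution_context_ver += 1


ci = ContextItem('example')
ci.set('spam')

first = [ci.get(), ci.get(), ci.get()]
lookups_after_reads = ci.slow_lookups

ci.set('ham')
second = ci.get()

print(first, lookups_after_reads)  # ['spam', 'spam', 'spam'] 1
print(second, ci.slow_lookups)     # ham 2
```
<br>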
<br>
Approach #1: Use a dict for LocalContext<br>
----------------------------------------<br>
<br>
The straightforward way of implementing the proposed EC<br>
mechanisms is to create a ``WeakKeyDict`` on top of Python<br>
``dict`` type.<br>
<br>
To implement the ``ExecutionContext`` type we can use Python<br>
``list`` (or a custom stack implementation with some<br>
pre-allocation optimizations).<br>
<br>
This approach will have the following runtime complexity:<br>
<br>
* O(M) for ``ContextItem.get()``, where ``M`` is the number of<br>
Local Contexts in the stack.<br>
<br>
It is important to note that ``ContextItem.get()`` will implement<br>
a cache making the operation O(1) for packages like ``decimal``<br>
and ``numpy``.<br>
<br>
* O(1) for ``ContextItem.set()``.<br>
<br>
* O(N) for ``sys.get_execution_context()``, where ``N`` is the<br>
  total number of items in the current **execution** context.<br>
<br>
<br>
Approach #2: Use HAMT for LocalContext<br>
--------------------------------------<br>
<br>
Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)<br>
to implement high performance immutable collections [5]_, [6]_.<br>
<br>
Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)<br>
performance for the ``set()``, ``get()``, and ``merge()`` operations,<br>
which is essentially O(1) for relatively small mappings<br>
(read about HAMT performance in CPython in the<br>
`Appendix: HAMT Performance`_ section).<br>
<br>
In this approach we use the same design of the ``ExecutionContext``<br>
as in Approach #1, but with a HAMT-backed weak-key Local Context<br>
implementation. With that we will have the following runtime<br>
complexity:<br>
<br>
* O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``,<br>
where ``M`` is the number of Local Contexts in the stack,<br>
and ``N`` is the number of items in the EC. The operation will<br>
essentially be O(M), because execution contexts are normally not<br>
expected to have more than a few dozen items.<br>
<br>
(``ContextItem.get()`` will have the same caching mechanism as in<br>
Approach #1.)<br>
<br>
* O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the<br>
number of items in the current **local** context. This will<br>
essentially be an O(1) operation most of the time.<br>
<br>
* O(log\ :sub:`32`\ N) for ``sys.get_execution_context()`<wbr>`, where<br>
``N`` is the total number of items in the current **execution**<br>
context.<br>
<br>
Essentially, using HAMT for Local Contexts instead of Python dicts<br>
brings the complexity of ``sys.get_execution_context()`` down<br>
from O(N) to O(log\ :sub:`32`\ N), thanks to the more efficient<br>
merge algorithm.<br>
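<br>
To see why O(log\ :sub:`32`\ N) is close to constant in practice, the<br>
following back-of-the-envelope sketch counts the levels of an<br>
idealized trie with 32-way branching. It assumes evenly distributed<br>
hashes, and ``trie_depth`` is a hypothetical helper, not part of any<br>
HAMT implementation:<br>
<br>
```python
def trie_depth(n_items, branching=32):
    """Levels needed by an idealized, evenly filled trie holding n_items."""
    depth, capacity = 1, branching
    while capacity < n_items:
        capacity *= branching
        depth += 1
    return depth


# Depth for a full root node, a thousand items, and a million items.
depths = {n: trie_depth(n) for n in (32, 1_000, 1_000_000)}
```
<br>
Under that assumption, a thousand items fit within two levels and a<br>
million items within four.<br>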
<br>
<br>
Approach #3: Use HAMT and Immutable Linked List<br>
------------------------------<wbr>-----------------<br>
<br>
We can make an alternative ``ExecutionContext`` design by using<br>
a linked list. Each ``LocalContext`` in the ``ExecutionContext``<br>
object will be wrapped in a linked-list node.<br>
<br>
``LocalContext`` objects will use the HAMT-backed weak-key<br>
implementation described in Approach #2.<br>
<br>
Every modification to the current ``LocalContext`` will produce a<br>
new version of it, which will be wrapped in a **new linked list<br>
node**. Essentially, this means that ``ExecutionContext`` is an<br>
immutable forest of ``LocalContext`` objects, and can be safely<br>
copied by reference in ``sys.get_execution_context()`` (eliminating<br>
the expensive "merge" operation).<br>
<br>
With this approach, ``sys.get_execution_context()`<wbr>` will be an<br>
**O(1) operation**.<br>
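<br>
The idea can be sketched as follows. This is only an illustration<br>
under stated assumptions: ``Node``, ``set_item`` and ``get_item`` are<br>
hypothetical names, and a read-only ``MappingProxyType`` over a copied<br>
``dict`` stands in for the HAMT, so ``set_item`` here is O(N) rather<br>
than O(log\ :sub:`32`\ N):<br>
<br>
```python
from types import MappingProxyType
from typing import Any, Mapping, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable linked-list node wrapping one Local Context."""
    local_context: Mapping[str, Any]
    parent: Optional["Node"]


def set_item(head, key, value):
    # Build a new Local Context and a new node; the old head is untouched.
    # A real HAMT would share structure instead of copying the dict.
    updated = MappingProxyType({**head.local_context, key: value})
    return Node(updated, head.parent)


def get_item(head, key, default=None):
    # Walk from the innermost node towards the root.
    node = head
    while node is not None:
        if key in node.local_context:
            return node.local_context[key]
        node = node.parent
    return default


head = Node(MappingProxyType({}), None)
head = set_item(head, "x", 1)
snapshot = head                 # O(1): capturing the context copies a reference
head = set_item(head, "x", 2)   # later changes create new nodes
```
<br>
Because nodes are never mutated, ``snapshot`` keeps seeing the old<br>
value of ``x`` while ``head`` sees the new one.<br>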
<br>
<br>
Summary<br>
-------<br>
<br>
We believe that approach #3 enables an efficient and complete<br>
Execution Context implementation, with excellent runtime performance.<br>
<br>
`ContextItem.get() Cache`_ enables fast retrieval of context items<br>
for performance critical libraries like decimal and numpy.<br>
<br>
Fast ``sys.get_execution_context()`<wbr>` enables efficient management<br>
of execution contexts in asynchronous libraries like asyncio.<br>
<br>
<br>
Design Considerations<br>
=====================<br>
<br>
Can we fix ``PyThreadState_GetDict()``?<br>
------------------------------<wbr>---------<br>
<br>
``PyThreadState_GetDict`` provides thread-local storage (TLS), and<br>
some of its existing users might depend on it being just a TLS.<br>
Changing its behaviour to follow the Execution Context semantics<br>
would break backwards compatibility.<br>
<br>
<br>
PEP 521<br>
-------<br>
<br>
:pep:`521` proposes an alternative solution to the problem:<br>
enhance the Context Manager Protocol with two new methods:<br>
``__suspend__`` and ``__resume__``. To make it compatible with<br>
async/await, the Asynchronous Context Manager Protocol will also need<br>
to be extended with ``__asuspend__`` and ``__aresume__``.<br>
<br>
This makes it possible to implement context managers like decimal<br>
context and ``numpy.errstate`` for generators and coroutines.<br>
<br>
The following code::<br>
<br>
    class Context:<br>
<br>
        def __enter__(self):<br>
            self.old_x = get_execution_context_item('x')<br>
            set_execution_context_item('x', 'something')<br>
<br>
        def __exit__(self, *err):<br>
            set_execution_context_item('x', self.old_x)<br>
<br>
would become this::<br>
<br>
    import threading<br>
<br>
    local = threading.local()<br>
<br>
    class Context:<br>
<br>
        def __enter__(self):<br>
            self.old_x = getattr(local, 'x', None)<br>
            local.x = 'something'<br>
<br>
        def __suspend__(self):<br>
            local.x = self.old_x<br>
<br>
        def __resume__(self):<br>
            local.x = 'something'<br>
<br>
        def __exit__(self, *err):<br>
            local.x = self.old_x<br>
<br>
Besides complicating the protocol, the implementation will likely<br>
negatively impact performance of coroutines, generators, and any code<br>
that uses context managers, and will notably complicate the<br>
interpreter implementation.<br>
<br>
:pep:`521` also does not provide any mechanism to propagate state<br>
in a local context, like storing a request object in an HTTP request<br>
handler to have better logging. Nor does it solve the leaking state<br>
problem for greenlet/gevent.<br>
<br>
<br>
Can Execution Context be implemented outside of CPython?<br>
------------------------------<wbr>--------------------------<br>
<br>
Because async/await code needs an event loop to run it, an EC-like<br>
solution can be implemented in a limited way for coroutines.<br>
<br>
Generators, on the other hand, do not have an event loop or<br>
trampoline, making it impossible to intercept their ``yield`` points<br>
outside of the Python interpreter.<br>
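<br>
The difficulty can be seen with a small experiment. In this sketch<br>
(the ``precision`` context manager and ``_state`` are hypothetical<br>
names), state set inside a generator stays visible to the caller<br>
while the generator is suspended, and no library code gets a chance<br>
to run at the ``yield``:<br>
<br>
```python
import threading

_state = threading.local()
_state.precision = 28


class precision:
    """Temporarily override the thread-local precision."""

    def __init__(self, value):
        self.value = value

    def __enter__(self):
        self.old = _state.precision
        _state.precision = self.value

    def __exit__(self, *exc):
        _state.precision = self.old


def gen():
    with precision(5):
        yield 1
        yield 2


g = gen()
first = next(g)               # the generator is now suspended inside "with"
leaked = _state.precision     # the caller observes the generator's setting
g.close()                     # unwinds the "with" block via GeneratorExit
restored = _state.precision   # only now is the old value back
```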
<br>
<br>
Backwards Compatibility<br>
=======================<br>
<br>
This proposal preserves 100% backwards compatibility.<br>
<br>
<br>
Appendix: HAMT Performance<br>
==========================<br>
<br>
To assess if HAMT can be used for Execution Context, we implemented<br>
it in CPython [7]_.<br>
<br>
.. figure:: pep-0550-hamt_vs_dict.png<br>
:align: center<br>
:width: 100%<br>
<br>
Figure 1. Benchmark code can be found here: [9]_.<br>
<br>
Figure 1 shows that HAMT indeed displays O(1) performance for all<br>
benchmarked dictionary sizes. For dictionaries with fewer than 100<br>
items, HAMT is a bit slower than Python dict/shallow copy.<br>
<br>
.. figure:: pep-0550-lookup_hamt.png<br>
:align: center<br>
:width: 100%<br>
<br>
Figure 2. Benchmark code can be found here: [10]_.<br>
<br>
Figure 2 shows a comparison of lookup costs between a Python dict<br>
and a HAMT immutable mapping. HAMT lookup time is 30-40% worse<br>
than Python dict lookups on average, which is a very good result,<br>
considering how well Python dicts are optimized.<br>
<br>
Note that, according to [8]_, the HAMT design can be further improved.<br>
<br>
<br>
Acknowledgments<br>
===============<br>
<br>
I thank Elvis Pranskevichus and Victor Petrovykh for countless<br>
discussions around the topic and for proofreading and editing the PEP.<br>
<br>
Thanks to Nathaniel Smith for proposing the ``ContextItem`` design<br>
[17]_ [18]_, for pushing the PEP towards a more complete design, and<br>
coming up with the idea of having a stack of contexts in the thread<br>
state.<br>
<br>
Thanks to Nick Coghlan for numerous suggestions and ideas on the<br>
mailing list, and for coming up with a case that caused the complete<br>
rewrite of the initial PEP version [19]_.<br>
<br>
<br>
References<br>
==========<br>
<br>
.. [1] <a href="https://blog.golang.org/context" rel="noreferrer" target="_blank">https://blog.golang.org/<wbr>context</a><br>
<br>
.. [2] <a href="https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx" rel="noreferrer" target="_blank">https://msdn.microsoft.com/en-<wbr>us/library/system.threading.<wbr>executioncontext.aspx</a><br>
<br>
.. [3] <a href="https://github.com/numpy/numpy/issues/9444" rel="noreferrer" target="_blank">https://github.com/numpy/<wbr>numpy/issues/9444</a><br>
<br>
.. [4] <a href="http://bugs.python.org/issue31179" rel="noreferrer" target="_blank">http://bugs.python.org/<wbr>issue31179</a><br>
<br>
.. [5] <a href="https://en.wikipedia.org/wiki/Hash_array_mapped_trie" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/<wbr>Hash_array_mapped_trie</a><br>
<br>
.. [6] <a href="http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html" rel="noreferrer" target="_blank">http://blog.higher-order.net/<wbr>2010/08/16/assoc-and-clojures-<wbr>persistenthashmap-part-ii.html</a><br>
<br>
.. [7] <a href="https://github.com/1st1/cpython/tree/hamt" rel="noreferrer" target="_blank">https://github.com/1st1/<wbr>cpython/tree/hamt</a><br>
<br>
.. [8] <a href="https://michael.steindorfer.name/publications/oopsla15.pdf" rel="noreferrer" target="_blank">https://michael.steindorfer.<wbr>name/publications/oopsla15.pdf</a><br>
<br>
.. [9] <a href="https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd" rel="noreferrer" target="_blank">https://gist.github.com/1st1/<wbr>9004813d5576c96529527d44c5457d<wbr>cd</a><br>
<br>
.. [10] <a href="https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e" rel="noreferrer" target="_blank">https://gist.github.com/1st1/<wbr>dbe27f2e14c30cce6f0b5fddfc8c43<wbr>7e</a><br>
<br>
.. [11] <a href="https://github.com/1st1/cpython/tree/pep550" rel="noreferrer" target="_blank">https://github.com/1st1/<wbr>cpython/tree/pep550</a><br>
<br>
.. [12] <a href="https://www.python.org/dev/peps/pep-0492/#async-await" rel="noreferrer" target="_blank">https://www.python.org/dev/<wbr>peps/pep-0492/#async-await</a><br>
<br>
.. [13] <a href="https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py" rel="noreferrer" target="_blank">https://github.com/MagicStack/<wbr>uvloop/blob/master/examples/<wbr>bench/echoserver.py</a><br>
<br>
.. [14] <a href="https://github.com/MagicStack/pgbench" rel="noreferrer" target="_blank">https://github.com/MagicStack/<wbr>pgbench</a><br>
<br>
.. [15] <a href="https://github.com/python/performance" rel="noreferrer" target="_blank">https://github.com/python/<wbr>performance</a><br>
<br>
.. [16] <a href="https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c" rel="noreferrer" target="_blank">https://gist.github.com/1st1/<wbr>6b7a614643f91ead3edf37c4451a6b<wbr>4c</a><br>
<br>
.. [17] <a href="https://mail.python.org/pipermail/python-ideas/2017-August/046752.html" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>pipermail/python-ideas/2017-<wbr>August/046752.html</a><br>
<br>
.. [18] <a href="https://mail.python.org/pipermail/python-ideas/2017-August/046772.html" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>pipermail/python-ideas/2017-<wbr>August/046772.html</a><br>
<br>
.. [19] <a href="https://mail.python.org/pipermail/python-ideas/2017-August/046780.html" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>pipermail/python-ideas/2017-<wbr>August/046780.html</a><br>
<br>
<br>
Copyright<br>
=========<br>
<br>
This document has been placed in the public domain.<br>
</blockquote></div><br></div></div>