
Hi,

This is a new PEP to implement Execution Contexts in Python.

The PEP is in-flight to python.org, and in the meantime can be read on GitHub:

https://github.com/python/peps/blob/master/pep-0550.rst

(it contains a few diagrams and charts, so please read it there.)

Thank you!
Yury


PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury@magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017


Abstract
========

This PEP proposes a new mechanism to manage execution state--the
logical environment in which a function, a thread, a generator,
or a coroutine executes.

A few examples of where reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request
  data in web applications;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use Thread-local Storage
(TLS), implemented in the standard library as ``threading.local()``.
Unfortunately, TLS does not work for isolating state of generators
or asynchronous code, because such code shares a single thread.


Rationale
=========

Traditionally, Thread-local Storage (TLS) is used for storing the
state.  However, the major flaw of TLS is that it works only for
multi-threaded code.  It is not possible to reliably contain the
state within a generator or a coroutine.  For example, consider
the following generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

The decimal context uses TLS to store its state, and because TLS is
not aware of generators, the state can leak.  The above code will
not work correctly if a user iterates over the ``calculate()``
generator with different precisions in parallel::

    g1 = calculate(100)
    g2 = calculate(50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision,
    #   second value from g2 calculated with 50 precision.

An even scarier example would be using decimals to represent money
in an async/await application: decimal calculations can suddenly
lose precision in the middle of processing a request.  Currently,
bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the
current request object, or security context, or, simply, the request
URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.

These examples are just a few of many where a reliable way to store
context data is absolutely needed.

The inability to use TLS for asynchronous code has led to a
proliferation of ad-hoc solutions, each supported only by code that
was explicitly adapted to work with it.
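To make the failure mode concrete, here is a small, runnable
illustration (not part of the specification) of ``threading.local()``
state leaking between two asyncio tasks that share one thread::

    import asyncio
    import threading

    local = threading.local()

    async def handler(name, value):
        local.request = value      # intended to be per-task state
        await asyncio.sleep(0)     # suspension point: the other task runs
        # whichever task wrote last on this thread wins
        print(name, 'sees', local.request)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(
        handler('task1', 'A'),
        handler('task2', 'B'),
    ))
    # both tasks print 'B': TLS cannot isolate per-task state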
The status quo is that any library, including the standard library,
that uses TLS will likely not work as expected in asynchronous code
or with generators (see [3]_ for an example issue).

Some languages that have coroutines or generators recommend manually
passing a ``context`` object to every function, see [1]_ describing
the pattern for Go.  This approach, however, has limited use for
Python, where we have a huge ecosystem that was built to work with a
TLS-like context.  Moreover, passing the context explicitly does not
work at all for libraries like ``decimal`` or ``numpy``, which use
operator overloading.

The .NET runtime, which has support for async/await, has a generic
solution to this problem, called ``ExecutionContext`` (see [2]_).
On the surface, working with it is very similar to working with a
TLS, but the former explicitly supports asynchronous code.


Goals
=====

The goal of this PEP is to provide a more reliable alternative to
``threading.local()``.  It should be explicitly designed to work
with the Python execution model, equally supporting threads,
generators, and coroutines.

An acceptable solution for Python should meet the following
requirements:

* Transparent support for code executing in threads, coroutines,
  and generators, with an easy-to-use API.

* Negligible impact on the performance of the existing code or the
  code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should
only be used when there is no option to pass the state explicitly.

With this PEP implemented, it should be possible to update a context
manager like the one below::

    _local = threading.local()

    @contextmanager
    def context(x):
        old_x = getattr(_local, 'x', None)
        _local.x = x
        try:
            yield
        finally:
            _local.x = old_x

to a more robust version that can be reliably used in generators and
async/await code, with a simple transformation::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)


Specification
=============

This proposal introduces a new concept called Execution Context
(EC), along with a set of Python APIs and C APIs to interact with
it.

EC is implemented using an immutable mapping.  Every modification of
the mapping produces a new copy of it.  To illustrate what this
means, let's compare it to how we work with tuples in Python::

    a0 = ()
    a1 = a0 + (1,)
    a2 = a1 + (2,)

    # a0 is an empty tuple
    # a1 is (1,)
    # a2 is (1, 2)

Manipulating an EC object would be similar::

    a0 = EC()
    a1 = a0.set('foo', 'bar')
    a2 = a1.set('spam', 'ham')

    # a0 is an empty mapping
    # a1 is {'foo': 'bar'}
    # a2 is {'foo': 'bar', 'spam': 'ham'}

In CPython, every thread that can execute Python code has a
corresponding ``PyThreadState`` object.  It encapsulates important
runtime information like a pointer to the current frame, and is used
by the ceval loop extensively.  We add a new field to
``PyThreadState``, called ``exec_context``, which points to the
current EC object.

We also introduce a set of APIs to work with Execution Context.  In
this section we will only cover two functions that are needed to
explain how Execution Context works.  See the full list of new APIs
in the `New APIs`_ section.

* ``sys.get_execution_context_item(key, default=None)``: lookup
  ``key`` in the EC of the executing thread.  If not found, return
  ``default``.

* ``sys.set_execution_context_item(key, value)``: get the current
  EC of the executing thread.
  Add a ``key``/``value`` item to it, which will produce a new EC
  object.  Set the new object as the current one for the executing
  thread.  In pseudo-code::

      tstate = PyThreadState_GET()
      ec = tstate.exec_context
      ec2 = ec.set(key, value)
      tstate.exec_context = ec2

Note that some important implementation details and optimizations
are omitted here, and will be covered in later sections of this PEP.

Now let's see how Execution Contexts work with regular
multi-threaded code, generators, and coroutines.


Regular & Multithreaded Code
----------------------------

For regular Python code, EC behaves just like a thread-local.  Any
modification of the EC object produces a new one, which is
immediately set as the current one for the thread state.

.. figure:: pep-0550/functions.png
   :align: center
   :width: 90%

   Figure 1.  Execution Context flow in a thread.

As Figure 1 illustrates, if a function calls
``set_execution_context_item()``, the modification of the execution
context will be visible to all subsequent calls and to the caller::

    def set_foo():
        set_execution_context_item('foo', 'spam')

    set_execution_context_item('foo', 'bar')
    print(get_execution_context_item('foo'))

    set_foo()
    print(get_execution_context_item('foo'))

    # will print:
    #   bar
    #   spam


Coroutines
----------

Python :pep:`492` coroutines are used to implement cooperative
multitasking.  For a Python end-user they are similar to threads,
especially when it comes to sharing resources or modifying the
global state.

An event loop is needed to schedule coroutines.  Coroutines that are
explicitly scheduled by the user are usually called Tasks.  When a
coroutine is scheduled, it can schedule other coroutines using an
``await`` expression.

In the async/await world, awaiting a coroutine can be viewed as a
different calling convention: Tasks are similar to threads, and
awaiting on coroutines within a Task is similar to calling functions
within a thread.

By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the
execution context within one Task should be visible to all
coroutines scheduled within it.  Any execution context
modifications, however, must not be visible to other Tasks executing
within the same thread.

To achieve this, a small set of modifications to the coroutine
object is needed:

* When a coroutine object is instantiated, it saves a reference to
  the current execution context object to its
  ``cr_execution_context`` attribute.

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
  follows (in pseudo-C)::

      if coro->cr_isolated_execution_context:
          # Save a reference to the current execution context
          old_context = tstate->exec_context

          # Set our saved execution context as the current
          # for the current thread.
          tstate->exec_context = coro->cr_execution_context

          try:
              # Perform the actual `Coroutine.send()` or
              # `Coroutine.throw()` call.
              return coro->send(...)
          finally:
              # Save a reference to the updated exec_context.
              # We will need it later, when `.send()` or `.throw()`
              # are called again.
              coro->cr_execution_context = tstate->exec_context

              # Restore thread's execution context to what it was
              # before invoking this coroutine.
              tstate->exec_context = old_context
      else:
          # Perform the actual `Coroutine.send()` or
          # `Coroutine.throw()` call.
          return coro->send(...)

* ``cr_isolated_execution_context`` is a new attribute on coroutine
  objects.  Set to ``True`` by default, it makes any execution
  context modifications performed by the coroutine stay visible only
  to that coroutine.
  When the Python interpreter sees an ``await`` instruction, it
  flips ``cr_isolated_execution_context`` to ``False`` for the
  coroutine that is about to be awaited.  This makes any changes to
  the execution context made by nested coroutine calls within a Task
  visible throughout the Task.

  Because the top-level coroutine (Task) cannot be scheduled with
  ``await`` (in asyncio you need to call ``loop.create_task()`` or
  ``asyncio.ensure_future()`` to schedule a Task), all execution
  context modifications are guaranteed to stay within the Task.

* We always work with ``tstate->exec_context``.  We use
  ``coro->cr_execution_context`` only to store the coroutine's
  execution context when it is not executing.

Figure 2 below illustrates how execution context mutations work with
coroutines.

.. figure:: pep-0550/coroutines.png
   :align: center
   :width: 90%

   Figure 2.  Execution Context flow in coroutines.

In the above diagram:

* When "coro1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When it awaits on "coro2", any subsequent changes it does to the
  execution context are visible to "coro1", but not outside of it.

In code::

    async def inner_foo():
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', 2)

    async def foo():
        print('foo:', get_execution_context_item('key'))

        set_execution_context_item('key', 1)
        await inner_foo()

        print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    asyncio.get_event_loop().run_until_complete(foo())
    print('main:', get_execution_context_item('key'))

which will output::

    main: spam
    foo: spam
    inner_foo: 1
    foo: 2
    main: spam

Generator-based coroutines (generators decorated with
``types.coroutine`` or ``asyncio.coroutine``) behave exactly like
native coroutines with regards to execution context management:
their ``yield from`` expression is semantically equivalent to
``await``.


Generators
----------

Generators in Python, while similar to coroutines, are used in a
fundamentally different way.  They are producers of data, and they
use the ``yield`` expression to suspend/resume their execution.  A
crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that the ``coro`` will be
executed to the end, while the latter produces ``value`` and
suspends the generator until it gets iterated again.

Generators share 99% of their implementation with coroutines, and
thus have similar new attributes ``gi_execution_context`` and
``gi_isolated_execution_context``.  Similar to coroutines,
generators save a reference to the current execution context when
they are instantiated.  They have the same implementation of the
``.send()`` and ``.throw()`` methods.

The only difference is that ``gi_isolated_execution_context`` is
always set to ``True``, and is never modified by the interpreter.
The ``yield from o`` expression in regular generators that are not
decorated with ``types.coroutine`` is semantically equivalent to
``for v in o: yield v``.

.. figure:: pep-0550/generators.png
   :align: center
   :width: 90%

   Figure 3.  Execution Context flow in a generator.

In the above diagram:

* When "gen1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When "gen2" is created, it saves a reference to the current
  execution context for it -- "2.1".
* Any subsequent execution context updates in "gen2" will only be
  visible to "gen2".

* Likewise, any context changes that "gen1" makes after it created
  "gen2" will not be visible to "gen2".

In code::

    def inner_foo():
        for i in range(3):
            print('inner_foo:', get_execution_context_item('key'))
            set_execution_context_item('key', i)
            yield i

    def foo():
        set_execution_context_item('key', 'spam')
        print('foo:', get_execution_context_item('key'))

        inner = inner_foo()

        while True:
            val = next(inner, None)
            if val is None:
                break
            yield val
            print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'ham')
    print('main:', get_execution_context_item('key'))

    list(foo())

    print('main:', get_execution_context_item('key'))

which will output::

    main: ham
    foo: spam
    inner_foo: spam
    foo: spam
    inner_foo: 0
    foo: spam
    inner_foo: 1
    foo: spam
    main: ham

As we see, any modification of the execution context in a generator
is visible only to the generator itself.

There is one use-case where it is desired for generators to affect
the surrounding execution context: the ``contextlib.contextmanager``
decorator.  To make the following work::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip the
``gi_isolated_execution_context`` flag to ``False`` on its
generator.


Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling
for Python.  Although the greenlet package is not part of CPython,
popular frameworks like gevent rely on it, and it is important that
greenlet can be modified to support execution contexts.

In a nutshell, greenlet design is very similar to the design of
generators.  The main difference is that for generators, the stack
is managed by the Python interpreter.  Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState`` fields
and pushes/pops the C-stack.  Since Execution Context is implemented
on top of ``PyThreadState``, it's easy to add transparent support
for it to greenlet.


New APIs
========

Even though this PEP adds a number of new APIs, please keep in mind
that most Python users will likely only ever use two of them:
``sys.get_execution_context_item()`` and
``sys.set_execution_context_item()``.


Python
------

1. ``sys.get_execution_context_item(key, default=None)``: lookup
   ``key`` for the current Execution Context.  If not found, return
   ``default``.

2. ``sys.set_execution_context_item(key, value)``: set a
   ``key``/``value`` item for the current Execution Context.  If
   ``value`` is ``None``, the item will be removed.

3. ``sys.get_execution_context()``: return the current Execution
   Context object: ``sys.ExecutionContext``.

4. ``sys.set_execution_context(ec)``: set the passed
   ``sys.ExecutionContext`` instance as the current one for the
   current thread.

5. ``sys.ExecutionContext`` object.

   Implementation detail: ``sys.ExecutionContext`` wraps a low-level
   ``PyExecContextData`` object.  ``sys.ExecutionContext`` has a
   mutable mapping API, abstracting away the real immutable
   ``PyExecContextData``.

   * ``ExecutionContext()``: construct a new, empty, execution
     context.

   * ``ec.run(func, *args)`` method: run ``func(*args)`` in the
     ``ec`` execution context.

   * ``ec[key]``: lookup ``key`` in the ``ec`` context.

   * ``ec[key] = value``: assign a ``key``/``value`` item to the
     ``ec``.

   * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``,
     and ``ec.copy()`` are similar to those of the ``dict`` object.
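To illustrate the intended semantics of the Python API, here is a
minimal pure-Python model of it (for illustration only; the actual
implementation is in C and differs in many details)::

    import threading

    _state = threading.local()      # holds the current EC per thread

    class EC:
        def __init__(self, data=()):
            self._data = dict(data)         # treated as immutable

        def set(self, key, value):
            new = dict(self._data)          # shallow copy on every write
            new[key] = value
            return EC(new)                  # previous snapshots unchanged

        def __getitem__(self, key):
            return self._data[key]

        def run(self, func, *args):
            saved = getattr(_state, 'ec', EC())
            _state.ec = self                # make self the current EC
            try:
                return func(*args)
            finally:
                _state.ec = saved           # always restore the caller's EC

    a0 = EC()
    a1 = a0.set('foo', 'bar')
    a2 = a1.set('spam', 'ham')
    assert a2['foo'] == 'bar' and 'foo' not in a0._data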
C API
-----

The C API is different from the Python one because it operates
directly on the low-level immutable ``PyExecContextData`` object.

1. New ``PyThreadState->exec_context`` field, pointing to a
   ``PyExecContextData`` object.

2. ``PyThreadState_SetExecContextItem`` and
   ``PyThreadState_GetExecContextItem``: similar to
   ``sys.set_execution_context_item()`` and
   ``sys.get_execution_context_item()``.

3. ``PyThreadState_GetExecContext``: similar to
   ``sys.get_execution_context()``.  Always returns a
   ``PyExecContextData`` object.  If ``PyThreadState->exec_context``
   is ``NULL``, a new and empty one will be created and assigned to
   ``PyThreadState->exec_context``.

4. ``PyThreadState_SetExecContext``: similar to
   ``sys.set_execution_context()``.

5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
   object.

6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.

The exact layout of ``PyExecContextData`` is private, which allows
us to switch it to a different implementation later.  More on that
in the `Implementation Details`_ section.


Modifications in Standard Library
=================================

* ``contextlib.contextmanager`` was updated to flip the new
  ``gi_isolated_execution_context`` attribute on the generator.

* ``asyncio.events.Handle`` object now captures the current
  execution context when it is created, and uses the saved execution
  context to run the callback (with the ``ExecutionContext.run()``
  method).  This makes ``loop.call_soon()`` run callbacks in the
  execution context in which they were scheduled.

  No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
  necessary.

Some standard library modules like ``warnings`` and ``decimal`` can
be updated to use new execution contexts.  This will be considered
in separate issues if this PEP is accepted.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Performance
===========

Implementation Details
----------------------

The new ``PyExecContextData`` object wraps a ``dict`` object.  Any
modification requires creating a shallow copy of the dict.

While working on the reference implementation of this PEP, we were
able to optimize the ``dict.copy()`` operation by **5.5x**, see
[4]_ for details.

.. figure:: pep-0550/dict_copy.png
   :align: center
   :width: 100%

   Figure 4.

Figure 4 shows that the performance of an immutable dict implemented
with shallow copying is expectedly O(n) for the ``set()`` operation.
However, this is tolerable until the dict has more than 100 items
(one ``set()`` takes about a microsecond).

Judging by the number of modules that need EC in the standard
library, it is likely that real-world Python applications will use
significantly fewer than 100 execution context variables.

The important point is that the cost of accessing a key in Execution
Context is always O(1).

If the ``set()`` operation performance is a major concern, we
discuss alternative approaches that have O(1) or close-to-O(1)
``set()`` performance in the `Alternative Immutable Dict
Implementation`_, `Faster C API`_, and `Copy-on-write Execution
Context`_ sections.


Generators and Coroutines
-------------------------

Using a microbenchmark for generators and coroutines from :pep:`492`
([12]_), it was possible to observe 0.5 to 1% performance
degradation.

asyncio echoserver microbenchmarks from the uvloop project [13]_
showed 1-1.5% performance degradation for asyncio code.

asyncpg benchmarks [14]_, which execute more code and are closer to
a real-world application, did not exhibit any noticeable performance
change.
Overall Performance Impact
--------------------------

The total number of changed lines in the ceval loop is 2 -- in the
``YIELD_FROM`` opcode implementation.  Only the performance of
generators and coroutines can be affected by the proposal.

This was confirmed by running the Python Performance Benchmark Suite
[15]_, which demonstrated that there is no difference between the
3.7 master branch and this PEP's reference implementation branch
(full benchmark results can be found here [16]_).


Design Considerations
=====================

Alternative Immutable Dict Implementation
------------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for both ``set()`` and ``get()`` operations, which will
be essentially O(1) for relatively small mappings in EC.

To assess if HAMT can be used for Execution Context, we implemented
it in CPython [7]_.

.. figure:: pep-0550/hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 5.  Benchmark code can be found here: [9]_.

Figure 5 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes.  For dictionaries with fewer than 100
items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550/lookup_hamt.png
   :align: center
   :width: 100%

   Figure 6.  Benchmark code can be found here: [10]_.

Figure 6 shows a comparison of lookup costs between Python dict and
an HAMT immutable mapping.  HAMT lookup time is 30-40% worse than
Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.

Note that, according to [8]_, the HAMT design can be further
improved.

The bottom line is that the current approach of implementing an
immutable mapping with a shallow-copying dict will likely perform
adequately in real-life applications.  The HAMT solution is more
future proof, however.

The proposed API is designed in such a way that the underlying
implementation of the mapping can be changed completely without
affecting the Execution Context `Specification`_, which allows us to
switch to HAMT at some point if necessary.


Copy-on-write Execution Context
-------------------------------

The implementation of Execution Context in .NET is different from
this PEP.  .NET uses a copy-on-write mechanism and a regular mutable
mapping.

One way to implement this in CPython would be to have two new fields
in ``PyThreadState``:

* ``exec_context`` pointing to the current Execution Context
  mapping;

* ``exec_context_copy_on_write`` flag, set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to the Coroutine and Generator ``.send()`` and
``.throw()`` methods described in the `Coroutines`_ section will be
almost the same, except that in addition to the
``gi_execution_context`` they will have a
``gi_exec_context_copy_on_write`` flag.  When a coroutine or a
generator starts, the flag will be set to ``1``.  This will ensure
that any modification of the EC performed within a coroutine or a
generator will be isolated.

This approach has one advantage:

* For an Execution Context that contains a large number of items,
  copy-on-write is a more efficient solution than the shallow-copy
  dict approach.
However, we believe that the copy-on-write disadvantages are more
important to consider:

* Copy-on-write behaviour for generators and coroutines makes EC
  semantics less predictable.

  With the immutable EC approach, generators and coroutines always
  execute in the EC that was current at the moment of their
  creation.  Any modifications to the outer EC while a generator or
  a coroutine is executing are not visible to them::

      def generator():
          yield 1
          print(get_execution_context_item('key'))
          yield 2

      set_execution_context_item('key', 'spam')
      gen = iter(generator())
      next(gen)
      set_execution_context_item('key', 'ham')
      next(gen)

  The above script will always print 'spam' with the immutable EC.
  With a copy-on-write approach, the above script will print 'ham'.
  Now, consider that ``generator()`` was refactored to call some
  library function that uses Execution Context::

      def generator():
          yield 1
          some_function_that_uses_decimal_context()
          print(get_execution_context_item('key'))
          yield 2

  Now, the script will print 'spam', because
  ``some_function_that_uses_decimal_context`` forced the EC to copy,
  and the ``set_execution_context_item('key', 'ham')`` line did not
  affect the ``generator()`` code after all.

* Similarly to the previous point, the
  ``sys.ExecutionContext.run()`` method will also become less
  predictable, as ``sys.get_execution_context()`` would still return
  a reference to the current mutable EC.

  We can't modify ``sys.get_execution_context()`` to return a
  shallow copy of the current EC, because this would seriously harm
  the performance of ``asyncio.call_soon()`` and similar places,
  where it is important to propagate the Execution Context.

* Even though copy-on-write requires shallow copying the execution
  context object less frequently, copying will still take place in
  coroutines and generators.  In which case, the HAMT approach will
  perform better for medium to large sized execution contexts.

All in all, we believe that the copy-on-write approach introduces
very subtle corner cases that could lead to bugs that are
exceptionally hard to discover and fix.

The immutable EC solution in comparison is always predictable and
easy to reason about.  Therefore we believe that any slight
performance gain that the copy-on-write solution might offer is not
worth it.


Faster C API
------------

Packages like numpy and standard library modules like decimal need
to frequently query the global state for some local context
configuration.  It is important that the APIs they use are as fast
as possible.

The proposed ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` functions need to get the
current thread state with ``PyThreadState_GET()`` (fast) and then
perform a hash lookup (relatively slow).  We can eliminate the hash
lookup by adding three additional C API functions:

* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``: a
  function similar to the existing ``_PyEval_RequestCodeExtraIndex``
  introduced in :pep:`523`.  The idea is to request a unique index
  that can later be used to lookup context items.

  The ``key_name`` can later be used by ``sys.ExecutionContext`` to
  introspect items added with this API.

* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index,
  PyObject *val)`` and
  ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``:
  request an item by its index, avoiding the cost of a hash lookup.


Why does setting a key to None remove the item?
-----------------------------------------------

Consider a context manager::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

With the ``set_execution_context_item(key, None)`` call removing the
``key``, the user doesn't need to write additional code to remove
the ``key`` if it wasn't in the execution context already.

An alternative design with a ``del_execution_context_item()`` method
would look like the following::

    @contextmanager
    def context(x):
        not_there = object()
        old_x = get_execution_context_item('x', not_there)
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            if old_x is not_there:
                del_execution_context_item('x')
            else:
                set_execution_context_item('x', old_x)


Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS.  Changing its behaviour to
follow the Execution Context semantics would break backwards
compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem: enhance
the Context Manager Protocol with two new methods: ``__suspend__``
and ``__resume__``.  To make it compatible with async/await, the
Asynchronous Context Manager Protocol will also need to be extended
with ``__asuspend__`` and ``__aresume__``.

This would make it possible to implement context managers like
decimal context and ``numpy.errstate`` for generators and
coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __suspend__(self):
            set_execution_context_item('x', self.old_x)

        def __resume__(self):
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

Besides complicating the protocol, the implementation will likely
negatively impact the performance of coroutines, generators, and any
code that uses context managers, and will notably complicate the
interpreter implementation.

It also does not solve the leaking state problem for
greenlet/gevent.

:pep:`521` also does not provide any mechanism to propagate state in
a local context, like storing a request object in an HTTP request
handler to have better logging.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.


Reference Implementation
========================

The reference implementation can be found here: [11]_.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.a...

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap...

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.p...

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c


Copyright
=========

This document has been placed in the public domain.

[duplicating my reply cc-ing python-ideas]
Is a new EC type really needed? Cannot this be done with collections.ChainMap?
No, not really. ChainMap will have O(N) lookup performance where N is the number of contexts you have in the chain. This will degrade performance of lookups, which isn't acceptable for some potential EC users like decimal/numpy/etc. Inventing heuristics to manage the chain size is harder than making an immutable dict (which is easy to reason about.) Chaining contexts will also force them to reference each other, creating cycles that GC won't be able to break.

Besides the performance considerations, with a ChainMap design of contexts it's not possible to properly isolate state changes inside of generators or coroutines/tasks as it's done in the PEP.

All in all, I don't think that chaining can solve the problem. It will likely lead to a more complicated solution in the end (this was my initial approach FWIW).
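A quick way to see the linear lookup cost (the numbers are machine-dependent, but the growth is what matters):

    import timeit
    from collections import ChainMap

    for depth in (1, 10, 100, 1000):
        # the key lives in the last mapping, so every lookup
        # has to scan the entire chain
        maps = [{} for _ in range(depth)] + [{'key': 42}]
        cm = ChainMap(*maps)
        t = timeit.timeit("cm['key']", globals={'cm': cm}, number=100000)
        print(depth, round(t, 4))   # time grows roughly linearly with depth

Yury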

This is exciting and I'm happy that you're addressing this problem. We've solved a similar problem in our asynchronous programming framework, asynq. Our solution (implemented at https://github.com/quora/asynq/blob/master/asynq/contexts.py) is similar to that in PEP 521: we enhance the context manager protocol with pause/resume methods instead of using an enhanced form of thread-local state. Some of our use cases can't be implemented using this PEP; notably, we use a timing context that times how long an asynchronous function takes by repeatedly pausing and resuming the timer. However, this timing context adds significant overhead because we have to call the pause/resume methods so often. Overall, your approach is almost certainly more performant. 2017-08-11 15:37 GMT-07:00 Yury Selivanov <yselivanov.ml@gmail.com>:

This is exciting and I'm happy that you're addressing this problem.
Thank you!
Some of our use cases can't be implemented using this PEP; notably, we use a timing context that times how long an asynchronous function takes by repeatedly pausing and resuming the timer.
Measuring the performance of coroutines is a somewhat different kind of problem. With PEP 550 you will be able to decouple context management from collecting performance data. That would allow you to subclass asyncio.Task (let's call it InstrumentedTask) and implement all extra tracing functionality on it (for example, by overriding its _step method). Then you could set a custom task factory that would use InstrumentedTask only for a fraction of requests. That would make it possible to collect performance metrics even in production (my 2c).
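A rough sketch of what I mean (names like InstrumentedTask are made up; this version wraps the coroutine itself rather than Task internals, and measures wall-clock time, including time spent suspended):

    import asyncio
    import random
    import time

    class InstrumentedTask(asyncio.Task):
        def __init__(self, coro, *, loop=None):
            super().__init__(self._timed(coro), loop=loop)

        async def _timed(self, coro):
            start = time.monotonic()
            try:
                return await coro
            finally:
                print('task took', time.monotonic() - start)

    def task_factory(loop, coro):
        # instrument only a fraction of tasks: cheap enough for production
        if random.random() < 0.01:
            return InstrumentedTask(coro, loop=loop)
        return asyncio.Task(coro, loop=loop)

    loop = asyncio.get_event_loop()
    loop.set_task_factory(task_factory)

Yury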

On Aug 11, 2017 16:38, "Yury Selivanov" <yselivanov.ml@gmail.com> wrote: Hi, This is a new PEP to implement Execution Contexts in Python. Nice! I've had something like this on the back burner for a while as it helps solve some problems with encapsulating the import state (e.g. PEP 408). -eric

On 12 August 2017 at 15:45, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Thanks Eric!
PEP 408 -- Standard library __preview__ package?
Typo in the PEP number: PEP 406, which was an ultimately failed attempt to get away from the reliance on process globals to manage the import system by encapsulating the top level state as an "Import Engine": https://www.python.org/dev/peps/pep-0406/

We still like the idea in principle (hence the Withdrawn status rather than being Rejected), but someone needs to find time to take a run at designing a new version of it atop the cleaner PEP 451 import plugin API (which is why the *specific* proposal in PEP 406 has been withdrawn).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I may have missed this (I've just skimmed the doc), but what's the rationale for making the EC an *immutable* mapping? It's impressive that you managed to create a faster immutable dict, but why does the use case need one? -- --Guido van Rossum (python.org/~guido)

On Fri, Aug 11, 2017 at 10:17 PM, Guido van Rossum <guido@python.org> wrote:
In this proposal, you have lots and lots of semantically distinct ECs. Potentially every stack frame has its own (at least in async code). So instead of copying the EC every time they create a new one, they want to copy it when it's written to. This is a win if writes are relatively rare compared to the creation of ECs.

You could probably optimize it a bit more by checking the refcnt before writing, and skipping the copy if it's exactly 1. But even simpler is to just always copy and throw away the old version.
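To illustrate the write-time-copy idea in pure Python (a toy model only; the real thing would check Py_REFCNT in C):

    class COWContext:
        def __init__(self, data, shared):
            self._data, self._shared = data, shared

        @classmethod
        def new(cls):
            return cls({}, False)

        def snapshot(self):
            # O(1): both contexts now share one dict, so either
            # must copy before its next write
            self._shared = True
            return COWContext(self._data, True)

        def set(self, key, value):
            if self._shared:
                self._data = dict(self._data)  # the copy-on-write
                self._shared = False
            self._data[key] = value

    parent = COWContext.new()
    parent.set('x', 1)
    child = parent.snapshot()   # no copying here
    child.set('x', 2)           # first write triggers the copy
    assert parent._data['x'] == 1 and child._data['x'] == 2

-n

--
Nathaniel J. Smith -- https://vorpus.org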

[replying to the list]
I may have missed this (I've just skimmed the doc), but what's the rationale for making the EC an *immutable* mapping?
It's possible to implement Execution Context with a mutable mapping and copy-on-write (as it's done in .NET). This is one of the approaches that I tried, and I discovered that it causes a bunch of subtle inconsistencies in contexts for generators and coroutines. I've tried to cover this here: https://www.python.org/dev/peps/pep-0550/#copy-on-write-execution-context

All in all, I believe that the immutable mapping approach gives the most predictable and easy-to-reason-about model. If its performance on a large number of items in the EC is a concern, I'll be happy to implement it using HAMT (also covered in the PEP).

Yury

Hi Yury,

This is really cool. Some notes on a first read:

1. Excellent work on optimizing dict, that seems valuable independent of the rest of the details here.

2. The text doesn't mention async generators at all. I assume they also have an agi_isolated_execution_context flag that can be set, to enable @asyncontextmanager?

2a. Speaking of which, I wonder if it's possible for async_generator to emulate this flag... I don't know if this matters -- at this point the main reason to use async_generator is for code that wants to support PyPy. If PyPy gains native async generator support before CPython 3.7 comes out, then async_generator may be entirely irrelevant before PEP 550 matters. But right now async_generator is still quite handy...

2b. BTW, the contextmanager trick is quite nice -- I actually noticed last week that PEP 521 had a problem here, but didn't think of a solution :-).

3. You're right that numpy is *very* performance sensitive about accessing the context -- the errstate object is needed extremely frequently, even on trivial operations like adding two scalars, so a dict lookup is very noticeable. (Imagine adding a dict lookup to float.__add__.) Right now, the errstate object gets stored in the threadstate dict, and then there are some dubious-looking hacks involving a global (not thread-local) counter to let us skip the lookup entirely if we think that no errstate object has been set. Really what we ought to be doing (currently, in a non-PEP-550 world) is storing the errstate in a __thread variable -- it'd certainly be worth it. Adopting PEP 550 would definitely be easier if we knew that it wasn't ruling out that level of optimization.

4. I'm worried that all of your examples use string keys. One of the great things about threading.local objects is that each one is a new namespace, which is a honking great idea -- here it prevents accidental collisions between unrelated libraries. And while it's possible to implement threading.local in terms of the threadstate dict (that's how they work now!), it requires some extremely finicky code to get the memory management right: https://github.com/python/cpython/blob/dadca480c5b7c5cf425d423316cd695bc5db3...

It seems like you're imagining that this API will be used directly by user code? Is that true? ...Are you sure that's a good idea? Are we just assuming that not many keys will be used and the keys will generally be immortal anyway, so leaking entries is OK? Maybe this is nit-picking, but this is hooking into the language semantics in such a deep way that I sorta feel like it would be bad to end up with something where we can never get garbage collection right.

The suggested index-based API for super fast C lookup also has this problem, but that would be such a low-level API -- and not part of the language definition -- that the right answer is probably just to document that there's no way to unallocate indices, so any given C library should only allocate, like... 1 of them. Maybe provide an explicit API to release an index, if we really want to get fancy.

5. Is there some performance-related reason that the API for getting/setting isn't just sys.get_execution_context()[...] = ...? Or even sys.execution_context[...]?

5a. Speaking of which, I'm not a big fan of the None-means-delete behavior. Not only does Python have a nice standard way to describe all the mapping operations without such hacks, but you're actually implementing that whole interface anyway. Why not use it?
6. Should Thread.start inherit the execution context from the spawning thread?

7. Compatibility: it does sort of break 3rd party contextmanager implementations (contextlib2, asyncio_extras's acontextmanager, trio's internal acontextmanager, ...). This is extremely minor though.

8. You discuss how this works for asyncio and gevent. Have you looked at how it will interact with tornado's context handling system? Can they use this? It's the most important extant context implementation I can think of (aside from thread local storage itself).

9. OK, my big question, about semantics.

The PEP's design is based on the assumption that all context-local state is scalar-like, and contexts split but never join. But there are some cases where this isn't true, in particular for values that have "stack-like" semantics. These are terms I just made up, but let me give some examples. Python's sys.exc_info is one. Another I ran into recently is for trio's cancel scopes.

So basically the background is, in trio you can wrap a context manager around any arbitrary chunk of code and then set a timeout or explicitly cancel that code. It's called a "cancel scope". These are fully nestable. Full details here: https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-t...

Currently, the implementation involves keeping a stack of cancel scopes in Task-local storage. This works fine for regular async code because when we switch Tasks, we also switch the cancel scope stack. But of course it falls apart for generators/async generators:

    async def agen():
        with fail_after(10):  # 10 second timeout for finishing this block
            await some_blocking_operation()
            yield
            await another_blocking_operation()

    async def caller():
        with fail_after(20):
            ag = agen()
            await ag.__anext__()
            # now that cancel scope is on the stack, even though we're
            # not inside the context manager! this will not end well.
            await some_blocking_operation()
            # ^ this might get cancelled when it shouldn't
        # even if it doesn't, we'll crash here when exiting the context
        # manager, because we try to pop a cancel scope that isn't at
        # the top of the stack

So I was thinking about whether I could implement this using PEP 550. It requires some cleverness, but I could switch to representing the stack as a singly-linked list, and then snapshot it and pass it back to the coroutine runner every time I yield. That would fix the case above. But, I think there's another case that's kind of a showstopper.

    async def agen():
        await some_blocking_operation()
        yield

    async def caller():
        ag = agen()  # context is captured here
        with fail_after(10):
            await ag.__anext__()

Currently this case works correctly: the timeout is applied to the __anext__ call, as you'd expect. But with PEP 550, it wouldn't work: the generator's timeouts would all be fixed when it was instantiated, and we wouldn't be able to detect that the second call has a timeout imposed on it. So that's a pretty nasty footgun. Any time you have code that's supposed to have a timeout applied, but in fact has no timeout applied, then that's a really serious bug -- it can lead to hangs, trivial DoS, pagers going off, etc.

Another problem is code like:

    async def caller():
        with fail_after(10):
            ag = agen()
        # then exit the scope

Can we clean up the cancel scope? (e.g., remove it from the global priority queue that tracks timeouts?) Normally yes, that's what __exit__ blocks are for, letting you know deterministically that an object can be cleaned up. But here it got captured by the async generator.
I really don't want to have to rely on the GC, because on PyPy it means that we could leak an unbounded number of cancel scopes for a finite but unbounded amount of time, and all those extra entries in the global timeout priority queue aren't free. (And sys.exc_info has had buggy behavior in analogous situations.)

So, I'm wondering if you (or anyone) have any ideas how to fix this :-). Technically, PEP 521 is powerful enough to do it, but in practice the performance would be catastrophically bad. It's one thing to have some extra cost to yielding out of an np.errstate block, those are rare and yielding out of them is rare. But cancel scopes are different: essentially all code in trio runs inside one or more of them, so every coroutine suspend/resume would have to call all those suspend/resume hooks up and down the stack. OTOH PEP 550 is fast, but AFAICT its semantics are wrong for this use case.

The basic invariant I want is: if at any given moment you stop and take a backtrace, and then look at the syntactic surroundings of each line in the backtrace and write down a list of all the 'with' blocks that the code *looks* like it's inside, then context lookups should give the same result as they would if you simply entered all of those with blocks in order. Generators make it tricky to maintain this invariant, because a generator frame's backtrace changes every time you call next(). But those are the semantics that make the most sense to me, and seem least surprising in practice. These are also IIUC the semantics that exc_info is supposed to follow (though historically the interaction of exc_info and generators has had lots of bugs, not sure if that's been fixed or not).

...and now that I've written that down, I sort of feel like that might be what you want for all the other sorts of context object too? Like, here's a convoluted example:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        print(a + b)
        yield
        print(a + b)

    def caller():
        # let's pretend this context manager exists, the actual API is
        # more complicated
        with decimal_context_precision(3):
            g = gen()
            with decimal_context_precision(2):
                next(g)
            with decimal_context_precision(1):
                next(g)

Currently, this will print "3.3 3", because when the generator is resumed it inherits the context of the resuming site. With PEP 550, it would print "3.33 3.33" (or maybe "3.3 3.3"? it's not totally clear from the text), because it inherits the context when the generator is created and then ignores the calling context. It's hard to get strong intuitions, but I feel like the current behavior is actually more sensible -- each time the generator gets resumed, the next bit of code runs in the context of whoever called next(), and the generator is just passively inheriting context, so ... that makes sense.

OTOH of course if you change the generator code to:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        with decimal_context_precision(4):
            print(a + b)
            yield
            print(a + b)

then it should print "3.333 3.333", because the generator is overriding the caller -- now when we resume the frame we're re-entering the decimal_context_precision(4) block, so it should take priority.

So ... maybe all context variables are "stack-like"?

-n

--
Nathaniel J. Smith -- https://vorpus.org

On 12 August 2017 at 17:54, Nathaniel Smith <njs@pobox.com> wrote:
Now that you raise this point, I think it means that generators need to retain their current context inheritance behaviour, simply for backwards compatibility purposes. This means that the case we need to enable is the one where the generator *doesn't* dynamically adjust its execution context to match that of the calling function.

One way that could work (using the cr_back/gi_back convention I suggested):

- generators start with gi_back not set
- if gi_back is NULL/None, gi.send() and gi.throw() set it to the calling frame for the duration of the synchronous call and *don't* adjust the execution context (i.e. the inverse of coroutine behaviour)
- if gi_back is already set, then gi.send() and gi.throw() *do* save and restore the execution context around synchronous calls in to the generator frame

To create an autonomous generator (i.e. one that didn't dynamically update its execution context), you'd use a decorator like:

    def autonomous_generator(gf):
        @functools.wraps(gf)
        def wrapper(*args, **kwds):
            gi = gf(*args, **kwds)
            gi.gi_back = gi.gi_frame
            return gi
        return wrapper

Asynchronous generators would then work like synchronous generators: ag_back would be NULL/None by default, and dynamically set for the duration of each __anext__ call. If you wanted to create an autonomous one, you'd make its back reference a circular reference to itself to disable the implicit dynamic updates.

When I put it in those terms though, I think the cr_back/gi_back/ag_back idea should actually be orthogonal to the "revert_context" flag (so you can record the link back to the caller even when maintaining an autonomous context).

Given that, you'd have the following initial states for "revert context" (currently called "isolated context" in the PEP):

* unawaited coroutines: true (same as PEP)
* awaited coroutines: false (same as PEP)
* generators (both sync & async): false (opposite of current PEP)
* autonomous generators: true (set "gi_revert_context" or "ag_revert_context" explicitly)

Open question: whether having "yield" inside a with statement implies the creation of an autonomous generator (synchronous or otherwise), or whether you'd need a decorator to get your context management right in such cases.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick, Nathaniel, I'll be replying in full to your emails when I have time to do some experiments. Now I just want to address one point that I think is important: On Sat, Aug 12, 2017 at 1:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Nobody *intentionally* iterates a generator manually in different decimal contexts (or any other contexts). This is an extremely error prone thing to do, because one refactoring of the generator -- rearranging yields -- would wreck your custom iteration/context logic. I don't think that any real code relies on this, and I don't think that we are breaking backwards compatibility here in any way. How many users actually need this?

If someone does need this, it's possible to flip `gi_isolated_execution_context` to `False` (as contextmanager does now) and get this behaviour. This might be needed for frameworks like Tornado which support coroutines via generators without 'yield from', but I'll have to verify this.

What I'm saying here is that any sort of context leaking *into* or *out of* a generator *while* it is iterating will likely cause only bugs or undefined behaviour. Take a look at the precision example in the Rationale section of the PEP.

Most of the time generators are created and iterated in the same spot; you rarely create generator closures. One way the behaviour could be changed, however, is to capture the execution context when a generator is first iterated (as opposed to when it is instantiated), but I don't think it makes any real difference.

Another idea: in one of my initial PEP implementations, I exposed gen.gi_execution_context (same for coroutines) to Python as a read/write attribute. That made it possible to (a) get the execution context out of a generator (for introspection or other purposes); (b) inject an execution context for event loops; for instance asyncio.Task could do that for some purpose. Maybe this would be useful for someone who wants to mess with generators and contexts.

[..]
Nick, I still have to fully grasp the idea of `gi_back`, but one quick thing: I specifically designed the PEP to avoid touching frames. The current design only needs TLS and a little help from the interpreter/core objects adjusting that TLS. It should be very straightforward to implement the PEP in any interpreter (with JIT or without) or compilers like Cython. [..]
If generators do not isolate their context, then the example in the Rationale section will not work as expected (or am I missing something?). Fixing generators' state leak was one of the main goals of the PEP.
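To make the opt-out concrete, here's a hypothetical sketch against the PEP's proposed flag, mirroring what contextmanager does (the decorator name is made up):

    import functools

    def unisolated(genfunc):
        # opt a generator out of EC isolation, like contextmanager does
        @functools.wraps(genfunc)
        def wrapper(*args, **kwargs):
            gen = genfunc(*args, **kwargs)
            gen.gi_isolated_execution_context = False  # proposed PEP 550 flag
            return gen
        return wrapper

Yury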

On 13 August 2017 at 03:53, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
I think this is a reasonable stance for the PEP to take, but the hidden execution state around the "isolated or not" behaviour still bothers me.

In some ways it reminds me of the way function parameters work: the bound parameters are effectively a *shallow* copy of the passed arguments, so callers can decide whether or not they want the callee to be able to modify them based on the arguments' mutability (or lack thereof).

The execution context proposal uses copy-on-write semantics for runtime efficiency, but it's essentially the same shallow copy concept applied to __next__(), send() and throw() operations (and perhaps __anext__(), asend(), and athrow() - I haven't wrapped my head around the implications for async generators and context managers yet).

That similarity makes me wonder whether the "isolated or not" behaviour could be moved from the object being executed and directly into the key/value pairs themselves based on whether or not the values were mutable, as that's the way function calls work: if the argument is immutable, the callee *can't* change it, while if it's mutable, the callee can mutate it, but it still can't rebind it to refer to a different object.

The way I'd see that working with an always-reverted copy-on-write execution context:

1. If a parent context wants child contexts to be able to make changes, then it should put a *mutable* object in the context (e.g. a list or class instance)
2. If a parent context *does not* want child contexts to be able to make changes, then it should put an *immutable* object in the context (e.g. a tuple or number)
3. If a child context *wants* to share a context key with its parent, then it should *mutate* it in place
4. If a child context *does not* want to share a context key with its parent, then it should *rebind* it to a different object

That way, instead of reverted-or-not-reverted being an all-or-nothing interpreter level decision, it can be made on a key-by-key basis by choosing whether or not to use a mutable value (there's a toy model of this at the end of this message).

To make that a little less abstract, consider a concrete example like setting a "my_web_framework.request" key:

1. The step of *setting* the key will *not* be shared with the parent context, as that modifies the underlying copy-on-write namespace, and will hence be reverted when control is passed back to the parent
2. Any *mutation* of the request object *will* be shared, since mutating the value doesn't have any effect on the copy-on-write namespace

Nathaniel's example of wanting stack-like behaviour could be modeled using tuples as values: when the child context "appends" to the tuple, it will necessarily have to create a new tuple and rebind the corresponding key, causing the changes to be invisible to the parent context.

The contextlib.contextmanager use case could then be modeled as a *separate* method that skipped the save/revert context management step (e.g. "send_with_shared_context", "throw_with_shared_context")
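Here's that toy model - a plain dict standing in for an always-reverted copy-on-write context (illustrative only, not a proposed API):

    def run_in_child(ec, func):
        child = dict(ec)   # shallow copy: new namespace, same value objects
        func(child)
        # child's *rebindings* die here; in-place mutations already happened

    parent = {'shared': ['request'], 'private': ('immutable',)}

    def child_code(ec):
        ec['shared'].append('mutated')  # rule 3: mutation is seen by parent
        ec['private'] = ('rebound',)    # rule 4: rebinding stays local

    run_in_child(parent, child_code)
    assert parent['shared'] == ['request', 'mutated']
    assert parent['private'] == ('immutable',)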
Working through this above, I think the key points that bother me about the stateful revert-or-not setting are that whether or not context reversion is desirable depends mainly on two things:

- the specific key in question (indicated by mutable vs immutable values)
- the intent of the code in the parent context (which could be indicated by calling different methods)

It *doesn't* seem to be an inherent property of a given generator or coroutine, except insofar as there's a correlation between the code that creates generators & coroutines and the code that subsequently invokes them.
Yeah, this would be useful, and could potentially avoid the need to expose a parallel set of "*_with_shared_context" methods - instead, contextlib.contextmanager could just invoke the underlying generator with an isolated context, and then set the parent context to the generator's one if it changed.
I think you can just ignore that idea for now, as I've convinced myself it's orthogonal to the question of how we handle execution contexts.
Agreed - see above :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 12, 2017 at 10:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Mutable default values for function arguments are one of the most confusing things in Python for its users. I've seen numerous threads on StackOverflow/Reddit with people complaining about them.
I'm afraid that if we design the EC to behave differently for mutable/immutable values, it will be an even harder thing for end users to understand.
It's possible to put mutable values in the EC even with the current PEP 550 API. The issue that Nathaniel has with it is that he actually wants the API to behave exactly like it does to implement his timeouts logic, but there's a corner case where isolating generator state at the time the generator is created doesn't work in his favor.

FWIW I believe that I now have a complete solution for the generator.send() problem that will make it possible for Nathaniel to implement his Trio APIs. The functional PoC is here: https://github.com/1st1/cpython/tree/pep550_gen

The key change is to make generators and asynchronous generators:

1. Have their own empty execution context when created. It will be used for whatever local modifications they make to it, ensuring that their state never escapes to the outside world (the gi_isolated_execution_context flag is still here for contextmanager).

2. ExecutionContext has a new internal pointer called ec_back. In the Generator.send/throw method, ec_back is dynamically set to the current execution context.

3. This makes it possible for generators to see any outside changes in the execution context *and* have their own, where they can make *local* changes.

So (pseudo-code):

    def gen():
        print('1', context)
        yield
        print('2', context)
        with context(spam=ham):
            yield
            print('3', context)
            yield
        print('4', context)
        yield

    g = gen()

    context(foo=1, spam='bar')
    next(g)
    context(foo=2)
    next(g)
    context(foo=3)
    next(g)
    context(foo=4)
    next(g)

will print:

    1 {foo=1, spam=bar}
    2 {foo=2, spam=bar}
    3 {foo=3, spam=ham}
    4 {foo=4, spam=bar}

There are some downsides to the approach, mainly from the performance standpoint, but in a common case they will be negligible, if detectable at all. Yury
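A rough Python-level sketch of what the new send()/throw() logic does (all helper names here are illustrative only; the real change lives in C):

    def generator_send(gen, value):
        # Dynamically link the generator's private EC to whatever
        # context is current at the call site, so outside changes
        # are visible inside the generator:
        gen.gi_execution_context.ec_back = current_execution_context()
        old = swap_current_execution_context(gen.gi_execution_context)
        try:
            # Lookups inside the generator consult its own EC first
            # and fall back to ec_back; writes stay in its own EC.
            return do_send(gen, value)
        finally:
            swap_current_execution_context(old)
            gen.gi_execution_context.ec_back = None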

I'll start a new thread soon to discuss whether we want this specific semantics change (with some updates). Yury

On 14 August 2017 at 02:33, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
There's nothing to design, as storing a list (or other mutable object) in an EC will necessarily be the same as storing one in a tuple: the fact that you acquired the reference via an immutable container does *nothing* to keep you from mutating the referenced object. And for use cases like web requests, that's exactly the behaviour we want: changing the active web request is an EC level operation, but making changes to the state of the currently active request (e.g. in a middleware processor) won't require anything special. [I'm going to snip the rest of the post, as it sounds pretty reasonable to me, and my questions about the interaction between sys.set_execution_context() and ec_back go away if sys.set_execution_context() doesn't exist as you're currently proposing]
(gi_isolated_execution_context flag is still here for contextmanager).
This hidden flag variable on the types managing suspendable frames is still the piece of the proposal that strikes me as being the most potentially problematic, as it at least doubles the number of flows of control that need to be tested.

Essentially what we're aiming to model is:

1. Performing operations in a way that modifies the active execution context
2. Performing them in a way that saves & restores the execution context

For synchronous calls, this distinction is straightforward:

- plain calls may alter the active execution context via state mutation
- use ec.run() to save/restore the execution context around the operation

(The ec_back idea means we may also need an "ec.run()" variant that sets ec_back appropriately before making the call - for example, "ec.run()" could set ec_back, while a separate "ec.run_isolated()" could skip setting it. Alternatively, full isolation could be the default, and "ec.run_shared()" would set ec_back. If we go with the latter option, then "ec_shared" might be a better attribute name than "ec_back")

A function can be marked as always having its own private context using a decorator like so:

    def private_context(f):
        @functools.wraps(f)
        def wrapper(*args, **kwds):
            ec = sys.get_active_context()
            return ec.run(f, *args, **kwds)
        return wrapper

For next/send/throw and anext/asend/athrow, however, the proposal is to bake the save/restore into the *target objects*, rather than having to request it explicitly in the way those objects get called. This means that unless we apply some implicit decorator magic to the affected slot definitions, there's now going to be a major behavioural difference between:

    some_state = sys.new_context_item()

    def local_state_changer(x):
        for i in range(x):
            some_state.set(i)
            yield i

and:

    class ParentStateChanger:
        def __init__(self, x):
            self._itr = iter(range(x))
        def __iter__(self):
            return self
        def __next__(self):
            x = next(self._itr)
            some_state.set(x)
            return x

The latter would need the equivalent of `@private_context` on the `__next__` method definition to get the behaviour that generators would have by default (and similarly for __anext__ and asynchronous generators).

I haven't fully thought through the implications of this problem yet, but some initial unordered thoughts:

- implicit method decorators are always suspicious, but skipping them in this case feels like we'd be setting up developers of custom iterators for really subtle context management bugs
- contextlib's own helper classes would be fine, since they define __enter__ & __exit__, which wouldn't be affected by this
- for lru_cache, we rely on `__wrapped__` to get access to the underlying function without caching applied. Might it make sense to do something similar for these implicitly context-restoring methods? If so, should we use a dedicated name so that additional wrapper layers don't overwrite it?

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
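For illustration, a custom iterator could opt in to generator-style isolation by reusing the private_context decorator sketched above (hypothetical API, same caveats):

    class IsolatedParentStateChanger:
        def __init__(self, x):
            self._itr = iter(range(x))

        def __iter__(self):
            return self

        @private_context
        def __next__(self):
            # With the decorator applied, state changes made here
            # are reverted when __next__ returns, matching the
            # default generator behaviour:
            x = next(self._itr)
            some_state.set(x)
            return x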

Could someone (perhaps in a new thread?) summarize the current proposal, with some examples of how typical use cases would look? This is an important topic but the discussion is way too voluminous for me to follow while I'm on vacation with my family, and the PEP spends too many words on motivation and not enough on crisply explaining how the proposed feature works (what state is stored where, how it's accessed, and how it's manipulated behind the scenes). -- --Guido van Rossum (python.org/~guido)

Nick, you nailed it with your example. In short: the current PEP 550 defines Execution Context in such a way that generators and iterators will interact with it differently. That means that it won't be possible to refactor an iterator class into a generator, and that's not acceptable. I'll be rewriting the whole specification section of the PEP today. Yury

On 15 August 2017 at 05:25, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Trying to summarise something I thought of this morning regarding ec_back and implicitly isolating iterator contexts:

With the notion of generators running with their own private context by default, the state needed to call __next__ on the generator is as follows:

- current thread EC
- generator's private EC (stored on the generator)
- the generator's __next__ method

This means that if the EC manipulation were to live in the next() builtin rather than in the individual __next__() methods, then this can be made a general context isolation protocol:

- provide a `sys.create_execution_context()` interface
- set `__private_context__` on your iterable if you want `next()` to use `ec.run()` (and update __private_context__ afterwards)
- set `__private_context__ = None` if you want `next()` to just call `obj.__next__()` directly
- generators have __private_context__ set by default, but wrappers like contextlib.contextmanager can clear it

That would also suggest that ec.run() will need to return a 2-tuple:

    def run(self, f: Callable, *args, **kwds) -> Tuple[Any, ExecutionContext]:
        """Run the given function in this execution context

        Returns a 2-tuple containing the function result and the
        execution context that was active when the function returned.
        """

That way next(itr) will be able to update itr.__private_context__ appropriately if it was initially set and the call changes the active context.

We could then give send(), throw() and their asynchronous counterparts the builtin+protocol method treatment, and put the EC manipulation in their builtins as well.

Anyway, potentially a useful option to consider as you work on revising the proposal - I'll refrain from further comments until you have an updated draft available :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
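A minimal sketch of what a protocol-aware next() builtin might do under this scheme (purely illustrative, using the __private_context__ attribute and the 2-tuple ec.run() signature floated above):

    _MISSING = object()

    def protocol_next(itr, default=_MISSING):
        try:
            ec = getattr(itr, '__private_context__', None)
            if ec is None:
                # No private context: plain call, EC changes escape
                return itr.__next__()
            # Run in the iterator's private context, then store the
            # possibly-updated context back on the iterator
            result, new_ec = ec.run(itr.__next__)
            itr.__private_context__ = new_ec
            return result
        except StopIteration:
            if default is _MISSING:
                raise
            return default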

Hi Nick, Thanks for writing this! You reminded me that it's crucial to be able to fully recreate generator behaviour in an iterator. Besides being a requirement for a complete EC model, it is something that compilers like Cython absolutely need. I'm still working on a rewrite (which is now a completely different PEP) and will probably finish it today. Yury

Nathaniel, Nick, I'll reply only to point 9 in this email to split this thread into manageable sub-threads. I'll cover the other points in later emails. On Sat, Aug 12, 2017 at 3:54 AM, Nathaniel Smith <njs@pobox.com> wrote:
9. OK, my big question, about semantics.
FWIW it took me a good hour to fully understand what you are doing with "fail_after" and what you want from PEP 550, and the actual associated problems with generators :)
As you yourself show below, it's easy to implement stacks with the proposed EC spec. A linked list will work well enough.
Right. So the task always knows the EC at the point of "yield". It can then get the latest timeout from it and act accordingly if that yield did not resume in time. This should work.
As I tried to explain in my last email, I generally don't believe that people would do this partial iteration with timeouts or other contexts around it. The only use case I can come up with so far is implementing some sort of receiver using an AG, and then "listening" on it through "__anext__" calls.

But the case is interesting nevertheless, and maybe we can fix it without relaxing any guarantees of the PEP. The idea that I have is to allow linking of ExecutionContext (this is similar in a way to what Nick proposed, but has stricter semantics):

1. The internal ExecutionContext object will have a new "back" attribute.

2. For regular code and coroutines everything that is already in the PEP will stay the same.

3. For generators and asynchronous generators, when a generator is created, an empty ExecutionContext will be created for it, with its "back" attribute pointing to the current EC.

4. The lookup function will be adjusted to check "EC.back" if the key is not found in the current EC.

5. The max level of the "back" chain will be 1.

6. When a generator is created inside another generator, it will inherit the other generator's EC. Because contexts are immutable this should be OK.

7. When a coroutine is created inside an EC with a "back" link, it will merge the EC and EC.back into one new EC. Merging can be done very efficiently for HAMT mappings, which I believe we will end up using for this anyway (an O(log32 N) operation).

An illustration of what this will allow:

    def gen():
        yield
        with context(key='spam'):
            yield
        yield

    g = gen()

    context(key=1)
    g.send(None)
    # The code around the first yield will see "key=1"

    context(key=2)
    g.send(None)
    # The code around the second yield will see "key=spam"

    context(key=3)
    g.send(None)
    # The code around the third yield will see "key=3"

Essentially, it makes generators "transparent" to outside context changes, but OTOH fully isolates their local context changes from the outside world. This should solve the "fail_after" over a generator case.

Nathaniel and Nick, what do you think? Yury
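A sketch of the adjusted lookup from point 4, assuming a hypothetical internal representation with a one-level "back" chain:

    def ec_lookup(ec, key, default=None):
        # Check the generator's own (initially empty) context first
        if key in ec.items:
            return ec.items[key]
        # Fall back to the linked outer context; per point 5, the
        # chain is at most one level deep
        if ec.back is not None:
            return ec.back.items.get(key, default)
        return default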

On 12 August 2017 at 08:37, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
The fully rendered version is also up now: https://www.python.org/dev/peps/pep-0550/

Thanks for this! The general approach looks good to me, so I just have some questions about specifics of the API:

1. Are you sure you want to expose the CoW type to pure Python code?

The draft API looks fairly error prone to me, as I'm not sure of the intended differences in behaviour between the following:

    @contextmanager
    def context(x):
        old_x = sys.get_execution_context_item('x')
        sys.set_execution_context_item('x', x)
        try:
            yield
        finally:
            sys.set_execution_context_item('x', old_x)

    @contextmanager
    def context(x):
        old_x = sys.get_execution_context().get('x')
        sys.get_execution_context()['x'] = x
        try:
            yield
        finally:
            sys.get_execution_context()['x'] = old_x

    @contextmanager
    def context(x):
        ec = sys.get_execution_context()
        old_x = ec.get('x')
        ec['x'] = x
        try:
            yield
        finally:
            ec['x'] = old_x

It seems to me that everything would be a lot safer if the *only* Python level API was a live dynamic view that completely hid the copy-on-write behaviour behind an "ExecutionContextProxy" type, such that the last two examples were functionally equivalent to each other and to the current PEP's get/set functions (rendering the latter redundant, and allowing it to be dropped from the PEP).

If Python code wanted a snapshot of the current state, it would need to call sys.get_execution_context().copy(), which would give it a plain dictionary containing a shallow copy of the execution context at that particular point in time.

If there's a genuine need to expose the raw copy-on-write machinery to Python level code (e.g. for asyncio's benefit), then that could be more clearly marked as "here be dragons" territory that most folks aren't going to want to touch (e.g. "sys.get_raw_execution_context()")

2. Do we need an ag_isolated_execution_context for asynchronous generators? (Modify this question as needed for the answer to the next question)

3. It bothers me that *_execution_context points to an actual execution context, while *_isolated_execution_context is a boolean. With names that similar I'd expect them to point to the same kind of object.

Would it work to adjust that setting to say that rather than being an "isolated/not isolated" boolean, we instead made it a cr_back reverse pointer to the awaiting coroutine (akin to f_back in the frame stack), such that we had a doubly-linked list that defined the coroutine call stacks via their cr_await and cr_back attributes?

If we did that, we'd have:

- Top-level Task: cr_back -> NULL (C) or None (Python)
- Awaited coroutine: cr_back -> coroutine that awaited this one (which would in turn have a cr_await reference back to here)

coroutine.send()/throw() would then save and restore the execution context around the call if cr_back was NULL/None (equivalent to isolated==True in the current PEP), and leave it alone otherwise (equivalent to isolated==False).

For generators, gi_back would normally be NULL/None (since we don't typically couple regular generators to a single managing object), but could be set appropriately by types.coroutine when the generator-based coroutine is awaited, and by contextlib.contextmanager before starting the underlying generator.

(It may even make sense to break the naming symmetry for that attribute, and call it something like "gi_owner", since generators don't form a clean await-based logical call chain the way native coroutines do).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 12, 2017 at 10:12 AM, Nick Coghlan <ncoghlan@gmail.com> wrote: [..]
1. Are you sure you want to expose the CoW type to pure Python code?
Ultimately, why not? The execution context object you get with sys.get_execution_context() is yours to change. Any change to it won't be propagated anywhere, unless you execute something in that context with ExecutionContext.run or set it as the current one.
This one (the second example) won't do anything.
This one (the third one) won't do anything either. You can do this:

    ec = sys.get_execution_context()
    ec['x'] = x
    ec.run(my_function)

or `sys.set_execution_context(ec)`
So there's no copy-on-write exposed to Python actually. What I am thinking about, though, is that we might not need the sys.set_execution_context() function. If you want to run something with a modified or empty execution context, do it through the ExecutionContext.run method.
Yes, we'll need it for contextlib.asynccontextmanager at least.
I think we touched upon this in a parallel thread. But I think we can rename "gi_isolated_execution_context" to "gi_execution_context_isolated" or something more readable/obvious. Yury

Good work Yury; going for an all-in-one solution will help avoid widening the differences between the async and sync worlds in Python. I really like the idea of the immutable dicts: it makes it easy to inherit the context between tasks/threads/whatever without putting consistency at risk if there are further key collisions.

I've just taken a look at the asyncio modifications. Correct me if I'm wrong, but the handler strategy has a side effect: the work done to save and restore the context will be done twice in some situations. It would happen when the callback is in charge of executing a task step, once by the run-in-context method and the other one by the coroutine. Is that correct?

On 12/08/2017 00:38, "Yury Selivanov" <yselivanov.ml@gmail.com> wrote: [..]
* ``sys.set_execution_context_item(key, value)``: get the current EC of the executing thread. Add a ``key``/``value`` item to it, which will produce a new EC object. Set the new object as the current one for the executing thread. In pseudo-code::

    tstate = PyThreadState_GET()
    ec = tstate.exec_context
    ec2 = ec.set(key, value)
    tstate.exec_context = ec2

Note, that some important implementation details and optimizations are omitted here, and will be covered in later sections of this PEP.

Now let's see how Execution Contexts work with regular multi-threaded code, generators, and coroutines.

Regular & Multithreaded Code
----------------------------

For regular Python code, EC behaves just like a thread-local. Any modification of the EC object produces a new one, which is immediately set as the current one for the thread state.

.. figure:: pep-0550/functions.png
   :align: center
   :width: 90%

   Figure 1.  Execution Context flow in a thread.

As Figure 1 illustrates, if a function calls ``set_execution_context_item()``, the modification of the execution context will be visible to all subsequent calls and to the caller::

    def set_foo():
        set_execution_context_item('foo', 'spam')

    set_execution_context_item('foo', 'bar')
    print(get_execution_context_item('foo'))

    set_foo()
    print(get_execution_context_item('foo'))

    # will print:
    #   bar
    #   spam

Coroutines
----------

Python :pep:`492` coroutines are used to implement cooperative multitasking. For a Python end-user they are similar to threads, especially when it comes to sharing resources or modifying the global state.

An event loop is needed to schedule coroutines. Coroutines that are explicitly scheduled by the user are usually called Tasks. When a coroutine is scheduled, it can schedule other coroutines using an ``await`` expression. In async/await world, awaiting a coroutine can be viewed as a different calling convention: Tasks are similar to threads, and awaiting on coroutines within a Task is similar to calling functions within a thread.

By drawing a parallel between regular multithreaded code and async/await, it becomes apparent that any modification of the execution context within one Task should be visible to all coroutines scheduled within it. Any execution context modifications, however, must not be visible to other Tasks executing within the same thread.
To achieve this, a small set of modifications to the coroutine object is needed:

* When a coroutine object is instantiated, it saves a reference to the current execution context object to its ``cr_execution_context`` attribute.

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as follows (in pseudo-C)::

    if coro->cr_isolated_execution_context:
        # Save a reference to the current execution context
        old_context = tstate->execution_context

        # Set our saved execution context as the current
        # for the current thread.
        tstate->execution_context = coro->cr_execution_context

        try:
            # Perform the actual `Coroutine.send()` or
            # `Coroutine.throw()` call.
            return coro->send(...)
        finally:
            # Save a reference to the updated execution_context.
            # We will need it later, when `.send()` or `.throw()`
            # are called again.
            coro->cr_execution_context = tstate->execution_context

            # Restore thread's execution context to what it was
            # before invoking this coroutine.
            tstate->execution_context = old_context
    else:
        # Perform the actual `Coroutine.send()` or
        # `Coroutine.throw()` call.
        return coro->send(...)

* ``cr_isolated_execution_context`` is a new attribute on coroutine objects. Set to ``True`` by default, it makes any execution context modifications performed by the coroutine stay visible only to that coroutine.

  When the Python interpreter sees an ``await`` instruction, it flips ``cr_isolated_execution_context`` to ``False`` for the coroutine that is about to be awaited. This makes any changes to the execution context made by nested coroutine calls within a Task visible throughout the Task.

  Because the top-level coroutine (Task) cannot be scheduled with ``await`` (in asyncio you need to call ``loop.create_task()`` or ``asyncio.ensure_future()`` to schedule a Task), all execution context modifications are guaranteed to stay within the Task.

* We always work with ``tstate->exec_context``. We use ``coro->cr_execution_context`` only to store the coroutine's execution context when it is not executing.

Figure 2 below illustrates how execution context mutations work with coroutines.

.. figure:: pep-0550/coroutines.png
   :align: center
   :width: 90%

   Figure 2.  Execution Context flow in coroutines.

In the above diagram:

* When "coro1" is created, it saves a reference to the current execution context "2".

* If it makes any change to the context, it will have its own execution context branch "2.1".

* When it awaits on "coro2", any subsequent changes it does to the execution context are visible to "coro1", but not outside of it.

In code::

    async def inner_foo():
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', 2)

    async def foo():
        print('foo:', get_execution_context_item('key'))

        set_execution_context_item('key', 1)
        await inner_foo()

        print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    asyncio.get_event_loop().run_until_complete(foo())
    print('main:', get_execution_context_item('key'))

which will output::

    main: spam
    foo: spam
    inner_foo: 1
    foo: 2
    main: spam

Generator-based coroutines (generators decorated with ``types.coroutine`` or ``asyncio.coroutine``) behave exactly as native coroutines with regards to execution context management: their ``yield from`` expression is semantically equivalent to ``await``.

Generators
----------

Generators in Python, while similar to Coroutines, are used in a fundamentally different way.
They are producers of data, and they use the ``yield`` expression to suspend/resume their execution.

A crucial difference between ``await coro`` and ``yield value`` is that the former expression guarantees that the ``coro`` will be executed to the end, while the latter is producing ``value`` and suspending the generator until it gets iterated again.

Generators share 99% of their implementation with coroutines, and thus have similar new attributes ``gi_execution_context`` and ``gi_isolated_execution_context``. Similar to coroutines, generators save a reference to the current execution context when they are instantiated. They have the same implementation of ``.send()`` and ``.throw()`` methods.

The only difference is that ``gi_isolated_execution_context`` is always set to ``True``, and is never modified by the interpreter. The ``yield from o`` expression in regular generators that are not decorated with ``types.coroutine`` is semantically equivalent to ``for v in o: yield v``.

.. figure:: pep-0550/generators.png
   :align: center
   :width: 90%

   Figure 3.  Execution Context flow in a generator.

In the above diagram:

* When "gen1" is created, it saves a reference to the current execution context "2".

* If it makes any change to the context, it will have its own execution context branch "2.1".

* When "gen2" is created, it saves a reference to the current execution context for it -- "2.1".

* Any subsequent execution context updates in "gen2" will only be visible to "gen2".

* Likewise, any context changes that "gen1" makes after it created "gen2" will not be visible to "gen2".

In code::

    def inner_foo():
        for i in range(3):
            print('inner_foo:', get_execution_context_item('key'))
            set_execution_context_item('key', i)
            yield i

    def foo():
        set_execution_context_item('key', 'spam')
        print('foo:', get_execution_context_item('key'))

        inner = inner_foo()

        while True:
            val = next(inner, None)
            if val is None:
                break
            yield val
            print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'ham')
    print('main:', get_execution_context_item('key'))

    list(foo())
    print('main:', get_execution_context_item('key'))

which will output::

    main: ham
    foo: spam
    inner_foo: spam
    foo: spam
    inner_foo: 0
    foo: spam
    inner_foo: 1
    foo: spam
    main: ham

As we see, any modification of the execution context in a generator is visible only to the generator itself.

There is one use-case where it is desired for generators to affect the surrounding execution context: the ``contextlib.contextmanager`` decorator. To make the following work::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip the ``gi_isolated_execution_context`` flag to ``False`` on its generator (a sketch of this change follows the Greenlets section below).

Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling for Python. Although the greenlet package is not part of CPython, popular frameworks like gevent rely on it, and it is important that greenlet can be modified to support execution contexts.

In a nutshell, greenlet design is very similar to the design of generators. The main difference is that for generators, the stack is managed by the Python interpreter. Greenlet works outside of the Python interpreter, and manually saves some ``PyThreadState`` fields and pushes/pops the C-stack. Since Execution Context is implemented on top of ``PyThreadState``, it's easy to add transparent support for it to greenlet.
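The ``contextmanager`` change mentioned above could look roughly like this (a sketch, not the actual patch):

    class _GeneratorContextManager:

        def __init__(self, func, args, kwds):
            self.gen = func(*args, **kwds)
            # Let the generator's execution context changes escape
            # to the code that uses the context manager:
            self.gen.gi_isolated_execution_context = False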
New APIs
========

Even though this PEP adds a number of new APIs, please keep in mind that most Python users will likely only ever use two of them: ``sys.get_execution_context_item()`` and ``sys.set_execution_context_item()``.

Python
------

1. ``sys.get_execution_context_item(key, default=None)``: lookup ``key`` for the current Execution Context. If not found, return ``default``.

2. ``sys.set_execution_context_item(key, value)``: set a ``key``/``value`` item for the current Execution Context. If ``value`` is ``None``, the item will be removed.

3. ``sys.get_execution_context()``: return the current Execution Context object: ``sys.ExecutionContext``.

4. ``sys.set_execution_context(ec)``: set the passed ``sys.ExecutionContext`` instance as the current one for the current thread.

5. ``sys.ExecutionContext`` object.

   Implementation detail: ``sys.ExecutionContext`` wraps a low-level ``PyExecContextData`` object. ``sys.ExecutionContext`` has a mutable mapping API, abstracting away the real immutable ``PyExecContextData``.

   * ``ExecutionContext()``: construct a new, empty, execution context.

   * ``ec.run(func, *args)`` method: run ``func(*args)`` in the ``ec`` execution context.

   * ``ec[key]``: lookup ``key`` in the ``ec`` context.

   * ``ec[key] = value``: assign a ``key``/``value`` item to the ``ec``.

   * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and ``ec.copy()`` are similar to those of the ``dict`` object.

C API
-----

The C API is different from the Python one because it operates directly on the low-level immutable ``PyExecContextData`` object.

1. New ``PyThreadState->exec_context`` field, pointing to a ``PyExecContextData`` object.

2. ``PyThreadState_SetExecContextItem`` and ``PyThreadState_GetExecContextItem``, similar to ``sys.set_execution_context_item()`` and ``sys.get_execution_context_item()``.

3. ``PyThreadState_GetExecContext``: similar to ``sys.get_execution_context()``. Always returns a ``PyExecContextData`` object. If ``PyThreadState->exec_context`` is ``NULL``, a new and empty one will be created and assigned to ``PyThreadState->exec_context``.

4. ``PyThreadState_SetExecContext``: similar to ``sys.set_execution_context()``.

5. ``PyExecContext_New``: create a new empty ``PyExecContextData`` object.

6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.

The exact layout of ``PyExecContextData`` is private, which allows us to switch it to a different implementation later. More on that in the `Implementation Details`_ section.

Modifications in Standard Library
=================================

* ``contextlib.contextmanager`` was updated to flip the new ``gi_isolated_execution_context`` attribute on the generator.

* The ``asyncio.events.Handle`` object now captures the current execution context when it is created, and uses the saved execution context to run the callback (with the ``ExecutionContext.run()`` method.) This makes ``loop.call_soon()`` run callbacks in the execution context in which they were scheduled.

No modifications in ``asyncio.Task`` or ``asyncio.Future`` were necessary.

Some standard library modules like ``warnings`` and ``decimal`` can be updated to use new execution contexts. This will be considered in separate issues if this PEP is accepted.

Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.

Performance
===========

Implementation Details
----------------------

The new ``PyExecContextData`` object is wrapping a ``dict`` object. Any modification requires creating a shallow copy of the dict.
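For illustration, a minimal sketch of such a shallow-copy immutable mapping (O(n) ``set()``, O(1) ``get()``; not the actual reference implementation)::

    class ImmutableDict:

        def __init__(self, data=None):
            self._data = dict(data) if data else {}

        def set(self, key, value):
            # Every modification shallow-copies the whole dict
            new_data = dict(self._data)
            new_data[key] = value
            return ImmutableDict(new_data)

        def get(self, key, default=None):
            return self._data.get(key, default)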
While working on the reference implementation of this PEP, we were able to optimize the ``dict.copy()`` operation **5.5x**, see [4]_ for details.

.. figure:: pep-0550/dict_copy.png
   :align: center
   :width: 100%

   Figure 4.

Figure 4 shows that the performance of an immutable dict implemented with shallow copying is expectedly O(n) for the ``set()`` operation. However, this is tolerable until the dict has more than 100 items (1 ``set()`` takes about a microsecond.)

Judging by the number of modules that need EC in the Standard Library, it is likely that real world Python applications will use significantly fewer than 100 execution context variables.

The important point is that the cost of accessing a key in Execution Context is always O(1).

If the ``set()`` operation performance is a major concern, we discuss alternative approaches that have O(1) or close ``set()`` performance in the `Alternative Immutable Dict Implementation`_, `Faster C API`_, and `Copy-on-write Execution Context`_ sections.

Generators and Coroutines
-------------------------

Using a microbenchmark for generators and coroutines from :pep:`492` ([12]_), it was possible to observe 0.5 to 1% performance degradation.

asyncio echoserver microbenchmarks from the uvloop project [13]_ showed 1-1.5% performance degradation for asyncio code.

asyncpg benchmarks [14]_, that execute more code and are closer to a real-world application, did not exhibit any noticeable performance change.

Overall Performance Impact
--------------------------

The total number of changed lines in the ceval loop is 2 -- in the ``YIELD_FROM`` opcode implementation. Only performance of generators and coroutines can be affected by the proposal. This was confirmed by running the Python Performance Benchmark Suite [15]_, which demonstrated that there is no difference between the 3.7 master branch and this PEP's reference implementation branch (full benchmark results can be found here [16]_.)

Design Considerations
=====================

Alternative Immutable Dict Implementation
-----------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N) performance for both ``set()`` and ``get()`` operations, which will be essentially O(1) for relatively small mappings in EC.

To assess if HAMT can be used for Execution Context, we implemented it in CPython [7]_.

.. figure:: pep-0550/hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 5.  Benchmark code can be found here: [9]_.

Figure 5 shows that HAMT indeed displays O(1) performance for all benchmarked dictionary sizes. For dictionaries with less than 100 items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550/lookup_hamt.png
   :align: center
   :width: 100%

   Figure 6.  Benchmark code can be found here: [10]_.

Figure 6 shows a comparison of lookup costs between Python dict and an HAMT immutable mapping. HAMT lookup time is 30-40% worse than Python dict lookups on average, which is a very good result, considering how well Python dicts are optimized.

Note, that according to [8]_, HAMT design can be further improved.

The bottom line is that the current approach with implementing an immutable mapping with a shallow-copying dict will likely perform adequately in real-life applications. The HAMT solution is more future proof, however.
The proposed API is designed in such a way that the underlying implementation of the mapping can be changed completely without affecting the Execution Context `Specification`_, which allows us to switch to HAMT at some point if necessary.

Copy-on-write Execution Context
-------------------------------

The implementation of Execution Context in .NET is different from this PEP. .NET uses a copy-on-write mechanism and a regular mutable mapping.

One way to implement this in CPython would be to have two new fields in ``PyThreadState``:

* ``exec_context`` pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to Coroutine and Generator ``.send()`` and ``.throw()`` methods described in the `Coroutines`_ section will be almost the same, except that in addition to the ``gi_execution_context`` they will have a ``gi_exec_context_copy_on_write`` flag. When a coroutine or a generator starts, the flag will be set to ``1``. This will ensure that any modification of the EC performed within a coroutine or a generator will be isolated.

This approach has one advantage:

* For an Execution Context that contains a large number of items, copy-on-write is a more efficient solution than the shallow-copy dict approach.

However, we believe that the copy-on-write disadvantages are more important to consider:

* Copy-on-write behaviour for generators and coroutines makes EC semantics less predictable.

  With the immutable EC approach, generators and coroutines always execute in the EC that was current at the moment of their creation. Any modifications to the outer EC while a generator or a coroutine is executing are not visible to them::

    def generator():
        yield 1
        print(get_execution_context_item('key'))
        yield 2

    set_execution_context_item('key', 'spam')
    gen = iter(generator())
    next(gen)

    set_execution_context_item('key', 'ham')
    next(gen)

  The above script will always print 'spam' with the immutable EC. With a copy-on-write approach, the above script will print 'ham'. Now, consider that ``generator()`` was refactored to call some library function that uses Execution Context::

    def generator():
        yield 1
        some_function_that_uses_decimal_context()
        print(get_execution_context_item('key'))
        yield 2

  Now, the script will print 'spam', because ``some_function_that_uses_decimal_context`` forced the EC to copy, and the ``set_execution_context_item('key', 'ham')`` line did not affect the ``generator()`` code after all.

* Similarly to the previous point, the ``sys.ExecutionContext.run()`` method will also become less predictable, as ``sys.get_execution_context()`` would still return a reference to the current mutable EC.

  We can't modify ``sys.get_execution_context()`` to return a shallow copy of the current EC, because this would seriously harm performance of ``asyncio.call_soon()`` and similar places, where it is important to propagate the Execution Context.

* Even though copy-on-write requires shallow-copying the execution context object less frequently, copying will still take place in coroutines and generators. In which case, the HAMT approach will perform better for medium to large sized execution contexts.

All in all, we believe that the copy-on-write approach introduces very subtle corner cases that could lead to bugs that are exceptionally hard to discover and fix.

The immutable EC solution in comparison is always predictable and easy to reason about.
Therefore we believe that any slight performance gain that the copy-on-write solution might offer is not worth it.

Faster C API
------------

Packages like numpy and standard library modules like decimal need to frequently query the global state for some local context configuration. It is important that the APIs they use are as fast as possible.

The proposed ``PyThreadState_SetExecContextItem`` and ``PyThreadState_GetExecContextItem`` functions need to get the current thread state with ``PyThreadState_GET()`` (fast) and then perform a hash lookup (relatively slow). We can eliminate the hash lookup by adding three additional C API functions:

* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``: a function similar to the existing ``_PyEval_RequestCodeExtraIndex`` introduced in :pep:`523`. The idea is to request a unique index that can later be used to lookup context items.

  The ``key_name`` can later be used by ``sys.ExecutionContext`` to introspect items added with this API.

* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)`` and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)`` to request an item by its index, avoiding the cost of a hash lookup.

Why does setting a key to None remove the item?
-----------------------------------------------

Consider a context manager::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

With the ``set_execution_context_item(key, None)`` call removing the ``key``, the user doesn't need to write additional code to remove the ``key`` if it wasn't in the execution context already.

An alternative design with a ``del_execution_context_item()`` method would look like the following::

    @contextmanager
    def context(x):
        not_there = object()
        old_x = get_execution_context_item('x', not_there)
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            if old_x is not_there:
                del_execution_context_item('x')
            else:
                set_execution_context_item('x', old_x)

Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users might depend on it being just a TLS. Changing its behaviour to follow the Execution Context semantics would break backwards compatibility.

PEP 521
-------

:pep:`521` proposes an alternative solution to the problem: enhance the Context Manager Protocol with two new methods: ``__suspend__`` and ``__resume__``. To make it compatible with async/await, the Asynchronous Context Manager Protocol will also need to be extended with ``__asuspend__`` and ``__aresume__``.

This makes it possible to implement context managers like decimal context and ``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __suspend__(self):
            set_execution_context_item('x', self.old_x)

        def __resume__(self):
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

Besides complicating the protocol, the implementation will likely negatively impact performance of coroutines, generators, and any code that uses context managers, and will notably complicate the interpreter implementation.
It also does not solve the leaking state problem for greenlet/gevent.

:pep:`521` also does not provide any mechanism to propagate state in a local context, like storing a request object in an HTTP request handler to have better logging.

Can Execution Context be implemented outside of CPython?
---------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or trampoline, making it impossible to intercept their ``yield`` points outside of the Python interpreter.

Reference Implementation
========================

The reference implementation can be found here: [11]_.

References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

Copyright
=========

This document has been placed in the public domain.

Finally got an almost decent internet connection. Seeing the changes related to that PEP, I can confirm that the context will be saved twice in any "task switch" in an asyncio environment: once by the run-in-context call made by the Handle [1], and immediately after by the send() [2] call on the coroutine belonging to that task.

From my understanding, there is formally no use of the context in the asyncio layer, at least nowadays. Saving the context at the moment a Task is scheduled is, at first sight, useless and might have a performance impact. Don't you think that this edge case, which happens a lot, might somehow be optimized? Am I missing something?

[1] https://github.com/1st1/cpython/blob/pep550/Lib/asyncio/events.py#L124
[2] https://github.com/1st1/cpython/blob/pep550/Lib/asyncio/tasks.py#L176

On Sat, Aug 12, 2017 at 11:03 PM, Pau Freixes <pfreixes@gmail.com> wrote:
-- --pau

Hi Pau,

Re string key collisions -- I decided to update the PEP to follow Nathaniel's suggestion to use a get_context_key API, which will eliminate this problem entirely.

Re call_soon in asyncio.Task -- yes, it does use ec.run() to invoke coroutine.send(). However, this has almost no visible effect, as ExecutionContext.run() is a very cheap operation (think 1-2 function calls). It's possible to add a new keyword arg to call_soon like "ignore_execution_context" to eliminate even this small overhead, but this is something we can easily do later. Yury
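For context, the asyncio change under discussion is along these lines (a simplification of the PoC branch; attribute names are illustrative):

    class Handle:

        def __init__(self, callback, args, loop):
            self._callback = callback
            self._args = args
            # Capture the EC at scheduling time, e.g. in call_soon()
            self._ec = sys.get_execution_context()

        def _run(self):
            # Run the callback in the captured context; this is the
            # cheap ExecutionContext.run() call discussed above
            self._ec.run(self._callback, *self._args)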

I had an idea for an alternative API that exposes the same functionality/semantics as the current draft, but that might have some advantages. It would look like:

    # a "context item" is an object that holds a context-sensitive value
    # each call to create_context_item creates a new one
    ci = sys.create_context_item()

    # Set the value of this item in the current context
    ci.set(value)

    # Get the value of this item in the current context
    value = ci.get()
    value = ci.get(default)

    # To support async libraries, we need some way to capture the
    # whole context. But an opaque token representing "all context
    # item values" is enough:
    state_token = sys.current_context_state_token()
    sys.set_context_state_token(state_token)
    coro.cr_state_token = state_token
    # etc.

The advantages are:

- Eliminates the current PEP's issues with namespace collision; every context item is automatically distinct from all others.
- Eliminates the need for the None-means-del hack.
- Lets the interpreter hide the details of garbage collecting context values.
- Allows for more implementation flexibility. This could be implemented directly on top of Yury's current prototype. But it could also, for example, be implemented by storing the context values in a flat array, where each context item is assigned an index when it's allocated. In the current draft this is suggested as a possible extension for particularly performance-sensitive users, but this way we'd have the option of making everything fast without changing or extending the API.

As precedent, this is basically the API that low-level thread-local storage implementations use; see e.g. pthread_key_create, pthread_getspecific, pthread_setspecific. (And the allocate-an-index-in-a-table approach is the implementation that fast thread-local storage implementations use too.)

-n

On Fri, Aug 11, 2017 at 3:37 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
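One way to picture the proposal: a context item can be layered on top of the draft PEP's mapping by using a fresh object as a collision-free key (a sketch, assuming the EC mapping accepts arbitrary hashable keys; the real proposal would live in the interpreter):

    import sys

    class ContextItem:

        def __init__(self):
            self._key = object()   # unique, cannot collide

        def set(self, value):
            sys.set_execution_context_item(self._key, value)

        def get(self, default=None):
            return sys.get_execution_context_item(self._key, default)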
-- Nathaniel J. Smith -- https://vorpus.org

Yes, I considered this idea myself, but ultimately rejected it because:

1. The current solution makes it easy to introspect things. Get the current EC and print it out. Although the context item idea could be extended to `sys.create_context_item('description')` to allow that.

2. What if we want to pickle the EC? If all items in it are pickleable, it's possible to dump the EC, send it over the network, and re-use it in some other process. It's not something I want to consider in the PEP right now, but it's something that the current design theoretically allows. AFAIU, a context item created with `ci = sys.create_context_item()` wouldn't be possible to pickle/unpickle correctly, no?

Some more comments:

On Sat, Aug 12, 2017 at 7:35 PM, Nathaniel Smith <njs@pobox.com> wrote: [..]
TBH I think that the collision issue is slightly exaggerated.
- Eliminates the need for the None-means-del hack.
I consider Execution Context to be an API, not a collection. It's an important distinction. If you view it that way, deletion on None doesn't look that esoteric.
- Lets the interpreter hide the details of garbage collecting context values.
I'm not sure I understand how the current PEP design is bad from the GC standpoint. Or how this proposal can be different, FWIW.
You still want to have this optimization only for *some* keys. So I think a separate API is still needed. Yury

On Sat, Aug 12, 2017 at 6:27 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
My first draft actually had the description argument :-). But then I deleted it on the grounds that there's also no way to introspect a list of all threading.local objects, and no-one seems to be bothered by that, so why should we bother here. Obviously it'd be trivial to add though, yeah; I don't really care either way.
That's true. In this API, supporting pickling would require some kind of opt-in on the part of EC users.

But... pickling would actually need to be opt-in anyway. Remember, the set of all EC items is a piece of global shared state; we expect new entries to appear when random 3rd party libraries are imported. So we have no idea what is in there or what it's being used for. Blindly pickling the whole context will lead to bugs (when code unexpectedly ends up with context that wasn't designed to go across processes) and crashes (there's no guarantee that all the objects are even pickleable).

If we do decide we want to support this in the future then we could add a generic opt-in mechanism something like:

    MY_CI = sys.create_context_item(__name__, "MY_CI", pickleable=True)

But I'm not sure that it even makes sense to have a global flag enabling pickle. Probably it's better to have separate flags to opt-in to different libraries that might want to pickle in different situations for different reasons: pickleable-by-dask, pickleable-by-curio.run_in_process, ... And that's doable without any special interpreter support. E.g. you could have curio.Local(pickle=True) coordinate with curio.run_in_process.
Deletion on None is still a special case that API users need to remember, and it's a small footgun that you can't just take an arbitrary Python object and round-trip it through the context. Obviously these are both APIs and they can do anything that makes sense, but all else being equal I prefer APIs that have fewer special cases :-).
When the ContextItem object becomes unreachable and is collected, then the interpreter knows that all of the values associated with it in different contexts are also unreachable and can be collected. I mentioned this in my email yesterday -- look at the hoops threading.local jumps through to avoid breaking garbage collection.

This is closely related to the previous point, actually -- AFAICT the only reason why it *really* matters that None deletes the item is that you need to be able to delete to free the item from the dictionary, which only matters if you want to dynamically allocate keys and then throw them away again. In the ContextItem approach, there's no need to manually delete the entry; you can just drop your reference to the ContextItem and let the garbage collector take care of it.
Wait, why is it a requirement that some keys be slow? That seems like a weird requirement :-). -n -- Nathaniel J. Smith -- https://vorpus.org

As far as providing a thread-local like surrogate for coroutine based systems in Python, we had to solve this for Twisted with https://bitbucket.org/hipchat/txlocal. Because of the way the Twisted threadpooling works we also had to make a context system that was both coroutine and thread safe at the same time. We have a similar setup for asyncio but it seems we haven't open sourced it. I'll ask around for it if this group feels that an asyncio example would be beneficial. We implemented both of these in plain-old Python so they should be compatible beyond CPython. It's been over a year since I was directly involved with either of these projects, but added memory and CPU consumption were stats we watched closely and we found a negligible increase in both as we rolled out async context. On Sat, Aug 12, 2017 at 9:16 PM Nathaniel Smith <njs@pobox.com> wrote:

On 13 August 2017 at 12:15, Nathaniel Smith <njs@pobox.com> wrote:
In the TLS/TSS case, we have the design constraint of wanting to use the platform provided TLS/TSS implementation when available, and standard C APIs generally aren't designed to support rich runtime introspection from regular C code - instead, they expect the debugger, compiler, and standard library to be co-developed such that the debugger knows how to figure out where the latter two have put things at runtime.
Obviously it'd be trivial to add though, yeah; I don't really care either way.
As noted in my other email, I like the idea of making the context dependent state introspection API clearly distinct from the core context dependent state management API. That way the API implementation can focus on using the most efficient data structures for the purpose, rather than being limited to the most efficient data structures that can readily export a Python-style mapping interface. The latter can then be provided purely for introspection purposes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 12, 2017 at 9:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Excellent point.
Also an excellent point :-). -n -- Nathaniel J. Smith -- https://vorpus.org

On 13 August 2017 at 11:27, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
I think the TLS/TSS precedent means we should seriously consider the ContextItem + ContextStateToken approach for the core low level API. We also have a long history of pain and quirks arising from the locals() builtin being defined as returning a mapping even though function locals are managed as a linear array, so if we can avoid that for the execution context, it will likely be beneficial for both end users (due to less quirky runtime behaviour, especially across implementations) and language implementation developers (due to a reduced need to make something behave like an ordinary mapping when it really isn't). If we decide we want a separate context introspection API (akin to inspect.getcoroutinelocals() and inspect.getgeneratorlocals()), then an otherwise opaque ContextStateToken would be sufficient to enable that. Even if we don't need it for any other reason, having such an API available would be desirable for the regression test suite. For example, if context items are hashable, we could have the following arrangement:

    # Create new context items
    sys.create_context_item(name)

    # Opaque token for the current execution context
    sys.get_context_token()

    # Switch the current execution context to the given one
    sys.set_context(context_token)

    # Snapshot mapping context items to their values in given context
    sys.get_context_items(context_token)

As Nathaniel suggested, getting/setting/deleting individual items in the current context would be implemented as methods on the ContextItem objects, allowing the return value of "get_context_items" to be a plain dictionary, rather than a special type that directly supported updates to the underlying context.
As Nathaniel notes, cooperative partial pickling will be possible regardless of how the low level API works, and starting with a simpler low level API still doesn't rule out adding features like this at a later date. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
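A toy model of the ContextItem + context-token arrangement Nick sketches above; all names are hypothetical, and the dict-based "token" is just a stand-in for what would really be an opaque, O(1) snapshot:

    _current = {}  # toy stand-in for the active execution context

    class ContextItem:
        def set(self, value):
            _current[self] = value
        def get(self, default=None):
            return _current.get(self, default)

    def get_context_token():
        return dict(_current)      # snapshot of the current context

    def set_context(token):
        global _current
        _current = dict(token)     # switch to the given context

    ci = ContextItem()
    ci.set(1)
    token = get_context_token()
    ci.set(2)
    set_context(token)             # restore the snapshot
    assert ci.get() == 1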

On Sat, Aug 12, 2017 at 10:56 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: [..]
The current PEP 550 design returns a "snapshot" of the current EC with sys.get_execution_context(). I.e. if you do:

    ec = sys.get_execution_context()
    ec['a'] = 'b'

    # sys.get_execution_context_item('a') will return None

You did get a snapshot and you modified it -- but your modifications are not visible anywhere. You can run a function in that modified EC with `ec.run(function)` and that function will see that new 'a' key, but that's it. There are no "magical" updates to the underlying context. Yury
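A small runnable illustration of the snapshot semantics Yury describes; the EC class here is a hypothetical stand-in for the real execution context type, but the behaviour matches his description -- mutating the snapshot changes nothing until you run code inside it:

    class EC(dict):
        def run(self, func, *args):
            global _current_ec
            saved, _current_ec = _current_ec, self
            try:
                return func(*args)
            finally:
                _current_ec = saved

    _current_ec = EC()

    def get_execution_context():
        return EC(_current_ec)     # a copy: a snapshot, not a live view

    ec = get_execution_context()
    ec['a'] = 'b'
    assert 'a' not in _current_ec            # invisible outside...
    ec.run(lambda: print(_current_ec['a']))  # ...but visible inside run()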

For what it's worth, as part of prompt_toolkit 2.0, I implemented something very similar to Nathaniel's idea some time ago. It works pretty well, but I don't have a strong opinion against an alternative implementation.

- The active context is stored as a monotonically increasing integer.
- For each local, the actual values are stored in a dictionary that maps the context ID to the value. (Could cause a GC issue - I'm not sure.)
- Every time an executor is started, I have to wrap the callable in a context manager that applies the current context to that thread.
- When a new 'Future' is created, I grab the context ID and apply it to the callbacks when the result is set.

https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad942...
https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad942...

FYI: In my case, I did not want to pass the currently active "Application" object around all of the code. But when I started supporting telnet, multiple applications could be alive at once, each with a different I/O backend. Therefore the active application needed to be stored in a kind of execution context. When PEP 550 gets approved I'll probably make this compatible. It should at least be possible to run prompt_toolkit on the asyncio event loop. Jonathan 2017-08-13 1:35 GMT+02:00 Nathaniel Smith <njs@pobox.com>:

Hi Jonathan, Thanks for the feedback. I'll update the PEP to use Nathaniel's idea of `sys.get_context_key`. It will be a pretty similar API to what you currently have in prompt_toolkit. Yury

Yury Selivanov wrote:
This is a new PEP to implement Execution Contexts in Python.
It dawns on me that I might be able to use ECs to do a better job of implementing flufl.i18n's translation contexts. I think this is another example of what the PEP's abstract describes as "Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings;" The _ object maintains a stack of the language codes being used, and you can push a new code onto the stack (typically using `with` so they get automatically popped when exiting). The use case for this is translating say a notification to multiple recipients in the same request, one who speaks French, one who speaks German, and another that speaks English. The problem is that _ is usually a global in a typical application, so in an async environment, if one request is translating to 'fr', another might be translating to 'de', or even a deferred context (e.g. because you want to mark a string but not translate it until some later use). While I haven't used it in an async environment yet, the current approach probably doesn't work very well, or at all. I'd probably start by recommending a separate _ object in each thread, but that's less convenient to use in practice. It seems like it would be better to either attach an _ object to each EC, or to implement the stack of codes in the EC and let the global _ access that stack. It feels a lot like `let` in lisp, but without the implicit addition of the contextual keys into the local namespace. E.g. in a PEP 550 world, you'd have to explicitly retrieve the key/values from the EC rather than have them magically appear in the local namespace, the former of course being the Pythonic way to do it. Cheers, -Barry
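A toy sketch of the language-code stack Barry describes, with a plain list standing in for a value that would live in the execution context under PEP 550 (names are hypothetical, not flufl.i18n's actual API):

    from contextlib import contextmanager

    _codes = ['en']  # stand-in for a stack stored in the execution context

    @contextmanager
    def using(code):
        _codes.append(code)   # push the language for this block
        try:
            yield
        finally:
            _codes.pop()      # automatically popped on exit

    def current_language():
        return _codes[-1]

    with using('fr'):
        assert current_language() == 'fr'
    assert current_language() == 'en'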

Hi Barry, Yes, i18n is another use-case for execution context, and ec should be a perfect fit for it. Yury

[duplicating my reply cc-ing python-ideas]
Is a new EC type really needed? Cannot this be done with collections.ChainMap?
No, not really. ChainMap will have O(N) lookup performance where N is the number of contexts you have in the chain. This will degrade performance of lookups, which isn't acceptable for some potential EC users like decimal/numpy/etc. Inventing heuristics to manage the chain size is harder than making an immutable dict (which is easy to reason about.) Chaining contexts will also force them to reference each other, creating cycles that GC won't be able to break. Besides just performance considerations, with a ChainMap design of contexts it's not possible to properly isolate state changes inside of generators or coroutines/tasks as it's done in the PEP. All in all, I don't think that chaining can solve the problem. It will likely lead to a more complicated solution in the end (this was my initial approach FWIW). Yury
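The O(N) cost Yury mentions is easy to observe: a key that lives at the bottom of a deep ChainMap has to be searched through every level on each lookup:

    import timeit
    from collections import ChainMap

    shallow = ChainMap({'key': 1})
    deep = ChainMap(*([{}] * 999 + [{'key': 1}]))

    print(timeit.timeit(lambda: shallow['key'], number=100_000))
    print(timeit.timeit(lambda: deep['key'], number=100_000))  # far slower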

This is exciting and I'm happy that you're addressing this problem. We've solved a similar problem in our asynchronous programming framework, asynq. Our solution (implemented at https://github.com/quora/asynq/blob/master/asynq/contexts.py) is similar to that in PEP 521: we enhance the context manager protocol with pause/resume methods instead of using an enhanced form of thread-local state. Some of our use cases can't be implemented using this PEP; notably, we use a timing context that times how long an asynchronous function takes by repeatedly pausing and resuming the timer. However, this timing context adds significant overhead because we have to call the pause/resume methods so often. Overall, your approach is almost certainly more performant. 2017-08-11 15:37 GMT-07:00 Yury Selivanov <yselivanov.ml@gmail.com>:

This is exciting and I'm happy that you're addressing this problem.
Thank you!
Some of our use cases can't be implemented using this PEP; notably, we use a timing context that times how long an asynchronous function takes by repeatedly pausing and resuming the timer.
Measuring performance of coroutines is a somewhat different kind of problem. With PEP 550 you will be able to decouple context management from collecting performance data. That would allow you to subclass asyncio.Task (let's call it InstrumentedTask) and implement all extra tracing functionality on it (by overriding its _send method for example). Then you could set a custom task factory that would use InstrumentedTask only for a fraction of requests. That would make it possible to collect performance metrics even in production (my 2c). Yury
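A sketch of the sampling approach Yury suggests, using only public asyncio APIs; InstrumentedTask, the 1% sampling rate, and the wall-clock metric are all illustrative assumptions (a real version would hook into the task's stepping for per-resume timing):

    import asyncio, random, time

    class InstrumentedTask(asyncio.Task):
        def __init__(self, coro, *, loop=None):
            super().__init__(coro, loop=loop)
            self._created = time.monotonic()
            self.add_done_callback(self._report)

        def _report(self, task):
            print('task took', time.monotonic() - self._created, 's')

    def sampling_factory(loop, coro):
        # instrument roughly 1% of tasks, even in production
        if random.random() < 0.01:
            return InstrumentedTask(coro, loop=loop)
        return asyncio.Task(coro, loop=loop)

    loop = asyncio.new_event_loop()
    loop.set_task_factory(sampling_factory)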

On Aug 11, 2017 16:38, "Yury Selivanov" <yselivanov.ml@gmail.com> wrote: Hi, This is a new PEP to implement Execution Contexts in Python. Nice! I've had something like this on the back burner for a while as it helps solve some problems with encapsulating the import state (e.g. PEP 408). -eric

On 12 August 2017 at 15:45, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Thanks Eric!
PEP 408 -- Standard library __preview__ package?
Typo in the PEP number: PEP 406, which was an ultimately failed attempt to get away from the reliance on process globals to manage the import system by encapsulating the top level state as an "Import Engine": https://www.python.org/dev/peps/pep-0406/ We still like the idea in principle (hence the Withdrawn status rather than being Rejected), but someone needs to find time to take a run at designing a new version of it atop the cleaner PEP 451 import plugin API (hence why the *specific* proposal in PEP 406 has been withdrawn). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I may have missed this (I've just skimmed the doc), but what's the rationale for making the EC an *immutable* mapping? It's impressive that you managed to create a faster immutable dict, but why does the use case need one? -- --Guido van Rossum (python.org/~guido)

On Fri, Aug 11, 2017 at 10:17 PM, Guido van Rossum <guido@python.org> wrote:
In this proposal, you have lots and lots of semantically distinct ECs. Potentially every stack frame has its own (at least in async code). So instead of copying the EC every time they create a new one, they want to copy it when it's written to. This is a win if writes are relatively rare compared to the creation of ECs. You could probably optimize it a bit more by checking the refcnt before writing, and skipping the copy if it's exactly 1. But even simpler is to just always copy and throw away the old version. -n -- Nathaniel J. Smith -- https://vorpus.org
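Nathaniel's "just always copy" strategy, in miniature -- every write pays one dict copy, and distinct ECs are simply distinct dicts:

    def ec_set(ec, key, value):
        new = dict(ec)   # copy on every write
        new[key] = value
        return new

    ec0 = {}
    ec1 = ec_set(ec0, 'x', 1)
    assert ec0 == {} and ec1 == {'x': 1}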

[replying to the list]
I may have missed this (I've just skimmed the doc), but what's the rationale for making the EC an *immutable* mapping?
It's possible to implement Execution Context with a mutable mapping and copy-on-write (as it's done in .NET). This is one of the approaches that I tried, and I discovered that it causes a bunch of subtle inconsistencies in contexts for generators and coroutines. I've tried to cover this here: https://www.python.org/dev/peps/pep-0550/#copy-on-write-execution-context All in all, I believe that the immutable mapping approach gives the most predictable and easy to reason about model. If its performance on a large number of items in the EC is a concern, I'll be happy to implement it using HAMT (also covered in the PEP). Yury
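A minimal sketch of the immutable-mapping behaviour Yury describes: set() returns a new mapping and the original is never mutated. A real implementation would use a HAMT so that set() is O(log N) rather than this O(N) copy:

    class ImmutableDict:
        __slots__ = ('_d',)

        def __init__(self, d=None):
            self._d = dict(d or {})

        def set(self, key, value):
            new = dict(self._d)   # never mutate in place
            new[key] = value
            return ImmutableDict(new)

        def get(self, key, default=None):
            return self._d.get(key, default)

    a = ImmutableDict()
    b = a.set('foo', 'bar')
    assert a.get('foo') is None and b.get('foo') == 'bar'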

Hi Yury, This is really cool. Some notes on a first read:

1. Excellent work on optimizing dict, that seems valuable independent of the rest of the details here.

2. The text doesn't mention async generators at all. I assume they also have an agi_isolated_execution_context flag that can be set, to enable @asyncontextmanager?

2a. Speaking of which I wonder if it's possible for async_generator to emulate this flag... I don't know if this matters -- at this point the main reason to use async_generator is for code that wants to support PyPy. If PyPy gains native async generator support before CPython 3.7 comes out then async_generator may be entirely irrelevant before PEP 550 matters. But right now async_generator is still quite handy...

2b. BTW, the contextmanager trick is quite nice -- I actually noticed last week that PEP 521 had a problem here, but didn't think of a solution :-).

3. You're right that numpy is *very* performance sensitive about accessing the context -- the errstate object is needed extremely frequently, even on trivial operations like adding two scalars, so a dict lookup is very noticeable. (Imagine adding a dict lookup to float.__add__.) Right now, the errstate object gets stored in the threadstate dict, and then there are some dubious-looking hacks involving a global (not thread-local) counter to let us skip the lookup entirely if we think that no errstate object has been set. Really what we ought to be doing (currently, in a non PEP 550 world) is storing the errstate in a __thread variable -- it'd certainly be worth it. Adopting PEP 550 would definitely be easier if we knew that it wasn't ruling out that level of optimization.

4. I'm worried that all of your examples use string keys. One of the great things about threading.local objects is that each one is a new namespace, which is a honking great idea -- here it prevents accidental collisions between unrelated libraries. And while it's possible to implement threading.local in terms of the threadstate dict (that's how they work now!), it requires some extremely finicky code to get the memory management right: https://github.com/python/cpython/blob/dadca480c5b7c5cf425d423316cd695bc5db3... It seems like you're imagining that this API will be used directly by user code? Is that true? ...Are you sure that's a good idea? Are we just assuming that not many keys will be used and the keys will generally be immortal anyway, so leaking entries is OK? Maybe this is nit-picking, but this is hooking into the language semantics in such a deep way that I sorta feel like it would be bad to end up with something where we can never get garbage collection right. The suggested index-based API for super fast C lookup also has this problem, but that would be such a low-level API -- and not part of the language definition -- that the right answer is probably just to document that there's no way to unallocate indices so any given C library should only allocate, like... 1 of them. Maybe provide an explicit API to release an index, if we really want to get fancy.

5. Is there some performance-related reason that the API for getting/setting isn't just sys.get_execution_context()[...] = ...? Or even sys.execution_context[...]?

5a. Speaking of which I'm not a big fan of the None-means-delete behavior. Not only does Python have a nice standard way to describe all the mapping operations without such hacks, but you're actually implementing that whole interface anyway. Why not use it?
6. Should Thread.start inherit the execution context from the spawning thread?

7. Compatibility: it does sort of break 3rd party contextmanager implementations (contextlib2, asyncio_extras's acontextmanager, trio's internal acontextmanager, ...). This is extremely minor though.

8. You discuss how this works for asyncio and gevent. Have you looked at how it will interact with tornado's context handling system? Can they use this? It's the most important extant context implementation I can think of (aside from thread local storage itself).

9. OK, my big question, about semantics. The PEP's design is based on the assumption that all context-local state is scalar-like, and contexts split but never join. But there are some cases where this isn't true, in particular for values that have "stack-like" semantics. These are terms I just made up, but let me give some examples. Python's sys.exc_info is one. Another I ran into recently is for trio's cancel scopes.

So basically the background is, in trio you can wrap a context manager around any arbitrary chunk of code and then set a timeout or explicitly cancel that code. It's called a "cancel scope". These are fully nestable. Full details here: https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-t...

Currently, the implementation involves keeping a stack of cancel scopes in Task-local storage. This works fine for regular async code because when we switch Tasks, we also switch the cancel scope stack. But of course it falls apart for generators/async generators:

    async def agen():
        with fail_after(10):  # 10 second timeout for finishing this block
            await some_blocking_operation()
            yield
            await another_blocking_operation()

    async def caller():
        with fail_after(20):
            ag = agen()
            await ag.__anext__()
            # now that cancel scope is on the stack, even though we're not
            # inside the context manager! this will not end well.
            await some_blocking_operation()  # this might get cancelled
                                             # when it shouldn't
        # even if it doesn't, we'll crash here when exiting the context
        # manager, because we try to pop a cancel scope that isn't at the
        # top of the stack

So I was thinking about whether I could implement this using PEP 550. It requires some cleverness, but I could switch to representing the stack as a singly-linked list, and then snapshot it and pass it back to the coroutine runner every time I yield. That would fix the case above. But, I think there's another case that's kind of a showstopper.

    async def agen():
        await some_blocking_operation()
        yield

    async def caller():
        ag = agen()  # context is captured here
        with fail_after(10):
            await ag.__anext__()

Currently this case works correctly: the timeout is applied to the __anext__ call, as you'd expect. But with PEP 550, it wouldn't work: the generator's timeouts would all be fixed when it was instantiated, and we wouldn't be able to detect that the second call has a timeout imposed on it. So that's a pretty nasty footgun. Any time you have code that's supposed to have a timeout applied, but in fact has no timeout applied, then that's a really serious bug -- it can lead to hangs, trivial DoS, pagers going off, etc.

Another problem is code like:

    async def caller():
        with fail_after(10):
            ag = agen()
        # then exit the scope

Can we clean up the cancel scope? (e.g., remove it from the global priority queue that tracks timeouts?) Normally yes, that's what __exit__ blocks are for, letting you know deterministically that an object can be cleaned up. But here it got captured by the async generator.
I really don't want to have to rely on the GC, because on PyPy it means that we could leak an unbounded number of cancel scopes for a finite but unbounded amount of time, and all those extra entries in the global timeout priority queue aren't free. (And sys.exc_info has had buggy behavior in analogous situations.) So, I'm wondering if you (or anyone) have any ideas how to fix this :-). Technically, PEP 521 is powerful enough to do it, but in practice the performance would be catastrophically bad. It's one thing to have some extra cost to yielding out of an np.errstate block, those are rare and yielding out of them is rare. But cancel scopes are different: essentially all code in trio runs inside one or more of them, so every coroutine suspend/resume would have to call all those suspend/resume hooks up and down the stack. OTOH PEP 550 is fast, but AFAICT its semantics are wrong for this use case.

The basic invariant I want is: if at any given moment you stop and take a backtrace, and then look at the syntactic surroundings of each line in the backtrace and write down a list of all the 'with' blocks that the code *looks* like it's inside, then context lookups should give the same result as they would if you simply entered all of those with blocks in order. Generators make it tricky to maintain this invariant, because a generator frame's backtrace changes every time you call next(). But those are the semantics that make the most sense to me, and seem least surprising in practice. These are also IIUC the semantics that exc_info is supposed to follow (though historically the interaction of exc_info and generators has had lots of bugs, not sure if that's been fixed or not).

...and now that I've written that down, I sort of feel like that might be what you want for all the other sorts of context object too? Like, here's a convoluted example:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        print(a + b)
        yield
        print(a + b)

    def caller():
        # let's pretend this context manager exists,
        # the actual API is more complicated
        with decimal_context_precision(3):
            g = gen()
        with decimal_context_precision(2):
            next(g)
        with decimal_context_precision(1):
            next(g)

Currently, this will print "3.3 3", because when the generator is resumed it inherits the context of the resuming site. With PEP 550, it would print "3.33 3.33" (or maybe "3.3 3.3"? it's not totally clear from the text), because it inherits the context when the generator is created and then ignores the calling context. It's hard to get strong intuitions, but I feel like the current behavior is actually more sensible -- each time the generator gets resumed, the next bit of code runs in the context of whoever called next(), and the generator is just passively inheriting context, so ... that makes sense. OTOH of course if you change the generator code to:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        with decimal_context_precision(4):
            print(a + b)
            yield
            print(a + b)

then it should print "3.333 3.333", because the generator is overriding the caller -- now when we resume the frame we're re-entering the decimal_context_precision(4) block, so it should take priority. So ... maybe all context variables are "stack-like"? -n -- Nathaniel J. Smith -- https://vorpus.org
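Here is a runnable sketch of the "stack as a singly-linked list" workaround Nathaniel mentions, with a module-level variable standing in for a context-stored value; cancel_scope and current_deadline are hypothetical names, not trio's actual API:

    from contextlib import contextmanager

    _stack_top = None  # stand-in for a value kept in the execution context

    @contextmanager
    def cancel_scope(deadline):
        global _stack_top
        saved = _stack_top
        _stack_top = (deadline, saved)  # push: a new cell, old list untouched
        try:
            yield
        finally:
            _stack_top = saved          # pop by restoring the old head

    def current_deadline():
        node, best = _stack_top, float('inf')
        while node is not None:
            best = min(best, node[0])
            node = node[1]
        return best

    with cancel_scope(10):
        with cancel_scope(5):
            assert current_deadline() == 5
        assert current_deadline() == 10

Because each push creates a new immutable cell instead of mutating a shared list, snapshotting the whole stack is just keeping a reference to the current head.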

On 12 August 2017 at 17:54, Nathaniel Smith <njs@pobox.com> wrote:
Now that you raise this point, I think it means that generators need to retain their current context inheritance behaviour, simply for backwards compatibility purposes. This means that the case we need to enable is the one where the generator *doesn't* dynamically adjust its execution context to match that of the calling function. One way that could work (using the cr_back/gi_back convention I suggested):

- generators start with gi_back not set
- if gi_back is NULL/None, gi.send() and gi.throw() set it to the calling frame for the duration of the synchronous call and *don't* adjust the execution context (i.e. the inverse of coroutine behaviour)
- if gi_back is already set, then gi.send() and gi.throw() *do* save and restore the execution context around synchronous calls in to the generator frame

To create an autonomous generator (i.e. one that didn't dynamically update its execution context), you'd use a decorator like:

    def autonomous_generator(gf):
        @functools.wraps(gf)
        def wrapper(*args, **kwds):
            gi = gf(*args, **kwds)
            gi.gi_back = gi.gi_frame
            return gi
        return wrapper

Asynchronous generators would then work like synchronous generators: ag_back would be NULL/None by default, and dynamically set for the duration of each __anext__ call. If you wanted to create an autonomous one, you'd make its back reference a circular reference to itself to disable the implicit dynamic updates. When I put it in those terms though, I think the cr_back/gi_back/ag_back idea should actually be orthogonal to the "revert_context" flag (so you can record the link back to the caller even when maintaining an autonomous context). Given that, you'd have the following initial states for "revert context" (currently called "isolated context" in the PEP):

* unawaited coroutines: true (same as PEP)
* awaited coroutines: false (same as PEP)
* generators (both sync & async): false (opposite of current PEP)
* autonomous generators: true (set "gi_revert_context" or "ag_revert_context" explicitly)

Open question: whether having "yield" inside a with statement implies the creation of an autonomous generator (synchronous or otherwise), or whether you'd need a decorator to get your context management right in such cases. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick, Nathaniel, I'll be replying in full to your emails when I have time to do some experiments. Now I just want to address one point that I think is important: On Sat, Aug 12, 2017 at 1:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Nobody *intentionally* iterates a generator manually in different decimal contexts (or any other contexts). This is an extremely error prone thing to do, because one refactoring of the generator -- rearranging yields -- would wreck your custom iteration/context logic. I don't think that any real code relies on this, and I don't think that we are breaking backwards compatibility here in any way. How many users care about this? If someone does need this, it's possible to flip `gi_isolated_execution_context` to `False` (as contextmanager does now) and get this behaviour. This might be needed for frameworks like Tornado which support coroutines via generators without 'yield from', but I'll have to verify this. What I'm saying here, is that any sort of context leaking *into* or *out of* a generator *while* it is iterating will likely cause only bugs or undefined behaviour. Take a look at the precision example in the Rationale section of the PEP. Most of the time generators are created and iterated in the same spot; you rarely create generator closures. One way the behaviour could be changed, however, is to capture the execution context when the generator is first iterated (as opposed to when it's instantiated), but I don't think it makes any real difference. Another idea: in one of my initial PEP implementations, I exposed gen.gi_execution_context (same for coroutines) to Python as a read/write attribute. That allowed one to (a) get the execution context out of a generator (for introspection or other purposes); (b) inject an execution context for event loops; for instance asyncio.Task could do that for some purpose. Maybe this would be useful for someone who wants to mess with generators and contexts. [..]
Nick, I still have to fully grasp the idea of `gi_back`, but one quick thing: I specifically designed the PEP to avoid touching frames. The current design only needs TLS and a little help from the interpreter/core objects adjusting that TLS. It should be very straightforward to implement the PEP in any interpreter (with JIT or without) or compilers like Cython. [..]
If generators do not isolate their context, then the example in the Rationale section will not work as expected (or am I missing something?). Fixing generators state leak was one of the main goals of the PEP. Yury

On 13 August 2017 at 03:53, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
I think this is a reasonable stance for the PEP to take, but the hidden execution state around the "isolated or not" behaviour still bothers me. In some ways it reminds me of the way function parameters work: the bound parameters are effectively a *shallow* copy of the passed arguments, so callers can decide whether or not they want the callee to be able to modify them based on the arguments' mutability (or lack thereof). The execution context proposal uses copy-on-write semantics for runtime efficiency, but it's essentially the same shallow copy concept applied to __next__(), send() and throw() operations (and perhaps __anext__(), asend(), and athrow() - I haven't wrapped my head around the implications for async generators and context managers yet). That similarity makes me wonder whether the "isolated or not" behaviour could be moved from the object being executed and directly into the key/value pairs themselves based on whether or not the values were mutable, as that's the way function calls work: if the argument is immutable, the callee *can't* change it, while if it's mutable, the callee can mutate it, but it still can't rebind it to refer to a different object. The way I'd see that working with an always-reverted copy-on-write execution context:

1. If a parent context wants child contexts to be able to make changes, then it should put a *mutable* object in the context (e.g. a list or class instance)
2. If a parent context *does not* want child contexts to be able to make changes, then it should put an *immutable* object in the context (e.g. a tuple or number)
3. If a child context *wants* to share a context key with its parent, then it should *mutate* it in place
4. If a child context *does not* want to share a context key with its parent, then it should *rebind* it to a different object

That way, instead of reverted-or-not-reverted being an all-or-nothing interpreter level decision, it can be made on a key-by-key basis by choosing whether or not to use a mutable value. To make that a little less abstract, consider a concrete example like setting a "my_web_framework.request" key:

1. The step of *setting* the key will *not* be shared with the parent context, as that modifies the underlying copy-on-write namespace, and will hence be reverted when control is passed back to the parent
2. Any *mutation* of the request object *will* be shared, since mutating the value doesn't have any effect on the copy-on-write namespace

Nathaniel's example of wanting stack-like behaviour could be modeled using tuples as values: when the child context appends to the tuple, it will necessarily have to create a new tuple and rebind the corresponding key, causing the changes to be invisible to the parent context. The contextlib.contextmanager use case could then be modeled as a *separate* method that skipped the save/revert context management step (e.g. "send_with_shared_context", "throw_with_shared_context")
Working through this above, I think the key point that bothers me about the stateful revert-or-not setting is that whether or not context reversion is desirable depends mainly on two things:

- the specific key in question (indicated by mutable vs immutable values)
- the intent of the code in the parent context (which could be indicated by calling different methods)

It *doesn't* seem to be an inherent property of a given generator or coroutine, except insofar as there's a correlation between the code that creates generators & coroutines and the code that subsequently invokes them.
Yeah, this would be useful, and could potentially avoid the need to expose a parallel set of "*_with_shared_context" methods - instead, contextlib.contextmanager could just invoke the underlying generator with an isolated context, and then set the parent context to the generator's one if it changed.
I think you can just ignore that idea for now, as I've convinced myself it's orthogonal to the question of how we handle execution contexts.
Agreed - see above :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
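A toy illustration of Nick's mutable-vs-immutable point from the message above: with a (shallow) copy-on-write child context, mutating a shared object travels back to the parent, while rebinding a key stays local to the child:

    parent_ctx = {'shared': ['req-1'], 'private': ('a',)}

    child_ctx = dict(parent_ctx)           # shallow copy-on-write "child"
    child_ctx['shared'].append('mutated')  # mutation is seen by the parent
    child_ctx['private'] += ('b',)         # rebinding is not

    assert parent_ctx['shared'] == ['req-1', 'mutated']
    assert parent_ctx['private'] == ('a',)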

On Sat, Aug 12, 2017 at 10:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Mutable default values for function arguments are one of the most confusing things in Python for its users. I've seen numerous threads on StackOverflow/Reddit with people complaining about it.
I'm afraid that if we design the EC to behave differently for mutable/immutable values, it will be an even harder thing for end users to understand.
It's possible to put mutable values even with the current PEP 550 API. The issue that Nathaniel has with it is that he actually wants the API to behave exactly like it does to implement his timeouts logic, but: there's a corner case where isolating generator state at the time when it is created doesn't work in his favor. FWIW I believe that I now have a complete solution for the generator.send() problem that will make it possible for Nathaniel to implement his Trio APIs. The functional PoC is here: https://github.com/1st1/cpython/tree/pep550_gen

The key change is to make generators and asynchronous generators:

1. Have their own empty execution context when created. It will be used for whatever local modifications they do to it, ensuring that their state never escapes to the outside world (the gi_isolated_execution_context flag is still here for contextmanager).

2. ExecutionContext has a new internal pointer called ec_back. In the Generator.send/throw method, ec_back is dynamically set to the current execution context.

3. This makes it possible for generators to see any outside changes in the execution context *and* have their own, where they can make *local* changes.

So (pseudo-code):

    def gen():
        print('1', context)
        yield
        print('2', context)
        with context(spam=ham):
            yield
            print('3', context)
            yield
        print('4', context)
        yield

    g = gen()

    context(foo=1, spam='bar')
    next(g)

    context(foo=2)
    next(g)

    context(foo=3)
    next(g)

    context(foo=4)
    next(g)

will print:

    1 {foo=1, spam=bar}
    2 {foo=2, spam=bar}
    3 {foo=3, spam=ham}
    4 {foo=4, spam=bar}

There are some downsides to the approach, mainly from the performance standpoint, but in a common case they will be negligible, if detectable at all. Yury

I'll start a new thread soon to discuss whether we want this specific semantics change (with some updates). Yury

On 14 August 2017 at 02:33, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
There's nothing to design, as storing a list (or other mutable object) in an EC will necessarily be the same as storing one in a tuple: the fact you acquired the reference via an immutable container will do *nothing* to keep you from mutating the referenced object. And for use cases like web requests, that's exactly the behaviour we want - changing the active web request is an EC level operation, but making changes to the state of the currently active request (e.g. in a middleware processor) won't require anything special. [I'm going to snip the rest of the post, as it sounds pretty reasonable to me, and my questions about the interaction between sys.set_execution_context() and ec_back go away if sys.set_execution_context() doesn't exist as you're currently proposing]
(gi_isolated_execution_context flag is still here for contextmanager).
This hidden flag variable on the types managing suspendable frames is still the piece of the proposal that strikes me as being the most potentially problematic, as it at least doubles the number of flows of control that need to be tested. Essentially what we're aiming to model is:

1. Performing operations in a way that modifies the active execution context
2. Performing them in a way that saves & restores the execution context

For synchronous calls, this distinction is straightforward:

- plain calls may alter the active execution context via state mutation
- use ec.run() to save/restore the execution context around the operation

(The ec_back idea means we may also need an "ec.run()" variant that sets ec_back appropriately before making the call - for example, "ec.run()" could set ec_back, while a separate "ec.run_isolated()" could skip setting it. Alternatively, full isolation could be the default, and "ec.run_shared()" would set ec_back. If we go with the latter option, then "ec_shared" might be a better attribute name than "ec_back")

A function can be marked as always having its own private context using a decorator like so:

    def private_context(f):
        @functools.wraps(f)
        def wrapper(*args, **kwds):
            ec = sys.get_active_context()
            return ec.run(f, *args, **kwds)
        return wrapper

For next/send/throw and anext/asend/athrow, however, the proposal is to bake the save/restore into the *target objects*, rather than having to request it explicitly in the way those objects get called. This means that unless we apply some implicit decorator magic to the affected slot definitions, there's now going to be a major behavioural difference between:

    some_state = sys.new_context_item()

    def local_state_changer(x):
        for i in range(x):
            some_state.set(i)
            yield i

    class ParentStateChanger:
        def __init__(self, x):
            self._itr = iter(range(x))
        def __iter__(self):
            return self
        def __next__(self):
            x = next(self._itr)
            some_state.set(x)
            return x

The latter would need the equivalent of `@private_context` on the `__next__` method definition to get the behaviour that generators would have by default (and similarly for __anext__ and asynchronous generators). I haven't fully thought through the implications of this problem yet, but some initial unordered thoughts:

- implicit method decorators are always suspicious, but skipping them in this case feels like we'd be setting up developers of custom iterators for really subtle context management bugs
- contextlib's own helper classes would be fine, since they define __enter__ & __exit__, which wouldn't be affected by this
- for lru_cache, we rely on `__wrapped__` to get access to the underlying function without caching applied. Might it make sense to do something similar for these implicitly context-restoring methods? If so, should we use a dedicated name so that additional wrapper layers don't overwrite it?

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Could someone (perhaps in a new thread?) summarize the current proposal, with some examples of how typical use cases would look? This is an important topic but the discussion is way too voluminous for me to follow while I'm on vacation with my family, and the PEP spends too many words on motivation and not enough on crisply explaining how the proposed feature works (what state is stored where, how it's accessed, and how it's manipulated behind the scenes). -- --Guido van Rossum (python.org/~guido)

Nick, you nailed it with your example. In short: the current PEP 550 defines Execution Context in such a way that generators and iterators will interact differently with it. That means that it won't be possible to refactor an iterator class to a generator, and that's not acceptable. I'll be rewriting the whole specification section of the PEP today. Yury

On 15 August 2017 at 05:25, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Trying to summarise something I thought of this morning regarding ec_back and implicitly isolating iterator contexts: with the notion of generators running with their own private context by default, the state needed to call __next__ on the generator is as follows:

- current thread EC
- generator's private EC (stored on the generator)
- the generator's __next__ method

This means that if the EC manipulation were to live in the next() builtin rather than in the individual __next__() methods, then this can be made a general context isolation protocol:

- provide a `sys.create_execution_context()` interface
- set `__private_context__` on your iterable if you want `next()` to use `ec.run()` (and update __private_context__ afterwards)
- set `__private_context__ = None` if you want `next()` to just call `obj.__next__()` directly
- generators have __private_context__ set by default, but wrappers like contextlib.contextmanager can clear it

That would also suggest that ec.run() will need to return a 2-tuple:

    def run(self, f: Callable, *args, **kwds) -> Tuple[Any, ExecutionContext]:
        """Run the given function in this execution context

        Returns a 2-tuple containing the function result and the
        execution context that was active when the function returned.
        """

That way next(itr) will be able to update itr.__private_context__ appropriately if it was initially set and the call changes the active context. We could then give send(), throw() and their asynchronous counterparts the builtin+protocol method treatment, and put the EC manipulation in their builtins as well. Anyway, potentially a useful option to consider as you work on revising the proposal - I'll refrain from further comments until you have an updated draft available :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
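A rough sketch of the next() protocol Nick outlines above, with a deliberately simplified Context type; this only demonstrates the shape of the protocol (consult __private_context__, run __next__ inside it, store the returned context back), not real isolation semantics:

    class Context(dict):
        def run(self, f, *args):
            # toy version of the 2-tuple signature: returns the result
            # and the context that is active after the call
            result = f(*args)
            return result, self

    def next_with_context(itr):
        ctx = getattr(itr, '__private_context__', None)
        if ctx is None:
            return itr.__next__()       # plain call, shared context
        result, new_ctx = ctx.run(itr.__next__)
        itr.__private_context__ = new_ctx
        return result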

Hi Nick, Thanks for writing this! You reminded me that it's crucial to have an ability to fully recreate generator behaviour in an iterator. Besides this being a requirement for a complete EC model, it is something that compilers like Cython absolutely need. I'm still working on a rewrite (which is now a completely different PEP), will probably finish it today. Yury

Nathaniel, Nick, I'll reply only to point 9 in this email to split this threads into manageable sub-threads. I'll cover other points in later emails. On Sat, Aug 12, 2017 at 3:54 AM, Nathaniel Smith <njs@pobox.com> wrote:
9. OK, my big question, about semantics.
FWIW it took me a good hour to fully understand what you are doing with "fail_after" and what you want from PEP 550, and the actual associated problems with generators :)
As you yourself show below, it's easy to implement stacks with the proposed EC spec. A linked list will work well enough.
Right. So the task always knows the EC at the point of "yield". It can then get the latest timeout from it and act accordingly if that yield did not resume in time. This should work.
As I tried to explain in my last email, I generally don't believe that people would do this partial iteration with timeouts or other contexts around it. The only use case I can come up with so far is implementing some sort of receiver using an AG, and then "listening" on it through "__anext__" calls. But the case is interesting nevertheless, and maybe we can fix it without relaxing any guarantees of the PEP. The idea that I have is to allow linking of ExecutionContexts (this is similar in a way to what Nick proposed, but has a stricter semantics):

1. The internal ExecutionContext object will have a new "back" attribute.

2. For regular code and coroutines everything that is already in the PEP will stay the same.

3. For generators and asynchronous generators, when a generator is created, an empty ExecutionContext will be created for it, with its "back" attribute pointing to the current EC.

4. The lookup function will be adjusted to check the "EC.back" if the key is not found in the current EC.

5. The max level of the "back" chain will be 1.

6. When a generator is created inside another generator, it will inherit another generator's EC. Because contexts are immutable this should be OK.

7. When a coroutine is created inside an EC with a "back" link, it will merge EC and EC.back in one new EC. Merge can be done very efficiently for HAMT mappings, which I believe we will end up using for this anyways (an O(log32 N) operation).

An illustration of what it will allow:

    def gen():
        yield
        with context(key='spam'):
            yield
        yield

    g = gen()

    context(key=1)
    g.send(None)
    # The code around the first yield will see "key=1"

    context(key=2)
    g.send(None)
    # The code around the second yield will see "key=spam"

    context(key=3)
    g.send(None)
    # The code around the third yield will see "key=3"

Essentially, it makes generators "transparent" to the outside context changes, but OTOH fully isolates their local context changes from the outside world. This should solve the "fail_after" over a generator case. Nathaniel and Nick, what do you think? Yury

On 12 August 2017 at 08:37, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
The fully rendered version is also up now: https://www.python.org/dev/peps/pep-0550/

Thanks for this! The general approach looks good to me, so I just have some questions about specifics of the API:

1. Are you sure you want to expose the CoW type to pure Python code? The draft API looks fairly error prone to me, as I'm not sure of the intended differences in behaviour between the following:

    @contextmanager
    def context(x):
        old_x = sys.get_execution_context_item('x')
        sys.set_execution_context_item('x', x)
        try:
            yield
        finally:
            sys.set_execution_context_item('x', old_x)

    @contextmanager
    def context(x):
        old_x = sys.get_execution_context().get('x')
        sys.get_execution_context()['x'] = x
        try:
            yield
        finally:
            sys.get_execution_context()['x'] = old_x

    @contextmanager
    def context(x):
        ec = sys.get_execution_context()
        old_x = ec.get('x')
        ec['x'] = x
        try:
            yield
        finally:
            ec['x'] = old_x

It seems to me that everything would be a lot safer if the *only* Python level API was a live dynamic view that completely hid the copy-on-write behaviour behind an "ExecutionContextProxy" type, such that the last two examples were functionally equivalent to each other and to the current PEP's get/set functions (rendering the latter redundant, and allowing it to be dropped from the PEP). If Python code wanted a snapshot of the current state, it would need to call sys.get_execution_context().copy(), which would give it a plain dictionary containing a shallow copy of the execution context at that particular point in time. If there's a genuine need to expose the raw copy-on-write machinery to Python level code (e.g. for asyncio's benefit), then that could be more clearly marked as "here be dragons" territory that most folks aren't going to want to touch (e.g. "sys.get_raw_execution_context()")

2. Do we need an ag_isolated_execution_context for asynchronous generators? (Modify this question as needed for the answer to the next question)

3. It bothers me that *_execution_context points to an actual execution context, while *_isolated_execution_context is a boolean. With names that similar I'd expect them to point to the same kind of object. Would it work to adjust that setting to say that rather than being an "isolated/not isolated" boolean, we instead made it a cr_back reverse pointer to the awaiting coroutine (akin to f_back in the frame stack), such that we had a doubly-linked list that defined the coroutine call stacks via their cr_await and cr_back attributes? If we did that, we'd have:

    Top-level Task: cr_back -> NULL (C) or None (Python)
    Awaited coroutine: cr_back -> coroutine that awaited this one
    (which would in turn have a cr_await reference back to here)

coroutine.send()/throw() would then save and restore the execution context around the call if cr_back was NULL/None (equivalent to isolated==True in the current PEP), and leave it alone otherwise (equivalent to isolated==False). For generators, gi_back would normally be NULL/None (since we don't typically couple regular generators to a single managing object), but could be set appropriately by types.coroutine when the generator-based coroutine is awaited, and by contextlib.contextmanager before starting the underlying generator. (It may even make sense to break the naming symmetry for that attribute, and call it something like "gi_owner", since generators don't form a clean await-based logical call chain the way native coroutines do.) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 12, 2017 at 10:12 AM, Nick Coghlan <ncoghlan@gmail.com> wrote: [..]
1. Are you sure you want to expose the CoW type to pure Python code?
Ultimately, why not? The execution context object you get with sys.get_execution_context() is yours to change. Any change to it won't be propagated anywhere, unless you execute something in that context with ExecutionContext.run or set it as a current one.
This one (the second example) won't do anything.
This one (the third one) won't do anything either. You can do this:

    ec = sys.get_execution_context()
    ec['x'] = x
    ec.run(my_function)

or `sys.set_execution_context(ec)`
So there's no copy-on-write exposed to Python actually. What I am thinking about, though, is that we might not need the sys.set_execution_context() function. If you want to run something with a modified or empty execution context, do it through ExecutionContext.run method.
Yes, we'll need it for contextlib.asynccontextmanager at least.
I think we touched upon this in a parallel thread. But I think we can rename "gi_isolated_execution_context" to "gi_execution_context_isolated" or something more readable/obvious. Yury

Good work Yury, going for all-in-one will help to not increase the differences between the async and sync worlds in Python. I really like the idea of the immutable dicts: it makes it easy to inherit the context between tasks/threads/whatever without putting consistency at risk when there are further key collisions. I've just taken a look at the asyncio modifications. Correct me if I'm wrong, but the handler strategy has a side effect: the work done to save and restore the context will be done twice in some situations. It would happen when the callback is in charge of executing a task step, once by the run-in-context method and the other one by the coroutine. Is that correct? On 12/08/2017 00:38, "Yury Selivanov" <yselivanov.ml@gmail.com> wrote: [..]
Add a ``key``/``value`` item to it, which will produce a new EC object. Set the new object as the current one for the executing thread. In pseudo-code::

    tstate = PyThreadState_GET()
    ec = tstate.exec_context
    ec2 = ec.set(key, value)
    tstate.exec_context = ec2

Note, that some important implementation details and optimizations are omitted here, and will be covered in later sections of this PEP.

Now let's see how Execution Contexts work with regular multi-threaded code, generators, and coroutines.

Regular & Multithreaded Code
----------------------------

For regular Python code, EC behaves just like a thread-local. Any modification of the EC object produces a new one, which is immediately set as the current one for the thread state.

.. figure:: pep-0550/functions.png
   :align: center
   :width: 90%

   Figure 1. Execution Context flow in a thread.

As Figure 1 illustrates, if a function calls ``set_execution_context_item()``, the modification of the execution context will be visible to all subsequent calls and to the caller::

    def set_foo():
        set_execution_context_item('foo', 'spam')

    set_execution_context_item('foo', 'bar')
    print(get_execution_context_item('foo'))

    set_foo()
    print(get_execution_context_item('foo'))

    # will print:
    #   bar
    #   spam

Coroutines
----------

Python :pep:`492` coroutines are used to implement cooperative multitasking. For a Python end-user they are similar to threads, especially when it comes to sharing resources or modifying the global state. An event loop is needed to schedule coroutines. Coroutines that are explicitly scheduled by the user are usually called Tasks. When a coroutine is scheduled, it can schedule other coroutines using an ``await`` expression. In async/await world, awaiting a coroutine can be viewed as a different calling convention: Tasks are similar to threads, and awaiting on coroutines within a Task is similar to calling functions within a thread. By drawing a parallel between regular multithreaded code and async/await, it becomes apparent that any modification of the execution context within one Task should be visible to all coroutines scheduled within it. Any execution context modifications, however, must not be visible to other Tasks executing within the same thread.

To achieve this, a small set of modifications to the coroutine object
is needed:

* When a coroutine object is instantiated, it saves a reference to the
  current execution context object in its ``cr_execution_context``
  attribute.

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
  follows (in pseudo-C)::

      if coro->cr_isolated_execution_context:
          # Save a reference to the current execution context
          old_context = tstate->exec_context

          # Set our saved execution context as the current
          # one for the current thread.
          tstate->exec_context = coro->cr_execution_context

          try:
              # Perform the actual `Coroutine.send()` or
              # `Coroutine.throw()` call.
              return coro->send(...)
          finally:
              # Save a reference to the updated execution context.
              # We will need it later, when `.send()` or `.throw()`
              # are called again.
              coro->cr_execution_context = tstate->exec_context

              # Restore the thread's execution context to what it
              # was before invoking this coroutine.
              tstate->exec_context = old_context
      else:
          # Perform the actual `Coroutine.send()` or
          # `Coroutine.throw()` call.
          return coro->send(...)

* ``cr_isolated_execution_context`` is a new attribute on coroutine
  objects.  Set to ``True`` by default, it makes any execution context
  modifications performed by the coroutine stay visible only to that
  coroutine.

  When the Python interpreter sees an ``await`` instruction, it flips
  ``cr_isolated_execution_context`` to ``False`` for the coroutine
  that is about to be awaited.  This makes any changes to the
  execution context made by nested coroutine calls within a Task
  visible throughout the Task.

  Because the top-level coroutine (Task) cannot be scheduled with
  ``await`` (in asyncio you need to call ``loop.create_task()`` or
  ``asyncio.ensure_future()`` to schedule a Task), all execution
  context modifications are guaranteed to stay within the Task.

* We always work with ``tstate->exec_context``.  We use
  ``coro->cr_execution_context`` only to store the coroutine's
  execution context when it is not executing.

Figure 2 below illustrates how execution context mutations work with
coroutines.

.. figure:: pep-0550/coroutines.png
   :align: center
   :width: 90%

   Figure 2.  Execution Context flow in coroutines.

In the above diagram:

* When "coro1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When it awaits on "coro2", any subsequent changes it does to the
  execution context are visible to "coro1", but not outside of it.

In code::

    async def inner_foo():
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', 2)

    async def foo():
        print('foo:', get_execution_context_item('key'))

        set_execution_context_item('key', 1)
        await inner_foo()

        print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    asyncio.get_event_loop().run_until_complete(foo())
    print('main:', get_execution_context_item('key'))

which will output::

    main: spam
    foo: spam
    inner_foo: 1
    foo: 2
    main: spam

Generator-based coroutines (generators decorated with
``types.coroutine`` or ``asyncio.coroutine``) behave exactly as native
coroutines with regards to execution context management: their
``yield from`` expression is semantically equivalent to ``await``.


Generators
----------

Generators in Python, while similar to coroutines, are used in a
fundamentally different way.
They are producers of data, and they use the ``yield`` expression to
suspend/resume their execution.

A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that the ``coro`` will be
executed to the end, while the latter produces ``value`` and suspends
the generator until it gets iterated again.

Generators share 99% of their implementation with coroutines, and thus
have similar new attributes ``gi_execution_context`` and
``gi_isolated_execution_context``.  Similar to coroutines, generators
save a reference to the current execution context when they are
instantiated.  They have the same implementation of the ``.send()``
and ``.throw()`` methods.

The only difference is that ``gi_isolated_execution_context`` is
always set to ``True``, and is never modified by the interpreter.  The
``yield from o`` expression in regular generators that are not
decorated with ``types.coroutine`` is semantically equivalent to
``for v in o: yield v``.

.. figure:: pep-0550/generators.png
   :align: center
   :width: 90%

   Figure 3.  Execution Context flow in a generator.

In the above diagram:

* When "gen1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When "gen2" is created, it saves a reference to the current
  execution context for it -- "2.1".

* Any subsequent execution context update in "gen2" will only be
  visible to "gen2".

* Likewise, any context changes that "gen1" makes after it created
  "gen2" will not be visible to "gen2".

In code::

    def inner_foo():
        for i in range(3):
            print('inner_foo:', get_execution_context_item('key'))
            set_execution_context_item('key', i)
            yield i

    def foo():
        set_execution_context_item('key', 'spam')
        print('foo:', get_execution_context_item('key'))

        inner = inner_foo()

        while True:
            val = next(inner, None)
            if val is None:
                break
            yield val
            print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'ham')
    print('main:', get_execution_context_item('key'))

    list(foo())
    print('main:', get_execution_context_item('key'))

which will output::

    main: ham
    foo: spam
    inner_foo: spam
    foo: spam
    inner_foo: 0
    foo: spam
    inner_foo: 1
    foo: spam
    main: ham

As we see, any modification of the execution context in a generator is
visible only to the generator itself.

There is one use-case where it is desired for generators to affect the
surrounding execution context: the ``contextlib.contextmanager``
decorator.  To make the following work::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip the
``gi_isolated_execution_context`` flag to ``False`` on its generator
(see the sketch after this section).


Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling
for Python.  Although the greenlet package is not part of CPython,
popular frameworks like gevent rely on it, and it is important that
greenlet can be modified to support execution contexts.

In a nutshell, the greenlet design is very similar to the design of
generators.  The main difference is that for generators, the stack is
managed by the Python interpreter.  Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState`` fields
and pushes/pops the C-stack.  Since the Execution Context is
implemented on top of ``PyThreadState``, it's easy to add transparent
support for it to greenlet.
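
The ``contextmanager`` change mentioned above can be sketched in pure
Python, assuming the new ``gi_isolated_execution_context`` attribute
proposed in this PEP (an illustration only, not the actual patch)::

    def non_isolating(genfunc):
        """Make a generator function's EC changes leak to the caller.

        This is exactly what a generator-based context manager needs.
        """
        def wrapper(*args, **kwargs):
            gen = genfunc(*args, **kwargs)
            gen.gi_isolated_execution_context = False
            return gen
        return wrapper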

New APIs
========

Even though this PEP adds a number of new APIs, please keep in mind
that most Python users will likely only ever use two of them:
``sys.get_execution_context_item()`` and
``sys.set_execution_context_item()``.


Python
------

1. ``sys.get_execution_context_item(key, default=None)``: look up
   ``key`` in the current Execution Context.  If not found, return
   ``default``.

2. ``sys.set_execution_context_item(key, value)``: set a
   ``key``/``value`` item in the current Execution Context.  If
   ``value`` is ``None``, the item will be removed.

3. ``sys.get_execution_context()``: return the current Execution
   Context object: ``sys.ExecutionContext``.

4. ``sys.set_execution_context(ec)``: set the passed
   ``sys.ExecutionContext`` instance as the current one for the
   current thread.

5. ``sys.ExecutionContext`` object.

   Implementation detail: ``sys.ExecutionContext`` wraps a low-level
   ``PyExecContextData`` object.  ``sys.ExecutionContext`` has a
   mutable mapping API, abstracting away the real immutable
   ``PyExecContextData``.

   * ``ExecutionContext()``: construct a new, empty, execution
     context.

   * ``ec.run(func, *args)`` method: run ``func(*args)`` in the
     ``ec`` execution context.

   * ``ec[key]``: look up ``key`` in the ``ec`` context.

   * ``ec[key] = value``: assign a ``key``/``value`` item to the
     ``ec``.

   * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``,
     and ``ec.copy()`` are similar to those of the ``dict`` object.


C API
-----

The C API is different from the Python one because it operates
directly on the low-level immutable ``PyExecContextData`` object.

1. A new ``PyThreadState->exec_context`` field, pointing to a
   ``PyExecContextData`` object.

2. ``PyThreadState_SetExecContextItem`` and
   ``PyThreadState_GetExecContextItem``, similar to
   ``sys.set_execution_context_item()`` and
   ``sys.get_execution_context_item()``.

3. ``PyThreadState_GetExecContext``: similar to
   ``sys.get_execution_context()``.  Always returns a
   ``PyExecContextData`` object.  If ``PyThreadState->exec_context``
   is ``NULL``, a new and empty one will be created and assigned to
   ``PyThreadState->exec_context``.

4. ``PyThreadState_SetExecContext``: similar to
   ``sys.set_execution_context()``.

5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
   object.

6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.

The exact layout of ``PyExecContextData`` is private, which allows us
to switch to a different implementation later.  More on that in the
`Implementation Details`_ section.


Modifications in Standard Library
=================================

* ``contextlib.contextmanager`` was updated to flip the new
  ``gi_isolated_execution_context`` attribute on the generator.

* The ``asyncio.events.Handle`` object now captures the current
  execution context when it is created, and uses the saved execution
  context to run the callback (with the ``ExecutionContext.run()``
  method.)  This makes ``loop.call_soon()`` run callbacks in the
  execution context they were scheduled in.

No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
necessary.

Some standard library modules like ``warnings`` and ``decimal`` can be
updated to use the new execution contexts.  This will be considered in
separate issues if this PEP is accepted.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Performance
===========

Implementation Details
----------------------

The new ``PyExecContextData`` object wraps a ``dict`` object.  Any
modification requires creating a shallow copy of the dict.
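
To make the copy-on-``set()`` behaviour concrete, here is a minimal
pure-Python sketch of the idea (the real ``PyExecContextData`` is
implemented in C; the class name below is made up)::

    class _ImmutableEC:
        """Immutable mapping built on shallow-copying a dict."""

        __slots__ = ('_data',)

        def __init__(self, data=None):
            self._data = data if data is not None else {}

        def get(self, key, default=None):
            # O(1): a plain dict lookup.
            return self._data.get(key, default)

        def set(self, key, value):
            # O(n): copy the dict, then update the copy.  The original
            # mapping is never mutated, so older ECs remain valid.
            data = dict(self._data)
            data[key] = value
            return _ImmutableEC(data)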

While working on the reference implementation of this PEP, we were
able to optimize the ``dict.copy()`` operation **5.5x**; see [4]_ for
details.

.. figure:: pep-0550/dict_copy.png
   :align: center
   :width: 100%

   Figure 4.

Figure 4 shows that the performance of an immutable dict implemented
with shallow copying is, expectedly, O(n) for the ``set()`` operation.
However, this is tolerable until the dict has more than 100 items
(1 ``set()`` takes about a microsecond.)

Judging by the number of modules that need the EC in the standard
library, it is likely that real-world Python applications will use
significantly fewer than 100 execution context variables.

The important point is that the cost of accessing a key in the
Execution Context is always O(1).

If the ``set()`` operation performance is a major concern, we discuss
alternative approaches that have O(1) or close-to-O(1) ``set()``
performance in the `Alternative Immutable Dict Implementation`_,
`Faster C API`_, and `Copy-on-write Execution Context`_ sections.


Generators and Coroutines
-------------------------

Using a microbenchmark for generators and coroutines from :pep:`492`
([12]_), it was possible to observe 0.5 to 1% performance degradation.

asyncio echoserver microbenchmarks from the uvloop project [13]_
showed 1-1.5% performance degradation for asyncio code.

asyncpg benchmarks [14]_, which execute more code and are closer to a
real-world application, did not exhibit any noticeable performance
change.


Overall Performance Impact
--------------------------

The total number of changed lines in the ceval loop is 2 -- in the
``YIELD_FROM`` opcode implementation.  Only the performance of
generators and coroutines can be affected by the proposal.  This was
confirmed by running the Python Performance Benchmark Suite [15]_,
which demonstrated that there is no difference between the 3.7 master
branch and this PEP's reference implementation branch (full benchmark
results can be found here: [16]_.)


Design Considerations
=====================

Alternative Immutable Dict Implementation
-----------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) to
implement high-performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for both ``set()`` and ``get()`` operations, which is
essentially O(1) for the relatively small mappings in the EC.

To assess whether HAMT can be used for the Execution Context, we
implemented it in CPython [7]_.

.. figure:: pep-0550/hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 5.  Benchmark code can be found here: [9]_.

Figure 5 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes.  For dictionaries with fewer than 100
items, HAMT is a bit slower than the Python dict/shallow-copy
approach.

.. figure:: pep-0550/lookup_hamt.png
   :align: center
   :width: 100%

   Figure 6.  Benchmark code can be found here: [10]_.

Figure 6 shows a comparison of lookup costs between the Python dict
and an HAMT immutable mapping.  HAMT lookup time is 30-40% worse than
Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.

Note that, according to [8]_, the HAMT design can be further improved.

The bottom line is that the current approach of implementing an
immutable mapping with a shallow-copied dict will likely perform
adequately in real-life applications.  The HAMT solution is more
future-proof, however.
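
As a rough illustration of the O(n) ``set()`` cost discussed above
(this is not the PEP's benchmark code; see [9]_ and [10]_ for that),
one can time plain ``dict`` copies directly::

    import timeit

    for n in (10, 100, 1000):
        d = {i: i for i in range(n)}
        number = 100_000
        t = timeit.timeit('dict(d)', globals={'d': d}, number=number)
        print(f'{n:>5} items: {t / number * 1e9:.0f} ns per copy')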

The proposed API is designed in such a way that the underlying
implementation of the mapping can be changed completely without
affecting the Execution Context `Specification`_, which allows us to
switch to HAMT at some point if necessary.


Copy-on-write Execution Context
-------------------------------

The implementation of the Execution Context in .NET is different from
this PEP.  .NET uses a copy-on-write mechanism and a regular mutable
mapping.

One way to implement this in CPython would be to have two new fields
in ``PyThreadState``:

* ``exec_context``, pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write``, a flag set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to the Coroutine and Generator ``.send()`` and
``.throw()`` methods described in the `Coroutines`_ section will be
almost the same, except that in addition to the
``gi_execution_context`` they will have a
``gi_exec_context_copy_on_write`` flag.  When a coroutine or a
generator starts, the flag will be set to ``1``.  This will ensure
that any modification of the EC performed within a coroutine or a
generator will be isolated.

This approach has one advantage:

* For an Execution Context that contains a large number of items,
  copy-on-write is a more efficient solution than the shallow-copy
  dict approach.

However, we believe that the disadvantages of copy-on-write are more
important to consider:

* Copy-on-write behaviour for generators and coroutines makes EC
  semantics less predictable.

  With the immutable EC approach, generators and coroutines always
  execute in the EC that was current at the moment of their creation.
  Any modifications to the outer EC while a generator or a coroutine
  is executing are not visible to them::

      def generator():
          yield 1
          print(get_execution_context_item('key'))
          yield 2

      set_execution_context_item('key', 'spam')
      gen = iter(generator())
      next(gen)

      set_execution_context_item('key', 'ham')
      next(gen)

  The above script will always print 'spam' with the immutable EC.
  With a copy-on-write approach, the above script will print 'ham'.
  Now, consider that ``generator()`` was refactored to call some
  library function that uses the Execution Context::

      def generator():
          yield 1
          some_function_that_uses_decimal_context()
          print(get_execution_context_item('key'))
          yield 2

  Now, the script will print 'spam', because
  ``some_function_that_uses_decimal_context`` forced the EC to copy,
  and the ``set_execution_context_item('key', 'ham')`` line did not
  affect the ``generator()`` code after all.

* Similarly to the previous point, the ``sys.ExecutionContext.run()``
  method will also become less predictable, as
  ``sys.get_execution_context()`` would still return a reference to
  the current mutable EC.

  We can't modify ``sys.get_execution_context()`` to return a shallow
  copy of the current EC, because this would seriously harm the
  performance of ``asyncio.call_soon()`` and similar places, where it
  is important to propagate the Execution Context.

* Even though copy-on-write requires the execution context object to
  be shallow-copied less frequently, copying will still take place in
  coroutines and generators.  In that case, the HAMT approach will
  perform better for medium- to large-sized execution contexts.

All in all, we believe that the copy-on-write approach introduces very
subtle corner cases that could lead to bugs that are exceptionally
hard to discover and fix.  The immutable EC solution in comparison is
always predictable and easy to reason about.
Therefore we believe that any slight performance gain that the
copy-on-write solution might offer is not worth it.


Faster C API
------------

Packages like numpy and standard library modules like decimal need to
frequently query the global state for some local context
configuration.  It is important that the APIs they use are as fast as
possible.

The proposed ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` functions need to get the current
thread state with ``PyThreadState_GET()`` (fast) and then perform a
hash lookup (relatively slow).  We can eliminate the hash lookup by
adding three additional C API functions:

* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``: a
  function similar to the existing ``_PyEval_RequestCodeExtraIndex``
  introduced by :pep:`523`.  The idea is to request a unique index
  that can later be used to look up context items.

  The ``key_name`` can later be used by ``sys.ExecutionContext`` to
  introspect items added with this API.

* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index,
  PyObject *val)`` and
  ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
  to request an item by its index, avoiding the cost of a hash lookup.


Why does setting a key to None remove the item?
-----------------------------------------------

Consider a context manager::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

With the ``set_execution_context_item(key, None)`` call removing the
``key``, the user doesn't need to write additional code to remove the
``key`` if it wasn't in the execution context already.

An alternative design with a ``del_execution_context_item()`` function
would look like the following::

    @contextmanager
    def context(x):
        not_there = object()
        old_x = get_execution_context_item('x', not_there)
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            if old_x is not_there:
                del_execution_context_item('x')
            else:
                set_execution_context_item('x', old_x)


Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS.  Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem: enhance
the Context Manager Protocol with two new methods, ``__suspend__`` and
``__resume__``.  To make it compatible with async/await, the
Asynchronous Context Manager Protocol will also need to be extended
with ``__asuspend__`` and ``__aresume__``.

This would allow implementing context managers like the decimal
context and ``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __suspend__(self):
            set_execution_context_item('x', self.old_x)

        def __resume__(self):
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

Besides complicating the protocol, the implementation will likely
negatively impact the performance of coroutines, generators, and any
code that uses context managers, and will notably complicate the
interpreter implementation.
It also does not solve the leaking state problem for greenlet/gevent.

:pep:`521` also does not provide any mechanism to propagate state in a
local context, such as storing a request object in an HTTP request
handler to have better logging.


Can the Execution Context be implemented outside of CPython?
------------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.
Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.


Reference Implementation
========================

The reference implementation can be found here: [11]_.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c


Copyright
=========

This document has been placed in the public domain.

Finally got an almost decent internet connection.

Looking at the changes related to this PEP, I can confirm that the
context will be saved twice on any "task switch" in an asyncio
environment: once by the run-in-context call made by the Handle [1],
and immediately after by the send [2] method of the coroutine that
belongs to that task.

From my understanding, the asyncio layer itself makes no use of the
context, at least nowadays. Saving the context at the moment a Task is
scheduled is, at first sight, useless and might have a performance
impact. Don't you think that this edge case, which happens a lot,
could somehow be optimized? Am I missing something?

[1] https://github.com/1st1/cpython/blob/pep550/Lib/asyncio/events.py#L124
[2] https://github.com/1st1/cpython/blob/pep550/Lib/asyncio/tasks.py#L176

On Sat, Aug 12, 2017 at 11:03 PM, Pau Freixes <pfreixes@gmail.com> wrote:
-- --pau

Hi Pau,

Re string key collisions -- I decided to update the PEP to follow
Nathaniel's suggestion to use a get_context_key API, which will
eliminate this problem entirely.

Re call_soon in asyncio.Task -- yes, it does use ec.run() to invoke
coroutine.send(). However, this has almost no visible effect, as
ExecutionContext.run() is a very cheap operation (think 1-2 function
calls). It's possible to add a new keyword argument to call_soon, like
"ignore_execution_context", to eliminate even this small overhead, but
that is something we can easily do later.

Yury
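
For reference, the Handle change looks roughly like this (a simplified
sketch, not the exact code from the pep550 branch):

    import sys

    class Handle:
        def __init__(self, callback, args, loop):
            self._callback = callback
            self._args = args
            self._loop = loop
            # Capture the EC once, at schedule time (in call_soon()).
            self._ec = sys.get_execution_context()

        def _run(self):
            # Run the callback in the EC captured above; ec.run() is
            # itself just a couple of function calls.
            self._ec.run(self._callback, *self._args)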

I had an idea for an alternative API that exposes the same
functionality/semantics as the current draft, but that might have some
advantages. It would look like:

    # a "context item" is an object that holds a context-sensitive value
    # each call to create_context_item creates a new one
    ci = sys.create_context_item()

    # Set the value of this item in the current context
    ci.set(value)

    # Get the value of this item in the current context
    value = ci.get()
    value = ci.get(default)

    # To support async libraries, we need some way to capture the whole
    # context. But an opaque token representing "all context item
    # values" is enough:
    state_token = sys.current_context_state_token()
    sys.set_context_state_token(state_token)
    coro.cr_state_token = state_token
    # etc.

The advantages are:

- Eliminates the current PEP's issues with namespace collision; every
  context item is automatically distinct from all others.

- Eliminates the need for the None-means-del hack.

- Lets the interpreter hide the details of garbage collecting context
  values.

- Allows for more implementation flexibility. This could be
  implemented directly on top of Yury's current prototype. But it
  could also, for example, be implemented by storing the context
  values in a flat array, where each context item is assigned an index
  when it's allocated. In the current draft this is suggested as a
  possible extension for particularly performance-sensitive users, but
  this way we'd have the option of making everything fast without
  changing or extending the API.

As precedent, this is basically the API that low-level thread-local
storage implementations use; see e.g. pthread_key_create,
pthread_getspecific, pthread_setspecific. (And
allocate-an-index-in-a-table is the implementation that fast
thread-local storage implementations use too.)

-n

On Fri, Aug 11, 2017 at 3:37 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
-- Nathaniel J. Smith -- https://vorpus.org
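
To illustrate the "implemented directly on top of Yury's current
prototype" remark above, a context item could be little more than a
unique key into the draft EC API (hypothetical glue code, part of
neither proposal):

    import sys

    class ContextItem:
        # The item object itself serves as the EC key; since it is
        # hashed by identity, collisions are impossible by construction.
        def set(self, value):
            sys.set_execution_context_item(self, value)

        def get(self, default=None):
            return sys.get_execution_context_item(self, default)

    def create_context_item():
        return ContextItem()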

Yes, I considered this idea myself, but ultimately rejected it
because:

1. The current solution makes it easy to introspect things. Get the
   current EC and print it out. Although the context item idea could
   be extended to `sys.create_context_item('description')` to allow
   that.

2. What if we want to pickle the EC? If all items in it are
   pickleable, it's possible to dump the EC, send it over the network,
   and re-use it in some other process. It's not something I want to
   consider in the PEP right now, but it's something that the current
   design theoretically allows. AFAIU, a
   `ci = sys.create_context_item()` context item wouldn't be possible
   to pickle/unpickle correctly, no?

Some more comments:

On Sat, Aug 12, 2017 at 7:35 PM, Nathaniel Smith <njs@pobox.com> wrote:
[..]
TBH I think that the collision issue is slightly exaggerated.
- Eliminates the need for the None-means-del hack.
I consider Execution Context to be an API, not a collection. It's an
important distinction; if you view it that way, deletion on None
doesn't look that esoteric.
- Lets the interpreter hide the details of garbage collecting context values.
I'm not sure I understand how the current PEP design is bad from the GC standpoint. Or how this proposal can be different, FWIW.
You still want to have this optimization only for *some* keys. So I think a separate API is still needed. Yury

On Sat, Aug 12, 2017 at 6:27 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
My first draft actually had the description argument :-). But then I deleted it on the grounds that there's also no way to introspect a list of all threading.local objects, and no-one seems to be bothered by that, so why should we bother here. Obviously it'd be trivial to add though, yeah; I don't really care either way.
That's true. In this API, supporting pickling would require some kind
of opt-in on the part of EC users. But... pickling would actually need
to be opt-in anyway. Remember, the set of all EC items is a piece of
global shared state; we expect new entries to appear when random 3rd
party libraries are imported. So we have no idea what is in there or
what it's being used for. Blindly pickling the whole context will lead
to bugs (when code unexpectedly ends up with context that wasn't
designed to go across processes) and crashes (there's no guarantee
that all the objects are even pickleable).

If we do decide we want to support this in the future then we could
add a generic opt-in mechanism something like:

    MY_CI = sys.create_context_item(__name__, "MY_CI", pickleable=True)

But I'm not sure that it even makes sense to have a global flag
enabling pickle. Probably it's better to have separate flags to opt-in
to different libraries that might want to pickle in different
situations for different reasons: pickleable-by-dask,
pickleable-by-curio.run_in_process, ... And that's doable without any
special interpreter support. E.g. you could have
curio.Local(pickle=True) coordinate with curio.run_in_process.
Deletion on None is still a special case that API users need to remember, and it's a small footgun that you can't just take an arbitrary Python object and round-trip it through the context. Obviously these are both APIs and they can do anything that makes sense, but all else being equal I prefer APIs that have fewer special cases :-).
When the ContextItem object becomes unreachable and is collected, the
interpreter knows that all of the values associated with it in
different contexts are also unreachable and can be collected. I
mentioned this in my email yesterday -- look at the hoops
threading.local jumps through to avoid breaking garbage collection.

This is closely related to the previous point, actually -- AFAICT the
only reason why it *really* matters that None deletes the item is that
you need to be able to delete to free the item from the dictionary,
which only matters if you want to dynamically allocate keys and then
throw them away again. In the ContextItem approach, there's no need to
manually delete the entry; you can just drop your reference to the
ContextItem and let the garbage collector take care of it.
Wait, why is it a requirement that some keys be slow? That seems like
a weird requirement :-).

-n

--
Nathaniel J. Smith -- https://vorpus.org

As far as providing a thread-local-like surrogate for coroutine-based
systems in Python goes, we had to solve this for Twisted with
https://bitbucket.org/hipchat/txlocal. Because of the way the Twisted
thread pooling works, we also had to make a context system that was
both coroutine and thread safe at the same time.

We have a similar setup for asyncio but it seems we haven't open
sourced it. I'll ask around for it if this group feels that an asyncio
example would be beneficial. We implemented both of these in plain-old
Python so they should be compatible beyond CPython.

It's been over a year since I was directly involved with either of
these projects, but added memory and CPU consumption were stats we
watched closely, and we found a negligible increase in both as we
rolled out async context.

On Sat, Aug 12, 2017 at 9:16 PM Nathaniel Smith <njs@pobox.com> wrote:

On 13 August 2017 at 12:15, Nathaniel Smith <njs@pobox.com> wrote:
In the TLS/TSS case, we have the design constraint of wanting to use the platform provided TLS/TSS implementation when available, and standard C APIs generally aren't designed to support rich runtime introspection from regular C code - instead, they expect the debugger, compiler, and standard library to be co-developed such that the debugger knows how to figure out where the latter two have put things at runtime.
Obviously it'd be trivial to add though, yeah; I don't really care either way.
As noted in my other email, I like the idea of making the
context-dependent state introspection API clearly distinct from the
core context-dependent state management API.

That way the API implementation can focus on using the most efficient
data structures for the purpose, rather than being limited to the most
efficient data structures that can readily export a Python-style
mapping interface. The latter can then be provided purely for
introspection purposes.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Aug 12, 2017 at 9:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Excellent point.
Also an excellent point :-). -n -- Nathaniel J. Smith -- https://vorpus.org

On 13 August 2017 at 11:27, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
I think the TLS/TSS precedent means we should seriously consider the
ContextItem + ContextStateToken approach for the core low-level API.

We also have a long history of pain and quirks arising from the
locals() builtin being defined as returning a mapping even though
function locals are managed as a linear array, so if we can avoid that
for the execution context, it will likely be beneficial for both end
users (due to less quirky runtime behaviour, especially across
implementations) and language implementation developers (due to a
reduced need to make something behave like an ordinary mapping when it
really isn't).

If we decide we want a separate context introspection API (akin to
inspect.getcoroutinelocals() and inspect.getgeneratorlocals()), then
an otherwise opaque ContextStateToken would be sufficient to enable
that. Even if we don't need it for any other reason, having such an
API available would be desirable for the regression test suite.

For example, if context items are hashable, we could have the
following arrangement:

    # Create new context items
    sys.create_context_item(name)

    # Opaque token for the current execution context
    sys.get_context_token()

    # Switch the current execution context to the given one
    sys.set_context(context_token)

    # Snapshot mapping context items to their values in given context
    sys.get_context_items(context_token)

As Nathaniel suggested, getting/setting/deleting individual items in
the current context would be implemented as methods on the ContextItem
objects, allowing the return value of "get_context_items" to be a
plain dictionary, rather than a special type that directly supported
updates to the underlying context.
As Nathaniel notes, cooperative partial pickling will be possible
regardless of how the low-level API works, and starting with a simpler
low-level API still doesn't rule out adding features like this at a
later date.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
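
To make the sketch above a bit more concrete, here is one hedged way
an event loop could use the proposed token API to run a callback in a
captured context (every name here comes from the sketch and is
hypothetical):

    def capture(callback):
        token = sys.get_context_token()       # snapshot current context
        def run():
            saved = sys.get_context_token()
            sys.set_context(token)            # switch to captured context
            try:
                callback()
            finally:
                sys.set_context(saved)        # restore previous context
        return run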

On Sat, Aug 12, 2017 at 10:56 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: [..]
The current PEP 550 design returns a "snapshot" of the current EC with
sys.get_execution_context(). I.e. if you do

    ec = sys.get_execution_context()
    ec['a'] = 'b'

    # sys.get_execution_context_item('a') will return None

you did get a snapshot and you modified it -- but your modifications
are not visible anywhere. You can run a function in that modified EC
with `ec.run(function)` and that function will see that new 'a' key,
but that's it. There are no "magical" updates to the underlying
context.

Yury

For what it's worth, as part of prompt_toolkit 2.0, I implemented
something very similar to Nathaniel's idea some time ago. It works
pretty well, but I don't have a strong opinion against an alternative
implementation (a rough reconstruction of the scheme is sketched after
this message):

- The active context is stored as a monotonically increasing integer.
- For each local, the actual values are stored in a dictionary that
  maps the context ID to the value. (Could cause a GC issue - I'm not
  sure.)
- Every time an executor is started, I have to wrap the callable in a
  context manager that applies the current context to that thread.
- When a new 'Future' is created, I grab the context ID and apply it
  to the callbacks when the result is set.

https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad942...
https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad942...

FYI: In my case, I did not want to pass the currently active
"Application" object around all of the code. But when I started
supporting telnet, multiple applications could be alive at once, each
with a different I/O backend. Therefore the active application needed
to be stored in a kind of executing context.

When PEP 550 gets approved I'll probably make this compatible. It
should at least be possible to run prompt_toolkit on the asyncio event
loop.

Jonathan

2017-08-13 1:35 GMT+02:00 Nathaniel Smith <njs@pobox.com>:
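
A hedged reconstruction of the scheme described above (not the actual
prompt_toolkit code; see the linked files for that):

    import itertools

    _context_ids = itertools.count(1)
    _current_context = 0   # the active context: a monotonically increasing int

    class Local:
        def __init__(self):
            self._values = {}   # maps context ID -> value

        def set(self, value):
            self._values[_current_context] = value

        def get(self, default=None):
            return self._values.get(_current_context, default)

    def run_in_new_context(func, *args):
        # Activate a fresh context for the duration of the call.
        global _current_context
        saved = _current_context
        _current_context = next(_context_ids)
        try:
            return func(*args)
        finally:
            _current_context = saved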

Hi Jonathan,

Thanks for the feedback. I'll update the PEP to use Nathaniel's idea
of `sys.get_context_key`. It will be a pretty similar API to what you
currently have in prompt_toolkit.

Yury

Yury Selivanov wrote:
This is a new PEP to implement Execution Contexts in Python.
It dawns on me that I might be able to use ECs to do a better job of
implementing flufl.i18n's translation contexts. I think this is
another example of what the PEP's abstract describes as "Context
managers like decimal contexts, numpy.errstate, and
warnings.catch_warnings".

The _ object maintains a stack of the language codes being used, and
you can push a new code onto the stack (typically using `with` so they
get automatically popped when exiting). The use case for this is
translating, say, a notification to multiple recipients in the same
request, one who speaks French, one who speaks German, and another who
speaks English.

The problem is that _ is usually a global in a typical application, so
in an async environment, if one request is translating to 'fr',
another might be translating to 'de', or even a deferred context
(e.g. because you want to mark a string but not translate it until
some later use).

While I haven't used it in an async environment yet, the current
approach probably doesn't work very well, or at all. I'd probably
start by recommending a separate _ object in each thread, but that's
less convenient to use in practice. It seems like it would be better
to either attach an _ object to each EC, or to implement the stack of
codes in the EC and let the global _ access that stack.

It feels a lot like `let` in lisp, but without the implicit addition
of the contextual keys into the local namespace. E.g. in a PEP 550
world, you'd have to explicitly retrieve the key/values from the EC
rather than have them magically appear in the local namespace, the
former of course being the Pythonic way to do it.

Cheers,
-Barry
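
A hedged sketch of the second option mentioned above -- keeping the
stack of language codes in the EC itself -- using this PEP's proposed
API (hypothetical code, not flufl.i18n's actual internals):

    import sys
    from contextlib import contextmanager

    @contextmanager
    def using_language(code):
        stack = sys.get_execution_context_item('i18n_stack', ())
        # Tuples are immutable, so pushing builds a new stack object.
        sys.set_execution_context_item('i18n_stack', stack + (code,))
        try:
            yield
        finally:
            # Per the PEP, setting a key to None removes the item.
            sys.set_execution_context_item('i18n_stack', stack or None)

    def current_language(default='en'):
        stack = sys.get_execution_context_item('i18n_stack', ())
        return stack[-1] if stack else default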

Hi Barry,

Yes, i18n is another use case for execution contexts, and the EC
should be a perfect fit for it.

Yury
participants (11)

- Antoine Rozo
- Barry Warsaw
- Eric Snow
- Guido van Rossum
- Jelle Zijlstra
- Jonathan Slenders
- Kevin Conway
- Nathaniel Smith
- Nick Coghlan
- Pau Freixes
- Yury Selivanov