[Python-ideas] New PEP 550: Execution Context

Sun Aug 13 12:57:20 EDT 2017

[replying to the list]

On Sun, Aug 13, 2017 at 6:14 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 13 August 2017 at 16:01, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
>> On Sat, Aug 12, 2017 at 10:56 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> [..]
>>> As Nathaniel suggested, getting/setting/deleting individual items in
>>> the current context would be implemented as methods on the ContextItem
>>> objects, allowing the return value of "get_context_items" to be a
>>> plain dictionary, rather than a special type that directly supported
>>> updates to the underlying context.
>>
>> The current PEP 550 design returns a "snapshot" of the current EC with
>> sys.get_execution_context().
>>
>> I.e. if you do
>>
>> ec = sys.get_execution_context()
>> ec['a'] = 'b'
>>
>> # sys.get_execution_context_item('a') will return None
>>
>> You did get a snapshot and you modified it -- but your modifications
>> are not visible anywhere. You can run a function in that modified EC
>> with `ec.run(function)` and that function will see that new 'a' key,
>> but that's it. There's no "magical" updates to the underlying context.
>
> In that case, I think "get_execution_context()" is quite misleading as
> a name, and is going to be prone to exactly the confusion we currently
> have with the mapping returned by locals(), which is that regardless
> of whether writes to it affect the target namespace or not, it's going
> to be surprising in at least some situations.
>
> So despite being initially in favour of exposing a mapping-like API at
> the Python level, I'm now coming around to Armin Ronacher's point of
> view: the copy-on-write semantics for the active context are
> sufficiently different from any other mapping type in Python that we
> should just avoid the use of __setitem__ and __delitem__ as syntactic
> sugar entirely.

I agree. I'll be redesigning the PEP to use the following API (please
ignore the naming peculiarities, there are so many proposals at this
point that I'll just stick to something I have in my head):

1. sys.new_execution_context_key('description') -> sys.ContextItem (or
maybe we should just expose the sys.ContextItem type and let people
instantiate it?)

A key (or "token") to use with the execution context. Besides
eliminating the name collision issue, it'll also have slightly
better performance, because its __hash__ method will always return a
constant. (Strings cache their __hash__, but other types don't.)

2. ContextItem.has(), ContextItem.get(), ContextItem.set(),
ContextItem.delete() -- pretty self-explanatory.

3. sys.get_active_context() -> sys.ExecutionContext -- an immutable
object, has no methods to modify the context.

3a. sys.ExecutionContext.run(callable, *args) -- run a callable(*args)
in some execution context.

3b. sys.ExecutionContext.items() -- an iterator of ContextItem ->
value for introspection and debugging purposes.

4. No sys.set_execution_context() method.  At this point I'm not sure
it's a good idea to allow users to change the current execution
context to something else entirely.  For use cases like enabling
concurrent.futures to run your function within the current EC, you
just use the sys.get_active_context()/ExecutionContext.run
combination. If anything, we can add this function later.
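
A rough pure-Python sketch of the API in points 1-4, purely as a toy
model. The module-level get_active_context() stands in for
sys.get_active_context(), and the thread-local _state plus the
copy-the-dict-on-write strategy are assumptions made for illustration,
not how the interpreter would actually do it:

```python
import threading

# Toy model only: every name here is an illustrative stand-in,
# not a real sys API.
_state = threading.local()


class ExecutionContext:
    """Immutable snapshot: exposes no methods that modify it."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def items(self):
        # Iterator of (ContextItem, value) pairs, for introspection.
        return iter(self._data.items())

    def run(self, func, *args):
        # Make this context active for the duration of the call.
        old = get_active_context()
        _state.ec = self
        try:
            return func(*args)
        finally:
            _state.ec = old


def get_active_context():
    ec = getattr(_state, "ec", None)
    if ec is None:
        ec = _state.ec = ExecutionContext()
    return ec


class ContextItem:
    def __init__(self, description):
        self.description = description

    def has(self):
        return self in get_active_context()._data

    def get(self, default=None):
        return get_active_context()._data.get(self, default)

    def set(self, value):
        # Copy-on-write: build a new context, never mutate the old one.
        data = dict(get_active_context()._data)
        data[self] = value
        _state.ec = ExecutionContext(data)

    def delete(self):
        data = dict(get_active_context()._data)
        data.pop(self, None)
        _state.ec = ExecutionContext(data)


request_id = ContextItem("request id")
request_id.set("12345")
snapshot = get_active_context()      # captures request_id == "12345"
request_id.set("67890")              # does not touch the snapshot
seen_in_snapshot = snapshot.run(request_id.get)
```

In this model the snapshot keeps seeing "12345" while the active
context has moved on to "67890", which is the
get_active_context()/ExecutionContext.run combination from point 4.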

> Instead, we'd lay out the essential primitive operations that *only*
> the interpreter can provide and define procedural interfaces for
> those, and if anyone wanted to build a higher level object-oriented
> interface on top of those primitives, they'd be free to do so, with
> the procedural API acting as the abstraction layer that decouples "how
> interpreters actually implement it" (e.g. copy-on-write mappings) from
> "how libraries and frameworks model it for their own use" (e.g. rich
> application context objects). That way, each interpreter would also be
> free to define their *internal* object model in whichever way made the
> most sense for them, rather than enshrining a point-in-time snapshot of
> CPython's preferred implementation model as part of the language
> definition.

I agree. I like that this idea gives us more flexibility with the
exact implementation strategy.

[..]
> The essential capabilities for active context manipulation would then be:
>
> - get_active_context_token()
> - set_active_context(context_token)

As I mentioned above, at this point I'm not entirely sure that we even
need "set_active_context".  The only useful thing for it that I can
imagine is creating a decorator that isolates any changes of the
context, but the only use case I see for this is unit tests.

But even for unit tests, a better solution is to use a decorator that
detects keys that were added but not deleted during the test (leaks).
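
A minimal sketch of such a leak-detecting decorator. The _active dict
and active_items() are hypothetical stand-ins for whatever
introspection the final API offers (something like
ExecutionContext.items()), and the ContextLeak and check_context_leaks
names are made up for this example:

```python
import functools

# Hypothetical stand-in for the active execution context.
_active = {}


def active_items():
    """Return a copy of the items currently set in the active context."""
    return dict(_active)


class ContextLeak(AssertionError):
    """Raised when a test leaves new keys behind in the context."""


def check_context_leaks(test):
    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        before = set(active_items())
        result = test(*args, **kwargs)
        # Keys present now that were not present before the test ran.
        leaked = set(active_items()) - before
        if leaked:
            names = sorted(map(str, leaked))
            raise ContextLeak(f"context items leaked: {names}")
        return result
    return wrapper


@check_context_leaks
def clean_test():
    _active["user"] = "alice"
    del _active["user"]


@check_context_leaks
def leaky_test():
    _active["user"] = "alice"    # never deleted


clean_test()                     # passes silently
try:
    leaky_test()
except ContextLeak:
    leak_detected = True
else:
    leak_detected = False
```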

> - implicitly saving and reverting the active context around various operations

Usually we need to save/revert one particular context item, not the
whole context.
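
For a single item, save/revert could look roughly like the context
manager below. The ContextItem here is a toy that stores its value on
itself rather than in a real execution context, and item_set_to() is a
name invented for this sketch:

```python
import contextlib

_MISSING = object()


class ContextItem:
    """Toy item with the has/get/set/delete methods discussed above."""

    def __init__(self, description):
        self.description = description
        self._value = _MISSING

    def has(self):
        return self._value is not _MISSING

    def get(self, default=None):
        return default if self._value is _MISSING else self._value

    def set(self, value):
        self._value = value

    def delete(self):
        self._value = _MISSING


@contextlib.contextmanager
def item_set_to(item, value):
    """Temporarily set one item, then restore its previous state."""
    had_value = item.has()
    old = item.get()
    item.set(value)
    try:
        yield
    finally:
        if had_value:
            item.set(old)
        else:
            item.delete()    # it was unset before, so unset it again


precision = ContextItem("decimal precision")
precision.set(28)
with item_set_to(precision, 10):
    inside = precision.get()
after = precision.get()
```

Only the one item is touched; everything else in the context is left
alone.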

> - accessing the active context id for suspended coroutines and
> generators (so parent contexts can opt-in to seeing changes made in
> child contexts)

Yes, this might be useful, let's keep it.

>
> Running commands in a particular context *wouldn't* be a primitive
> operation given those building blocks, since you can implement that
> for yourself using the above primitives:
>
>     def run_in_context(target_context_token, func, *args, **kwds):
>         old_context_token = get_active_context_token()
>         set_active_context(target_context_token)
>         try:
>             return func(*args, **kwds)
>         finally:
>             set_active_context(old_context_token)

I'd still prefer to implement this as part of the spec.  There are
some tricks that I want to use to make ExecutionContext.run() much
faster than a pure Python version.  This is a highly performance
critical part of the PEP -- call_soon in asyncio is a VERY frequent
thing.

Besides, having ExecutionContext.run eliminates the need for
sys.set_active_context() -- again, we need to discuss this, but I see
less and less utility for it now.
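
To illustrate why run() sits on the hot path, here is a toy scheduler
that captures the active context in call_soon() and runs each callback
inside it. ToyLoop, run_in(), set_item() and the module-level _active
mapping are all assumptions for this sketch; it is not asyncio's code,
and a real ExecutionContext.run() would do this far more cheaply:

```python
import collections

# Stand-in for the active context; treated as immutable and replaced
# wholesale on every write (copy-on-write).
_active = {}


def get_active_context():
    return _active


def set_item(key, value):
    global _active
    _active = {**_active, key: value}


def run_in(context, func, *args):
    """Pure-Python stand-in for ExecutionContext.run()."""
    global _active
    old = _active
    _active = context
    try:
        return func(*args)
    finally:
        _active = old


class ToyLoop:
    def __init__(self):
        self._ready = collections.deque()

    def call_soon(self, callback, *args):
        # Capture the context that is active at scheduling time.
        self._ready.append((get_active_context(), callback, args))

    def run_pending(self):
        while self._ready:
            context, callback, args = self._ready.popleft()
            run_in(context, callback, *args)


loop = ToyLoop()
seen = []


def record():
    seen.append(get_active_context()["request_id"])


set_item("request_id", "A")
loop.call_soon(record)
set_item("request_id", "B")
loop.call_soon(record)
loop.run_pending()
```

Each callback sees the value that was current when it was scheduled,
so seen ends up as ["A", "B"]. Every callback goes through one
run_in() call, which is why the cost of that call matters.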

>
> The public manipulation API here would be deliberately based on opaque
> tokens to make it clear that creating and mutating execution contexts
> is entirely within the realm of the interpreter implementation, and
> user level code can only control *which* execution context is active
> in the current thread, not create arbitrary new execution contexts of
> its own (at least, not without writing a CPython-specific C
> extension).
>
> For manipulation of values within the active context, looking at other
> comparable APIs, I think the main prior art within the language would
> be:
>
> 1. threading.local(), which uses the descriptor protocol to handle
> arbitrary attributes
> 2. Cell variable references in function `__closure__` attributes,
> which also uses the descriptor protocol by way of the "cell_contents"
> attribute
>
> In 3.7, those two examples are being brought closer by way of
> `cell_contents` becoming a read/write attribute:
>
>     >>> def f(i):
>     ...     def g():
>     ...         nonlocal i
>     ...         return i
>     ...     return g
>     ...
>     >>> g = f(0)
>     >>> g()
>     0
>     >>> cell = g.__closure__[0]
>     >>> cell.cell_contents
>     0
>     >>> cell.cell_contents = 5
>     >>> g()
>     5
>     >>> del cell.cell_contents
>     >>> g()
>     Traceback (most recent call last):
>      ...
>     NameError: free variable 'i' referenced before assignment in enclosing scope
>     >>> cell.cell_contents = 0
>     >>> g()
>     0
>
> This is very similar to the way manipulation of entries within a
> thread local namespace works, but with each cell containing exactly
> one attribute.
>
> For context items, I agree with Nathaniel that the cell-style
> one-value-per-item approach is likely to be the way to go. To
> emphasise that changes to that attribute only affect the *active*
> context, I think "active_value" would be a good name:
>
>     >>> request_id = sys.create_context_item(
>     ...     "my_web_framework.request_id",
>     ...     "Request identifier for my_web_framework")
>     >>> request_id.active_value
>     Traceback (most recent call last):
>      ...
>     RuntimeError: Context item "my_web_framework.request_id" not set in context <context token>
>     >>> request_id.active_value = "12345"
>     >>> request_id.active_value
>     '12345'

I myself prefer a functional API to __getattr__. I don't like the
"del local.x" syntax. I don't think we are forced to follow the
threading.local() API here, are we?
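
The two spellings are not mutually exclusive, though. As a sketch,
anyone who wants the attribute style can layer it over the methods
with a property. The ContextItem below is a toy that stores its value
on itself, and AttributeStyleItem is a made-up name:

```python
_MISSING = object()


class ContextItem:
    """Toy item exposing only the functional API."""

    def __init__(self, name):
        self.name = name
        self._value = _MISSING   # real storage would be the active context

    def has(self):
        return self._value is not _MISSING

    def get(self, default=None):
        return default if self._value is _MISSING else self._value

    def set(self, value):
        self._value = value

    def delete(self):
        self._value = _MISSING


class AttributeStyleItem(ContextItem):
    # "active_value" simply forwards to get()/set()/delete().
    active_value = property(ContextItem.get, ContextItem.set,
                            ContextItem.delete)


item = AttributeStyleItem("my_web_framework.request_id")
item.active_value = "12345"      # same as item.set("12345")
via_method = item.get()
del item.active_value            # same as item.delete()
still_set = item.has()
```

Going the other way (methods on top of attributes) works just as well,
so the choice is about which spelling becomes the primitive.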

Yury

>
> Finally, given opaque context tokens, and context items that worked
> like closure cells (only accessing the active context rather than
> lexically scoped variables), the one introspection primitive the
> *interpreter* would need to provide is either:
>
> 1. Given a context token, return a mapping from context items to their
> defined values in the given context
> 2. A way to get a listing of the context items defined in the active context
>
> Since either of those can be defined in terms of the other, my own
> preference goes to the first one, since using it to implement the
> second alternative just requires a simple
> `sys.get_active_context_token()` call, while implementing the first
> one in terms of the second one requires a helper like
> `run_in_context()` above to manipulate the active context in the
> current thread.
>
> The first one also makes it fairly straightforward to *diff* a given
> context against the active one - get the mappings for both contexts,
> check which keys they have in common, compare the values for the
> common keys, and then report on
>
> - keys that appear in one context but not the other
> - values which differ between them for common keys
> - (optionally) values which are the same for common keys
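
Given the two mappings, the diff described above could be sketched as
follows. It works on plain dicts of item -> value; diff_contexts() and
the result keys are names chosen for this example:

```python
def diff_contexts(left, right):
    """Compare two mappings of context item -> value."""
    common = left.keys() & right.keys()
    return {
        # Keys that appear in one context but not the other.
        "only_left": {k: left[k] for k in left.keys() - right.keys()},
        "only_right": {k: right[k] for k in right.keys() - left.keys()},
        # Common keys whose values differ, as (left, right) pairs.
        "changed": {k: (left[k], right[k])
                    for k in common if left[k] != right[k]},
        # Common keys whose values are the same (the optional part).
        "same": {k: left[k] for k in common if left[k] == right[k]},
    }


saved = {"request_id": "12345", "user": "alice", "locale": "en"}
active = {"request_id": "67890", "user": "alice", "debug": True}
report = diff_contexts(saved, active)
```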
>
> Cheers,
> Nick.