[Python-Dev] Timeout for PEP 550 / Execution Context discussion

Guido van Rossum guido at python.org
Wed Oct 18 13:06:24 EDT 2017

On Tue, Oct 17, 2017 at 9:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 18 October 2017 at 05:55, Yury Selivanov <yselivanov.ml at gmail.com>
> wrote:
>> I actually like what you did in
>> https://github.com/gvanrossum/pep550/blob/master/simpler.py, it seems
>> reasonable.  The only thing that I'd change is to remove "set_ctx"
>> from the public API and add "Context.run(callable)".  This makes the
>> API more flexible to potential future changes and amendments.
> Yep, with that tweak, I like Guido's suggested API as well.

I've added the suggested Context.run() method.

> Attempting to explain why I think we want "Context.run(callable)" rather
> than "contextvars.set_ctx()" by drawing an analogy to thread local storage:
> 1. In C, the compiler & CPU work together to ensure you can't access
> another thread's thread locals.

But why is that so important? I wouldn't recommend doing it, but it might
be handy for a debugger to be able to inspect a thread's thread-locals. As
it is, it seems a debugger can only access thread-locals for the thread in
which the debugger itself runs. It has better access to the real locals on
the thread's stack of frames!

> 2. In Python's thread locals API, we do the same thing: you can only get
> access to the running thread's thread locals, not anyone else's

But there's no real benefit in this. In C, I could imagine a compiler
optimizing access to thread-locals, but in Python that's moot.

> At the Python API layer, we don't expose the ability to switch explicitly
> to another thread state while remaining within the current function.
> Instead, we only offer two options: starting a new thread, and waiting for
> a thread to finish execution. The lifecycle of the thread local storage is
> then intrinsically linked to the lifecycle of the thread it belongs to.

To me this feels more a side-effect of the implementation (perhaps
inherited from C's implementation) than an intentional design.

To be clear, I think it's totally fine for *clients* of the ContextVar API
-- e.g. numpy or decimal -- to assume that their context doesn't change
arbitrarily while they're happily executing in a single frame or calling
stuff they trust not to change the context. (IOW all changes to a
particular ContextVar would be through that ContextVar object, not through
behind-the-scenes manipulation of the thread's current context).

But for *frameworks* (e.g. asyncio or Twisted) I find it simpler to think
about the context in terms of `set_ctx` and `get_ctx`, and I worry that
*hiding* these might block off certain API design patterns that some
framework might want to use -- who knows, maybe Nathaniel (who is fond of
[...]) might come up with a context manager to run a block of code in a
different context (perhaps cloned from the current one).
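
A minimal sketch of what such a context manager might look like, assuming a
framework has `get_ctx` and `set_ctx` available. Everything here is a toy
stand-in: the context is a plain dict kept in a thread-local, and
`cloned_context` is a hypothetical name, not part of any proposed API.

```python
import contextlib
import threading

# Toy stand-ins for the proposed get_ctx()/set_ctx(); hypothetical, not a real API.
_state = threading.local()


def get_ctx():
    """Return the current thread's context (a plain dict in this toy)."""
    try:
        return _state.ctx
    except AttributeError:
        _state.ctx = {}
        return _state.ctx


def set_ctx(ctx):
    """Make ctx the current thread's context."""
    _state.ctx = ctx


@contextlib.contextmanager
def cloned_context():
    """Run the with-block in a copy of the current context, then switch back."""
    saved = get_ctx()
    clone = dict(saved)
    set_ctx(clone)
    try:
        yield clone
    finally:
        # Always restore, even if the block raises.
        set_ctx(saved)


get_ctx()["precision"] = 28
with cloned_context():
    get_ctx()["precision"] = 10  # change is confined to the clone
    inner = get_ctx()["precision"]
outer = get_ctx()["precision"]
```

The point of the sketch is only that a block-scoped switch is easy to build
once the two primitives are exposed; a callable-only API would force the
block to be wrapped in a function first.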

> That intrinsic link makes various aspects of thread local storage easier
> to reason about, since the active thread state can't change in the middle
> of a running function - even if the current thread gets suspended by the
> OS, resuming the function also implies resuming the original thread.

I don't feel reasoning would be much impaired. When reasoning about code we
make assumptions that are theoretically unsafe all the time (e.g. "nobody
will move the clock back").

> Including a "contextvars.set_ctx" API would be akin to making
> PyThreadState_Swap a public Python-level API, rather than only exposing
> _thread.start_new_thread the way we do now.

It's different for threads, because they are the bedrock of execution, and
nobody is interested in implementing their own threading framework that
doesn't build on this same bedrock.

> One reason we *don't* do that is because it would make thread locals much
> harder to reason about - every function call could have an implicit side
> effect of changing the active thread state, which would mean the thread
> locals at the start of the function could differ from those at the end of
> the function, even if the function itself didn't do anything to change them.

Hm. Threads are still hard to reason about, because for everything *but*
thread-locals there is always the possibility that it's being mutated by
another thread... So I don't think we should get our knickers twisted over
thread-local variables.

> Only offering Context.run(callable) provides a similar "the only changes
> to the execution context will be those this function, or a function it
> called, explicitly initiates" protection for context variables, and Guido's
> requested API simplifications make this aspect even easier to reason about:
> after any given function call, you can be certain of being back in the
> context you started in, because we wouldn't expose any Python level API
> that allowed an execution context switch to persist beyond the frame that
> initiated it.

And as long as you're not calling something that's a specific framework's
API for messing with the context, that's a fine assumption. I just don't
see the need to try to "enforce" this by hiding the underlying API.
(Especially since I presume that at the C API level it will still be
possible -- else how would Context.run() itself be implemented?)
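
To illustrate that last point, here is a rough sketch of `Context.run()`
written on top of `get_ctx`/`set_ctx`. It is a pure-Python toy under stated
assumptions (the context is a dict subclass stored in a thread-local, and
`show_user` is a made-up example function), not the implementation in
simpler.py or in any C API.

```python
import threading

# Toy per-thread storage for "the current context"; hypothetical.
_state = threading.local()


class Context(dict):
    """Toy context: a dict mapping variable names to values."""

    def run(self, fn, *args, **kwargs):
        """Call fn with self as the current context, then switch back."""
        saved = get_ctx()
        set_ctx(self)
        try:
            return fn(*args, **kwargs)
        finally:
            # The switch never outlives this frame.
            set_ctx(saved)


def get_ctx():
    """Return the current thread's context, creating an empty one if needed."""
    if not hasattr(_state, "ctx"):
        _state.ctx = Context()
    return _state.ctx


def set_ctx(ctx):
    """Make ctx the current thread's context."""
    _state.ctx = ctx


def show_user():
    return get_ctx().get("user")


before = get_ctx()
result = Context(user="guido").run(show_user)
after = get_ctx()
```

Here `run()` is just a save/switch/restore wrapper, so the "you are back in
the context you started in" guarantee comes from the wrapper, while the
primitives underneath remain available to whoever implements it.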

> ====
>
> The above is my main rationale for preferring contextvars.Context.run() to
> contextvars.set_ctx(), but it's not the only reason I prefer it.
>
> At a more abstract design philosophy level, I think the distinction
> between symmetric and asymmetric coroutines is relevant here [2]:
>
> * in symmetric coroutines, there's a single operation that says "switch to
> running this other coroutine"
> * in asymmetric coroutines, there are separate operations for starting or
> resuming a coroutine and for suspending the currently running one
>
> Python's native coroutines are asymmetric - we don't provide a "switch to
> this coroutine" primitive, we instead provide an API for starting or
> resuming a coroutine (via cr.__next__(), cr.send() & cr.throw()), and an
> API for suspending one (via await).
>
> The contextvars.set_ctx() API would be suitable for symmetric coroutines,
> as there's no implied notion of parent context/child context, just a notion
> of switching which context is active.
>
> The Context.run() API aligns better with asymmetric coroutines, as there's
> a clear distinction between the parent frame (the one initiating the
> context switch) and the child frame (the one running in the designated
> context).

Sure. But a *framework* might build something different.

> As a practical matter, Context.run also composes nicely (in combination
> with functools.partial) for use with any existing API based on submitting
> functions for delayed execution, or execution in another thread or process:
> - sched
> - concurrent.futures
> - arbitrary callback APIs
> - method based protocols (including iteration)
> By contrast, "contextvars.set_ctx" would need various wrappers to handle
> correctly reverting the context change, and would hence be prone to
> "changed the active context without changing it back" bugs (which can be
> especially fun when you're dealing with a shared pool of worker threads or
> processes).

So let's have both.
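
For the composition case Nick describes, a small sketch may help. It uses
the `contextvars` module as it later shipped in Python 3.7 (`ContextVar`,
`copy_context()`, `Context.run()`), which is an assumption relative to this
thread, where the API was still being designed. The variable `request_id`
and the function `current_request` are made up for illustration.

```python
import contextvars
import functools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable for the example.
request_id = contextvars.ContextVar("request_id", default="unset")


def current_request():
    """Report the request id visible in whatever context we run in."""
    return request_id.get()


request_id.set("req-42")

# Snapshot the submitting thread's context.
ctx = contextvars.copy_context()

with ThreadPoolExecutor(max_workers=1) as pool:
    # partial(ctx.run, fn) is an ordinary callable, so it fits any
    # "submit a function for later execution" API unchanged.
    job = functools.partial(ctx.run, current_request)
    wrapped = pool.submit(job).result()
```

Because `ctx.run` restores the worker's previous context when the call
returns, nothing needs to be undone by hand before the pool thread picks up
its next job.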

> Nick.
> [1] Technically C extensions can play games with this via
> PyThreadState_Swap, but I'm not going to worry about that here
> [2] https://stackoverflow.com/questions/41891989/what-is-the-difference-between-asymmetric-and-symmetric-coroutines
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

--Guido van Rossum (python.org/~guido)