[Python-ideas] PEP 550 v2
njs at pobox.com
Wed Aug 16 03:18:23 EDT 2017
On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> Here's the PEP 550 version 2.
Some of the changes from v1 to v2 might be a bit confusing -- in
particular the thing where ExecutionContext is now a stack of
LocalContext objects instead of just being a mapping. So here's the
big picture as I understand it:
In discussions on the mailing list and off-line, we realized that the
main reason people use "thread locals" is to implement fake dynamic
scoping. Of course, generators/async/await mean that currently it's
impossible to *really* fake dynamic scoping in Python -- that's what
PEP 550 is trying to fix. So PEP 550 v1 essentially added "generator
locals" as a refinement of "thread locals". But... it turns out that
"generator locals" aren't enough to properly implement dynamic scoping
either! So the goal in PEP 550 v2 is to provide semantics strong
enough to *really* get this right.
I wrote up some notes on what I mean by dynamic scoping, and why
neither thread-locals nor generator-locals can fake it.
> Execution Context is a mechanism of storing and accessing data specific
> to a logical thread of execution. We consider OS threads,
> generators, and chains of coroutines (such as ``asyncio.Task``)
> to be variants of a logical thread.
> In this specification, we will use the following terminology:
> * **Local Context**, or LC, is a key/value mapping that stores the
> context of a logical thread.
If you're more familiar with dynamic scoping, then you can think of an
LC as a single dynamic scope...
> * **Execution Context**, or EC, is an OS-thread-specific dynamic
> stack of Local Contexts.
...and an EC as a stack of scopes. Looking up a ContextItem in an EC
proceeds by checking the first LC (innermost scope), then if it
doesn't find what it's looking for it checks the second LC (the
next-innermost scope), etc.
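In pure-Python terms the lookup is roughly the following (just a sketch
of the semantics; in the actual proposal this lives inside the
interpreter):

    def ec_get(execution_context, context_item):
        # execution_context is a stack of LocalContext mappings,
        # with the innermost scope last
        for local_context in reversed(execution_context):
            if context_item in local_context:
                return local_context[context_item]
        return None   # ContextItem.get() defaults to None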
> ``ContextItem`` objects have the following methods and attributes:
> * ``.description``: read-only description;
> * ``.set(o)`` method: set the value to ``o`` for the context item
> in the execution context.
> * ``.get()`` method: return the current EC value for the context item.
> Context items are initialized with ``None`` when created, so
> this method call never fails.
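For orientation, I read the above as supporting usage roughly like this
(sys.new_context_item() is the constructor from the draft, if I have the
name right):

    import sys

    timeout_ci = sys.new_context_item(description='mylib.timeout')
    assert timeout_ci.get() is None    # unset items read as None
    timeout_ci.set(5.0)                # stored in the topmost LC
    assert timeout_ci.get() == 5.0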
Two issues here, that both require some expansion of this API to
reveal a *bit* more information about the EC structure.
1) For trio's cancel scope use case I described in my last email, I
actually need some way to read out all the values on the LocalContext
stack. (It would also be helpful if there were some fast way to check
the depth of the ExecutionContext stack -- or at least tell whether
it's 1 deep or more-than-1 deep. I know that any cancel scopes that
are in the bottommost LC will always be attached to the given Task, so
I can set up the scope->task mapping once and re-use it indefinitely.
OTOH for scopes that are stored in higher LCs, I have to check at
every yield whether they're currently in effect. And I want to
minimize the per-yield workload as much as possible.)
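To make that concrete, the sort of per-yield check I'd want to be able
to write is something like this (everything here is a made-up strawman,
not anything in the draft):

    cancel_scopes = sys.new_context_item(description='trio.cancel_scopes')

    def check_scopes_at_yield(task):
        # hypothetical: sys.get_execution_context_depth() -> int
        if sys.get_execution_context_depth() == 1:
            # Only the task's own LC: its scopes were registered with the
            # task once, up front, so nothing extra to do per yield.
            return
        # hypothetical: ci.get_stack() -> the value from every LC that has
        # one, innermost first
        for scopes in cancel_scopes.get_stack():
            ...  # scopes living in higher LCs have to be re-checked here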
2) For classic decimal.localcontext context managers, the idea is
still that you save/restore the value, so that you can nest multiple
context managers without having to push/pop LCs all the time. But the
above API is not actually sufficient to implement a proper
save/restore, for a subtle reason: if you save the old value with
ci.get() and later restore it with ci.set(saved_value), then that
restoring set() just (potentially) moved the value from a lower LC up to
the top LC.
Here's an example of a case where this can produce user-visible effects:
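Something like this (a sketch, using the draft's sys.new_context_item()
constructor, with a bare generator standing in for
decimal.localcontext-style code):

    import sys

    prec = sys.new_context_item(description='precision')

    def naive_save_restore():
        # runs inside a generator, which has its own LC on top of the
        # caller's LC
        saved = prec.get()       # 10 -- found in the caller's LC
        prec.set(20)             # goes into the generator's LC
        yield
        prec.set(saved)          # "restore": writes 10 into the
                                 # *generator's* LC, shadowing the caller
        yield prec.get()

    prec.set(10)                 # caller's LC
    g = naive_save_restore()
    next(g)
    prec.set(30)                 # caller updates the value in its own LC
    print(next(g))               # prints 10, not 30: the generator's LC
                                 # still shadows the caller's value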
There are probably a bunch of options for fixing this. But basically
we need some API that makes it possible to temporarily set a value in
the top LC, and then restore that value to what it was before (either
the previous value, or 'unset' to unshadow a value in a lower LC). One
simple option would be to make the idiom be something like:
state = ci.get_local_state()
where 'state' is something like a tuple (ci in EC[-1],
EC[-1].get(ci)). A downside with this is that it's a bit error-prone
(very easy for an unwary user to accidentally use get/set instead of
get_local_state/set_local_state). But I'm sure we can come up with
something; a rough version of the idiom as a context manager is below.
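Something along these lines, with get_local_state()/set_local_state()
as the hypothetical additions:

    from contextlib import contextmanager

    @contextmanager
    def temporarily_set(ci, value):
        state = ci.get_local_state()   # e.g. (ci in EC[-1], EC[-1].get(ci))
        ci.set(value)
        try:
            yield
        finally:
            ci.set_local_state(state)  # restores the old value, or
                                       # un-shadows if ci wasn't set in
                                       # the top LC before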
> Manual Context Management
> Execution Context is generally managed by the Python interpreter,
> but sometimes it is desirable for the user to take the control
> over it. A few examples when this is needed:
> * running a computation in ``concurrent.futures.ThreadPoolExecutor``
> with the current EC;
> * reimplementing generators with iterators (more on that later);
> * managing contexts in asynchronous frameworks (implement proper
> EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)
> For these purposes we add a set of new APIs (they will be used in
> later sections of this specification):
> * ``sys.new_local_context()``: create an empty ``LocalContext``
> * ``sys.new_execution_context()``: create an empty
> ``ExecutionContext`` object.
> * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque
> to Python code, and there are no APIs to modify them.
> * ``sys.get_execution_context()`` function. The function returns a
> copy of the current EC: an ``ExecutionContext`` instance.
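So e.g. the ThreadPoolExecutor case from the earlier list would
presumably end up looking something like this (a sketch, assuming a
runner along the lines of sys.run_with_execution_context(ec, func,
*args), which I think the draft also adds):

    import sys
    from concurrent.futures import ThreadPoolExecutor

    def submit_with_current_ec(executor, fn, *args):
        ec = sys.get_execution_context()   # snapshot the caller's EC
        # the worker thread then runs fn inside the captured EC
        return executor.submit(sys.run_with_execution_context, ec, fn, *args)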
If there are enough of these functions then it might make sense to
stick them in their own module instead of adding more stuff to sys. I
guess worrying about that can wait until the API details are more firm.
> * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
> that ``coro`` was created with, the interpreter will set
> ``coro.cr_local_context`` to ``None``.
I like all the ideas in this section, but this specific point feels a
bit weird. Coroutine objects need a second hidden field somewhere to
keep track of whether the LocalContext they end up with is the same one they
were created with?
If I set cr_local_context to something else, and then set it back to
the original value, does that trigger the magic await behavior or not?
What if I take the initial LocalContext off of one coroutine and
attach it to another, does that trigger the magic await behavior?
Maybe it would make more sense to have two sentinel values:
UNINITIALIZED and INHERIT?
> To enable correct Execution Context propagation into Tasks, the
> asynchronous framework needs to assist the interpreter:
> * When ``create_task`` is called, it should capture the current
> execution context with ``sys.get_execution_context()`` and save it
> on the Task object.
I wonder if it would be useful to have an option to squash this
execution context down into a single LocalContext, since we know we'll
be using it for a while and once we've copied an ExecutionContext it
becomes impossible to tell the difference between one that has lots of
internal LocalContexts and one that doesn't. This could also be handy
for trio/curio's semantics where they initialize a new task's context
to be a shallow copy of the parent task: you could do
new_task_coro.cr_local_context = get_current_context().squash()
and then skip having to wrap every send() call in a run_in_context.
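For contrast, without squash() the asyncio-style version of the quoted
bullet looks roughly like this (a simplified sketch; again assuming a
runner like sys.run_with_execution_context()):

    import sys

    class Task:
        def __init__(self, coro, *, loop):
            self._coro = coro
            # capture the caller's EC at create_task time, per the PEP
            self._exec_context = sys.get_execution_context()
            loop.call_soon(self._step)

        def _step(self, value=None):
            # every resumption of the coroutine runs inside the captured EC
            sys.run_with_execution_context(self._exec_context,
                                           self._coro.send, value)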
> Generators in Python, while similar to Coroutines, are used in a
> fundamentally different way. They are producers of data, and
> they use ``yield`` expression to suspend/resume their execution.
> A crucial difference between ``await coro`` and ``yield value`` is
> that the former expression guarantees that the ``coro`` will be
> executed fully, while the latter is producing ``value`` and
> suspending the generator until it gets iterated again.
> Generators, similarly to coroutines, have a ``gi_local_context``
> attribute, which is set to an empty Local Context when created.
> Contrary to coroutines though, ``yield from o`` expression in
> generators (that are not generator-based coroutines) is semantically
> equivalent to ``for v in o: yield v``, therefore the interpreter does
> not attempt to control their ``gi_local_context``.
Hmm. I assume you're simplifying for expository purposes, but 'yield
from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says:
"Motivation: [...] a piece of code containing a yield cannot be
factored out and put into a separate function in the same way as other
code. [...] If yielding of values is the only concern, this can be
performed without much difficulty using a loop such as 'for v in g:
yield v'. However, if the subgenerator is to interact properly with
the caller in the case of calls to send(), throw() and close(), things
become considerably more difficult. As will be seen later, the
necessary code is very complicated, and it is tricky to handle all the
corner cases correctly."
So it seems to me that the whole idea of 'yield from' is that it's
supposed to handle all the tricky bits needed to guarantee that if you
take some code out of a generator and refactor it into a subgenerator,
then everything works the same as before. This suggests that 'yield
from' should do the same magic as 'await', where by default the
subgenerator shares the same LocalContext as the parent generator.
(And as a bonus it makes things simpler if 'yield from' and 'await'
work the same.)
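Here's the refactoring argument in code form (ci is a hypothetical
ContextItem; the comments spell out the two possible semantics):

    def original():
        ci.set('configured by the generator')
        yield
        ...

    # PEP 380's promise is that this refactoring doesn't change behavior:
    def refactored():
        yield from helper()
        ...

    def helper():
        ci.set('configured by the generator')
        yield

    # If 'yield from' gave helper() its own LocalContext, the set() would
    # be trapped in helper's LC and the rest of refactored() wouldn't see
    # it, so the two versions would diverge.  If 'yield from' shares the
    # parent's LC (like 'await' does for coroutines), they stay equivalent.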
> Asynchronous Generators
> Asynchronous Generators (AG) interact with the Execution Context
> similarly to regular generators.
> They have an ``ag_local_context`` attribute, which, similarly to
> regular generators, can be set to ``None`` to make them use the outer
> Local Context. This is used by the new
> ``contextlib.asynccontextmanager`` decorator.
> The EC support of ``await`` expression is implemented using the same
> approach as in coroutines, see the `Coroutine Object Modifications`_ section.
You showed how to make an iterator that acts like a generator. Is it
also possible to make an async iterator that acts like an async
generator? It's not immediately obvious, because you need to make sure
that the local context gets restored each time you re-enter the
__anext__ generator. I think it's something like:
    class AsyncIteratorWrapper:
        def __init__(self):
            self._local_context = ...

        # Note: intentionally not async
        def __anext__(self):
            coro = self._real_anext()
            coro.cr_local_context = self._local_context
            return coro

        async def _real_anext(self):
            ...
Does that look right?
> ContextItem.get() Cache
> We can add three new fields to ``PyThreadState`` and
> ``PyInterpreterState`` structs:
> * ``uint64_t PyThreadState->unique_id``: a globally unique
> thread state identifier (we can add a counter to
> ``PyInterpreterState`` and increment it when a new thread state is created);
> * ``uint64_t PyInterpreterState->context_item_deallocs``: every time
> a ``ContextItem`` is GCed, all Execution Contexts in all threads
> will lose track of it. ``context_item_deallocs`` will simply
> count all ``ContextItem`` deallocations.
> * ``uint64_t PyThreadState->execution_context_ver``: every time
> a new item is set, or an existing item is updated, or the stack
> of execution contexts is changed in the thread, we increment this counter.
I think this can be refined further (and I don't understand
context_item_deallocs -- maybe it's a mistake?). AFAICT the things
that invalidate a ContextItem's cache are:
1) switching threadstates
2) popping a non-empty LocalContext off of, or pushing one onto, the
current thread's ExecutionContext stack
3) calling ContextItem.set() on *that* context item
So I'd suggest tracking the thread state id, a counter of how many
non-empty LocalContexts have been pushed/popped on this thread state,
and a *per ContextItem* counter of how many times set() has been called.
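In Python-ish pseudocode, the check I have in mind is something like the
following (all of the field names are made up, and the real thing would
of course be C fields on ContextItem and PyThreadState):

    def cached_get(ci, tstate):
        if (ci.cache_tstate_id == tstate.unique_id
                and ci.cache_stack_counter == tstate.local_context_stack_counter
                and ci.cache_set_counter == ci.set_counter):
            return ci.cached_value                  # nothing invalidated the cache
        value = lookup_in_execution_context(ci)     # slow path: walk the LC stack
        ci.cache_tstate_id = tstate.unique_id
        ci.cache_stack_counter = tstate.local_context_stack_counter
        ci.cache_set_counter = ci.set_counter
        ci.cached_value = value
        return value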
> Backwards Compatibility
> This proposal preserves 100% backwards compatibility.
While this is mostly true in the strict sense, in practice this PEP is
useless if existing thread-local users like decimal and numpy can't
migrate to it without breaking backcompat. So maybe this section
should discuss that?
(For example, one constraint on the design is that we can't provide
only a pure push/pop API, even though that's what would be most
convenient for context managers like decimal.localcontext or
numpy.errstate, because we also need to provide some backcompat story
for legacy functions like decimal.setcontext and numpy.seterr.)
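To spell out the kind of shim I mean, here's a sketch of what decimal
might end up with (everything here is hypothetical, just to show the
shape of the problem):

    import sys
    from contextlib import contextmanager
    from decimal import Context

    _decimal_ci = sys.new_context_item(description='decimal.context')

    def setcontext(ctx):
        # legacy imperative API: not push/pop shaped, but has to keep working
        _decimal_ci.set(ctx)

    def getcontext():
        ctx = _decimal_ci.get()
        if ctx is None:
            ctx = Context()              # lazily install a default context
            _decimal_ci.set(ctx)
        return ctx

    @contextmanager
    def localcontext(ctx=None):
        # the context-manager API *is* save/restore shaped, but it has to
        # compose with the imperative functions above
        saved = getcontext()
        setcontext(ctx if ctx is not None else saved.copy())
        try:
            yield getcontext()
        finally:
            setcontext(saved)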
Nathaniel J. Smith -- https://vorpus.org