[Python-ideas] PEP draft: context variables

Koos Zevenhoven k7hoven at gmail.com
Tue Oct 10 08:34:09 EDT 2017


On Tue, Oct 10, 2017 at 4:22 AM, Yury Selivanov <yselivanov.ml at gmail.com>
wrote:

> On Mon, Oct 9, 2017 at 8:37 PM, Koos Zevenhoven <k7hoven at gmail.com> wrote:
> > You can cause unbounded growth in PEP 550 too. All you have to do is nest
> an
> > unbounded number of generators.
>
> You can only nest up to 'sys.getrecursionlimit()' number of generators.
>
> With PEP 555 you can do:
>
>   while True:
>     context_var.assign(42).__enter__()
>
>
Well, in PEP 550, you can explicitly stack an unbounded number of
LogicalContexts in a while True loop. Or you can run out of memory using
plain lists even faster:

l = [42]

while True:
    l *= 2 # ensure exponential blow-up

I don't see why your example with context_var.assign(42).__enter__() would
be any more likely to happen in practice.

Sure, we could limit the number of allowed nested contexts in PEP 555. I
don't really care. Just don't enter an unbounded number of context managers
without exiting them.

Really, it was my mistake to ever make you think that
context_var.assign(42).__enter__() can be compared to .set(42) in PEP 550.
I'll say it once more: PEP 555 context arguments have no equivalent of the
PEP 550 .set(..).
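
For concreteness, here is a minimal sketch of what I mean, using the draft
PEP 555 API (Var, .assign() and .value as currently proposed; the exact
names may still change). An assignment is visible only inside the with
block, so there is no way to leave one dangling the way an unscoped
.set(42) can:

import contextvars  # the PEP 555 draft module, as proposed

cvar = contextvars.Var(default="bar")

def f():
    with cvar.assign(42):
        # the assignment is visible here and in calls made from here
        assert cvar.value == 42
    # and it is automatically gone when the block exits
    assert cvar.value == "bar"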


> > In PEP 555, nesting generators doesn't do
> > anything really, unless you actually assign to context arguments in the
> > generators. Only those who use it will pay.
>
> Same for 550.  If a generator doesn't set context variables, its LC
> will be an empty mapping (or NULL if you want to micro-optimize
> things).  Nodes for the chain will come from a freelist. The effective
> overhead for generators is a couple operations on pointers, and thus
> visible only in microbenchmarks.
>

Sure, you can implement push and pop and maintain a freelist by just doing
operations on pointers. But that would be a handful of operations. Maybe
you'd even manage to avoid INCREFs and DECREFs by not exposing things as
Python objects.

But I guarantee you, PEP 555 is simpler in this regard. In (pseudo?) C, the
per-generator and per-send overhead would come from something like:

/* On generator creation */

PyObject *stack = PyThreadState_Get()->carg_stack;
Py_INCREF(stack);
self->carg_stack = stack;

----------

/* On each next / send */

PyObject **stack_ptr = &PyThreadState_Get()->carg_stack;
if (*stack_ptr == self->carg_stack) {
    /* no assignments made => do nothing */
}
else {
    /* the generator has made assignments: reinstate them here */
}

/* ... then after the next yield */

if (*stack_ptr == self->carg_stack) {
    /* once more, do nothing */
}
else {
    /* restore the caller's stack, so the generator's
       assignments don't leak through the yield */
}



And there will of course be a Py_DECREF after the generator has finished or
when it is deallocated.

If the generators *do* use context argument assignments, then the real work
happens in the else clauses of the if statements above (or equivalently,
with the tests written using != instead of ==).


> But seriously, you will always end up in a weird situation if you call an
> > unbounded number of contextmanager.__enter__() methods without calling
> > __exit__(). Nothing new about that. But entering a handful of assignment
> > contexts and leaving them open until a script ends is not the end of the
> > world. I don't think anyone should do that though.
> >
> >
> >>
> >> You'll say that it's not how the API is supposed to be used,
> >> and we say that we want to convert things like decimal and numpy to
> >> use the new mechanism.  That question was also hand-waved by you:
> >> numpy and decimal will have to come up with new/better APIs to use PEP
> >> 555.  Well, that's just not good enough.
> >
> >
> > What part of my explanation of this are you unhappy with? For instance,
> the
> > 12th (I think) email in this thread, which is my response to Nathaniel.
> > Could you reply to that and tell us your concern?
>
> I'm sorry, I'm not going to find some 12th email in some thread.  I
> stated in this thread the following: not being able to use PEP 555 to
> fix *existing* decimal & numpy APIs is not good enough.  And decimal &
> numpy is only one example; there's tons of code out there that can
> benefit from having its APIs fixed to support async code in Python
> 3.7.
>
>
Well, anyone interested can read that 12th email in this thread. In short,
my recommendation for libraries would be as follows:

* If the library does not provide a context manager yet, it should add
one, using PEP 555. That will then work nicely in coroutines and generators.

* If the library does have a context manager, implement it using PEP 555.
Or to be safe, add a new API function, so behavior in existing async code
won't change.

* If the library needs to support some kind of set_state(..) operation,
implement it by getting the state using a PEP 555 context argument and
mutating its contents.

* Fall back to thread-local storage if no context argument is present or
if the Python version does not support context arguments. (A rough sketch
of the last two points follows below.)
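
Here is what the last two points might look like in a library (all names
here are hypothetical, and contextvars.Var refers to the draft PEP 555
API, not to anything that exists today):

import threading

try:
    import contextvars               # PEP 555 draft module
    _state_var = contextvars.Var(default=None)
except ImportError:
    _state_var = None                # context arguments not available

_tls = threading.local()             # thread-local fallback

class _State:
    # mutable per-context state, like decimal's precision
    def __init__(self):
        self.prec = 28

def _get_state():
    # prefer the PEP 555 context argument if one has been assigned ...
    if _state_var is not None and _state_var.value is not None:
        return _state_var.value
    # ... and fall back to thread-local storage otherwise
    if not hasattr(_tls, "state"):
        _tls.state = _State()
    return _tls.state

def set_precision(prec):
    # a set_state(..)-style operation: mutate the contents of the
    # state object instead of rebinding the context argument
    _get_state().prec = prec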


[...]


> >> > Some kind of
> >> > chained-lookup-like thing is inevitable if you want the state not to
> >> > leak
> >> > through yields out of the generator:
> >>
> >> No, it's not "inevitable".  In PEP 550 v1, generators captured the
> >> context when they are created and there was always only one level of
> >> context.  This means that:
> >>
> >> 1. Context changes in generators aren't visible to the outside world.
> >> 2. Changes to the context in the outside world are not visible to
> >> running generators.
> >
> >
> > Sure, if you make generators completely isolated from the outside world,
> > then you can avoid chaining-like things too. But that would just sweep it
> > under the carpet.
>
> What do you mean by "just sweep it under the carpet"?  Capturing the
> context at the moment of generator creation is a design choice with
> some consequences (that I illustrated in my previous email).  There
> are cons and pros of doing that.
>
>
"Capturing the context at generator creation" and "isolating generators
completely" are two different things.

I've described pros of the former. The latter has no pros that I'm aware
of, except if sweeping things under the carpet counts as one.

Yes, the latter works in some use cases, but in others it does not. For
instance, an async framework may want to make some information available
throughout an async task. If you isolate generators, then async
programmers will have to avoid generators, because they won't have access
to the information the framework is trying to provide. Also, if you
refactor your generator into subgenerators using `yield from`, the
subgenerators will not see the context set by the outer generator.
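
To illustrate that last point with a sketch (again using the draft PEP 555
API, with hypothetical names): with chained lookup, the subgenerator sees
the outer generator's assignment, whereas complete isolation would hide it:

import contextvars  # the PEP 555 draft module, as proposed

cvar = contextvars.Var(default="default")

def subgen():
    # with chained lookup this sees "outer value"; with completely
    # isolated generators it would see "default" instead
    yield cvar.value

def outer():
    with cvar.assign("outer value"):
        yield from subgen()

print(next(outer()))  # "outer value" under PEP 555 semantics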

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +