[Python-ideas] A comprehension scope issue in PEP 572

Tim Peters tim.peters at gmail.com
Sat May 12 00:38:23 EDT 2018

>> ...
>> ":=" target names in a genexp/listcomp are treated exactly the same as
>> any other non-for-target name:  they resolve to the same scope as they
>> resolve to in the block that contains them.  The only twist is that if
>> such a name `x` isn't otherwise known in the block, then `x` is
>> established as being local to the block (which incidentally also
>> covers the case when the genexp/listcomp is at module level, where
>> "local to the block" and "global to the block" mean the same thing).
>> Class scope may be an exception (I cheerfully never learned anything
>> about how class scope works, because I don't write insane code ;-) ).

> That's all well and good, but it is *completely insufficient for the
> language specification*.

I haven't been trying to write reference docs here, but so far as
supplying a rigorous specification goes, I maintain the above gets
"pretty close".  It needs more words, and certainly isn't in the
_style_ of Python's current reference docs, but that's all repairable.
Don't dismiss it just because it's brief.  Comprehensions already
exist in the language, and so do nested scopes, so it's not necessary
for this PEP to repeat any of the stuff that goes into those.  Mostly
it needs to specify the scopes of assignment expression target names -
and the _intent_ here is really quite simple.

Here with more words, restricted to the case of assignment expressions
in comprehensions (the only case with any subtleties):

Consider a name `y` appearing in the top level of a comprehension as
an assignment expression target, where the comprehension is
immediately contained in scope C, and the names belonging to scopes
containing C have already been determined:

    ... (y := expression) ...

We can ignore that `y` also appears as a `for` target at the
comprehension's top level, because it was already decided that's a
compile-time error.

Consider what the scope of `y` would be if `(y := expression)` were
textually replaced by `(y)`.  Then what would the scope of `y` be?
The answer relies solely on what the docs _already_ specify.  There
are three possible answers:

1. The docs say `y` belongs to scope S (which may be C itself, or a
scope containing C).  Then y's scope in the original comprehension is S.

2. The docs say name `y` is unknown.  Then y's scope in the original
comprehension is C.

3. The docs are unclear about whether #1 or #2 applies.  Then the
language is _already_ ill-defined.

It doesn't matter to this whether the assignment expression is, or is
not, in the expression that defines the iterable for the outermost `for`.

What about that is hand-wavy?  Defining semantics clearly and
unambiguously doesn't require specifying a concrete implementation
(the latter is one possible way to achieve the goal - but _here_ it's
a convoluted PITA because Python has no way to explicitly declare
intended scopes).  Since all questions about scope are reduced by the
above to questions about Python's _current_ scope rules, it's as clear
and unambiguous as Python's current scope rules.

Now those may not be the _intended_ rules in all cases.  That deserves
deep scrutiny.  But claiming it's too vague to scrutinize doesn't fly
with me.  If there's a scope question you suspect can't be answered by
the above, or that the above gives an unintended answer to, by all
means bring that up!  If your question isn't about scope, then I'd
probably view it as being irrelevant to the current PEP (e.g., what
`locals()` returns depends on how the relevant code object attributes
are set, which are in turn determined by which scopes names belong to
relative to the code block's local scope, and it's certainly not
_this_ PEP's job to redefine what `locals()` does with that info).

Something to note:  for-target names appearing in the outermost `for`
_may_ have different scopes in different parts of the comprehension.

    y = 12
    [y for y in range(y)]

There the first two `y`'s have scope local to the comprehension, but
the last `y` is local to the containing block.  But an assignment
expression target name always has the same scope within a
comprehension.  In that specific sense, their scope rules are "more
elegant" than for-target names.  This isn't a new rule, but a logical
consequence of the scope-determining algorithm given above.  It's a
_conceptual_ consequence of that assignment expression targets are
"intended to act like" the bindings are performed _in_ scope C rather
than in the comprehension's scope.  And that's no conceptually weirder
than that it's _already_ the case that the expression defining the
iterable of the outermost `for` _is_ evaluated in scope C (which I'm
not a fan of, but which is rhetorically convenient to mention here ;-)

As I've said more than once already, I don't know whether this should
apply to comprehensions at class scope too - I've never used a
comprehension in class scope, and doubt I ever will.  Without use
cases I'm familiar with, I have no idea what might be most useful
there.  Best uninformed guess is that the above makes decent sense at
class scope too, especially given that I've picked up on that people
are already baffled by some comprehension behavior at class scope.  I
suspect that you already know, but find it rhetorically convenient to
pretend this is all so incredibly unclear you can't possibly guess ;-)

> For the language spec, we have to be able to tell implementation authors
> exactly how all of the "bizarre edge case"

Which are?

> that you're attempting to hand wave away

Not attempting to wave them away - don't know what you're referring to.
The proposed scope rules are defined entirely by straightforward
reference to existing scope rules - and stripped of all the excess
verbiage amount to no more than "same scope in the comprehension as in
the containing scope".

> should behave by updating
> https://docs.python.org/dev/reference/expressions.html#displays-for-lists-sets-and-dictionaries

Thanks for the link!  I hadn't seen that before.  If the PEP gets that
far, I'd think harder about how it really "ought to be" documented.  I
think, e.g., that scope issues should be more rigorously handled in
section 4.2 (which is about binding and name resolution).

> appropriately. It isn't 1995 any more - while CPython is still the reference
> implementation for Python, we're far from being the only implementation,
> which means we have to be a lot more disciplined about how much we leave up
> to the implementation to define.

What in the "more words" above was left to the implementation's
discretion?  I can already guess you don't _like_ the way it's worded,
but that's not what I'm asking about.

> The expected semantics for locals() are already sufficiently unclear that
> they're a source of software bugs (even in CPython) when attempting to run
> things under a debugger or line profiler (or anything else that sets a trace
> function). See https://www.python.org/dev/peps/pep-0558/ for details.

As above, what does that have to do with PEP 572?  The docs you
referenced as a model don't even mention `locals()` - but PEP 572

Well, fine:  from the explanation above, it's trivially deduced that
all names appearing as assignment expression targets in comprehensions
will appear as free variables in their code blocks, except for when
they resolve to the global scope.  In the former case, looks like
`locals()` will return them, despite that they're _not_ local to the
block.  But that's the same thing `locals()` does for free variables
created via any means whatsoever - it appears to add all the names in
code_object.co_freevars to the returned dict.
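
For what it's worth, the free-variable behavior is easy to observe in
CPython with an ordinary nested function, no comprehension required.
A small sketch (the names `outer` and `inner` are illustrative, and
this describes CPython's observed behavior rather than anything the
language reference promises):

```python
def outer():
    x = 1

    def inner():
        # Referencing `x` makes it a free variable of inner().
        return x, locals()

    return inner, inner()


inner_fn, (value, inner_locals) = outer()
freevars = inner_fn.__code__.co_freevars
```

`x` shows up in `inner_locals` even though it is not local to
`inner`, and it is listed in `freevars`.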

I have no idea why it acts that way, and wouldn't have done it that
way myself.  But if that's "a bug", it would be repaired for the PEP
572 cases at the same time and in the same way as for all other
freevars cases.

Again, the only thing at issue here is specifying intended scopes.
There's nothing inherently unique about that.

> "Comprehension scopes are already confusing, so it's OK to dial their
> weirdness all the way up to 11" is an *incredibly* strange argument to be
> attempting

That's an extreme characterization of what, in reality, is merely
specifying scopes.  That

    total = 0
    sums = [total := total + value for value in data]

blows up without the change is at least as confusing - and is more
confusing to me.
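
One way to see the blow-up is to hand-write a function-style
expansion under the assumption that the `:=` target is local to the
comprehension's own scope.  The helper name `_listcomp` and the sample
data are illustrative only:

```python
def _listcomp(_outermost_iter):
    # Stand-in for a comprehension whose := target is local to the
    # comprehension's own scope (an assumption made for illustration).
    result = []
    for value in _outermost_iter:
        total = total + value  # `total` is local here and not yet bound
        result.append(total)
    return result


total = 0
try:
    sums = _listcomp([1, 2, 3])
except UnboundLocalError as exc:
    error = exc  # the outer `total = 0` is never consulted
```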

> to make when the original better defined sublocal scoping
> proposal was knocked back as being overly confusing (even after it had been
> deliberately simplified by prohibiting nonlocal access to sublocals).

I'm done arguing about this part ;-)

> Right now, the learning process for picking up the details of comprehension
> scopes goes something like this:

Who needs to do this?  I'm not denying that many people do, but is
that a significant percentage of those who merely want to _use_
comprehensions?  We already did lots of heroic stuff apparently
attempting to cater to those who _don't_ want to learn about their
implementation, like evaluating the outer iterable "at once" outside
the comprehension scope, and - indeed - bothering to create a new
scope for them at all.  Look at the "total := total + value" example
again and really try to pretend you don't know anything about the
implementation.  "It works!" is a happy experience :-)

As for the rest of this message, it's an entertaining and educational
development.  I'm not clear on what it has to do with the PEP, though.

> * make the
> technically-incorrect-but-mostly-reliable-in-the-absence-of-name-shadowing
> assumption that "[x for x in data]" is semantically equivalent to a for loop
> (especially common for experienced Py2 devs where this really was the
> case!):
>     _result = []
>     for x in data:
>         _result.append(x)
> * discover that "[x for x in data]" is actually semantically equivalent to
> "list(x for x in data)" (albeit without the name lookup and optimised to
> avoid actually creating the generator-iterator)
> * make the still-technically-incorrect-but-even-more-reliable assumption
> that the generator expression "(x for x in data)" is equivalent to
>     def _genexp():
>         for x in data:
>             yield x
>     _result = _genexp()
> * *maybe* discover that even the above expansion isn't quite accurate, and
> that the underlying semantic equivalent is actually this (one way to
> discover this by accident is to have a name error in the outermost iterable
> expression):
>     def _genexp(_outermost_iter):
>         for x in _outermost_iter:
>             yield x
>     _result = _genexp(data)
> * and then realise that the optimised list comprehension form is essentially
> this:
>     def _listcomp(_outermost_iter):
>         result = []
>         for x in _outermost_iter:
>             result.append(x)
>         return result
>     _result = _listcomp(data)
> Now that "yield" in comprehensions has been prohibited, you've learned all
> the edge cases at that point - all of the runtime behaviour of things like
> name references, locals(), lambda expressions that close over the iteration
> variable, etc can be explained directly in terms of the equivalent functions
> and generators, so while comprehension iteration variable hiding may *seem*
> magical, it's really mostly explained by the deliberate semantic equivalence
> between the comprehension form and the constructor+genexp form. (That's
> exactly how PEP 3100 describes the change: "Have list comprehensions be
> syntactic sugar for passing an equivalent generator expression to list(); as
> a consequence the loop variable will no longer be exposed")
> As such, any proposal to have name bindings behave differently in
> comprehension and generator expression scope from the way they would behave
> in the equivalent nested function definitions *must be specified to an
> equivalent level of detail as the status quo*.

I don't see any of those Python workalike examples in the docs.  So
which "status quo" are you referring to?

You already know it's possible, and indeed straightforward, to write
functions that model the proposed scope rules in any given case, so
what's your real point?  They're "just like" the stuff above, possibly
adding a sprinkling of "nonlocal" and/or "global" declarations.  They
don't require changing anything fundamental about the workalike
examples you've already given - just adding cruft to specify scopes.

I don't want to bother doing it here, because it's just tedious, and
you _already know_ it.  Most tediously, because there's no explicit
way to declare a non-global scope in Python, in the

    2. The docs say name `y` is unknown.  Then y's scope in the
original comprehension is C.

case it's necessary to do something like:

    if 0:
        y = None

in the scope containing the synthetic function so that the contained
"nonlocal y" declaration knows which scope `y` is intended to live in.
(The "if 0:" block is optimized out of existence, but after the
compiler has noticed the local assignment to `y` and so records that
`y` is containing-scope-local.)  Crap like that isn't really
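
A sketch of that trick, for the record.  The names `containing` and
`_listcomp` are made up, and a plain assignment stands in for the
assignment expression so this runs on any Python 3:

```python
def containing():
    if 0:
        y = None  # never executed; only marks `y` as local to containing()

    def _listcomp(_outermost_iter):
        nonlocal y  # resolves to containing()'s `y`
        result = []
        for n in _outermost_iter:
            y = n * 2  # stands in for (y := n * 2)
            result.append(y)
        return result

    values = _listcomp(range(3))
    return values, y


values, leaked_y = containing()
```

After the call, `leaked_y` holds the last value bound inside the
synthetic function, because both functions share the same `y`.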

> All of the attempts at such a definition that have been made so far have
> been riddled with action at a distance and context-dependent compilation
> requirements:
> * whether to implicitly declare the binding target as nonlocal or global
> depends on whether or not you're at module scope or inside a function

That's artificial silliness, though.  Already suggested that Python
repair one of its historical scope distinctions by teaching `nonlocal` that

    nonlocal x

in a top-level function is a synonym for

    global x

in a top-level function.  In every relevant conceptual sense, the
module scope _is_ the top-level lexical scope.  It seems pointlessly
pedantic to me to insist that `nonlocal` _only_ refer to a non-global
enclosing lexical scope.  Who cares?  The user-level semantically
important part is "containing scope", not "is implemented by a cell object".

In the meantime, BFD.  So long as the language keyword insists on
making that distinction, ya, it's a distinction that needs to be made
by users too (and by the compiler regardless).
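
The distinction as it stands today can be seen by compiling both
spellings.  This is just a sketch; the source strings and the name
`bump` are illustrative:

```python
nonlocal_source = """
x = 0

def bump():
    nonlocal x
    x += 1
"""

global_source = nonlocal_source.replace("nonlocal x", "global x")

# `nonlocal` in a top-level function is rejected at compile time.
try:
    compile(nonlocal_source, "<example>", "exec")
except SyntaxError as exc:
    message = exc.msg
else:
    message = None

# The `global` spelling of the same function compiles and runs.
namespace = {}
exec(compile(global_source, "<example>", "exec"), namespace)
namespace["bump"]()
```

The first compile fails with a SyntaxError complaining that there is
no binding for the nonlocal name; the second version runs and leaves
`x` incremented in its module namespace.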

This isn't some inherently new burden for the compiler either.  When
it sees a not-local name in a function, it already has to figure out
whether to reference a cell or pump out a LOAD_GLOBAL opcode.
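
That existing decision is visible in a code object's attributes.  A
small sketch, relying on CPython's `co_freevars` and `co_names` (the
names `g`, `c`, `outer` and `inner` are made up for illustration):

```python
g = "module-level"


def outer():
    c = "enclosing"

    def inner():
        # `g` resolves to the module scope, `c` to a cell in outer().
        return g, c

    return inner


inner = outer()
cell_names = inner.__code__.co_freevars  # names reached through cells
global_names = inner.__code__.co_names   # names looked up as globals
```

Both names are "not local" to `inner`, yet the compiler has already
sorted them into different buckets.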

> * the desired semantics at class scope have been left largely unclear

Covered before.  Someone who knows something about _desired_ class
scope behavior needs to look at that.  That's not me.

> * the desired semantics in the case of nested comprehensions and generator
> expressions has been left entirely unclear

See the "more words" version above.  It implies that scopes need to be
resolved "outside in" for nesting of any kind.  Which they need to be
anyway, e.g., to make the "is this not-local name a cell or a global?"
distinction in any kind of function code.

> Now, there *are* ways to resolve these problems in a coherent way, and that
> would be to define "parent local scoping" as a new scope type, and introduce
> a corresponding "parentlocal NAME" compiler declaration to explicitly
> request those semantics for bound names (allowing the expansions of
> comprehensions and generator expressions as explicitly nested functions to
> be adjusted accordingly).

Sorry, I don't know what that means.  I don't even know what "compiler
declaration" alone means.  Regardless, there's nothing here that can't
be explained easily enough by utterly vanilla lexically nested scopes.
All the apparent difficulties stem from the inability to explicitly
declare a name's intended scope, and that the "nonlocal" keyword in a
top-level function currently refuses to acknowledge that the global
scope _is_ the containing not-local scope.

If you mean adding a new statement to Python

    parentlocal NAME

... sure, that could work.  But it obscures that the problem just
isn't hard enough to require such excessive novelty in Python's scope
gimmicks.  The correct place to declare NAME's scope is _in_ NAME's
intended scope, the same as in every other language with lexical scoping.

There's also that the plain English meaning of "parent local" only
applies to rule #2 at the top, and to the proper subset of cases in
rule #1 where it turns out that S is C.  In the other rule #1 cases,
"parentlocal" would be a misleading name for the less specific
"nonlocal" or the more specific "global".

Writing workalike functions by hand isn't difficult regardless, just
tedious (even without the current proposal!), and I don't view it as a
significant use case regardless.  I expect the minority who do it have
real fun with it for a day or two, and then quite possibly never
again.  Which is a fair summary of my own life ;-)

> But the PEP will need to state explicitly that that's what it is doing, and
> fully specify how those new semantics are expected to work in *all* of the
> existing scope types, not just the two where the desired behaviour is
> relatively easy to define in terms of nonlocal and global.

So you finally admit they _are_ relatively easy to define ;-)  What,
specifically, _are_ "*all* of the existing scope types"?  There are
only module, class, and function scopes in my view of the world (and
"comprehension scope" is just a name given at obvious times to
function scope in my view of the world).

If you also want piles of words about, e.g., how PEP 572 acts in all
cases in smaller blocks, like code typed at a shell, or strings passed
to eval() or exec(), you'll first have to explain why this was never
necessary for any previous feature.

PS:  I hope you appreciate that I didn't whine about microscopic
differences in the workalike examples' generated byte code ;-)
