[Python-ideas] PEP 572: Statement-Local Name Bindings, take three!

Chris Angelico rosuav at gmail.com
Fri Mar 23 14:09:54 EDT 2018


On Sat, Mar 24, 2018 at 2:00 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Fri, Mar 23, 2018 at 09:01:01PM +1100, Chris Angelico wrote:
>
>> PEP: 572
>> Title: Syntax for Statement-Local Name Bindings
> [...]
>> Abstract
>> ========
>>
>> Programming is all about reusing code rather than duplicating it.
>
> I don't think that editorial comment belongs here, or at least, it is
> way too strong. I'm pretty sure that programming is not ALL about reusing code,
> and code duplication is not always wrong.
>
> Rather, we can say that *often* we want to avoid code duplication, and
> this proposal is one way to do so. And this should go into the
> Rationale, not the Abstract. The abstract should describe what this
> proposal *does*, not why, for example:
>
>     This is a proposal for permitting temporary name bindings
>     which are limited to a single statement.
>
> What the proposal *is* goes in the Abstract; reasons *why* we want it
> go in the Rationale.

Thanks. I've never really been happy with my "Abstract" / "Rationale"
split, as they're two sections both designed to give that initial
'sell', and I'm clearly not good at writing the distinction :)

Unless you object, I'm just going to steal your Abstract wholesale.
Seems like some good words there.

> I see you haven't mentioned anything about Nick Coghlan's (long ago)
> concept of a "where" block. If memory serves, it would be something
> like:
>
>     value = x**2 + 2*x where:
>         x = some expression
>
> These are not necessarily competing, but they are relevant.

Definitely relevant, thanks. This is exactly what I'm looking for -
related proposals that got lost in the lengthy threads on the subject.
I'll mention it as another proposal, but if anyone has an actual post
for me to reference, that would be appreciated (just to make sure I'm
correctly representing it).

> Nor have you done a review of any other languages, to see what similar
> features they already offer. Not even C's form of "assignment as an
> expression" -- you should refer to that, and explain why this would not
> similarly be a bug magnet.

No, I haven't yet. Sounds like a new section is needed. Thing is,
there's a HUGE family of C-like and C-inspired languages that allow
assignment expressions, and for the rest, I don't have any personal
experience. So I need input from people: what languages do you know of
that have small-scope name bindings like this?

>> Rationale
>> =========
>>
>> When a subexpression is used multiple times in a list comprehension,
>
> I think that list comps are merely a single concrete example of a more
> general concept that we sometimes want or need to apply the DRY
> principle to a single expression.
>
> This is (usually) a violation of DRY whether it is inside or outside of
> a list comp:
>
>     result = (func(x), func(x)+1, func(x)*2)

True, but outside of comprehensions, the most obvious response is
"just add another assignment statement". You can't do that in a list
comp (or equivalently in a genexp or dict comp). Syntactically you're
right that they're just one example of a general concept; but they're
one of the original motivating reasons. I've tweaked the rationale
wording some; the idea is now "here's a general idea" followed by two
paragraphs of specific use-cases (comprehensions and loops). Let me
know if that works better.
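
For comparison, here is a rough sketch of how the duplicated
subexpression can be avoided inside a comprehension today, without any
new syntax. The function f and the calls list are made-up stand-ins
for illustration only; they are not part of the PEP.

```python
# Hypothetical stand-in for an expensive function; records each call.
calls = []

def f(x):
    calls.append(x)
    return x + 1

# Duplicated subexpression: f(x) runs twice per element.
dup = [(f(x), x / f(x)) for x in range(5)]
calls_dup = len(calls)

# Workaround available today: bind y with a one-element inner loop,
# so f(x) runs only once per element.
calls.clear()
once = [(y, x / y) for x in range(5) for y in [f(x)]]
calls_once = len(calls)

print(dup == once, calls_dup, calls_once)
```

The inner "for y in [f(x)]" does the job, but it reads as a loop
rather than as a name binding, which is part of why a dedicated
spelling keeps being asked for.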

>> Syntax and semantics
>> ====================
>>
>> In any context where arbitrary Python expressions can be used, a **named
>> expression** can appear. This must be parenthesized for clarity, and is of
>> the form ``(expr as NAME)`` where ``expr`` is any valid Python expression,
>> and ``NAME`` is a simple name.
>>
>> The value of such a named expression is the same as the incorporated
>> expression, with the additional side-effect that NAME is bound to that
>> value for the remainder of the current statement.
>
>
> Examples should go with the description. Such as:
>
>     x = None if (spam().ham as eggs) is None else eggs

Not sure what you gain out of that :) Maybe a different first
expression would help.

>     y = ((spam() as eggs), (eggs.method() as cheese), cheese[eggs])

Sure. I may need to get some simpler examples to kick things off though.

>> Just as function-local names shadow global names for the scope of the
>> function, statement-local names shadow other names for that statement.
>> (They can technically also shadow each other, though actually doing this
>> should not be encouraged.)
>
> That seems weird.

Which part? That they shadow, or that they can shadow each other?
Shadowing is the same as nested functions (including comprehensions,
since they're implemented with functions); and if SLNBs are *not* to
shadow each other, the only way is to straight-up disallow it. For the
moment, I'm not forbidding it, as there's no particular advantage to
popping a SyntaxError.

>> Assignment to statement-local names is ONLY through this syntax. Regular
>> assignment to the same name will remove the statement-local name and
>> affect the name in the surrounding scope (function, class, or module).
>
> That seems unnecessary. Since the scope only applies to a single
> statement, not a block, there can be no other assignment to that name.
>
> Correction: I see further in that this isn't the case. But that's deeply
> confusing, to have the same name refer to two (or more!) scopes in the
> same block. I think that's going to lead to some really confusing
> scoping problems.

For the current proposal, I prefer simpler definitions to outlawing
the odd options. The rule is: An SLNB exists from the moment it's
created to the end of that statement. Very simple, very
straight-forward. Yes, that means you could use the same name earlier
in the statement, but ideally, you just wouldn't do that.

Python already has weirder behaviour in it.

>>> def f():
...     e = 2.71828
...     try:
...         1/0
...     except Exception as e:
...         print(e)
...     print(e)
...
>>> f()
division by zero
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in f
UnboundLocalError: local variable 'e' referenced before assignment

Does this often cause problems? No, because most functions don't use
the same name in two different ways. An SLNB should be basically the
same.

>> Statement-local names never appear in locals() or globals(), and cannot be
>> closed over by nested functions.
>
> Why can they not be used in closures? I expect that's going to cause a
> lot of frustration.

Conceptually, the variable stops existing at the end of that
statement. It makes for some oddities, but fewer oddities than every
other variant that I toyed with. For example, does this create one
single temporary or many different temporaries?

def f():
    x = "outer"
    funcs = {}
    for i in range(10):
        if (g(i) as x) > 0:
            def closure():
                return x
            funcs[x] = closure

Obviously the 'x' in funcs[x] is the current version of x as it runs
through the loop. But what about the one closed over? If regular
assignment is used ("x = g(i)"), the last value of x will be seen by
every function. With a statement-local variable, should it be a single
temporary all through the loop, or should each iteration create a
brand new "slot" that gets closed over? If the latter, why is it
different from regular assignment, and how would it be implemented
anyway? Do we now need an infinite number of closure cells that all
have the exact same name?
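
To make the "regular assignment" half of that comparison concrete,
here is a small sketch in current Python. The names make_funcs and
results are invented for the example, and i * 10 merely stands in for
g(i).

```python
def make_funcs():
    funcs = {}
    for i in range(3):
        x = i * 10            # regular assignment: one 'x' for the whole function
        def closure():
            return x          # looked up when called, not when defined
        funcs[x] = closure    # the key uses the current value of x
    return funcs

funcs = make_funcs()
# Every closure shares the same cell, so each one reports the final x.
results = {key: fn() for key, fn in funcs.items()}
print(results)
```

The keys differ per iteration, but every closure returns the last
value assigned to x. Any statement-local scheme that allowed closures
would have to decide whether to copy this behaviour or depart from it.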

>> Execution order and its consequences
>> ------------------------------------
>>
>> Since the statement-local name binding lasts from its point of execution
>> to the end of the current statement, this can potentially cause confusion
>> when the actual order of execution does not match the programmer's
>> expectations. Some examples::
>>
>>     # A simple statement ends at the newline or semicolon.
>>     a = (1 as y)
>>     print(y) # NameError
>
> That error surprises me. Every other use of "as" binds to the
> current local namespace. (Or global, if you use the global
> declaration first.)
>
> I think there's going to be a lot of confusion about which uses of "as"
> bind to a new local and which don't.

That's the exact point of "statement-local" though.

> I think this proposal is conflating two unrelated concepts:
>
> - introducing new variables in order to meet DRY requirements;
>
> - introducing a new scope.
>
> Why can't we do the first without the second?
>
>     a = (1 as y)
>     print(y)  # prints 1, as other uses of "as" would do
>
>
> That would avoid the unnecessary (IMO) restriction that these variables
> cannot be used in closures.

You're talking about one of the alternate proposals there. (#6,
currently.) I have talked about the possibility of splitting this into
two separate proposals, but then I'd have to try to chair two separate
concurrent discussions that would constantly interact and cross over
:)

>>     # The assignment ignores the SLNB - this adds one to 'a'
>>     a = (a + 1 as a)
>
> "SLNB"? Undefined acronym. What is it? I presume it has something to do
> with the single-statement variable.

Statement-Local Name Binding, from the title of the PEP. (But people
probably don't read titles.)

> I know it would be legal, but why would you write something like that?
> Surely your examples must at least have a pretence of being useful (even
> if the examples are only toy examples rather than realistic).

That section is about the edge cases, and one such edge case is
assigning through an SLNB.

> I think that having "a" be both local and single-statement in the same
> expression is an awful idea. Lua has the (mis-)features that
> variables are global by default, locals need to be declared, and the
> same variable name can refer to both local and global simultaneously.
> Thus we have:
>
>     print(x)  # prints the global x
>     local x = x + 1  # sets local x to the global x plus 1
>     print(x)  # prints the local x
>
> https://www.lua.org/pil/4.2.html

IMO that's a *good* thing. JavaScript works the other way; either you
say "var x = x + 1;" and the variable exists for the whole function,
pre-initialized to the special value 'undefined', or you say "let x =
x + 1;" and the variable is in limbo until you hit that statement,
causing a ReferenceError (JS's version of NameError). Neither makes as
much sense as evaluating the initializer before the variable starts to
exist.

That said, though, this is STILL an edge case. It's giving a
somewhat-sane meaning to something you normally won't do.

> This idea of local + single-statement names in the same expression
> strikes me as similar. Having that same sort of thing happening within a
> single statement gives me a headache:
>
>     spam = (spam, ((spam + spam as spam) + spam as spam), spam)
>
> Explain that, if you will.

Sure. First, eliminate all the name bindings:

spam = (spam, ((spam + spam) + spam), spam)

Okay. Now anyone with basic understanding of algebra can figure out
the execution order. Then every time you have a construct with an
'as', you change the value of 'spam' from that point on.

Which means we have:

spam0 = (spam0, ((spam0 + spam0 as spam1) + spam1 as spam2), spam2)

Execution order is strictly left-to-right here, so it's pretty
straight-forward. Less clear if you have an if/else expression (since
they're executed middle-out instead of left-to-right), but SLNBs are
just like any other side effects in an expression, performed in a
well-defined order. And just like with side effects, you don't want to
have complex interactions between them, but there's nothing illegal in
it.
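
The evaluation order being relied on here can be checked in current
Python with ordinary side effects. This is only a sketch; the helper
note and the log list are made up for the demonstration.

```python
log = []

def note(label, value):
    """Record that this operand was evaluated, then pass the value through."""
    log.append(label)
    return value

# Tuple elements and binary operands are evaluated left to right.
t = (note("a", 1), note("b", 2) + note("c", 3), note("d", 4))
tuple_order = list(log)

# A conditional expression evaluates the condition first, then only
# the branch that was selected ("middle-out").
log.clear()
r = note("then", 1) if note("cond", True) else note("else", 2)
cond_order = list(log)

print(tuple_order, cond_order)
```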

>>     # Compound statements usually enclose everything...
>>     if (re.match(...) as m):
>>         print(m.groups(0))
>>     print(m) # NameError
>
> Ah, how surprising -- given the tone of this PEP, I honestly thought
> that it only applied to a single statement, not compound statements.
>
> You should mention this much earlier.

Hmm. It's right up in the Rationale section, but without an example.
Maybe an example would make it clearer?

>>     # ... except when function bodies are involved...
>>     if (input("> ") as cmd):
>>         def run_cmd():
>>             print("Running command", cmd) # NameError
>
> Such a special case is a violation of the Principle of Least Surprise.

Blame classes, which already do this. Exactly this. Being able to
close over temporaries creates its own problems.

>>     # ... but function *headers* are executed immediately
>>     if (input("> ") as cmd):
>>         def run_cmd(cmd=cmd): # Capture the value in the default arg
>>             print("Running command", cmd) # Works
>>
>> Function bodies, in this respect, behave the same way they do in class scope;
>> assigned names are not closed over by method definitions. Defining a function
>> inside a loop already has potentially-confusing consequences, and SLNBs do not
>> materially worsen the existing situation.
>
> Except by adding more complications to make it even harder to
> understand the scoping rules.

Except that I'm adding no complications. This is just the consequences
of Python's *existing* scoping rules.
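
As an illustration of the existing class-scope behaviour being
referred to, here is a minimal sketch in current Python. The class
Demo and its methods are hypothetical, and the example assumes no
global named cmd exists.

```python
class Demo:
    cmd = "hello"                    # name bound in the class body

    def run(self):
        return cmd                   # not closed over; looked up outside the class

    def run_default(self, cmd=cmd):  # the header is evaluated in class scope
        return cmd

try:
    Demo().run()
    outcome = "no error"
except NameError:
    outcome = "NameError"

default_result = Demo().run_default()
print(outcome, default_result)
```

The method body cannot see the class-level name, while the default
argument captures its value at definition time. That is the same
split the PEP describes between function bodies and function headers.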

>> Differences from regular assignment statements
>> ----------------------------------------------
>>
>> Using ``(EXPR as NAME)`` is similar to ``NAME = EXPR``, but has a number of
>> important distinctions.
>>
>> * Assignment is a statement; an SLNB is an expression whose value is the same
>>   as the object bound to the new name.
>> * SLNBs disappear at the end of their enclosing statement, at which point the
>>   name again refers to whatever it previously would have.  SLNBs can thus
>>   shadow other names without conflict (although deliberately doing so will
>>   often be a sign of bad code).
>
> Why choose this design over binding to a local variable? What benefit is
> there to using yet another scope?

Mainly, I just know that there has been a lot of backlash against a
generic "assignment as expression" syntax in the past.

>> * SLNBs do not appear in ``locals()`` or ``globals()``.
>
> That is like non-locals, so I suppose that's not unprecedented.
>
> Will there be a function slnbs() to retrieve these?

Not in the current proposal, no. Originally, I planned for them to
appear in locals() while they were in scope, but that created its own
problems; I'd be happy to return to that proposal if it were
worthwhile.

>> * An SLNB cannot be the target of any form of assignment, including augmented.
>>   Attempting to do so will remove the SLNB and assign to the fully-scoped name.
>
> What's the justification for this limitation?

Not having that limitation creates worse problems, like that having
"(1 as a)" somewhere can suddenly make an assignment fail. This is
particularly notable with loop headers rather than simple statements.

>> Example usage
>> =============
>>
>> These list comprehensions are all approximately equivalent::
> [...]
>
> I don't think you need to give an exhaustive list of every way to write
> a list comp. List comps are only a single use-case for this feature.
>
>
>>     # See, for instance, Lib/pydoc.py
>>     if (re.search(pat, text) as match):
>>         print("Found:", match.group(0))
>
> I do not believe that is actually code found in Lib/pydoc.py, since that
> will be a syntax error. What are you trying to say here?

Lib/pydoc.py has a more complicated version of the exact same
functionality. This would be a simplification of a common idiom that
can be found in the stdlib and elsewhere.
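
For reference, the general shape of that idiom as it has to be written
today looks something like the sketch below. The pattern and text are
invented for the example; this is not the actual code from
Lib/pydoc.py.

```python
import re

text = "error code 42 reported"   # made-up input
pat = r"\d+"                      # made-up pattern

# Bind first, test second: the two-step form in use today.
match = re.search(pat, text)
if match:
    found = match.group(0)
else:
    found = None

print(found)
```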

>>     while (sock.read() as data):
>>         print("Received data:", data)
>
> Looking at that example, I wonder why we need to include the parens when
> there is no ambiguity.
>
> # okay
> while sock.read() as data:
>     print("Received data:", data)
>
> # needs parentheses
> while (spam.method() as eggs) is None or eggs.count() < 100:
>     print("something")

I agree, but starting with them mandatory allows for future relaxation
of requirements. The changes to the grammar are less intrusive if the
parens are always required (for instance, the special case "f(x for x
in y)" has its own entry in the grammar).
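
For context, the while loop in that example has to be spelled in one
of the following ways in current Python. This is a sketch only:
io.BytesIO stands in for the socket-like object, and the chunk size of
4 is arbitrary.

```python
import io

sock = io.BytesIO(b"abcdefghij")   # stand-in for a socket-like object

# Loop-and-a-half: read, test, break.
chunks = []
while True:
    data = sock.read(4)
    if not data:
        break
    chunks.append(data)

# Alternative: two-argument iter() with a sentinel value.
sock.seek(0)
chunks_iter = list(iter(lambda: sock.read(4), b""))

print(chunks, chunks_iter)
```

Both forms work, but the first hides the loop condition inside the
body, and the second only helps when the terminating value is a single
known sentinel.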

>> Performance costs
>> =================
>>
>> The cost of SLNBs must be kept to a minimum, particularly when they are not
>> used; the normal case MUST NOT be measurably penalized.
>
> What is the "normal case"?

The case where you're not using any SLNBs.

> It takes time, even if only a nanosecond, to bind a value to a
> name, as opposed to *not* binding it to a name.
>
>     x = (spam as eggs)
>
> has to be more expensive than
>
>     x = spam
>
> because the first performs two name bindings rather than one. So "MUST
> NOT" already implies this proposal *must* be rejected. Perhaps you mean
> that there SHOULD NOT be a SIGNIFICANT performance penalty.

The mere fact that this feature exists in the language MUST NOT
measurably impact Python run-time performance.

>> SLNBs are expected to be uncommon,
>
> On what basis do you expect this?
>
> Me, I'm cynical about my fellow coders, because I've worked with them
> and read their code *wink* and I expect they'll use this everywhere
> "just in case" and "to avoid namespace pollution".

Compared to regular name bindings? Just look at the number of ways to
assign that are NOT statement-local, and then add in the fact that
SLNBs aren't going to be effective for anything that you need to
mutate more than once, and I fully expect that regular name bindings
will far exceed SLNBs.

> Besides, I think that the while loop example is a really nice one. I'd
> use that, I think. I *almost* think that it alone justifies the
> exercise.

Hmm, okay. I'll work on rewording that section later.

>> Forbidden special cases
>> =======================
>>
>> In two situations, the use of SLNBs makes no sense, and could be confusing due
>> to the ``as`` keyword already having a different meaning in the same context.
>
> I'm pretty sure there are many more than just two situations where the
> use of this makes no sense. Many of your examples perform an unnecessary
> name binding that is then never used. I think that's going to encourage
> programmers to do the same, especially when they read this PEP and think
> your examples are "Best Practice".

Unnecessary, yes, but not downright problematic. The two specific
cases mentioned are (a) evaluating expressions, and (b) using the 'as'
keyword in a way that's incompatible with PEP 572. (There's no
confusion in "import x as y", for instance, because "x" is not an
expression.)

> Besides, in principle they could be useful (at least in contrived
> examples). Remember that exceptions are not necessarily constants. They
> can be computed at runtime:
>
> try:
>     ...
> except (Errors[key], spam(Errors[key])):
>     ...

Sure they *can*. Have you ever seen something like that in production?
I've seen simple examples (eg having a tuple of exception types that
you care about, and that tuple not always being constant), but nothing
where you could ever want an SLNB.

> Since we have a DRY-violation in Errors[key] twice, it is conceivable
> that we could write:
>
> try:
>     ...
> except ((Errors[key] as my_error), spam(my_error)):
>     ...
>
> Contrived? Sure. But I think it makes sense.
>
> Perhaps a better argument is that it may be ambiguous with existing
> syntax, in which case the ambiguous cases should be banned.

It's not *technically* ambiguous, because PEP 572 demands parentheses
and both 'except' and 'with' statements forbid parentheses. The
compiler can, with 100% accuracy, pick between the two alternatives.
But having "except X as Y:" mean something drastically different from
"except (X as Y):" is confusing *to humans*.

>> 2. ``with NAME = EXPR``::
>>
>>        stuff = [(y, x/y) with y = f(x) for x in range(5)]
>
> This is the same proposal as above, just using a different keyword.

Yep. I've changed the heading to "Alternative proposals and variants"
as some of them are merely variations on each other. They're given
separate entries because I have separate commentary about them.

>> 6. Allowing ``(EXPR as NAME)`` to assign to any form of name.
>
> And this would be a second proposal.
>
>>    This is exactly the same as the promoted proposal, save that the name is
>>    bound in the same scope that it would otherwise have. Any expression can
>>    assign to any name, just as it would if the ``=`` operator had been used.
>>    Such variables would leak out of the statement into the enclosing function,
>>    subject to the regular behaviour of comprehensions (since they implicitly
>>    create a nested function, the name binding would be restricted to the
>>    comprehension itself, just as with the names bound by ``for`` loops).
>
> Indeed. Why are you rejecting this in favour of combining name-binding +
> new scope into a single syntax?
>

Mainly because there's been a lot of backlash against regular
assignment inside expressions. One thing I *have* learned from life is
that you can't make everyone happy. Sometimes, "why isn't your
proposal X instead of Y" is just "well, X is a valid proposal too, so
you can go ahead and push for that one if you like". :) I had to pick
something, and I picked that one.

ChrisA

