[Python-ideas] PEP 572: Statement-Local Name Bindings

Chris Angelico rosuav at gmail.com
Wed Feb 28 00:23:13 EST 2018


On Wed, Feb 28, 2018 at 2:47 PM, Rob Cliffe via Python-ideas
<python-ideas at python.org> wrote:
> I hope nobody will mind too much if I throw in my (relatively uninformed) 2c
> before some of the big guns respond.

Not at all! Everyone's contributions are welcomed. Even after the "big
guns" respond, other voices are definitely worth hearing.

(One small tip though: Responding in plain text is appreciated, as it
means information about who said what is entirely copy-and-pasteable.)

> First: Well done, Chris, for all the work on this.  IMHO this could be a
> useful Python enhancement (and reduce the newsgroup churn :-)).

Thanks :) It's one of those PEPs that can be immensely useful even if
it's rejected.

> On 27/02/2018 22:27, Chris Angelico wrote:
> Programming is all about reusing code rather than duplicating it.  When
> an expression needs to be used twice in quick succession but never again,
> it is convenient to assign it to a temporary name with very small scope.
> By permitting name bindings to exist within a single statement only, we
> make this both convenient and safe against collisions.
>
> It may be pedantic of me (and it will produce a more pedantic-sounding
> sentence) but I honestly think that
> "safe against name collisions" is clearer than "safe against collisions",
> and that clarity matters.

Sure. I'm also aware that I'm using the same words over and over, but
I can add that one.

> Rationale
> =========
>
> When an expression is used multiple times in a list comprehension, there
> are currently several suboptimal ways to spell this, and no truly good
> ways. A statement-local name allows any expression to be temporarily
> captured and then used multiple times.
>
> IMHO the first sentence is a bit of an overstatement (though of course it's
> a big part of the PEP's "sell").
> How about "... there are currently several ways to spell this, none of them
> ideal."

Hmm, I think I prefer the current wording, but maybe there's some
other way to say it that's even better.

> Also, given that a list comprehension is an expression, which in turn could
> be part of a larger expression, would it be appropriate to replace
> "expression" by "sub-expression" in the 2 places where it occurs in the
> above paragraph?

Thanks, done.

> Syntax and semantics
> ====================
>
> In any context where arbitrary Python expressions can be used, a named
> expression can appear. This must be parenthesized for clarity,
>
> I agree, pro tem (not that I am claiming that my opinion counts for much).
> I'm personally somewhat allergic to making parentheses mandatory where they
> really don't need to be, but trying to think about where they could be
> unambiguously omitted makes my head spin.  At least, if we run with this for
> now, then making them non-mandatory in some contexts, at some future time,
> won't lead to backwards incompatibility.

Yeah, definitely. There've been other times when a new piece of syntax
is extra restrictive at first, and then gets opened up later. It's way
easier than the alternative.

(For the record, I had some trouble with this syntax at first, and was
almost going to launch this PEP with a syntax of "( > expr as NAME)"
to disambiguate. That was never the intention, though, and I'm
grateful to the folks on core-mentorship for helping me get that
sorted.)

> Example usage
> =============
>
> These list comprehensions are all approximately equivalent::
>
>     # Calling the function twice
>
>             # Calling the function twice (assuming that side effects can be ignored)

That assumption should be roughly inherent in the problem. If the call
has no side effects and low cost, none of this is necessary - just
repeat the expression.

>     stuff = [[f(x), f(x)] for x in range(5)]
>
>     # Helper function
>
>             # External helper function

Not a big deal either way, can toss in the extra word but I'm not
really sure it's needed.


> Please feel free to ignore this, but (trying to improve on the above
> example):
>             # Using a generator:
>             def gen():
>                 for x in range(5):
>                     y = f(x)
>                     yield y,y
>             stuff = list(gen())

I think it's unnecessary; the direct loop is entirely better IMO.
Since the point of these examples is just to contrast against the
proposal, it's no biggie if there are EVEN MORE ways (and I haven't
even mentioned the steak knives!) to do something, unless they're
actually better.

> If calling `f(x)` is expensive or has side effects, the clean operation of
> the list comprehension gets muddled. Using a short-duration name binding
> retains the simplicity; while the extra `for` loop does achieve this, it
> does so at the cost of dividing the expression visually, putting the named
> part at the end of the comprehension instead of the beginning.
>
> Maybe add to last sentence "and of adding (at least conceptually) extra
> steps: building a 1-element list, then extracting the first element"

That's precisely the point that Serhiy's optimization is aiming at,
with the intention of making "for x in [expr]" a standard idiom for
list comp assignment. If we assume that this does become standard, it
won't add the extra steps, but it does still push that expression out
to the far end of the comprehension, whereas a named subexpression
places it at first use.
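
To make the comparison concrete, here's a rough sketch in current
Python of the spellings being contrasted. The function f, the call log
and the helper pair() are made up for the illustration; only the
shapes of the comprehensions come from the discussion above.

```python
call_log = []

def f(x):
    """Stand-in for an expensive function (hypothetical); logs each call."""
    call_log.append(x)
    return x * x

# Calling the function twice: f runs twice per element.
twice = [[f(x), f(x)] for x in range(5)]
calls_twice = len(call_log)

# Extra 'for' loop over a one-element list: f runs once per element,
# but the binding sits at the far end of the comprehension.
call_log.clear()
one_elem = [[y, y] for x in range(5) for y in [f(x)]]
calls_one_elem = len(call_log)

# Helper function (hypothetical name): f also runs once per element.
def pair(value):
    return [value, value]

call_log.clear()
helper = [pair(f(x)) for x in range(5)]
calls_helper = len(call_log)

print(calls_twice, calls_one_elem, calls_helper)
```

All three build the same list; they differ in how often f gets called
and in where the reader has to look to find the f(x) call.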


> Open questions
> ==============
>
> 1. What happens if the name has already been used? `(x, (1 as x), x)`
>    Currently, prior usage functions as if the named expression did not
>    exist (following the usual lookup rules); the new name binding will
>    shadow the other name from the point where it is evaluated until the
>    end of the statement.  Is this acceptable?  Should it raise a syntax
>    error or warning?
>
> IMHO this is not only acceptable, but the (only) correct behaviour.  Your
> crystal-clear statement "the new name binding will shadow the other name
> from the point where it is evaluated until the end of the statement " is
> critical and IMO what should happen.

Regular function-locals don't work that way, though:

x = "global"
def f():
    print(x)
    x = "local"
    print(x)

This won't print "global" followed by "local" - it'll bomb with
UnboundLocalError. I do still think the shadowing behaviour is correct
for an SLNB (statement-local name binding), though; the only other
viable option is for the SLNB to fail if it's shadowing anything at
all, and even that has its weird edge cases.
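
For anyone who wants to see that for themselves, this is the same
function wrapped in a try/except so the script keeps running. The
"error" variable is just scaffolding for the demonstration.

```python
x = "global"

def f():
    # x is assigned somewhere in f, so the compiler treats x as local
    # for the whole function body, including this first lookup.
    print(x)
    x = "local"
    print(x)

error = None
try:
    f()
except UnboundLocalError as exc:
    error = exc
    print("UnboundLocalError:", exc)
```

The exact wording of the message depends on the Python version.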

> Perhaps an extra example or two, to clarify that *execution order* is what
> matters, might help, e.g.
>     y if (f() as y) > 0 else None
> will work as expected, because "(f() as y)" is evaluated before the initial
> "y" is (if it is).

Unnecessary in the "open questions" section, but if this proves to be
a point of confusion and I make a FAQ, then yeah, I could put in some
examples like that.

> [Parenthetical comment: Advanced use of this new feature would require
> knowledge of Python's evaluation order.  But this is not an argument against
> the PEP, because the same could be said about almost any feature of Python,
> e.g.
>     [ f(x), f(x) ]
> where evaluating f(x) has side effects.]

Yeah, people should have no problem figuring this out.
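
The evaluation order in question can be checked in today's Python
without any new syntax. This is only a sketch: note() is a made-up
helper that records when each sub-expression runs.

```python
order = []

def note(label, value):
    """Record that a sub-expression was evaluated, then return its value."""
    order.append(label)
    return value

# Conditional expression: the condition runs first, then only one branch.
result = (note("true branch", "yes")
          if note("condition", 1) > 0
          else note("false branch", None))
cond_order = list(order)

# List display: elements are evaluated left to right.
order.clear()
pair = [note("first", 1), note("second", 2)]
list_order = list(order)

print(cond_order, list_order)
```

The condition is recorded before the branch even though it's written
after it, which is why a name bound in the condition would already be
available to the branch.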

> 2. The current implementation [1] implements statement-local names using
>    a special (and mostly-invisible) name mangling.  This works perfectly
>    inside functions (including list comprehensions), but not at top
>    level.  Is this a serious limitation?  Is it confusing?
>
> It's great that it works perfectly in functions and list comprehensions, but
> it sounds as if, at top level, in rare circumstances it could produce a
> hard-to-track-down bug, which is not exactly desirable.  It's hard to say
> more without knowing more details.  As a stab in the dark, is it possible to
> avoid it by including the module name in the mangling?  Sorry if I'm talking
> rubbish.

The problem is that it's all done through the special "cell" slots in
a function's locals. To try to do that at module level would
potentially mean polluting the global namespace, which could interfere
with other functions and cause extreme confusion.

Currently, attempting to use an SLNB at top level produces a bizarre
UnboundLocalError, and I don't truly understand why. The disassembly
shows the same name mangling that happens inside a function, but it
doesn't get properly undone. But I'm sure there are many other
implementation bugs too.

> 3. The interaction with locals() is currently[1] slightly buggy.  Should
>    statement-local names appear in locals() while they are active (and
>    shadow any other names from the same function), or should they simply
>    not appear?
>
> IMHO this is an implementation detail.  IMO you should have some idea what
> you're doing when you use locals().  But I think consistency matters -
> either the temporary variable *always* gets into locals() "from the point
> where it is evaluated until the end of the statement", or it *never* gets
> into locals().  (Possibly the language spec should specify one or the other
> - I'm not sure, time may tell.)

Yeah, and I would prefer the former, but that's still potentially
confusing. Consider:

y = "gg"
def g():
    x = 1
    print(x, locals())
    print((3 as x), x, locals())
    print(y, (4 as y), y, locals())
    print(x, locals())
    del x
    print(locals())

Current output:
1 {}
3 3 {'x': 3}
gg 4 4 {'y': 4}
1 {}
{}

Desired output:
1 {'x': 1}
3 3 {'x': 3}
gg 4 4 {'x': 1, 'y': 4}
1 {'x': 1}
{}

Also acceptable (but depreferred) output:
1 {'x': 1}
3 3 {'x': 1}
gg 4 4 {'x': 1}
1 {'x': 1}
{}

If the language spec mandates that "either this or that" happen, I'd
be okay with that; it'd give other Pythons the option to implement
this completely outside of locals() while still being broadly sane.

> 4. Syntactic confusion in `except` statements.
> 5. Similar confusion in `with` statements
>
> This (4. and 5.) shows that we are using "as" in more than one sense, and in
> a perfect world we would use different keywords.  But IMHO (admittedly,
> without having thought about it much) this isn't much of a problem.  Again,
> perhaps some clarifying examples would help.

No, we want to keep using the same keywords - otherwise there are too
many keywords in the language. The "except" case isn't a big deal IMO,
but the "with" one is more serious, and the subtle difference between
"with (x as y):" and "with x as y:" is sure to trip someone up. But
maybe that's one for linters and code review.
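
A sketch of why the two could differ, written in current Python. The
Manager class is hypothetical, and the named-expression half is
emulated with a plain assignment, on the assumption that "(x as y)"
simply captures the value of x.

```python
class Manager:
    """Toy context manager whose __enter__ returns a different object."""

    def __enter__(self):
        return "result of __enter__"

    def __exit__(self, exc_type, exc, tb):
        return False  # don't suppress exceptions

x = Manager()

# "with x as y:" binds y to whatever x.__enter__() returns.
with x as y:
    bound_by_with = y

# A named expression would capture the value of the expression itself,
# i.e. the manager. Emulated here, since "(x as y)" is only proposed syntax.
y = x
with y:
    bound_by_named_expr = y

print(bound_by_with, bound_by_named_expr is x)
```

For managers that return self from __enter__ (files, for instance)
the two spellings would look the same, which is what makes the
difference easy to miss.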

> Some pedantry:
>
>     One issue not so far explicitly mentioned: IMHO it should be perfectly
> legal to assign a value to a temporary variable, and then not use that
> temporary variable (just as it is legal to assign to a variable in a regular
> assignment statement, and then not use that variable) though linters should
> IMO point it out.  E.g. you might want to modify (perhaps only temporarily)
>         a = [ (f() as b), b ]
> to
>         a = [ (f() as b), c ]

Yep, perfectly legal. Once linters learn that this is an assignment,
they can flag this as "unused variable". Otherwise, it's not really
hurting much.

> Also (and I'm relying on "In any context where arbitrary Python expressions
> can be used, a named expression can appear." ),
> linters should also IMO point to
>     a = (42 as b)
> which AFAICT is a laborious synonym for
>     a = 42

Ditto - an unused variable. You could also write "a = b = 42" and then
never use b.

> And here's a thought: What are the semantics of
>     a = (42 as a) # Of course a linter should point this out too
> At first I thought this was also a laborious synonym for "a=42".  But then I
> re-read your statement (the one I described above as crystal-clear) and
> realised that its exact wording was even more critical than I had thought:
>     "the new name binding will shadow the other name from the point where it
> is evaluated until the end of the statement"
> Note: "until the end of the statement".  NOT "until the end of the
> expression".  The distinction matters.
> If we take this as gospel, all this will do is create a temporary variable
> "a", assign the value 42 to it twice, then discard it.  I.e. it effectively
> does nothing, slowly.
> Have I understood correctly?  Very likely you have considered this and mean
> exactly what you say, but I am sure you will understand that I mean no
> offence by querying it.

Actually, that's a very good point, and I had to go and try it to
confirm. You're correct that the "a =" part is also affected, but
there may be more complicated edge cases. Disassembly can help track
down what the compiler's actually doing:

>>> import dis
>>> def f():
...     a = 1
...     a = (2 as a)
...     print(a)
...
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (a)

  3           4 LOAD_CONST               2 (2)
              6 DUP_TOP
              8 STORE_FAST               1 (a)
             10 STORE_FAST               1 (a)
             12 DELETE_FAST              1 (a)

  4          14 LOAD_GLOBAL              0 (print)
             16 LOAD_FAST                0 (a)
             18 CALL_FUNCTION            1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

If you're not familiar with the output of dis.dis(), the first column
(largely blank) is line numbers in the source, the second is byte code
offsets, and then we have the operation and its parameter (if any).
The STORE_FAST and LOAD_FAST opcodes work with local names, which are
identified by their indices; the first such operation sets slot 0
(named "a"), but the two that happen in line 3 (byte positions 8 and
10) are manipulating slot 1 (also named "a"). So you can see that line
3 never touches slot 0, and it is entirely operating within the SLNB
scope. Identical byte code is produced from this function:

>>> def f():
...     a = 1
...     b = (2 as b)
...     print(a)
...
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (a)

  3           4 LOAD_CONST               2 (2)
              6 DUP_TOP
              8 STORE_FAST               1 (b)
             10 STORE_FAST               1 (b)
             12 DELETE_FAST              1 (b)

  4          14 LOAD_GLOBAL              0 (print)
             16 LOAD_FAST                0 (a)
             18 CALL_FUNCTION            1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

I love dis.dis(), it's such an awesome tool :)
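
If you don't have the patched interpreter handy, the slot numbering
can still be explored in a stock CPython. This sketch uses two
ordinary locals instead of an SLNB, so it only shows how names map to
slots, not the shadowing itself.

```python
import dis

def f():
    a = 1
    b = 2
    print(a)

# Local names in slot order; a STORE_FAST argument indexes into this tuple.
slots = f.__code__.co_varnames

# Exact opcodes vary between CPython versions, so just collect the stores.
stores = [(ins.opname, ins.argval) for ins in dis.get_instructions(f)
          if ins.opname.startswith("STORE_FAST")]

print(slots)
print(stores)
```

In the SLNB disassembly above, the interesting part is that two
different slots carry the same name "a", which plain Python code
can't produce.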

I'll push PEP changes based on your suggestions shortly. Am also going
to add a "performance considerations" section, as features like this
are potentially costly.

Thanks for your input!

ChrisA

