
On Wed, Feb 28, 2018 at 2:47 PM, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
I hope nobody will mind too much if I throw in my (relatively uninformed) 2c before some of the big guns respond.
Not at all! Everyone's contributions are welcomed. Even after the "big guns" respond, other voices are definitely worth hearing. (One small tip though: Responding in plain text is appreciated, as it means information about who said what is entirely copy-and-pasteable.)
First: Well done, Chris, for all the work on this. IMHO this could be a useful Python enhancement (and reduce the newsgroup churn :-)).
Thanks :) It's one of those PEPs that can be immensely useful even if it's rejected.
On 27/02/2018 22:27, Chris Angelico wrote:
Programming is all about reusing code rather than duplicating it. When an expression needs to be used twice in quick succession but never again, it is convenient to assign it to a temporary name with very small scope. By permitting name bindings to exist within a single statement only, we make this both convenient and safe against collisions.
It may be pedantic of me (and it will produce a more pedantic-sounding sentence) but I honestly think that "safe against name collisions" is clearer than "safe against collisions", and that clarity matters.
Sure. I'm also aware that I'm using the same words over and over, but I can add that one.
Rationale
=========
When an expression is used multiple times in a list comprehension, there are currently several suboptimal ways to spell this, and no truly good ways. A statement-local name allows any expression to be temporarily captured and then used multiple times.
IMHO the first sentence is a bit of an overstatement (though of course it's a big part of the PEP's "sell"). How about "... there are currently several ways to spell this, none of them ideal."
Hmm, I think I prefer the current wording, but maybe there's some other way to say it that's even better.
Also, given that a list comprehension is an expression, which in turn could be part of a larger expression, would it be appropriate to replace "expression" by "sub-expression" in the 2 places where it occurs in the above paragraph?
Thanks, done.
Syntax and semantics
====================
In any context where arbitrary Python expressions can be used, a named expression can appear. This must be parenthesized for clarity,
I agree, pro tem (not that I am claiming that my opinion counts for much). I'm personally somewhat allergic to making parentheses mandatory where they really don't need to be, but trying to think about where they could be unambiguously omitted makes my head spin. At least, if we run with this for now, then making them non-mandatory in some contexts, at some future time, won't lead to backwards incompatibility.
Yeah, definitely. There've been other times when a new piece of syntax is extra restrictive at first, and then gets opened up later. It's way easier than the alternative. (For the record, I had some trouble with this syntax at first, and was almost going to launch this PEP with a syntax of "( > expr as NAME)" to disambiguate. That was never the intention, though, and I'm grateful to the folks on core-mentorship for helping me get that sorted.)
Example usage
=============
These list comprehensions are all approximately equivalent::
# Calling the function twice
# Calling the function twice (assuming that side effects can be ignored)
That assumption should be roughly inherent in the problem. If the call has no side effects and low cost, none of this is necessary - just repeat the expression.
stuff = [[f(x), f(x)] for x in range(5)]
# Helper function
# External helper function
Not a big deal either way, can toss in the extra word but I'm not really sure it's needed.
Please feel free to ignore this, but (trying to improve on the above example):

    # Using a generator:
    def gen():
        for x in range(5):
            y = f(x)
            yield y, y
    stuff = list(gen())
I think it's unnecessary; the direct loop is entirely better IMO. Since the point of these examples is just to contrast against the proposal, it's no biggie if there are EVEN MORE ways (and I haven't even mentioned the steak knives!) to do something, unless they're actually better.
If calling `f(x)` is expensive or has side effects, the clean operation of the list comprehension gets muddled. Using a short-duration name binding retains the simplicity; while the extra `for` loop does achieve this, it does so at the cost of dividing the expression visually, putting the named part at the end of the comprehension instead of the beginning.
Maybe add to last sentence "and of adding (at least conceptually) extra steps: building a 1-element list, then extracting the first element"
That's precisely the point that Serhiy's optimization is aiming at, with the intention of making "for x in [expr]" a standard idiom for list comp assignment. If we assume that this does become standard, it won't add the extra steps, but it does still push that expression out to the far end of the comprehension, whereas a named subexpression places it at first use.
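The existing spellings that the PEP contrasts against can be run side by side today. This is a sketch only: `f` is a hypothetical function that records its calls (so the cost of each spelling is visible), and `pair` is a hypothetical helper name; neither comes from the PEP.

```python
# Hypothetical stand-in for an expensive or side-effecting function;
# it records every call so the spellings can be compared.
calls = []

def f(x):
    calls.append(x)
    return x * 10

# 1. Calling the function twice: f runs two times per item.
calls.clear()
twice = [[f(x), f(x)] for x in range(5)]
twice_calls = len(calls)

# 2. External helper function (hypothetical name): f runs once per item.
def pair(x):
    y = f(x)
    return [y, y]

calls.clear()
helper = [pair(x) for x in range(5)]
helper_calls = len(calls)

# 3. Extra "for" over a one-element list: f runs once per item,
#    but y is bound at the far end of the comprehension, after its first use.
calls.clear()
one_elem = [[y, y] for x in range(5) for y in [f(x)]]
one_elem_calls = len(calls)

print(twice == helper == one_elem)                # → True
print(twice_calls, helper_calls, one_elem_calls)  # → 10 5 5
```

All three build the same list; only the first pays for f twice, and the third is the "for y in [expr]" idiom whose placement of the name is the complaint above.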
Open questions
==============
1. What happens if the name has already been used? `(x, (1 as x), x)` Currently, prior usage functions as if the named expression did not exist (following the usual lookup rules); the new name binding will shadow the other name from the point where it is evaluated until the end of the statement. Is this acceptable? Should it raise a syntax error or warning?
IMHO this is not only acceptable, but the (only) correct behaviour. Your crystal-clear statement "the new name binding will shadow the other name from the point where it is evaluated until the end of the statement " is critical and IMO what should happen.
Regular function-locals don't work that way, though:

    x = "global"
    def f():
        print(x)
        x = "local"
        print(x)

This won't print "global" followed by "local" - it'll bomb with UnboundLocalError. I do still think this is correct behaviour, though; the only other viable option is for the SLNB to fail if it's shadowing anything at all, and even that has its weird edge cases.
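The function-local case is easy to confirm in current Python. A minimal sketch: it is the same global/function pair, with a try/except around the call so the error is captured rather than ending the script.

```python
x = "global"

def f():
    # Because x is assigned somewhere in f, the compiler treats it as local
    # for the whole function body, so this first read fails.
    print(x)
    x = "local"
    print(x)

caught = None
try:
    f()
except UnboundLocalError as exc:
    caught = exc
    print(type(exc).__name__)  # → UnboundLocalError
```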
Perhaps an extra example or two, to clarify that *execution order* is what matters, might help, e.g. y if (f() as y) > 0 else None will work as expected, because "(f() as y)" is evaluated before the initial "y" is (if it is).
Unnecessary in the "open questions" section, but if this proves to be a point of confusion and I make a FAQ, then yeah, I could put in some examples like that.
[Parenthetical comment: Advanced use of this new feature would require knowledge of Python's evaluation order. But this is not an argument against the PEP, because the same could be said about almost any feature of Python, e.g. [ f(x), f(x) ] where evaluating f(x) has side effects.]
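The evaluation order that `y if (f() as y) > 0 else None` relies on can already be observed with today's syntax. A small sketch using a hypothetical `trace()` helper that logs when each operand runs:

```python
order = []

def trace(label, value):
    # Hypothetical helper: record when each operand is evaluated.
    order.append(label)
    return value

# Conditional expression: the condition runs first, then exactly one branch.
result = trace("then", "y") if trace("cond", 5) > 0 else trace("else", None)
cond_order = list(order)

# List display: elements are evaluated left to right.
order.clear()
items = [trace("first", 1), trace("second", 2)]
list_order = list(order)

print(result, cond_order)  # → y ['cond', 'then']
print(items, list_order)   # → [1, 2] ['first', 'second']
```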
Yeah, people should have no problem figuring this out.
2. The current implementation [1] implements statement-local names using a special (and mostly-invisible) name mangling. This works perfectly inside functions (including list comprehensions), but not at top level. Is this a serious limitation? Is it confusing?
It's great that it works perfectly in functions and list comprehensions, but it sounds as if, at top level, in rare circumstances it could produce a hard-to-track-down bug, which is not exactly desirable. It's hard to say more without knowing more details. As a stab in the dark, is it possible to avoid it by including the module name in the mangling? Sorry if I'm talking rubbish.
The problem is that it's all done through the special "cell" slots in a function's locals. To try to do that at module level would potentially mean polluting the global namespace, which could interfere with other functions and cause extreme confusion. Currently, attempting to use an SLNB at top level produces a bizarre UnboundLocalError, and I don't truly understand why. The disassembly shows the same name mangling that happens inside a function, but it doesn't get properly undone. But I'm sure there are many other implementation bugs too.
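The underlying difference between the two contexts is visible in ordinary bytecode, independent of the SLNB patch: function locals live in indexed "fast" slots, while module-level names are stored by name in the globals dict. A sketch using only the standard `dis` module and `compile()`; `opnames` and `func` are hypothetical names, and because exact opcode names vary between CPython versions, the function check only looks for a `STORE_FAST` prefix.

```python
import dis

def opnames(code):
    """Return the opcode names of a code object (top level only)."""
    return [ins.opname for ins in dis.get_instructions(code)]

# Top-level code: the assignment goes through the namespace dict.
module_code = compile("a = 1", "<module>", "exec")

# Function code: the same assignment uses a numbered local slot.
def func():
    a = 1

print("STORE_NAME" in opnames(module_code))                     # → True
print(any(op.startswith("STORE_FAST") for op in opnames(func)))  # → True
```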
3. The interaction with locals() is currently[1] slightly buggy. Should statement-local names appear in locals() while they are active (and shadow any other names from the same function), or should they simply not appear?
IMHO this is an implementation detail. IMO you should have some idea what you're doing when you use locals(). But I think consistency matters - either the temporary variable *always* gets into locals() "from the point where it is evaluated until the end of the statement", or it *never* gets into locals(). (Possibly the language spec should specify one or the other - I'm not sure, time may tell.)
Yeah, and I would prefer the former, but that's still potentially confusing. Consider:

    y = "gg"
    def g():
        x = 1
        print(x, locals())
        print((3 as x), x, locals())
        print(y, (4 as y), y, locals())
        print(x, locals())
        del x
        print(locals())

Current output:

    1 {}
    3 3 {'x': 3}
    gg 4 4 {'y': 4}
    1 {}
    {}

Desired output:

    1 {'x': 1}
    3 3 {'x': 3}
    gg 4 4 {'x': 1, 'y': 4}
    1 {'x': 1}
    {}

Also acceptable (but dispreferred) output:

    1 {'x': 1}
    3 3 {'x': 1}
    gg 4 4 {'x': 1}
    1 {'x': 1}
    {}

If the language spec mandates that "either this or that" happen, I'd be okay with that; it'd give other Pythons the option to implement this completely outside of locals() while still being broadly sane.
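For reference, the "desired output" mirrors how ordinary function locals already behave: a name shows up in locals() while it is bound and disappears once it is deleted. A small check in current CPython, using plain locals only (no SLNBs); `g` here is a separate hypothetical function, not the one above.

```python
def g():
    x = 1
    bound = "x" in locals()    # checked while x is bound
    del x
    unbound = "x" in locals()  # checked after x has been deleted
    return bound, unbound

print(g())  # → (True, False)
```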
4. Syntactic confusion in `except` statements.
5. Similar confusion in `with` statements.
This (4. and 5.) shows that we are using "as" in more than one sense, and in a perfect world we would use different keywords. But IMHO (admittedly, without having thought about it much) this isn't much of a problem. Again, perhaps some clarifying examples would help.
No, we want to keep using the same keywords - otherwise there are too many keywords in the language. The "except" case isn't a big deal IMO, but the "with" one is more serious, and the subtle difference between "with (x as y):" and "with x as y:" is sure to trip someone up. But maybe that's one for linters and code review.
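The trap is that "with x as y:" binds y to whatever x.__enter__() returns, which need not be x itself, whereas a parenthesized "(x as y)" would (per the proposed semantics) bind y to x. Current Python can demonstrate the first half; `Resource` is a hypothetical class written only for illustration.

```python
class Resource:
    """Hypothetical context manager whose __enter__ returns a different object."""

    def __enter__(self):
        return "handle"  # this is what "with ... as name" binds

    def __exit__(self, exc_type, exc, tb):
        return False  # don't suppress exceptions

res = Resource()

with res as bound:
    pass

print(bound)         # → handle
print(bound is res)  # → False
```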
Some pedantry:
One issue not so far explicitly mentioned: IMHO it should be perfectly legal to assign a value to a temporary variable, and then not use that temporary variable (just as it is legal to assign to a variable in a regular assignment statement, and then not use that variable) though linters should IMO point it out. E.g. you might want to modify (perhaps only temporarily) a = [ (f() as b), b ] to a = [ (f() as b), c ]
Yep, perfectly legal. Once linters learn that this is an assignment, they can flag this as "unused variable". Otherwise, it's not really hurting much.
Also (and I'm relying on "In any context where arbitrary Python expressions can be used, a named expression can appear."), linters should also IMO flag a = (42 as b), which AFAICT is a laborious synonym for a = 42.
Ditto - an unused variable. You could also write "a = b = 42" and then never use b.
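Today's linters already catch the ordinary-assignment version of this. As a rough sketch of the idea only (a hypothetical `unused_names()` helper built on the standard `ast` module, far simpler than a real linter), here is "a = b = 42" being flagged when b is never read:

```python
import ast

def unused_names(source):
    """Names that are assigned somewhere in `source` but never read.

    Deliberately naive: it ignores scopes, globals, attributes, and
    everything else a real linter would track.
    """
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return sorted(stored - loaded)

print(unused_names("a = b = 42\nprint(a)"))  # → ['b']
```

Once the compiler represents a named subexpression as an ordinary store, the same kind of check would apply to it unchanged.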
And here's a thought: What are the semantics of a = (42 as a) # Of course a linter should point this out too At first I thought this was also a laborious synonym for "a=42". But then I re-read your statement (the one I described above as crystal-clear) and realised that its exact wording was even more critical than I had thought: "the new name binding will shadow the other name from the point where it is evaluated until the end of the statement" Note: "until the end of the statement". NOT "until the end of the expression". The distinction matters. If we take this as gospel, all this will do is create a temporary variable "a", assign the value 42 to it twice, then discard it. I.e. it effectively does nothing, slowly. Have I understood correctly? Very likely you have considered this and mean exactly what you say, but I am sure you will understand that I mean no offence by querying it.
Actually, that's a very good point, and I had to actually go and do that to confirm. You're correct that the "a =" part is also affected, but there may be more complicated edge cases. Disassembly can help track down what the compiler's actually doing:
    >>> def f():
    ...     a = 1
    ...     a = (2 as a)
    ...     print(a)
    ...
    >>> dis.dis(f)
      2           0 LOAD_CONST               1 (1)
                  2 STORE_FAST               0 (a)

      3           4 LOAD_CONST               2 (2)
                  6 DUP_TOP
                  8 STORE_FAST               1 (a)
                 10 STORE_FAST               1 (a)
                 12 DELETE_FAST              1 (a)

      4          14 LOAD_GLOBAL              0 (print)
                 16 LOAD_FAST                0 (a)
                 18 CALL_FUNCTION            1
                 20 POP_TOP
                 22 LOAD_CONST               0 (None)
                 24 RETURN_VALUE

If you're not familiar with the output of dis.dis(), the first column (largely blank) is line numbers in the source, the second is byte code offsets, and then we have the operation and its parameter (if any). The STORE_FAST and LOAD_FAST opcodes work with local names, which are identified by their indices; the first such operation sets slot 0 (named "a"), but the two that happen in line 3 (byte positions 8 and 10) are manipulating slot 1 (also named "a"). So you can see that line 3 never touches slot 0, and it is entirely operating within the SLNB scope. Identical byte code is produced from this function:
    >>> def f():
    ...     a = 1
    ...     b = (2 as b)
    ...     print(a)
    ...
    >>> dis.dis(f)
      2           0 LOAD_CONST               1 (1)
                  2 STORE_FAST               0 (a)

      3           4 LOAD_CONST               2 (2)
                  6 DUP_TOP
                  8 STORE_FAST               1 (b)
                 10 STORE_FAST               1 (b)
                 12 DELETE_FAST              1 (b)

      4          14 LOAD_GLOBAL              0 (print)
                 16 LOAD_FAST                0 (a)
                 18 CALL_FUNCTION            1
                 20 POP_TOP
                 22 LOAD_CONST               0 (None)
                 24 RETURN_VALUE

I love dis.dis(); it's such an awesome tool :)

I'll push PEP changes based on your suggestions shortly. I'm also going to add a "performance considerations" section, as features like this are potentially costly.

Thanks for your input!

ChrisA