Re: [Python-ideas] If branch merging
Thank you all for your responses. I didn't realize how much support this mailing list had.

In response to several responses:

It appears I have hit a soft spot with the 'as' keyword. It seems clear to me that inlining an assignment confuses scope. With any inline solution, that confusion will exist. Now, I will say that I do not like 'if aisb = a == b' because of the potential errors, as others have mentioned. A language should be written as much for the beginners as the experts, or it will never live very long. Avoiding absentminded mistakes is always good to do. There are many other possible solutions, from a comma, as in "if a == b, aisb:", to a custom language addition of a new keyword or operator.

Regardless of how inline assignment is written, the scope issue will still exist. As such, it is more important to decide first whether it is needed. The fact that this idea has been brought up before means that it deserves some research. Perhaps I can do some analytics and return with more info on where it could be used and if it will actually provide any speed benefits.

Ok, that was a bit of a shotgun response to many remarks. Hopefully it will suffice. Thanks again for all the feedback.

I would now like to respond to Steven's response directly:

On Sat, Jun 6, 2015 at 11:19 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jun 06, 2015 at 09:03:38PM -0600, Cory Beutler wrote:
[...]
This would simplify some logic expressions by allowing the merging of branched code.
*Examples of use:*

*Duplicate code in if-chains may be reduced:*

    # Old Version
    if a == b:
        print ('a == b')
        foo()             # <-- duplicate code
    elif b == c:
        print ('b == c')
        foo()             # <-- duplicate code
    elif c == d:
        print ('c == d')
        foo()             # <-- duplicate code
    if a == b:
        print('a == b')
    elif b == c:
        print('b == c')
    elif c == d:
        print('c == d')
    foo()
No new syntax required.
The functionality is not the same. In your example 'foo' gets called even if none of the conditions are true. The above example only runs 'foo' if it enters one of the if-blocks. This basic layout is useful for various parsing and reading operations. It is nice to check if something fits various conditions, then after the specific handling, add on finishing details in a 'foo'-like context.
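To make the difference concrete, here is a minimal sketch in current Python of the behaviour described above, where the finishing step runs only if one of the branches was entered. The name `handle`, the `log` list and the `matched` flag are illustrative assumptions and not part of the proposal; the string 'foo' stands in for the call to foo().

```python
def handle(a, b, c, d, log):
    """Append a message for the first matching branch, then a shared
    finishing step, but only if some branch actually matched."""
    matched = True  # assume a branch will be taken; cleared in the else
    if a == b:
        log.append('a == b')
    elif b == c:
        log.append('b == c')
    elif c == d:
        log.append('c == d')
    else:
        matched = False
    if matched:
        log.append('foo')  # stands in for foo(); skipped when nothing matched
    return log
```

Calling `handle(1, 2, 3, 4, [])` leaves the list empty, whereas the version with an unconditional trailing `foo()` would still run the finishing step.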
*Many nested 'if' statements could now be a more linear style:*

    # Old Version
    if a == b:
        print ('a == b')
        if b == c:
            print ('b == c')
        print ('end if')
What's wrong with that code? Nesting the code like that follows the logic of the code: the b==c test *only* occurs if a==b.
    # New Version
    if a == b:
        print ('a == b')
    alif b == c:
        print ('b == c')
    also:
        print ('end if')
I consider this significantly worse. It isn't clear that the comparison between b and c is only made if a == b, otherwise it is entirely skipped.
It may only be worse because you are not used to reading it. This type of syntax looks simple once you know how the pieces work. I mean, you know that having multiple if-elif statements will result in only checking conditions until one passes. The 'also' mentality would be the same, but backwards.
*Selective Branch merging:*
One limitation of the 'also' and 'alif' keywords is the restriction to "all of the above" checking. What I mean by that is that there is no way to pick and choose which branches to merge back together. When using 'also' and 'alif' you are catching all previous if-branches. One easy way to solve this would be to allow for named branching. The most simple way to do this is to save the conditions of each branch into a variable with a name. Here is an example of merging only select branches together:

    # Old Version
    if a == b:
        print ('a == b')
    elif a == c:
        print ('a == c')
    elif a == d:
        print ('a == d')
    if (a == b) or (a == d):
        print ('a == b and a == d')
That code is wrong. Was that an intentional error? The final branch prints that a == b == d, but that's not correct, it runs when either a == b or a == d, not just when both are true.
Yeah, that was a mistype. That is why I shouldn't program fake code late at night. It does warm my heart to see that 2 people have corrected my fake code. That means it is easy to learn and understand.
Personally, I would write that as:
    if a == b or a == d:
        if a == b:
            print('a == b')
        else:
            print('a == d')
        print('a == b or a == d')
    elif a == c:
        print('a == c')
You do end up comparing a and b for equality twice, but worrying about that is likely to be premature optimization. It isn't worth adding syntax to the language just for the one time in a million that actually matters.
With that rearrangement, you could write it with 'also':

    if a == b:
        print('a == b')
    elif a == d:
        print('a == d')
    also:
        print('a == b or a == d')
    elif a == c:
        print('a == c')

but that does not demonstrate the selective branch merging. It would not work so well to combine the next two 'elif' statements, as any other 'also' blocks would capture the 'a == b', 'a == d', and first 'also' branches. I guess what I mean to say is that this example is a little too dumbed down. I think you know what I am after, though. If 'a == b' is a heavy duty calculation, it would be nice to be able to store that inline.

Thank you, Steven, for your objective view of things. It has been useful to see an outside perspective. I look forward to your future input.
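For comparison, the "store the comparison" idea can already be written with ordinary assignments ahead of the chain. This is only a sketch: `classify`, `a_is_b` and `a_is_d` are made-up names, and it mirrors the Old Version above (with the corrected 'or' message) rather than any proposed syntax. Note that it computes `a == d` eagerly, even when `a == b` already matched, which is part of what an inline form would avoid.

```python
def classify(a, b, c, d):
    """Name each potentially expensive comparison once, then reuse the
    names to merge only selected branches."""
    a_is_b = a == b  # computed once, reused in the merge test below
    a_is_d = a == d  # evaluated eagerly, unlike in a plain elif chain
    out = []
    if a_is_b:
        out.append('a == b')
    elif a == c:
        out.append('a == c')
    elif a_is_d:
        out.append('a == d')
    if a_is_b or a_is_d:
        out.append('a == b or a == d')  # merged step for the selected branches
    return out
```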
On Sun, Jun 7, 2015, at 21:06, Cory Beutler wrote:
Thank you all for your responses. I didn't realize how much support this mailing list had.
In response to several responses:
It appears I have hit a soft spot with the 'as' keyword.
I don't have an issue with the as keyword, I was just pointing out that it disguises the fact that what you're really asking for seems to be general assignment expressions, since there is no particular rationale to constrain it to the boolean condition of if statements.
On 8 Jun 2015 11:07, "Cory Beutler" <cgbeutler@gmail.com> wrote:
Thank you all for your responses. I didn't realize how much support this mailing list had.
In response to several responses:
It appears I have hit a soft spot with the 'as' keyword. It seems clear to me that inlining an assignment confuses scope. With any inline solution, that confusion will exist.
Not really, as we have a number of inline assignment and renaming constructs, and they all use "as" (import, with statements, exception handlers). For loops, function definitions and class definitions also help establish the behaviour of name bindings in compound statement header lines affecting the containing scope rather than only affecting the internal suite. The exception handler case is the odd one out, since that includes an implied "del" whenever execution leaves the contained suite.
Any form of inline assignment that doesn't use "as NAME" will need a good justification. (It's also worth noting that "as" clauses are specifically for binding to a name, while the LHS of an assignment statement allows attributes, indexing, slicing and tuple unpacking)
Avoiding absentminded mistakes is always good to do. There are many other possible solutions from a comma, as in "if a == b, aisb:", to a custom language addition of a new keyword or operator.
Commas are generally out, due to the ambiguity with tuple construction.
Regardless of how inline assignment is written, the scope issue will still exist. As such, it is more important to decide first whether it is needed. The fact that this idea has been brought up before means that it deserves some research. Perhaps I can do some analytics and return with more info on where it could be used and if it will actually provide any speed benefits.
In this particular case, the variant that has always seemed most attractive to me in past discussions is a general purpose "named subexpression" construct that's just a normal local name binding operation affecting whatever namespace the expression is executed in.

In the simple if statement case, it wouldn't be much different from having a separate assignment statement before the if statement, but in a while loop it would be executed on each iteration, in an elif it could make the results of subcalculations available to subsequent elif clauses without additional nesting, and in the conditional expression and comprehension cases it could make part of the condition calculation available to the result calculation.

It would certainly be possible for folks to go overboard with such a construct and jam way too much into a single expression for it to be readable, but that's already the case today, and the way to handle it would remain the same: refactoring the relevant code to make it easier for readers to follow and hence maintain.

Cheers,
Nick.
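The thread predates it, but Python 3.8 later added assignment expressions (`:=`, PEP 572), which behave much like the general purpose name binding described here. The following is a minimal sketch assuming Python 3.8 or later; the function names and sample logic are invented for illustration, and `:=` is not the "as" spelling being discussed in this thread.

```python
import io

def read_nonblank(stream):
    """Collect stripped, non-empty lines until EOF."""
    lines = []
    # The binding in the loop header is re-evaluated on every iteration.
    while (line := stream.readline()):
        if (stripped := line.strip()):
            lines.append(stripped)
    return lines

def describe(n):
    """Reuse a subcalculation from the first clause in a later elif."""
    if (half := n // 2) > 10:
        return f'large half: {half}'
    elif half > 0:  # 'half' is still bound from the if clause above
        return f'small half: {half}'
    return 'no half'
```

For example, `read_nonblank(io.StringIO('a\n\n b \n'))` walks the stream one line at a time without a separate assignment before the loop and another at the end of its body.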
Nick Coghlan writes:
(It's also worth noting that "as" clauses are specifically for binding to a name, while the LHS of an assignment statement allows attributes, indexing, slicing and tuple unpacking)
+1 (and the point that it's a *binding*, not an assignment, deserves a lot more than a parenthesized aside).
In this particular case, the variant that has always seemed most attractive to me in past discussions is a general purpose "named subexpression" construct that's just a normal local name binding operation affecting whatever namespace the expression is executed in.
Yes, please!
On Sun, Jun 07, 2015 at 07:06:42PM -0600, Cory Beutler wrote: [...]
The functionality is not the same. In your example 'foo' gets called even if none of the conditions are true. The above example only runs 'foo' if it enters one of the if-blocks.
Ah yes, of course you are correct.
*Many nested 'if' statements could now be a more linear style:*

    # Old Version
    if a == b:
        print ('a == b')
        if b == c:
            print ('b == c')
        print ('end if')
What's wrong with that code? Nesting the code like that follows the logic of the code: the b==c test *only* occurs if a==b.
    # New Version
    if a == b:
        print ('a == b')
    alif b == c:
        print ('b == c')
    also:
        print ('end if')
I consider this significantly worse. It isn't clear that the comparison between b and c is only made if a == b, otherwise it is entirely skipped.
It may only be worse because you are not used to reading it. This type of syntax looks simple once you know how the pieces work. I mean, you know that having multiple if-elif statements will result in only checking conditions until one passes. The 'also' mentality would be the same, but backwards.
In the first case, your b==c test only occurs if a==b, which can be easily seen from the structure of the code:

    if a == b:
        everything here occurs only when a == b
        including the b == c test

In the second case, there is no hint from the structure:

    if a == b:
        ...
    alif b == c:

As you read down the left hand column, you see "if a == b" and you can mentally say "that block only occurs if a == b" and move on. But when you get to the alif block, you have to stop reading forward and go back up to understand whether it runs or not.

It's not like elif, which is unaffected by any previous if or elif clauses. Each if/elif clause is independent. The test is always made (assuming execution reaches that line of code at all), and you can decide whether the block is entered or not by looking at the if/elif line alone:

    ...
    elif some_condition():
        block

Here, nothing above the "elif" line matters. If I reach that line, some_condition() *must* be evaluated, and the block entered if it evaluates to a truthy value. It's easy to understand. But:

    ...
    alif some_condition():
        block

I cannot even tell whether some_condition() is called or not. The structure gives no hint as to whether the alif line is reachable. It looks like it is at the same semantic level as the distant "if" line somewhere far above it, but it isn't. Whether it runs or not is dependent on the distant "if" and "elif" lines above it.

By its nature, this cannot be simple, since it introduces coupling between the alif line you are reading and one or more distant lines above it, while disguising the structure of the code by aligning the alif with the if even though it is conceptually part of the if block.

--
Steve
On Jun 7, 2015, at 19:18, Steven D'Aprano <steve@pearwood.info> wrote:
As you read down the left hand column, you see "if a == b" and you can mentally say "that block only occurs if a == b" and move on. But when you get to the alif block, you have to stop reading forward and go back up to understand whether it runs or not.
Thanks for putting it this way. I knew there was a more fundamental problem, but I couldn't see it until your message. The proposal is closely analogous to trying to define a Boolean predicate in a list GUI instead of a tree. And that means it has the exact same problems that the early MS Office and Visual C++ Find in File dialogs had. Besides the obvious fact that mixing conjunctions and disjunctions without grouping (via nesting) is insufficiently powerful for many real-life predicates (which is exactly why the proposal needs the assignment-like add-on), even in the simple cases where it works, it's not readable (which is why the examples had at least one mistake, and at least one person misread one of the other examples). If your eye has to travel back upwards to the last also, but the alsos are flush against the left with the elifs instead of nested differently, you have to make an effort to parse each clause in your head, which is not true for a flat chain of elifs. At any rate, as two people (I think Stephen and Nick) suggested, the second half of the proposal (the as-like binding) nearly eliminates the need for the first half, and doesn't have the same problem. The biggest problem it has is that you want the same syntax in other places besides if conditions, which is a better problem to have.
On 06/07/2015 07:18 PM, Steven D'Aprano wrote:
It's not like elif, which is unaffected by any previous if or elif clauses. Each if/elif clause is independent.
This is simply not true: each "elif" encountered is only evaluated if all the previous if/elif lines failed, so you have to pay attention to those previous lines to know if execution will even get this far.
The test is always made (assuming execution reaches that line of code at all),
Exactly. -- ~Ethan~
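A small sketch of the behaviour under discussion, using only current Python: the `check` helper records which conditions were actually evaluated, so it is visible that the elif test runs only when the earlier test failed. The names `run_chain` and `check` are illustrative assumptions.

```python
def run_chain(first, second):
    """Return the names of the conditions that were actually evaluated."""
    calls = []

    def check(name, result):
        calls.append(name)  # record that this condition was tested
        return result

    if check('first', first):
        pass
    elif check('second', second):  # reached only if 'first' was falsy
        pass
    return calls
```

With `run_chain(True, True)` only the first test is recorded; with `run_chain(False, True)` both are.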
Cory Beutler writes:
It may only be worse because you are not used to reading it. This type of syntax looks simple once you know how the pieces work. I mean, you know that having multiple if-elif statements will result in only checking conditions until one passes. The 'also' mentality would be the same, but backwards.
And that inversion is what underlies Steven's point, I think. I see your point, *but only if 'elif' goes away*. Currently the "hangindent" formatting of if ... elif ... else signals a series of alternatives, as similar formatting does (as a convention, rather than syntax) in many other languages. This makes scanning either actions or conditions fairly easy; you don't have to actually read the "elif"s to understand the alternative structure. With also and alif, you now have to not only read the keywords, you have to parse the code to determine what conditions are actually in force. This is definitely a readability minus, a big one. It doesn't help that "else" and "also" and "elif" and "alif" are rather visually confusable pairs, but at this point that's a bikeshed painting issue (except that as proponent you might want to paint it a different color for presentation). There's also the "dangling also" issue: I would suppose that also has all the problems of "dangling else", and some new ones besides. For example, since "elif" really is "else if" (not a C-like "case"), it's easy to imagine situations where you'd like to have one also or alif for the first three cases, and one for the next two, etc. Python, being a language for grownups, could always add a convention that you should generally only use also and alif at the end of an if ... elif ... else series or something like that, but I think that would seriously impair the usefulness of these constructs. I'm definitely -1 on the also, alif syntax at this point. On the other hand, having done a lot of C programming in my misspent youth, I do miss anaphoric conditionals, so I too would like to see the possibility of "if cond as var: do_something_with_var" explored. Of course Nick is right that automatic common subexpression elimination (CSE) is the big win, but manual CSE can improve readability.
On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm definitely -1 on the also, alif syntax at this point. On the other hand, having done a lot of C programming in my misspent youth, I do miss anaphoric conditionals, so I too would like to see the possibility of "if cond as var: do_something_with_var" explored. Of course Nick is right that automatic common subexpression elimination (CSE) is the big win, but manual CSE can improve readability.
Part of the trouble with depending on CSE is that Python is so dynamic that you can't depend on things having no side effects... but the more important part, in my opinion, is that duplication is a source code maintenance problem. Bruce suggested this:

    x = a and a.b and a.b.c and a.b.c.d
    # which becomes
    x = a and a.b
    if x: x = x.c
    if x: x = x.d

and frankly, I'd be more worried about a subsequent edit missing something than I would be about the performance of all the repeated lookups. Of course, Python does have an alternative, and that's to use attribute absence rather than falsiness:

    try:
        x = a.b.c.d
    except AttributeError:
        x = None

But that won't always be an option. And any kind of expression that says "the thing on the left, if it's false, otherwise the thing on the left modified by this operator" is likely to get messy in anything more than trivial cases; it looks great here:

    x = a?.b?.c?.d

but now imagine something more complicated, and it's a lot more messy.

ChrisA
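The two styles above are not interchangeable, which a short runnable sketch can show. `chain_falsy` and `chain_missing` are hypothetical helper names, and `SimpleNamespace` objects stand in for whatever `a` really is.

```python
from types import SimpleNamespace

def chain_falsy(a):
    """Falsiness-based chaining: stop at the first falsy link."""
    x = a and a.b
    if x:
        x = x.c
    if x:
        x = x.d
    return x

def chain_missing(a):
    """Absence-based chaining: any missing attribute yields None."""
    try:
        return a.b.c.d
    except AttributeError:
        return None

full = SimpleNamespace(b=SimpleNamespace(c=SimpleNamespace(d=5)))
zero = SimpleNamespace(b=SimpleNamespace(c=0))  # c exists but is falsy
```

Both helpers return 5 for `full`. For `zero`, the falsiness version returns the falsy value 0 itself, while the exception version returns None because `(0).d` raises AttributeError. An object whose `b` has no `c` at all makes the falsiness version raise, which is one way the choice between the two styles "won't always be an option".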
On Jun 7, 2015, at 19:45, Chris Angelico <rosuav@gmail.com> wrote:
Part of the trouble with depending on CSE is that Python is so dynamic that you can't depend on things having no side effects... but the more important part, in my opinion, is that duplication is a source code maintenance problem. Bruce suggested this:
    x = a and a.b and a.b.c and a.b.c.d
    # which becomes
    x = a and a.b
    if x: x = x.c
    if x: x = x.d
and frankly, I'd be more worried about a subsequent edit missing something than I would be about the performance of all the repeated lookups. Of course, Python does have an alternative, and that's to use attribute absence rather than falsiness:
    try:
        x = a.b.c.d
    except AttributeError:
        x = None
But that won't always be an option.
I don't have a link, but one of the Swift development blogs shows a number of good examples where it isn't an option.

When deciding whether they wanted Smalltalk-style nil chaining or Python-style AttributeError/LookupError, all the simple cases look just as good both ways. So they went out looking for real-life code in multiple languages to find examples that couldn't be translated to the other style. They found plenty of nil-chaining examples that were clumsy to translate to exceptions, but almost all of the exception examples that were clumsy to translate to nil chaining could be solved if they just had multiple levels of nil. So, if they could find a way to provide something like Haskell's Maybe, but without forcing you to think about monads and pattern matching, that would be better than exceptions. So that's what they did. (I'm not sure it's 100% successful, because there are rare times when you really do want to check for Just Nothing, and by hiding things under the covers they made that difficult... But in simple cases it definitely does work.)

Anyway, their language design choice isn't directly relevant here (I assume nobody wants a.b.c.d to be None if a.b is missing, or wants to add a?.b?.c?.d syntax to Python), but the examples probably are.
And any kind of expression that says "the thing on the left, if it's false, otherwise the thing on the left modified by this operator" is likely to get messy in anything more than trivial cases; it looks great here:
x = a?.b?.c?.d
but now imagine something more complicated, and it's a lot more messy.
It's surprising how often it doesn't get messy in Swift. But when it does, I really miss being able to pattern match Just Nothing, and there's no way around that without two clumsy assignment statements before the conditional (or defining and calling an extra function), which is even worse than the one that Python often needs...
On 8 June 2015 at 12:45, Chris Angelico <rosuav@gmail.com> wrote:
On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm definitely -1 on the also, alif syntax at this point. On the other hand, having done a lot of C programming in my misspent youth, I do miss anaphoric conditionals, so I too would like to see the possibility of "if cond as var: do_something_with_var" explored. Of course Nick is right that automatic common subexpression elimination (CSE) is the big win, but manual CSE can improve readability.
Part of the trouble with depending on CSE is that Python is so dynamic that you can't depend on things having no side effects... but the more important part, in my opinion, is that duplication is a source code maintenance problem.
Yes, this is the part of the problem definition I agree with, which is why I think named subexpressions are the most attractive alternative presented in the past discussions.

Our typical answer is "pull the named subexpression out to a separate assignment statement and give it a name", but there are a range of constructs where that poses a problem. For example:

    x = a.b if a.b else a.c
    while a.b: x = a.b
    [a.b for a in iterable if a.b]

Eliminating the duplication with named subexpressions would be straightforward (I'd suggest making the parentheses mandatory for this construct, which would also avoid ambiguity in the with statement and exception handler clause cases):

    x = b if (a.b as b) else a.c
    while (a.b as x): ...
    [b for a in iterable if (a.b as b)]

By contrast, eliminating the duplication *today* requires switching to very different structures based on the underlying patterns otherwise hidden behind the syntactic sugar:

    x = a.b
    if not x: x = a.c

    while True:
        x = a.b
        if not x: break
        ...

    result = []
    for a in iterable:
        b = a.b
        if b: result.append(b)

The main *problem* with named subexpressions (aside from the potential for side effects introduced by deliberately letting the name bindings leak into the surrounding namespace) is that it introduces a redundancy at the single assignment level since an expression statement that names the expression would be equivalent to a simple assignment statement:

    x = a
    (a as x)

On the other hand, there's a similar existing redundancy between function definitions and binding lambda expressions to a name:

    f = lambda: None
    def f(): pass

And for that, we just have a PEP 8 style guideline recommending the latter form. Something similar would likely work for saying "only use named subexpressions in cases where using a normal assignment statement instead would require completely restructuring the code".

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
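As a rough check that the restructured forms and the named forms really compute the same thing, here is a sketch that pairs them up. It assumes Python 3.8 or later and uses the `:=` operator that was added after this thread as a stand-in for the "(a.b as b)" spelling; all function names are hypothetical.

```python
from types import SimpleNamespace

def pick_explicit(a):
    """The restructured form: separate assignment plus an if statement."""
    x = a.b
    if not x:
        x = a.c
    return x

def pick_named(a):
    """Conditional expression with the condition's value named inline."""
    return b if (b := a.b) else a.c

def truthy_bs_explicit(iterable):
    """The restructured form of the comprehension: an explicit loop."""
    result = []
    for a in iterable:
        b = a.b
        if b:
            result.append(b)
    return result

def truthy_bs_named(iterable):
    """Comprehension that names a.b once and reuses it as the result."""
    return [b for a in iterable if (b := a.b)]

items = [SimpleNamespace(b=0, c=7), SimpleNamespace(b=3, c=9)]
```

For each object in `items` the two `pick_*` functions agree (7, then 3), and both `truthy_bs_*` functions return `[3]`, with `a.b` looked up only once per object in every version.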
On Jun 7, 2015, at 20:41, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 8 June 2015 at 12:45, Chris Angelico <rosuav@gmail.com> wrote:
On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm definitely -1 on the also, alif syntax at this point. On the other hand, having done a lot of C programming in my misspent youth, I do miss anaphoric conditionals, so I too would like to see the possibility of "if cond as var: do_something_with_var" explored. Of course Nick is right that automatic common subexpression elimination (CSE) is the big win, but manual CSE can improve readability.
Part of the trouble with depending on CSE is that Python is so dynamic that you can't depend on things having no side effects... but the more important part, in my opinion, is that duplication is a source code maintenance problem.
Yes, this is the part of the problem definition I agree with, which is why I think named subexpressions are the most attractive alternative presented in the past discussions.
The problem with general named subexpressions is that it inherently means a side effect buried in the middle of an expression. While it's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), but it's not common or idiomatic. You could say this is a consulting-adults issue and you shouldn't use it in cases where it's not deep inside an expression--but those are the actual motivating cases, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is:
[b for a in iterable if (a.b as b)]
That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse? Maybe something more like a let expression, where the binding goes as far left as possible instead of as far right would look better,
Sorry, early send... Sent from my iPhone
On Jun 7, 2015, at 21:24, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Jun 7, 2015, at 20:41, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 8 June 2015 at 12:45, Chris Angelico <rosuav@gmail.com> wrote:
On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm definitely -1 on the also, alif syntax at this point. On the other hand, having done a lot of C programming in my misspent youth, I do miss anaphoric conditionals, so I too would like to see the possibility of "if cond as var: do_something_with_var" explored. Of course Nick is right that automatic common subexpression elimination (CSE) is the big win, but manual CSE can improve readability.
Part of the trouble with depending on CSE is that Python is so dynamic that you can't depend on things having no side effects... but the more important part, in my opinion, is that duplication is a source code maintenance problem.
Yes, this is the part of the problem definition I agree with, which is why I think named subexpressions are the most attractive alternative presented in the past discussions.
The problem with general named subexpressions is that it inherently means a side effect buried in the middle of an expression. While it's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), but it's not common or idiomatic.
You could say this is a consulting-adults issue and you shouldn't use it in cases where it's not deep inside an expression--but those are the actual motivating cases, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is:
[b for a in iterable if (a.b as b)]
That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse?
Maybe something more like a let expression, where the binding goes as far left as possible instead of as far right would look better,
... but I can't even begin to think of a way to fit that into Python's syntax that isn't horribly ugly and clunky, and "as" already has lots of precedent, so I think that's not worth exploring.
On 8 June 2015 at 14:24, Andrew Barnert <abarnert@yahoo.com> wrote:
On Jun 7, 2015, at 20:41, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 8 June 2015 at 12:45, Chris Angelico <rosuav@gmail.com> wrote:
On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm definitely -1 on the also, alif syntax at this point. On the other hand, having done a lot of C programming in my misspent youth, I do miss anaphoric conditionals, so I too would like to see the possibility of "if cond as var: do_something_with_var" explored. Of course Nick is right that automatic common subexpression elimination (CSE) is the big win, but manual CSE can improve readability.
Part of the trouble with depending on CSE is that Python is so dynamic that you can't depend on things having no side effects... but the more important part, in my opinion, is that duplication is a source code maintenance problem.
Yes, this is the part of the problem definition I agree with, which is why I think named subexpressions are the most attractive alternative presented in the past discussions.
The problem with general named subexpressions is that it inherently means a side effect buried in the middle of an expression. While it's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), but it's not common or idiomatic.
You could say this is a consulting-adults issue and you shouldn't use it in cases where it's not deep inside an expression--but those are the actual motivating cases, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is:
[b for a in iterable if (a.b as b)]
That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse?
Ah, but that's one of the interesting aspects of the idea: since comprehensions and generator expressions *already* define their own nested scope in Python 3 in order to keep the iteration variable from leaking, their named subexpressions wouldn't leak either :)
For if/elif clauses and while loops, the leaking would be a desired feature in order to make the subexpression available for use inside the following suite body.
That would leave conditional expressions as the main suggested use case where leaking the named subexpressions might not be desirable. Without any dedicated syntax, the two ways that first come to mind for doing expression local named subexpressions would be:
    x = (lambda a=a: b if (a.b as b) else a.c)()
    x = next((b if (a.b as b) else a.c) for a in (a,))
Neither of which would be a particularly attractive option.
The other possibility that comes to mind is to ask the question: "What happens when a named subexpression appears as part of an argument list to a function call, or as part of a subscript operation, or as part of a container display?", as in:
    x = func(b if (a.b as b) else a.c)
    x = y[b if (a.b as b) else a.c]
    x = (b if (a.b as b) else a.c),
    x = [b if (a.b as b) else a.c]
    x = {b if (a.b as b) else a.c}
    x = {'k': b if (a.b as b) else a.c}
Having *those* subexpressions leak seems highly questionable, so it seems reasonable to suggest that in order for this idea to be workable in practice, there would need to be some form of implicit scoping rule where using a named subexpression turned certain constructs into "scoped subexpressions" that implicitly created a function object and called it, rather than being evaluated inline as normal.
(The dual pass structure of the code generator should make this technically feasible - it would be similar to the existing behaviour where the presence of a yield expression changes the way a containing "def" statement is handled)
However, that complication is significant enough to make me wonder how feasible the idea really is - yes, it handles simple cases nicely, but figuring out how to keep the side effect implications to a manageable level without making the scoping rules impossibly hard to follow would be a non-trivial challenge.
Without attempting to implement it, I'm honestly not sure how hard it would be to introduce more comprehension style implicit scopes to bound the propagation of named subexpression bindings.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
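The scoping difference described above can be checked with today's Python. The following is only a small sketch; the function name scoping_demo and its variables are made up for illustration and are not from the thread. It shows that a comprehension's iteration variable stays inside the comprehension's nested scope, while a name bound in an if suite remains visible in the enclosing function.

```python
def scoping_demo(items):
    # The comprehension is evaluated in its own nested scope, so its
    # iteration variable is not visible once the comprehension is done.
    doubled = [item * 2 for item in items]
    comprehension_var_leaked = "item" in locals()

    # An if suite is not a scope: a name bound inside it stays
    # available in the enclosing function after the suite ends.
    if items:
        first = items[0]
    suite_var_leaked = "first" in locals()

    return doubled, comprehension_var_leaked, suite_var_leaked


print(scoping_demo([1, 2, 3]))
```

A named subexpression inside the comprehension would presumably follow the first pattern, and one in an if condition the second.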
On Jun 8, 2015, at 03:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 8 June 2015 at 14:24, Andrew Barnert <abarnert@yahoo.com> wrote:
The problem with general named subexpressions is that it inherently means a side effect buried in the middle of an expression. It's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), but it's not common or idiomatic.
You could say this is a consenting-adults issue and you shouldn't use it in cases where it's not deep inside an expression--but those are the actual motivating cases, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is:
[b for a in iterable if (a.b as b)]
That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse?
Ah, but that's one of the interesting aspects of the idea: since comprehensions and generator expressions *already* define their own nested scope in Python 3 in order to keep the iteration variable from leaking, their named subexpressions wouldn't leak either :)
For if/elif clauses and while loops, the leaking would be a desired feature in order to make the subexpression available for use inside the following suite body.
Except it would also make the subexpression available for use _after_ the suite body. And it would give you a way to accidentally replace rather than shadow a variable from earlier in the function. So it really is just as bad as any other assignment or other mutation inside a condition.
That would leave conditional expressions as the main suggested use case where leaking the named subexpressions might not be desirable. Without any dedicated syntax, the two ways that first come to mind for doing expression local named subexpressions would be:
    x = (lambda a=a: b if (a.b as b) else a.c)()
    x = next((b if (a.b as b) else a.c) for a in (a,))
Neither of which would be a particularly attractive option.
Especially since if you're willing to introduce an otherwise-unnecessary scope, you don't even need this feature:
    x = (lambda b: b if b else a.c)(a.b)
    x = (lambda b=a.b: b if b else a.c)()
Or, of course, you can just define a reusable ifelse function somewhere:
    def defaultify(val, defaultval):
        return val if val else defaultval
    x = defaultify(a.b, a.c)
The other possibility that comes to mind is to ask the question: "What happens when a named subexpression appears as part of an argument list to a function call, or as part of a subscript operation, or as part of a container display?", as in:
    x = func(b if (a.b as b) else a.c)
    x = y[b if (a.b as b) else a.c]
    x = (b if (a.b as b) else a.c),
    x = [b if (a.b as b) else a.c]
    x = {b if (a.b as b) else a.c}
    x = {'k': b if (a.b as b) else a.c}
Having *those* subexpressions leak seems highly questionable, so it seems reasonable to suggest that in order for this idea to be workable in practice, there would need to be some form of implicit scoping rule where using a named subexpression turned certain constructs into "scoped subexpressions" that implicitly created a function object and called it, rather than being evaluated inline as normal.
Now you really _are_ reinventing let. A let expression like this:
    x = let b=a.b in (b if b else a.c)
... is effectively just syntactic sugar for the lambda above. And it's a lot more natural and easy to reason about than letting b escape one step out to the conditional expression but not any farther. (Or to the rest of the complete containing expression? Or the statement? What does "x[(a.b as b)] = b" mean, for example? Or "x[(b if (a.b as b) else a.c) + (b if (d.b as b) else d.c)]"? Or "x[(b if (a.b as b) else a.c) + b]"?)
As a side note, the initial proposal here was to improve performance by not repeating the a.b lookup; I don't think adding an implicit comprehension-like function definition and call will be faster than a getattr except in very uncommon cases. However, I think there are reasonable cases where it's more about correctness than performance (e.g., the real expression you want to avoid evaluating twice is next(spam) or f.readline(), not a.b), so I'm not too concerned there. Also, I'm pretty sure a JIT could effectively inline a function definition plus call more easily than it could CSE an expression that's hard to prove is static.
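For readers who want to run the existing-syntax alternatives from this message, here is a minimal sketch. The Record class and its lookup counter are hypothetical, added only to make the number of a.b evaluations observable; defaultify is the helper named in the message.

```python
class Record:
    """Hypothetical object that counts how often .b is looked up."""

    def __init__(self):
        self.lookups = 0
        self.c = "fallback"

    @property
    def b(self):
        self.lookups += 1
        return 7


def defaultify(val, defaultval):
    return val if val else defaultval


# Plain conditional expression: a.b is evaluated twice when it is truthy.
a = Record()
naive = a.b if a.b else a.c
naive_lookups = a.lookups

# Each of the three scoped spellings evaluates a.b exactly once.
a = Record()
x1 = (lambda b: b if b else a.c)(a.b)
x2 = (lambda b=a.b: b if b else a.c)()
x3 = defaultify(a.b, a.c)
scoped_lookups = a.lookups

print(naive, naive_lookups, x1, x2, x3, scoped_lookups)
```

One difference worth noting: defaultify evaluates a.c eagerly as an argument, whereas the conditional expression and the lambdas only touch a.c when b is falsy.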
On Mon, Jun 08, 2015 at 04:24:33AM -0700, Andrew Barnert via Python-ideas wrote: [...]
For if/elif clauses and while loops, the leaking would be a desired feature in order to make the subexpression available for use inside the following suite body.
Except it would also make the subexpression available for use _after_ the suite body. And it would give you a way to accidentally replace rather than shadow a variable from earlier in the function. So it really is just as bad as any other assignment or other mutation inside a condition.
I don't know why you think this will be a bad thing. Or rather, even if it is a bad thing, it's the Python Way. Apart from classes and functions themselves, indented blocks are *not* new scopes as they may be in some other languages. They are part of the existing scope, and the issues you raise above are already true today:
    x = 1
    if some_condition():
        x = 2  # replaces, rather than shadows, the earlier x
        y = 3  # y may be available for use after the suite body
So I don't see the following as any more of a problem:
    x = 1
    if (some_condition() as x) or (another_condition() as y):
        ...
        # x is replaced, and y is available
The solution to replacing a variable is, use another name. And if you really care about y escaping from the if-block, just use del y at the end of the block. (I can't imagine why anyone would bother.)
[...]
The other possibility that comes to mind is to ask the question: "What happens when a named subexpression appears as part of an argument list to a function call, or as part of a subscript operation, or as part of a container display?", as in:
    x = func(b if (a.b as b) else a.c)
    x = y[b if (a.b as b) else a.c]
    x = (b if (a.b as b) else a.c),
    x = [b if (a.b as b) else a.c]
    x = {b if (a.b as b) else a.c}
    x = {'k': b if (a.b as b) else a.c}
Having *those* subexpressions leak seems highly questionable,
I agree with that in regard to the function call. It just feels wrong and icky for a binding to occur inside a function call like that. But I don't think I agree with respect to the rest. To answer Andrew's later question:
What does "x[(a.b as b)] = b" mean
surely it simply means the same as:
    b = a.b
    x[b] = b
Now we could apply the same logic to a function call:
    # func(a.b as b)
    b = a.b
    func(b)
but I think the reason this feels wrong for function calls is that it looks like the "as b" binding should be inside the function's scope rather than in the caller's scope. (At least that's what it looks like to me.) But that doesn't apply to the others. (At least for me.)
But frankly, I think I would prefer to have b escape from the function call than to have to deal with a bunch of obscure, complicated and unintuitive "as" scoping rules. Simplicity and predictability counts for a lot.
-- Steve
I just got this funny feeling reading the last few posts.
    # this
    f(i+1 as i)
    # feels a lot like..
    f(i++)
    # but really
    f(++i)
On 8 June 2015 at 22:12, Steven D'Aprano <steve@pearwood.info> wrote: [In relation to named subexpressions leaking to the surrounding namespace by default]
I agree with that in regard to the function call. It just feels wrong and icky for a binding to occur inside a function call like that. But I don't think I agree with respect to the rest. To answer Andrew's later question:
What does "x[(a.b as b)] = b" mean
surely it simply means the same as:
    b = a.b
    x[b] = b
Right, but it reveals the execution order jumping around in a way that is less obvious in the absence of side effects. That is, for side effect free functions, the order of evaluation in:
    x[a()] = b()
doesn't matter. Once side effects are in play, the order matters a lot more.
Now we could apply the same logic to a function call:
    # func(a.b as b)
    b = a.b
    func(b)
but I think the reason this feels wrong for function calls is that it looks like the "as b" binding should be inside the function's scope rather than in the caller's scope. (At least that's what it looks like to me.) But that doesn't apply to the others. (At least for me.)
But frankly, I think I would prefer to have b escape from the function call than to have to deal with a bunch of obscure, complicated and unintuitive "as" scoping rules. Simplicity and predictability counts for a lot.
Hence the ongoing absence of named subexpressions as a feature - the simple cases look potentially interesting, but without careful consideration, the complex cases would inevitably end up depending on CPython specific quirks in subexpression execution order. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 8 June 2015 at 14:38, Nick Coghlan wrote:
On 8 June 2015 at 22:12, Steven D'Aprano <steve@pearwood.info> wrote:
[In relation to named subexpressions leaking to the surrounding namespace by default]
What does "x[(a.b as b)] = b" mean
surely it simply means the same as:
    b = a.b
    x[b] = b
Right, but it reveals the execution order jumping around in a way that is less obvious in the absence of side effects.
I'm lost. The evaluation order of today (right hand side first) would make "x[(a.b as b)] = b" mean
    x[a.b] = b
    b = a.b
(assuming looking up a.b has no side effects). Would the introduction of named subexpressions change that, and how?
On 9 June 2015 at 01:24, Wolfram Hinderer <wolfram.hinderer@googlemail.com> wrote:
On 8 June 2015 at 14:38, Nick Coghlan wrote:
On 8 June 2015 at 22:12, Steven D'Aprano <steve@pearwood.info> wrote:
[In relation to named subexpressions leaking to the surrounding namespace by default]
What does "x[(a.b as b)] = b" mean
surely it simply means the same as:
    b = a.b
    x[b] = b
Right, but it reveals the execution order jumping around in a way that is less obvious in the absence of side effects.
I'm lost. The evaluation order of today (right hand side first) would make "x[(a.b as b)] = b" mean
    x[a.b] = b
    b = a.b
(assuming looking up a.b has no side effects).
That assumption that the LHS evaluation has no side effects is the one that gets revealed by named subexpressions:
    >>> def subscript():
    ...     print("Subscript called")
    ...     return 0
    ...
    >>> def value():
    ...     print("Value called")
    ...     return 42
    ...
    >>> def target():
    ...     print("Target called")
    ...     return [None]
    ...
    >>> target()[subscript()] = value()
    Value called
    Target called
    Subscript called
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 8 June 2015 at 23:26, Nick Coghlan wrote:
On 9 June 2015 at 01:24, Wolfram Hinderer <wolfram.hinderer@googlemail.com> wrote:
On 8 June 2015 at 14:38, Nick Coghlan wrote:
On 8 June 2015 at 22:12, Steven D'Aprano <steve@pearwood.info> wrote:
[In relation to named subexpressions leaking to the surrounding namespace by default]
What does "x[(a.b as b)] = b" mean
surely it simply means the same as:
    b = a.b
    x[b] = b
Right, but it reveals the execution order jumping around in a way that is less obvious in the absence of side effects.
I'm lost. The evaluation order of today (right hand side first) would make "x[(a.b as b)] = b" mean
    x[a.b] = b
    b = a.b
(assuming looking up a.b has no side effects).
That assumption that the LHS evaluation has no side effects is the one that gets revealed by named subexpressions:
    >>> def subscript():
    ...     print("Subscript called")
    ...     return 0
    ...
    >>> def value():
    ...     print("Value called")
    ...     return 42
    ...
    >>> def target():
    ...     print("Target called")
    ...     return [None]
    ...
    >>> target()[subscript()] = value()
    Value called
    Target called
    Subscript called
Hm, that's my point, isn't it? The evaluation of subscript() happens after the evaluation of value(). The object that the RHS evaluates to (i.e. value()) is determined before subscript() is evaluated. Side effects of subscript() may mutate this object, but can't change *which* object is assigned. But if
    x[(a.b as b)] = b
means
    b = a.b
    x[b] = b
then the evaluation of the LHS *does* change which object is assigned. That's why I asked for clarification.
(I mentioned the thing about a.b not having side effects only because in my alternative
    x[a.b] = b
    b = a.b
a.b is called twice, so it's no exact representation of what is going on either. But it's a lot closer, at least the right object is assigned ;-) )
On 10 Jun 2015 05:00, "Wolfram Hinderer" <wolfram.hinderer@googlemail.com> wrote:
Hm, that's my point, isn't it? The evaluation of subscript() happens after the evaluation of value(). The object that the RHS evaluates to (i.e. value()) is determined before subscript() is evaluated. Side effects of subscript() may mutate this object, but can't change *which* object is assigned. But if
x[(a.b as b)] = b
means
    b = a.b
    x[b] = b
That would be: x[b] = (a.b as b)
then the evaluation of the LHS *does* change which object is assigned. That's why I asked for clarification.
Execution order wouldn't change, so it would mean the following:
    _temp = b
    b = a.b
    x[b] = _temp
This means you'd get the potentially surprising behaviour where the name binding would still happen even if the subscript assignment fails.
However if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases.
Cheers, Nick.
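The ordering described here can be observed in current Python using assignment expressions (":=", added by PEP 572 in Python 3.8, well after this thread). This is only an analogy under the assumption that a leaking "as" binding would behave like ":="; the names a, b, x, items and n are illustrative.

```python
from types import SimpleNamespace

a = SimpleNamespace(b="new")
b = "old"
x = {}

# The right-hand side is evaluated first (the old b), then the
# subscript expression rebinds b, then the item assignment happens.
x[(b := a.b)] = b
print(x, b)

# The name binding survives even when the subscript assignment fails.
items = []
try:
    items[(n := 5)] = "value"
except IndexError:
    pass
print(n)
```

The first print shows the key taken from the new binding and the value taken from the old one, matching the _temp expansion above.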
On Jun 9, 2015, at 16:46, Nick Coghlan <ncoghlan@gmail.com> wrote:
However if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases.
I don't think there's any consistent way to define "containing expression" that makes any sense for while/if statements. But "containing _statement_", that's easy.
In addition to the function local scope that exists today, add a statement local scope. Only an as-binding expression creates a new statement-local binding, and it does so in the smallest containing statement (so, e.g., in a while statement's condition, it's the whole while statement, suite and else suite as well as the rest of the condition). These bindings shadow outer as-bindings and function-locals. Assignments inside a statement that as-binds the variable change the statement-local variable, rather than creating a function-local. Two as-bindings within the same statement are treated like an as-binding followed by assignment in the normal (possibly implementation-dependent) evaluation order (which should rarely be relevant, unless you're deliberately writing pathological code).
Of course this is much more complex than Python's current rules. But it's not that hard to reason about. In particular, even in silly cases akin to "x[(a.b as b)] = b" and "x[b] = (a.b as b)", either it does what you'd naively expect or raises an UnboundLocalError; it never uses any outer value of b. And, I think, in all of the cases you actually want people to use, it means what you want it to. It even handles cases where you put multiple as bindings for the same name in different subexpressions of an expression in the same part of a statement.
Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
The question is, is that the behavior you'd intuitively want, or is escaping to the rest of the smallest statement sometimes unacceptable, or are the rules about assignments inside a controlled suite wrong in some case?
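To make the proposed compilation strategy concrete, here is a hand-written sketch of what the desugaring of one while statement might look like. The hypothetical source is shown in the comment; count_chars, _while_statement and the 4-character read size are invented for illustration, and a real compiler would generate the inner function rather than have it spelled out.

```python
import io


def count_chars(stream):
    total = 0

    # Hypothetical source using the proposed syntax:
    #     while (stream.read(4) as chunk):
    #         total += len(chunk)
    #
    # Desugared by hand: the whole statement becomes a function, so
    # chunk is local to it, while total is a free variable that must
    # be declared nonlocal because the suite assigns to it.
    def _while_statement():
        nonlocal total
        while True:
            chunk = stream.read(4)
            if not chunk:
                break
            total += len(chunk)

    _while_statement()
    return total


print(count_chars(io.StringIO("abcdefghij")))
```

After the call, chunk is not a local of count_chars at all, which is the statement-local behaviour described above; total has become a captured (cell) variable of the outer function.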
On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
I'd actually rather see this implemented the other way around: instead of turning this into a function call, actually have a real concept of nested scoping. Nested functions imply changes to tracebacks and such, which scoping doesn't require.
How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same? Example:
    def f(x):
        e = 2.718281828
        try:
            return e/x
        except ZeroDivisionError as e:
            raise ContrivedCodeException from e
Currently, f.__code__.co_varnames is ('x', 'e'), and all the references to e are working with slot 1; imagine if, instead, co_varnames were ('x', 'e', 'e') and the last two lines used slot 2 instead. Then the final act of the except clause would be to unbind its local name e (slot 2), and then any code after the except block would use slot 1 for e, and the original value would "reappear".
The only place that would need to "know" about the stack of scopes is the compilation step; everything after that just uses the slots. Is this feasible?
ChrisA
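The current behaviour that this proposal would change can be checked directly. This sketch is a variant of the example above (the except clause passes instead of raising, so that execution continues past it); it shows that both uses of e share a single slot today, and that the implicit unbinding at the end of the except clause takes the outer value with it.

```python
def f(x):
    e = 2.718281828
    try:
        return e / x
    except ZeroDivisionError as e:
        pass
    # Today the except clause has rebound and then deleted the one
    # and only local e, so this line fails when x == 0.
    return e


print(f.__code__.co_varnames)

try:
    f(0)
except UnboundLocalError as exc:
    print("outer e did not reappear:", exc)
```

Under the two-slot scheme, the final return e would instead read slot 1 and give back 2.718281828.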
On Jun 9, 2015, at 17:54, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
I'd actually rather see this implemented the other way around: instead of turning this into a function call, actually have a real concept of nested scoping. Nested functions imply changes to tracebacks and such, which scoping doesn't require.
How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same?
Here's a quick&dirty idea that might work: Basically, just gensym a name like .0 for the second e (as is done for comprehensions), compile as normal, then rename the .0 back to e in the code attributes.
The problem is how to make this interact with all kinds of other stuff. What if someone calls locals()? What if the outer e was nonlocal or global? What if either e is referenced by an inner function? What if another statement re-rebinds e inside the first statement? What if you do this inside a class (or at top level)?
I think for a quick hack to play with this, you don't have to worry about any of those issues; just say that's illegal, and whatever happens (even a segfault) is your own fault for trying it. (And obviously the same if some C extension calls PyFrame_LocalsToFast or equivalent.) But for a real implementation, I'm not even sure what the rules should be, much less how to implement them. (I'm guessing the implementation could either involve having a stack of symbol tables, or tagging things at the AST level while we've still got a tree and using that info in the last step, but I think there's still a problem telling the machinery how to set up closure cells to link inner functions' free variables.)
Also, all of this assumes that none of the machinery, even for tracebacks and debugging, cares about the name of the variable, just its index. Is that true?
It might be better to not start off worrying about how to get there from here, and instead first try to design the complete scoping rules for a language that's like Python but with nested scopes, and then identify all the places that it would differ from Python, and then decide which parts of the existing machinery you can hack up and which parts you have to completely replace. (Maybe, for example, it would be easier with new bytecodes to replace LOAD_CLOSURE, LOAD_DEREF, MAKE_CLOSURE, etc. than trying to modify the data to make those bytecodes work properly.)
Example:
    def f(x):
        e = 2.718281828
        try:
            return e/x
        except ZeroDivisionError as e:
            raise ContrivedCodeException from e
Currently, f.__code__.co_varnames is ('x', 'e'), and all the references to e are working with slot 1; imagine if, instead, co_varnames were ('x', 'e', 'e') and the last two lines used slot 2 instead. Then the final act of the except clause would be to unbind its local name e (slot 2), and then any code after the except block would use slot 1 for e, and the original value would "reappear".
I don't think that "unbind" is a real step that needs to happen. The names have to get mapped to slot numbers at compile time anyway, so if all code outside of the except clause was compiled to LOAD_FAST 1 instead of LOAD_FAST 2, it doesn't matter that slot 2 has the same name. The only thing you need to do is the existing implicit "del e" on slot 2. (If you somehow managed to do another LOAD_FAST 2 after that, it would just be an UnboundLocalError, which is fine. But no code outside the except clause can compile to that anyway, unless there's a bug in your idea of its implementation or someone does some byteplay stuff).
The only place that would need to "know" about the stack of scopes is the compilation step; everything after that just uses the slots. Is this feasible?
ChrisA
On Wed, Jun 10, 2015 at 11:58 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same?
Here's a quick&dirty idea that might work: Basically, just gensym a name like .0 for the second e (as is done for comprehensions), compile as normal, then rename the .0 back to e in the code attributes.
That's something like what I was thinking of, yeah.
The problem is how to make this interact with all kinds of other stuff. What if someone calls locals()?
Ow, that one I have no idea about. Hmm. That could be majorly problematic; if you call locals() inside the inner scope, and then use that dictionary outside it, you should expect it to work. This would be hard.
What if the outer e was nonlocal or global?
The inner e will always get its magic name, and it doesn't matter what the outer e is. That's exactly the same as would happen if there were no shadowing:
    >>> def f(x):
    ...     global e
    ...     try: 1/x
    ...     except ZeroDivisionError as e: pass
    ...     return e**x
    ...
    >>> e=2.718281828
    >>> f(3)
    20.085536913011932
    >>> f(0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in f
    NameError: name 'e' is not defined
If x is nonzero, the except clause doesn't happen, and no shadowing happens. With this theory, the same would happen if x is zero - the "as e" would effectively be "as <e.0>" or whatever the magic name is, and then "e**x" would use the global e.
It would have to be an error to use a global or nonlocal statement *inside* the as-governed block:
    def f(x):
        try:
            whatever
        except Exception as e:
            global e # SyntaxError
I can't imagine that this would be a problem to anyone. The rule is that "as X" makes X into a statement-local name, and that's incompatible with a global declaration.
What if either e is referenced by an inner function?
I don't know about internals and how hard it'd be, but I would expect that the as-name propagation should continue into the function. A quick check with dis.dis() suggests that CPython uses a LOAD_DEREF/STORE_DEREF bytecode to work with nonlocals, so that one might have to become scope-aware too. (It would be based on definition, not call, so it should be able to be compiled in somehow, but I can't say for sure.)
What if another statement re-rebinds e inside the first statement?
As in, something like this?
    def f(x):
        e = 2.718
        try:
            1/0
        except Exception as e:
            e = 1
        print(e)
The "e = 1" would assign to <e.0>, because it's in a scope where the local name e translates into that. Any use of that name, whether rebinding or referencing, will use the inner scope. But I would expect this sort of thing to be unusual.
What if you do this inside a class (or at top level)?
At top level, it would presumably have to create another global. If you call a function from inside that block, it won't see your semi-local, though I'm not sure what happens if you _define_ a function inside a block like that:
    with open("spam.log", "a") as logfile:
        def log(x):
            logfile.write(x)
Given that this example wouldn't work anyway (the file would get closed before the function gets called), and I can't think of any non-trivial examples where you'd actually want this, I can't call what ought to happen.
I think for a quick hack to play with this, you don't have to worry about any of those issues; just say that's illegal, and whatever happens (even a segfault) is your own fault for trying it. But for a real implementation, I'm not even sure what the rules should be, much less how to implement them.
Sure, for a quick-and-dirty. I think some will be illegal long-term too.
(I'm guessing the implementation could either involve having a stack of symbol tables, or tagging things at the AST level while we've still got a tree and using that info in the last step, but I think there's still a problem telling the machinery how to set up closure cells to link inner functions' free variables.)
I have no idea about the CPython internals, but my broad thinking is something like this: You start with an empty stack, and add to it whenever you hit an "as" clause. Whenever you look up a name, you proceed through the stack from newest to oldest; if you find the name, you use the mangled name from that stack entry. Otherwise, you use the same handling as current.
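The lookup scheme sketched in this paragraph could look roughly like the following. This is a toy model only, not CPython's symbol table; the class name AsScopeStack, its methods, and the "<name.N>" mangling format are all assumptions made for illustration.

```python
import itertools


class AsScopeStack:
    """Toy compile-time stack mapping as-bound names to mangled names."""

    def __init__(self):
        self._stack = []
        self._counter = itertools.count()

    def push(self, name):
        # Entering a statement with an "as name" clause.
        mangled = f"<{name}.{next(self._counter)}>"
        self._stack.append((name, mangled))
        return mangled

    def pop(self):
        # Leaving that statement again.
        self._stack.pop()

    def resolve(self, name):
        # Search from newest to oldest; fall back to normal handling.
        for original, mangled in reversed(self._stack):
            if original == name:
                return mangled
        return name


scopes = AsScopeStack()
print(scopes.resolve("e"))   # no as-binding active yet
scopes.push("e")
print(scopes.resolve("e"))   # inside the first as-governed block
scopes.push("e")
print(scopes.resolve("e"))   # inside a nested as-governed block
scopes.pop()
print(scopes.resolve("e"))   # back in the first block
```

Names that were never as-bound, such as x here, resolve to themselves, which corresponds to "the same handling as current".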
Also, all of this assumes that none of the machinery, even for tracebacks and debugging, cares about the name of the variable, just its index. Is that true?
I'm not entirely sure, but I think that tracebacks etc will start with the index and then look it up. Having duplicate names in co_varnames would allow them to look correct. Can someone confirm?
Example:
    def f(x):
        e = 2.718281828
        try:
            return e/x
        except ZeroDivisionError as e:
            raise ContrivedCodeException from e
Currently, f.__code__.co_varnames is ('x', 'e'), and all the references to e are working with slot 1; imagine if, instead, co_varnames were ('x', 'e', 'e') and the last two lines used slot 2 instead. Then the final act of the except clause would be to unbind its local name e (slot 2), and then any code after the except block would use slot 1 for e, and the original value would "reappear".
I don't think that "unbind" is a real step that needs to happen. The names have to get mapped to slot numbers at compile time anyway, so if all code outside of the except clause was compiled to LOAD_FAST 1 instead of LOAD_FAST 2, it doesn't matter that slot 2 has the same name. The only thing you need to do is the existing implicit "del e" on slot 2. (If you somehow managed to do another LOAD_FAST 2 after that, it would just be an UnboundLocalError, which is fine. But no code outside the except clause can compile to that anyway, unless there's a bug in your idea of its implementation or someone does some byteplay stuff).
The unbind is there to prevent a reference loop from causing problems. And yes, it's effectively the implicit "del e" on slot 2. ChrisA
On Jun 9, 2015, at 20:27, Chris Angelico <rosuav@gmail.com> wrote:
    with open("spam.log", "a") as logfile:
        def log(x):
            logfile.write(x)
Given that this example wouldn't work anyway (the file would get closed before the function gets called), and I can't think of any non-trivial examples where you'd actually want this, I can't call what ought to happen.
The obvious one is:
    with open("spam.log", "a") as logfile:
        def log(x):
            logfile.write(x)
        do_lots_of_stuff(logfunc=log)
Of course in this case you could just pass logfile.write instead of a function, but more generally, anywhere you create a helper or callback as a closure to use immediately (e.g., in a SAX parser) instead of later (e.g., in a network server or GUI) it makes sense to put a closure inside a with statement.
Also, remember that the whole point here is to extend as-binding so it works in if and while conditions, and maybe arbitrary expressions, and in those cases it's even more obvious why you'd want to create a closure.
Anyway, I think I know what all the compiled bytecode and code attributes for that case could look like (although I'd need to think through the edge cases), I'm just not sure if the code that compiles it today will be able to handle things without some rename-and-rename-back hack. I suppose the obvious answer is for someone to just try writing it and see. :)
But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first.
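A runnable version of the closure-used-immediately pattern, as it works in Python today, is sketched below. The body of do_lots_of_stuff and the temporary directory are stand-ins invented for this example; only the with/def/call shape comes from the message.

```python
import os
import tempfile


def do_lots_of_stuff(logfunc):
    # Stand-in for real work that reports progress via a callback.
    for step in range(3):
        logfunc(f"step {step}\n")


path = os.path.join(tempfile.mkdtemp(), "spam.log")

with open(path, "a") as logfile:
    def log(x):
        # log closes over the as-bound name and is called while the
        # file is still open, so the writes succeed.
        logfile.write(x)

    do_lots_of_stuff(logfunc=log)

with open(path) as logfile:
    contents = logfile.read()

print(contents)
```

Any statement-local scoping rule would need to keep this case working, which is why the closure-cell question raised earlier matters.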
On Wed, Jun 10, 2015 at 2:03 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
On Jun 9, 2015, at 20:27, Chris Angelico <rosuav@gmail.com> wrote:
with open("spam.log", "a") as logfile:
    def log(x):
        logfile.write(x)
Given that this example wouldn't work anyway (the file would get closed before the function gets called), and I can't think of any non-trivial examples where you'd actually want this, I can't call what ought to happen.
The obvious one is:
with open("spam.log", "a") as logfile:
    def log(x):
        logfile.write(x)
    do_lots_of_stuff(logfunc=log)
Of course in this case you could just pass logfile.write instead of a function, but more generally, anywhere you create a helper or callback as a closure to use immediately (e.g., in a SAX parser) instead of later (e.g., in a network server or GUI) it makes sense to put a closure inside a with statement.
Sure. In this example, there'd have to be some kind of "thing" that exists as a global, and can be referenced by the log function. That's not too hard; the usage all starts and ends inside the duration of the "as" effect; any other global named "logfile" would simply be unavailable. The confusion would come if you try to span the boundary in some way - when it would be possible to call log(logfile) and have it write to the log file defined by the with block, but have its argument come from outside. At the very least, that would want to be strongly discouraged for reasons of readability.
Also, remember that the whole point here is to extend as-binding so it works in if and while conditions, and maybe arbitrary expressions, and in those cases it's even more obvious why you'd want to create a closure.
Anyway, I think I know what all the compiled bytecode and code attributes for that case could look like (although I'd need to think through the edge cases), I'm just not sure if the code that compiles it today will be able to handle things without some rename-and-rename-back hack. I suppose the obvious answer is for someone to just try writing it and see. :)
But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first.
Okay. I'll start poking around with CPython and see what I can do. I'm reminded of that spectacular slide from David Beazley's talk on CPython and PyPy tinkering, where he has that VW called CPython, and then talks about patches, extensions, PEPs... and python-ideas. https://www.youtube.com/watch?v=l_HBRhcgeuQ at the four minute mark. ChrisA
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Wed, Jun 10, 2015 at 4:17 PM, Chris Angelico <rosuav@gmail.com> wrote:
But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first.
Okay. I'll start poking around with CPython and see what I can do.
Here's a gross, disgusting, brutal hack. It applies only to try/except (but can easily be expanded to other places; it's just a matter of calling one function at top and bottom), and it currently assumes that you're in a function scope (not at top level, not directly in a class; methods are supported).

(Should I create a tracker issue? It's not even at proof-of-concept at this point.)

Here's how it works: As an 'except' block is entered (at compilation stage), a new subscope is defined. At the end of the except block, after the "e = None; del e" opcodes get added in, the subscope is popped off and disposed of. So long as there is a subscope attached to the current compilation unit, any name lookups will be redirected through it. Finally, when co_varnames is populated, names get de-mangled, thus (possibly) making duplicates in the tuple, but more importantly, getting tracebacks and such looking correct.

The subscope is a tiny thing that just says "this name now becomes that mangled name", where the mangled name is the original name dot something (e.g., mangle "e" and get back "e.0x12345678"); they're stored in a linked list in the current compiler_unit.

Currently, locals() basically ignores the magic. If there is no "regular" name to be shadowed, then it correctly picks up the interior one; if there are both forms, I've no idea how it picks which one to put into the dictionary, but it certainly can't logically retain both. The fact that it manages to not crash and burn is, in my opinion, pure luck :)

Can compiler_nameop() depend on all names being interned? I have a full-on PyObject_RichCompareBool() to check for name equality; if they're all interned, I could simply do a pointer comparison instead.

Next plan: Change compiler_comprehension_generator() to use subscopes rather than a full nested function, and then do performance testing. Currently, this can only have slowed things down. Removing the function call overhead from list comps could give that speed back.

ChrisA
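The subscope mechanism described above can be modelled in a few lines of Python. This is only a toy illustration of the idea (a linked list of name-to-mangled-name redirections), not the C code of the actual patch; the class and function names here are hypothetical.

```python
class Subscope:
    """Toy model: redirects one plain name to a mangled name.

    Subscopes are chained through `outer`, mimicking a linked list
    attached to the current compilation unit.
    """

    def __init__(self, name, outer=None):
        self.name = name
        # "original name dot something", e.g. "e.0x12345678"
        self.mangled = f"{name}.{id(self):#x}"
        self.outer = outer


def resolve(name, subscope):
    """Redirect a name lookup through the innermost matching subscope."""
    while subscope is not None:
        if subscope.name == name:
            return subscope.mangled
        subscope = subscope.outer
    return name  # no subscope claims it, so the name is used unchanged


def demangle(name):
    """Recover the original name, as would be done for co_varnames."""
    return name.split(".", 1)[0]


outer_scope = Subscope("e")
inner_scope = Subscope("e", outer=outer_scope)
```

Because a valid identifier cannot contain a dot, the mangled form cannot collide with any name written in source code, which is presumably why that separator was chosen.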
data:image/s3,"s3://crabby-images/52bd8/52bd80b85ad23b22cd55e442f406b4f3ee8efd9f" alt=""
Maybe it's just me, but nesting_demo.py has several junk characters at the end (^@).
On June 10, 2015 10:06:26 AM CDT, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Jun 10, 2015 at 4:17 PM, Chris Angelico <rosuav@gmail.com> wrote:
But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first.
Okay. I'll start poking around with CPython and see what I can do.
Here's a gross, disgusting, brutal hack. It applies only to try/except (but can easily be expanded to other places; it's just a matter of calling one function at top and bottom), and it currently assumes that you're in a function scope (not at top level, not directly in a class; methods are supported).
(Should I create a tracker issue? It's not even at proof-of-concept at this point.)
Here's how it works: As an 'except' block is entered (at compilation stage), a new subscope is defined. At the end of the except block, after the "e = None; del e" opcodes get added in, the subscope is popped off and disposed of. So long as there is a subscope attached to the current compilation unit, any name lookups will be redirected through it. Finally, when co_varnames is populated, names get de-mangled, thus (possibly) making duplicates in the tuple, but more importantly, getting tracebacks and such looking correct.
The subscope is a tiny thing that just says "this name now becomes that mangled name", where the mangled name is the original name dot something (eg mangle "e" and get back "e.0x12345678"); they're stored in a linked list in the current compiler_unit.
Currently, locals() basically ignores the magic. If there is no "regular" name to be shadowed, then it correctly picks up the interior one; if there are both forms, I've no idea how it picks which one to put into the dictionary, but it certainly can't logically retain both. The fact that it manages to not crash and burn is, in my opinion, pure luck :)
Can compiler_nameop() depend on all names being interned? I have a full-on PyObject_RichCompareBool() to check for name equality; if they're all interned, I could simply do a pointer comparison instead.
Next plan: Change compiler_comprehension_generator() to use subscopes rather than a full nested function, and then do performance testing. Currently, this can only have slowed things down. Removing the function call overhead from list comps could give that speed back.
ChrisA
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Thu, Jun 11, 2015 at 1:12 AM, Ryan Gonzalez <rymg19@gmail.com> wrote:
Maybe it's just me, but nesting_demo.py has several junk characters at the end (^@).
Hmm, I just redownloaded it, and it appears correct. The end of the file has some triple-quoted strings, the last one ends with three double quote characters and then a newline, then that's it. But maybe that's Gmail being too smart and just giving me back what I sent. ChrisA
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Thu, Jun 11, 2015 at 1:06 AM, Chris Angelico <rosuav@gmail.com> wrote:
Next plan: Change compiler_comprehension_generator() to use subscopes rather than a full nested function, and then do performance testing. Currently, this can only have slowed things down. Removing the function call overhead from list comps could give that speed back.
Or maybe the next plan is to hack in a "while cond as name:" handler. It works! And the name is bound only within the scope of the while block and any else block (so when you get a falsey result, you can see precisely _what_ falsey result it was).

The surprising part, in my opinion, is that this actually appears to work outside a function. The demangling doesn't, but the original mangling does. It doesn't play ideally with locals() or globals(); the former appears to take the first one that it sees, and ignore the others (though I wouldn't promise that; certainly it takes exactly one local of any given name). With globals(), you get the mangled name:

    while input("Spam? ") as spam:
        print(globals())
        break

    Spam? yes
    {... 'spam.0x7f2080260228': 'yes'...}

With brand new syntax like "while cond as name:", it won't break anything to use a mangled name, but this is a backward-incompatible change as regards exception handling and globals(). Still, it's a fun hack.

Aside from being a fun exercise for me, building a Volkswagen Helicopter, is this at all useful to anybody?

ChrisA
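For comparison, the same loop can be written without any new syntax using the usual loop-and-a-half idiom. This is a sketch only: the hypothetical `answers` argument stands in for repeated `input("Spam? ")` calls so the example runs unattended. Note that here `spam` is an ordinary function local and stays bound after the loop, unlike in the hack.

```python
def ask_until_blank(answers):
    """Collect replies until a falsey one arrives.

    `answers` is a hypothetical stand-in for successive input() calls.
    """
    replies = iter(answers)
    seen = []
    while True:
        spam = next(replies)   # plays the role of: input("Spam? ")
        if not spam:           # the falsey result is still inspectable
            break
        seen.append(spam)
    # spam is an ordinary local, so it is still bound here.
    return seen, spam


seen, last = ask_until_blank(["yes", "more", ""])
```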
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 06/11/2015 03:56 AM, Chris Angelico wrote:
while input("Spam? ") as spam:
    print(globals())
    break
Spam? yes {... 'spam.0x7f2080260228': 'yes'...}
Having names not leak from listcomps and genexps is a good thing.

Having names not leak from try/except blocks is a necessary thing.

Having names not leak from if/else or while is confusing and irritating: there is no scope there, and at least 'while' should be similar to 'for' which also does a name binding and does /not/ unset it at the end.
Aside from being a fun exercise for me, building a Volkswagen Helicopter, is this at all useful to anybody?
I would find the 'as NAME' portion very useful as long as it wasn't shadowing nor unset. -- ~Ethan~
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Fri, Jun 12, 2015 at 5:10 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 06/11/2015 03:56 AM, Chris Angelico wrote:
while input("Spam? ") as spam:
    print(globals())
    break
Spam? yes {... 'spam.0x7f2080260228': 'yes'...}
Having names not leak from listcomps and genexps is a good thing.
Having names not leak from try/except blocks is a necessary thing.
Having names not leak from if/else or while is confusing and irritating: there is no scope there, and at least 'while' should be similar to 'for' which also does a name binding and does /not/ unset it at the end.
Aside from being a fun exercise for me, building a Volkswagen Helicopter, is this at all useful to anybody?
I would find the 'as NAME' portion very useful as long as it wasn't shadowing nor unset.
Sure. Removing the scoping from the "while cond as target" rule is simple. Just delete a couple of lines of code (one at the top, one at the bottom), and it'll do a simple name binding.

On the subject of try/except unbinding, though, there's a surprising thing in the code: the last action in an except clause is to assign None to the name, and *then* del it:

    try:
        suite
    except Something as e:
        try:
            except_block
        finally:
            e = None
            del e

Why set it to None just before delling it? It's clearly no accident, so it must have a reason for existing. (With CPython sources, it's always safest to assume intelligent design.)

ChrisA
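The observable effect of that cleanup can be checked directly. The sketch below shows two things: the name is unbound once the except clause finishes, and deleting the name inside the clause does not break the implicit cleanup. The second point suggests one plausible reason for the assignment: rebinding the name first guarantees that the following `del e` always has something to delete. That reading is an inference from the behaviour, not something stated in the thread.

```python
def name_after_except():
    try:
        raise ValueError("boom")
    except ValueError as e:
        message = str(e)
    # The implicit cleanup has run, so e is no longer bound.
    try:
        e
    except NameError as err:  # UnboundLocalError is a subclass of NameError
        return message, type(err).__name__


def delete_inside_block():
    try:
        raise ValueError("boom")
    except ValueError as e:
        del e  # the implicit cleanup still has to succeed after this
    return "ok"
```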
data:image/s3,"s3://crabby-images/ef1c2/ef1c2b0cd950cc4cbc0d26a5e2b8ae2dd6375afc" alt=""
On 06/11/2015 03:10 PM, Ethan Furman wrote:
On 06/11/2015 03:56 AM, Chris Angelico wrote:
while input("Spam? ") as spam:
    print(globals())
    break
Spam? yes {... 'spam.0x7f2080260228': 'yes'...}
Having names not leak from listcomps and genexps is a good thing.
In a way this makes sense because you can think of them as a type of function literal.
Having names not leak from if/else or while is confusing and irritating: there is no scope there, and at least 'while' should be similar to 'for' which also does a name binding and does /not/ unset it at the end.
Having a group of statements share a set of values is fairly easy to think about. Having them share some values at some times, and not others at other times is not so easy to think about.

I also get the feeling the solution is more complex than the problem. Ummm... to clarify that. The inconvenience of not having the solution to the apparent problem, is less of a problem than the possible problems I think might arise with the solution. It's kind of like parsing that sentence.

Ron
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 06/11/2015 04:23 PM, Ron Adam wrote:
On 06/11/2015 03:10 PM, Ethan Furman wrote:
On 06/11/2015 03:56 AM, Chris Angelico wrote:
while input("Spam? ") as spam:
    print(globals())
    break
Spam? yes {... 'spam.0x7f2080260228': 'yes'...}
Having names not leak from listcomps and genexps is a good thing.
In a way this makes sense because you can think of them as a type of function literal.
Having names not leak from if/else or while is confusing and irritating: there is no scope there, and at least 'while' should be similar to 'for' which also does a name binding and does /not/ unset it at the end.
Having a group of statements share a set of values is fairly easy to think about.
But that is not how Python works. When you bind a name, that name stays until the scope is left (with one notable exception).
Having them share some values at some times, and not others at other times is not so easy to think about.
Which is why I would not have the pseudo-scope on any of them. The only place where that currently happens is in a try/except clause, and that should remain the only exception. -- ~Ethan~
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Jun 11, 2015, at 17:12, Ethan Furman <ethan@stoneleaf.us> wrote:
On 06/11/2015 04:23 PM, Ron Adam wrote:
On 06/11/2015 03:10 PM, Ethan Furman wrote:
On 06/11/2015 03:56 AM, Chris Angelico wrote:
while input("Spam? ") as spam:
    print(globals())
    break
Spam? yes {... 'spam.0x7f2080260228': 'yes'...}
Having names not leak from listcomps and genexps is a good thing.
In a way this makes sense because you can think of them as a type of function literal.
Having names not leak from if/else or while is confusing and irritating: there is no scope there, and at least 'while' should be similar to 'for' which also does a name binding and does /not/ unset it at the end.
Having a group of statements share a set of values is fairly easy to think about.
But that is not how Python works. When you bind a name, that name stays until the scope is left (with one notable exception).
What Nick was proposing was to explicitly change the way Python works. And what Chris hacked up was (part of) what Nick proposed. So you're just pointing out that this change to the way Python works would be a change to the way Python works. Well, of course it would.

The question is whether it would be a good change. Nick's point was that they tried a similar change to implement comprehensions without needing to "fake it" with a hidden function, and it makes the implementation far too complex, so it doesn't even matter if it's a well-designed and desirable change. Of course it's also possible that it's not a desirable change (e.g., the current scoping rules are simple enough to keep things straight in your head while reading any function that isn't already too long to be a function, but more complex rules wouldn't be), or that it's possibly desirable but not as designed (e.g., I still think Nick's idea of binding within the expression or the statement in a somewhat complex way is more confusing than just binding within the statement). But Chris's attempt to show that the implementation problems might be resolvable, and/or to give people a hack they can play with instead of having to guess, is still a reasonable response to Nick's point.

I agree with your implied point that a language with two kinds of locality, one nested by block and the other function-wide, is probably not as good a design as one with only the first kind (like C) or only the second (like Python), and that's even more true in a language with closures or implicit declarations (both of which Python has), so I think any design is going to be a mess (definitely including my own straw-man design, and Nick's, and what Chris's hack implements). But it's certainly a _possible_ design, and there's nothing about Python 3.5 that means it would be impossible or backward-incompatible (as opposed to just a bad idea) to have such a design for Python 3.6.
data:image/s3,"s3://crabby-images/b96f7/b96f788b988da8930539f76bf56bada135c1ba88" alt=""
Ethan Furman writes:
I would find the 'as NAME' portion very useful as long as it wasn't shadowing nor unset.
I don't understand the "not shadowing" requirement. If you're not going to create a new scope, then

    from foo import *

    if expr as val: use(val)

    bar(val)

might very well shadow foo.val and break the invocation of bar. Is use of the identifier "val" in this context an error? Or what?
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 06/11/2015 11:55 PM, Stephen J. Turnbull wrote:
Ethan Furman writes:
I would find the 'as NAME' portion very useful as long as it wasn't shadowing nor unset.
I don't understand the "not shadowing" requirement. If you're not going to create a new scope, then
from foo import *
if expr as val: use(val)
bar(val)
might very well shadow foo.val and break the invocation of bar. Is use of the identifier "val" in this context an error? Or what?
Likewise:

    for val in some_iterator:
        use(val)

    bar(val)

will shadow foo.val and break bar; yet for loops do not create their own scopes.

    with open('somefile') as val:
        stuff = val.read()

    bar(val)

will also shadow foo.val and break bar, yet with contexts do not create their own scopes.

And let's not forget:

    val = some_func()

    bar(val)

Again -- no micro-scope, and foo.val is shadowed.

-- ~Ethan~
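The three cases can be checked with a short script. This is a sketch with made-up values: an `io.StringIO` object stands in for the opened file, and the initial `val` plays the role of the name that `from foo import *` would have provided.

```python
import io

val = "original"          # stands in for the name imported from foo

for val in [1, 2, 3]:     # a for loop rebinds val in the same scope
    pass
after_for = val           # the last item, not "original"

with io.StringIO("text") as val:   # so does the target of a with statement
    stuff = val.read()
after_with_closed = val.closed     # val is still bound, to the closed stream
```

In each case the earlier binding is simply replaced, and nothing restores it when the statement ends.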
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote:
Ethan Furman writes:
Likewise:
for val in some_iterator: use(val)
bar(val)
will shadow foo.val
Yes, I understand that. What I don't understand is your statement that you would like "if expr as val:" if it *doesn't* shadow.
Ah, I think I see your point. My use of the word "shadow" was in relation to the micro-scope and the previously existing name being shadowed and then un-shadowed when the micro-scope was destroyed. If we are at module-level (not class nor function) then there should be no shadowing, but a rebinding of the name. Even try/except blocks don't "shadow", but rebind and then delete the name used to catch the exception. -- ~Ethan~
data:image/s3,"s3://crabby-images/ef1c2/ef1c2b0cd950cc4cbc0d26a5e2b8ae2dd6375afc" alt=""
On 06/12/2015 01:21 PM, Ethan Furman wrote:
On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote:
Ethan Furman writes:
Likewise:
for val in some_iterator: use(val)
bar(val)
will shadow foo.val
Yes, I understand that. What I don't understand is your statement that you would like "if expr as val:" if it *doesn't* shadow.
Ah, I think I see your point. My use of the word "shadow" was in relation to the micro-scope and the previously existing name being shadowed and then un-shadowed when the micro-scope was destroyed. If we are at module-level (not class nor function) then there should be no shadowing, but a rebinding of the name. Even try/except blocks don't "shadow", but rebind and then delete the name used to catch the exception.
The problem can be turned around/over. Instead of specifying a name to be shadowed, the names to be shared can be specified. Then it translates to a function with specified nonlocals.

    a = 1   # will be shared
    b = 2   # will be shadowed

    def do_loop_with_shared_items():
        nonlocal a            # a is a shared value.
        for b in some_iterator:
            a = use(b)

    do_loop_with_shared_items()

    print(a)   # changed by loop
    print(b)   # print 2. Not changed by loop

That might be expressed as...

    a = 1
    b = 2
    with nonlocal a:           # a is shared
        for b in some_iterator:
            a = use(b)         # other values (b) are local to block.

    print(a)   # changed by loop
    print(b)   # prints 2. Not changed by loop

And with this, the "as" modifier isn't needed, just don't list the item as a nonlocal.

    with nonlocal:
        a = foo.bar    # a as foo.bar in this block scope only.
        ...

This has the advantage of not complicating other statements and keeps the concept in a separate mental box.

I like this better, but am still -0.5. I'd need to see some examples where it would be "worth it". It still feels like a solution looking for a problem to me.

Cheers, Ron
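The function-based translation runs in today's Python once it is wrapped in an enclosing function, since `nonlocal` is only valid for names bound in an enclosing function scope (at module level the equivalent declaration would be `global`). In this sketch, `use` is a hypothetical helper that doubles its argument.

```python
def use(item):
    # Hypothetical helper from the example; here it just doubles its input.
    return item * 2


def outer(some_iterator):
    a = 1   # will be shared
    b = 2   # will be shadowed

    def do_loop_with_shared_items():
        nonlocal a                 # a is shared with outer()
        for b in some_iterator:    # this b is local to the inner function
            a = use(b)

    do_loop_with_shared_items()
    return a, b                    # a changed by the loop, b untouched


result = outer([10, 20, 30])
```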
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Jun 12, 2015, at 11:25, Ron Adam <ron3200@gmail.com> wrote:
On 06/12/2015 01:21 PM, Ethan Furman wrote:
On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote:
Ethan Furman writes:
Likewise:
for val in some_iterator: use(val)
bar(val)
will shadow foo.val
Yes, I understand that. What I don't understand is your statement that you would like "if expr as val:" if it *doesn't* shadow.
Ah, I think I see your point. My use of the word "shadow" was in relation to the micro-scope and the previously existing name being shadowed and then un-shadowed when the micro-scope was destroyed. If we are at module-level (not class nor function) then there should be no shadowing, but a rebinding of the name. Even try/except blocks don't "shadow", but rebind and then delete the name used to catch the exception.
The problem can be turned around/over. Instead of specifying a name to be shadowed, the names to be shared can be specified. Then it translates to a function with specified nonlocals.
I really like making it explicit. I'm not sure about the turning-it-around bit. That means inside a with-nonlocal block, things don't work the same as in any other block, and that won't be at all obvious. But even without that, the idea works; to make something nested-local, you write:

    with local b:
        for b in some_iterator:
            a = use(b)

That leaves function-local as the default, and defines statement-local in a way that's as similar as possible to the other alternatives, environment-nonlocal and global; the only real difference is that it has a suite, which is pretty much implicit in the fact that it's defining something as local to the suite.

Either way seems better than the quasi-magic scoping (both my version and Nick's took a couple paragraphs to explain...) caused by as expressions and/or clauses. And that's in addition to the advantages you suggested of not complicating the syntax and keeping separate concepts separate.
I like this better, but am still -0.5. I'd need to see some examples where it would be "worth it". It still feels like a solution looking for a problem to me.
Agreed. I think everyone (including myself) has put thought into this just because it's an interesting puzzle, not necessarily because the language needs it...
data:image/s3,"s3://crabby-images/b96f7/b96f788b988da8930539f76bf56bada135c1ba88" alt=""
Ethan Furman writes:
Yes, I understand that. What I don't understand is your statement that you would like "if expr as val:" if it *doesn't* shadow.
Ah, I think I see your point. My use of the word "shadow" was in relation to the micro-scope and the previously existing name being shadowed and then un-shadowed when the micro-scope was destroyed.
I see. Your use of "shadow" implies later "unshadowing", which can only happen with scope. Mine doesn't, I just associate "shadow" with rebinding. I think your usage is more accurate. Especially in Python, which has a much flatter (and more formalized) use of scopes than, say, Lisp. Thank you for your explanation, it helped (me, anyway).
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Chris Angelico wrote:
How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same?
Back when list comprehensions were changed to not leak the variable, it was apparently considered too hard to be worth the effort, since we ended up with the nested function implementation. -- Greg
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Thu, Jun 11, 2015 at 8:16 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Chris Angelico wrote:
How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same?
Back when list comprehensions were changed to not leak the variable, it was apparently considered too hard to be worth the effort, since we ended up with the nested function implementation.
Yeah. I now have a brutal hack that does exactly that, so I'm fully expecting someone to point out "Uhh, this isn't going to work because...". ChrisA
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 10 June 2015 at 10:54, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
I'd actually rather see this implemented the other way around: instead of turning this into a function call, actually have a real concept of nested scoping. Nested functions imply changes to tracebacks and such, which scoping doesn't require.
How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same?
I tried to do this when working with Georg Brandl to implement the Python 3 change to hide the iteration variable in comprehensions and generator expressions, and I eventually gave up and used an implicit local function definition:
https://mail.python.org/pipermail/python-3000/2007-March/006017.html

This earlier post from just before we started working on that covers some of the approaches I tried, as well as noting why this problem is much harder than it might first seem:
https://mail.python.org/pipermail/python-3000/2006-December/005207.html

One of the other benefits that I don't believe came up in either of those threads is that using real frames for implicit scoping means that *other tools* already know how to cope with it - pdb, gdb, inspect, dis, traceback, etc, are all able to deal with what's going on. If you introduce a new *kind* of scope, rather than just implicitly using another level of our *existing* scoping rules, then there's a whole constellation of tools (including other interpreter implementations) that will need adjusting to model an entirely new semantic concept, rather than another instance of an existing concept.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
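The user-visible result of that Python 3 change is easy to observe, independent of how a given interpreter implements it. A small sketch contrasting a comprehension with an ordinary for loop:

```python
def comprehension_scope():
    x = "outer"
    squares = [x * x for x in range(4)]   # this x is hidden from the function
    return x, squares


def loop_scope():
    x = "outer"
    for x in range(4):                    # this x is the function's own x
        pass
    return x
```

Calling `comprehension_scope()` leaves `x` as `"outer"`, while `loop_scope()` returns the last loop value.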
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 11 June 2015 at 11:16, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 10 June 2015 at 10:54, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa.
I'd actually rather see this implemented the other way around: instead of turning this into a function call, actually have a real concept of nested scoping. Nested functions imply changes to tracebacks and such, which scoping doesn't require.
How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same?
I tried to do this when working with Georg Brandl to implement the Python 3 change to hide the iteration variable in comprehensions and generator expressions, and I eventually gave up and used an implicit local function definition: https://mail.python.org/pipermail/python-3000/2007-March/006017.html
Re-reading that post, I found this:
https://mail.python.org/pipermail/python-3000/2007-March/006085.html

I don't think anyone has yet tried speeding up simple function level cases at the peephole optimiser stage of the code generation pipeline (at module and class level, the nested function is already often a speed increase due to the use of optimised local variable access in the implicitly created function scope).

However, I'm not sure our pattern matching is really up to the task of detecting this at bytecode generation time - doing something about it in a JIT-compiled runtime like PyPy, Numba or Pyston might be more feasible.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 10 June 2015 at 10:20, Andrew Barnert <abarnert@yahoo.com> wrote:
On Jun 9, 2015, at 16:46, Nick Coghlan <ncoghlan@gmail.com> wrote:
However if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases.
I don't think there's any consistent way to define "containing expression" that makes any sense for while/if statements.
Sure there is. The gist of the basic "no leak" behaviour could be something like:

1. Any expression containing a named subexpression would automatically be converted to a lambda expression that is defined and called inline (expressions that already implicitly define their own scope, specifically comprehensions and generator expressions, would terminate the search for the "containing expression" node and allow this step to be skipped).

2. Any name references from within the expression that are not references to named subexpressions or comprehension iteration variables would be converted to parameter names for the implicitly defined lambda expression, and thus resolved in the containing scope rather than the nested scope.

In that basic mode, the only thing made available from the implicitly created scope would be the result of the lambda expression. Something like:

    x = (250 as a)*a + b

would be equivalent to:

    x = (lambda b: ((250 as a)*a + b))(b)

if/elif/while clauses would define the behaviour of their conditional expressions slightly differently: for those, the values of any named subexpressions would also be passed back out, allowing them to be bound appropriately in the outer scope (requiring compatibility with class and module namespaces means it wouldn't be possible to use cell references here).

Whether there should be a separate "bindlocal" statement for lifting named subexpressions out of an expression and binding them all locally would be an interesting question - I can't think of a good *use case* for that, but it would be a good hook for explaining the difference between the default behaviour of named subexpressions and the variant used in if/elif/while conditional expressions.
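Since `(250 as a)` is proposed syntax, the equivalence cannot be run as written. The sketch below approximates it in existing Python by adding a second, inner lambda whose parameter plays the role of the named subexpression; that inner lambda is an assumption of this illustration, not part of the proposal.

```python
def compute(b):
    # Approximates: x = (250 as a)*a + b
    # Outer lambda: free names such as b become parameters.
    # Inner lambda: its parameter a stands in for the named subexpression.
    return (lambda b: (lambda a: a * a + b)(250))(b)


x = compute(5)

# a never becomes a local of the containing function.
leaked = "a" in compute.__code__.co_varnames
```

Only the value of the whole expression escapes, which is the "no leak" behaviour described in the two steps above.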
But "containing _statement_", that's easy.
No, it's not, because statements already contain name binding operations that persist beyond the scope of the statement. In addition to actual assignment statements, there are also for loops, with statements, class definitions and function definitions. Having the presence of a named subexpression magically change the scope of the statement level name binding operations wouldn't be acceptable, and having some name bindings propagate but not others gets very tricky in the general case. (PEPs 403 and 3150 go into some of the complexities.) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 8 June 2015 at 21:24, Andrew Barnert <abarnert@yahoo.com> wrote:
Now you really _are_ reinventing let. A let expression like this:
x = let b=a.b in (b if b else a.c)
... is effectively just syntactic sugar for the lambda above.
Sure, I've thought a *lot* about adding let-type syntax - hence PEPs 403 (@in) and 3150 (given) for a couple of variations on statement level local variables. The problem with a let expression is that you still end up having to jumble up the order of things, just as you do with the trick of defining and calling a function, rather than being able to just name the subexpression on first execution and refer back to it by name later rather than repeating the calculation. Thus a let expression doesn't actually help all that much with improving the flow of reading or writing code - you still have the step of pulling the subexpression out and declaring both its name and value first, before proceeding on with the value of the calculation. That's not only annoying when writing, but also increases the cognitive load when reading, since the subexpressions are introduced in a context-free fashion. When the named subexpressions are inlined, they work more like the way pronouns in English work: "When the (named subexpressions as they) are inlined, they work more like the way pronouns in English work." It's a matter of setting up a subexpression for a subsequent backreference, rather than pulling it out into a truly independent step.
And it's a lot more natural and easy to reason about than letting b escape one step out to the conditional expression but not any farther. (Or to the rest of the complete containing expression? Or the statement? What does "x[(a.b as b)] = b" mean, for example? Or "x[(b if (a.b as b) else a.c) + (b if (d.b as b) else d.c)]"? Or "x[(b if (a.b as b) else a.c) + b]"?)
Exactly, that's the main problem with named subexpressions - if you let them *always* leak, you get some very confusing consequences, and if you *never* let them leak, then you don't address the if statement and while loop use cases. So to make them work as desired, you have to say they "sometimes" leak, and then define what that means in a comprehensible way. One possible way to do that would be to say that they *never* leak by default (i.e. using a named subexpression always causes the expression containing them to be executed in its own scope), and then introduce some form of special casing into if statements and while loops to implicitly extract named subexpressions.
As a side note, the initial proposal here was to improve performance by not repeating the a.b lookup; I don't think adding an implicit comprehension-like function definition and call will be faster than a getattr except in very uncommon cases. However, I think there are reasonable cases where it's more about correctness than performance (e.g., the real expression you want to avoid evaluating twice is next(spam) or f.readline(), not a.b), so I'm not too concerned there. Also, I'm pretty sure a JIT could effectively inline a function definition plus call more easily than it could CSE an expression that's hard to prove is static.
Yes, I'm not particularly interested in speed here - I'm personally interested in maintainability and expressiveness. (That's also why I consider this a very low priority project for me personally, as it's very, very hard to make a programming language easier to use by *adding* concepts to it. You really want to be giving already emergent patterns names and syntactic sugar, since you're then replacing learning a pattern that someone would have eventually had to learn anyway with learning the dedicated syntax). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jun 8, 2015, at 05:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
The problem with a let expression is that you still end up having to jumble up the order of things, just as you do with the trick of defining and calling a function, rather than being able to just name the subexpression on first execution and refer back to it by name later rather than repeating the calculation.
But notice that in two of your three use cases--and, significantly, the ones that are expressions--the place of first execution comes lexically _after_ the reference, so in normal reading order, you're referring _forward_ to it by name. He can front a clause without swapping the pronoun and its referent if Nick intends that special emphasis, but otherwise he wouldn't do that in English. That's a valid English sentence, but you have to think for a second to parse it, and then think again to guess what the odd emphasis is supposed to connote. Sometimes you actually do want that odd emphasis (it seems like a major point of your given proposal), but that's not the case here. It's the temporary name "b" that's unimportant, not its definition; the only reason you need the name at all is to avoid evaluating "a.b" twice. So having it come halfway through the expression is a little weird. Of course the same thing does happen in comprehensions, but (a) those are one of the few things in Python that are intended to read as much like math as like English, and (b) it almost always _is_ the expression rather than the loop variable that's the interesting part of a comprehension; that isn't generally true for a conditional.
On 8 June 2015 at 23:21, Andrew Barnert <abarnert@yahoo.com> wrote:
On Jun 8, 2015, at 05:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
The problem with a let expression is that you still end up having to jumble up the order of things, just as you do with the trick of defining and calling a function, rather than being able to just name the subexpression on first execution and refer back to it by name later rather than repeating the calculation.
But notice that in two of your three use cases--and, significantly, the ones that are expressions--the place of first execution comes lexically _after_ the reference, so in normal reading order, you're referring _forward_ to it by name.
Right, but as you note later, that jumping around in execution order is inherent in the way conditional expressions and comprehensions are constructed, and the named subexpressions track execution order rather than lexical order. It's also worth noting that the comprehension case causes the same problem for a let expression that "pull it out to a separate statement" does for while loops:

    # This works
    x = a.b
    if x:
        # use x

    # This doesn't
    x = a.b
    while x:
        # use x

And similarly:

    # This could work
    x = (let b = a.b in (b if b else a.c))

    # This can't be made to work
    x = (let b = a.b in (b for a in iterable if b))

By contrast, these would both be possible:

    x = b if (a.b as b) else a.c
    x = (b for a in iterable if (a.b as b))

If it's accepted that letting subexpressions of binary, ternary and quaternary expressions refer to each other is a desirable design goal, then a new scope definition expression can't handle that requirement - cross-references require a syntax that can be interleaved with the existing constructs and track their execution flow, rather than a syntax that wraps them in a new larger expression.
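Editorial sketch of the loop and comprehension cases above, again using `:=` (Python 3.8+, PEP 572) as a stand-in for the proposed `(expr as name)` spelling. The helper names `take_while_truthy` and `truthy_b` are hypothetical. The point being illustrated is that the name is rebound on every iteration, which a single assignment hoisted in front of the loop cannot do.

```python
from types import SimpleNamespace


def take_while_truthy(values):
    """Collect items until the first falsy one, rebinding x each pass."""
    it = iter(values)
    out = []
    # The condition is re-evaluated (and x rebound) on every iteration,
    # unlike "x = next(it); while x: ..." which binds x only once.
    while (x := next(it, None)):
        out.append(x)
    return out


def truthy_b(items):
    """Keep each a.b that is truthy, evaluating a.b once per item."""
    # The filter clause runs before the element expression, so b is
    # already bound when it is used on the left.
    return [b for a in items if (b := a.b)]


items = [SimpleNamespace(b=0), SimpleNamespace(b=3), SimpleNamespace(b=7)]
print(take_while_truthy([1, 2, 0, 5]))
print(truthy_b(items))
```

One difference from the scheme discussed in this thread: with `:=`, a name bound inside a comprehension is bound in the enclosing function scope rather than staying private to the comprehension.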
He can front a clause without swapping the pronoun and its referent if Nick intends that special emphasis, but otherwise he wouldn't do that in English. That's a valid English sentence, but you have to think for a second to parse it, and then think again to guess what the odd emphasis is supposed to connote.
Yeah, I didn't adequately think through the way the out-of-order execution weakened the pronoun-and-back-reference analogy.
Sometimes you actually do want that odd emphasis (it seems like a major point of your given proposal),
It's just a consequence of tracking execution order rather than lexical order. The *reason* for needing to track execution order is because it's the only way to handle loops properly (by rebinding the name to a new value on each iteration). It's also possible to get a conditional expression to use a back reference instead of a forward reference by inverting the check:

    x = a.c if not (a.b as b) else b

Or by using the existing "pull the subexpression out to a separate statement" trick:

    b = a.b
    x = b if b else a.c

You'd never be *forced* to use a forward reference if you felt it made the code less readable. The forward reference would be mandatory in comprehensions, but that's already at least somewhat familiar due to the behaviour of the iteration variable.
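Editorial sketch comparing the forward-reference and back-reference spellings, with `:=` (Python 3.8+, PEP 572) standing in for the proposed `as` form. The `Source` class and its `b_reads` counter are hypothetical, added only to make it visible that `a.b` is evaluated once in either ordering.

```python
class Source:
    """Hypothetical object whose .b attribute counts how often it is read."""

    def __init__(self, b, c):
        self._b = b
        self.c = c
        self.b_reads = 0

    @property
    def b(self):
        self.b_reads += 1
        return self._b


def pick_forward(a):
    # Forward reference: b appears lexically before it is bound,
    # but the condition executes first, so b is set in time.
    return b if (b := a.b) else a.c


def pick_back(a):
    # Inverted check: the binding now comes lexically before the use.
    return a.c if not (b := a.b) else b


s1, s2 = Source(5, 9), Source(0, 9)
print(pick_forward(s1), s1.b_reads)
print(pick_back(s2), s2.b_reads)
```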
but that's not the case here. It's the temporary name "b" that's unimportant, not its definition; the only reason you need the name at all is to avoid evaluating "a.b" twice. So having it come halfway through the expression is a little weird.
I'd consider elif clauses, while loops, comprehensions and generator expressions to be the most useful cases - they're all situations where pulling the subexpression out to a preceding assignment statement doesn't work due to the conditional execution of the clause (elif) or the repeated execution (while loops, comprehensions, generator expressions). For other cases, the semantics would need to be clearly *defined* in any real proposal, but I would expect a preceding explicit assignment statement to be clearer most of the time. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
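Editorial sketch of the elif case mentioned above, where hoisting the subexpression into a preceding assignment does not work because each test should only run if the earlier ones failed. It uses `:=` (Python 3.8+, PEP 572) in place of the proposed `as` syntax; the `classify` function and its patterns are hypothetical.

```python
import re


def classify(text):
    # Each match is attempted only if the previous branch failed, and the
    # match object is available by name inside the branch that succeeded.
    if (m := re.fullmatch(r"\d+", text)):
        return ("int", int(m.group(0)))
    elif (m := re.fullmatch(r"\d+\.\d+", text)):
        return ("float", float(m.group(0)))
    elif (m := re.fullmatch(r"[A-Za-z_]\w*", text)):
        return ("name", m.group(0))
    return ("unknown", text)


for sample in ["42", "3.5", "spam", "?"]:
    print(classify(sample))
```

Without a naming construct in the condition, the same logic needs either nested if/else blocks or a repeated call to the matcher inside each branch.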
participants (13)
- Andrew Barnert
- Chris Angelico
- Cory Beutler
- Ethan Furman
- Greg Ewing
- Joonas Liik
- Nick Coghlan
- random832@fastmail.us
- Ron Adam
- Ryan Gonzalez
- Stephen J. Turnbull
- Steven D'Aprano
- Wolfram Hinderer