PEP-xxx: Unification of for statement and list-comp syntax
Hi all!

The following PEP tries to make the case for a slight unification of for statement and list comprehension syntax.

Comments appreciated, including on the sample implementation.

===
PEP: xxx
Title: Unification of for-statement and list-comprehension syntax
Version: $Revision$
Last-Modified: $Date$
Author: Heiko Wundram <me@modelnine.org>
Status: Active
Type: Standards Track
Content-Type: text/plain
Created: 21-May-2006
Post-History: 21-May-2006 17:00 GMT+0200

Abstract

When list comprehensions were introduced, they made it possible to attach conditions that are tested before the expression associated with the list comprehension is evaluated. This is often used to create new lists which consist only of those items of the original list which match the specified condition(s). For example:

    [node for node in tree if node.haschildren()]

will create a new list which contains only those items of the original list (tree) that satisfy the haschildren() condition. Generator expressions work similarly.

With a standard for-loop, this corresponds to adding a continue statement testing for the negated expression at the beginning of the loop body.

As I've noticed that I find myself typing the latter quite often in code I write, it would only be sensible to add the corresponding syntax for the for statement:

    for node in tree if node.haschildren():
        <do something with node>

as syntactic sugar for:

    for node in tree:
        if not node.haschildren():
            continue
        <do something with node>

There are several other methods (including generator-expressions or list-comprehensions, the itertools module, or the builtin filter function) to achieve this same goal, but all of them make the code longer and harder to understand and/or, where an intermediate list is generated, require more memory.
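The alternatives listed in the abstract can be sketched in present-day Python 3 (the thread itself dates from 2006 and uses Python 2). The `Node` class and the sample `tree` below are hypothetical stand-ins for the PEP's example, not code from the patch. In Python 3 the builtin `filter` is lazy, so neither it nor the generator expression builds an intermediate list.

```python
class Node:
    """Hypothetical stand-in for the nodes in the PEP's `tree` example."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def haschildren(self):
        return bool(self.children)


leaf = Node("leaf")
tree = [Node("a", [leaf]), leaf, Node("b", [leaf])]

# The explicit form the PEP wants sugar for: skip non-matching items early.
via_continue = []
for node in tree:
    if not node.haschildren():
        continue
    via_continue.append(node.name)

# Alternative 1: loop over a generator expression that does the filtering.
via_genexp = []
for node in (n for n in tree if n.haschildren()):
    via_genexp.append(node.name)

# Alternative 2: loop over the builtin filter (lazy in Python 3).
via_filter = []
for node in filter(Node.haschildren, tree):
    via_filter.append(node.name)
```

All three loops visit the same nodes; they differ only in where the condition is written.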
Implementation details

The implementation of this feature requires changes to the Python grammar, to allow for a variable number of 'if'-expressions before the colon of a 'for'-statement:

    for_stmt: 'for' exprlist 'in' testlist_safe ('if' old_test)* ':' suite ['else' ':' suite]

This change would replace testlist with testlist_safe as the 'in'-expression of a for statement, in line with the definition of list comprehensions in the Python grammar.

Each of the 'if'-expressions is evaluated in turn (if present), until one evaluates to false, in which case the 'for'-statement immediately restarts at the next item from the generator of the 'in'-expression (the tests are thus short-circuiting), or until all are found to be true (or there are no tests), in which case the suite body is executed. The behaviour of the 'else'-suite is unchanged.

The intermediate code that is generated is modelled after the byte-code that is generated for list comprehensions:

    def f():
        for x in range(10) if x == 1:
            print x

would generate:

      2           0 SETUP_LOOP              42 (to 45)
                  3 LOAD_GLOBAL              0 (range)
                  6 LOAD_CONST               1 (10)
                  9 CALL_FUNCTION            1
                 12 GET_ITER
            >>   13 FOR_ITER                28 (to 44)
                 16 STORE_FAST               0 (x)
                 19 LOAD_FAST                0 (x)
                 22 LOAD_CONST               2 (1)
                 25 COMPARE_OP               2 (==)
                 28 JUMP_IF_FALSE            9 (to 40)
                 31 POP_TOP

      3          32 LOAD_FAST                0 (x)
                 35 PRINT_ITEM
                 36 PRINT_NEWLINE
                 37 JUMP_ABSOLUTE           13
            >>   40 POP_TOP
                 41 JUMP_ABSOLUTE           13
            >>   44 POP_BLOCK
            >>   45 LOAD_CONST               0 (None)
                 48 RETURN_VALUE

where all tests are inserted immediately at the beginning of the loop body. A test that is found to be false jumps to a new block, which pops the comparison result from the stack and jumps back to the beginning of the loop to fetch the next item.

Implementation issues

The changes are backwards-compatible, as they don't change the default behaviour of the 'for'-loop. Also, as the changes that this PEP proposes don't change the byte-code structure of the interpreter, old byte-code continues to run unchanged on Python with this addition.
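For comparison, the byte-code of the explicit `continue` form can be inspected with the standard `dis` module. This is only a sketch for current CPython 3: the function `f` mirrors the PEP's example with `print` as a function, and opcode names and offsets differ from the 2006 listing above and vary between interpreter versions, so no output is reproduced here.

```python
import dis


def f():
    # Explicit equivalent of the proposed "for x in range(10) if x == 1:".
    for x in range(10):
        if not x == 1:
            continue
        print(x)


# Collect the opcode names; the exact sequence depends on the CPython version.
opnames = [instruction.opname for instruction in dis.get_instructions(f)]

if __name__ == "__main__":
    # Print a listing comparable in spirit to the one in the PEP.
    dis.dis(f)
```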
Implementation

A sample implementation (with updates to the grammar documentation and a small test case) is available at:

    http://sourceforge.net/tracker/index.php?func=detail&aid=1492509&group_id=5470&atid=305470

Copyright

This document has been placed in the public domain.
===

--- Heiko.
-1. The contraction just makes it easier to miss the logic.

Also, it would be a parsing conflict for the new conditional expressions (x if T else y).

This was proposed and rejected before.

--Guido
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Sunday, 21 May 2006 at 17:38, Guido van Rossum wrote:
-1. The contraction just makes it easier to miss the logic.
I actually don't think so, because it's pretty much synonymous with what 'if' means in list comprehensions, which use the same keywords (that's why I called it "unification of ... syntax"), but I guess it's superfluous to discuss this if you're -1.
Also, it would be a parsing conflict for the new conditional expressions (x if T else y).
It isn't, if you change the grammar to use testlist_safe as the 'in'-expression, just as is used for list comprehensions. As I've said in the PEP, I've created a patch that implements this, and Python's test suite passes cleanly (except for a little buglet in test_dis, which stems from the fact that the generated byte-code for a for-loop is slightly altered by this patch).
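The conflict under discussion can still be observed in current Python 3, where the 'in' part of a for statement accepts an unparenthesised conditional expression while a comprehension does not. The sketch below uses the standard `ast` module; it illustrates today's grammar only, not the 2006 grammar or the behaviour of the patch, and the helper `parses` is a hypothetical name.

```python
import ast

# In a for statement, "a if c else b" is parsed as the iterable.
stmt = ast.parse("for x in a if c else b:\n    pass").body[0]
iterable_is_conditional = isinstance(stmt.iter, ast.IfExp)


def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# In a comprehension the same spelling is rejected, because 'if' there
# starts a filter clause and the grammar restricts the 'in' expression.
comprehension_ok = parses("[x for x in a if c else b]")

# The syntax proposed in the PEP is not valid in current Python.
proposed_ok = parses("for x in range(10) if x == 1:\n    pass")
```

Restricting the 'in'-expression of the for statement, as the patch does with testlist_safe, removes the ambiguity at the cost of no longer accepting the first spelling without parentheses.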
This was proposed and rejected before.
I haven't seen this proposed before (at least not in PEP form, or with a working implementation against the current trunk, or just in some form of mail on python-dev), so that's why I posted this. But, if I've really repeated things that were proposed before, feel free to ignore this. --- Heiko.
On 5/21/06, Heiko Wundram <me+python-dev@modelnine.org> wrote:
On Sunday, 21 May 2006 at 17:38, Guido van Rossum wrote:
This was proposed and rejected before.
I haven't seen this proposed before (at least not in PEP form, or with a working implementation against the current trunk, or just in some form of mail on python-dev), so that's why I posted this. But, if I've really repeated things that were proposed before, feel free to ignore this.
While this has been proposed before, I'd like to thank you for putting together a full PEP and a working implementation. I think you should still submit the PEP, if for nothing else so that when the issue comes up again, we can point to the PEP and explain that Guido's already rejected it.

Steve
--
Grammar am for people who can't think for myself.
    --- Bucky Katt, Get Fuzzy
On Sunday, 21 May 2006 at 18:08, Steven Bethard wrote:
While this has been proposed before, I'd like to thank you for putting together a full PEP and a working implementation. I think you should still submit the PEP, if for nothing else so that when the issue comes up again, we can point to the PEP and explain that Guido's already rejected it.
I'll submit the PEP tomorrow, after I've reworded it slightly. I'll also add Guido's rejection notice to the PEP, so that a number can be assigned to it properly and it can be added to the PEP list on python.org. --- Heiko.
Steven Bethard wrote:
On 5/21/06, Heiko Wundram <me+python-dev@modelnine.org> wrote:
On Sunday, 21 May 2006 at 17:38, Guido van Rossum wrote:
This was proposed and rejected before.
I haven't seen this proposed before (at least not in PEP form, or with a working implementation against the current trunk, or just in some form of mail on python-dev), so that's why I posted this. But, if I've really repeated things that were proposed before, feel free to ignore this.
While this has been proposed before, I'd like to thank you for putting together a full PEP and a working implementation. I think you should still submit the PEP, if for nothing else so that when the issue comes up again, we can point to the PEP and explain that Guido's already rejected it.
Steve
I'd like to second Steve's point. I'm impressed by the thoroughness and organization of the PEP. As a general guideline, I've noticed that proposals which are purely syntactic sugar are unlikely to be accepted unless there is some additional benefit other than just compression of source code. -- Talin
On Sunday, 21 May 2006 at 22:11, Talin wrote:
As a general guideline, I've noticed that proposals which are purely syntactic sugar are unlikely to be accepted unless there is some additional benefit other than just compression of source code.
I know about this, but generally, I find there's more to this "syntactic sugar" than just source code compression.

1) It unifies the syntax for list comprehensions and for loops, which use the same keywords (and are thus identified easily). Basically, you can then think of a for-loop as a list comprehension which evaluates a suite instead of an expression, and doesn't store the evaluated items, or vice versa, which at the moment you can't, because of the difference in setup.

2) Just as I've replied to Terry J. Reed, if you find list comprehensions easy to read, you're also bound to be able to understand what "for <expr> in <expr> if <expr>:" does, at least AFAICT.

3) Generally, indentation, as Terry J. Reed suggested, isn't always good. If the body of the loop is more than a few lines long (which happens more often than not in my code), extra indentation is bound to confuse me. That's why I use "if not <expr>: continue" at the moment, so that I don't need extra indentation. If I could just append the condition to the loop to get it out of the way (just as you do with a list comprehension), I'd save the obnoxious "if not <expr>: continue" (which somewhat destroys my reading flow of the body too), and still get the same behaviour without any extra (error-prone, in my eyes) indentation.

Anyway, it's fine if Guido says this is -1. I don't need to discuss this, as I guess it's a pretty futile attempt to get people to consider this against Guido's will. I just found this to be something I'd personally longed for for a long time, and had time free today, so I thought I'd just implement it. ;-)

And, additionally, nobody's keeping me from making my own Python tree where I can keep this patch for my very personal scripts where I don't need to have interoperability. This is open source, isn't it? ;-)

--- Heiko.
Heiko Wundram <me+python-dev@modelnine.org> wrote:
On Sunday, 21 May 2006 at 22:11, Talin wrote:
As a general guideline, I've noticed that proposals which are purely syntactic sugar are unlikely to be accepted unless there is some additional benefit other than just compression of source code.
I know about this, but generally, I find there's more to this "syntactic sugar" than just source code compression.
1) It unifies the syntax for list comprehensions and for loops, which use the
No, it /partially unifies/ list comprehensions and for loops. To actually unify them, you would need to allow for arbitrarily nested fors and ifs...

    for ... in ... [if ...] [for ... in ... [if ...]]*:

If I remember correctly, this was why it wasn't accepted before; because actual unification is ugly.

Further, because quite literally the only difference is some indentation, a colon, and a line break, the vast majority of people would likely stick with "the one obvious way to do it", which has always worked.
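To make the point about full unification concrete: a comprehension may chain several for and if clauses, and the statement form that exists today spells the same thing with nesting. A sketch in Python 3 with made-up data (`rows` is a hypothetical name, not taken from the thread):

```python
rows = [[1, 2, 3], [], [4, 5, 6, 7]]

# Comprehension with chained clauses: for ... if ... for ... if ...
flat = [value
        for row in rows if row          # skip empty rows
        for value in row if value % 2]  # keep odd values

# Equivalent nested statements, one level of indentation per clause.
nested = []
for row in rows:
    if row:
        for value in row:
            if value % 2:
                nested.append(value)
```

A fully unified for statement would have to accept the whole chain of clauses on its header line, which is the form described as ugly above.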
2) Just as I've replied to Terry J. Reed, if you find list comprehensions easy to read, you're also bound to be able to understand what "for <expr> in <expr> if <expr>:" does, at least AFAICT.
Not everyone finds list comprehensions easy to read.
3) Generally, indentation, as Terry J. Reed suggested, isn't always good. If the body of the loop is more than a few lines long (which happens more often than not in my code), extra indentation is bound to confuse me. That's why I [snip]
I feel for you; I really do. I've done the same thing myself. However, I don't believe that it is a good practice, in general, and I don't think that syntax should support this special case.
And, additionally, nobody's keeping me from making my own Python tree where I can keep this patch for my very personal scripts where I don't need to have interoperability. This is open source, isn't it? ;-)
Do whatever you want. Good luck with keeping your patches current against whatever Python version you plan on using. - Josiah
On Monday, 22 May 2006 at 01:59, Josiah Carlson wrote:
1) It unifies the syntax for list comprehensions and for loops, which use the
No, it /partially unifies/ list comprehensions and for loops. To actually unify them, you would need to allow for arbitrarily nested fors and ifs...
for ... in ... [if ...] [for ... in ... [if ...]]*:
If I remember correctly, this was why it wasn't accepted before; because actual unification is ugly.
This syntax is ugly; that's why the PEP doesn't try to make a case for it. But one-level equivalence to list comprehensions isn't bad, again, at least in my eyes.
2) Just as I've replied to Terry J. Reed, if you find list comprehensions easy to read, you're also bound to be able to understand what "for <expr> in <expr> if <expr>:" does, at least AFAICT.
Not everyone finds list comprehensions easy to read.
Why has Python added list-comprehensions, then? (or at least, why has Python added the 'if'-expression to list-comprehensions if they're hard to read? filter/itertools/any of the proposed "workarounds" in the PEP would also work for list-comprehensions).
3) Generally, indentation, as Terry J. Reed suggested, isn't always good. If the body of the loop is more than a few lines long (which happens more often than not in my code), extra indentation is bound to confuse me. That's why I
[snip]
I feel for you; I really do. I've done the same thing myself. However, I don't believe that it is a good practice, in general, and I don't think that syntax should support this special case.
Why isn't this good practice? It's not always sensible to refactor loop code to call methods (to make the loop body shorter), and it's a pretty general case that you only want to iterate over part of a generator, not over the whole content. Because of this, list comprehensions grew the 'if'-clause. So: why doesn't the for-loop? --- Heiko.
Heiko Wundram wrote:
On Monday, 22 May 2006 at 01:59, Josiah Carlson wrote:
Not everyone finds list comprehensions easy to read.
Why has Python added list-comprehensions, then? (or at least, why has Python added the 'if'-expression to list-comprehensions if they're hard to read?
LCs are useful because they're expressions rather than statements. Being expressions, they need if-clauses in order to be able to conditionally include items in the list.

LCs with if-clauses don't *have* to be hard to read; you just need to lay them out on separate lines, as you would do when writing nested statements.

With a for-statement, there is no need for if-clauses, since you can use a nested if-statement to get the same effect. The only possible reason for wanting an if-clause would be so that you can write it on the same line, which reduces readability for no corresponding benefit.

-- Greg
Heiko Wundram <me+python-dev@modelnine.org> wrote:
Why isn't this good practice? It's not always sensible to refactor loop code to call methods (to make the loop body shorter), and it's a pretty general case that you only want to iterate over part of a generator, not over the whole content. Because of this, list comprehensions grew the 'if'-clause. So: why doesn't the for-loop?
List comprehensions grew an if clause because they are expressions that were supposed to replace one of the most common idioms in Python:

    x = []
    for i in ...:
        if ...:
            x.append(...)

And you know what? They've done that very well. Seemingly so well that you are asking for the list comprehension syntax to make it back into for loops. So far, you have failed to convince me (and most others, it looks like) that it is a good idea, you have re-hashed the same arguments you made in the PEP, and Guido chimed in as the first message to say:

    "-1. The contraction just makes it easier to miss the logic."
    "This was proposed and rejected before." - Guido

If you want some advice: let it drop.

- Josiah
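The idiom and its replacement, side by side, as a small Python 3 sketch. The `words` data is made up for illustration; only the shape of the two forms comes from the message above.

```python
words = ["for", "if", "", "else", "", "continue"]

# The idiom: build a list by appending inside a filtered loop.
lengths_loop = []
for word in words:
    if word:
        lengths_loop.append(len(word))

# The list comprehension that replaces it.
lengths_lc = [len(word) for word in words if word]
```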
Heiko Wundram wrote:
2) Just as I've replied to Terry J. Reed, if you find list comprehensions easy to read, you're also bound to be able to understand what "for <expr> in <expr> if <expr>:" does, at least AFAICT.
I tend to write non-trivial LCs on multiple lines, e.g.

    l = [foo(x)
         for x in stuff
         if something_about(x)]

for the very reason that it makes them easier to read.

-- Greg
On Monday, 22 May 2006 at 02:46, you wrote:
Heiko Wundram wrote:
2) Just as I've replied to Terry J. Reed, if you find list comprehensions easy to read, you're also bound to be able to understand what "for <expr> in <expr> if <expr>:" does, at least AFAICT.
I tend to write non-trivial LCs on multiple lines, e.g.
    l = [foo(x)
         for x in stuff
         if something_about(x)]
for the very reason that it makes them easier to read.
You can also do the same here (by using normal bracketing):

    for <expr> in (<some>
                   <non-trivial>
                   <stuff>) if (<one expr> and
                                <two expr> and
                                <three expr>):
        foo(x)

--- Heiko.
Heiko Wundram wrote:
You can also do the same here (by using normal bracketing):
    for <expr> in (<some>
                   <non-trivial>
                   <stuff>) if (<one expr> and
                                <two expr> and
                                <three expr>):
So you want to be able to write the if in-line, and then format it so that it's no longer in-line? The point of doing that eludes me... -- Greg
"Heiko Wundram" <me+python-dev@modelnine.org> wrote in message news:200605211710.51720.me+python-dev@modelnine.org...
As I've noticed that I find myself typing the latter quite often in code I write, it would only be sensible to add the corresponding syntax for the for statement:
    for node in tree if node.haschildren():
        <do something with node>

as syntactic sugar for:

    for node in tree:
        if not node.haschildren():
            continue
        <do something with node>

Isn't this the same as

    for node in tree:
        if node.haschildren():
            <do something with node>

so that all you would save is ':\n' and the extra indents?

tjr
On Sunday, 21 May 2006 at 20:31, Terry Reedy wrote:
Isn't this the same as
    for node in tree:
        if node.haschildren():
            <do something with node>
so that all you would save is ':\n' and the extra indents?
Saving an extra indent and the ':\n' makes it quite a bit easier for me to read and understand in the long run. I find:

    for x in range(10):
        if not x % 2:
            print "Another even number:",
            print x
            ...
            print "Finished processing the even number"

a lot harder to read and understand than:

    for x in range(10):
        if x % 2:
            continue
        print "Another even number:",
        print x
        ...
        print "Finished processing the even number"

because with the latter it's much more obvious for me how the flow of control in the indented block is structured (and to what part of the loop the condition applies, namely to the whole loop body).

That's why I'd prefer the "combination" of the two as:

    for x in range(10) if not x % 2:
        print "Another even number:",
        print x
        ...
        print "Finished processing the even number"

because the logic is immediately obvious (at least if you find the logic to be obvious when reading list comprehensions, simply because the format is similar), and there's no extra indentation (as in your example), which makes the block, as I said, a lot easier to parse for my brain.

But, probably, that's personal preference.

--- Heiko.
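The first two variants above are equivalent in behaviour, which the following Python 3 sketch checks by collecting the visited numbers instead of printing them. The function names are hypothetical; the third, proposed variant is omitted because it is not valid syntax in any released Python.

```python
def evens_nested(limit):
    """Filter with a nested if: the body sits one level deeper."""
    seen = []
    for x in range(limit):
        if not x % 2:
            seen.append(x)
    return seen


def evens_continue(limit):
    """Filter with an early continue: the body stays at loop level."""
    seen = []
    for x in range(limit):
        if x % 2:
            continue
        seen.append(x)
    return seen
```

Which of the two reads better is exactly the matter of taste being argued in this part of the thread.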
Heiko Wundram wrote:
    for node in tree:
        if not node.haschildren():
            continue
        <do something with node>

Er, you do realise that can be written more straightforwardly as

    for node in tree:
        if node.haschildren():
            <do something with node>

-- Greg
On Monday, 22 May 2006 at 02:22, Greg Ewing wrote:
Heiko Wundram wrote:
    for node in tree:
        if not node.haschildren():
            continue
        <do something with node>

Er, you do realise that can be written more straightforwardly as

    for node in tree:
        if node.haschildren():
            <do something with node>
Yes, of course. Read my replies to Terry J. Reed, to Josiah Carlton, to Talin, to see why I chose to compare it to the 'continue' syntax. --- Heiko.
Heiko Wundram wrote:
Yes, of course. Read my replies to Terry J. Reed, to Josiah Carlton, to Talin, to see why I chose to compare it to the 'continue' syntax.
I saw them. Your brain must be wired very differently to mine, because I find loops with a continue in them harder to follow than ones without -- exactly the opposite of what you seem to prefer.

Also, I don't find an extra indentation level to be a problem at all, unless the code under it is more than a screen long -- in which case you've got big readability problems already.

-- Greg
I saw them. Your brain must be wired very differently to mine, because I find loops with a continue in them harder to follow than ones without -- exactly the opposite of what you seem to prefer.
Delurking for no particular reason:

For what it's worth, I also favor the continue syntax Heiko compared his code against. Without it, you have to scroll to the end of the loop to know whether there is an else clause; it also allows you to have multiple conditions expressed up-front without a lot of indentation. In general, I favor the idea that the main flow of computation should have the least indentation.

I also am +1 on this PEP, even if it is doomed, as I've often longed to use list-comprehension-like syntax in a for loop; to me it is mentally taxing to remember which syntax is supported where.

Just thought I'd throw Heiko some support. :)

Niko
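The "multiple conditions expressed up-front" style can be sketched as follows in Python 3. The `records` data and its field names are made up for illustration and do not come from the thread.

```python
records = [
    {"name": "a", "active": True, "size": 10},
    {"name": "b", "active": False, "size": 50},
    {"name": "c", "active": True, "size": 0},
    {"name": "d", "active": True, "size": 7},
]

# Guard clauses: each condition is stated up front, the body stays
# at the indentation level of the loop.
kept_guards = []
for record in records:
    if not record["active"]:
        continue
    if record["size"] == 0:
        continue
    kept_guards.append(record["name"])

# Same logic with nested ifs: one extra indentation level per condition.
kept_nested = []
for record in records:
    if record["active"]:
        if record["size"] != 0:
            kept_nested.append(record["name"])
```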
Niko Matsakis <niko@alum.mit.edu> wrote:
I saw them. Your brain must be wired very differently to mine, because I find loops with a continue in them harder to follow than ones without -- exactly the opposite of what you seem to prefer.
Delurking for no particular reason:
For what it's worth, I also favor the continue syntax Heiko compared his code against. Without it, you have to scroll to the end of the loop to know whether there is an else clause; it also allows you to
One could also consider documenting the fact that there is no else clause...

    for i in ...:
        if ...: #no else clause
            ...

- Josiah
Niko Matsakis <niko@alum.mit.edu> wrote:
For what it's worth, I also favor the continue syntax Heiko compared his code against. Without it, you have to scroll to the end of the loop to know whether there is an else clause;
Only if the code doesn't fit on one screen, which it should. -- Greg
participants (8)
- Greg Ewing
- Guido van Rossum
- Heiko Wundram
- Josiah Carlson
- Niko Matsakis
- Steven Bethard
- Talin
- Terry Reedy