[Python-ideas] Is this PEP-able? for X in ListY while conditionZ:

Andrew Barnert abarnert at yahoo.com
Thu Jun 27 07:13:28 CEST 2013


From: Ron Adam <ron3200 at gmail.com>

Sent: Wednesday, June 26, 2013 7:44 PM


> The 'if' after the iterator handles the selection of values.  Given 
> that, it makes sense to allow flow control statements after the last if, but not 
> expressions affecting the values.

First, why restrict break to the last if clause? Saying it can't go in a for clause or in the controlled expression I understand, but… currently, there's nothing special about the last clause (or any other clause, except for the first, in genexps only), and it seems odd to change that.


But meanwhile, your description made me realize exactly what feels weird about the whole idea.

A comprehension doesn't have flow control statements. It has _clauses_. Obviously they're related things; there's a mapping that's rigorously definable and intuitively clear. But they don't have a colon and a suite. (They don't even take the same set of expressions, but that's not important here.)

Intuitively, that's because the whole point of a comprehension is that the rest of the comprehension, especially including the core expression, is the suite. Adding a suite breaks that. 

Forget that break means the flow control is no longer always downward-local. It also completely turns the meaning of the controlling if statement around. Instead of "if this: the rest of the expression" it's "if this: the suite, else: the rest of the expression".

On top of that, the clauses map to nested statements. You can't nest break statements.
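
To make that mapping concrete, here is a minimal sketch of how the clauses nest today. The names matrix, flat_comp, and flat_loop are made up for illustration and are not from the thread:

```python
# Illustrative data; not from the original thread.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Comprehension form: two for clauses followed by one if clause.
flat_comp = [x for row in matrix for x in row if x % 2 == 0]

# Equivalent statement form: each clause nests inside the previous one,
# and the core expression becomes the body of the innermost block.
flat_loop = []
for row in matrix:
    for x in row:
        if x % 2 == 0:
            flat_loop.append(x)

print(flat_comp == flat_loop)
```

Every clause is a block whose suite is "everything to its right", which is exactly the property a break clause would violate.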

All of this becomes more obvious if you try to describe things rigorously. Let's rewrite http://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries to explain the semantics of break.

In order to make the issues more obvious, I'm going to ignore continue, use your "only at the end" restriction, and not add a rule saying that the break has to be controlled by an if. (If you want to write [x for subiter in iter for x in subiter break], fine, you get back []. This actually won't be interpretable for a genexp with only one clause, but let that be an error because it translates to an error.) I don't think changing any of those would solve any of the problems; they'd just make things more complicated and harder to see.

First, add the new syntax. There are two obvious ways to do it:

    comp_if ::=  "if" expression_nocond ["break" | comp_iter]

Or:

    comp_iter ::= comp_for | comp_if | comp_break
    comp_break ::= "break"

If you keep the existing semantics, this:


    x = [value for value in iter if pred(value) break]

… maps to:

    x = []
    for value in iter:
        if pred(value):
            break
                x.append(value)

Or maybe the append is a sibling of the break rather than a child? Either way, it makes no sense. It has to map to something like this:

    x = []
    for value in iter:
        if pred(value):
            break
        x.append(value)
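
For comparison, the statement form above can already be written without new syntax. This is only a sketch: pred and data here are hypothetical stand-ins, and itertools.takewhile is my suggestion rather than anything proposed in the thread:

```python
from itertools import takewhile


def pred(value):
    # Hypothetical stop condition: stop at the first value >= 5.
    return value >= 5


# Hypothetical input; note that 2 comes after the first stopping value.
data = [1, 3, 5, 2, 7]

# Statement form: stop at the first value for which pred is true.
x_loop = []
for value in data:
    if pred(value):
        break
    x_loop.append(value)

# Same behavior with takewhile: keep values while pred is still false.
x_takewhile = list(takewhile(lambda v: not pred(v), data))

print(x_loop == x_takewhile)
```

Note that the 2 after the 5 is dropped in both versions, which is what distinguishes "break" from an ordinary filtering if clause.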

Which means the clauses are no longer nested, so the simple explanation no longer works. Instead, you need something more convoluted, like:

The comprehension consists of a single expression followed by at least one for clause, zero or more for or if clauses, and zero or one break clause. In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and the break clause, if present, a simple statement, and evaluating the expression to produce an element each time the innermost block that does not directly contain a break is entered and a block directly containing a break, if any, is not entered.

I think you can make it a little less convoluted by using the concept of exiting a block (which is well-defined, thanks to with statements), but I think that just makes it even less intuitive:

The comprehension consists of a single expression followed by at least one for clause, zero or more for or if clauses, and zero or one break clause. In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and the break clause, if present, a simple statement, and evaluating the expression to produce an element each time the innermost block is exited without executing a break.

There's no way to explain this simply, because it's no longer simple.


As a side note:


>     [x for x in range(n) if x<limit pass else continue]


This is already hard to distinguish from a ternary expression, even without the other stuff you bring in later. Compare:

    [x for x in range(n) if n<10 else range(10)]

I don't think that's valid (although I'm not sure without testing). And I'm not sure if it would be as hard for the parser to deal with as it is for a human. But really, if a human can't parse it, it's meaningless.
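
One way to check without guessing is to ask the compiler. This is a small sketch; is_valid_expression is a helper name invented here, and compile() only parses the source, so n does not need to be defined:

```python
def is_valid_expression(source):
    """Return True if source compiles as a Python expression."""
    try:
        # Mode "eval" compiles a single expression without running it.
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# The form from the example: the iterable is an unparenthesized ternary.
unparenthesized = "[x for x in range(n) if n<10 else range(10)]"

# The same idea with the ternary wrapped in parentheses.
parenthesized = "[x for x in (range(n) if n<10 else range(10))]"

print(is_valid_expression(unparenthesized))
print(is_valid_expression(parenthesized))
```

As far as I can tell from the grammar, the iterable after "in" can't be a bare conditional expression, so the first form should be rejected and only the parenthesized one accepted.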


