Generator syntax hooks?

The generator syntax, (x for x in i if c), currently always creates a new generator. I find this quite inefficient:

    {x for x in integers if 1000 <= x < 1000000}  # never completes, because it's trying to iterate over all integers

What if, somehow, object `integers` could hook the generator and produce the equivalent of {x for x in range(1000, 1000000)}, which does complete?

What if (x for x in integers if 1000 <= x < 1000000) was syntax sugar for (x for x in range(1000, 1000000))?

(I like mathy syntax. Do you like mathy syntax?)

On Tue, Aug 8, 2017 at 5:30 AM, Soni L. <fakedme+py@gmail.com> wrote:
I don't. I prefer to stick with the syntax we already have. The alternative is a more verbose way to identify a range, plus you need a new global "integers" which implies that you could iterate over "reals" the same way (after all, mathematics doesn't mind you working with a subset of reals the same way you'd work with a subset of ints). And good luck iterating over all the reals. :) ChrisA

On Mon, Aug 7, 2017 at 4:14 PM, Chris Angelico <rosuav@gmail.com> wrote:
that's what it's for -- I'm confused as to what the problem is.
this is a set comprehension -- but what is "integers"? Is it a generator? In which case, it should take an argument so it knows when to end. Or if it's really that simple, that's what range() is for.

However, similarly, I find that sometimes I want to iterate over a slice of a sequence, but don't want to actually make the slice first. So there is itertools.islice().

If "integers" is a sequence:

    {x for x in integers[1000:10000]}

makes an unneeded copy of that slice.

    {x for x in itertools.islice(integers, 1000, 10000)}

will iterate on the fly, and not make any extra copies.

It would be nice to have easier access to a "slice iterator" though -- one of these days I may write up a proposal for that.

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
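A minimal sketch of the difference between slicing and itertools.islice, assuming `integers` is an ordinary list (the name and its size here are stand-ins for illustration, not anything defined in the thread):

```python
import itertools

# Hypothetical stand-in for the "integers" sequence discussed above.
integers = list(range(20000))

# Slicing a list builds a new 9000-element list before iteration starts.
copied = {x for x in integers[1000:10000]}

# islice walks the original sequence lazily and makes no intermediate copy.
# Note that it still has to step over the first 1000 items to get there.
lazy = {x for x in itertools.islice(integers, 1000, 10000)}

print(copied == lazy, len(lazy))
```

Both forms produce the same set; the islice version trades the temporary copy for a linear skip over the leading items.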

On 8 August 2017 at 09:06, Chris Barker <chris.barker@noaa.gov> wrote:
It would be nice to have easier access to a "slice iterator" though -- one of these days I may write up a proposal for that.
An idea I've occasionally toyed with [1] is some kind of "iterview" that wraps around an arbitrary iterable and produces lazy itertools based results rather than immediate views or copies.

However, my experience is also that folks are *really* accustomed to syntactic operations on containers producing either full live views (e.g. memoryview or numpy slices, range as a dynamically computed container), or actual copies (builtin container types). Having them produce consumable iterators instead then gets confusing due to the number of operations that will implicitly consume them (including simple "x in y" checks).

The OP's proposal doesn't fit into that category though: rather it's asking about the case where we have an infinite iterator (e.g. itertools.count(0)), and want to drop items until they start meeting some condition (i.e. itertools.dropwhile) and then terminate the iterator as soon as another condition is no longer met (i.e. itertools.takewhile).

Right now, getting the "terminate when false" behaviour requires the use of takewhile:

    set(itertools.takewhile(lambda x: x < 1000000, itertools.count(1000)))

In these cases, the standard generator expression syntax is an attractive nuisance because it *looks* right from a mathematical perspective, but hides an infinite loop:

    {x for x in itertools.count(0) if 1000 <= x < 1000000}

The most credible proposal to address this that I've seen is to borrow the "while" keyword in its "if not x: break" interpretation to get:

    {x for x in itertools.count(0) if 1000 <= x while x < 1000000}

which would be compiled as equivalent to:

    result = set()
    for x in itertools.count(0):
        if 1000 <= x:
            result.add(x)
        if not x < 1000000:
            break

(and similarly for all of the other comprehension variants)

There aren't any technical barriers I'm aware of to implementing that, with the main historical objection being that instead of the comprehension level while clause mapping to a while loop directly the way the for and if clauses map to their statement level counterparts, it would instead map to the conditional break in the expanded loop-and-a-half form:

    while True:
        if not condition:
            break

While it's taken me a long time to come around to the idea, "Make subtle infinite loops in mathematical code easier to avoid" *is* a pretty compelling user-focused justification for incurring that extra complexity at the language design level.

Cheers,
Nick.

[1] https://mail.python.org/pipermail/python-ideas/2010-April/006983.html

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
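The dropwhile/takewhile combination described above can be sketched with today's itertools. The bounds LOW and HIGH are illustrative names and are kept small so the example finishes instantly:

```python
import itertools

LOW, HIGH = 1000, 1010  # illustrative bounds, not the ones from the thread

numbers = itertools.count(0)  # infinite source

# Drop items until the lower bound is met, then stop at the first item
# that fails the upper bound, which is what ends the infinite iteration.
in_range = itertools.takewhile(
    lambda x: x < HIGH,
    itertools.dropwhile(lambda x: x < LOW, numbers),
)
result = set(in_range)
print(sorted(result))
```

The takewhile call is what makes the expression finite; replacing it with a plain `if x < HIGH` filter would loop forever.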

On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't think that's what the OP meant. The original proposal seemed to assume that it would be somehow reasonable for the input ("integers" in the example) to be able to see and parse the condition in the generator expression ("1000 <= x < 1000000" in the example, with "x" somehow known to be bound to the iteration value). That's at least what I think the remark "I like mathy syntax" referred to.
I haven't come around to this yet. It looks like it will make explaining comprehensions more complex, since the translation of "while X" into "if not X: break" feels less direct than the translations of "for x in xs" or "if pred(x)". (In particular, your proposal seems to require more experience with mentally translating loops and conditions into jumps -- most regulars of this forum do that for a living, but I doubt it's second nature for the OP.) -- --Guido van Rossum (python.org/~guido)

On 9 August 2017 at 15:38, Guido van Rossum <guido@python.org> wrote:
Right, I was separating the original request to make "{x for x in integers if 1000 <= x < 1000000}" work into the concrete proposal to make exactly *that* syntax work (which I don't think is feasible), and the slightly more general notion of offering a more math-like syntax that allows finite sets to be built from infinite iterators by defining a termination condition in addition to a filter condition.
Yeah, if we ever did add something like this, I suspect a translation using takewhile would potentially be easier for at least some users to understand than the one to a break condition:

    {x for x in itertools.count(0) if 1000 <= x while x < 1000000}

    <=>

    result = set()
    for x in itertools.count(0):
        if 1000 <= x:
            result.add(x)
        # If you've never used the loop-and-a-half idiom, it's
        # not obvious why "while <expr>" means "if not <expr>: break"
        if not x < 1000000:
            break

is roughly

    {x for x in itertools.takewhile(itertools.count(0), lambda x: x < 1000000) if 1000 <= x}

    <=>

    result = set()
    for x in takewhile(itertools.count(0), lambda x: x < 1000000):
        if 1000 <= x:
            result.add(x)

However, the break condition is the translation that would make sense at a language *implementation* level (and would hence be the one that determined the relative location of the while clause in the expression form).

While that discrepancy *still* sets off alarm bells for me (since it's a clear sign that "how people would think this works" and "how it would actually work" probably wouldn't match), I'm also conscious of the amount of syntactic noise that "takewhile" introduces vs the "while" keyword.

The counter-argument (which remains valid even against my own change of heart) is that adding a new comprehension clause doesn't actually fix the "accidental infinite loop" problem: "{x for x in itertools.count(0) if 1000 <= x < 1000000}" will still loop forever, it would just have a nicer fix to get it to terminate (adding " while x" to turn the second filter condition into a termination condition).

So while I'm +0 where I used to be a firm -1, it's still only a +0 :)

Cheers,
Nick.

On 2017-08-09 11:54 AM, Nick Coghlan wrote:
Ok. A concrete proposal would give a read-only 'filter' argument to the iterator somehow, which represents some form of simplified AST of the condition. So e.g. {x for x in integers if (lambda v: 1000 <= v < 1000000)(x)} would never complete, but {x for x in integers if 1000 <= x < 1000000} would. (But perhaps lambda objects should include an AST attribute... Having it for normal functions would introduce too much overhead tho, and then it would no longer be a simplified AST, but rather a complete python AST, which we don't want.)

On 10 August 2017 at 01:49, Soni L. <fakedme+py@gmail.com> wrote:
There have been a variety of different "thunking" proposals over the years, but they've all foundered on the question of what the *primitive* quoted form should look like, and how the thunks should subsequently be executed. For cases like this, where integration with Python's name resolution mechanism isn't actually required, folks have ended up just using strings, where the only downside is the fact that syntax highlighters and other static analysers don't know that the contents are supposed to be valid Python code. In a case like this, that might look like: {x for x in integers.build_set("1000 <= x < 1000000")} As with regexes, the cost of dynamically parsing such strings can then be amortised at runtime through the use of an appropriate caching strategy. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
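One way the string-based approach might look, as a rough sketch only. The `Integers` class, its `build_set` method and the `parse_bounds` helper are hypothetical names for illustration; the sketch accepts only conditions of the exact shape "low <= x < high" and uses functools.lru_cache as the caching strategy:

```python
import ast
import functools


@functools.lru_cache(maxsize=None)
def parse_bounds(condition, var="x"):
    """Return (low, high) for a condition of the exact form 'low <= var < high'."""
    # mode="eval" accepts a single expression; statements raise SyntaxError.
    node = ast.parse(condition, mode="eval").body
    well_formed = (
        isinstance(node, ast.Compare)
        and [type(op) for op in node.ops] == [ast.LtE, ast.Lt]
        and isinstance(node.comparators[0], ast.Name)
        and node.comparators[0].id == var
    )
    if not well_formed:
        raise ValueError(f"unsupported condition: {condition!r}")
    # literal_eval accepts AST nodes and evaluates literals only.
    low = ast.literal_eval(node.left)
    high = ast.literal_eval(node.comparators[1])
    if not (isinstance(low, int) and isinstance(high, int)):
        raise ValueError(f"bounds must be integers: {condition!r}")
    return low, high


class Integers:
    """Hypothetical 'all integers' object; not a real library type."""

    def build_set(self, condition):
        low, high = parse_bounds(condition)
        return set(range(low, high))


integers = Integers()
small = integers.build_set("1000 <= x < 1010")
print(sorted(small))
```

Because parse_bounds is cached, repeated calls with the same string skip the parsing step, much as compiled regexes are reused.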

On Thu, Aug 10, 2017 at 12:39:34PM -0300, Soni L. wrote:
I don't understand what you mean by this. The syntax for lambda is (roughly):

    lambda parameter-list : expression

The syntax for generators is (again, roughly):

    def name ( parameter-list ) :
        suite-containing-yield

Obviously the generator suite can contain expressions, and both have a parameter-list. What shared syntax are you referring to, and how is it relevant?

Or are you referring to generator expressions, rather than generators?

    ( expression for target in expression ... )

Obviously a Python expression is a Python expression, wherever it is, so a lambda can contain generator expressions, and generator expressions can contain lambdas...

And what do you mean by "simplified AST" API? I'm afraid your comment is too abstract for me to understand.

-- Steve

On 11 August 2017 at 01:39, Soni L. <fakedme+py@gmail.com> wrote:
We already do, via the "mode" argument to the compile builtin and to ast.parse:

    >>> ast.dump(ast.parse("1000 <= x < 1000000", mode="eval"))
    "Expression(body=Compare(left=Num(n=1000), ops=[LtE(), Lt()], comparators=[Name(id='x', ctx=Load()), Num(n=1000000)]))"
    >>> ast.parse("import sys", mode="eval")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.6/ast.py", line 35, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        import sys
             ^
    SyntaxError: invalid syntax

It's a large part of the reason why passing strings around has so far qualified as "good enough" - providing dedicated syntax for it doesn't actually increase the language's expressiveness all that much, it just has the potential to make static analysis easier by eagerly rendering to an AST rather than having that be handled by the function receiving the argument.

Cheers,
Nick.

On 8/9/2017 10:54 AM, Nick Coghlan wrote:
We already have three nice one liners for that, one of which you gave.

    x = set(filter(filter_condition, takewhile(continue_condition, source)))
    x = set(x for x in takewhile(continue_condition, source) if filter_condition(x))
    x = {x for x in takewhile(continue_condition, source) if filter_condition(x)}

Replace takewhile with islice(source, max) if the continue condition is (number seen < max). Add enumerate if the running count is needed otherwise.

Terminating an infinite iterator and filtering the initial slice are different operations. The operations are easily composed as they are, in multiple ways. Trying to mix them together in one jumbled special syntax is a bad idea to me.
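The three one-liners can be checked against each other with concrete stand-ins. The two predicate functions and the choice of itertools.count(0) as the source are assumptions made for this sketch:

```python
from itertools import count, islice, takewhile


def continue_condition(x):
    # Illustrative termination test: stop at the first value >= 20.
    return x < 20


def filter_condition(x):
    # Illustrative filter: keep multiples of three.
    return x % 3 == 0


# A fresh count(0) is created for each line because iterators are consumed.
a = set(filter(filter_condition, takewhile(continue_condition, count(0))))
b = set(x for x in takewhile(continue_condition, count(0)) if filter_condition(x))
c = {x for x in takewhile(continue_condition, count(0)) if filter_condition(x)}

# When the continue condition is just "number seen < max", islice does the job.
d = {x for x in islice(count(0), 20) if filter_condition(x)}

print(sorted(a))
```

All four expressions terminate on an infinite source and produce the same set here.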
In other words, aside from other issues, you would have 'while' mean 'do...while' in this one special place. -1. -- Terry Jan Reedy

On 10 August 2017 at 00:54, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ugh, this discrepancy is worse than I thought, since the translation with that clause order is actually wrong (Terry mentioned this by pointing out that the proposed syntactic translation implemented "do...while" ordering). The takewhile example is also wrong, since it has the arguments in the wrong order.

Fixing both of those issues gives the comparison:

    {x for x in itertools.count(0) while x < 1000000 if 1000 <= x}

    <=>

    result = set()
    for x in itertools.count(0):
        # If you've never used the loop-and-a-half idiom, it's
        # not obvious why "while <expr>" means "if <expr>: <loop body> else: break"
        if x < 1000000:
            if 1000 <= x:
                result.add(x)
        else:
            break

is roughly:

    {x for x in itertools.takewhile(lambda x: x < 1000000, itertools.count(0)) if 1000 <= x}

    <=>

    result = set()
    for x in takewhile(lambda x: x < 1000000, itertools.count(0)):
        if 1000 <= x:
            result.add(x)

And I think that gets me back to pretty much where I was the last time this came up: a while clause in comprehensions really only makes sense in combination with a while clause on for loops, where:

    for x in itertools.count(0) while x < 1000000:
        ...

was roughly equivalent to:

    for x in itertools.count(0):
        if x < 1000000:
            ...
        else:
            <loop else clause, if any, still runs here>
            break

(such that there's only one loop from the point of view of break/continue/else, but the loop may terminate based on either exhaustion of the underlying iterator *or* some specific condition becoming false)

While I do think such a clause would be more readable for more people than the dropwhile/takewhile equivalents (especially when the latter end up needing to use lambda expressions), I'm still dubious that these cases come up often enough to justify the addition of a for-while loop as a composite construct (the old "dropwhile and takewhile aren't even common enough to justify being builtins, why should they jump all the way to syntactic support?" question applies).

Cheers,
Nick.
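The statement-level expansion can be written out in current Python. This is only a sketch of the proposed semantics; the function name `squares_below` and the squaring body are made up for illustration:

```python
import itertools


def squares_below(limit):
    """Emulate the hypothetical `for x in itertools.count(0) while x * x < limit:`."""
    found = []
    for x in itertools.count(0):
        if x * x < limit:
            found.append(x * x)  # the loop body
        else:
            break  # the condition became false, so the loop ends here
    return found


print(squares_below(50))
```

The condition is tested before the body on every iteration, which is the "while" ordering rather than the "do...while" ordering of the earlier translation.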

On Thu, Aug 10, 2017 at 12:54:57AM +1000, Nick Coghlan wrote: Guido wrote:
"Some users"? Sure, why not? There's probably somebody out there who understands takewhile, but if so, I don't know who they are :-) I always have to look at the docs for takewhile to remind myself whether it drops items ("takes them away") while the condition is true, or yields items ("gives items") while the condition is true.
I'd like to take issue with that "not obvious" comment. I think that anyone who knows while loops knows that the loop exits when the condition becomes false. That's exactly the behaviour we get for the (hypothetical) [expr for x in seq while condition] syntax: when the condition is false, the loop and hence the comprehension, exits.

For such simple cases, there's no need to think about "loop and a half". The obvious explanation is that the loop exits when the while condition fails.

Based on my experience with beginners on the tutor mailing list, and elsewhere, I think there's a definite learning "hump" to get over before people grok even the trivial case of

    [expression for x in sequence]

but once they do, then adding an "if" clause is obvious, and I expect that the same will apply to "while".

Once you move beyond the simple case of a single for and no more than a single if (or while), I don't think there's *anything* obvious about comprehension syntax at all, while clause or no while clause. Holding the while clause to a standard that comprehensions already fail (in my opinion) is unfair:

    [expression for x in seq1 for y in seq2 if pred1 for z in seq3 if pred2 if pred3 if pred4 for w in seq4 while condition for v in seq5]

I don't think it's the "while" that tips that over the edge, readability-wise :-)

In any case, I think we're all guessing whether or not people will understand the "while condition" syntax. So I've done an informal survey on the Python-Ideas list, and once folks have had a day or so to answer I'll report what they say. It's not a truly scientific UI test, but it's the best I can do.

-- Steve

Nick Coghlan writes:
My objection to this interpretation is different from Guido's (I think): if you're really thinking in terms of math, sets are *unordered*, and therefore "takewhile" doesn't guarantee exhaustion of the desired subset.

Another way to put this is that in order to make it harder to get bit by subtle infloops, you're going to give more teeth to "Miller time came early"[1] bugs.

This may be a bigger issue than some may think, because sets and dicts are iterable, and order of iteration is arbitrary (at best history-dependent).

Footnotes:
[1] American beer commercial claiming that real men go to drink beer after a full day's work.
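A small sketch of the "stopped too early" hazard. A list with a fixed, unsorted order stands in for a collection whose iteration order is arbitrary, so that the outcome is deterministic:

```python
from itertools import takewhile

# Unsorted input: a qualifying value (0) appears after a disqualifying one (10).
values = [3, 1, 2, 10, 0]

# takewhile stops at the first failure and never sees the trailing 0.
stopped_early = list(takewhile(lambda x: x < 5, values))

# A plain filter examines every item and keeps the 0.
filtered = [x for x in values if x < 5]

print(stopped_early, filtered)
```

With an unordered source, a termination condition is only equivalent to a filter if every qualifying item happens to come first.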

On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree -- which is why I'm thinking only adding a simple "iterable slice", rather than changing the overall behavior of the container. It would be quite clear what you are asking for.

Right now, getting the "terminate when false" behaviour requires the use of takewhile:

I can't recall the use case(s) at the moment, but I have definitely wanted a way to break out of a comprehension -- and not always with infinite iterators.

After all, we have "break" in both for and while loops, so clearly there is the use case...

If someone comes up with a clean and not confusing (and general purpose) syntax, I think it would be very useful.

-CHB

On Wed, Aug 09, 2017 at 01:23:28PM -0700, Chris Barker wrote:
Indeed :-)
If someone comes up with a clean and not confusing (and general purpose) syntax, I think it would be very useful.
We used to be able to (ab)use StopIteration to do this:

    def Break():
        raise StopIteration

    # generator expressions only, not list comprehensions
    result = (expression for x in sequence if condition or Break())

but I believe that loophole has been closed in 3.6.

Comprehensions in Clojure have this feature:

http://clojuredocs.org/clojure_core/clojure.core/for

Clojure uses "when" where Python uses "if", giving:

    ;; :when continues through the collection even if some have the
    ;; condition evaluate to false, like filter
    user=> (for [x (range 3 33 2) :when (prime? x)] x)
    (3 5 7 11 13 17 19 23 29 31)

    ;; :while stops at the first collection element that evaluates to
    ;; false, like take-while
    user=> (for [x (range 3 33 2) :while (prime? x)] x)
    (3 5 7)

Translating into Python:

    [x for x in range(3, 33, 2) if is_prime(x)]
    [x for x in range(3, 33, 2) while is_prime(x)]  # hypothetical syntax

I don't think it is confusing. Regardless of the implementation, the meaning of:

    [expression for x in sequence while condition]

should (I believe) be obvious to anyone who already groks comprehension syntax. The mapping to a for-loop is admittedly a tad more complex:

    result = []
    for x in sequence:
        if not condition:
            break
        result.append(expression)

but I'm yet to meet anyone who routinely and regularly reads comprehensions by converting them to for loops like that. And if they did, all they need do is mentally map "while condition" to "if not condition: break" and it should all Just Work™.

-- Steve
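A sketch of what the StopIteration trick does on a current interpreter, assuming Python 3.7 or later, where PEP 479 turns a StopIteration raised inside a generator into a RuntimeError. The takewhile line shows one supported way to get the early exit:

```python
from itertools import takewhile


def Break():
    raise StopIteration


gen = (x for x in range(10) if x < 3 or Break())
try:
    outcome = list(gen)
except RuntimeError as exc:
    # Assumption: Python 3.7+, where the hack no longer ends the generator.
    outcome = f"RuntimeError: {exc}"

# Supported equivalent of the intended behaviour.
replacement = list(takewhile(lambda x: x < 3, range(10)))

print(outcome)
print(replacement)
```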

On 10 August 2017 at 14:42, Steven D'Aprano <steve@pearwood.info> wrote:
The hard part is the interaction between if and while.

Consider (expr for var in seq if cond1 while cond2):

This means:

    for var in seq:
        if cond1:
            if not cond2:
                break
            yield expr

Note that unlike all other comprehension clauses (for and if) while doesn't introduce a new level of nesting. That's an inconsistency, and while it's minor, it would need clarifying (my original draft of this email was a mess, because I misinterpreted how if and while would interact, precisely over this point).

Also, there's a potential issue here - consider

    [expr for var in even_numbers() if is_odd(var) while var < 100]

This is an infinite loop, even though it has a finite termination condition (var < 100), because we only test the termination condition if var is odd, which it never will be.

Obviously, this is a contrived example. And certainly "don't do that, then" is a valid response. But my instinct is that people are going to get this wrong - *especially* in a maintenance environment. That example could have started off being "for var in count(0)" and then someone realised they could "optimise" it by omitting odd numbers, introducing the bug in the process. (And I'm sure real life code could come up with much subtler examples ;-))

Overall, I agree with Steven's point. It seems pretty obvious what the intention is, and while it's probably possible to construct examples that are somewhat unclear,

1. The mechanical rule gives an explicit meaning
2. People shouldn't be writing such complex comprehensions, so if the rule doesn't give what they expect, they can always rewrite the code with an explicit (and clearer) loop.

But while I think this says that the above interpretation of while is the only sensible one, and in general other approaches are unlikely to be as natural, I *don't* think that it unequivocally says that allowing while is a good thing. It may still be better to omit it, and force people to state their intent explicitly (albeit a bit more verbosely).

Paul
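The expansion above can be exercised as a generator function. To keep the demonstration finite, this sketch caps the supposedly infinite even-number source at 10,000 items and counts how many of them get consumed; the helper names `if_then_while` and `counting` are illustrative:

```python
import itertools


def if_then_while(seq, cond1, cond2):
    """Expansion of the hypothetical (var for var in seq if cond1 while cond2)."""
    for var in seq:
        if cond1(var):
            if not cond2(var):
                break
            yield var


def counting(iterable, counter):
    # Pass items through unchanged while recording how many were requested.
    for item in iterable:
        counter["n"] += 1
        yield item


def is_odd(n):
    return n % 2 == 1


counter = {"n": 0}
capped_evens = counting(itertools.islice(itertools.count(0, 2), 10_000), counter)
result = list(if_then_while(capped_evens, is_odd, lambda v: v < 100))

print(result, counter["n"])
```

The termination test is never reached because no even number passes the filter, so the loop only stops when the artificial cap runs out. Without the cap it would not stop at all.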

On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore <p.f.moore@gmail.com> wrote:
why is the termination only tested if the if clause is True? Could they not be processed in parallel? or the while first....

so maybe better to do:

    [expr for var in even_numbers() while var < 100 if is_odd(var)]

Maybe it's just me, but I would certainly expect the while to have precedence.

I guess I think of it like this:

"if" is providing a filtering mechanism

"while" is providing a termination mechanism

-- is there a use case anyone can think of when they would want the while to be applied to the list AFTER filtering?

Obviously, this is a contrived example. And certainly "don't do that, then" is a valid response. But my instinct is that people are going to get this wrong - *especially* in a maintenance environment.

sure, but would there be an issue if the while were given precedence?

Overall, I agree with Steven's point. It seems pretty obvious what the intention is

me too -- a direct translation to a for loop isn't necessary to understand how it works.

-CHB

On 10 August 2017 at 21:25, Chris Barker <chris.barker@noaa.gov> wrote:
See? That's my point - the "obvious" interpretation stops being obvious pretty fast...
so maybe better to do:
[expr for var in even_numbers() while var < 100 if is_odd(var)]
That would work. But I bet people's intuition wouldn't immediately lead to that fix (or indeed, necessarily incline them to put the clauses in this order in the first place).
Probably not, but when you can have multiple FORs, WHILEs and IFs, in any order, explaining the behaviour precisely while still preserving some sense of "filtering comes after termination" is going to be pretty difficult. [expr for var1 in seq1 if cond1 for var2 in seq2 for var3 in seq3 if cond2 if cond3] is legal - stupid, but legal. Now add while clauses randomly in that, and define your expected semantics clearly so a user (and the compiler!) can determine what the resulting mess means. The main benefit of the current "works like a for loop" interpretation is that it's 100% explicit. Nothing will make a mess like the above good code, but at least it's well-defined. Paul

The logical solution to me is to allow any order of while and if, and follow the same 'rule' as multiple for loops - just nest/test those in that order. Then you can have whatever priority you need. One question though is how this should handle multiple loops - break all of them, or just the current one?

- Spencer Brown

On Thu, Aug 10, 2017 at 1:53 PM, Spencer Brown <spencerb21@live.com> wrote:
Actually, I think it would be better to only allow one order, and have the "while" always tested first -- which may mean it should be placed first for clarity.
just the current one, just like a "break", or for that matter, a nested while... -CHB

On 11 August 2017 at 06:53, Spencer Brown <spencerb21@live.com> wrote:
This is why I think a for-while construct in comprehensions would really only make sense in combination with a *statement* level for-while construct, as the problem we have is:

- a termination condition can't readily use "if" (even in combination with "break") because that's visually and syntactically ambiguous with a filter condition
- a naive translation of a "while" based syntax makes it look like a nested *non-terminating* loop

Both of those problems may be resolved if a "for-while" loop exists as a top level looping construct that can terminate based on *either* an iterator being exhausted *or* a condition becoming false.

The question then becomes whether or not a "for-while" loop is actually useful enough to be added as a general construct, given that we already have "if not condition: break" as a way of modeling a loop ending early because a condition became false.

One way to gather evidence on that front would be to survey the standard library for places where we use "break", and see if any of them would be more readable given a for-while construct, whether as a statement, or as part of the comprehension syntax. (Note: I'm not interested enough in the idea to do that evidence gathering myself, I'm just pointing it out in case anyone is curious enough to take the time to collect those details)

Cheers,
Nick.

On Thu, Aug 10, 2017 at 01:25:24PM -0700, Chris Barker wrote:
I'm not sure why Paul thinks this is an issue. There are plenty of ways to accidentally write an infinite loop in a comprehension, or a for loop, already:

    [expr for var in even_numbers()]

will do it, if even_numbers is unexpectedly an infinite iterator. Or you could write:

    for num in even_numbers():
        if is_odd(num) and num > 100:
            break

No loop syntax, whether it is functional style (takewhile, map, etc.), comprehension, or traditional style for loops, enables the programmer to avoid thinking about what they write.
why is the termination only tested if teh if clause is True? Could then not be processed in parallel? or the while first....
Because we're following the standard Python rule of left-to-right execution. The while clause is tested only if the if clause is true because it follows the if clause.

I think that there's an argument to be made for the rule:

We can have `if` in a comprehension, or `while`, but not both

in order to limit complexity. Analogy:

(1) we intentionally limit the decorator @ syntax to a subset of expressions;

(2) likewise we intentionally allow (but don't encourage) monkey-patching of Python classes only, not built-ins.

Just because we *can* allow arbitrary code combinations doesn't mean we *must*. We have a choice to say:

"No, you cannot mix `if` and `while` in the same comprehension. Why? Because we say so. Because it is confusing if you do."

I'd be okay with that rule.

But if we decide to allow arbitrary combinations of for/if/while in comprehensions, then I think we must keep the same left-to-right rule we have now. Currently we process multiple for/if clauses left-to-right:

    [expr for x in a if cond for y in b]

is equivalent to:

    for x in a:
        if cond:
            for y in b:
                expr

rather than moving the `if` to the end. If you want it at the end, put it there yourself. Adding `while` shouldn't change that. It would be crazy-complicated to have a rule:

"the presence of a while means the comprehension is processed in parallel"

or

"all the while clauses are processed before (after?) the if clauses, regardless of their order of appearance."
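The existing left-to-right rule for for/if clauses can be confirmed directly. The sample data below is arbitrary and chosen only to make the nesting visible:

```python
a = [1, 2, 3, 4]
b = "xy"

# Clauses are nested in the order they are written: for, if, for.
comprehension = [(x, y) for x in a if x % 2 == 0 for y in b]

# The equivalent explicit loops, nested in the same order.
nested = []
for x in a:
    if x % 2 == 0:
        for y in b:
            nested.append((x, y))

print(comprehension)
```

The `if` filters values of x before the inner loop over b ever starts, exactly as its position in the comprehension suggests.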
so maybe better to do:
[expr for var in even_numbers() while var < 100 if is_odd(var)]
Well sure, that's the *correct* way to write the code:

    for var in even_numbers():
        if not (var < 100):
            break
        if is_odd(var):
            results.append(expr)

(for some definition of "correct" -- this is clearly an expensive way to generate an empty list.)

But in general one might wish to test the if or the while in either order.
Maybe it's just me, but I would certainly expect the while to have precedence.
Does that apply to these idioms as well?

    while cond:
        if flag:
            ...

versus:

    if flag:
        while cond:
            ...

I would not expect them to be the same, and nor would I expect these to be the same:

    [expr for x in seq if flag while cond]

    [expr for x in seq while cond if flag]
[process(n) for n in numbers while n > 0 if is_odd(n)] Halt on the first zero or negative number, regardless of whether it is even or odd, but process only odd numbers. Paul:
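To see that the two clause orders really differ, here is a sketch that expands both under the left-to-right rule, using explicit loops since the `while` clause is hypothetical. The sample list is made up for illustration:

```python
def is_odd(n):
    return n % 2 == 1


numbers = [3, 4, 5, -2, 7]

# Expansion of: [n for n in numbers while n > 0 if is_odd(n)]
while_first = []
for n in numbers:
    if not n > 0:
        break
    if is_odd(n):
        while_first.append(n)

# Expansion of: [n for n in numbers if is_odd(n) while n > 0]
if_first = []
for n in numbers:
    if is_odd(n):
        if not n > 0:
            break
        if_first.append(n)

print(while_first, if_first)
```

With the while clause first, the even value -2 halts the loop. With the if clause first, -2 is filtered out before the termination test sees it, so iteration carries on to 7.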
That's the argument for limiting comprehensions to either `if` or `while` but not both. And I actually would be okay with that -- especially if we leave open the possibility of relaxing the prohibition in the future. But personally, I think that's under-estimating the ability of programmers to reason about loops. Of course a comprehension with multiple for/if/while clauses is hard to reason about, and we shouldn't *encourage* them, but we don't prohibit multiple for/if clauses. Why should `while` be held to a higher standard? If we allow people to shoot themselves in the foot by writing complex list comprehensions with ten `for` loops and seven `if` clauses, why should we baulk at allowing them a `while` clause as well? -- Steve

On Fri, Aug 11, 2017 at 02:49:10PM +1000, Steven D'Aprano wrote:
On Thu, Aug 10, 2017 at 01:25:24PM -0700, Chris Barker wrote:
Oops, sorry I had a thinko and read your question in the opposite sense than it actually is. See my response to Nick for an example: I have an iterable of arbitrary objects. I want to ignore anything that isn't a string, and halt if the string doesn't start with "A". [expr for s in objects if isinstance(s, str) while s.startswith("A")] -- Steve

On 11 August 2017 at 05:49, Steven D'Aprano <steve@pearwood.info> wrote:
Mostly because I work in a support and maintenance environment, where we routinely see code that *originally* made sense, but which was over time modified in ways that break things - usually precisely because coders who in theory understand how to write such things correctly, end up not taking the time to fully understand the constructs they are modifying. Of course that's wrong, but it's sadly all too common, and for that reason I'm always wary of constructs that need thinking through carefully to understand the implications. Nick's original {x for x in itertools.count(0) if 1000 <= x while x < 1000000} was like that. It was *sort of* obvious that it meant "numbers between 1_000 and 1_000_000", but the interaction between "if" and "while" wasn't clear to me. If I were asked to rush in a change to only pick odd numbers, {x for x in itertools.count(0) if 1000 <= x and is_odd(x) while x < 1000000} seems right to me, but quick - what about edge cases? It's not that I can't get it right, nor is it that I can't test that I *did* get it right, just that this sort of "quick fix" is very common in the sort of real-world coding I see regularly, and a huge advantage of Python is that it's hard to get in a situation where the obvious guess is wrong. Don't get me wrong - I'm not arguing that the sky is falling. Just that this construct isn't as easy to understand as it seems at first (and that hard-to-understand cases appear *before* you hit the point where it's obvious that the statement is too complex and should be refactored). Paul

On 11 August 2017 at 01:39, Paul Moore <p.f.moore@gmail.com> wrote:
This is actually how I came to the conclusion that if we were ever to do something like this, the termination condition would need to go *before* the filter condition:

    (expr for var in seq while loop_cond if filter_cond)

<=>

    for var in seq:
        if loop_cond:
            if filter_cond:
                yield expr
        else:
            break

With the clauses in that order, the "while" keyword effectively operates as "if-else-break" the same way it does in a regular while loop, and could potentially be introduced as a modifying clause on regular for loops at the same time. One of the neat things the latter would allow is to make it even easier to introduce a diagnostic loop counter into while loops:

    while condition:
        ...

could become:

    for iteration in itertools.count(1) while condition:
        ...

rather than having to implement a manually incremented loop counter the way you do today.
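For comparison, the loop-counter idea can already be written with valid syntax by moving the condition into an explicit break. A small sketch; the Collatz-style function is an invented example, not something from the thread:

```python
import itertools


def collatz_steps(n):
    """Count halving/tripling steps until n reaches 1."""
    steps = 0
    for iteration in itertools.count(1):
        if n == 1:  # plays the role of "while n != 1", tested before the body
            break
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps = iteration  # the counter comes from count(1), no manual increment
    return steps
```

The hypothetical `for iteration in itertools.count(1) while n != 1:` would fold the `if ...: break` into the loop header.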
This is another good reason why a termination condition would need to be checked before the filter condition rather than either after it, or only when the filter condition was true. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Aug 11, 2017 at 02:34:53PM +1000, Nick Coghlan wrote:
What if you want to check the filter condition before the termination condition? I have an iterable of arbitrary objects. I want to ignore anything that isn't a string, and halt if the string doesn't start with "A". This is easy:

    [expr for s in objects if isinstance(s, str) while s.startswith("A")]

Why should we prohibit expressing this, and instead write it as this?

    [expr for s in objects while (s.startswith("A") if isinstance(s, str) else True) if isinstance(s, str)]

Or split into multiple comprehensions?

    [expr for s in [obj for obj in objects if isinstance(obj, str)] while s.startswith("A")]
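The filter-then-halt case can be expressed today by filtering lazily and handing the result to takewhile. A sketch only; the `objects` list and the use of `s.lower()` as the expression are assumptions for illustration:

```python
from itertools import takewhile

objects = ["Apple", 42, "Avocado", None, "Banana", "Apricot"]

# Filter first: non-strings are skipped and never reach the halting test.
strings = (obj for obj in objects if isinstance(obj, str))

# Then halt at the first string that does not start with "A".
result = [s.lower() for s in takewhile(lambda s: s.startswith("A"), strings)]
```

The 42 and None are ignored without stopping the loop, while "Banana" stops it, so "Apricot" is never seen.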
We can still expand the clauses if they are presented in the opposite order:

    (expr for var in seq if filter_cond while loop_cond)

<=>

    for var in seq:
        if filter_cond:
            if loop_cond:
                yield expr
            else:
                break

There's no need to prohibit that. It is meaningful and useful and just because somebody might accidentally fail to exit an infinite loop is no reason to ban this.
Why is this a problem that needs solving? Who is to say that an infinite generator expression isn't exactly what the programmer wants? If the halting condition is not true, the generator expression will either keep going until the iterator is exhausted, or it will be an infinite generator just like the unprocessed, unfiltered source iterator. This is not necessarily a problem. -- Steve

On 11 August 2017 at 15:13, Steven D'Aprano <steve@pearwood.info> wrote:
Because the most obvious interpretation of a completely independent "while" clause in comprehensions would be as a nested loop inside the outer for loop, not as a nested if-else-break statement. As a result of that, I'm only personally prepared to support for-while comprehensions if they're syntactic sugar for a combined statement level for-while loop that makes it clear why only the "for" clauses in a comprehension create new loops. I *wouldn't* be prepared to support them if they could only be explained in terms of a direct mapping to an if statement and had no statement level counterpart that actually used the "while" keyword. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8/10/2017 9:42 AM, Steven D'Aprano wrote:
In both cases, we use 'break' to mean break. If we want to break comprehensions, I think we should continue to use 'break' to mean break instead of twisting 'while' to mean 'break'.
This is the same as

    result = []
    for x in sequence:
        if condition:
            result.append(expression)
        else:
            break

which could be written

    [expression for x in sequence if condition break]

-- Terry Jan Reedy

On Thu, Aug 10, 2017 at 1:03 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I was thinking that too.
[expression for x in sequence if condition break]
hmm, but if you want to filter, also? [expression for x in sequence if condition if condition break] or [expression for x in sequence if condition break if condition] both of those seem more confusing to me than while. -CHB

(re-posting here as I first mistakenly answered directly to Terry. Sorry about that!) Le 10/08/17 à 22:03, Terry Reedy a écrit :
It's what I thought too. Adding a `while` clause here just overly complicates the understanding of the comprehension. The `break` keyword is already easily understandable and helps to map the comprehension with the plain for-loop (I like this mapping for its reverse counterpart, as I often start with plain for-loops to rewrite them later to comprehensions when it makes sense). I would probably suggest this instead of Terry's proposal, though: [expression for x in sequence if condition or break] (maybe it's what you meant?). I suggest this because it doesn't imply the execution of a statement inside the comprehension, but just to continue the evaluation as it's always done. I admit it feels a bit hacky, but maybe just until we get used to it? -Brice

[...]
having the "if condition" there seems confusing to me, particularly if you want an if condition as a filter as well: [expression for x in sequence if condition1 if condition2 break] which makes me want: [expression for x in sequence if condition1 breakif condition2] adding another keyword is a pretty big deal though! would it be possible to add a keyword ("breakif" in this case) that was ONLY legal in comprehensions? Though I still don't think using "while" would really introduce that much confusion -- sure, it doesn't introduce a new loop, but, as someone pointed out earlier in this thread it really is only changing from a: "while do" to a "do while" construct -- so means pretty much the same thing. I agree that scanning a code base to see if there really are many loops in practice that could use this construct would be a good way to see if there is any point. And it would also be interesting to do a survey of "random" folks as to how they would interpret such a construct -- it's pretty hard for a small group to know what is and isn't "confusing" -CHB

Though I still don't think using "while" would really introduce that much confusion -- sure, it doesn't introduce a new loop, but, as someone pointed out earlier in this thread it really is only changing from a: "while do" to a "do while" construct -- so means pretty much the same thing. +1 for the "while" from me too, I don't think most people would find it confusing (supposing they don't find current [x for x in foo if ...] confusing either), and introducing a break there is just more of a mess. To those who say that it might get ugly if you do something like: [x for y in foo for x in y while x != y if x + y < 100] This still isn't even unbearable, and once it gets that hard, maybe you should consider something else anyway.

Hi Soni, and welcome! On Mon, Aug 07, 2017 at 04:30:05PM -0300, Soni L. wrote:
What if, (x for x in integers if 1000 <= x < 1000000), was syntax sugar for (x for x in range(1000, 1000000))?
If you want the integers from 1000 to 1000000, use: range(1000, 1000000) Don't waste your time slowing down the code with an unnecessary and pointless wrapper that does nothing but pass every value on unchanged: (x for x in range(1000, 1000000)) # waste of time and effort -- Steve

Soni L. writes: Steven d'Aprano writes:
range(1000, 1000000) (x for x in range(1000, 1000000)) # waste of time and effort
Actually, those have different semantics!
That's not real important. As Stefan Behnel points out, it's simple (and efficient) to get iterator semantics by using iter(). The big issue here is that Python is not the kind of declarative language where (x for x in int if 1_000 ≤ x ≤ 1_000_000)[1] is natural to write, let alone easy to implement efficiently. Aside from the problem of (x for x in float if 1_000 ≤ x ≤ 1_000_000) (where the answer is "just don't do that"), I can't think of any unbounded collections in Python that aren't iterables, except some types. That makes Steven's criticism pretty compelling. If you need to design a collection's __iter__ specially to allow it to decide whether the subset that satisfies some condition is exhausted, why not just subclass some appropriate existing collection with a more appropriate __iter__? Footnotes: [1] See what I did there? ;-)
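The difference in semantics mentioned here (a range is a reusable sequence, while a generator expression or iter() result is a one-shot iterator) can be seen in a few lines. A minimal sketch with small illustrative values:

```python
r = range(3)
g = (x for x in range(3))

# A range can be iterated any number of times and supports len().
first_r, second_r = list(r), list(r)
length = len(r)

# A generator expression is exhausted after the first pass.
first_g, second_g = list(g), list(g)

# iter() gives the same one-shot behaviour without the wrapper expression.
it = iter(range(3))
first_it, second_it = list(it), list(it)
```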

On 8 August 2017 at 09:06, Chris Barker <chris.barker@noaa.gov> wrote:
It would be nice to have an easier access to an "slice iterator" though -- one of these days I may write up a proposal for that.
An idea I've occasionally toyed with [1] is some kind of "iterview" that wraps around an arbitrary iterable and produces lazy itertools based results rather than immediate views or copies. However, my experience is also that folks are *really* accustomed to syntactic operations on containers producing either full live views (e.g. memoryview or numpy slices, range as a dynamically computed container), or actual copies (builtin container types). Having them produce consumable iterators instead then gets confusing due to the number of operations that will implicitly consume them (including simple "x in y" checks).

The OP's proposal doesn't fit into that category though: rather it's asking about the case where we have an infinite iterator (e.g. itertools.count(0)), and want to drop items until they start meeting some condition (i.e. itertools.dropwhile) and then terminate the iterator as soon as another condition is no longer met (i.e. itertools.takewhile). Right now, getting the "terminate when false" behaviour requires the use of takewhile:

    set(itertools.takewhile(lambda x: x < 1000000, itertools.count(1000)))

In these cases, the standard generator expression syntax is an attractive nuisance because it *looks* right from a mathematical perspective, but hides an infinite loop:

    {x for x in itertools.count(0) if 1000 <= x < 1000000}

The most credible proposal to address this that I've seen is to borrow the "while" keyword in its "if not x: break" interpretation to get:

    {x for x in itertools.count(0) if 1000 <= x while x < 1000000}

which would be compiled as equivalent to:

    result = set()
    for x in itertools.count(0):
        if 1000 <= x:
            result.add(x)
        if not x < 1000000:
            break

(and similarly for all of the other comprehension variants)

There aren't any technical barriers I'm aware of to implementing that, with the main historical objection being that instead of the comprehension level while clause mapping to a while loop directly the way the for and if clauses map to their statement level counterparts, it would instead map to the conditional break in the expanded loop-and-a-half form:

    while True:
        if not condition:
            break

While it's taken me a long time to come around to the idea, "Make subtle infinite loops in mathematical code easier to avoid" *is* a pretty compelling user-focused justification for incurring that extra complexity at the language design level. Cheers, Nick. [1] https://mail.python.org/pipermail/python-ideas/2010-April/006983.html -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
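The "drop until one condition holds, then take until another fails" combination described above can be sketched with the two itertools functions together. The bounds are shrunk to 10 and 20 here (an assumption, purely so the example runs instantly):

```python
from itertools import count, dropwhile, takewhile

LOW, HIGH = 10, 20  # stand-ins for 1000 and 1000000

# dropwhile skips the leading items below LOW; takewhile then stops the
# otherwise infinite count() at the first item that reaches HIGH.
in_range = set(
    takewhile(lambda x: x < HIGH, dropwhile(lambda x: x < LOW, count(0)))
)
```

This terminates because takewhile stops pulling from count(0) once its predicate fails, whereas a plain `if` filter would keep consuming forever.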

On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't think that's what the OP meant. The original proposal seemed to assume that it would be somehow reasonable for the input ("integers" in the example) to be able to see and parse the condition in the generator expression ("1000 <= x < 1000000" in the example, with "x" somehow known to be bound to the iteration value). That's at least what I think the remark "I like mathy syntax" referred to.
I haven't come around to this yet. It looks like it will make explaining comprehensions more complex, since the translation of "while X" into "if not X: break" feels less direct than the translations of "for x in xs" or "if pred(x)". (In particular, your proposal seems to require more experience with mentally translating loops and conditions into jumps -- most regulars of this forum do that for a living, but I doubt it's second nature for the OP.) -- --Guido van Rossum (python.org/~guido)

On 9 August 2017 at 15:38, Guido van Rossum <guido@python.org> wrote:
Right, I was separating the original request to make "{x for x in integers if 1000 <= x < 1000000}" work into the concrete proposal to make exactly *that* syntax work (which I don't think is feasible), and the slightly more general notion of offering a more math-like syntax that allows finite sets to be built from infinite iterators by defining a termination condition in addition to a filter condition.
Yeah, if we ever did add something like this, I suspect a translation using takewhile would potentially be easier for at least some users to understand than the one to a break condition: {x for x in itertools.count(0) if 1000 <= x while x < 1000000} <=> x = set() for x in itertools.count(0): if 1000 <= x: set.add(x) # If you've never used the loop-and-a-half idiom, it's # not obvious why "while <expr>" means "if not <expr>: break" if not x < 1000000: break is roughly {x for x in itertools.takewhile(itertools.count(0), lambda x: x < 1000000) if 1000 <= x} <=> x = set() for x in takewhile(itertools.count(0), lambda x: x < 1000000): if 1000 <= x: set.add(x) However, the break condition is the translation that would make sense at a language *implementation* level (and would hence be the one that determined the relative location of the while clause in the expression form). That discrepancy *still* sets off alarm bells for me (since it's a clear sign that "how people would think this works" and "how it would actually work" probably wouldn't match), I'm also conscious of the amount of syntactic noise that "takewhile" introduces vs the "while" keyword. The counter-argument (which remains valid even against my own change of heart) is that adding a new comprehension clause doesn't actually fix the "accidental infinite loop" problem: "{x for x in itertools.count(0) if 1000 <= x < 1000000}" will still loop forever, it would just have a nicer fix to get it to terminate (adding " while x" to turn the second filter condition into a termination condition). So while I'm +0 where I used to be a firm -1, it's still only a +0 :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2017-08-09 11:54 AM, Nick Coghlan wrote:
Ok. A concrete proposal would give a read-only 'filter' argument to the iterator somehow, which represents some form of simplified AST of the condition. So e.g. {x for x in integers if (lambda v: 1000 <= v < 1000000)(x)} would never complete, but {x for x in integers if 1000 <= x < 1000000} would. (But perhaps lambda objects should include an AST attribute... Having it for normal functions would introduce too much overhead tho, and then it would no longer be a simplified AST, but rather a complete python AST, which we don't want.)

On 10 August 2017 at 01:49, Soni L. <fakedme+py@gmail.com> wrote:
There have been a variety of different "thunking" proposals over the years, but they've all foundered on the question of what the *primitive* quoted form should look like, and how the thunks should subsequently be executed. For cases like this, where integration with Python's name resolution mechanism isn't actually required, folks have ended up just using strings, where the only downside is the fact that syntax highlighters and other static analysers don't know that the contents are supposed to be valid Python code. In a case like this, that might look like: {x for x in integers.build_set("1000 <= x < 1000000")} As with regexes, the cost of dynamically parsing such strings can then be amortised at runtime through the use of an appropriate caching strategy. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Aug 10, 2017 at 12:39:34PM -0300, Soni L. wrote:
I don't understand what you mean by this. The syntax for lambda is (roughly): lambda parameter-list : expression The syntax for generators is (again, roughly): def name ( parameter-list ) : suite-containing-yield Obviously the generator suite can contain expressions, and both have a parameter-list. What shared syntax are you referring to, and how is it relevant? Or are you referring to generator expressions, rather than generators? ( expression for target in expression ... ) Obviously a Python expression is a Python expression, wherever it is, so a lambda can contain generator expressions, and generator expressions can contain lambdas... And what do you mean by "simplified AST" API? I'm afraid your comment is too abstract for me to understand. -- Steve

On 11 August 2017 at 01:39, Soni L. <fakedme+py@gmail.com> wrote:
We already do, via the "mode" argument to the compile builtin and to ast.parse:

    >>> ast.dump(ast.parse("1000 <= x < 1000000", mode="eval"))
    "Expression(body=Compare(left=Num(n=1000), ops=[LtE(), Lt()], comparators=[Name(id='x', ctx=Load()), Num(n=1000000)]))"
    >>> ast.parse("import sys", mode="eval")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.6/ast.py", line 35, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        import sys
             ^
    SyntaxError: invalid syntax

It's a large part of the reason why passing strings around has so far qualified as "good enough" - providing dedicated syntax for it doesn't actually increase the language's expressiveness all that much, it just has the potential to make static analysis easier by eagerly rendering to an AST rather than having that be handled by the function receiving the argument. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
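To make the "pass a string and let the receiver parse it" approach concrete, here is a rough sketch of a receiver that recognises only the single pattern "lo <= name < hi" and turns it into a range. The function name `parse_bounds` is hypothetical, and the sketch assumes Python 3.8 or later, where integer literals appear in the tree as ast.Constant rather than the ast.Num shown in the 3.6 output above:

```python
import ast
from functools import lru_cache


@lru_cache(maxsize=None)  # amortise the parsing cost, as with regex caches
def parse_bounds(condition):
    """Return range(lo, hi) for a string of the form 'lo <= name < hi'."""
    node = ast.parse(condition, mode="eval").body
    supported = (
        isinstance(node, ast.Compare)
        and isinstance(node.left, ast.Constant)
        and isinstance(node.left.value, int)
        and [type(op) for op in node.ops] == [ast.LtE, ast.Lt]
        and isinstance(node.comparators[0], ast.Name)
        and isinstance(node.comparators[1], ast.Constant)
        and isinstance(node.comparators[1].value, int)
    )
    if not supported:
        raise ValueError(f"unsupported condition: {condition!r}")
    return range(node.left.value, node.comparators[1].value)
```

Anything outside that one pattern is rejected rather than guessed at, which hints at how much work a general "simplified AST" hook would have to do.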

On 8/9/2017 10:54 AM, Nick Coghlan wrote:
We already have three nice one liners for that, one of which you gave. x = set(filter(filter_condition, takewhile(continue_condition, source))) x = set(x for x in takewhile(continue_condition, source) if filter_condition) x = {x for x in takewhile(continue_condition, source) if filter_condition} Replace takewhile with islice(source, max) if the continue condition is (number seen < max). Add enumerate if the running count is needed otherwise. Terminating an infinite iterator and filtering the initial slice are different operations. The operations are easily composed as they are, in multiple ways. Trying to mix them together in one jumbled special syntax is a bad idea to me.
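The one-liners above, made runnable with concrete stand-ins. The two condition functions and the count(0) source are assumptions chosen for illustration; in the comprehension forms the filter is written as a call so that all variants use the same predicate:

```python
from itertools import count, islice, takewhile


def filter_condition(x):
    return x % 3 == 0


def continue_condition(x):
    return x < 20


a = set(filter(filter_condition, takewhile(continue_condition, count(0))))
b = set(x for x in takewhile(continue_condition, count(0)) if filter_condition(x))
c = {x for x in takewhile(continue_condition, count(0)) if filter_condition(x)}

# When the continue condition is just "number seen < max", islice does the job.
d = {x for x in islice(count(0), 20) if filter_condition(x)}
```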
In other words, aside from other issues, you would have 'while' mean 'do...while' in this one special place. -1. -- Terry Jan Reedy

On 10 August 2017 at 00:54, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ugh, this discrepancy is worse than I thought, since the translation with that clause order is actually wrong (Terry mentioned this by pointing out that the proposed syntactic translation implemented "do...while" ordering). The takewhile example is also wrong, since it has the arguments in the wrong order. Fixing both of those issues gives the comparison:

    {x for x in itertools.count(0) while x < 1000000 if 1000 <= x}

<=>

    result = set()
    for x in itertools.count(0):
        # If you've never used the loop-and-a-half idiom, it's
        # not obvious why "while <expr>" means
        # "if <expr>: <loop body> else: break"
        if x < 1000000:
            if 1000 <= x:
                result.add(x)
        else:
            break

is roughly:

    {x for x in itertools.takewhile(lambda x: x < 1000000, itertools.count(0)) if 1000 <= x}

<=>

    result = set()
    for x in takewhile(lambda x: x < 1000000, itertools.count(0)):
        if 1000 <= x:
            result.add(x)

And I think that gets me back to pretty much where I was the last time this came up: a while clause in comprehensions really only makes sense in combination with a while clause on for loops, where:

    for x in itertools.count(0) while x < 1000000:
        ...

was roughly equivalent to:

    for x in itertools.count(0):
        if x < 1000000:
            ...
        else:
            <loop else clause, if any, still runs here>
            break

(such that there's only one loop from the point of view of break/continue/else, but the loop may terminate based on either exhaustion of the underlying iterator *or* some specific condition becoming false)

While I do think such a clause would be more readable for more people than the dropwhile/takewhile equivalents (especially when the latter end up needing to use lambda expressions), I'm still dubious that these cases come up often enough to justify the addition of a for-while loop as a composite construct (the old "dropwhile and takewhile aren't even common enough to justify being builtins, why should they jump all the way to syntactic support?" question applies). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Aug 10, 2017 at 12:54:57AM +1000, Nick Coghlan wrote: Guido wrote:
"Some users"? Sure, why not? There's probably somebody out there who understands takewhile, but if so, I don't know who they are :-) I always have to look at the docs for takewhile to remind myself whether it drops items ("takes them away") while the condition is true, or yields items ("gives items") while the condition is true.
I'd like to take issue with that "not obvious" comment. I think that anyone who knows while loops knows that the loop exits when the condition becomes false. That's exactly the behaviour we get for the (hypothetical) [expr for x in seq while condition] syntax: when the condition is false, the loop and hence the comprehension, exits. For such simple cases, there's no need to think about "loop and a half". The obvious explanation is that the loop exits when the while condition fails. Based on my experience with beginners on the tutor mailing list, and elsewhere, I think there's a definite learning "hump" to get over before people grok even the trivial case of [expression for x in sequence] but once they do, then adding an "if" clause is obvious, and I expect that the same will apply to "while". Once you move beyond the simple case of a single for and no more than a single if (or while), I don't think there's *anything* obvious about comprehension syntax at all, while clause or no while clause. Holding the while clause to a standard that comprehensions already fail (in my opinion) is unfair: [expression for x in seq1 for y in seq2 if pred1 for z in seq3 if pred2 if pred3 if pred4 for w in seq4 while condition for v in seq5] I don't think it's the "while" that tips that over the edge, readability-wise :-) In any case, I think we're all guessing whether or not people will understand the "while condition" syntax. So I've done an informal survey on the Python-Ideas list, and once folks have had a day or so to answer I'll report what they say. It's not a truly scientific UI test, but it's the best I can do. -- Steve

Nick Coghlan writes:
My objection to this interpretation is different from Guido's (I think): if you're really thinking in terms of math, sets are *unordered*, and therefore "takewhile" doesn't guarantee exhaustion of the desired subset. Another way to put this is that in order to make it harder to get bit by subtle infloops, you're going to give more teeth to "Miller time came early"[1] bugs. This may be a bigger issue than some may think, because sets and dicts are iterable, and order of iteration is arbitrary (at best history- dependent). Footnotes: [1] American beer commercial claiming that real men go to drink beer after a full day's work.
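The ordering concern can be illustrated with any source whose iteration order is not sorted. In this sketch a list with a fixed, deliberately unsorted order stands in for a set or dict whose order is arbitrary (the values are invented for illustration):

```python
from itertools import takewhile

values = [3, 12, 5, 1, 7]  # fixed but unsorted, standing in for arbitrary order

# Terminating at the first failure stops early and misses later matches.
cut_short = list(takewhile(lambda x: x < 10, values))

# Filtering examines every element, so nothing that qualifies is lost.
filtered = [x for x in values if x < 10]
```

The termination form returns only the leading run of matching values, which is a different answer from "the subset below 10" unless the source happens to be ordered.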

On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree -- which is why I'm thinking of only adding a simple "iterable slice", rather than changing the overall behavior of the container. It would be quite clear what you are asking for.
Right now, getting the "terminate when false" behaviour requires the use of takewhile:
I can't recall the use case(s) at the moment, but I have definitely wanted a way to break out of a comprehension -- and not always with infinite iterators. After all, we have "break" in both for and while loops, so clearly there is the use case... If someone comes up with a clean and not confusing (and general purpose) syntax, I think it would be very useful. -CHB

On Wed, Aug 09, 2017 at 01:23:28PM -0700, Chris Barker wrote:
Indeed :-)
If someone comes up with a clean and not confusing (and general purpose) syntax, I think it would be very useful.
We used to be able to (ab)use StopIteration to do this: def Break(): raise StopIteration # generator expressions only, not list comprehensions result = (expression for x in sequence if condition or Break()) but I believe that loophole has been closed in 3.6. Comprehensions in Clojure have this feature: http://clojuredocs.org/clojure_core/clojure.core/for Clojure uses "when" where Python uses "if", giving: ;; :when continues through the collection even if some have the ;; condition evaluate to false, like filter user=> (for [x (range 3 33 2) :when (prime? x)] x) (3 5 7 11 13 17 19 23 29 31) ;; :while stops at the first collection element that evaluates to ;; false, like take-while user=> (for [x (range 3 33 2) :while (prime? x)] x) (3 5 7) Translating into Python: [x for x in range(3, 33, 2) if is_prime(x)] [x for x in range(3, 33, 2) while is_prime(x)] # hypothetical syntax I don't think it is confusing. Regardless of the implementation, the meaning of: [expression for x in sequence while condition] should (I believe) be obvious to anyone who already groks comprehension syntax. The mapping to a for-loop is admittedly a tad more complex: result = [] for x in sequence: if not condition: break result.append(expression) but I'm yet to meet anyone who routinely and regularly reads comprehensions by converting them to for loops like that. And if they did, all they need do is mentally map "while condition" to "if not condition: break" and it should all Just Work™. -- Steve
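On current interpreters the StopIteration trick fails loudly rather than ending the generator: under PEP 479, which became the default behaviour in Python 3.7, a StopIteration raised inside a generator body is converted into a RuntimeError. A small sketch, assuming Python 3.7 or later; `squares_until` is a name invented for this demonstration:

```python
def Break():
    raise StopIteration


def squares_until(limit, sequence):
    # The generator expression from the message above, with a concrete
    # expression and condition filled in for illustration.
    gen = (x * x for x in sequence if x < limit or Break())
    try:
        return list(gen)
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


outcome = squares_until(3, range(10))
```

So the old idiom no longer provides a way to stop early; itertools.takewhile or an explicit loop is needed instead.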

On 10 August 2017 at 14:42, Steven D'Aprano <steve@pearwood.info> wrote:
The hard part is the interaction between if and while. Consider (expr for var in seq if cond1 while cond2): This means: for var in seq: if cond1: if not cond2: break yield expr Note that unlike all other comprehension clauses (for and if) while doesn't introduce a new level of nesting. That's an inconsistency, and while it's minor, it would need clarifying (my original draft of this email was a mess, because I misinterpreted how if and while would interact, precisely over this point). Also, there's a potential issue here - consider [expr for var in even_numbers() if is_odd(var) while var < 100] This is an infinite loop, even though it has a finite termination condition (var < 100), because we only test the termination condition if var is odd, which it never will be. Obviously, this is a contrived example. And certainly "don't do that, then" is a valid response. But my instinct is that people are going to get this wrong - *especially* in a maintenance environment. That example could have started off being "for var in count(0)" and then someone realised they could "optimise" it by omitting odd numbers, introducing the bug in the process. (And I'm sure real life code could come up with much subtler examples ;-)) Overall, I agree with Steven's point. It seems pretty obvious what the intention is, and while it's probably possible to construct examples that are somewhat unclear, 1. The mechanical rule gives an explicit meaning 2. People shouldn't be writing such complex comprehensions, so if the rule doesn't give what they expect, they can always rewrite the code with an explicit (and clearer) loop. But while I think this says that the above interpretation of while is the only sensible one, and in general other approaches are unlikely to be as natural, I *don't* think that it unequivocally says that allowing while is a good thing. It may still be better to omit it, and force people to state their intent explicitly (albeit a bit more verbosely). Paul
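The even_numbers() trap can be observed safely by writing out the expanded loops and capping the first one with islice. This is a sketch: `even_numbers`, `is_odd` and the cap of 1000 items are assumptions made so the demonstration terminates.

```python
from itertools import count, islice


def is_odd(n):
    return n % 2 == 1


def even_numbers():
    return count(0, 2)


# Expansion of: [var for var in even_numbers() if is_odd(var) while var < 100]
consumed = 0
results = []
for var in islice(even_numbers(), 1000):  # cap added only for the demo
    consumed += 1
    if is_odd(var):
        if not var < 100:  # never reached: var is never odd
            break
        results.append(var)

# Expansion with the clauses swapped: while var < 100 if is_odd(var)
consumed_swapped = 0
results_swapped = []
for var in even_numbers():  # no cap needed, the break is always tested
    consumed_swapped += 1
    if not var < 100:
        break
    if is_odd(var):
        results_swapped.append(var)
```

The first loop runs until the artificial cap because the termination test sits behind the filter; the second stops as soon as var reaches 100.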

On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore <p.f.moore@gmail.com> wrote:
why is the termination only tested if the if clause is True? Could they not be processed in parallel? or the while first.... so maybe better to do: [expr for var in even_numbers() while var < 100 if is_odd(var)] Maybe it's just me, but I would certainly expect the while to have precedence. I guess I think of it like this: "if" is providing a filtering mechanism "while" is providing a termination mechanism -- is there a use case anyone can think of when they would want the while to be applied to the list AFTER filtering?
Obviously, this is a contrived example. And certainly "don't do that, then" is a valid response. But my instinct is that people are going to get this wrong - *especially* in a maintenance environment.
sure, but would there be an issue if the while were given precedence?
Overall, I agree with Steven's point. It seems pretty obvious what the
me too -- a direct translation to a for loop isn't necessary to understand how it works. -CHB

On 10 August 2017 at 21:25, Chris Barker <chris.barker@noaa.gov> wrote:
See? That's my point - the "obvious" interpretation stops being obvious pretty fast...
so maybe better to do:
[expr for var in even_numbers() while var < 100 if is_odd(var)]
That would work. But I bet people's intuition wouldn't immediately lead to that fix (or indeed, necessarily incline them to put the clauses in this order in the first place).
Probably not, but when you can have multiple FORs, WHILEs and IFs, in any order, explaining the behaviour precisely while still preserving some sense of "filtering comes after termination" is going to be pretty difficult. [expr for var1 in seq1 if cond1 for var2 in seq2 for var3 in seq3 if cond2 if cond3] is legal - stupid, but legal. Now add while clauses randomly in that, and define your expected semantics clearly so a user (and the compiler!) can determine what the resulting mess means. The main benefit of the current "works like a for loop" interpretation is that it's 100% explicit. Nothing will make a mess like the above good code, but at least it's well-defined. Paul
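To make the "works like a for loop" rule concrete, here is a small sketch comparing a multi-clause comprehension of the shape Paul quotes with its mechanical expansion. The sequences and conditions are placeholder values chosen for illustration, not anything from the thread.

```python
seq1, seq2, seq3 = range(4), "ab", (10, 20)

comprehension = [
    (v1, v2, v3)
    for v1 in seq1 if v1 % 2 == 0
    for v2 in seq2
    for v3 in seq3 if v3 > 10 if v2 != "b"
]

# Mechanical expansion: each clause nests inside the one to its left.
expanded = []
for v1 in seq1:
    if v1 % 2 == 0:
        for v2 in seq2:
            for v3 in seq3:
                if v3 > 10:
                    if v2 != "b":
                        expanded.append((v1, v2, v3))

assert comprehension == expanded
```

Any proposal that lets `while` jump ahead of an earlier `if` would have to give up this one-to-one reading.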

The logical solution to me is to allow any order of while and if, and follow the same 'rule' as multiple for loops - just nest/test those in that order. Then you can have whatever priority you need. One question though is how this should handle multiple loops - break all of them, or just the current one? - Spencer Brown

On Thu, Aug 10, 2017 at 1:53 PM, Spencer Brown <spencerb21@live.com> wrote:
Actually, I think it would be better to only allow one order, and have the "while" always tested first -- which may mean it should be placed first for clarity.
just the current one, just like a "break", or for that matter, a nested while... -CHB

On 11 August 2017 at 06:53, Spencer Brown <spencerb21@live.com> wrote:
This is why I think a for-while construct in comprehensions would really only make sense in combination with a *statement* level for-while construct, as the problem we have is: - a termination condition can't readily use "if" (even in combination with "break") because that's visually and syntactically ambiguous with a filter condition - a naive translation of a "while" based syntax makes it look like a nested *non-terminating* loop Both of those problems may be resolved if a "for-while" loop exists as a top level looping construct that can terminate based on *either* an iterator being exhausted *or* a condition becoming false. The question then becomes whether or not a "for-while" loop is actually useful enough to be added as a general construct, given that we already have "if not condition: break" as a way of modeling a loop ending early because a condition became false. One way to gather evidence on that front would be to survey the standard library for places where we use "break", and see if any of them would be more readable given a for-while construct, whether as a statement, or as part of the comprehension syntax. (Note: I'm not interested enough in the idea to do that evidence gathering myself, I'm just pointing it out in case anyone is curious enough to take the time to collect those details) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
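The survey Nick suggests could start from something like the sketch below, which uses the standard `ast` module to count `for` loops whose body directly contains an `if <cond>: break` statement. The function name `count_break_guarded_loops` and the sample source are hypothetical, and the check deliberately recognises only the simplest pattern; it ignores `break` in an `else` branch or nested deeper in the body.

```python
import ast

def count_break_guarded_loops(source):
    """Count for-loops whose body directly contains `if <cond>: break`."""
    tree = ast.parse(source)
    hits = 0
    for node in ast.walk(tree):
        if not isinstance(node, ast.For):
            continue
        for stmt in node.body:
            is_bare_break_guard = (
                isinstance(stmt, ast.If)
                and len(stmt.body) == 1
                and isinstance(stmt.body[0], ast.Break)
                and not stmt.orelse
            )
            if is_bare_break_guard:
                hits += 1
                break  # count each loop at most once
    return hits

# Hypothetical sample; the names in it are never executed, only parsed.
SAMPLE = '''
for x in data:
    if not ok(x):
        break
    use(x)

for y in data:
    use(y)

for z in data:
    if z:
        use(z)
    else:
        break
'''

print(count_break_guarded_loops(SAMPLE))
```

To survey a real code base, the same function could be applied to each `.py` file under a directory such as the one reported by `sysconfig.get_paths()["stdlib"]`; judging whether each hit would actually read better as a for-while loop still needs a human.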

On Thu, Aug 10, 2017 at 01:25:24PM -0700, Chris Barker wrote:
I'm not sure why Paul thinks this is an issue. There are plenty of ways to accidentally write an infinite loop in a comprehension, or a for loop, already:

    [expr for var in even_numbers()]

will do it, if even_numbers is unexpectedly an infinite iterator. Or you could write:

    for num in even_numbers():
        if is_odd(num) and num > 100:
            break

No loop syntax, whether it is functional style (takewhile, map, etc.), comprehension, or traditional style for loops, enables the programmer to avoid thinking about what they write.
why is the termination only tested if the if clause is True? Could they not be processed in parallel? or the while first....
Because we're following the standard Python rule of left-to-right execution. The while clause is tested only if the if clause is true because it follows the if clause.

I think that there's an argument to be made for the rule: We can have `if` in a comprehension, or `while`, but not both in order to limit complexity.

Analogy:

(1) we intentionally limit the decorator @ syntax to a subset of expressions;
(2) likewise we intentionally allow (but don't encourage) monkey-patching of Python classes only, not built-ins.

Just because we *can* allow arbitrary code combinations doesn't mean we *must*. We have a choice to say: "No, you cannot mix `if` and `while` in the same comprehension. Why? Because we say so. Because it is confusing if you do." I'd be okay with that rule.

But if we decide to allow arbitrary combinations of for/if/while in comprehensions, then I think we must keep the same left-to-right rule we have now. Currently we process multiple for/if clauses left-to-right:

    [expr for x in a if cond for y in b]

is equivalent to:

    for x in a:
        if cond:
            for y in b:
                expr

rather than moving the `if` to the end. If you want it at the end, put it there yourself. Adding `while` shouldn't change that. It would be crazy-complicated to have a rule: "the presence of a while means the comprehension is processed in parallel" or "all the while clauses are processed before (after?) the if clauses, regardless of their order of appearance."
so maybe better to do:
[expr for var in even_numbers() while var < 100 if is_odd(var)]
Well sure, that's the *correct* way to write the code:

    for var in even_numbers():
        if not (var < 100):
            break
        if is_odd(var):
            results.append(expr)

(for some definition of "correct" -- this is clearly an expensive way to generate an empty list.) But in general one might wish to test the if or the while in either order.
Maybe it's just me, but I would certainly expect the while to have precedence.
Does that apply to these idioms as well?

    while cond:
        if flag:
            ...

versus:

    if flag:
        while cond:
            ...

I would not expect them to be the same, and nor would I expect these to be the same:

    [expr for x in seq if flag while cond]
    [expr for x in seq while cond if flag]
[process(n) for n in numbers while n > 0 if is_odd(n)] Halt on the first zero or negative number, regardless of whether it is even or odd, but process only odd numbers. Paul:
That's the argument for limiting comprehensions to either `if` or `while` but not both. And I actually would be okay with that -- especially if we leave open the possibility of relaxing the prohibition in the future. But personally, I think that's under-estimating the ability of programmers to reason about loops. Of course a comprehension with multiple for/if/while clauses is hard to reason about, and we shouldn't *encourage* them, but we don't prohibit multiple for/if clauses. Why should `while` be held to a higher standard? If we allow people to shoot themselves in the foot by writing complex list comprehensions with ten `for` loops and seven `if` clauses, why should we baulk at allowing them a `while` clause as well? -- Steve
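The difference between the two clause orders Steven contrasts can be demonstrated in current Python with `itertools.takewhile`, which already provides the "halt when the condition fails" behaviour. This is a sketch under stated assumptions: the sample list and the predicate names are illustrative, and the comments show the proposed (not valid) syntax each line is meant to mirror.

```python
from itertools import takewhile

def is_odd(n):
    return n % 2 == 1

def positive(n):
    return n > 0

numbers = [3, 4, -2, 5, 7]

# Proposed: [n for n in numbers while n > 0 if is_odd(n)]
# Terminate first, then filter: iteration stops at -2.
while_then_if = [n for n in takewhile(positive, numbers) if is_odd(n)]

# Proposed: [n for n in numbers if is_odd(n) while n > 0]
# Filter first: -2 is even, so it is dropped before the termination
# test ever sees it, and iteration runs to the end of the list.
if_then_while = list(takewhile(positive, (n for n in numbers if is_odd(n))))

print(while_then_if, if_then_while)
```

The two orderings give different results on the same input, which is why a rule that silently reorders the clauses would change what the programmer wrote.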

On Fri, Aug 11, 2017 at 02:49:10PM +1000, Steven D'Aprano wrote:
On Thu, Aug 10, 2017 at 01:25:24PM -0700, Chris Barker wrote:
Oops, sorry I had a thinko and read your question in the opposite sense than it actually is. See my response to Nick for an example: I have an iterable of arbitrary objects. I want to ignore anything that isn't a string, and halt if the string doesn't start with "A". [expr for s in objects if isinstance(s, str) while s.startswith("A")] -- Steve

On 11 August 2017 at 05:49, Steven D'Aprano <steve@pearwood.info> wrote:
Mostly because I work in a support and maintenance environment, where we routinely see code that *originally* made sense, but which was over time modified in ways that break things - usually precisely because coders who in theory understand how to write such things correctly, end up not taking the time to fully understand the constructs they are modifying. Of course that's wrong, but it's sadly all too common, and for that reason I'm always wary of constructs that need thinking through carefully to understand the implications.

Nick's original

    {x for x in itertools.count(0) if 1000 <= x while x < 1000000}

was like that. It was *sort of* obvious that it meant "numbers between 1_000 and 1_000_000", but the interaction between "if" and "while" wasn't clear to me. If I were asked to rush in a change to only pick odd numbers,

    {x for x in itertools.count(0) if 1000 <= x and is_odd(x) while x < 1000000}

seems right to me, but quick - what about edge cases? It's not that I can't get it right, nor is it that I can't test that I *did* get it right, just that this sort of "quick fix" is very common in the sort of real-world coding I see regularly, and a huge advantage of Python is that it's hard to get in a situation where the obvious guess is wrong.

Don't get me wrong - I'm not arguing that the sky is falling. Just that this construct isn't as easy to understand as it seems at first (and that hard-to-understand cases appear *before* you hit the point where it's obvious that the statement is too complex and should be refactored).

Paul

On 11 August 2017 at 01:39, Paul Moore <p.f.moore@gmail.com> wrote:
This is actually how I came to the conclusion that if we were ever to do something like this, the termination condition would need to go *before* the filter condition:

    (expr for var in seq while loop_cond if filter_cond)

    <=>

    for var in seq:
        if loop_cond:
            if filter_cond:
                yield expr
        else:
            break

With the clauses in that order, the "while" keyword effectively operates as "if-else-break" the same way it does in a regular while loop, and could potentially be introduced as a modifying clause on regular for loops at the same time.

One of the neat things the latter would allow is to make it even easier to introduce a diagnostic loop counter into while loops:

    while condition:
        ...

could become:

    for iteration in itertools.count(1) while condition:
        ...

rather than having to implement a manually incremented loop counter the way you do today.
This is another good reason why a termination condition would need to be checked before the filter condition rather than either after it, or only when the filter condition was true. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
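For comparison, here is a sketch of the diagnostic loop counter Nick mentions, written both ways in today's Python. The function names and the "drain a list" task are assumptions chosen for illustration; the second function spells the proposed `for ... while condition` as `for` plus `if not condition: break`.

```python
import itertools

def drain_with_manual_counter(pending):
    """Today's spelling: a while loop plus a hand-maintained counter."""
    log = []
    iteration = 0
    while pending:
        iteration += 1
        log.append((iteration, pending.pop(0)))
    return log

def drain_with_count(pending):
    """Emulates: for iteration in itertools.count(1) while pending: ..."""
    log = []
    for iteration in itertools.count(1):
        if not pending:  # stands in for the proposed "while pending" clause
            break
        log.append((iteration, pending.pop(0)))
    return log

manual = drain_with_manual_counter(["a", "b", "c"])
counted = drain_with_count(["a", "b", "c"])
print(manual == counted, counted)
```

Both versions record the same numbered steps; the proposal would only remove the explicit `if not ...: break` line from the second.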

On Fri, Aug 11, 2017 at 02:34:53PM +1000, Nick Coghlan wrote:
What if you want to check the filter condition before the termination condition? I have an iterable of arbitrary objects. I want to ignore anything that isn't a string, and halt if the string doesn't start with "A". This is easy:

    [expr for s in objects if isinstance(s, str) while s.startswith("A")]

Why should we prohibit expressing this, and instead write it as this?

    [expr for s in objects while (s.startswith("A") if isinstance(s, str) else True) if isinstance(s, str)]

Or split into multiple comprehensions?

    [expr for s in [obj for obj in objects if isinstance(obj, str)] while s.startswith("A")]
We can still expand the clauses if they are presented in the opposite order:

    (expr for var in seq if filter_cond while loop_cond)

    <=>

    for var in seq:
        if filter_cond:
            if loop_cond:
                yield expr
            else:
                break

There's no need to prohibit that. It is meaningful and useful and just because somebody might accidentally fail to exit an infinite loop is no reason to ban this.
Why is this a problem that needs solving? Who is to say that an infinite generator expression isn't exactly what the programmer wants? If the halting condition is not true, the generator expression will either keep going until the iterator is exhausted, or it will be an infinite generator just like the unprocessed, unfiltered source iterator. This is not necessarily a problem. -- Steve
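Steven's string example can be checked in current Python by emulating both spellings with `itertools.takewhile`. This is a sketch only: the sample `objects` list and the helper names `starts_with_a` and `keep_going` are assumptions for illustration, and the expression part is simply the string itself.

```python
from itertools import takewhile

objects = ["Apple", 3, "Avocado", None, "Banana", "Apricot"]

def starts_with_a(s):
    return s.startswith("A")

# Filter-first, mirroring the proposed
# [s for s in objects if isinstance(s, str) while s.startswith("A")]
strings = (s for s in objects if isinstance(s, str))
if_first = list(takewhile(starts_with_a, strings))

# While-first workaround: the termination test itself has to tolerate
# non-strings, and the filter must then be repeated afterwards.
def keep_going(s):
    return s.startswith("A") if isinstance(s, str) else True

while_first = [s for s in takewhile(keep_going, objects) if isinstance(s, str)]

print(if_first, while_first)
```

Both forms stop at "Banana" and give the same list, but the while-first version has to state the `isinstance` check twice, which is the awkwardness the email points out.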

On 11 August 2017 at 15:13, Steven D'Aprano <steve@pearwood.info> wrote:
Because the most obvious interpretation of a completely independent "while" clause in comprehensions would be as a nested loop inside the outer for loop, not as a nested if-else-break statement. As a result of that, I'm only personally prepared to support for-while comprehensions if they're syntactic sugar for a combined statement level for-while loop that makes it clear why only the "for" clauses in a comprehension create new loops. I *wouldn't* be prepared to support them if they could only be explained in terms of a direct mapping to an if statement and had no statement level counterpart that actually used the "while" keyword. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8/10/2017 9:42 AM, Steven D'Aprano wrote:
In both cases, we use 'break' to mean break. If we want to break comprehensions, I think we should continue to use 'break' to mean break instead of twisting 'while' to mean 'break'.
This is the same as

    result = []
    for x in sequence:
        if condition:
            result.append(expression)
        else:
            break

which could be written

    [expression for x in sequence if condition break]

-- Terry Jan Reedy

On Thu, Aug 10, 2017 at 1:03 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I was thinking that too.
[expression for x in sequence if condition break]
hmm, but if you want to filter, also?

    [expression for x in sequence if condition if condition break]

or

    [expression for x in sequence if condition break if condition]

both of those seem more confusing to me than while. -CHB

(re-posting here as I first mistakenly answered directly to Terry. Sorry about that!) Le 10/08/17 à 22:03, Terry Reedy a écrit :
It's what I thought too. Adding a `while` clause here just overly complicates the understanding of the comprehension. The `break` keyword is already easily understandable and helps to map the comprehension with the plain for-loop (I like this mapping for its reverse counterpart, as I often start with plain for-loops to rewrite them later to comprehensions when it makes sense). I would probably suggest this instead of Terry's proposal, though: [expression for x in sequence if condition *or* break] (maybe it's what you meant?). I suggest this because it doesn't imply the execution of a statement inside the comprehension, but just to continue the evaluation as it's always done. I admit it feels a bit hacky, but maybe just until we get used to it? -Brice

[...]
having the "if condition" there seems confusing to me, particularly if you want an if condition as a filter as well:

    [expression for x in sequence if condition1 if condition2 break]

which makes me want:

    [expression for x in sequence if condition1 breakif condition2]

adding another keyword is a pretty big deal though! would it be possible to add a keyword ("breakif" in this case) that was ONLY legal in comprehensions?

Though I still don't think using "while" would really introduce that much confusion -- sure, it doesn't introduce a new loop, but, as someone pointed out earlier in this thread it really is only changing from a: "while do" to a "do while" construct -- so means pretty much the same thing.

I agree that scanning a code base to see if there really are many loops in practice that could use this construct would be a good way to see if there is any point. And it would also be interesting to do a survey of "random" folks as to how they would interpret such a construct -- it's pretty hard for a small group to know what is and isn't "confusing" -CHB

Though I still don't think using "while" would really introduce that much confusion -- sure, it doesn't introduce a new loop, but, as someone pointed out earlier in this thread it really is only changing from a: "while do" to a "do while" construct -- so means pretty much the same thing.
+1 for the "while" from me too, I don't think most people would find it confusing (supposing they don't find current [x for x in foo if ...] confusing either), and introducing a break there is just more of a mess. To those who say that it might get ugly if you do something like: [x for y in foo for x in y while x != y if x + y < 100] This still isn't even unbearable, and once it gets that hard, maybe you should consider something else anyways.

Hi Soni, and welcome! On Mon, Aug 07, 2017 at 04:30:05PM -0300, Soni L. wrote:
What if, (x for x in integers if 1000 <= x < 1000000), was syntax sugar for (x for x in range(1000, 1000000))?
If you want the integers from 1000 to 1000000, use: range(1000, 1000000) Don't waste your time slowing down the code with an unnecessary and pointless wrapper that does nothing but pass every value on unchanged: (x for x in range(1000, 1000000)) # waste of time and effort -- Steve

Soni L. writes: Steven D'Aprano writes:
range(1000, 1000000) (x for x in range(1000, 1000000)) # waste of time and effort
Actually, those have different semantics!
That's not real important. As Stefan Behnel points out, it's simple (and efficient) to get iterator semantics by using iter(). The big issue here is that Python is not the kind of declarative language where (x for x in int if 1_000 ≤ x ≤ 1_000_000)[1] is natural to write, let alone easy to implement efficiently. Aside from the problem of (x for x in float if 1_000 ≤ x ≤ 1_000_000) (where the answer is "just don't do that"), I can't think of any unbounded collections in Python that aren't iterables, except some types. That makes Steven's criticism pretty compelling. If you need to design a collection's __iter__ specially to allow it to decide whether the subset that satisfies some condition is exhausted, why not just subclass some appropriate existing collection with a more appropriate __iter__? Footnotes: [1] See what I did there? ;-)
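The "different semantics" being discussed are those of a `range` object versus a generator expression wrapped around it. A brief sketch using only built-ins (the variable names are illustrative):

```python
r = range(1000, 1000000)

# A range is a sequence: its length and membership tests are computed
# arithmetically, without iterating over the values.
length = len(r)
has_5000 = 5000 in r

# Each call to iter() starts a fresh pass over the range.
first_twice = (next(iter(r)), next(iter(r)))

# A generator expression is a one-shot iterator: it remembers its
# position and has no len().
g = (x for x in r)
g_first = next(g)
g_second = next(g)
generator_has_len = hasattr(g, "__len__")

print(length, has_5000, first_twice, g_first, g_second, generator_has_len)
```

When iterator behaviour is what is wanted, `iter(r)` supplies it directly, so the wrapping generator expression adds nothing but overhead.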
participants (13)
- Brice Parent
- Chris Angelico
- Chris Barker
- Guido van Rossum
- Markus Meskanen
- Nick Coghlan
- Paul Moore
- Soni L.
- Spencer Brown
- Stefan Behnel
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy