while conditional in list comprehension ??

Dear all,
I guess this is so obvious that someone must have suggested it before: in list comprehensions you can currently exclude items based on the if conditional, e.g.:
    [n for n in range(1,1000) if n % 4 == 0]
Why not extend this filtering by allowing a while statement in addition to if, as in:
    [n for n in range(1,1000) while n < 400]
Trivial effect, I agree, in this example since you could achieve the same by using range(1,400), but I hope you get the point. This intuitively understandable extension would provide a big speed-up for sorted lists where processing all the input is unnecessary. Consider this:
    some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"] # a sorted list of names
    [n for n in some_names if n.startswith("A")] # certainly gives a list of all names starting with A, but ...
    [n for n in some_names while n.startswith("A")] # would have saved two comparisons
Best, Wolfgang

On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
The time machine strikes again! Check out itertools.takewhile - it can do pretty much that:
    import itertools
    [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))]
It's not quite list comp notation, but it works.
ChrisA

Isn't "while" kind of just the "if" of a looping construct? Would [n for n in range(1,1000) while n < 400] == [n for n in range(1,1000) if n < 400]? I guess you're kind of looking for an "else break" feature to exit the list comprehension before evaluating all the input values. Wouldn't that complete the "while()" functionality? Shane Green www.umbrellacode.com 408-692-4666 | shane@umbrellacode.com On Jan 28, 2013, at 5:59 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:

On Tue, Jan 29, 2013 at 1:32 AM, Shane Green <shane@umbrellacode.com> wrote:
In the specific case given, they'll produce the same result, but there are two key differences: 1) If the condition becomes true again later in the original iterable, the 'if' will pick up those entries, but the 'while' won't; and 2) The 'while' version will not consume more than the one result that failed to pass the condition. I daresay it would be faster and maybe cleaner to implement this with a language feature rather than itertools.takewhile, but list comprehensions can get unwieldy too; is there sufficient call for this to justify the syntax? ChrisA
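The two differences described above can be seen with a small experiment. This is only a sketch: the list `data` and the names `filtered`, `truncated`, `head` and `rest` are made up for illustration and do not come from the thread.

```python
from itertools import takewhile

# Hypothetical input where the condition fails once (500) and then holds again.
data = [1, 2, 3, 500, 4, 5]

# 'if' filters: later items that satisfy the condition are still picked up.
filtered = [n for n in data if n < 400]

# takewhile truncates: everything after the first failure is ignored.
truncated = list(takewhile(lambda n: n < 400, data))

# takewhile consumes exactly one item that fails the condition (500 here),
# so an iterator shared with other code resumes right after it.
it = iter(data)
head = list(takewhile(lambda n: n < 400, it))
rest = list(it)

print(filtered)    # [1, 2, 3, 4, 5]
print(truncated)   # [1, 2, 3]
print(head, rest)  # [1, 2, 3] [4, 5]
```

Note that the failing item itself is lost: it appears in neither `head` nor `rest`.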

Yeah, I realized (1) after a minute and came up with "else break": if n < 400 else break. Could that be functionally equivalent, not based on a loop construct within an iterator? Shane Green www.umbrellacode.com 408-692-4666 | shane@umbrellacode.com On Jan 28, 2013, at 6:43 AM, Chris Angelico <rosuav@gmail.com> wrote:

On Mon, Jan 28, 2013 at 9:51 AM, Shane Green <shane@umbrellacode.com> wrote:
You mean: `[n for n in range(0, 400) if n < 100 else break]`? That is definitely more obvious (in my opinion) than using the while syntax, but what does `break` mean in the context of a list comprehension? I understand the point, but I dislike the execution. I guess coming from a background in pure mathematics, this just seems wrong for a list (or set) comprehension.

On Tue, Jan 29, 2013 at 2:17 AM, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
It's easy enough in the simple case. What would happen if you added an "else break" to this:
    [(x,y) for x in range(10) for y in range(2) if x<3]
Of course, this would be better written with the if between the two fors, but the clarity isn't that big a problem when it's not going to change the result. Would it be obvious that the "else break" would only halt the "for y" loop?
ChrisA
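To make the question concrete, here are the two possible readings written out as explicit loops. This is a sketch only; the function names `break_inner_only` and `stop_everything` and the `checks` counter are hypothetical and exist purely to count how often the condition is evaluated.

```python
def break_inner_only():
    """'else break' read literally: it only leaves the 'for y' loop."""
    res, checks = [], 0
    for x in range(10):
        for y in range(2):
            checks += 1
            if x < 3:
                res.append((x, y))
            else:
                break  # the outer loop keeps running for x = 4, 5, ...
    return res, checks


def stop_everything():
    """The reading most people probably want: stop the whole comprehension."""
    res, checks = [], 0
    for x in range(10):
        for y in range(2):
            checks += 1
            if x < 3:
                res.append((x, y))
            else:
                return res, checks
    return res, checks


inner_res, inner_checks = break_inner_only()
full_res, full_checks = stop_everything()
print(inner_res == full_res)      # True: same result in this example
print(inner_checks, full_checks)  # 13 7
```

Both readings give the same list here, but the literal "break" still visits every remaining value of x, which is the ambiguity being pointed out.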

On Mon, Jan 28, 2013 at 8:59 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
The while syntax definitely reads better, and I would guess that dis could clarify how much more efficient using `if n < 400` would be compared to the lambda. Then again this is a rather uncommon situation and it could be handled with the if syntax. Also, recall the Zen of Python: "There should be one-- and preferably only one --obvious way to do it." That is argument enough against the `while` syntax.

On Mon, Jan 28, 2013 at 5:33 AM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
-1 This isn't adding a feature that the language can't currently perform. It can, with itertools, with an explicit 'for' loop and probably other methods. List comprehensions are a useful shortcut that should be kept as simple as possible. The semantics of the proposed 'while' aren't immediately obvious, which makes it out of place in list comprehensions, IMO. Eli

I thought everything that can be done with a list comprehension can also be done with an explicit 'for' loop! So following your logic, one would have to remove comprehensions from the language altogether. In terms of semantics I do not really see what isn't immediately obvious about my proposal.
Since the question of use cases was brought up: I am working as a scientist, and one of the uses I thought of when proposing this was that it could be used in combination with any kind of iterator that can yield an infinite number of elements, but you only want the first few elements up to a certain value (note: this is related to, but not the same as saying I want a certain number of elements from the iterator). Let's take the often used example of the Fibonacci iterator and assume you have an instance 'fibo' of its iterable class implementation, then:
    [n for n in fibo while n <10000]
would return a list with all Fibonacci numbers that are smaller than 10000 (without having to know in advance how many such numbers there are). Likewise, with prime numbers and a 'prime' iterator:
    [n for n in prime while n<10000]
and many other scientifically useful numeric sequences. I would appreciate such a feature, and, even though everything can be solved with itertools, I think it's too much typing and thinking for generating a list quickly.
Best, Wolfgang
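For comparison, the Fibonacci use case can already be written with itertools.takewhile today. The sketch below assumes a simple generator function `fib()` standing in for the 'fibo' instance mentioned above; the name `below_10000` is likewise just for illustration.

```python
from itertools import takewhile


def fib():
    """Endless Fibonacci generator: 1, 1, 2, 3, 5, ..."""
    a = b = 1
    while True:
        yield a
        a, b = b, a + b


# All Fibonacci numbers smaller than 10000, without knowing how many there are.
below_10000 = list(takewhile(lambda n: n < 10000, fib()))

print(len(below_10000))  # 20
print(below_10000[-1])   # 6765
```

The cost compared with the proposed syntax is the import and the lambda, which is the "too much typing" being discussed.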

On Mon, Jan 28, 2013 at 11:19 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Sarcasm will not help your argument. The difference (as I would expect you to know) between the performance of a list comprehension and an explicit `for` loop is significant and the comprehension is already a feature of the language. Removing it would be nonsensical.
This is definitely a problematic use case for a simple list comprehension, but the takewhile solution works exactly as expected and even resembles your solution. It is in the standard library and its performance seems to be fast enough (to me at least, on a 10-year-old laptop). And the key phrase here is "simple list comprehension". Yours is in theory a simple list comprehension but is rather a slightly more complex case that can be handled in a barely more complex way. itertools is a part of the standard library that needs more affection, in my opinion, and really does its best to accommodate these more complex cases in sensible ways. I am still -1 on this. Cheers, Ian

Ok, I am sorry for the sarcasm. Essentially this is exactly what I wanted to say with it. Because comprehensions are faster than for loops, I am using them, and this is why I'd like the while feature in them. I fully agree with everybody here that itertools provides a solution for it, but imagine for a moment the if clause wouldn't exist and people would point you to a similar itertools solution for it, e.g.: [n for n in itertools.takeif(lambda n: n % 4 == 0, range(1,1000))] What would you prefer? I think it is true that this is mostly about how often people would make use of the feature. And, yes, it was a mistake to disturb the ongoing voting with sarcasm. Best, Wolfgang

On 01/28/2013 05:33 AM, Wolfgang Maier wrote:
What happens when you want the names that start with 'B'? The advantage of 'if' is it processes the entire list so grabs all items that match, and the list does not have to be ordered. The disadvantage (can be) that it processes the entire list. Given that 'while' would only work on sorted lists, and could only start from the beginning, I think it may be too specialized. But I wouldn't groan if someone wanted to code it up. :) +0 ~Ethan~

Ethan Furman <ethan@...> writes:
I thought about this question, and I agree this is not what the while clause would be best for. However, currently you could solve tasks like this with itertools.takewhile in the following (almost perl-like) way (I illustrate things with numbers to keep it simpler):
    l=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
    # now retrieve all numbers from 10 to 19 (combining takewhile and slicing)
    [n for n in itertools.takewhile(lambda n:n<20,l[len([x for x in itertools.takewhile(lambda x:x<10,l)]):])]
Nice, isn't it? If I am not mistaken, then with my suggestion this would at least simplify to:
    [n for n in l[len([x for x in l while x<10]):] while n<20]
Not great either, I admit, but at least it's fun to play this mindgame.
Best, Wolfgang
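As a side note, the "skip the front, then stop early" task above can also be expressed by pairing takewhile with itertools.dropwhile. dropwhile is not mentioned in the message above, so treat this as one possible alternative rather than what the author had in mind; the name `middle` is made up.

```python
from itertools import dropwhile, takewhile

l = list(range(1, 24))  # the same sorted numbers 1..23

# Drop the leading items below 10, then keep items while they stay below 20.
middle = list(takewhile(lambda n: n < 20, dropwhile(lambda n: n < 10, l)))

print(middle)  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

Like the slicing version, this relies on the input being sorted.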

On 1/28/2013 8:33 AM, Wolfgang Maier wrote:
Dear all, I guess this is so obvious that someone must have suggested it before:
No one who understands comprehensions would suggest this.
Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. You want to break, not filter; and you are depending on the order of the items from the iterator. Comprehensions are a math-logic idea invented for (unordered) sets and borrowed by computer science and extended to sequences. However, sequences do not replace sets. https://en.wikipedia.org/wiki/Set-builder_notation https://en.wikipedia.org/wiki/List_comprehension Python has also extended the idea to dicts and iterators and uses almost exactly the same syntax for all 4 variations.
[n for n in range(1,1000) while n < 400]
This would translate as
    def _temp():
        res = []
        for n in range(1, 1000):
            while n < 400:
                res.append(n)
        return res
    _temp()
which makes an infinite loop, not a truncated loop. What you actually want is
    res = []
    for n in range(1, 1000):
        if n >= 400:
            break
        res.append(n)
which is not the form of a comprehension.
-- Terry Jan Reedy

On 28 January 2013 23:27, Terry Reedy <tjreedy@udel.edu> wrote:
That's a little strong.
Python's comprehensions are based on iterators that are inherently ordered (although in some cases the order is arbitrary). In the most common cases the comprehensions produce lists or generators that preserve the order of the underlying iterable. I find that the cases where the order of an iterable is relevant are very common in my own usage of iterables and of comprehensions.
Although dicts and sets should be considered unordered they may still be constructed from a naturally ordered iterable. There are still cases where it makes sense to define the construction of such an object in terms of an order-dependent rule on the underlying iterator.
I guess this is what you mean by "No one who understands comprehensions would suggest this." Of course those are not the suggested semantics but I guess from this that you would object to a while clause that had a different meaning.
The form of a comprehension is not unchangeable. Oscar

On Mon, Jan 28, 2013 at 8:02 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Technically they are not inherently ordered. You give the perfect example below.
They may be, but they may also be constructed from an unordered iterable. How so? Let `d` be a non-empty dictionary, and `f` a function that defines some mutation of its input such that there doesn't exist x such that x = f(x).
    e = {k: f(v) for k, v in d.items()}
You're taking an unordered object (a dictionary) and making a new one from it. An order-dependent rule here would not make sense. Likewise, if we were to do:
    e = [(k, f(v)) for k, v in d.items()]
We're creating order from an object in which there is none. How could the while statement be useful there? An if statement works fine. A `while` statement as suggested wouldn't.
They are not the suggested semantics. You are correct. But based upon how list comprehensions are currently explained, one would be reasonable to expect a list comprehension with `while` to operate like this.
Agreed it is definitely mutable. I am just of the opinion that this is one of those instances where it shouldn't be changed.

On 29 January 2013 01:12, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
I was referring to the case of constructing an object that does not preserve order by iterating over an object that does. Clearly a while clause would be a lot less useful if you were iterating over an object whose order was arbitrary: so don't use it in that case. A (contrived) example - caching Fibonacci numbers:
    from itertools import takewhile
    # Fibonacci number generator
    def fib():
        a = b = 1
        while True:
            yield a
            a, b = b, a+b
    # Cache the first N fibonacci numbers
    fib_cache = {n: x for n, x in zip(range(N), fib())}
    # Alternative (proposed syntax)
    fib_cache = {n: x for n, x in enumerate(fib()) while n < N}
    # Cache the Fibonacci numbers less than X
    fib_cache = {}
    for n, x in enumerate(fib()):
        if x >= X:
            break
        fib_cache[n] = x
    # Alternative 1
    fib_cache = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))}
    # Alternative 2 (proposed syntax)
    fib_cache = {n: x for n, x in enumerate(fib()) while x < X}
Oscar
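The variants of the caching example that are valid Python today can be checked with concrete numbers. In this sketch N = 5 and X = 20 are arbitrary values picked for illustration, and `first_n` and `below_x` are made-up names.

```python
from itertools import takewhile


def fib():
    """Endless Fibonacci generator: 1, 1, 2, 3, 5, ..."""
    a = b = 1
    while True:
        yield a
        a, b = b, a + b


N = 5   # arbitrary: how many numbers to cache
X = 20  # arbitrary: upper bound on the cached values

# Cache the first N Fibonacci numbers; zip stops when range(N) is exhausted.
first_n = {n: x for n, x in zip(range(N), fib())}

# Cache the Fibonacci numbers less than X using takewhile.
below_x = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))}

print(first_n)  # {0: 1, 1: 1, 2: 2, 3: 3, 4: 5}
print(below_x)  # {0: 1, 1: 1, 2: 2, 3: 3, 4: 5, 5: 8, 6: 13}
```

The order of arguments to zip matters: with range(N) first, zip notices the exhausted range before asking fib() for another value.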

On Mon, Jan 28, 2013 at 8:34 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Yeah, I'm not sure how well telling someone to use a construct of the language will go over.
As contrived as it may be, it is a good example. Still, I dislike the use of `while` and would rather Steven's suggestion of `until` were this to be included. This would make `until` a super special case, but then again, this construct seems special enough that only a few examples of its usefulness can be constructed. I guess I'm more -0 with `until` than -1. Thanks for the extra example Oscar. It was helpful. Cheers, Ian

On 29/01/13 10:27, Terry Reedy wrote:
Why would it translate that way? That would be a silly decision to make. Python can decide on the semantics of a while clause in a comprehension in whatever way makes the most sense, not necessarily according to some mechanical, nonsensical translation. We could easily decide that although
    [n for n in range(1,1000) if n < 400]
has the semantics of:
    res = []
    for n in range(1, 1000):
        if n < 400:
            res.append(n)
the comprehension
    [n for n in range(1,1000) while n < 400]
could instead have the semantics of:
    res = []
    for n in range(1, 1000):
        if not (n < 400):
            break
        res.append(n)
If it were decided that reusing the while keyword in this way was too confusing (which doesn't seem likely, since it is a request that keeps coming up over and over again), we could use a different keyword:
    [n for n in range(1,1000) until n >= 400]
Why not? Was the idea of a comprehension handed down from above by a deity, never to be changed? Or is it a tool to be changed if the change makes it more useful? Mathematical set builder notation has no notion of "break" because it is an abstraction. It takes exactly as much effort (time, memory, resources, whatever) to generate these two mathematical sets: {1} {x for all x in Reals if x == 1} (using a hybrid maths/python notation which I hope is clear enough). To put it another way, mathematically the list comp [p+1 for p in primes()] is expected to run infinitely fast. But clearly Python code is not a mathematical abstraction. So the fact that mathematical set builder notation does not include any way to break out of the loop is neither here nor there. Comprehensions are code, and need to be judged as code, not abstract mathematical identities. -- Steven

On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Terry is correct: comprehensions are deliberately designed to have the exact same looping semantics as the equivalent statements flattened out into a single line, with the innermost expression lifted out of the loop body and placed in front. This then works to arbitrarily deep nesting levels. The surrounding syntax (parentheses, brackets, braces, and whether or not there is a colon present in the main expression) then governs what kind of result you get (generator-iterator, list, set, dict). For example in:
    (x, y, z for x in a if x for y in b if y for z in c if z)
    [x, y, z for x in a if x for y in b if y for z in c if z]
    {x, y, z for x in a if x for y in b if y for z in c if z}
    {x: y, z for x in a if x for y in b if y for z in c if z}
The looping semantics of these expressions are all completely defined by the equivalent statements:
    for x in a:
        if x:
            for y in b:
                if y:
                    for z in c:
                        if z:
(modulo a few name lookup quirks if you're playing with class scopes)
Any attempt to change that fundamental equivalence between comprehensions and the corresponding statements has basically zero chance of getting accepted through the PEP process. The only remotely plausible proposal I've seen in this thread is the "else break" on the filter conditions, because that *can* be mapped directly to the statement form in order to accurately describe the intended semantics. However, it would fail the "just use itertools.takewhile or a custom iterator, that use case isn't common enough to justify dedicated syntax" test. The conceptual basis of Python's comprehensions in mathematical set notation would likely also play a part in rejecting an addition that requires an inherently procedural interpretation.
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
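The equivalence described above can be checked directly for the list case. In this sketch the inputs `a`, `b` and `c` are arbitrary example values, and the element expression is written as an explicit tuple `(x, y, z)` because a bare `x, y, z` is not accepted there by the real grammar.

```python
# Arbitrary inputs; the zeros are falsy, so the 'if' clauses filter them out.
a = [0, 1, 2]
b = [0, 3]
c = [0, 4, 5]

# Comprehension form.
comp = [(x, y, z) for x in a if x for y in b if y for z in c if z]

# The same clauses, in the same order, written as nested statements.
loop = []
for x in a:
    if x:
        for y in b:
            if y:
                for z in c:
                    if z:
                        loop.append((x, y, z))

print(comp == loop)  # True
print(comp)          # [(1, 3, 4), (1, 3, 5), (2, 3, 4), (2, 3, 5)]
```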

On 29 January 2013 09:51, yoav glazner <yoavglazner@gmail.com> wrote:
Great. I think this nails it. It is exactly the intended behavior, and very readable under current language capabilities. One does not have to stop and go read what "itertools.takewhile" does, and mentally unfold the lambda guard expression - that is what makes this (and the O.P. request) more readable than using takewhile. Note: stop can also just explicitly raise StopIteration - or your next(iter([])) expression can be inlined within the generator. It works in Python 3 as well - though for those who did not test: it won't work for list, dict or set comprehensions - just for generator expressions. js -><-

On Tue, Jan 29, 2013 at 10:35 PM, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
This actually prompted an interesting thought for me. The statement-as-expression syntactic equivalent of the "else stop()" construct would actually be "else return", rather than "else break", since the goal is to say "we're done", regardless of the level of loop nesting. It just so happens that, inside a generator (or generator expression) raising StopIteration and returning from the generator are very close to being equivalent operations, which is why the "else stop()" trick works. In a 3.x container comprehension, the inner scope is an ordinary function, so the equivalence between returning from the function and raising StopIteration is lost. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 31 January 2013 08:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm not sure if it is the goal to be able to break out of any level of nesting or at least that's not how I interpreted the original proposal. It is what happens for this stop() function but only because there's no other way. Personally I don't mind as I generally avoid multiple-for comprehensions; by the time I've written one out I usually decide that it would be more readable as ordinary for loops or with a separate function.
I don't really understand what you mean here. What is the difference between comprehensions in 2.x and 3.x? Oscar

On 1/31/2013 6:08 AM, Oscar Benjamin wrote:
In 2.x, (list) comprehensions are translated to the equivalent nested for and if statements and compiled and executed in place. In 3.x, the translation is wrapped in a temporary function that is called and then discarded. The main effect is to localize the loop names, the 'i' in '[i*2 for i in iterable]', for instance. -- Terry Jan Reedy
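The localization of loop names in 3.x is easy to observe. A minimal sketch, assuming Python 3; the variable names are arbitrary.

```python
i = "outer"

# In Python 3 the loop variable of a comprehension does not leak into
# (or overwrite names in) the enclosing scope.
doubled = [i * 2 for i in range(3)]

print(doubled)  # [0, 2, 4]
print(i)        # outer
```

Under Python 2 the same list comprehension would leave `i` bound to 2 afterwards.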

As I'm reading it, the while clause would be evaluated once for each iteration of the innermost loop, and would be able to access the current values of every loop.
    ls = [ line for file in files for line in file while file or line ]
...would become...
    ls = []
    for file in files:
        for line in file:
            if not (file or line):
                break # break if not while_expression
            ls.append(line)


On Jan 29, 2013 9:26 AM, "Oscar Benjamin" <oscar.j.benjamin@gmail.com> wrote:
I know I'm showing my ignorance here, but how are list/dict/set comprehensions and generator expressions implemented differently that one's for loop will catch a StopIteration and the others won't? Would it make sense to reimplement list/dict/set comprehensions as an equivalent generator expression passed to the appropriate constructor, and thereby allow the StopIteration trick to work for each of them as well? Regards, Zach Ware

On 29 January 2013 13:34, Zachary Ware <zachary.ware+pyideas@gmail.com> wrote:
That is because list/set/dict comprehensions are sort of "self-contained": they expect the StopIteration to be raised only by the iterator in the "for" part of the expression. The generator expression, on the other hand, is an iterator in itself, and it is expected to raise a StopIteration sometime. The code put around it to actually execute it will catch the StopIteration - and it won't care whether it was raised by the for iterator or by any other expression in the generator. I mean - when you do list(bla for bla in blargh) the generator is exhausted inside the "list" call - and this generator exhaustion is signaled by the StopIteration exception in both cases.

On 29 January 2013 15:34, Zachary Ware <zachary.ware+pyideas@gmail.com> wrote:
A for loop is like a while loop with a try/except handler for StopIteration. So the following are roughly equivalent:
    # For loop
    for x in iterable:
        func1(x)
    else:
        func2()
    # Equivalent loop
    it = iter(iterable)
    while True:
        try:
            x = next(it)
        except StopIteration:
            func2()
            break
        func1(x)
A list comprehension is just like an implicit for loop with limited functionality so it looks like:
    # List comp
    results = [func1(x) for x in iterable if func2(x)]
    # Equivalent loop
    results = []
    it = iter(iterable)
    while True:
        try:
            x = next(it)
        except StopIteration:
            break
        # This part is outside the try/except
        if func2(x):
            results.append(func1(x))
The problem in the above is that we only catch StopIteration around the call to next(). So if either of func1 or func2 raises StopIteration the exception will propagate rather than terminate the loop. (This may mean that it terminates a for loop higher in the call stack - which can lead to confusing bugs - so it's important to always catch StopIteration anywhere it might get raised.)
The difference with the list(generator) version is that func1() and func2() are both called inside the call to next() from the perspective of the list() function. This means that if they raise StopIteration then the try/except handler in the enclosing list function will catch it and terminate its loop.
    # list(generator)
    results = list(func1(x) for x in iterable if func2(x))
    # Equivalent loop:
    def list(iterable):
        it = iter(iterable)
        results = []
        while True:
            try:
                # Now func1 and func2 are both called in next() here
                x = next(it)
            except StopIteration:
                break
            results.append(x)
        return results
    results_gen = (func1(x) for x in iterable if func2(x))
    results = list(results_gen)
Oscar

On Jan 29, 2013 10:02 AM, "Oscar Benjamin" <oscar.j.benjamin@gmail.com> wrote:
That makes a lot of sense. Thank you, Oscar and Joao, for the explanations. I wasn't thinking in enough scopes :) Regards, Zach Ware

Oscar Benjamin <oscar.j.benjamin@...> writes:
list(i for i in range(100) if i<50 or stop()) Really (!) nice (and 2x as fast as using itertools.takewhile())! With the somewhat simpler (suggested earlier by Shane) def stop(): raise StopIteration this should become part of the python cookbook!! Thanks a real lot for working this out, Wolfgang

On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
list(i for i in range(100) if i<50 or stop()) Really (!) nice (and 2x as fast as using itertools.takewhile())!
I couldn't believe it so I had to check it:
    from __future__ import print_function
    import functools, itertools, operator, timeit
    def var1():
        def _gen():
            for i in range(100):
                if i > 50:
                    break
                yield i
        return list(_gen())
    def var2():
        def stop():
            raise StopIteration
        return list(i for i in range(100) if i <= 50 or stop())
    def var3():
        return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]
    def var4():
        return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))]
    if __name__ == '__main__':
        for f in (var1, var2, var3, var4):
            print(f.__name__, end=' ')
            print(timeit.timeit(f))
Results on my machine:
    var1 20.4974410534
    var2 23.6218020916
    var3 32.1543409824
    var4 4.90913701057
var1 might have become the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower than generator expressions? var3 is the slowest. I guess, because it has lambda in it. But switching to Python and back can not be faster than the last option - sitting in the C code as much as we can.
-- Kind regards, Yuriy.

Although it's not always viable, given how easy it is to wrap an iterator, it seems like this might come in handy for comprehensions. [x for x in items if x < 50 or items.close()] Shane Green www.umbrellacode.com 408-692-4666 | shane@umbrellacode.com
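A sketch of how the close() idea could be tried out. It assumes the iterable is wrapped in a generator, since only generators have a close() method; the wrapper name `closable` is hypothetical and not from the message above.

```python
def closable(iterable):
    """Hypothetical wrapper: turn any iterable into a generator with close()."""
    for item in iterable:
        yield item


items = closable(range(100))

# close() returns None (falsy), so the failing item is skipped, and the
# closed generator then ends the comprehension's loop on the next step.
result = [x for x in items if x < 50 or items.close()]

print(result == list(range(50)))  # True
```

Unlike the stop() trick, this does not raise an exception inside the comprehension, so it works for list comprehensions too, at the price of wrapping the input.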

On 30/01/13 02:44, Wolfgang Maier wrote:
list(i for i in range(100) if i<50 or stop()) Really (!) nice (and 2x as fast as using itertools.takewhile())!
I think you are mistaken about the speed. The itertools iterators are highly optimized and do all their work in fast C code. If you are seeing takewhile as slow, you are probably doing something wrong: untrustworthy timing code, misinterpreting what you are seeing, or some other error. Here's a comparison done the naive or obvious way. Copy and paste it into an interactive Python session:
    from itertools import takewhile
    from timeit import Timer
    def stop(): raise StopIteration
    setup = 'from __main__ import stop, takewhile'
    t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup)
    t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup)
    min(t1.repeat(number=100000, repeat=5))
    min(t2.repeat(number=100000, repeat=5))
On my computer, t1 is about 1.5 times faster than t2. But this is misleading, because it's not takewhile that is slow. I am feeding something slow into takewhile. If I really need to run as fast as possible, I can optimize the function call inside takewhile:
    from operator import lt
    from functools import partial
    small_enough = partial(lt, 50)
    setup2 = 'from __main__ import takewhile, small_enough'
    t3 = Timer('[i for i in takewhile(small_enough, xrange(1000))]', setup2)
    min(t3.repeat(number=100000, repeat=5))
On my computer, t3 is nearly 13 times faster than t1, and 19 times faster than t2. Here are the actual times I get, using Python 2.7:
    py> min(t1.repeat(number=100000, repeat=5)) # using the StopIteration hack
    1.2609241008758545
    py> min(t2.repeat(number=100000, repeat=5)) # takewhile and lambda
    1.85182785987854
    py> min(t3.repeat(number=100000, repeat=5)) # optimized version
    0.09847092628479004
-- Steven

Yuriy Taraday <yorik.sar@...> writes:
Steven D'Aprano <steve@...> writes:
Hi Yuriy and Steven,
a) I had compared the originally proposed 'takewhile with lambda' version to the 'if cond or stop()' solution using 'timeit' just like you did. In principle, you find the same as I did, although I am a bit surprised that our differences are different. To be exact 'if cond or stop()' was 1.84 x faster in my hands than 'takewhile with lambda'.
b) I have to say I was very impressed by the speed gains you report through the use of 'partial', which I had not thought of at all, I have to admit. However, I tested your suggestions and I think they both suffer from the same mistake: your condition is 'partial(lt,50)', but this is not met to begin with and results in an empty list at least for me. Have you two actually checked the output of the code or have you just timed it? I found that in order to make it work the comparison has to be made via 'partial(gt,50)'. With this modification the resulting list in your example would be [0,..,49] as it should be. And now the big surprise in terms of runtimes:
    partial(lt,50) variant: 1.17 (but incorrect results)
    partial(gt,50) variant: 13.95
    if cond or stop() variant: 9.86
I guess python is just smart enough to recognize that it compares against a constant value all the time, and optimizes the code accordingly (after all the if clause is a pretty standard thing to use in a comprehension). So the reason for your reported speed-gain is that you actually broke out of the comprehension at the very first element instead of going through the first 50! Please comment, if you get different results.
Best, Wolfgang
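The argument-order problem described above is easy to reproduce. partial(lt, 50) fixes 50 as the first argument, so it tests 50 < x rather than x < 50. The sketch below uses Python 3's range instead of xrange; the names `wrong` and `right` are made up.

```python
from functools import partial
from itertools import takewhile
from operator import gt, lt

# partial(lt, 50)(x) is lt(50, x), i.e. 50 < x: false for x = 0,
# so takewhile stops immediately and yields nothing.
wrong = list(takewhile(partial(lt, 50), range(100)))

# partial(gt, 50)(x) is gt(50, x), i.e. 50 > x: true for 0..49.
right = list(takewhile(partial(gt, 50), range(100)))

print(wrong)                     # []
print(right == list(range(50)))  # True
```

This also explains the very fast timing of the lt variant: it stops at the first element.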

Although it's a bit of a cheat, if you create a wrapper of the thing you're iterating, or don't mind closing it (it's probably best to wrap it unless you know what it is), both generators and list comprehensions can be "while iterated" using this approach: [item for item in items if condition or items.close()] When I tested it earlier with 1000 entries 5 times and had forgotten the parens on close(), it made it really obvious there would be times when the wrapping overhead wasn't a problem: On Jan 30, 2013, at 9:02 AM, Shane Green <shane.green@me.com> wrote:
Shane Green www.umbrellacode.com 408-692-4666 | shane@umbrellacode.com On Jan 30, 2013, at 10:05 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:

On 31/01/13 05:05, Oscar Benjamin wrote:
Funny you say that, clarity of code and ease of understanding is exactly why I dislike this stop() idea.
1) It does not work with list, dict or set comprehensions, only with generator expressions. So if you need a list, dict or set, you have to avoid the obvious list/dict/set comprehension.
2) It is fragile: it is easy enough to come up with examples of the above that *appear* to work:
    [i for i in range(20) if i < 50 or stop()] # appears to work fine
    [i for i in range(20) if i < 10 or stop()] # breaks
3) It reads wrong for a Python boolean expression. Given an if clause:
    if cond1() or cond2()
you should expect that an element is generated if either cond1 or cond2 are true. When I see "if cond1() or stop()" I don't read it as "stop if not cond1()" but as a Python bool expression, "generate an element if cond1() gives a truthy value or if stop() gives a truthy value".
This "if cond or stop()" is a neat hack, but it's still a hack, and less readable and understandable than I expect from Python code.
-- Steven
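The fragility can be tried out as follows. One thing not covered in the thread, which dates from 2013: since Python 3.7 (PEP 479) a StopIteration raised inside a generator is turned into a RuntimeError, so on current interpreters the stop() trick no longer truncates a generator expression either. The sketch records what happens instead of assuming one outcome; `listcomp_outcome` and `genexp_outcome` are made-up names.

```python
def stop():
    raise StopIteration


# List comprehension: the StopIteration escapes to the caller.
try:
    [i for i in range(20) if i < 10 or stop()]
    listcomp_outcome = "no exception"
except StopIteration:
    listcomp_outcome = "StopIteration"

# Generator expression inside list(): truncated on old interpreters,
# RuntimeError on Python 3.7 and later.
try:
    list(i for i in range(20) if i < 10 or stop())
    genexp_outcome = "truncated"
except RuntimeError:
    genexp_outcome = "RuntimeError"

print(listcomp_outcome)
print(genexp_outcome)
```

On a modern Python, itertools.takewhile or an explicit loop with break remain the options that behave the same everywhere.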

On 30 January 2013 22:56, Steven D'Aprano <steve@pearwood.info> wrote:
That's true. I would prefer it if a similar effect were achievable in these cases.
As I said I would prefer a solution that would work for list comprehensions but there isn't one so the stop() method has to come with the caveat that it can only be used in that way. That said, I have become used to using a generator inside a call to dict() or set() (since the comprehensions for those cases were only recently added) so it doesn't seem a big problem to rewrite the above with calls to list(). You are right, though, that a bug like this would be problematic. If the StopIteration leaks up the call stack into a generator that is being for-looped then it creates a confusing debug problem (at least it did the first time I encountered it).
Again I would have preferred 'else break' or something clearer but this seems the best available (I'm open to suggestions).
This "if cond or stop()" is a neat hack, but it's still a hack, and less readable and understandable than I expect from Python code.
It is a hack (and I would prefer a supported method) but my point was that both you and Yuriy wrote the wrong code without noticing it. You both posted it to a mailing list where no one else noticed until someone actually tried running the code. In other words it wasn't obvious that the code was incorrect just from looking at it. This one looks strange but if you knew what stop() was then you would understand it: list(x for x in range(100) if x < 50 or stop()) This one is difficult to mentally parse even if you understand all of the constituent parts: [x for x in takewhile(partial(lt, 50), range(100))] Oscar

On 30/01/13 20:46, Wolfgang Maier wrote:
Yes, you are absolutely correct. I screwed that up badly. I can only take comfort that apparently so did Yuriy. I don't often paste code in public without testing it, but when I do, it invariably turns out to be wrong.
I do not get such large differences. I get these:

    py> min(t1.repeat(number=100000, repeat=5))  # cond or stop()
    1.2582030296325684
    py> min(t2.repeat(number=100000, repeat=5))  # takewhile and lambda
    1.9907748699188232
    py> min(t3.repeat(number=100000, repeat=5))  # takewhile and partial
    1.8741891384124756

with the timers t1, t2, t3 as per my previous email.
No, it is much simpler than that. partial(lt, 50) is equivalent to: lambda x: lt(50, x) which is equivalent to 50 < x, *not* x < 50 like I expected. So the function tests 50 < 0 on the first iteration, which is False, and takewhile immediately returns, giving you an empty list. I was surprised that partial was *so much faster* than a regular function. But it showed me what I expected/wanted to see, and so I didn't question it. A lesson for us all.
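The argument-order pitfall described above is easy to check directly. A minimal sketch using only the standard library (the names `wrong` and `right` are illustrative, not from the thread):

```python
from functools import partial
from itertools import takewhile
from operator import gt, lt

# partial(lt, 50) fixes the FIRST argument: it computes lt(50, x), i.e. 50 < x.
# That is False for x == 0, so takewhile stops before yielding anything.
wrong = list(takewhile(partial(lt, 50), range(100)))

# partial(gt, 50) computes gt(50, x), i.e. 50 > x, which is the intended x < 50.
right = list(takewhile(partial(gt, 50), range(100)))

print(wrong)      # []
print(right[-1])  # 49
```

This is also why the 'partial(lt, 50)' timing looked so fast: it did almost no work.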
So the reason for your reported speed-gain is that you actually broke out of the comprehension at the very first element instead of going through the first 50!
Correct. -- Steven

On Tue, Jan 29, 2013 at 8:59 PM, Shane Green <shane@umbrellacode.com> wrote:
Unfortunately "else break" also kind of falls flat on its face when you consider it's being used in context of an expression.
Not really, since comprehensions are all about providing expression forms of the equivalent statements. I'm not saying "else break" would get approved (I actually don't think that's likely for other reasons), just that it isn't clearly dead in the water due to the inconsistency with the statement semantics (which is the core problem with the "while" suggestion). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thanks Nick, that is really helpful, as I can now see where the problem really lies for the developer team. I agree that under these circumstances my suggestion is inacceptable. You know, I am just a python user, and I don't know about your development paradigms. Knowing about them, let me make a wild suggestion (and I am sure it has no chance of getting accepted either, it's more of a test to see if I understood the problem): You could introduce a new 'breakif <condition>' statement, which would be equivalent to 'if <condition>: break'. Its use as a standalone statement could be allowed (but since its equivalent is already very simple it would be a very minor change). In addition, however the 'breakif' could be integrated into comprehensions just like 'if', and could be translated directly into loops of any nesting level without ambiguities. another note: in light of your explanation, it looks like the earlier suggestion of 'else break' would also work without ambiguities since with the rigid logic applied, there would be no doubt which of several 'for' loops gets broken by the 'break'. Thanks for any comments on this (and please :), don't yell at me for asking for a new keyword to achieve something minor, I already understood that part). Best, Wolfgang

On Tue, Jan 29, 2013 at 10:03 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Thanks for any comments on this (and please :), don't yell at me for asking for a new keyword to achieve something minor, I already understood that part).
I try not to do that - the judgement calls we have to make in designing the language don't always have obvious solutions, and part of the reason python-ideas exists is as a place for people to share ideas that turn out to be questionable, for the sake of uncovering those ideas that turn out to be worthwhile. I've had several proposals make their way into Python over the years, but they're still outnumbered by the ones which didn't make it (many because I decided not to propose them in the first place, but quite a few others because people on python-ideas and python-dev pointed out flaws, drawbacks and inconsistencies that I had missed). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 29/01/13 21:44, Nick Coghlan wrote:
You have inadvertently supported the point I am trying to make: what is *deliberately designed* by people one way can be deliberately designed another way instead. List comps have the form, and limitations, they have because of people's decisions. People could decide differently. A while clause in a comprehension can map to the same statement form as currently used. Just because the parser sees "while" inside a comprehension doesn't mean that the underlying implementation has to literally insert a while loop inside a for-loop. Terry is right about one thing: that would lead to an entirely pointless infinite loop. Where Terry gets it wrong is to suppose that the only *conceivable* way to handle syntax that looks like [x for x in seq while condition] is to insert a while loop inside a for loop. But "while" is just a convenient keyword that looks good, is readable, and has a natural interpretation as executable pseudo-code. We could invent a new keyword if we wished, say "jabberwock", and treat "jabberwock cond" inside a comprehension as equivalent to "if cond else break":

    (x, y for x in a jabberwock x for y in b jabberwock y)

    for x in a:
        if x:
            for y in b:
                if y:
                    yield (x, y)
                else:
                    break
        else:
            break

If you, as a core developer, tell me that in practice this would be exceedingly hard for the CPython implementation to do, I can only trust your opinion since I am not qualified to argue. But since you've already allowed that permitting "if cond else break" in comprehensions would be possible, I find it rather difficult to believe that spelling it "jabberwock cond" is not.
The only remotely plausible proposal I've seen in this thread is the "else break" on the filter conditions,
Which just begs for confusion and misunderstanding. Just wait until people start asking why they can't write "else some_expression", and we have to explain that inside a comprehension, the only thing allowed to follow "else" is "break". -- Steven

On 1/29/2013 10:02 AM, Rob Cliffe wrote:
The reference manual does spell it out: "In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and evaluating the expression to produce an element each time the innermost block is reached." Perhaps a non-trivial concrete example (say 4 levels deep) would help people understand that better. -- Terry Jan Reedy

On 29/01/13 00:33, Wolfgang Maier wrote:
Comprehensions in Clojure have this feature. http://clojuredocs.org/clojure_core/clojure.core/for ;; :when continues through the collection even if some have the ;; condition evaluate to false, like filter user=> (for [x (range 3 33 2) :when (prime? x)] x) (3 5 7 11 13 17 19 23 29 31) ;; :while stops at the first collection element that evaluates to ;; false, like take-while user=> (for [x (range 3 33 2) :while (prime? x)] x) (3 5 7) So there is precedent in at least one other language for this obvious and useful feature. -- Steven
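For comparison, the same two behaviours can be spelled in today's Python. This is a rough sketch; `is_prime` is a hypothetical helper standing in for Clojure's `prime?`:

```python
from itertools import takewhile

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

odds = range(3, 33, 2)

# Like :when, this filters but keeps scanning the whole input.
when_result = [x for x in odds if is_prime(x)]

# Like :while, this stops at the first element that fails the test (9).
while_result = list(takewhile(is_prime, odds))

print(when_result)   # [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print(while_result)  # [3, 5, 7]
```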

|>>> def notyet(cond) :
         if cond :
             raise StopIteration
         return True

|>>> list(x for x in range(100) if notyet(x>10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

How funny… I tried a variation of that because one of my original thoughts had been "[… if x else raise StopIteration()]" may have also made some sense. But I tried it based on the example from earlier, and hadn't even considered it was even closer…. Shane Green www.umbrellacode.com 408-692-4666 | shane@umbrellacode.com On Jan 29, 2013, at 5:53 AM, Boris Borcic <bborcic@gmail.com> wrote:

Are you trying to say you entered that code and it ran? I would be very surprised: if you could simply 'raise StopIteration' within the 'if' clause then there would be no point to the discussion. But as it is, your StopIteration should not be caught by the 'for', but will be raised directly. Did you try running it?

Here's what I was doing, and worked when i switched to the generator:
def stop(): … raise StopIteration()
list(((x if x < 5 else stop()) for x in range(10))) [0, 1, 2, 3, 4]
Shane Green www.umbrellacode.com 408-692-4666 | shane@umbrellacode.com On Jan 29, 2013, at 6:36 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

yoav glazner <yoavglazner@...> writes:
Joao S. O. Bueno <jsbueno@...> writes:
Shane Green <shane@...> writes:
Wow, thanks to the three of you! I think it's still not as clear what the code does as it would be with my 'while' suggestion. Particularly, the fact that this is not a simple 'if'-or-not decision for individual elements of the list, but in fact terminates the list with the first non-matching element (the while-like property) can easily be overlooked. However, I find it much more appealing to use built-in python semantics than to resort to the also hard to read itertools.takewhile(). In addition, this is also the fastest solution that was brought up so far. In my hands, it runs about 2x as fast as the equivalent takewhile construct, which in turn is just marginally faster than Boris Borcic's suggestion: |>>> def notyet(cond) : if cond : raise StopIteration return True |>>> list(x for x in range(100) if notyet(x>10)) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] I guess, I'll use your solution in my code from now on. Best, Wolfgang


Yeah, I realized (1) after a minute and came up with "else break": if n < 400 else break. Could that be functionally equivalent, not based on a loop construct within an iterator? Shane Green www.umbrellacode.com 408-692-4666 | shane@umbrellacode.com On Jan 28, 2013, at 6:43 AM, Chris Angelico <rosuav@gmail.com> wrote:

On Mon, Jan 28, 2013 at 9:51 AM, Shane Green <shane@umbrellacode.com> wrote:
You mean: `[n for n in range(0, 400) if n < 100 else break]`? That is definitely more obvious (in my opinion) than using the while syntax, but what does `break` mean in the context of a list comprehension? I understand the point, but I dislike the execution. I guess coming from a background in pure mathematics, this just seems wrong for a list (or set) comprehension.

On Tue, Jan 29, 2013 at 2:17 AM, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
It's easy enough in the simple case. What would happen if you added an "else break" to this: [(x,y) for x in range(10) for y in range(2) if x<3] Of course, this would be better written with the if between the two fors, but the clarity isn't that big a problem when it's not going to change the result. Would it be obvious that the "else break" would only halt the "for y" loop? ChrisA
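To make the question concrete, here is a hand-expanded sketch of what the hypothetical "else break" form might mean. It assumes that "break" binds to the innermost "for", as it would in ordinary statements; nothing in the language defines this today.

```python
# Hand expansion of the hypothetical
#   [(x, y) for x in range(10) for y in range(2) if x < 3 else break]
# ASSUMPTION: "break" leaves only the innermost loop.
result = []
outer_iterations = 0
for x in range(10):
    outer_iterations += 1
    for y in range(2):
        if x < 3:
            result.append((x, y))
        else:
            break  # leaves only the "for y" loop

print(result)            # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(outer_iterations)  # 10: the outer loop still visits every x
```

Under that reading the result is unchanged and the outer loop is not cut short, which is probably not what most readers would expect.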

On Mon, Jan 28, 2013 at 8:59 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
The while syntax definitely reads better, and I would guess that dis could clarify how much more efficient using `if n < 400` would be compared to the lambda. Then again this is a rather uncommon situation and it could be handled with the if syntax. Also, if we recall the zen of python "There should be one-- and preferably only one --obvious way to do it." which is argument enough against the `while` syntax.

On Mon, Jan 28, 2013 at 5:33 AM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
-1 This isn't adding a feature that the language can't currently perform. It can, with itertools, with an explicit 'for' loop and probably other methods. List comprehensions are a useful shortcut that should be kept as simple as possible. The semantics of the proposed 'while' aren't immediately obvious, which makes it out of place in list comprehensions, IMO. Eli

I thought everything that can be done with a list comprehension can also be done with an explicit 'for' loop! So following your logic, one would have to remove comprehensions from the language altogether. In terms of semantics I do not really see what isn't immediately obvious about my proposal. Since the question of use cases was brought up: I am working as a scientist, and one of the uses I thought of when proposing this was that it could be used in combination with any kind of iterator that can yield an infinite number of elements, but you only want the first few elements up to a certain value (note: this is related to, but not the same as saying I want a certain number of elements from the iterator). Let's take the often used example of the Fibonacci iterator and assume you have an instance 'fibo' of its iterable class implementation, then: [n for n in fibo while n <10000] would return a list with all Fibonacci numbers that are smaller than 10000 (without having to know in advance how many such numbers there are). Likewise, with prime numbers and a 'prime' iterator: [n for n in prime while n<10000] and many other scientifically useful numeric sequences. I would appreciate such a feature, and, even though everything can be solved with itertools, I think it's too much typing and thinking for generating a list quickly. Best, Wolfgang
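For reference, the Fibonacci use case can be written with itertools today. A minimal sketch, where the generator function `fibo` is a hypothetical stand-in for the 'fibo' instance mentioned above:

```python
from itertools import takewhile

def fibo():
    """Infinite Fibonacci generator (hypothetical stand-in for 'fibo')."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

# All Fibonacci numbers below 10000, without knowing in advance how many exist.
small = list(takewhile(lambda n: n < 10000, fibo()))
print(small[-1], len(small))  # 6765 20
```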

On Mon, Jan 28, 2013 at 11:19 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Sarcasm will not help your argument. The difference (as I would expect you to know) between the performance of a list comprehension and an explicit `for` loop is significant and the comprehension is already a feature of the language. Removing it would be nonsensical.
This is definitely a problematic use case for a simple list comprehension, but the takewhile solution works exactly as expected and even resembles your solution. It is in the standard library and its performance seems to be fast enough (to me at least, on a 10 year old laptop). And the key phrase here is "simple list comprehension". Yours is in theory a simple list comprehension but is rather a slightly more complex case that can be handled in a barely more complex way. itertools is a part of the standard library that needs more affection, in my opinion, and really does its best to accommodate these more complex cases in sensible ways. I am still -1 on this. Cheers, Ian

Ok, I am sorry for the sarcasm. Essentially this is exactly what I wanted to say with it. Because comprehensions are faster than for loops, I am using them, and this is why I'd like the while feature in them. I fully agree with everybody here that itertools provides a solution for it, but imagine for a moment the if clause wouldn't exist and people would point you to a similar itertools solution for it, e.g.: [n for n in itertools.takeif(lambda n: n % 4 == 0, range(1,1000))] What would you prefer? I think it is true that this is mostly about how often people would make use of the feature. And, yes, it was a mistake to disturb the ongoing voting with sarcasm. Best, Wolfgang

On 01/28/2013 05:33 AM, Wolfgang Maier wrote:
What happens when you want the names that start with 'B'? The advantage of 'if' is it processes the entire list so grabs all items that match, and the list does not have to be ordered. The disadvantage (can be) that it processes the entire list. Given that 'while' would only work on sorted lists, and could only start from the beginning, I think it may be too specialized. But I wouldn't groan if someone wanted to code it up. :) +0 ~Ethan~

Ethan Furman <ethan@...> writes:
I thought about this question, and I agree this is not what the while clause would be best for. However, currently you could solve tasks like this with itertools.takewhile in the following (almost perl-like) way (I illustrate things with numbers to keep it simpler): l=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] # now retrieve all numbers from 10 to 19 (combining takewhile and slicing) [n for n in itertools.takewhile(lambda n:n<20,l[len([x for x in itertools.takewhile(lambda x:x<10,l)]):])] Nice, isn't it? If I am not mistaken, then with my suggestion this would at least simplify to: [n for n in l[len([x for x in l while x<10]):] while n<20] Not great either, I admit, but at least it's fun to play this mindgame. Best, Wolfgang
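The "numbers from 10 to 19" task above can also be written without slicing by pairing takewhile with itertools.dropwhile. A small sketch, assuming the input is sorted as in the example:

```python
from itertools import dropwhile, takewhile

l = list(range(1, 24))  # 1..23, sorted, as in the example above

# Skip the leading run below 10, then take the following run below 20.
middle = list(takewhile(lambda n: n < 20, dropwhile(lambda n: n < 10, l)))
print(middle)  # the numbers 10 through 19
```

Unlike the slicing version, this walks the input only once and also works on iterators that cannot be indexed.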

On 1/28/2013 8:33 AM, Wolfgang Maier wrote:
Dear all, I guess this is so obvious that someone must have suggested it before:
No one who understands comprehensions would suggest this.
Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. You want to break, not filter; and you are depending on the order of the items from the iterator. Comprehensions are a math-logic idea invented for (unordered) sets and borrowed by computer science and extended to sequences. However, sequences do not replace sets. https://en.wikipedia.org/wiki/Set-builder_notation https://en.wikipedia.org/wiki/List_comprehension Python has also extended the idea to dicts and iterators and uses almost exactly the same syntax for all 4 variations.
[n for n in range(1,1000) while n < 400]
This would translate as

    def _temp():
        res = []
        for n in range(1, 1000):
            while n < 400:
                res.append(n)
        return res
    _temp()

which makes an infinite loop, not a truncated loop. What you actually want is

    res = []
    for n in range(1, 1000):
        if n >= 400:
            break
        res.append(n)

which is not the form of a comprehension. -- Terry Jan Reedy

On 28 January 2013 23:27, Terry Reedy <tjreedy@udel.edu> wrote:
That's a little strong.
Python's comprehensions are based on iterators that are inherently ordered (although in some cases the order is arbitrary). In the most common cases the comprehensions produce lists or generators that preserve the order of the underlying iterable. I find that the cases where the order of an iterable is relevant are very common in my own usage of iterables and of comprehensions.
Although dicts and sets should be considered unordered they may still be constructed from a naturally ordered iterable. There are still cases where it makes sense to define the construction of such an object in terms of an order-dependent rule on the underlying iterator.
I guess this is what you mean by "No one who understands comprehensions would suggest this." Of course those are not the suggested semantics but I guess from this that you would object to a while clause that had a different meaning.
The form of a comprehension is not unchangeable. Oscar

On Mon, Jan 28, 2013 at 8:02 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Technically they are not inherently ordered. You give the perfect example below.
They may be, but they may also be constructed from an unordered iterable. How so? Let `d` be a non-empty dictionary, and `f` a function that defines some mutation of its input such that there doesn't exist x such that x = f(x).

    e = {k: f(v) for k, v in d.items()}

You're taking an unordered object (a dictionary) and making a new one from it. An order dependent rule here would not make sense. Likewise, if we were to do:

    e = [(k, f(v)) for k, v in d.items()]

We're creating order from an object in which there is none. How could the while statement be useful there? An if statement works fine. A `while` statement as suggested wouldn't.
They are not the suggested semantics. You are correct. But based upon how list comprehensions are currently explained, one would be reasonable to expect a list comprehension with `while` to operate like this.
Agreed it is definitely mutable. I am just of the opinion that this is one of those instances where it shouldn't be changed.

On 29 January 2013 01:12, Ian Cordasco <graffatcolmingov@gmail.com> wrote:
I was referring to the case of constructing an object that does not preserve order by iterating over an object that does. Clearly a while clause would be a lot less useful if you were iterating over an object whose order was arbitrary: so don't use it in that case. A (contrived) example - caching Fibonacci numbers:

    # Fibonacci number generator
    def fib():
        a = b = 1
        while True:
            yield a
            a, b = b, a+b

    # Cache the first N fibonacci numbers
    fib_cache = {n: x for n, x in zip(range(N), fib())}
    # Alternative
    fib_cache = {n: x for n, x in enumerate(fib()) while n < N}

    # Cache the Fibonacci numbers less than X
    fib_cache = {}
    for n, x in enumerate(fib()):
        if x >= X:
            break
        fib_cache[n] = x
    # Alternative 1
    fib_cache = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))}
    # Alternative 2
    fib_cache = {n: x for n, x in enumerate(fib()) while x < X}

Oscar

On Mon, Jan 28, 2013 at 8:34 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Yeah, I'm not sure how well telling someone to use a construct of the language will go over.
As contrived as it may be, it is a good example. Still, I dislike the use of `while` and would rather Steven's suggestion of `until` were this to be included. This would make `until` a super special case, but then again, this construct seems special enough that only a few examples of its usefulness can be constructed. I guess I'm more -0 with `until` than -1. Thanks for the extra example Oscar. It was helpful. Cheers, Ian

On 29/01/13 10:27, Terry Reedy wrote:
Why would it translate that way? That would be a silly decision to make. Python can decide on the semantics of a while clause in a comprehension in whatever way makes the most sense, not necessarily according to some mechanical, nonsensical translation. We could easily decide that although

    [n for n in range(1,1000) if n < 400]

has the semantics of:

    res = []
    for n in range(1, 1000):
        if n < 400:
            res.append(n)

the form

    [n for n in range(1,1000) while n < 400]

could instead have the semantics of:

    res = []
    for n in range(1, 1000):
        if not (n < 400):
            break
        res.append(n)

If it were decided that reusing the while keyword in this way was too confusing (which doesn't seem likely, since it is a request that keeps coming up over and over again), we could use a different keyword:

    [n for n in range(1,1000) until n >= 400]
Why not? Was the idea of a comprehension handed down from above by a deity, never to be changed? Or is it a tool to be changed if the change makes it more useful? Mathematical set builder notation has no notion of "break" because it is an abstraction. It takes exactly as much effort (time, memory, resources, whatever) to generate these two mathematical sets: {1} {x for all x in Reals if x == 1} (using a hybrid maths/python notation which I hope is clear enough). To put it another way, mathematically the list comp [p+1 for p in primes()] is expected to run infinitely fast. But clearly Python code is not a mathematical abstraction. So the fact that mathematical set builder notation does not include any way to break out of the loop is neither here nor there. Comprehensions are code, and need to be judged as code, not abstract mathematical identities. -- Steven

On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Terry is correct: comprehensions are deliberately designed to have the exact same looping semantics as the equivalent statements flattened out into a single line, with the innermost expression lifted out of the loop body and placed in front. This then works to arbitrarily deep nesting levels. The surrounding syntax (parentheses, brackets, braces, and whether or not there is a colon present in the main expression) then governs what kind of result you get (generator-iterator, list, set, dict). For example in:

    ((x, y, z) for x in a if x for y in b if y for z in c if z)
    [(x, y, z) for x in a if x for y in b if y for z in c if z]
    {(x, y, z) for x in a if x for y in b if y for z in c if z}
    {x: (y, z) for x in a if x for y in b if y for z in c if z}

The looping semantics of these expressions are all completely defined by the equivalent statements:

    for x in a:
        if x:
            for y in b:
                if y:
                    for z in c:
                        if z:

(modulo a few name lookup quirks if you're playing with class scopes)

Any attempt to change that fundamental equivalence between comprehensions and the corresponding statements has basically zero chance of getting accepted through the PEP process. The only remotely plausible proposal I've seen in this thread is the "else break" on the filter conditions, because that *can* be mapped directly to the statement form in order to accurately describe the intended semantics. However, it would fail the "just use itertools.takewhile or a custom iterator, that use case isn't common enough to justify dedicated syntax" test. The conceptual basis of Python's comprehensions in mathematical set notation would likely also play a part in rejecting an addition that requires an inherently procedural interpretation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
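The equivalence described above can be checked mechanically. A small sketch with made-up data (the lists `a`, `b` and `c` are illustrative only):

```python
a, b, c = [0, 1, 2], [0, 3], [0, 4, 5]  # illustrative data, not from the thread

comp = [(x, y, z) for x in a if x for y in b if y for z in c if z]

# The same clauses, read left to right and nested as statements.
loops = []
for x in a:
    if x:
        for y in b:
            if y:
                for z in c:
                    if z:
                        loops.append((x, y, z))

print(comp == loops)  # True
print(comp)           # [(1, 3, 4), (1, 3, 5), (2, 3, 4), (2, 3, 5)]
```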

On 29 January 2013 09:51, yoav glazner <yoavglazner@gmail.com> wrote:
Great. I think this nails it. It is exactly the intended behavior, and very readable under current language capabilities. One does not have to stop and go read what "itertools.takewhile" does, and mentally unfold the lambda guard expression - that is what makes this (and the O.P. request) more readable than using takewhile. Note: stop can also just explicitly raise StopIteration - or your next(iter([])) expression can be inlined within the generator. It works in Python 3 as well - though for those who did not test: it won't work for list, dict or set comprehensions - just for generator expressions. js -><-

On Tue, Jan 29, 2013 at 10:35 PM, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
This actually prompted an interesting thought for me. The statement-as-expression syntactic equivalent of the "else stop()" construct would actually be "else return", rather than "else break", since the goal is to say "we're done", regardless of the level of loop nesting. It just so happens that, inside a generator (or generator expression) raising StopIteration and returning from the generator are very close to being equivalent operations, which is why the "else stop()" trick works. In a 3.x container comprehension, the inner scope is an ordinary function, so the equivalence between returning from the function and raising StopIteration is lost. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
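The "else return" reading can be written out as an ordinary generator function. This is only a sketch: `take_until_fail` is a hypothetical helper name, and it behaves like itertools.takewhile with the arguments swapped. Note also that on Python 3.7 and later (PEP 479) a StopIteration raised inside a generator body is converted to RuntimeError, so the stop() trick from this thread no longer works there, while an explicit return does.

```python
def take_until_fail(iterable, predicate):
    """Yield items until predicate(item) is false, then return.

    Hypothetical helper; not part of the standard library.
    """
    for item in iterable:
        if not predicate(item):
            return  # "else return": ends the generator at any nesting depth
        yield item

print(list(take_until_fail(range(10), lambda x: x < 5)))  # [0, 1, 2, 3, 4]
```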

On 31 January 2013 08:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm not sure if it is the goal to be able to break out of any level of nesting or at least that's not how I interpreted the original proposal. It is what happens for this stop() function but only because there's no other way. Personally I don't mind as I generally avoid multiple-for comprehensions; by the time I've written one out I usually decide that it would be more readable as ordinary for loops or with a separate function.
I don't really understand what you mean here. What is the difference between comprehensions in 2.x and 3.x? Oscar

On 1/31/2013 6:08 AM, Oscar Benjamin wrote:
In 2.x, (list) comprehensions are translated to the equivalent nested for and if statements and compiled and executed in place. In 3.x, the translation is wrapped in a temporary function that is called and then discarded. The main effect is to localize the loop names, the 'i' in '[i*2 for i in iterable]', for instance. -- Terry Jan Reedy
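The scoping effect is easy to observe. A minimal sketch for Python 3:

```python
i = "outer"
doubled = [i * 2 for i in range(3)]

print(doubled)  # [0, 2, 4]
print(i)        # outer (under Python 2 the loop variable leaked, leaving i == 2)
```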

As I'm reading it, the while clause would be evaluated once for each iteration of the innermost loop, and would be able to access the current values of every loop.

    ls = [ line for file in files for line in file while file or line ]

...would become...

    ls = []
    for file in files:
        for line in file:
            if not (file or line):
                break  # break if not while_expression
            ls.append(line)


On Jan 29, 2013 9:26 AM, "Oscar Benjamin" <oscar.j.benjamin@gmail.com> wrote:
I know I'm showing my ignorance here, but how are list/dict/set comprehensions and generator expressions implemented differently that one's for loop will catch a StopIteration and the others won't? Would it make sense to reimplement list/dict/set comprehensions as an equivalent generator expression passed to the appropriate constructor, and thereby allow the StopIteration trick to work for each of them as well? Regards, Zach Ware

On 29 January 2013 13:34, Zachary Ware <zachary.ware+pyideas@gmail.com> wrote:
That is because list/set/dict comprehensions are sort of "self contained": they expect the StopIteration to be raised only by the iterator in the "for" part of the expression. The generator expression, on the other hand, is an iterator in itself, and it is expected to raise a StopIteration sometime. The code put around it to actually execute it will catch the StopIteration - and it won't care whether it was raised by the for iterator or by any other expression in the iterator. I mean - when you do list(bla for bla in blargh) the generator is exhausted inside the "list" call - and this generator exhaustion is signaled by the StopIteration exception in both cases.

On 29 January 2013 15:34, Zachary Ware <zachary.ware+pyideas@gmail.com> wrote:
A for loop is like a while loop with a try/except handler for StopIteration. So the following are roughly equivalent:

    # For loop
    for x in iterable:
        func1(x)
    else:
        func2()

    # Equivalent loop
    it = iter(iterable)
    while True:
        try:
            x = next(it)
        except StopIteration:
            func2()
            break
        func1(x)

A list comprehension is just like an implicit for loop with limited functionality so it looks like:

    # List comp
    results = [func1(x) for x in iterable if func2(x)]

    # Equivalent loop
    results = []
    it = iter(iterable)
    while True:
        try:
            x = next(it)
        except StopIteration:
            break
        # This part is outside the try/except
        if func2(x):
            results.append(func1(x))

The problem in the above is that we only catch StopIteration around the call to next(). So if either of func1 or func2 raises StopIteration the exception will propagate rather than terminate the loop. (This may mean that it terminates a for loop higher in the call stack, which can lead to confusing bugs, so it's important to always catch StopIteration anywhere it might get raised.)

The difference with the list(generator) version is that func1() and func2() are both called inside the call to next() from the perspective of the list() function. This means that if they raise StopIteration then the try/except handler in the enclosing list function will catch it and terminate its loop.

    # list(generator)
    results = list(func1(x) for x in iterable if func2(x))

    # Equivalent loop:
    def list(iterable):
        it = iter(iterable)
        results = []
        while True:
            try:
                # Now func1 and func2 are both called in next() here
                x = next(it)
            except StopIteration:
                break
            results.append(x)
        return results

    results_gen = (func1(x) for x in iterable if func2(x))
    results = list(results_gen)

Oscar

On Jan 29, 2013 10:02 AM, "Oscar Benjamin" <oscar.j.benjamin@gmail.com> wrote:
On 29 January 2013 15:34, Zachary Ware <zachary.ware+pyideas@gmail.com> wrote:
That makes a lot of sense. Thank you, Oscar and Joao, for the explanations. I wasn't thinking in enough scopes :) Regards, Zach Ware

Oscar Benjamin <oscar.j.benjamin@...> writes:
    list(i for i in range(100) if i<50 or stop())

Really (!) nice (and 2x as fast as using itertools.takewhile())! With the somewhat simpler (suggested earlier by Shane)

    def stop(): raise StopIteration

this should become part of the python cookbook!! Thanks a real lot for working this out,
Wolfgang

On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
list(i for i in range(100) if i<50 or stop()) Really (!) nice (and 2x as fast as using itertools.takewhile())!
I couldn't believe it so I had to check it:

    from __future__ import print_function
    import functools, itertools, operator, timeit

    def var1():
        def _gen():
            for i in range(100):
                if i > 50: break
                yield i
        return list(_gen())

    def var2():
        def stop():
            raise StopIteration
        return list(i for i in range(100) if i <= 50 or stop())

    def var3():
        return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]

    def var4():
        return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))]

    if __name__ == '__main__':
        for f in (var1, var2, var3, var4):
            print(f.__name__, end=' ')
            print(timeit.timeit(f))

Results on my machine:

    var1 20.4974410534
    var2 23.6218020916
    var3 32.1543409824
    var4 4.90913701057

var1 might have become the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower than generator expressions? var3 is the slowest, I guess because it has a lambda in it. But switching to Python and back cannot be faster than the last option: sitting in the C code as much as we can.

--
Kind regards, Yuriy.
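A note on var4 in the benchmark above, offered as a quick check rather than a re-run of the timings: functools.partial(operator.lt, 50) binds 50 as the first argument, so the predicate evaluates 50 < n. That is already false for n = 0, which means takewhile stops at once and var4 as posted returns an empty list. That would go a long way towards explaining its much lower time. The corrected variant below, using operator.ge, is my own suggestion and was not part of the original measurement.

```python
import functools
import itertools
import operator

def var4_as_posted():
    # operator.lt(50, n) is 50 < n: false for n == 0, so nothing is taken.
    pred = functools.partial(operator.lt, 50)
    return [i for i in itertools.takewhile(pred, range(100))]

def var4_fixed():
    # operator.ge(50, n) is 50 >= n, i.e. n <= 50, matching var1 to var3.
    pred = functools.partial(operator.ge, 50)
    return [i for i in itertools.takewhile(pred, range(100))]

print(len(var4_as_posted()), len(var4_fixed()))
```

With the fixed predicate the function does the same work as the other variants, so the timings would need to be measured again before comparing them.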