Re: [Python-ideas] [Python-Dev] yield * (Re: Missing operator.call)

[+python-ideas -python-dev] On Sat, Feb 7, 2009 at 8:50 PM, <glyph@divmod.com> wrote:
I still don't understand why such a construct is necessary. Is

    for elt in iterable:
        yield elt

really all that bad? Maybe it's a little silly-looking, but at least it's easy to understand and not _that_ hard to type....

-- Cheers, Leif

Leif Walsh wrote:
It's not just silly looking, it's the same construct used repeatedly, in many different places in code. It is a basic principle of programming that anytime you have blocks of code that are almost identical, you should factor out the common code into its own routine. See "Don't Repeat Yourself" and "Once And Only Once" for similar ideas:

http://c2.com/cgi/wiki?OnceAndOnlyOnce
http://c2.com/cgi/wiki?DontRepeatYourself

Consider a pure Python implementation of itertools.chain:

    def chain(*iterables):
        for it in iterables:
            for elt in it:
                yield elt

The double for loop obscures the essential nature of chain. From help(itertools.chain):

"Return a chain object whose .next() method returns elements from the first iterable until it is exhausted, then elements from the next iterable, until all of the iterables are exhausted."

The emphasis is on iterating over the sequence of iterables, not iterating over each iterable itself. This is one place where explicit is *not* better than implicit, as the inner loop exposes too much of the internal detail to the reader. Instead, chain() could be better written as this:

    def chain(*iterables):
        for it in iterables:
            yield from it

Naturally you can use map and filter to transform the results:

    yield from map(trans, filter(expr, it))

The advantage is even more obvious when married with a generator expression:

    yield from (3*x for x in seq if x%2 == 1)

instead of:

    for x in seq:
        if x%2 == 1:
            yield 3*x

or

    for y in (3*x for x in seq if x%2 == 1):
        yield y

I'm +1 on this suggestion, especially since it requires no new keywords.

-- Steven
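For readers following along today: the syntax discussed here was only a proposal at the time of this thread, but "yield from" was later added in Python 3.3 (PEP 380). Under that assumption, a minimal runnable sketch of the two forms Steven describes might look like the following. The name odd_tripled is hypothetical and chosen only for illustration.

```python
def chain(*iterables):
    """Yield every element of each iterable in turn."""
    for it in iterables:
        # Delegate to the inner iterable instead of writing a second loop.
        yield from it


def odd_tripled(seq):
    """Hypothetical helper: yield 3*x for each odd x in seq."""
    yield from (3 * x for x in seq if x % 2 == 1)


print(list(chain([1, 2], "ab", (3,))))
print(list(odd_tripled(range(6))))
```

The outer loop in chain() is the only one left visible, which matches the emphasis in the itertools.chain docstring.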

On Sun, Feb 8, 2009 at 3:14 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Sure, but it's only factoring out one or two lines. I dunno. If it's not too intrusive to the parser, I guess it's not such a bad idea, but it just seems like it's more work than it's worth. Besides, most applications I can think of that require you to build a generator around a number of other generators also require some kind of manipulation of each data item generated, which this construct doesn't allow. It's a decent proposal, and looks nice enough, but I'm not convinced it's a good use of our time (not that it's up to me though). -- Cheers, Leif

Antoine Pitrou wrote:
The "yield from" syntax hasn't even been approved, let alone implemented, and you're already complaining it's slow? Talk about premature optimization!

That's a criticism of *generator expressions*, not the suggested syntax. They're popular because most people prefer the large benefits in readability and convenience over the minuscule cost in generating them, particularly since often that cost is paid somewhere else: the caller builds the generator expression and passes the resulting iterator into your function, which merely iterates over it.

And the cost is small:

    [steve@ando ~]$ python -m timeit -s "seq = range(500)" "(3*x for x in seq if x%2 == 1)"
    1000000 loops, best of 3: 0.611 usec per loop

-- Steven

On Sun, Feb 8, 2009 at 23:20, Steven D'Aprano <steve@pearwood.info> wrote:
Because generators are lazy and you don't run it to completion. -- Marcin Kowalczyk qrczak@knm.org.pl http://qrnik.knm.org.pl/~qrczak/
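Marcin's point is that a timing which only builds the generator expression measures almost nothing, since no element is computed until something iterates over it. A small sketch of that laziness follows, assuming Python 3. The names calls and triple are hypothetical and exist only to make the evaluation visible.

```python
calls = []  # records every argument that triple() is actually called with


def triple(x):
    calls.append(x)
    return 3 * x


seq = range(500)

# Building the generator expression runs none of its body.
gen = (triple(x) for x in seq if x % 2 == 1)
built_only = len(calls)

# Asking for one item does just enough work to produce it.
first = next(gen)
after_first = list(calls)

# Only exhausting the generator pays the full cost.
rest = list(gen)

print(built_only, first, after_first, len(calls))
```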

Steven D'Aprano <steve@...> writes:
No, I was talking about the cost of running it to completion. A generator is executed in a separate frame. Therefore, if you "yield from" a generator, there is a frame switch at each iteration between the generator frame and the frame of the "yield from", which is not the case with an inline "for" loop containing a "yield". Regards Antoine.

Antoine Pitrou wrote:
Perhaps I misunderstood you, because what you actually said was: "But the former will be slower than the latter, because it constructs an intermediate generator only to yield it element by element." Since a for loop will also yield element by element, the only difference I saw was constructing the generator expression, which is cheap.
I'm afraid I don't understand the relevance of this. If it's a criticism, it's a criticism of generators in general, not of the proposed syntax. Don't we already carry the cost of the frame switch when iterating over a generator?

    for el in generator:
        yield el

If that is replaced with the proposed syntax

    yield from generator

what's the difference, performance-wise? In both cases, you can optimise by unrolling the generator into an inline for loop, at the cost of readability, convenience, and the ability to pass generator objects around. In Python 2.4 at least, the optimisation is not to be sneered at (modulo the usual warnings about premature optimisation): unrolling is about 30-40% faster:

    $ python -m timeit -s "def f():" -s " for x in (i+1 for i in xrange(20)): yield x" "list(f())"  # using gen expr
    100000 loops, best of 3: 12.9 usec per loop
    $ python -m timeit -s "def f():" -s " for x in xrange(20): yield x+1" "list(f())"  # unrolled into body of the loop
    100000 loops, best of 3: 8.09 usec per loop

Since people are already choosing to use generator expressions instead of unrolling them into for loops, I don't believe that your objection is relevant to the proposal. "yield from expression" would (presumably) be a shorter, neater way of saying "for x in expression: yield x" except that it doesn't create a new name x.

-- Steven
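The comparison above can be repeated on a current interpreter with the timeit module. The sketch below assumes Python 3.3 or later (range instead of xrange, and "yield from" as eventually specified in PEP 380). The function names are hypothetical, and no particular timings are claimed: results will vary by machine and Python version.

```python
import timeit


def via_genexpr():
    # Loop over an intermediate generator expression and re-yield each item.
    for x in (i + 1 for i in range(20)):
        yield x


def via_yield_from():
    # Delegate to the same generator expression with "yield from".
    yield from (i + 1 for i in range(20))


def unrolled():
    # Do the work inline, with no intermediate generator.
    for x in range(20):
        yield x + 1


def time_variants(number=10_000):
    """Return the seconds taken to exhaust each variant `number` times."""
    variants = (via_genexpr, via_yield_from, unrolled)
    return {f.__name__: timeit.timeit(lambda f=f: list(f()), number=number)
            for f in variants}


for name, seconds in time_variants().items():
    print(f"{name}: {seconds:.4f} s")
```

All three variants produce the same values, so any difference in the printed times comes from how the items are passed along, which is the point under dispute between Antoine and Steven.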

Steven D'Aprano <steve@...> writes:
I don't know about other people, but when I write a generator expression, it's usually for passing it around. That is, I write a generator expression in places where I'd otherwise have to write a full generator function; both are probably equivalent performance-wise. I don't think writing a generator expression in situations where you could simply inline the equivalent loop is very common, because it doesn't seem to bring anything (and, as you observed, it's slower). Regards Antoine.

Just to give another random user's opinion, I love this idea. When writing code where I factor out lots of generators (for something like cherrypy), I've had to repeat this two-line idiom dozens of times in one function. +1 Nate

Leif Walsh wrote:
If all you want is to pass yielded values outwards, it's not all that bad, although it could get a bit tedious if you're doing it a lot. However, if you want values passed back in by send() to go to the right places, it's *considerably* more complicated. The expansion I posted just before shows, I think, that this is not something you want to have to write out longhand every time -- at least not if you want a good chance of getting it right! -- Greg
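The expansion Greg refers to is not part of this excerpt, and the sketch below is not that expansion. It only illustrates the behaviour he describes, assuming "yield from" as it was later specified in PEP 380 (Python 3.3 and up): a plain for loop silently drops values passed in with send(), while delegation forwards them to the inner generator. All names here are hypothetical.

```python
def accumulator():
    """Inner generator: yields a running total of the values sent to it."""
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value


def naive_wrapper():
    # The two-line idiom: values sent to the wrapper stop here and
    # never reach accumulator(), which only ever sees plain next() calls.
    for item in accumulator():
        yield item


def delegating_wrapper():
    # With delegation, send() is forwarded to the inner generator.
    yield from accumulator()


def drive(gen):
    next(gen)  # prime the generator: run it to its first yield
    return [gen.send(n) for n in (1, 2, 3)]


print(drive(naive_wrapper()))       # → [0, 0, 0]
print(drive(delegating_wrapper()))  # → [1, 3, 6]
```

Getting the second result by hand means forwarding send(), and also throw() and close(), explicitly, which is where the longhand version becomes easy to get wrong.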

participants (6): Antoine Pitrou, Greg Ewing, Leif Walsh, Marcin 'Qrczak' Kowalczyk, nathan binkert, Steven D'Aprano