
In the generator expression (x+1 for x in L) the name 'L' is not local to the expression (as opposed to 'x'), I will call such a name a 'free name', as I am not aware of an existing terminology. The following is about how to treat 'free names' inside generator expressions. I argue that these names should be bound to their values when the generator expression is created, and the rest of this email tries to provide arguments why this may be desirable. I am fully aware that I tend to think about things in a very skewed manner though, so I would be grateful for any rebuttal. Recently I tried to implement a 'sieve' algorithm using generator expressions instead of lists. It wasn't the sieve of Eratosthenes but I will use that as a simpler example (all of the code shown below is python 2.5). Here is one way to implement the algorithm using list comprehensions (tuples could also be used as the mutability of lists is not used): def list_sieve(n): "Generate all primes less than n" P = range(2, n) while True: # Yield the first element of P and sieve out its multiples for p in P: yield p P = [x for x in P if x % p] break else: # If P was empty then we're done return
list(list_sieve(20)) [2, 3, 5, 7, 11, 13, 17, 19]
So that's ok. Then it occured to me that I didn't need to keep creating all theses lists; so I decided, without further thinking, to switch to generator expressions, as they are supposed to abstract the notion of iterable object (and in the function above I'm only using the 'iterable' properties of lists - any other iterable should do). So I came up with this: def incorrect_gen_sieve(n): "Unsuccessfully attempt to generate all primes less than n" # Change range to xrange P = xrange(2, n) while True: for p in P: yield p # Change list comprehension to generator expression P = (x for x in P if x % p) break else: return
list(incorrect_gen_sieve(20)) [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Ouch. Looking back at the code I realised that this was due to the fact that the names 'p' and 'P' in the generator expressions are not local to it, so subsequent binding of these names will affect the expression. I can't really analyse further than this, my head start spinning if I try :). So I wrapped the generator expression in a lambda function to make sure that the names p and P inside it were bound to what I intended them to: def gen_sieve(n): "Generate all primes less than n, but without tables!" P = xrange(2, n) while True: for p in P: yield p # Make sure that p and P are what they should be P = (lambda P, p: (x for x in P if x % p))(P, p) break else: return
list(gen_sieve(20)) [2, 3, 5, 7, 11, 13, 17, 19]
That's better. I like the idea of a sieve of Eratosthenes without any tables, and it's achievable in Python quite easily. The only problem is the one that I mentioned above, which boils down to: In a generator expression, free names can be bound to a new object between the time when they are defined and when they are used, thus changing the value of the expression. But I think that the behaviour of generator expressions would be more controllable and closer to that of 'real sequences' if the free names they contain were implicitly frozen when the generator expression is created. So I am proposing that for example: (f(x) for x in L) be equivalent to: (lambda f, L: (f(x) for x in L))(f, L) In most cases this would make generator expressions behave more like list comprehensions. You would be able to read the generator expression and think "that't what it does" more reliabley. Of course if a free name is bound to a mutable object, then there is always the chance that this object will be mutated between the creation of the generator expression and its use. Lastly, instead of using a generator expression I could have written: from itertools import ifilter from functools import partial def tools_sieve(n): "Generate all primes less than n" P = xrange(2, n) while True: for p in P: yield p P = ifilter(partial(int.__rmod__, p), P) break else: return
list(sieve(20)) [2, 3, 5, 7, 11, 13, 17, 19]
It obviously works as P and p are 'frozen' when ifilter and partial are called. If (f(x) for x in L if g(x)) is to be the moral equivalent of imap(f, ifilter(g, L)) Then my proposal makes even more sense. -- Arnaud