In the generator expression
(x+1 for x in L)
the name 'L' is not local to the expression (as opposed to 'x'), I
will call such a name a 'free name', as I am not aware of an existing
terminology.
The following is about how to treat 'free names' inside generator
expressions. I argue that these names should be bound to their values
when the generator expression is created, and the rest of this email
tries to provide arguments why this may be desirable.
I am fully aware that I tend to think about things in a very skewed
manner though, so I would be grateful for any rebuttal.
Recently I tried to implement a 'sieve' algorithm using generator
expressions instead of lists. It wasn't the sieve of Eratosthenes but
I will use that as a simpler example (all of the code shown below is
python 2.5). Here is one way to implement the algorithm using list
comprehensions (tuples could also be used as the mutability of lists
is not used):
def list_sieve(n):
"Generate all primes less than n"
P = range(2, n)
while True:
# Yield the first element of P and sieve out its multiples
for p in P:
yield p
P = [x for x in P if x % p]
break
else:
# If P was empty then we're done
return
>>> list(list_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
So that's ok. Then it occured to me that I didn't need to keep
creating all theses lists; so I decided, without further thinking, to
switch to generator expressions, as they are supposed to abstract the
notion of iterable object (and in the function above I'm only using
the 'iterable' properties of lists - any other iterable should do).
So I came up with this:
def incorrect_gen_sieve(n):
"Unsuccessfully attempt to generate all primes less than n"
# Change range to xrange
P = xrange(2, n)
while True:
for p in P:
yield p
# Change list comprehension to generator expression
P = (x for x in P if x % p)
break
else:
return
>>> list(incorrect_gen_sieve(20))
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Ouch. Looking back at the code I realised that this was due to the
fact that the names 'p' and 'P' in the generator expressions are not
local to it, so subsequent binding of these names will affect the
expression. I can't really analyse further than this, my head start
spinning if I try :). So I wrapped the generator expression in a
lambda function to make sure that the names p and P inside it were
bound to what I intended them to:
def gen_sieve(n):
"Generate all primes less than n, but without tables!"
P = xrange(2, n)
while True:
for p in P:
yield p
# Make sure that p and P are what they should be
P = (lambda P, p: (x for x in P if x % p))(P, p)
break
else:
return
>>> list(gen_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
That's better. I like the idea of a sieve of Eratosthenes without any
tables, and it's achievable in Python quite easily. The only problem
is the one that I mentioned above, which boils down to:
In a generator expression, free names can be bound to a new object
between the time when they are defined and when they are used,
thus changing the value of the expression.
But I think that the behaviour of generator expressions would be more
controllable and closer to that of 'real sequences' if the free names
they contain were implicitly frozen when the generator expression is
created.
So I am proposing that for example:
(f(x) for x in L)
be equivalent to:
(lambda f, L: (f(x) for x in L))(f, L)
In most cases this would make generator expressions behave more like
list comprehensions. You would be able to read the generator
expression and think "that't what it does" more reliabley. Of course
if a free name is bound to a mutable object, then there is always the
chance that this object will be mutated between the creation of the
generator expression and its use.
Lastly, instead of using a generator expression I could have
written:
from itertools import ifilter
from functools import partial
def tools_sieve(n):
"Generate all primes less than n"
P = xrange(2, n)
while True:
for p in P:
yield p
P = ifilter(partial(int.__rmod__, p), P)
break
else:
return
>>> list(sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
It obviously works as P and p are 'frozen' when ifilter and partial
are called. If
(f(x) for x in L if g(x))
is to be the moral equivalent of
imap(f, ifilter(g, L))
Then my proposal makes even more sense.
--
Arnaud