[Python-Dev] 2.5 and beyond

Sat Jul 1 21:03:19 CEST 2006

> a = []
> for i in range(10):
>     a.append(lambda: i)
> print [x() for x in a]
>
> [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

Aha! -- Thank you for jogging my memory.

You seem to be right -- the problem is not that Python is lexically scoped,
but that when you define a variable with =, it leaks out into the
surrounding function scope.

Here's an example:

	if True:
		y = 123
	print y

It may be obvious that this should print 123, but that's only because =
combines properties of assignment and definition.  In particular, if we were
to write

	y = 42
	if True:
		y = 123
	print y

it would be very surprising if this example were to print anything but 123.

Here is a corresponding fragment in C++:

	int y = 42;
	if (true) {
		y = 123;
	}
	std::cout << y << "\n";

The "int" in the first line means that the variable y is being defined.  Its
absence in the third line means that y there refers to a variable defined in
an outer scope.  So both instances of y here refer to the same variable, as
they do in Python.

But because definition and assignment are separated in C++, we can also
write

	int y = 42;
	if (true) {
		int y = 123;
	}
	std::cout << y << "\n";

and the fragment will print 42.  In this example, there are two distinct
variables, both named y.

So the problem, as I see it, is indeed that in Python there are suites that
look to me as if they should define scopes, but don't.  Indeed, if I write

	if (foo):
		y = 123

I can't even determine by inspecting the program whether y is defined at
all.  I might argue that y is always defined, by virtue of appearing before
= somewhere in this scope, but if foo is false and I then try to use y, the
interpreter tells me at run time "name 'y' is not defined", so I guess
that's the right way to treat it.
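
As a minimal sketch of this point (the function name probe is made up for
illustration): whether y is bound depends on the path taken at run time,
not on anything visible in the program text.  Inside a function the error
is UnboundLocalError, which is a subclass of NameError.

```python
def probe(foo):
    # The assignment below makes y a local variable of the whole function,
    # but y only becomes bound if the suite actually runs.
    if foo:
        y = 123
    try:
        return y
    except NameError:
        # UnboundLocalError is a subclass of NameError, so it is caught here.
        return "y is not bound"

print(probe(True))    # 123
print(probe(False))   # y is not bound
```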

So here's how I understand what Greg was saying.

Suppose I write

	x = []
	for i in range(10):
		x.append(lambda:i)
	print [f() for f in x]

This example will print [9, 9, 9, 9, 9, 9, 9, 9, 9, 9], which I think is
wildly unintuitive.

My intuition in this matter is partly formed by C++, but it is also formed
by other languages going as far back as Algol 68.  That intuition says that
because the suite controlled by a "for" statement is executed any number of
times, potentially including zero, it should be considered as its own scope,
and any variables defined in that scope should stay there.

In particular, the variable "i" should be defined in the scope of the "for",
which implies that each time through the loop, there should be a fresh
variable named "i", bound to that iteration's object.
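
What Python actually does can be sketched as follows (the function name
make_closures is made up for illustration): all the lambdas refer to the
single variable i of the enclosing function, so rebinding i after the loop
changes what every one of them returns.

```python
def make_closures():
    fs = []
    for i in range(3):
        fs.append(lambda: i)
    # The loop has finished, so the one shared variable i is now 2.
    before = [f() for f in fs]
    # Rebinding i here affects every lambda, because they all look up
    # the same variable when called rather than a saved value.
    i = "rebound"
    after = [f() for f in fs]
    return before, after

before, after = make_closures()
print(before)   # [2, 2, 2]
print(after)    # ['rebound', 'rebound', 'rebound']
```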

What surprises me even more is that if I try to define such a variable
explicitly, it still doesn't work:

	x = []
	for i in range(10):
		j = i
		x.append(lambda:j)
	print [f() for f in x]

This example still prints [9, 9, 9, 9, 9, 9, 9, 9, 9, 9].  If I understand
the reason correctly, it is because even though j is defined only in the
body of the loop, loop bodies are not scopes, so the variable's definition
is hoisted out into the surrounding function scope.
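
One way to check this, as a sketch (the function name build is made up for
illustration), is to observe that j is still bound after the loop has
ended, which it could not be if the loop body were a scope of its own.

```python
def build():
    x = []
    for i in range(10):
        j = i
        x.append(lambda: j)
    # j is still bound here: it belongs to build(), not to the loop body,
    # and it holds the value from the last iteration.
    return j, [f() for f in x]

last_j, results = build()
print(last_j)    # 9
print(results)   # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
```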

To convince myself of this behavior, I defined an extra function scope, the
purpose of which is to localize j:

	x = []
	for i in range(10):
		def foo():
			j = i
			return lambda:j
		x.append(foo())
	print [f() for f in x]

Indeed, this example prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].  The example
also points up the fact that

	x.append(lambda:i)

and

	def foo():
		j = i
		return lambda:j
	x.append(foo())

behave differently, where my intuition (and, I suspect, many other people's
as well) would be that they would be equivalent.

Finally, I observe that this second example above is also equivalent to

	x.append(lambda i=i: i)

which is what explains the fairly common idiom

	x = []
	for i in range(10):
		x.append(lambda i=i:i)
	print [f() for f in x]
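
The idiom works because a default value is evaluated once, when the lambda
is created, rather than when it is called.  The sketch below shows that,
along with one side effect of the idiom: a caller can override the captured
value by passing an argument.  The helper name constant is made up for
illustration; it is the explicit-function version of the same idea.

```python
x = []
for i in range(10):
    # i=i evaluates the current value of the loop variable right now
    # and stores it as the default of the lambda's own parameter i.
    x.append(lambda i=i: i)

print([f() for f in x])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(x[3](42))           # 42: the default can be overridden by the caller

def constant(value):
    # Each call creates a new scope, so each lambda gets its own "value".
    return lambda: value

y = [constant(i) for i in range(10)]
print([f() for f in y])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```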

So maybe what I meant when I asked for lexical scopes was two things:

	1) Every indentation level should be a scope;
	2) In general, variable definitions should not leak into
	   surrounding scopes.

I realize that (2) is too simplistic.  Someone who writes

	if x < 0:
		y = -x
	else:
		y = x

will expect y to be defined in the scope surrounding the "if" even if it was
not already defined there.  On the other hand, I think that the subtle
pitfalls that come from allowing "for" variables to leak into the
surrounding scopes are much harder to deal with and understand than would be
the consequences of restricting their scopes as outlined above.
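
One such pitfall, as a small sketch (the function name last_item is made up
for illustration): because the "for" variable leaks, code after the loop
can use it, but only if the loop body ran at least once.

```python
def last_item(seq):
    for item in seq:
        pass
    try:
        # item is bound here only if the loop executed at least once.
        return item
    except NameError:
        return None

print(last_item([10, 20, 30]))   # 30
print(last_item([]))             # None
```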