Why is the use of an undefined name not a syntax error?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Apr 1 22:00:10 EDT 2018
On Sun, 01 Apr 2018 14:24:38 -0700, David Foster wrote:
> My understanding is that the Python interpreter already has enough
> information when bytecode-compiling a .py file to determine which names
> correspond to local variables in functions. That suggests it has enough
> information to identify all valid names in a .py file and in particular
> to identify which names are not valid.
Not even close.
The bottom line is, the Python core developers don't want to spend their
time writing and maintaining what is effectively a linter. Python is run
by a small team of volunteers with relatively little funding, and there
are far more important things for them to work on than duplicating the
work done by linters. If you want something to check your code ahead of
time for undefined names, then run a linter: you have many to choose from.
But even if they were prepared to do so, it isn't as easy or cheap as you
think. This sort of analysis works for local variables because Python has
decided on the rule that *any* binding operation to a local name in a
function makes it a local, regardless of whether that binding operation
would actually be executed or not. So:
    def function():
        len
        return None
        if False:
            len = len
fails with UnboundLocalError. That's the rule for functions, and it is
deliberately made more restrictive than for Python code outside of
functions as a speed optimization.
(In Python 3, the rule is more restrictive than for Python 2: star
imports inside functions and unqualified exec are forbidden too.)
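To see the rule in action, here is a minimal sketch, assuming CPython 3. The names `local_names` and `outcome` are made up for this illustration. The compiler records `len` as a local of the function even though the assignment sits in code that can never run, so the bare reference fails at call time instead of finding the builtin:

```python
def function():
    len            # looked up as a *local*, which is still unbound here
    return None
    if False:      # never executed, but the binding below still counts
        len = len


# Names the compiler decided are locals of function().
local_names = function.__code__.co_varnames

try:
    function()
except UnboundLocalError as exc:
    outcome = type(exc).__name__
else:
    outcome = "no error"

print(local_names, outcome)
```

On CPython this should list `len` among the function's locals and report an UnboundLocalError from the call.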
But it doesn't work for globals unless you make unjustifiable (for the
compiler) assumptions about what code contains, or do an extremely
expensive whole-application analysis.
For example, here's a simple, and common, Python statement:
    import math
Can you tell me what global names that line will add to your globals?
If you said only "math", then you're guilty of making those unjustifiable
assumptions. Of course, for *sensible* code, that will be the only name
added, but the compiler shouldn't assume the code is sensible. Linters
can, but the compiler shouldn't.
The imported module is not necessarily the standard library `math`
module, it could be a user-defined module shadowing it. That module could
have side-effects, and those side-effects could include populating the
current module (not the fake `math`) with any number of globals, or
adding/deleting names from the builtins.
So the instant you import a module, in principle you no longer know the
state of globals.
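A rough sketch of the import side-effect described above, under some assumptions: the module name `sneaky_math_demo` and its contents are hypothetical, invented for this illustration, and the module is written to a temporary directory so the example is self-contained. Here the imported module adds a name to the builtins. It could just as well reach into the importing module's globals.

```python
import builtins
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module, invented for this demonstration only.
MODULE_NAME = "sneaky_math_demo"
MODULE_SOURCE = "import builtins\nbuiltins.surprise = 12345\n"


def import_with_side_effects():
    """Write the hypothetical module to a temp dir and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, MODULE_NAME + ".py").write_text(MODULE_SOURCE, encoding="utf-8")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(MODULE_NAME)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)


def read_surprise():
    """Reference a name that nothing in this file ever binds."""
    try:
        return surprise
    except NameError:
        return "NameError"


before = read_surprise()      # the name does not exist yet
import_with_side_effects()
after = read_surprise()       # the import quietly created it
del builtins.surprise         # clean up after the demonstration
print(before, after)
```

Nothing in the importing file binds `surprise`, yet the lookup succeeds once the import has run. A compiler looking only at this file has no way to know that.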
Of course, in practice we don't do that. Much. But it is intentionally
allowed, and it is not appropriate for the compiler to assume that we
never do that. A linter can assume sensible code, and get away with more
false negatives than the compiler can.
So here is a partial list of things which could change the global or
built-in name spaces, aside from explicit binding operations:
- star imports;
- importing any module could inject names into builtins or
  your globals as a side-effect;
- calling any function could do the same;
- exec;
- eval, since it could call exec;
- manipulating globals() or locals();
- even under another name, e.g.:

    foo = False or globals
    # later
    foo()['surprise'] = 12345
I've probably missed many. Of course sensible code doesn't do horrible
things like those (possibly excluding the star imports), but the compiler
would have to cope with them since they are allowed and sometimes they're
useful.
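A few of the mechanisms listed above can be put together in one short sketch. The names `surprise_a`, `surprise_b`, `surprise_c`, `inject` and `results` are made up for this illustration. None of the three names is ever the target of an ordinary assignment, yet all three are valid globals by the time they are read:

```python
# globals() reached through another name, as in the list above.
foo = False or globals
foo()["surprise_a"] = 12345

# exec can bind arbitrary names in the namespace it is given.
exec("surprise_b = 'from exec'", globals())


def inject():
    # Any function call may add globals as a side effect.
    globals()["surprise_c"] = "from a call"


inject()

# All three lookups succeed, although no assignment statement
# in this file names them as its target.
results = (surprise_a, surprise_b, surprise_c)
print(results)
```

A static check that flagged these three references as undefined would be rejecting code that runs without error.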
Unlike a linter, which can afford to be wrong sometimes, the compiler
cannot be wrong or it counts as a compiler bug. Nobody will be too upset
if a linter misses some obscure case in obfuscated weird code. But if the
compiler wrongly flags an error when the code is actually legal, people
will be justifiably annoyed.
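As a small illustration of legal code that such a check would have to accept, consider this sketch. The names `use_later`, `not_yet_defined`, `first_call` and `second_call` are hypothetical. The function refers to a global that does not exist when the function is compiled, and whether that is an error depends entirely on when the function is called:

```python
def use_later():
    # When this function is compiled, no such global exists yet.
    return not_yet_defined + 1


try:
    use_later()
except NameError:
    first_call = "NameError"
else:
    first_call = "no error"

not_yet_defined = 41          # the name appears later, at runtime
second_call = use_later()     # the very same function now works

print(first_call, second_call)
```

The first call fails at runtime with NameError, while the second succeeds. Rejecting the function at compile time would have made the second, perfectly legal, use impossible.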
--
Steve