Why is the use of an undefined name not a syntax error?

Sun Apr 1 22:00:10 EDT 2018

On Sun, 01 Apr 2018 14:24:38 -0700, David Foster wrote:

> My understanding is that the Python interpreter already has enough
> information when bytecode-compiling a .py file to determine which names
> correspond to local variables in functions. That suggests it has enough
> information to identify all valid names in a .py file and in particular
> to identify which names are not valid.

Not even close.

The bottom line is, the Python core developers don't want to spend their 
time writing and maintaining what is effectively a linter. Python is run 
by a small team of volunteers with relatively little funding, and there 
are far more important things for them to work on than duplicating the 
work done by linters. If you want something to check your code ahead of 
time for undefined names, then run a linter: you have many to choose from.

But even if they were prepared to do so, it isn't as easy or cheap as you 
think. This sort of analysis works for local variables because Python has 
decided on the rule that *any* binding operation to a local name in a 
function makes it a local, regardless of whether that binding operation 
would actually be executed or not. So:

def function():
    len
    return None
    if False:
        len = len

fails with UnboundLocalError when called, because the assignment under 
"if False" makes len a local even though it never runs. That's the rule 
for functions, and it is deliberately made more restrictive than for 
Python code outside of functions as a speed optimization.

(In Python 3, the rule is more restrictive than in Python 2: star 
imports inside functions and unqualified exec are forbidden too.)
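
To see that the decision is made at compile time, you can inspect the 
code object before ever calling the function. This is only a sketch for 
CPython; the `raised` flag is just a name I picked for the demonstration:

```python
def function():
    len            # compiled as a local lookup, and the local is unbound
    return None
    if False:
        len = len  # never executed, but it still makes 'len' a local


# The compiler has already recorded 'len' as a local of the function.
print(function.__code__.co_varnames)

raised = False
try:
    function()
except UnboundLocalError as exc:
    raised = True
    print("UnboundLocalError:", exc)
```

Nothing like this exists for module-level names: they live in an 
ordinary dict that anything can modify at runtime.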

But it doesn't work for globals unless you make unjustifiable (for the 
compiler) assumptions about what code contains, or do an extremely 
expensive whole-application analysis.

For example, here's a simple, and common, Python statement:

import math

Can you tell me what global names that line will add to your globals?

If you said only "math", then you're guilty of making those unjustifiable 
assumptions. Of course, for *sensible* code, that will be the only name 
added, but the compiler shouldn't assume the code is sensible. Linters 
can, but the compiler shouldn't.

The imported module is not necessarily the standard library `math` 
module; it could be a user-defined module shadowing it. That module could 
have side-effects, and those side-effects could include populating the 
current module (not the fake `math`) with any number of globals, or 
adding/deleting names from the builtins.

So the instant you import a module, in principle you no longer know the 
state of globals.
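
Here is a rough sketch of such a side-effect. The module name 
`sneaky_math` and its contents are made up for the illustration; it is 
written to a temporary directory so the example is self-contained:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module, invented for this example.
MODULE_NAME = "sneaky_math"
MODULE_SOURCE = (
    "import builtins\n"
    "builtins.surprise = 12345  # side effect at import time\n"
)


def import_with_side_effect():
    """Write the module to a temporary directory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, MODULE_NAME + ".py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(MODULE_NAME)
        finally:
            sys.path.remove(tmp)


def read_surprise():
    # Nothing in this file binds 'surprise', yet the lookup succeeds
    # once the import has run, because it now lives in builtins.
    return surprise


import_with_side_effect()
print(read_surprise())
```

Looking only at the importing file, there is no way to tell that 
`surprise` is a valid name.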

Of course, in practice we don't do that. Much. But it is intentionally 
allowed, and it is not appropriate for the compiler to assume that we 
never do that. A linter can assume sensible code, and get away with more 
false negatives than the compiler can.

So here is a partial list of things which could change the global or 
built-in name spaces, aside from explicit binding operations:

- star imports;
- importing any module could inject names into builtins or 
  your globals as a side-effect;
- calling any function could do the same;
- exec;
- eval, since it could call exec;
- manipulating globals() or locals();
- even under another name, e.g.:

    foo = False or globals
    # later
    foo()['surprise'] = 12345

I've probably missed many. Of course sensible code doesn't do horrible 
things like those (possibly excluding the star imports), but the compiler 
would have to cope with them since they are allowed and sometimes they're 
useful.
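
A couple of the items above, sketched as runnable code. The source is 
run in a fresh namespace purely to keep the demonstration isolated; the 
names `innocent_looking`, `from_call` and `from_eval` are invented for 
the example:

```python
SOURCE = '''
def innocent_looking():
    # Any function call can reach into its module's globals.
    globals()["from_call"] = 1

innocent_looking()

# eval cannot run statements itself, but it can call exec.
eval("exec('from_eval = 2')")

# Neither name is bound by any statement visible in this source.
result = from_call + from_eval
'''

namespace = {}
exec(compile(SOURCE, "<demo>", "exec"), namespace)
print(namespace["result"])
```

The source compiles without complaint and runs without a NameError, even 
though neither name is ever assigned in the ordinary sense.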

Unlike a linter, which can afford to be wrong sometimes, the compiler 
cannot be wrong or it counts as a compiler bug. Nobody will be too upset 
if a linter misses some obscure case in obfuscated weird code. But if the 
compiler wrongly flags an error when the code is actually legal, people 
will be justifiably annoyed.
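
To make that concrete, here is a deliberately naive undefined-name 
checker built on the ast module. It is a toy of my own for illustration, 
not how any real linter works: it ignores scoping, ordering and every 
dynamic trick, and `naive_undefined_names` is a made-up name.

```python
import ast
import builtins


def naive_undefined_names(source):
    """Return names that are read but never visibly bound in source."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:  # Store or Del context
                bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                               ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
    return sorted(loaded - bound)


# Legal code that binds a global without any visible binding statement.
LEGAL_BUT_DYNAMIC = """
foo = False or globals
foo()['surprise'] = 12345
print(surprise)
"""

print(naive_undefined_names(LEGAL_BUT_DYNAMIC))
```

The checker reports `surprise` as undefined, yet the code runs fine. For 
a linter that is a tolerable false alarm; as a compile-time error it 
would reject a legal program.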

-- 
Steve