[Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs

Thu Nov 17 04:50:34 CET 2011

On Thu, Nov 17, 2011 at 12:00 PM, PJ Eby <pje at telecommunity.com> wrote:
> So who *is* PEP 395's target audience, and what is their mental model?
>  That's the question I'd like to come to grips with before proposing a full
> solution.

OK, I realised that the problem I want to solve with this part of the
PEP isn't limited to direct execution of scripts. It's a general
problem with figuring out an appropriate value for sys.path[0] that
also affects the interactive interpreter and the -m switch.

The "mission statement" for this part of PEP 395 is then clearly
stated as: the Python interpreter should *never* automatically place a
Python package directory on sys.path.

Adding package directories to sys.path creates undesirable aliasing
that may lead to multiple imports of the same module under different
names, unexpected shadowing of standard library (and other) modules
and packages, and frequently confusing errors where a module works
when imported but not when executed directly and vice-versa. Letting
the import system get into that state without even a warning is
letting an error pass silently and we shouldn't do it.

However, it's also true that, in many cases, this slight error in the
import state is actually harmless, so *always* failing in this
situation would be an unacceptable breach of backwards compatibility.
While we could issue a warning and demand that the user fix it
themselves (by invoking Python differently), there's no succinct way
to explain what has gone wrong - it depends on a fairly detailed
understanding of how the import system gets initialised. And, as noted,
there isn't currently an easy mechanism for users to fix it
themselves in the general case - using the -m switch means you also
have to get the current working directory right, losing out on one of
the main benefits of direct execution. And such a warning is assuredly
useless if you actually ran the script by double-clicking it in a file
browser...

Accordingly, PEP 395 proposes that, when such a situation is
encountered, Python should just use the nearest containing
*non*-package directory as sys.path[0] rather than naively blundering
ahead and corrupting the import system state, regardless of how the
proposed value for sys.path[0] was determined (i.e. the current
working directory or the location of a specific Python file). Any
module that currently worked correctly in this situation should
continue to work, and many others that previously failed (because they
were inside packages) will start to work. The only new failures will
be early detection of invalid filesystem layouts, such as
"__init__.py" files in directories that are not valid Python package
names, and scripts stored inside package directories that *only* work
as scripts (effectively relying on the implicit relative imports that
occur due to __name__ being set to "__main__").
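
To make "nearest containing *non*-package directory" concrete, here is a
rough sketch of the lookup. It is not the PEP's reference implementation:
the function name split_path_module is made up for illustration, and it
assumes a directory counts as a package only if it contains an
"__init__.py" file.

```python
import os


def split_path_module(path):
    """Return (sys_path_entry, module_name) for a Python source file.

    Hypothetical helper: walks upwards from the file's directory for as
    long as each directory is explicitly marked as a package by an
    __init__.py file, and stops at the first directory that is not.
    """
    path = os.path.abspath(path)
    directory, filename = os.path.split(path)
    parts = [os.path.splitext(filename)[0]]
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent, pkg_name = os.path.split(directory)
        if parent == directory:
            # Reached the filesystem root; nothing further to climb.
            break
        parts.append(pkg_name)
        directory = parent
    # 'directory' is now the nearest containing non-package directory.
    return directory, ".".join(reversed(parts))
```

For a file that isn't inside a package, the loop never runs and the
result is just the file's own directory and its bare module name, which
matches what happens today. The sketch doesn't check that the directory
names are valid identifiers, which a real implementation would need to
do in order to report the invalid layouts mentioned above.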

This problem most often arises during development (*not* after
deployment), when developers either start Python to perform some
experiments, or place quick tests or sanity checks in "if __name__ ==
'__main__':" blocks at the end of their modules. This is a common
practice, described in a number of Python tutorials, and our own docs
also recommend it for test modules:
http://docs.python.org/library/unittest#basic-example

The classic example from Stack Overflow looked like this:

    project/
        package/
            __init__.py
            foo.py
            tests/
                __init__.py
                test_foo.py

Currently, the *only* correct way to invoke test_foo is with "project"
as the current working directory and the command "python -m
package.tests.test_foo". Anything else (such as "python
package/tests/test_foo.py", "./package/tests/test_foo.py", clicking the
file in a file browser or, while in the tests directory, invoking
"python test_foo.py", "./test_foo.py" or "python -m test_foo") will
still *try* to run test_foo, but fail in a completely confusing
manner.

If test_foo uses absolute imports, then the error will generally be
"ImportError: No module named package"; if it uses explicit relative
imports, then the error will be "ValueError: Attempted relative import
in non-package". Neither of these is going to make any sense to a
novice Python developer, but there isn't any obvious way to make those
messages self-explanatory (they're completely accurate, they just
involve a lot of assumed knowledge regarding how the import system
works and sys.path gets initialised).

If foo.py is set up to invoke its own test suite:

    if __name__ == "__main__":
        import unittest
        from .tests import test_foo
        unittest.main(test_foo.__name__)

Then you can get similarly confusing errors when attempting to run foo itself.

However, those errors are obvious compared to the
AttributeErrors (and ImportErrors) that can arise if you get
unexpected name shadowing. For example, suppose you have a helper
module called "package.json" for dealing with JSON serialisation in
your library, and you start an interactive session while in the
package directory, or attempt to invoke 'foo.py' directly in order
to run its test suite (as described above). Now "import json" is
giving you the version from your package, even though that version is
*supposed* to be safely hidden away inside your package namespace. By
silently allowing a package directory onto sys.path, we're doing our
users a grave disservice.
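
The shadowing can be checked without importing anything, by asking the
path-based finder which file a name would resolve to for a given
sys.path[0]. This is only a sketch: import_origin is a hypothetical
helper, and it relies on importlib.machinery.PathFinder, so it assumes
an interpreter that ships that API.

```python
import sys
from importlib.machinery import PathFinder


def import_origin(name, first_entry):
    """Report where 'import name' would be loaded from if first_entry
    were sys.path[0].

    Hypothetical helper: it only consults path-based finders and
    ignores sys.modules, so it shows what a fresh interpreter would
    find on the path rather than what is already imported.
    """
    search_path = [first_entry] + list(sys.path)
    spec = PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin
```

With a "json.py" inside the package directory, passing the package
directory as first_entry should report that file for "json", while
passing the project directory should report the standard library
module instead.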

So my perspective is this: we're currently doing something by default
that's almost guaranteed to be the wrong thing to do. There's a
reasonably simple alternative that's almost always the *right* thing
to do. So let's switch the default behaviour to get the common case
right, and leave the confusing errors for the situations where
something is actually *broken* (i.e. misplaced __init__.py files and
scripts in package directories that are relying on implicit relative
imports).

And if that means requiring that package directories always be marked
explicitly (either by an __init__.py file or by a ".pyp" extension)
and forever abandoning the concepts in PEP 402, so be it.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia