[Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")

Thu Nov 24 09:12:43 CET 2011

On Thu, Nov 24, 2011 at 3:32 PM, PJ Eby <pje at telecommunity.com> wrote:
> You're right; I didn't think of this because I haven't moved past Python 2.5 for production coding as yet.  ;-)

Yeah, there's absolutely no way we could have changed this in 2.x -
with implicit relative imports in packages still allowed, there's too
much code that such a change in semantics could have broken.

In Py3k though, most of that code is already going to break one way or
another: if they don't change it, attempting to import it will fail
(since implicit relative imports are gone), while if they *do* switch
to explicit relative imports to make importing as a module work, then
they're probably going to break direct invocation (since __name__ and
__package__ will be wrong unless you use '-m' from the correct working
directory).
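
The breakage described above can be reproduced with a small script. This
is only a sketch: it assumes a CPython 3 interpreter with default
settings, and the helper name run_both_ways and the file names foo.py
and bar.py are made up for illustration. It builds a throwaway "example"
package, then runs the same module directly and via '-m'.

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Return (direct_returncode, dash_m_returncode) for a dual-role module."""
    project = tempfile.mkdtemp()
    package = os.path.join(project, "example")
    os.mkdir(package)
    open(os.path.join(package, "__init__.py"), "w").close()
    open(os.path.join(package, "bar.py"), "w").close()
    with open(os.path.join(package, "foo.py"), "w") as f:
        f.write("from . import bar\n")  # explicit relative import

    # Direct invocation: __name__ is "__main__" and no parent package is known,
    # so the relative import cannot be resolved.
    direct = subprocess.run(
        [sys.executable, os.path.join(package, "foo.py")],
        stderr=subprocess.DEVNULL,
    )
    # '-m' from the project directory: the package context is set up correctly.
    dash_m = subprocess.run(
        [sys.executable, "-m", "example.foo"],
        cwd=project,
        stderr=subprocess.DEVNULL,
    )
    return direct.returncode, dash_m.returncode


if __name__ == "__main__":
    print(run_both_ways())
```

The first return code is non-zero (the relative import fails), while the
second is zero, which is the "correct working directory plus -m"
requirement mentioned above.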

The idea behind PEP 395 is to make converting to explicit relative
imports the right thing to do, *without* breaking dual-role modules
for either use case.

> However, if we're going on the basis of how many newbie errors can be solved
> by Just Working, PEP 402 will help more newbies than PEP 395, since you must
> first *have* a package in order for 395 to be meaningful.  ;-)

Nope, PEP 402 makes it worse, because it permanently entrenches the
current broken sys.path[0] initialisation with no apparent way out.
That first list in the current PEP of "these invocations currently
break for modules inside packages"? They all *stay* broken forever
under PEP 402, because the filesystem no longer fully specifies the
package structure - you need an *already* initialised sys.path to
figure out how to translate a given filesystem layout into the Python
namespace. With the package structure underspecified, there's no way
to reverse engineer what sys.path[0] *should* be and it becomes
necessary to put the burden back on the developer.

Consider this PEP 382 layout (adapted from the __init__.py based
example layout I use in PEP 395):

project/
    setup.py
    example.pyp/
        foo.py
        tests.pyp/
            test_foo.py

There's no ambiguity there: we have a top level project directory
containing an "example" package fragment and an "example.tests"
subpackage fragment. Given the full path to any of "setup.py",
"foo.py" and "test_foo.py", we can figure out that the correct thing
to place in sys.path[0] is the "project" directory.
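
A minimal sketch of that reverse engineering, assuming a directory
counts as a package when it has a ".pyp" suffix or contains an
"__init__.py" file. The names is_package_dir and locate_module are
hypothetical helpers, not anything defined by PEP 382 or PEP 395.

```python
import os


def is_package_dir(dirpath):
    """Assumed marker rule: a '.pyp' suffix or an __init__.py file."""
    return (dirpath.endswith(".pyp")
            or os.path.isfile(os.path.join(dirpath, "__init__.py")))


def locate_module(filename):
    """Return (sys_path_entry, qualified_module_name) for a source file."""
    dirpath, basename = os.path.split(os.path.abspath(filename))
    parts = [os.path.splitext(basename)[0]]
    # Walk upwards for as long as the directory is marked as a package.
    while is_package_dir(dirpath):
        parent, pkg = os.path.split(dirpath)
        if not pkg:  # reached the filesystem root
            break
        if pkg.endswith(".pyp"):
            pkg = pkg[:-len(".pyp")]
        parts.append(pkg)
        dirpath = parent
    # The first unmarked directory is the one that belongs on sys.path.
    return dirpath, ".".join(reversed(parts))
```

For the layout above, locate_module on the path to "test_foo.py" yields
the "project" directory together with the name
"example.tests.test_foo", and "setup.py" maps to the same directory with
the plain name "setup".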

Under PEP 402, it would look like this:

project/
    setup.py
    example/
        foo.py
        tests/
            test_foo.py

Depending on what you put on sys.path, that layout could be defining a
"project" package, an "example" package or a "tests" package. The
interpreter has no way of knowing, so it can't do anything sensible
with sys.path[0] when the only information it has is the filename for
"foo.py" or "test_foo.py". Your best bet would be the status quo: just
use the directory containing that file, breaking any explicit relative
imports in the process (since __name__ is correspondingly inaccurate).
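
To make the ambiguity concrete, here is a rough sketch that lists every
(sys.path entry, module name) pairing consistent with an unmarked
layout. The function candidate_names and its "root" bound are
assumptions for illustration only; nothing like this is proposed in
either PEP.

```python
import os


def candidate_names(filename, root):
    """List (sys_path_entry, module_name) pairs for an unmarked layout.

    Without package markers, every ancestor directory up to 'root' is an
    equally plausible sys.path entry, each giving a different module name.
    """
    root = os.path.abspath(root)
    dirpath, basename = os.path.split(os.path.abspath(filename))
    parts = [os.path.splitext(basename)[0]]
    candidates = []
    while True:
        candidates.append((dirpath, ".".join(reversed(parts))))
        if dirpath == root:
            break
        parent, pkg = os.path.split(dirpath)
        if not pkg:  # reached the filesystem root without meeting 'root'
            break
        parts.append(pkg)
        dirpath = parent
    return candidates
```

Called on ".../project/example/tests/test_foo.py" with root set to the
directory holding "project", this returns four candidates, from
"test_foo" (the status quo) through to
"project.example.tests.test_foo". Nothing in the filesystem says which
one is intended.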

People already have to understand that Python modules have to be
explicitly marked - while "foo" can be executed as a Python script, it
cannot be imported as a Python module. Instead, the name needs to be
"foo.py" so that the import process will recognise it as a source
module. Explaining that importable package directories are similarly
marked with either an "__init__.py" file or a ".pyp" extension is a
fairly painless task - people can accept it and move on, even if they
don't necessarily understand *why* it's useful to be explicit about
package layouts. (Drawing that parallel is even more apt these days,
given the ability to explicitly execute any directory containing a
__main__.py file regardless of the directory name or other contents.)

The mismatch between __main__ imports and imports from everywhere
else, though? That's hard to explain to *experienced* Python
programmers, let alone beginners.

My theory is that if we can get package layouts to stop breaking most
invocation methods for modules inside those packages, then beginners
should be significantly less confused about how imports work because
the question simply won't arise. Once the behaviour of imports from
__main__ is made consistent with imports from other modules, then the
time when people need to *care* about details like how sys.path[0]
gets initialised can be postponed until much later in their
development as a Python programmer.

>> (which are *supposed* to be dead in 3.x, but linger in
>> __main__ solely due to the way we initialise sys.path[0]). If a script
>> is going to be legitimately shipped inside a package directory, it
>> *must* be importable as part of that package namespace, and any script
>> in Py3k that relies on implicit relative imports fails to qualify.
>
> Wait a minute...  What would happen if there were no implicit relative
> imports allowed in __main__?
> Or are you just saying that you get the *appearance* of implicit relative
> importing, due to aliasing?

The latter - because the initialisation of sys.path[0] ignores package
structure information in the filesystem, it's easy to get the
interpreter to commit the cardinal aliasing sin of putting a package
directory on sys.path. In a lot of cases, that kinda sorta works in
2.x because of implicit relative imports, but it's always going to
cause problems in 3.x.
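
The aliasing problem itself is easy to demonstrate. The sketch below
assumes Python 3 and uses made-up names (aliasdemo_pkg, aliasdemo_mod,
demonstrate_aliasing). It puts both a project directory and the package
directory inside it on sys.path, then imports the same source file under
two different names.

```python
import importlib
import os
import sys
import tempfile


def demonstrate_aliasing():
    """Import one source file under two names and return both modules."""
    project = tempfile.mkdtemp()
    package = os.path.join(project, "aliasdemo_pkg")  # hypothetical name
    os.mkdir(package)
    open(os.path.join(package, "__init__.py"), "w").close()
    with open(os.path.join(package, "aliasdemo_mod.py"), "w") as f:
        f.write("registry = []\n")

    # 'project' is the correct entry; 'package' is the aliasing mistake.
    sys.path[:0] = [project, package]
    importlib.invalidate_caches()
    try:
        qualified = importlib.import_module("aliasdemo_pkg.aliasdemo_mod")
        aliased = importlib.import_module("aliasdemo_mod")
    finally:
        sys.path.remove(project)
        sys.path.remove(package)
    return qualified, aliased


if __name__ == "__main__":
    qualified, aliased = demonstrate_aliasing()
    print(qualified.__name__, aliased.__name__, qualified is aliased)
```

The two imports produce two distinct module objects, each with its own
copy of module-level state such as "registry", even though both were
loaded from the same file.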

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia