[Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs

Wed Nov 16 23:41:08 CET 2011

On Thu, Nov 17, 2011 at 1:08 AM, PJ Eby <pje at telecommunity.com> wrote:
> On Wed, Nov 16, 2011 at 1:29 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> So, without a clear answer to the question of "from module X, inside
>> package (or package portion) Y, find the nearest parent directory that
>> should be placed on sys.path" in a PEP 402 based world, I'm switching
>> to supporting PEP 382 as my preferred approach to namespace packages.
>> In this case, I think "explicit is better than implicit" means, "given
>> only a filesystem hierarchy, you should be able to figure out the
>> Python package hierarchy it contains". Only explicit markers (either
>> files or extensions) let you do that - with PEP 402, the filesystem
>> doesn't contain enough information to figure it out, you need to also
>> know the contents of sys.path.
>
> After spending an hour or so reading through PEP 395 and trying to grok what
> it's doing, I actually came to the opposite conclusion: that PEP 395 is
> violating the Zen of Python both by guessing and by not encouraging One Obvious Way of
> invoking scripts-as-modules.
> For example, if somebody adds an __init__.py to their project directory,
> suddenly scripts that worked before will behave differently under PEP 395,
> creating a strange bit of "spooky action at a distance".  (And yes, people
> add __init__.py files to their projects in odd places -- being setuptools
> maintainer, you get to see a LOT of weird looking project layouts.)
>
> While I think the __qname__ idea is fine, and it'd be good to have a way to
> avoid aliasing main (suggestion for how included below), I think that
> relative imports failing from inside a main module should offer an error
> message suggesting you use "-m" if you're running a script that's within a
> package, since that's the One Obvious Way of running a script that's also a
> module.  (Albeit not obvious unless you're Dutch.  ;-) )

The -m switch is not always an adequate replacement for direct
execution, because it relies on the current working directory being
set correctly (or else the module to be executed being accessible via
sys.path, and there being nothing in the current directory that will
shadow modules that you want to import). Direct execution will always
have the advantage of allowing you more explicit control over all of
sys.path[0], sys.argv[0] and __main__.__file__. The -m switch, on the
other hand, will always set sys.path[0] to the empty string, which may
not be what you really want.

If the package directory markers are explicit (as they are now and as
they are in PEP 382), then PEP 395 isn't guessing - the mapping from
the filesystem layout to the Python module namespace is completely
unambiguous, since the directory added as sys.path[0] will always be
the first parent directory that isn't marked as a package directory:

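    import os
    import sys
    import __main__

    # Current rule
    sys.path[0] = os.path.abspath(os.path.dirname(__main__.__file__))

    # PEP 395 rule (is_package_dir() checks for an explicit package marker:
    # an __init__.py file today, or a ".pyp" directory suffix under PEP 382)
    path0 = os.path.abspath(os.path.dirname(__main__.__file__))
    while is_package_dir(path0):
        path0 = os.path.dirname(path0)
    sys.path[0] = path0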
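For experimentation, the PEP 395 rule can be wrapped in a standalone function. This is a minimal sketch, not the PEP's reference implementation: pep395_path0() and is_package_dir() are hypothetical names, and "explicitly marked" is assumed to mean either an __init__.py file or a directory name ending in ".pyp" (the PEP 382 marker). Unlike the snippet above, it also stops at the filesystem root instead of looping forever.

```python
import os


def is_package_dir(path):
    """Hypothetical helper: a directory counts as a package directory if it is
    explicitly marked, by an __init__.py file (current rule) or by a '.pyp'
    name suffix (the marker proposed in PEP 382)."""
    return path.endswith(".pyp") or os.path.isfile(os.path.join(path, "__init__.py"))


def pep395_path0(main_file):
    """Return the directory the PEP 395 rule would use as sys.path[0]."""
    path0 = os.path.abspath(os.path.dirname(main_file))
    # Walk upwards to the first parent that is NOT marked as a package.
    while is_package_dir(path0):
        parent = os.path.dirname(path0)
        if parent == path0:  # filesystem root reached; stop rather than loop
            break
        path0 = parent
    return path0
```

For a script at proj/pkg/sub/script.py, where pkg and sub both contain __init__.py, the function returns proj rather than proj/pkg/sub, so "import pkg" works from inside the script.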

In fact, both today and under PEP 382, we could fairly easily provide
a "runpy.split_path_module()" function that converts an arbitrary
filesystem path to the corresponding python module name and sys.path
entry:

    import os

    def _splitmodname(fspath):
        path_entry, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
        return path_entry, modname

    # Given appropriate definitions for "is_module_or_package" and
    # "has_init_file"...
    def split_path_module(fspath):
        if not is_module_or_package(fspath):
            raise ValueError("{!r} is not recognized as a Python "
                             "module".format(fspath))
        path_entry, modname = _splitmodname(fspath)
        # Each parent directory marked as a package (".pyp" suffix or
        # __init__.py) adds one more level to the dotted module name.
        while path_entry.endswith(".pyp") or has_init_file(path_entry):
            path_entry, pkg_name = _splitmodname(path_entry)
            modname = pkg_name + '.' + modname
        return modname, path_entry
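The message leaves is_module_or_package() and has_init_file() undefined. The self-contained sketch below fills them in under simple, assumed rules: only ".py" source files count as modules, and a package directory is one that holds an __init__.py or whose name ends in the PEP 382 ".pyp" suffix. That lets split_path_module() be exercised on a real directory tree. runpy provides no such function; every name here is hypothetical.

```python
import os

SOURCE_SUFFIXES = (".py",)  # assumption: only plain source files are modules


def has_init_file(path):
    # Hypothetical helper: a directory is a package if it has an __init__.py.
    return os.path.isfile(os.path.join(path, "__init__.py"))


def is_module_or_package(fspath):
    # Hypothetical helper: a source file, or an explicitly marked directory.
    if os.path.isfile(fspath):
        return fspath.endswith(SOURCE_SUFFIXES)
    return os.path.isdir(fspath) and (fspath.endswith(".pyp") or has_init_file(fspath))


def _splitmodname(fspath):
    path_entry, fname = os.path.split(fspath)
    modname = os.path.splitext(fname)[0]  # also strips a ".pyp" suffix
    return path_entry, modname


def split_path_module(fspath):
    """Return (dotted module name, sys.path entry) for a filesystem path."""
    fspath = os.path.abspath(fspath)  # normalise trailing slashes and "..""
    if not is_module_or_package(fspath):
        raise ValueError("{!r} is not recognized as a Python module".format(fspath))
    path_entry, modname = _splitmodname(fspath)
    # Climb while the containing directory is explicitly marked as a package.
    while path_entry.endswith(".pyp") or has_init_file(path_entry):
        path_entry, pkg_name = _splitmodname(path_entry)
        modname = pkg_name + "." + modname
    return modname, path_entry
```

With pkg/__init__.py and pkg/sub/__init__.py in place, pkg/sub/mod.py maps to ("pkg.sub.mod", <directory containing pkg>), which is exactly the -m argument and sys.path entry needed to run it as a module.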

As far as the "one obvious way" criticism goes, I think the obvious
way (given PEP 395) is clear:

1. Do you have a filename? Just run it and Python will figure out
where it lives in the module namespace
2. Do you have a module name? Run it with the -m switch and Python
will figure out where it lives on the filesystem

runpy.run_path() corresponds directly to 1, runpy.run_module()
corresponds directly to 2.
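A small sketch of the two entry points, using the real runpy.run_path() and runpy.run_module() functions on a throwaway package (demo_pkg, helper.py and main.py are made-up names). Running the file by path leaves it with no parent package, so its explicit relative import fails; running it by module name works once the directory above the package is on sys.path.

```python
import os
import runpy
import sys
import tempfile


def demo_run_path_vs_run_module():
    """Run one file from inside a package both ways and report the outcome."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.makedirs(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        with open(os.path.join(pkg, "helper.py"), "w") as f:
            f.write("VALUE = 42\n")
        main_file = os.path.join(pkg, "main.py")
        with open(main_file, "w") as f:
            f.write("from .helper import VALUE\nRESULT = VALUE\n")

        # 1. By filename: the code runs with no package context,
        #    so the relative import has nothing to be relative to.
        try:
            runpy.run_path(main_file)
            by_path = "ok"
        except ImportError as exc:
            by_path = "ImportError: {}".format(exc)

        # 2. By module name: runpy imports demo_pkg first, so the relative
        #    import resolves. The directory above the package must be on sys.path.
        sys.path.insert(0, root)
        try:
            run_globals = runpy.run_module("demo_pkg.main")
            by_module = run_globals["RESULT"]
        finally:
            sys.path.remove(root)
            for name in list(sys.modules):
                if name == "demo_pkg" or name.startswith("demo_pkg."):
                    del sys.modules[name]
    return by_path, by_module
```

Calling demo_run_path_vs_run_module() returns an ImportError description for the by-path run and 42 for the by-name run.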

Currently, if you have a filename, just running it is sometimes the
*wrong thing to do*, because it may point inside a package directory.
But you have no easy way to tell if that is the case. Under PEP 402,
you simply *can't* tell, as the filesystem no longer contains enough
information to provide an unambiguous mapping to the Python module
namespace - instead, the intended mapping depends not only on the
filesystem contents, but also on the runtime configuration of
sys.path.
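To illustrate the ambiguity, the sketch below assumes a world with no package markers at all and lists every dotted name a file could be given, one per sys.path entry that contains it. candidate_module_names() is a hypothetical helper that deliberately ignores the finer details of PEP 402; it uses POSIX-style paths so the result does not depend on the host OS.

```python
from pathlib import PurePosixPath


def candidate_module_names(fspath, path_entries):
    """Return one possible dotted module name per sys.path entry containing
    fspath, assuming every directory may act as a package portion."""
    target = PurePosixPath(fspath)
    names = []
    for entry in path_entries:
        try:
            rel = target.relative_to(entry)
        except ValueError:
            continue  # this sys.path entry does not contain the file
        names.append(".".join(rel.with_suffix("").parts))
    return names
```

With sys.path set to ["/proj/src", "/proj"], the file /proj/src/pkg/mod.py is equally plausible as pkg.mod or src.pkg.mod; only the runtime sys.path decides which one is meant.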

> For the import aliasing case, AFAICT it's only about cases where __name__ ==
> '__main__', no?  Why not just save the file/importer used for __main__, and
> then have the import machinery check whether a module being imported is
> about to alias __main__?  For that, you don't need to know in *advance* what
> the qualified name of __main__ is - you just spot it the first time somebody
> re-imports it.

Oh, I like that idea - once __main__.__qname__ is available, you could
just have a metapath hook along the lines of the following:

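    import sys

    class MainImporter:
        def __init__(self):
            main = sys.modules.get("__main__", None)
            self.main_qname = getattr(main, "__qname__", None)

        def find_module(self, fullname, path=None):
            # Claim only the import that would otherwise alias __main__
            if fullname == self.main_qname:
                return self
            return None

        def load_module(self, fullname):
            # PEP 302 loaders must register the module under fullname;
            # reuse the running __main__ instead of executing it again
            main = sys.modules["__main__"]
            sys.modules[fullname] = main
            return main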
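MainImporter uses the PEP 302 find_module()/load_module() protocol that was current at the time; recent Python 3 releases no longer fall back to find_module() on meta path finders. A rough modern equivalent, sketched here under that assumption, uses find_spec() plus a loader whose create_module() hands back the existing module and whose exec_module() does nothing. MainAliasFinder and _ExistingModuleLoader are hypothetical names, and __qname__ is the attribute proposed in PEP 395, which CPython does not set by itself.

```python
import importlib.abc
import importlib.util
import sys


class _ExistingModuleLoader(importlib.abc.Loader):
    """Loader that returns an already-existing module object."""

    def __init__(self, module):
        self._module = module

    def create_module(self, spec):
        # Reuse the existing object instead of creating a fresh module.
        # Note: the import system will still set __spec__ on it.
        return self._module

    def exec_module(self, module):
        # The code has already run (it is __main__), so do nothing here.
        pass


class MainAliasFinder(importlib.abc.MetaPathFinder):
    """Map __main__'s qualified name back onto the __main__ module."""

    def __init__(self, main=None):
        if main is None:
            main = sys.modules.get("__main__")
        self.main = main
        self.main_qname = getattr(main, "__qname__", None)

    def find_spec(self, fullname, path=None, target=None):
        if self.main_qname is None or fullname != self.main_qname:
            return None  # leave every other import to the normal finders
        loader = _ExistingModuleLoader(self.main)
        return importlib.util.spec_from_loader(fullname, loader)
```

Installed with sys.meta_path.insert(0, MainAliasFinder()), an import of __main__.__qname__ then yields the running __main__ module rather than a second copy of the same file.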

> I think removing qname-guessing from PEP 395 (and replacing it with
> instructive/google-able error messages) would be an unqualified improvement,
> independent of what happens to PEPs 382 and 402.

Even if the "just do what I mean" part of the proposal in PEP 395 is
replaced by a "Did you mean?" error message, PEP 382 still offers the
superior user experience, since we could use runpy.split_path_module()
to state the *exact* argument to -m that should have been used. Of
course, that still wouldn't get sys.path[0] set correctly, so it isn't
a given that it would really help.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia