[Import-SIG] PEP 420 issue: extend_path

Tue May 8 05:10:41 CEST 2012

On Tue, May 8, 2012 at 12:24 PM, Eric V. Smith <eric at trueblade.com> wrote:
> I don't think you can do this, at least without losing the ability to
> create a namespace package where portions exist in different path_hook
> loaders. Currently (in the pep-420 branch) you can have a portion in the
> filesystem (FilePath loader), and a portion in a zip file (zipimport
> loader). See the SeparatedNestedZipNamespacePackages test in
> test_namespace_pkgs.py.
>
> I believe what you're suggesting requires the logic be moved from
> PathFinder (which is a meta path hook) into FileFinder (which is a path
> hook). That's why it would break the cross-finder use case.
>
> Note that the meta path hook PathFinder doesn't know anything about
> directories or filesystems. That's why it currently (in the pep-420
> branch) delegates everything to the path hook finders.

No, it just means that PackageLoader needs to be based on PathFinder
rather than FileFinder. That way the new logic can be fully isolated
from the higher level finder implementation.

> I think the better solution is to create a new finder method, called
> something like find_module_or_namespace_portion (but obviously with a
> better name). If this exists, then it would be called and allowed to
> return a loader, string, or None. If it doesn't exist, find_module would
> be called. It could not participate in namespace packages and could only
> return a loader or None.

I'd suggest the simpler hook "find_package" that has the new semantics
and is *only* called by a new PackageLoader class.

The algorithm would then be:

- the main PathFinder loop scans sys.path or the relevant
__path__ attribute until it finds a loader. Full stop, end of story.
- PackageLoader.load_module() handles scanning the *rest* of the path
in order to populate namespace packages, roughly as follows:

    package_paths = []
    for entry in path_to_scan:
        importer = _get_importer(entry) # Check path_importer_cache, etc
        try:
            find_loader = importer.find_package
        except AttributeError:
            find_loader = importer.find_module
        loader = find_loader(fullname)
        try:
            load_module = loader.load_module
        except AttributeError:
            pass
        else:
            return load_module(fullname)
        if loader is not None:
            package_paths.append(loader)
    return make_namespace_package(package_paths)
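To make the loop above concrete, here is a self-contained sketch that
can actually be run. It is only an illustration under several
assumptions: StubLoader, StubImporter, LegacyImporter and
scan_for_package are hypothetical names invented for the example, the
importers are passed in directly rather than looked up through
_get_importer, and make_namespace_package takes the module name as an
extra argument so it can build a module object.

```python
import types


class StubLoader:
    """Hypothetical loader for a self-contained package."""

    def __init__(self, origin):
        self.origin = origin

    def load_module(self, fullname):
        module = types.ModuleType(fullname)
        module.__path__ = [self.origin]
        module.__loader__ = self
        return module


class StubImporter:
    """Hypothetical path entry importer offering the proposed find_package()."""

    def __init__(self, results):
        # Maps fullname -> loader object or namespace portion string.
        self.results = results

    def find_package(self, fullname):
        return self.results.get(fullname)


class LegacyImporter:
    """Hypothetical importer that only implements find_module()."""

    def __init__(self, results):
        self.results = results

    def find_module(self, fullname):
        return self.results.get(fullname)


def make_namespace_package(fullname, package_paths):
    # Assumption: a namespace package is just a module whose __path__
    # lists every portion found during the scan.
    module = types.ModuleType(fullname)
    module.__path__ = list(package_paths)
    return module


def scan_for_package(fullname, importers):
    package_paths = []
    for importer in importers:
        # Prefer the new hook, fall back to the legacy one.
        find_loader = getattr(importer, "find_package", None)
        if find_loader is None:
            find_loader = importer.find_module
        found = find_loader(fullname)
        if found is None:
            continue
        load_module = getattr(found, "load_module", None)
        if load_module is not None:
            # A real loader wins over any portions collected so far.
            return load_module(fullname)
        # Anything else is treated as a namespace portion.
        package_paths.append(found)
    return make_namespace_package(fullname, package_paths)
```

With two importers that each return a portion string, the result is a
module whose __path__ holds both strings in scan order. As soon as one
importer returns an object with load_module(), that loader is used and
the collected portions are discarded, matching the early return in the
loop above.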

The find_package vs find_module distinction also lets us resolve the
potential for infinite recursion in FileFinder without needing an
additional subclass. For find_module, FileFinder would return the new
PackageLoader instances, while find_package would return either
strings (for namespace package portions) or the appropriate loader for
__init__.py (for self-contained packages).
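As a rough illustration of that split, a directory-based finder could
look something like the sketch below. Everything here is an assumption
made for the example: ToyFileFinder, InitLoader and the PackageLoader
constructor signature are hypothetical, plain (non-package) modules are
ignored entirely, and how PackageLoader would really learn which path to
scan is not settled.

```python
import os


class PackageLoader:
    """Hypothetical stand-in for the proposed PackageLoader."""

    def __init__(self, path_to_scan):
        # Assumption: the loader is handed the path entries it should scan.
        self.path_to_scan = list(path_to_scan)


class InitLoader:
    """Hypothetical loader for a self-contained package's __init__.py."""

    def __init__(self, init_file):
        self.init_file = init_file


class ToyFileFinder:
    """Toy finder for one directory entry; only handles packages."""

    def __init__(self, entry, rest_of_path=()):
        self.entry = entry
        self.rest_of_path = list(rest_of_path)

    def _package_dir(self, fullname):
        # Look for a directory named after the last component of fullname.
        candidate = os.path.join(self.entry, fullname.rpartition(".")[2])
        return candidate if os.path.isdir(candidate) else None

    def find_package(self, fullname):
        # New semantics: a loader, a portion string, or None.
        directory = self._package_dir(fullname)
        if directory is None:
            return None
        init_file = os.path.join(directory, "__init__.py")
        if os.path.isfile(init_file):
            return InitLoader(init_file)
        return directory

    def find_module(self, fullname):
        # Legacy semantics: always a loader or None, never a string.
        if self._package_dir(fullname) is None:
            return None
        return PackageLoader([self.entry] + self.rest_of_path)
```

The point of the sketch is that find_module() never hands back another
scan result itself. It returns a PackageLoader, and the scan that
loader performs only ever calls find_package(), so the two code paths
cannot call each other in a loop.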

> I think the use case of being able to have namespace package portions
> returned by different path hooks is important. Imagine a case where the
> namespace "encodings" takes off. Who's to say some portions don't ship
> as zip files, some as regular files, and maybe some with a hypothetical
> http loader?

Agreed, but I still want to get this out of the main import path, so
that it only happens if a namespace portion gets encountered during
the scan. For backwards compatibility with existing import
reimplementations, the expected top level semantics should remain
"when you find a loader, stop scanning and call the load_module()
method".
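For reference, the top level contract I mean is no more than the
following sketch. The function name find_and_load and the get_importer
parameter are hypothetical (the latter stands in for the usual
path_importer_cache lookup), and the error message is only a
placeholder.

```python
def find_and_load(fullname, path, get_importer):
    """Scan path entries in order and stop at the first loader found."""
    for entry in path:
        importer = get_importer(entry)
        if importer is None:
            continue
        loader = importer.find_module(fullname)
        if loader is not None:
            # First loader wins; later entries are never consulted here.
            return loader.load_module(fullname)
    raise ImportError("No module named " + repr(fullname))
```

Any further scanning for namespace portions would then happen inside
the load_module() call of whatever loader was returned, not in this
loop.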

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia