[Python-ideas] importlib: making FileFinder easier to extend

Erik Bray erik.m.bray at gmail.com
Wed Feb 7 10:03:38 EST 2018


Hello,

Brief problem statement: Let's say I have a custom file type (say,
with extension .foo) and these .foo files are included in a package
(along with other Python modules with standard extensions like .py and
.so), and I want to make these .foo files importable like any other
module.

On its face, importlib.machinery.FileFinder makes this easy.  I make a
loader for my custom file type (say, FooSourceLoader), and I can use
the FileFinder.path_hook helper like:

sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
sys.path_importer_cache.clear()

Great--now I can import my .foo modules like any other Python module.
However, any standard Python modules now cannot be imported.  The way
PathFinder sys.meta_path hook works, sys.path_hooks entries are
first-come-first-serve, and furthermore FileFinder.path_hook is very
promiscuous--it will take over module loading for *any* directory on
sys.path, regardless what the file extensions are in that directory.
So although this mechanism is provided by the stdlib, it can't really
be used for this purpose without breaking imports of normal modules
(and maybe it's not intended for that purpose, but the documentation
is unclear).

There are a number of different ways one could get around this.  One
might be to pass FileFinder.path_hook loaders/extension pairs for all
the basic file types known by the Python interpreter.  Unfortunately
there's no great way to get that information.  *I* know that I want to
support .py, .pyc, .so etc. files, and I know which loaders to use for
them.  But that's really information that should belong to the Python
interpreter, and not something that should be reverse-engineered.  In
fact, there is such a mapping provided by
importlib.machinery._get_supported_file_loaders(), but this is not a
publicly documented function.

One could probably think of other workarounds.  For example you could
implement a custom sys.meta_path hook.  But I think it shouldn't be
necessary to go to higher levels of abstraction in order to do
this--the default sys.path handler should be able to handle this use
case.

In order to support adding support for new file types to
sys.path_hooks, I ended up implementing the following hack:

#############################################################
import os
import sys

from importlib.abc import PathEntryFinder


@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.

    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory.  So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.

    This class provides its own sys.path_hooks hook as follows: If inserted
    on sys.path_hooks (it should be inserted early so that it can supersede
    anything else).  Its find_spec method then calls each hook on
    sys.path_hooks after itself and, for each hook that can handle the given
    sys.path entry, it calls the hook to create a finder, and calls that
    finder's find_spec.  So each sys.path_hooks entry is tried until a spec is
    found or all finders are exhausted.
    """

    def __init__(self, path):
        if not os.path.isdir(path):
            raise ImportError('only directories are supported', path=path)

        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None

        for hook in sys.path_hooks:
            if hook is self.__class__:
                continue

            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass

            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass

            try:
                self._finder_cache[hook] = finder
            except TypeError:
                # The hook is unhashable for some reason so we don't bother
                # caching it
                pass

            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if spec is not None:
                    return spec

        # Module spec not found through any of the finders
        return None

    def invalidate_caches(self):
        for finder in self._finder_cache.values():
            finder.invalidate_caches()

    @classmethod
    def install(cls):
        sys.path_hooks.insert(0, cls)
        sys.path_importer_cache.clear()

#############################################################

This works, for example, like:

>>> MetaFileFinder.install()
>>> sys.path_hooks.append(FileFinder.path_hook((SourceFileLoader, ['.foo'])))

And now, .foo modules are importable, without breaking support for the
built-in module types.

This is still overkill though.  I feel like there should instead be a
way to, say, extend a sys.path_hooks hook based on FileFinder so as to
be able to support loading other file types, without having to go
above the default sys.meta_path hooks.

A small, but related problem I noticed in the way FileFinder.path_hook
is implemented, is that for almost *every directory* that gets cached
in sys.path_importer_cache, a new FileFinder instance is created with
its own self._loaders attribute, each containing a copy of the same
list of (loader, extensions) tuples.  I calculated that on one large
project this alone accounted for nearly 1 MB.  Not a big deal in the
grand scheme of things, but still a bit overkill.

ISTM it would kill two birds with one stone if FileFinder were
changed, or there were a subclass thereof, that had a class attribute
containing the standard loader/extension mappings.  This in turn could
simply be appended to in order to support new extension types.

Thanks,
E


More information about the Python-ideas mailing list