[Python-ideas] importlib: making FileFinder easier to extend

Brett Cannon brett at python.org
Tue Feb 20 15:14:33 EST 2018


Basically what you're after is a way to extend the default finder with a
new file type. Historically you didn't want this because of the performance
hit of the extra stat call to check that new file extension (this has been
greatly alleviated in Python 3 through the caching of directory contents).
But I would still argue that you don't necessarily want this for e.g. the
stdlib or any other random project which might just happen to have a file
with the same file extension as the one you want to have special support
for.

I also don't think we want a class attribute to contain the default
loaders since not everyone will want those default semantics in all cases
either. Since we're diving into deep levels of customization I would eschew
anything that makes assumptions about what you want.

I think the best we could consider is making
importlib.machinery._get_supported_file_loaders() a public API. That way you
can easily construct a finder with the default loaders plus your custom ones.
You can then provide a custom sys.path_hooks entry that recognizes the
directories which contain your custom file type.
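
A minimal sketch of that approach, under stated assumptions: the helper is
still private, and in CPython it is reachable as
importlib._bootstrap_external._get_supported_file_loaders(), so its location
and return value may change between versions. FooSourceLoader, FOO_SUFFIXES,
foo_path_hook and install are hypothetical names used only for illustration,
and the loader simply treats .foo files as Python source.

```python
import os
import sys
# Private module: not a public API, so this import is an assumption of the sketch.
from importlib import _bootstrap_external
from importlib.machinery import FileFinder, SourceFileLoader

FOO_SUFFIXES = ['.foo']  # hypothetical custom extension


class FooSourceLoader(SourceFileLoader):
    """Hypothetical loader; here .foo files are compiled as Python source."""


def foo_path_hook(path):
    """sys.path_hooks entry that only claims directories containing .foo files."""
    if not os.path.isdir(path):
        raise ImportError('only directories are supported', path=path)
    try:
        names = os.listdir(path)
    except OSError:
        raise ImportError('cannot list directory', path=path)
    if not any(name.endswith(tuple(FOO_SUFFIXES)) for name in names):
        # Decline, so PathFinder falls through to the next hook.
        raise ImportError('no .foo files in directory', path=path)
    # Default (loader, suffixes) pairs plus the custom one.
    loader_details = list(_bootstrap_external._get_supported_file_loaders())
    loader_details.append((FooSourceLoader, FOO_SUFFIXES))
    return FileFinder(path, *loader_details)


def install():
    """Put the hook first and drop finders cached before it was installed."""
    sys.path_hooks.insert(0, foo_path_hook)
    sys.path_importer_cache.clear()
```

Because the hook raises ImportError for every directory without a .foo file,
the stdlib and unrelated projects keep using the default finder. The check
runs once per sys.path entry, so a directory that gains its first .foo file
later is only picked up after sys.path_importer_cache is cleared again.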

If that seems reasonable then feel free to open an enhancement request at
bugs.python.org to discuss the API and then we can discuss how to implement
a PR for it.

On Wed, 7 Feb 2018 at 07:04 Erik Bray <erik.m.bray at gmail.com> wrote:

> Hello,
>
> Brief problem statement: Let's say I have a custom file type (say,
> with extension .foo) and these .foo files are included in a package
> (along with other Python modules with standard extensions like .py and
> .so), and I want to make these .foo files importable like any other
> module.
>
> On its face, importlib.machinery.FileFinder makes this easy.  I make a
> loader for my custom file type (say, FooSourceLoader), and I can use
> the FileFinder.path_hook helper like:
>
> sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
> sys.path_importer_cache.clear()
>
> Great--now I can import my .foo modules like any other Python module.
> However, standard Python modules can no longer be imported.  The way
> the PathFinder sys.meta_path hook works, sys.path_hooks entries are
> first-come-first-served, and furthermore FileFinder.path_hook is very
> promiscuous--it will take over module loading for *any* directory on
> sys.path, regardless of what the file extensions are in that directory.
> So although this mechanism is provided by the stdlib, it can't really
> be used for this purpose without breaking imports of normal modules
> (and maybe it's not intended for that purpose, but the documentation
> is unclear).
>
> There are a number of different ways one could get around this.  One
> might be to pass FileFinder.path_hook loaders/extension pairs for all
> the basic file types known by the Python interpreter.  Unfortunately
> there's no great way to get that information.  *I* know that I want to
> support .py, .pyc, .so etc. files, and I know which loaders to use for
> them.  But that's really information that should belong to the Python
> interpreter, and not something that should be reverse-engineered.  In
> fact, there is such a mapping provided by
> importlib.machinery._get_supported_file_loaders(), but this is not a
> publicly documented function.
>
> One could probably think of other workarounds.  For example you could
> implement a custom sys.meta_path hook.  But I think it shouldn't be
> necessary to go to higher levels of abstraction in order to do
> this--the default sys.path handler should be able to handle this use
> case.
>
> To add support for new file types to sys.path_hooks, I ended up
> implementing the following hack:
>
> #############################################################
> import os
> import sys
>
> from importlib.abc import PathEntryFinder
>
>
> @PathEntryFinder.register
> class MetaFileFinder:
>     """
>     A 'middleware', if you will, between the PathFinder sys.meta_path hook,
>     and sys.path_hooks hooks--particularly FileFinder.
>
>     The hook returned by FileFinder.path_hook is rather 'promiscuous' in
>     that it will handle *any* directory.  So if one wants to insert another
>     FileFinder.path_hook into sys.path_hooks, that will totally take over
>     importing for any directory, and previous path hooks will be ignored.
>
>     This class provides its own sys.path_hooks hook, which should be
>     inserted early on sys.path_hooks so that it can supersede anything
>     else.  Its find_spec method then calls each hook on sys.path_hooks
>     after itself and, for each hook that can handle the given sys.path
>     entry, it calls the hook to create a finder, and calls that finder's
>     find_spec.  So each sys.path_hooks entry is tried until a spec is
>     found or all finders are exhausted.
>     """
>
>     def __init__(self, path):
>         if not os.path.isdir(path):
>             raise ImportError('only directories are supported', path=path)
>
>         self.path = path
>         self._finder_cache = {}
>
>     def __repr__(self):
>         return '{}({!r})'.format(self.__class__.__name__, self.path)
>
>     def find_spec(self, fullname, target=None):
>         if not sys.path_hooks:
>             return None
>
>         for hook in sys.path_hooks:
>             if hook is self.__class__:
>                 continue
>
>             finder = None
>             try:
>                 if hook in self._finder_cache:
>                     finder = self._finder_cache[hook]
>                     if finder is None:
>                         # We've tried this finder before and got an
>                         # ImportError
>                         continue
>             except TypeError:
>                 # The hook is unhashable
>                 pass
>
>             if finder is None:
>                 try:
>                     finder = hook(self.path)
>                 except ImportError:
>                     pass
>
>             try:
>                 self._finder_cache[hook] = finder
>             except TypeError:
>                 # The hook is unhashable for some reason so we don't bother
>                 # caching it
>                 pass
>
>             if finder is not None:
>                 spec = finder.find_spec(fullname, target)
>                 if spec is not None:
>                     return spec
>
>         # Module spec not found through any of the finders
>         return None
>
>     def invalidate_caches(self):
>         for finder in self._finder_cache.values():
>             # Hooks that raised ImportError are cached as None
>             if finder is not None:
>                 finder.invalidate_caches()
>
>     @classmethod
>     def install(cls):
>         sys.path_hooks.insert(0, cls)
>         sys.path_importer_cache.clear()
>
> #############################################################
>
> This works, for example, like:
>
> >>> MetaFileFinder.install()
> >>> sys.path_hooks.append(FileFinder.path_hook((SourceFileLoader, ['.foo'])))
>
> And now, .foo modules are importable, without breaking support for the
> built-in module types.
>
> This is still overkill though.  I feel like there should instead be a
> way to, say, extend a sys.path_hooks hook based on FileFinder so as to
> be able to support loading other file types, without having to go
> above the default sys.meta_path hooks.
>
> A small, but related problem I noticed in the way FileFinder.path_hook
> is implemented, is that for almost *every directory* that gets cached
> in sys.path_importer_cache, a new FileFinder instance is created with
> its own self._loaders attribute, each containing a copy of the same
> list of (loader, extensions) tuples.  I calculated that on one large
> project this alone accounted for nearly 1 MB.  Not a big deal in the
> grand scheme of things, but still a bit overkill.
>
> ISTM it would kill two birds with one stone if FileFinder were
> changed, or there were a subclass thereof, that had a class attribute
> containing the standard loader/extension mappings.  This in turn could
> simply be appended to in order to support new extension types.
>
> Thanks,
> E