Hello,

Brief problem statement: Let's say I have a custom file type (say, with extension .foo) and these .foo files are included in a package (along with other Python modules with standard extensions like .py and .so), and I want to make these .foo files importable like any other module.

On its face, importlib.machinery.FileFinder makes this easy. I make a loader for my custom file type (say, FooSourceLoader), and I can use the FileFinder.path_hook helper like:

    sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
    sys.path_importer_cache.clear()

Great--now I can import my .foo modules like any other Python module. However, any standard Python modules now cannot be imported. The way the PathFinder sys.meta_path hook works, sys.path_hooks entries are first-come, first-served, and furthermore FileFinder.path_hook is very promiscuous--it will take over module loading for *any* directory on sys.path, regardless of what the file extensions are in that directory. So although this mechanism is provided by the stdlib, it can't really be used for this purpose without breaking imports of normal modules (and maybe it's not intended for that purpose, but the documentation is unclear).

There are a number of different ways one could get around this. One might be to pass FileFinder.path_hook loader/extension pairs for all the basic file types known by the Python interpreter. Unfortunately there's no great way to get that information. *I* know that I want to support .py, .pyc, .so etc. files, and I know which loaders to use for them. But that's really information that should belong to the Python interpreter, and not something that should be reverse-engineered. In fact, there is such a mapping provided by importlib._bootstrap_external._get_supported_file_loaders(), but this is a private, undocumented function.

One could probably think of other workarounds. For example you could implement a custom sys.meta_path hook.
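To make the problem and the first workaround above concrete, here is a minimal sketch. It assumes the standard loader/suffix pairs can be rebuilt from public importlib.machinery names (an assumption about what the interpreter registers by default, not a documented guarantee), and it uses SourceFileLoader as a stand-in for the hypothetical FooSourceLoader. The names STANDARD_LOADERS, FOO_LOADER, visible and demo are made up for illustration. Nothing is installed on sys.path_hooks; the hooks are simply called on a temporary directory.

```python
import importlib.machinery as machinery
import os
import tempfile

# Assumption: these three pairs mirror the interpreter's default file loaders.
STANDARD_LOADERS = [
    (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
    (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
]
# Stand-in for (FooSourceLoader, ['.foo']): treat .foo files as Python source.
FOO_LOADER = (machinery.SourceFileLoader, ['.foo'])


def visible(hook, directory, names):
    """Report which module names the finder created by `hook` can locate."""
    finder = hook(directory)  # FileFinder hooks accept any directory
    return {name: finder.find_spec(name) is not None for name in names}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        for filename in ('plain.py', 'custom.foo'):
            with open(os.path.join(tmp, filename), 'w') as f:
                f.write('VALUE = 1\n')

        # A hook that only knows about .foo still claims the directory,
        # so plain.py becomes invisible to it.
        foo_only = machinery.FileFinder.path_hook(FOO_LOADER)
        # Passing the standard pairs as well keeps both kinds importable.
        combined = machinery.FileFinder.path_hook(*STANDARD_LOADERS, FOO_LOADER)

        names = ['plain', 'custom']
        return visible(foo_only, tmp, names), visible(combined, tmp, names)


if __name__ == '__main__':
    print(demo())
```

The first dictionary shows the .foo-only finder missing the ordinary module; the second shows the combined finder locating both. The drawback described above still applies: the standard pairs have to be spelled out by hand.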
But I think it shouldn't be necessary to go to higher levels of abstraction in order to do this--the default sys.path handler should be able to handle this use case.

In order to add support for new file types to sys.path_hooks, I ended up implementing the following hack:

#############################################################
import os
import sys

from importlib.abc import PathEntryFinder


@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.

    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory.  So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.

    This class provides its own sys.path_hooks hook as follows: it should be
    inserted on sys.path_hooks early so that it can supersede anything else.
    Its find_spec method then calls every other hook on sys.path_hooks and,
    for each hook that can handle the given sys.path entry, it calls the hook
    to create a finder, and calls that finder's find_spec.  So each
    sys.path_hooks entry is tried until a spec is found or all finders are
    exhausted.
    """

    def __init__(self, path):
        if not os.path.isdir(path):
            raise ImportError('only directories are supported', path=path)

        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None

        for hook in sys.path_hooks:
            if hook is self.__class__:
                continue

            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass

            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass

                try:
                    self._finder_cache[hook] = finder
                except TypeError:
                    # The hook is unhashable for some reason so we don't
                    # bother caching it
                    pass

            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if spec is not None:
                    return spec

        # Module spec not found through any of the finders
        return None

    def invalidate_caches(self):
        for finder in self._finder_cache.values():
            # Hooks that raised ImportError are cached as None; skip them
            if finder is not None:
                finder.invalidate_caches()

    @classmethod
    def install(cls):
        sys.path_hooks.insert(0, cls)
        sys.path_importer_cache.clear()
#############################################################

This works, for example, like:
    from importlib.machinery import FileFinder, SourceFileLoader

    MetaFileFinder.install()
    sys.path_hooks.append(FileFinder.path_hook((SourceFileLoader, ['.foo'])))
And now, .foo modules are importable, without breaking support for the built-in module types.

This is still overkill though. I feel like there should instead be a way to, say, extend a sys.path_hooks hook based on FileFinder so as to be able to support loading other file types, without having to go above the default sys.meta_path hooks.

A small but related problem I noticed in the way FileFinder.path_hook is implemented is that for almost *every directory* that gets cached in sys.path_importer_cache, a new FileFinder instance is created with its own self._loaders attribute, each containing a copy of the same list of (loader, extensions) tuples. I calculated that on one large project this alone accounted for nearly 1 MB. Not a big deal in the grand scheme of things, but still a bit overkill.

ISTM it would kill two birds with one stone if FileFinder were changed, or there were a subclass thereof, that had a class attribute containing the standard loader/extension mappings. This in turn could simply be appended to in order to support new extension types.

Thanks,
E
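For illustration only, the subclass idea in the closing paragraph above might look roughly like the sketch below. It leans on the private self._loaders attribute mentioned in the message, which is a CPython implementation detail and could change; the class name SharedLoadersFileFinder, the shared_loaders attribute and the register method are all hypothetical, and the standard pairs are again rebuilt by hand from public importlib.machinery names.

```python
import importlib.machinery as machinery


class SharedLoadersFileFinder(machinery.FileFinder):
    """FileFinder variant whose (suffix, loader) table lives on the class.

    Assumption: FileFinder.find_spec consults the private ``_loaders`` list
    of (suffix, loader) pairs. That is an implementation detail of CPython.
    """

    # One table shared by every instance instead of one copy per directory.
    shared_loaders = [
        (suffix, loader)
        for loader, suffixes in (
            (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
            (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
            (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
        )
        for suffix in suffixes
    ]

    def __init__(self, path):
        super().__init__(path)
        # Swap the freshly built per-instance list for the shared one.
        self._loaders = type(self).shared_loaders

    @classmethod
    def register(cls, loader, suffixes):
        """Append a new loader; existing finders see it immediately."""
        cls.shared_loaders.extend((suffix, loader) for suffix in suffixes)


if __name__ == '__main__':
    # path_hook() with no loader details yields a hook that builds
    # SharedLoadersFileFinder(path) for any directory.
    hook = SharedLoadersFileFinder.path_hook()
    SharedLoadersFileFinder.register(machinery.SourceFileLoader, ['.foo'])
    print(hook('.'))
```

Because the list is mutated in place, registering a new extension affects every finder already sitting in sys.path_importer_cache, which is exactly the "default semantics for everyone" behaviour that may or may not be wanted.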
Basically what you're after is a way to extend the default finder with a new file type. Historically you didn't want this because of the performance hit of the extra stat call to check for that new file extension (this has been greatly alleviated in Python 3 through the caching of directory contents). But I would still argue that you don't necessarily want this for e.g. the stdlib or any other random project which might just happen to have a file with the same file extension as the one you want to have special support for.

I also don't think we want a class attribute to contain the default loaders, since not everyone will want those default semantics in all cases either. Since we're diving into deep levels of customization I would eschew anything that makes assumptions about what you want.

I think the best we could consider is making importlib._bootstrap_external._get_supported_file_loaders() a public API. That way you can easily construct a finder with the default loaders plus your custom ones. After that you can then provide a custom sys.path_hooks entry that recognizes the directories which contain your custom file type.

If that seems reasonable then feel free to open an enhancement request at bugs.python.org to discuss the API and then we can discuss how to implement a PR for it.

On Wed, 7 Feb 2018 at 07:04 Erik Bray <erik.m.bray@gmail.com> wrote:
[snip]
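As a rough sketch of the approach suggested in the reply (a finder built from the default loaders plus a custom one, installed through a sys.path_hooks entry that only claims directories containing the custom file type), something like the following might work. It is an illustration under assumptions, not an agreed API: the default pairs are rebuilt by hand from public importlib.machinery names because the supported-loaders helper is private, SourceFileLoader stands in for a real .foo loader, and FOO_SUFFIX, LOADER_DETAILS, foo_path_hook and install are hypothetical names.

```python
import importlib.machinery as machinery
import os
import sys

FOO_SUFFIX = '.foo'  # hypothetical custom extension from the thread

# Assumption: default pairs rebuilt by hand, plus the custom one.
LOADER_DETAILS = [
    (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
    (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    (machinery.SourceFileLoader, [FOO_SUFFIX]),
]


def foo_path_hook(path):
    """Return a finder only for directories that contain a .foo file.

    Raising ImportError tells PathFinder to try the next hook, so every
    other directory keeps using the stock FileFinder hook.
    """
    if not isinstance(path, str):
        raise ImportError('only str paths are supported', path=path)
    try:
        entries = os.listdir(path)
    except OSError as exc:
        raise ImportError('cannot list directory', path=path) from exc
    if not any(name.endswith(FOO_SUFFIX) for name in entries):
        raise ImportError('no {} files found'.format(FOO_SUFFIX), path=path)
    return machinery.FileFinder(path, *LOADER_DETAILS)


def install():
    """Put the hook first and drop finders cached before it existed."""
    sys.path_hooks.insert(0, foo_path_hook)
    sys.path_importer_cache.clear()
```

One caveat with this design: the hook's decision is cached per sys.path entry in sys.path_importer_cache, so a directory that gains its first .foo file after being scanned will not be picked up until that cache entry is cleared.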