On Mon, 28 Nov 2016 at 14:49 Steve Dower <steve.dower@python.org> wrote:
On 28Nov2016 1433, Steve Dower wrote:
> On 28Nov2016 1419, Nathaniel Smith wrote:
>> I'd suggest that we additionally specify that if we find a
>> foo.missing.py, then the code is executed but -- unlike a regular
>> module load -- it's not automatically inserted into
>> sys.modules["foo"]. That seems like it could only create confusion.
>> And it doesn't restrict functionality, because if someone really wants
>> to implement some clever shenanigans, they can always modify
>> sys.modules["foo"] by hand.
>
> In before Brett says "you can do this with an import hook", because,
> well, we can do this with an import hook :)

And since I suggested it, here's a rough proof-of-concept:

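import importlib.abc
import os
import sys

class MissingPathFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        # For a submodule, only the last component names the file on disk.
        name = fullname.rpartition(".")[2]
        for p in (path or sys.path):
            file = os.path.join(p, name + ".missing")
            if os.path.isfile(file):
                # Use the file's contents as the error message.
                with open(file, 'r', encoding='utf-8') as f:
                    raise ModuleNotFoundError(f.read(), name=fullname)
        # No marker file found: let the import fail as usual.
        return None

# Appended last, so it only runs after every other finder has failed.
sys.meta_path.append(MissingPathFinder())
import foo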


Add a "foo.missing" file to your working directory and you'll get the
message from that instead of the usual one.

Since this PEP directly affects import I'm going to weigh in.

First, this won't necessarily create more stat calls; that depends on how it's implemented. importlib.machinery.FileFinder, which searches for files on sys.path, caches directory contents for as long as the granularity of the file system's mtime (e.g. a 1 second mtime granularity means directory contents are cached for 1 second). This means that if the check occurs within that window (whether immediately after looking for *.py or in a second pass) then there's no file system overhead.
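
A minimal sketch of the directory-listing cache mentioned above, for illustration only. It builds a FileFinder over a temporary directory, looks up a module that does not exist yet, creates it, and then uses the documented invalidate_caches() method to force a fresh listing. The module name is hypothetical, and whether the second lookup would succeed without the explicit invalidation depends on the file system's mtime granularity, so the sketch does not rely on it.

```python
import importlib.machinery
import os
import tempfile

# Hypothetical module name, used only for this illustration.
MODULE_NAME = "missing_demo_mod"

directory = tempfile.mkdtemp()
loader_details = (importlib.machinery.SourceFileLoader,
                  importlib.machinery.SOURCE_SUFFIXES)
finder = importlib.machinery.FileFinder(directory, loader_details)

# The first lookup lists the (empty) directory and caches the result.
before = finder.find_spec(MODULE_NAME)

# Create the module after the listing has been cached.
source_path = os.path.join(directory, MODULE_NAME + ".py")
with open(source_path, "w", encoding="utf-8") as f:
    f.write("VALUE = 1\n")

# Drop the cached listing so the next lookup re-reads the directory.
finder.invalidate_caches()
after = finder.find_spec(MODULE_NAME)
```

Any extra suffix checked against the same cached listing, such as *.missing.py, would reuse that listing rather than trigger another directory read.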

Second, as proposed the PEP probably shouldn't change importlib.machinery.SourceFileLoader and should instead return a new loader that only handles these *.missing.py files (just like a different loader is returned for extension modules). This keeps the loader simpler and avoids a situation where existing custom loaders no longer implement current semantics (although I have tried to structure importlib so that subclassing is an attractive option for people, so this isn't vital, just something to at least consider).
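
One way the separate-loader idea could look, as a rough sketch rather than anything the PEP specifies. MissingLoader and find_missing_spec are hypothetical names. The sketch also assumes the file's contents are used as the error message, whereas the proposal under discussion may execute the file instead.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import os
import tempfile


class MissingLoader(importlib.abc.Loader):
    """Hypothetical loader that handles only *.missing.py files."""

    def __init__(self, fullname, path):
        # FileFinder instantiates loaders as loader(fullname, path).
        self.name = fullname
        self.path = path

    def create_module(self, spec):
        # Returning None asks the import system for a default module object.
        return None

    def exec_module(self, module):
        # Assumption: the file body is the message shown to the user.
        with open(self.path, encoding="utf-8") as f:
            message = f.read().strip()
        raise ModuleNotFoundError(message, name=self.name)


def find_missing_spec(directory, fullname):
    """Return a spec for <fullname>.missing.py in directory, or None."""
    finder = importlib.machinery.FileFinder(
        directory, (MissingLoader, [".missing.py"]))
    return finder.find_spec(fullname)
```

Because the suffix is registered with its own loader, SourceFileLoader and any subclasses of it are left untouched.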

Third, Steve channeled me properly: this doesn't actually require changes to any pre-existing code and can instead be implemented as an importlib.abc.MetaPathFinder placed at the end of sys.meta_path. That means there wouldn't be any local shadowing of modules available farther down sys.path (although this would lead to more stat calls).

Fourth, if you make a meta-path finder and use static data you do away with any performance issue with the file system. And since it would be installed at the end of sys.meta_path, and thus after importlib.machinery.PathFinder, a successful import effectively shadows it, so there's no need to worry about the information leaking out unless someone mucks with sys.meta_path.
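
A small sketch of the static-data variant, under the assumption that the messages live in a plain dict. MISSING_MESSAGES, StaticMissingFinder, and the module name in the table are all hypothetical. No file system access happens in the finder itself.

```python
import importlib.abc
import sys

# Hypothetical static table: module name -> message shown when it is missing.
MISSING_MESSAGES = {
    "demo_optional_mod": "demo_optional_mod is not shipped with this distribution",
}


class StaticMissingFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        message = MISSING_MESSAGES.get(fullname)
        if message is not None:
            raise ModuleNotFoundError(message, name=fullname)
        # Unknown name: fall through to the normal "No module named" error.
        return None


# Appended last, so it is consulted only after every other finder,
# including PathFinder, has failed to locate the module.
sys.meta_path.append(StaticMissingFinder())
```

If a module named in the table is actually installed, an earlier finder returns its spec and this finder is never reached.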

Fifth, people have asked for some way to catch/log/manipulate the response of import when a module isn't found. Renaming modules in 2/3 migrations was the first major instance of this, but I've heard of others wanting to log failures to detect what modules they should install for users in a cloud environment. This PEP is a specific solution to a general problem that some have asked about, so it might be worth thinking about whether a more general solution could work (but never enough to warrant me trying to solve it above other issues).
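
For the logging use case, something along these lines is already possible today with a finder at the end of sys.meta_path. This is only a sketch of that workaround, not the more general solution alluded to above, and MissingImportRecorder is a hypothetical name.

```python
import importlib.abc
import sys


class MissingImportRecorder(importlib.abc.MetaPathFinder):
    """Record the names of modules that no other finder could locate."""

    def __init__(self):
        self.missing = []

    def find_spec(self, fullname, path, target=None):
        # Reaching this finder means every earlier finder returned None.
        self.missing.append(fullname)
        return None  # let the usual ModuleNotFoundError be raised


recorder = MissingImportRecorder()
sys.meta_path.append(recorder)
```

It only works while it stays last in sys.meta_path, and it also records optional imports that the calling code guards with try/except ImportError.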

Sixth, this would be easier to deal with if import got refactored into its own object and out of the sys module for easier manipulation of the import process. ;)

Seventh, these *.missing.py files, if they are directly executed, are totally going to be abused like *.pth files; I can just feel it in my bones. We need to be okay with this if we accept this PEP as-is.