[Python-Dev] Planning on removing cache invalidation for file finders

Sat Mar 2 17:16:28 CET 2013

On Sat, Mar 2, 2013 at 10:36 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Sat, Mar 2, 2013 at 12:31 PM, Brett Cannon <brett at python.org> wrote:
> > As of right now, importlib keeps a cache of what is in a directory for its
> > file finder instances. It uses mtime on the directory to try and detect when
> > it has changed to know when to refresh the cache. But thanks to mtime
> > granularities of up to a second, it is only a heuristic that isn't totally
> > reliable, especially across filesystems on different OSs.
> >
> > This is why importlib.invalidate_caches() came into being. If you look in
> > our test suite you will see it peppered around where a module is created on
> > the fly to make sure that mtime granularity isn't a problem. But it somewhat
> > negates the point of the mtime heuristic when you have to make this function
> > call regardless to avoid potential race conditions.
> >
> > http://bugs.python.org/issue17330 originally suggested trying to add another
> > heuristic to determine when to invalidate the cache. But even with the
> > suggestion it's still iffy and in no way foolproof.
> >
> > So the current idea is to just drop the invalidation heuristic and rely
> > entirely on calls to importlib.invalidate_caches() as necessary.
> > This makes code more filesystem-agnostic and protects people from
> > hard-to-detect errors when importlib only occasionally doesn't detect new
> > modules (I know it drove me nuts for a while when the buildbots kept failing
> > sporadically and only on certain OSs).
> >
> > I would have just made the change but Antoine wanted it brought up here
> > first to make sure that no one was heavily relying on the current setup. So
> > if you have a good, legitimate reason to keep the reliance on mtime for
> > cache invalidation please speak up. But since the common case will never
> > care about any of this (how many people generate modules on the fly to begin
> > with?) and to be totally portable you need to call
> > importlib.invalidate_caches() anyway, it's going to take a lot to convince
> > me to keep it.
>
> I think you should keep it. A long running service that periodically
> scans the importers for plugins doesn't care if modules take a few
> extra seconds to show up, it just wants to see them eventually.
> Installers (or filesystem copy or move operations!) have no way to
> inform arbitrary processes that new files have been added.
>

But if they are doing the scan they can also easily invalidate the caches
before performing the scan.
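For illustration, a minimal sketch of such a scanner, assuming plugins are top-level modules in a directory that is already on sys.path. `load_plugins` is a hypothetical helper, not an importlib API; the point is only that a single importlib.invalidate_caches() call at the start of each scan makes newly added files importable regardless of directory mtime granularity.

```python
import importlib
import pkgutil


def load_plugins(plugin_dir):
    """Import every top-level module currently found in plugin_dir.

    Hypothetical helper. Assumes plugin_dir is already on sys.path.
    The import caches are invalidated before each scan, so files added
    by another process since the previous scan can be imported even if
    the directory mtime has not visibly changed.
    """
    importlib.invalidate_caches()
    plugins = {}
    # iter_modules lists the directory; import_module then goes through
    # the (now refreshed) finder cache for that directory.
    for info in pkgutil.iter_modules([plugin_dir]):
        plugins[info.name] = importlib.import_module(info.name)
    return plugins
```

Because the scanning process invalidates unconditionally, it never needs to know which process wrote the files.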

>
> It's that case where the process that added the modules is separate
> from the process scanning for them, and the communication is one way,
> where the heuristic is important. Explicit invalidation only works
> when they're the *same* process, or when they're closely coupled so
> the adding process can tell the scanning process to invalidate the
> caches (our test suite is mostly the former although there are a
> couple of cases of the latter).
>

That's only true if the scanning process has no idea that another process
is adding modules. If there is such an expectation, then it doesn't matter
who added the file; you just assume cache invalidation is necessary.

>
> I have no problem with documenting invalidate_caches() as explicitly
> required for correctness when writing new modules which are to be read
> back by the same process, or when there is a feedback path between two
> processes that may be confusing if the cache invalidation is delayed.
>

Already documented as such.
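A minimal sketch of that documented same-process case: write the module, invalidate, then import. `write_and_import` and the module name `generated_example` are hypothetical, chosen only for illustration, and the target directory is assumed to be on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_and_import(directory, name, source):
    """Write source to <directory>/<name>.py and import it in this process.

    Hypothetical helper. If a finder for `directory` has already cached
    its listing (for example after an earlier failed import), the new
    file may not be seen until the directory mtime visibly changes, so
    the caches are invalidated explicitly between the write and the import.
    """
    path = Path(directory) / f"{name}.py"
    path.write_text(source)
    importlib.invalidate_caches()
    return importlib.import_module(name)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    sys.path.insert(0, tmp)  # assumption: the target directory is on sys.path
    mod = write_and_import(tmp, "generated_example", "ANSWER = 42\n")
    print(mod.ANSWER)  # prints 42
```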

> The implicit invalidation is only needed to pick up modules written by
> *another* process.
>
> In addition, it may be appropriate for importlib to offer a
> "write_module" method that accepts (module name, target path,
> contents). This would:
>
> 1. Allow in-process caches to be invalidated implicitly and
> selectively when new modules are created
>

I don't think that's necessary. If people don't want to blindly clear all
caches after writing a file, they can write the file, search the keys in
sys.path_importer_cache for the longest prefix of the newly created file's
path, and then call the invalidate_caches() method on that specific finder.
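A rough sketch of that targeted approach, assuming the new file lives under a path entry that already has a cached finder. `invalidate_finder_for` is a hypothetical helper; it relies only on sys.path_importer_cache and the invalidate_caches() method that finders such as importlib.machinery.FileFinder provide.

```python
import os
import sys


def invalidate_finder_for(file_path):
    """Invalidate only the cached finder whose path entry best covers file_path.

    Hypothetical helper. Searches the keys of sys.path_importer_cache for
    the longest directory prefix of file_path and calls invalidate_caches()
    on that one finder, leaving every other finder's cache intact.
    Returns the matching key, or None if no cached entry contains file_path.
    """
    target = os.path.abspath(file_path)
    best_key, best_len = None, -1
    for key, finder in sys.path_importer_cache.items():
        # Keys are normally path strings; a value of None means no finder
        # could handle that path entry, so there is nothing to invalidate.
        if not isinstance(key, str) or finder is None:
            continue
        entry = os.path.abspath(key)  # '' (current directory) becomes os.getcwd()
        try:
            covers = os.path.commonpath([entry, target]) == entry
        except ValueError:  # e.g. paths on different drives on Windows
            continue
        if covers and len(entry) > best_len:
            best_key, best_len = key, len(entry)
    if best_key is not None:
        finder = sys.path_importer_cache[best_key]
        if hasattr(finder, "invalidate_caches"):
            finder.invalidate_caches()
    return best_key
```

Picking the longest prefix means a finder for a package subdirectory wins over the finder for the sys.path entry that contains it.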

> 2. Allow importers to abstract write access in addition to read access
>

That's heading down the virtual filesystem path which I don't want to go
down any farther than I have to. The API is big enough as it is and the
more entangled it gets the harder it is to change/fix, especially with the
finders having a nice, small API compared to the loaders.

> 3. Allow the import system to complain at time of writing if the
> desired module name and target path don't actually match given the
> current import system state.
>

I think that's more checking than necessary for a use case that isn't that
common.