[Python-Dev] Planning on removing cache invalidation for file finders

Sat Mar 2 16:36:04 CET 2013

On Sat, Mar 2, 2013 at 12:31 PM, Brett Cannon <brett at python.org> wrote:
> As of right now, importlib keeps a cache of what is in a directory for its
> file finder instances. It uses mtime on the directory to try and detect when
> it has changed to know when to refresh the cache. But thanks to mtime
> granularities of up to a second, it is only a heuristic that isn't totally
> reliable, especially across filesystems on different OSs.
>
> This is why importlib.invalidate_caches() came into being. If you look in
> our test suite you will see it peppered around where a module is created on
> the fly to make sure that mtime granularity isn't a problem. But it somewhat
> negates the point of the mtime heuristic when you have to make this function
> call regardless to avoid potential race conditions.
>
> http://bugs.python.org/issue17330 originally suggested trying to add another
> heuristic to determine when to invalidate the cache. But even with the
> suggestion it's still iffy and in no way foolproof.
>
> So the current idea is to just drop the invalidation heuristic and go
> full-blown reliance on calls to importlib.invalidate_caches() as necessary.
> This makes code more filesystem-agnostic and protects people from
> hard-to-detect errors when importlib only occasionally doesn't detect new
> modules (I know it drove me nuts for a while when the buildbots kept failing
> sporadically and only on certain OSs).
>
> I would have just made the change but Antoine wanted it brought up here
> first to make sure that no one was heavily relying on the current setup. So
> if you have a good, legitimate reason to keep the reliance on mtime for
> cache invalidation please speak up. But since the common case will never
> care about any of this (how many people generate modules on the fly to begin
> with?) and to be totally portable you need to call
> importlib.invalidate_caches() anyway, it's going to take a lot to convince
> me to keep it.

I think you should keep it. A long-running service that periodically
scans the importers for plugins doesn't care if modules take a few
extra seconds to show up; it just wants to see them eventually.
Installers (or filesystem copy or move operations!) have no way to
inform arbitrary processes that new files have been added.

It's that case where the process that added the modules is separate
from the process scanning for them, and the communication is one way,
where the heuristic is important. Explicit invalidation only works
when they're the *same* process, or when they're closely coupled so
the adding process can tell the scanning process to invalidate the
caches (our test suite is mostly the former, although there are a
couple of cases of the latter).

I have no problem with documenting invalidate_caches() as explicitly
required for correctness when writing new modules which are to be read
back by the same process, or when there is a feedback path between two
processes that may be confusing if the cache invalidation is delayed.
The implicit invalidation is only needed to pick up modules written by
*another* process.
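
As a minimal sketch of the same-process case (the scratch directory and
the module name "generated_example" are made up for illustration), the
explicit call sits between writing the file and importing it:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical scratch directory and module name, used only for illustration.
scratch_dir = tempfile.mkdtemp()
sys.path.insert(0, scratch_dir)


def write_and_import(name, source):
    """Write a top-level module into scratch_dir, then import it."""
    (Path(scratch_dir) / f"{name}.py").write_text(source)
    # The finder for scratch_dir may hold a stale directory listing, so
    # drop the finder caches before asking the import system for the module.
    importlib.invalidate_caches()
    return importlib.import_module(name)


mod = write_and_import("generated_example", "VALUE = 42\n")
print(mod.VALUE)
```

Without the invalidate_caches() call, whether the import succeeds can
depend on the mtime granularity of the filesystem.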

In addition, it may be appropriate for importlib to offer a
"write_module" method that accepts (module name, target path,
contents). This would:

1. Allow in-process caches to be invalidated implicitly and
selectively when new modules are created
2. Allow importers to abstract write access in addition to read access
3. Allow the import system to complain at time of writing if the
desired module name and target path don't actually match given the
current import system state.
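
A rough sketch of what such a helper might look like follows. Nothing
like this exists in importlib; the function name, its signature and the
mismatch check are all assumptions. It also only handles top-level
modules, and it invalidates every finder cache rather than doing the
selective invalidation suggested in point 1:

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def write_module(name, target_path, contents):
    """Hypothetical helper (not an importlib API): write a top-level
    module and check that the import system would find it there."""
    with open(target_path, "w", encoding="utf-8") as f:
        f.write(contents)
    # Global invalidation for simplicity; a real implementation could
    # invalidate only the finder responsible for the target directory.
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    # Compare resolved paths so symlinked directories do not cause a
    # spurious mismatch.
    if origin is None or os.path.realpath(origin) != os.path.realpath(target_path):
        # Note: the file has already been written at this point; a more
        # careful version would run this check before touching the disk.
        raise ImportError(
            f"{name!r} would be loaded from {origin!r}, not {target_path!r}"
        )
    return spec


# Usage with a made-up plugin directory and module name.
plugin_dir = tempfile.mkdtemp()
sys.path.insert(0, plugin_dir)
spec = write_module(
    "hypothetical_plugin",
    os.path.join(plugin_dir, "hypothetical_plugin.py"),
    "X = 1\n",
)
```

Here the complaint from point 3 is raised as an ImportError, which is
just one possible choice.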

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia