<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Mar 2, 2013 at 10:36 AM, Nick Coghlan <span dir="ltr"><<a href="mailto:ncoghlan@gmail.com" target="_blank">ncoghlan@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Sat, Mar 2, 2013 at 12:31 PM, Brett Cannon <<a href="mailto:brett@python.org">brett@python.org</a>> wrote:<br>


> As of right now, importlib keeps a cache of what is in a directory for its<br>

> file finder instances. It uses mtime on the directory to try and detect when<br>

> it has changed to know when to refresh the cache. But thanks to mtime<br>

> granularities of up to a second, it is only a heuristic that isn't totally<br>

> reliable, especially across filesystems on different OSs.<br>

><br>

> This is why importlib.invalidate_caches() came into being. If you look in<br>

> our test suite you will see it peppered around where a module is created on<br>

> the fly to make sure that mtime granularity isn't a problem. But it somewhat<br>

> negates the point of the mtime heuristic when you have to make this function<br>

> call regardless to avoid potential race conditions.<br>

><br>

> <a href="http://bugs.python.org/issue17330" target="_blank">http://bugs.python.org/issue17330</a> originally suggested trying to add another<br>

> heuristic to determine when to invalidate the cache. But even with the<br>

> suggestion it's still iffy and in no way foolproof.<br>

><br>

> So the current idea is to just drop the invalidation heuristic and go<br>

> full-blown reliance on calls to importlib.invalidate_caches() as necessary.<br>

> This makes code more filesystem-agnostic and protects people from<br>

> hard-to-detect errors when importlib only occasionally doesn't detect new<br>

> modules (I know it drove me nuts for a while when the buildbots kept failing<br>

> sporadically and only on certain OSs).<br>

><br>

> I would have just made the change but Antoine wanted it brought up here<br>

> first to make sure that no one was heavily relying on the current setup. So<br>

> if you have a good, legitimate reason to keep the reliance on mtime for<br>

> cache invalidation please speak up. But since the common case will never<br>

> care about any of this (how many people generate modules on the fly to being<br>

> with?) and to be totally portable you need to call<br>

> importlib.invalidate_caches() anyway, it's going to take a lot to convince<br>

> me to keep it.<br>

<br>

</div></div>I think you should keep it. A long running service that periodically<br>

scans the importers for plugins doesn't care if modules take a few<br>

extra seconds to show up, it just wants to see them eventually.<br>

Installers (or filesystem copy or move operations!) have no way to<br>

inform arbitrary processes that new files have been added.<br></blockquote><div><br></div><div style>But if they are doing the scan they can also easily invalidate the caches before performing the scan.</div><div> </div>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

It's that case where the process that added the modules is separate<br>

from the process scanning for them, and the communication is one way,<br>

where the heuristic is important. Explicit invalidation only works<br>

when they're the *same* process, or when they're closely coupled so<br>

the adding process can tell the scanning process to invalidate the<br>

caches (our test suite is mostly the former although there are a<br>

couple of cases of the latter).<br></blockquote><div><br></div><div style>That's only true if the scanning process has no idea that another process is adding modules. If there is an expectation then it doesn't matter who added the file as you just assume cache invalidation is necessary.</div>


<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

I have no problem with documenting invalidate_caches() as explicitly<br>

required for correctness when writing new modules which are to be read<br>

back by the same process, or when there is a feedback path between two<br>

processes that may be confusing if the cache invalidation is delayed.<br></blockquote><div><br></div><div style>Already documented as such.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


The implicit invalidation is only needed to pick up modules written by<br>

*another* process.<br>

<br>

In addition, it may be appropriate for importlib to offer a<br>

"write_module" method that accepts (module name, target path,<br>

contents). This would:<br>

<br>

1. Allow in-process caches to be invalidated implicitly and<br>

selectively when new modules are created<br></blockquote><div><br></div><div style>I don't think that's necessary. If people don't want to blindly clear all caches for a file they can write the file, search the keys in sys.path_importer_cache for the longest prefix for the newly created file, and then call the invalidate_cache() method on that explicit finder.</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

2. Allow importers to abstract write access in addition to read access<br></blockquote><div><br></div><div style>That's heading down the virtual filesystem path which I don't want to go down any farther than I have to. The API is big enough as it is and the more entangled it gets the harder it is to change/fix, especially with the finders having a nice, small API compared to the loaders.</div>


<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

3. Allow the import system to complain at time of writing if the<br>

desired module name and target path don't actually match given the<br>

current import system state.<br></blockquote><div><br></div><div style>I think that's more checking than necessary for a use case that isn't that common. </div></div></div></div>