Excerpts from Thomas Kluyver's message of 2017-10-20 19:37:45 +0100:
On Fri, Oct 20, 2017, at 07:24 PM, Doug Hellmann wrote:
I have been trying to find time to do something like that within stevedore for a while to solve some client-side startup performance issues with the OpenStack client. I would be happy to help add it to entrypoints instead and use it from there.
Thomas, please let me know how I can help.
Thanks Doug! For starters, I'd be interested to hear any plans you have for how to tackle caching, or any thoughts you have on the rough plan I described before. If you're happy with the concepts, I'll have a go at implementing it. I'll probably consider it experimental until there's a hooks mechanism to trigger rebuilding the cache when packages are installed or uninstalled.
I assumed that the user loading the plugins might not be able to write to any of the directories on sys.path (aside from ".", and we don't want to put a cache file there), so my plan was to build the cache the first time entry points were scanned and use appdirs to pick a cache location specific to the user. I thought I would use the value of sys.path as a string (joining the paths together with a separator of some sort) to create a hash for the cache file ID. Some of that may be obviated if we assume a setuptools hook that lets us update the cache(s) when a package is installed.
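To make the naming scheme concrete, here is a rough sketch of how the cache file path might be derived. The function names, the "entrypoints" application name, and the file naming are all made up for illustration; none of this is existing entrypoints or stevedore API. appdirs is a third-party package, so the sketch falls back to a directory under the user's home when it is not installed.

```python
import hashlib
import os
import sys


def default_cache_root():
    """Pick a per-user cache directory (hypothetical helper)."""
    try:
        import appdirs  # third-party; optional in this sketch
    except ImportError:
        # Assumed fallback location, not a standard.
        return os.path.join(os.path.expanduser("~"), ".cache", "entrypoints")
    return appdirs.user_cache_dir("entrypoints")


def cache_file_for(paths, cache_root):
    """Return a cache file path keyed on the given sys.path entries."""
    # Join the entries with the platform path separator, then hash the
    # result so the file name stays short and filesystem-safe.
    joined = os.pathsep.join(paths)
    digest = hashlib.sha256(os.fsencode(joined)).hexdigest()
    return os.path.join(cache_root, "entrypoints-" + digest + ".cache")


if __name__ == "__main__":
    print(cache_file_for(sys.path, default_cache_root()))
```

Because the hash covers the entries in order, two interpreters or virtualenvs with different sys.path values would end up with different cache files, which is the point of the scheme.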
I also thought I'd provide a command line tool to generate the cache just in case it became corrupted or if someone wanted to update it by hand for some other reason, similar to Nick's locate/updatedb parallel UX example (and re-reading your email, I see you mention this, too).
I hadn't gone as far as deciding on a file format, but sqlite, JSON, and INI (definitely something built-in) were all on my mind. I planned to see if we would actually gain enough of a boost just by placing a separate file for each dist in a single cache directory, rather than trying to merge everything into one file. In addition to eliminating the concurrency issue, that approach might have the additional benefit of simplifying operating system packages, because they could just add a new file to the package instead of having to run a command to update the cache when a package was installed (if the file is the same format as entry_points.txt but with a different name, that's even simpler since it's just a copy of a file that will already be available during packaging).
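As a sketch of the one-file-per-dist variant, assuming each file in the cache directory is a plain copy of a dist's entry_points.txt renamed to "<dist>.txt" (that naming is my assumption, as is the function name), reading the whole cache could look roughly like this:

```python
import configparser
import os


def read_cache_dir(cache_dir):
    """Collect entry points from every per-dist file in cache_dir.

    Returns {group: [(name, object_reference, dist), ...]}.
    """
    groups = {}
    for filename in sorted(os.listdir(cache_dir)):
        if not filename.endswith(".txt"):
            continue
        # Only "=" separates name and value; object references contain ":".
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        parser.optionxform = str  # keep entry point names case-sensitive
        parser.read(os.path.join(cache_dir, filename), encoding="utf-8")
        dist = os.path.splitext(filename)[0]
        for group in parser.sections():
            for name, target in parser.items(group):
                groups.setdefault(group, []).append((name, target, dist))
    return groups
```

Each dist only ever touches its own file, so installing or removing a package is a single file copy or delete and there is no shared file to lock.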
Your idea of having a cache file per directory on sys.path is also interesting, though I have to admit I'm not familiar enough with the import machinery to know if it's easy to determine the containing directory for a dist to find the right cache to update. I am interested in hearing more details about what you planned there.
I would also like to compare the performance of a few approaches (1 file per sys.path hash using INI, JSON, and sqlite; one file per entry on sys.path using the same formats) using a significant number of plugins (~100?) before we decide.
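A minimal harness for that comparison might look like the following. It only covers the single-file layout, the data is synthetic, and all of the function names and the table schema are invented for the sketch. It also times repeated reads of a file the OS has already cached, so it says little about cold-start cost, which is what matters most for client startup.

```python
import configparser
import json
import os
import sqlite3
import tempfile
import timeit


def make_entries(n_plugins):
    # Synthetic data: one group holding n_plugins entry points.
    return {"demo.plugins": {"plugin%d" % i: "pkg%d.mod:func" % i
                             for i in range(n_plugins)}}


def _ini_parser():
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep names case-sensitive
    return parser


def write_json(entries, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f)


def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_ini(entries, path):
    parser = _ini_parser()
    parser.read_dict(entries)
    with open(path, "w", encoding="utf-8") as f:
        parser.write(f)


def load_ini(path):
    parser = _ini_parser()
    parser.read(path, encoding="utf-8")
    return {s: dict(parser.items(s)) for s in parser.sections()}


def write_sqlite(entries, path):
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE entry_points (grp TEXT, name TEXT, target TEXT)")
        conn.executemany(
            "INSERT INTO entry_points VALUES (?, ?, ?)",
            [(g, n, t) for g, eps in entries.items() for n, t in eps.items()],
        )
        conn.commit()
    finally:
        conn.close()


def load_sqlite(path):
    conn = sqlite3.connect(path)
    try:
        result = {}
        rows = conn.execute("SELECT grp, name, target FROM entry_points")
        for grp, name, target in rows:
            result.setdefault(grp, {})[name] = target
        return result
    finally:
        conn.close()


FORMATS = {
    "json": (write_json, load_json),
    "ini": (write_ini, load_ini),
    "sqlite": (write_sqlite, load_sqlite),
}


def compare(n_plugins=100, repeat=200):
    """Return the mean seconds per load for each format."""
    entries = make_entries(n_plugins)
    timings = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, (write, load) in FORMATS.items():
            path = os.path.join(tmp, "cache." + label)
            write(entries, path)
            assert load(path) == entries  # every format must round-trip
            total = timeit.timeit(lambda: load(path), number=repeat)
            timings[label] = total / repeat
    return timings


if __name__ == "__main__":
    for label, seconds in sorted(compare().items(), key=lambda kv: kv[1]):
        print("%-6s %.1f us per load" % (label, seconds * 1e6))
```

Extending it to the file-per-sys.path-entry layouts would mean writing several files per format and timing the load of all of them together.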
I agree with your statement in the original email that applications should be able to disable the cache. I'm not sure it makes sense to have a mode that only reads from a cache, but I may just not see the use case for that.
What's our next step?