[Distutils] Entry points: specifying and caching

Doug Hellmann doug at doughellmann.com
Fri Oct 20 15:26:46 EDT 2017


Excerpts from Thomas Kluyver's message of 2017-10-20 19:37:45 +0100:
> On Fri, Oct 20, 2017, at 07:24 PM, Doug Hellmann wrote:
> > I have been trying to find time to do something like that within
> > stevedore for a while to solve some client-side startup performance
> > issues with the OpenStack client. I would be happy to help add it
> > to entrypoints instead and use it from there.
> > 
> > Thomas, please let me know how I can help.
> 
> Thanks Doug! For starters, I'd be interested to hear any plans you have
> for how to tackle caching, or any thoughts you have on the rough plan I
> described before. If you're happy with the concepts, I'll have a go at
> implementing it. I'll probably consider it experimental until there's a
> hooks mechanism to trigger rebuilding the cache when packages are
> installed or uninstalled.
> 
> Thomas

I assumed that the user loading the plugins might not be able to
write to any of the directories on sys.path (aside from "." and we
don't want to put a cache file there), so my plan was to build the
cache the first time entry points were scanned and use appdirs [1]
to pick a cache location specific to the user.  I thought I would
use the value of sys.path as a string (joining the paths together
with a separator of some sort) to create a hash for the cache file
ID. Some of that may be obviated if we assume a setuptools hook that
lets us update the cache(s) when a package is installed.
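Here is a minimal sketch of the cache location scheme described above. It assumes the third-party appdirs package provides the per-user directory, with a hypothetical `~/.cache` fallback so the sketch runs without appdirs installed. The application name `entrypoints`, the NUL separator, SHA-256, and the `.json` suffix are all illustrative choices, not a decided format.

```python
import hashlib
import os
import sys

try:
    # appdirs picks the platform-appropriate per-user cache directory.
    from appdirs import user_cache_dir
except ImportError:
    # Hypothetical fallback so the sketch runs without appdirs installed.
    def user_cache_dir(appname):
        return os.path.join(os.path.expanduser("~"), ".cache", appname)


def cache_file_for(path_entries=None, appname="entrypoints"):
    """Return a user-specific cache file path keyed on the given sys.path."""
    if path_entries is None:
        path_entries = sys.path
    # NUL cannot appear in a file path, so it is a safe separator.
    key = "\0".join(path_entries).encode("utf-8", "surrogateescape")
    digest = hashlib.sha256(key).hexdigest()
    return os.path.join(user_cache_dir(appname), digest + ".json")
```

Because the hash covers the order of the entries, `["/a", "/b"]` and `["/b", "/a"]` map to different cache files. That seems right, since path order can change which distribution shadows another.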

I also thought I'd provide a command line tool to generate the cache,
in case it became corrupted or someone wanted to update it by hand
for some other reason, similar to the locate/updatedb parallel Nick
drew for the UX (and re-reading your email, I see you mention this,
too).
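A rough sketch of what such a regeneration command might look like follows. It assumes Python 3.8+ so that `importlib.metadata` is available. The command-line interface, the JSON layout keyed by `name-version`, and the helper names are hypothetical. The sketch writes to a temporary file and then renames it, so a reader never sees a half-written cache.

```python
import argparse
import json
import os
from importlib.metadata import distributions


def scan_entry_points():
    """Map 'name-version' to the raw entry_points.txt text of each installed dist."""
    found = {}
    for dist in distributions():
        text = dist.read_text("entry_points.txt")
        if text:  # read_text returns None when the dist has no entry points
            key = "{}-{}".format(dist.metadata["Name"], dist.version)
            found[key] = text
    return found


def write_cache(cache_path, data):
    os.makedirs(os.path.dirname(os.path.abspath(cache_path)), exist_ok=True)
    # Write to a temporary file, then atomically replace the real one.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(data, f, sort_keys=True)
    os.replace(tmp_path, cache_path)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Rebuild the entry point cache.")
    parser.add_argument("cache_path", help="where to write the cache file")
    args = parser.parse_args(argv)
    data = scan_entry_points()
    write_cache(args.cache_path, data)
    print("cached entry points for {} distributions".format(len(data)))
    return 0
```

It could be exposed as a `console_scripts` entry point pointing at `main` (hypothetical wiring), or called directly as `main(["/path/to/cache.json"])`.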

I hadn't gone as far as deciding on a file format, but sqlite, JSON,
and INI (definitely something built-in) were all on my mind. I
planned to see whether we would gain enough of a boost just by
placing a separate file for each dist in a single cache directory,
rather than trying to merge everything into one file. In addition
to eliminating the concurrency issue, that approach might simplify
operating system packaging, because a package could just ship one
more file instead of running a command to update the cache at
install time. If the file uses the same format as entry_points.txt
but with a different name, that's even simpler, since it's just a
copy of a file that is already available during packaging.
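To illustrate the one-file-per-dist layout, here is a sketch in which each dist's cache file is a verbatim copy of its entry_points.txt, and a reader merges one group across all files using configparser. The `<dist>.txt` naming and both function names are hypothetical. When two dists define the same entry point name, the one whose file sorts last wins, which is an arbitrary choice made only for the sketch.

```python
import configparser
import os
import shutil


def add_dist_file(cache_dir, dist_name, entry_points_txt):
    """Drop one dist's entry points into the shared cache directory.

    The file is a plain copy of entry_points.txt, so an OS package could
    ship it directly instead of running a command at install time.
    """
    os.makedirs(cache_dir, exist_ok=True)
    shutil.copyfile(entry_points_txt, os.path.join(cache_dir, dist_name + ".txt"))


def load_group(cache_dir, group):
    """Collect name -> 'module:attr' for one group across all cached dists."""
    found = {}
    for filename in sorted(os.listdir(cache_dir)):
        if not filename.endswith(".txt"):
            continue
        # Only '=' separates name from target; ':' belongs to 'module:attr'.
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        parser.optionxform = str  # entry point names are case-sensitive
        parser.read(os.path.join(cache_dir, filename), encoding="utf-8")
        if parser.has_section(group):
            for name, target in parser.items(group):
                found[name] = target
    return found
```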

Your idea of having a cache file per directory on sys.path is also
interesting, though I have to admit I'm not familiar enough with
the import machinery to know if it's easy to determine the containing
directory for a dist to find the right cache to update. I am
interested in hearing more details about what you planned there.
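One possible way around the "which directory does this dist live in" question, offered only as an assumption and not necessarily what Thomas had in mind: rather than mapping a dist back to its directory, ask `importlib.metadata` (Python 3.8+) about one sys.path entry at a time by passing `path=[entry]` to `distributions()`. Each per-entry result could then be written to a cache file associated with that directory. The function name is hypothetical.

```python
import sys
from importlib.metadata import distributions


def entry_points_by_path_entry(path_entries=None):
    """Group raw entry_points.txt text by the sys.path entry that provides it.

    Querying the finders one directory at a time means we never have to
    work backwards from a Distribution object to its containing directory.
    """
    if path_entries is None:
        path_entries = sys.path
    grouped = {}
    for entry in path_entries:
        per_entry = {}
        for dist in distributions(path=[entry]):
            text = dist.read_text("entry_points.txt")
            if text:
                per_entry[dist.metadata["Name"]] = text
        if per_entry:
            grouped[entry] = per_entry
    return grouped
```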

I would also like to compare the performance of a few approaches
(one file per sys.path hash using INI, JSON, and sqlite; one file per
entry on sys.path using the same formats) using a significant number
of plugins (~100?) before we decide.
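A small harness for that comparison might look like the following. It covers only the single-file variants and is a sketch, not a finished benchmark. It writes about 100 synthetic plugins (the group and plugin names are made up) as INI, JSON, and sqlite, checks that each format reads back identically, and times repeated reads with timeit. It reports raw timings only; no results are claimed here.

```python
import configparser
import functools
import json
import os
import sqlite3
import tempfile
import timeit

GROUP = "myapp.plugins"  # hypothetical entry point group


def make_plugins(n=100):
    """n synthetic plugins mapping name -> 'module:attr'."""
    return {"plugin{}".format(i): "pkg{}.module:Plugin".format(i) for i in range(n)}


def write_all(directory, plugins):
    paths = {fmt: os.path.join(directory, "cache." + fmt) for fmt in ("ini", "json", "sqlite")}
    ini = configparser.ConfigParser(interpolation=None)
    ini.optionxform = str  # keep entry point names case-sensitive
    ini[GROUP] = plugins
    with open(paths["ini"], "w", encoding="utf-8") as f:
        ini.write(f)
    with open(paths["json"], "w", encoding="utf-8") as f:
        json.dump({GROUP: plugins}, f)
    con = sqlite3.connect(paths["sqlite"])
    with con:  # commits the transaction on success
        con.execute("CREATE TABLE ep (grp TEXT, name TEXT, target TEXT)")
        con.executemany("INSERT INTO ep VALUES (?, ?, ?)",
                        [(GROUP, name, target) for name, target in plugins.items()])
    con.close()
    return paths


def read_ini(path):
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read(path, encoding="utf-8")
    return dict(parser.items(GROUP))


def read_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)[GROUP]


def read_sqlite(path):
    con = sqlite3.connect(path)
    try:
        rows = con.execute("SELECT name, target FROM ep WHERE grp = ?", (GROUP,))
        return dict(rows.fetchall())
    finally:
        con.close()


def benchmark(n=100, number=200):
    """Return total seconds taken by `number` reads of each format."""
    plugins = make_plugins(n)
    readers = {"ini": read_ini, "json": read_json, "sqlite": read_sqlite}
    results = {}
    with tempfile.TemporaryDirectory() as d:
        paths = write_all(d, plugins)
        for fmt, reader in readers.items():
            if reader(paths[fmt]) != plugins:
                raise RuntimeError("{} cache did not round-trip".format(fmt))
            results[fmt] = timeit.timeit(functools.partial(reader, paths[fmt]), number=number)
    return results
```

Calling `benchmark()` returns a dict of total seconds per format. The per-path-entry variants could reuse the same readers, pointed at one file per directory.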

I agree with your statement in the original email that applications
should be able to disable the cache. I'm not sure it makes sense
to have a mode that only reads from a cache, but I may just not see
the use case for that.
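Since it is still open whether a read-only mode is useful, this sketch models only the two modes discussed: cache on (read it, rebuilding on a miss or corruption) and cache off. The function names, the `use_cache` flag, and the JSON layout keyed by `name-version` are hypothetical.

```python
import json
import os
from importlib.metadata import distributions


def scan():
    """Slow path: read entry_points.txt from every installed dist."""
    data = {}
    for dist in distributions():
        text = dist.read_text("entry_points.txt")
        if text:
            data["{}-{}".format(dist.metadata["Name"], dist.version)] = text
    return data


def load_entry_point_data(cache_path, use_cache=True):
    """Return raw entry point data, consulting the cache unless disabled.

    With use_cache=False the cache file is neither read nor written, so an
    application can always force a fresh scan.
    """
    if not use_cache:
        return scan()
    try:
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through and rebuild it
    data = scan()
    try:
        os.makedirs(os.path.dirname(os.path.abspath(cache_path)), exist_ok=True)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(data, f)
    except OSError:
        pass  # unwritable cache location: still return the scanned data
    return data
```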

What's our next step?

Doug

[1] https://pypi.python.org/pypi/appdirs/1.4.3

