[Distutils] Entry points: specifying and caching

Doug Hellmann doug at doughellmann.com
Fri Oct 20 14:24:29 EDT 2017

Excerpts from Nick Coghlan's message of 2017-10-20 14:42:09 +1000:
> On 20 October 2017 at 02:14, Thomas Kluyver <thomas at kluyver.me.uk> wrote:
> > On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
> > > I’m in favor, although one question I guess is whether it should be a
> > > PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
> > > without (2), it’s just another file in the .dist-info directory and that
> > > doesn’t actually need to be standardized at all). I don’t think that this will
> > > be a very controversial PEP though, and it should be pretty easy.
> >
> > I have opened a PR to document what is already there, without adding any
> > new features. I think this is worth doing even if we don't change
> > anything, since it's a de-facto standard that different tools use to
> > interact.
> >
> > https://github.com/pypa/python-packaging-user-guide/pull/390
> >
> > We can still write a PEP for caching if necessary.
> >
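For reference, the de-facto format being documented is the INI-style `entry_points.txt` file in each `.dist-info` directory: each section is an entry point group, and each line maps a name to an object reference (`module:attr`, optionally followed by `[extras]`). Below is a minimal sketch of reading it with the standard library `configparser`, assuming the file contents are already in memory. `parse_entry_points` and the group and module names in the usage example are hypothetical.

```python
import configparser


def parse_entry_points(text):
    """Parse the contents of an entry_points.txt file.

    Hypothetical helper (not part of any library's API). Returns
    {group: {entry point name: object reference}}.
    """
    # Only '=' separates a name from its value, so a name containing ':'
    # is not split; interpolation is off so '%' in values is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return {group: dict(parser.items(group)) for group in parser.sections()}
```

For example, a file with a `[console_scripts]` section containing `mytool = mytool.cli:main` parses to `{"console_scripts": {"mytool": "mytool.cli:main"}}`. Restricting the delimiter to `=` mirrors what `importlib.metadata` itself does when reading this file.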
> +1 for that approach (PR for the status quo, PEP for a shared metadata
> caching design) from me.
> Making the status quo more discoverable is valuable in its own right, and
> the only decisions we'll need to make for that are terminology
> clarification ones, not interoperability ones (this isn't like PEP 440 or
> 508 where we actually thought some of the default setuptools behaviour was
> slightly incorrect and wanted to change it).
> Figuring out a robust cross-platform network-file-system-tolerant metadata
> caching design on the other hand is going to be hard, and as Donald
> suggests, the right ecosystem level solution might be to define
> install-time hooks for package installation operations.
> > > I’m also in favor of this, although I would suggest SQLite rather than a
> > > JSON file, primarily because a JSON file isn’t multiprocess-safe without
> > > being careful (and possibly introducing locking), whereas SQLite has
> > > already solved that problem.
> >
> > SQLite was actually my first thought, but from experience in Jupyter &
> > IPython I'm wary of it - its built-in locking does not work well over
> > NFS, and it's easy to corrupt the database. I think careful use of
> > atomic writing can be more reliable (though that has given us some
> > problems too).
> >
> > That may be easier if there's one cache per user, though - we can
> > perhaps try to store it somewhere that's not NFS.
> >
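The "careful use of atomic writing" mentioned above usually means writing the new cache to a temporary file in the same directory and renaming it over the old one. Readers then see either the old file or the complete new one, never a half-written file. Below is a minimal sketch, assuming a JSON cache. `write_cache_atomically` is a hypothetical name. This only protects readers from partial files: concurrent writers can still race (the last rename wins), and network filesystems such as NFS may give weaker guarantees than a local disk.

```python
import json
import os
import tempfile


def write_cache_atomically(path, data):
    """Write `data` as JSON to `path` without exposing a partial file.

    Hypothetical helper: writes to a temp file in the same directory, then
    renames it into place with os.replace(), which on POSIX is an atomic
    rename when both names are on the same filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # get the bytes to disk before the rename
        os.replace(tmp_path, path)  # swap the new cache in over the old one
    except BaseException:
        # Leave the previous cache untouched and remove the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```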
> I'm wondering if rather than jumping straight to a PEP, it may make sense
> to instead initially pursue this idea as a *non-*standard, implementation
> dependent thing specific to the "entrypoints" project. There are a *lot* of
> challenges to be taken into account for a truly universal metadata caching
> design, and it would be easy to fall into the trap of coming up with a
> design so complex that nobody can realistically implement it.
> Specifically, I'm thinking of a usage model along the lines of the
> updatedb/locate pair on *nix systems: `locate` gives you access to very
> fast searches of your filesystem, but it *doesn't* try to automagically
> keep its indexes up to date. Instead, refreshing the indexes is handled by
> `updatedb`, and you can either rely on that being run automatically in a
> cron job, or else force an update with `sudo updatedb` when you want to use
> `locate`.
> For a project like entrypoints, what that might look like is that at
> *runtime*, you may implement a reasonably fast "cache freshness check",
> where you scan the mtime of all the sys.path entries and compare those
> to the mtime of the cache. If the cache looks up to date, then cool,
> otherwise emit a warning about the stale metadata cache, and then bypass it.
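Below is a minimal sketch of the runtime freshness check described above, assuming a JSON cache file at a caller-supplied path. It relies on the fact that installing or removing a distribution creates or deletes entries (such as `*.dist-info` directories) in a `sys.path` directory, which updates that directory's mtime. `cache_is_fresh`, `load_entry_points`, and the `scan` callback are hypothetical names, not part of the entrypoints project.

```python
import json
import os
import sys
import warnings


def cache_is_fresh(cache_path, path_entries=None):
    """Return True if the cache file is newer than every path entry.

    Hypothetical helper: compares the cache's mtime against the mtime of
    each sys.path entry (directories and zip files alike).
    """
    if path_entries is None:
        path_entries = sys.path
    try:
        cache_mtime = os.stat(cache_path).st_mtime
    except OSError:
        return False  # no cache yet (or unreadable): treat as stale
    for entry in path_entries:
        try:
            # An empty sys.path entry means the current directory.
            entry_mtime = os.stat(entry or ".").st_mtime
        except OSError:
            continue  # a missing entry contributes no installed metadata
        if entry_mtime > cache_mtime:
            return False
    return True


def load_entry_points(cache_path, scan, path_entries=None):
    """Use the cache if it looks fresh; otherwise warn and bypass it.

    `scan` is a hypothetical callable that performs the slow, uncached
    lookup; it is only called when the cache is stale or missing.
    """
    if cache_is_fresh(cache_path, path_entries):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    warnings.warn(
        f"entry point cache {cache_path!r} is stale or missing; "
        "falling back to a full scan"
    )
    return scan()
```

This only detects changes at the top level of each path entry. Editing an existing `entry_points.txt` in place (as can happen with development installs) would not change the parent directory's mtime and so would go unnoticed.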
> The entrypoints project itself could then expose a
> `refresh-entrypoints-cache` command that could start out only supporting
> virtual environments, and then extend to per-user caching, and then finally
> (maybe) consider whether or not it wanted to support installation-wide
> caches (with the extra permissions management and cross-process and
> cross-system coordination that may imply).
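The `refresh-entrypoints-cache` step could start as little more than a full scan whose result is saved for the runtime check to compare against. Below is a minimal sketch that uses the standard library `importlib.metadata` (Python 3.8+) to enumerate the distributions on a set of path entries. It assumes the cache maps each group to a list of plain dicts so the result can be serialized as JSON. `build_entry_point_index` and this cache layout are hypothetical.

```python
import importlib.metadata
import sys


def build_entry_point_index(path_entries=None):
    """Collect every entry point from distributions found on path_entries.

    Hypothetical helper for a "refresh" command. Returns
    {group: [{"name": ..., "value": ..., "dist": ...}, ...]}, which is
    plain JSON-serializable data.
    """
    if path_entries is None:
        path_entries = sys.path
    index = {}
    # distributions(path=...) restricts discovery to the given entries.
    for dist in importlib.metadata.distributions(path=list(path_entries)):
        dist_name = dist.metadata["Name"]
        for ep in dist.entry_points:
            index.setdefault(ep.group, []).append(
                {"name": ep.name, "value": ep.value, "dist": dist_name}
            )
    return index
```

Storing the distribution name alongside each entry point leaves room for per-distribution invalidation later. If the same distribution appears on several path entries, this sketch records every copy rather than applying `sys.path` precedence.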
> Such an approach would also tie in nicely with Donald's suggestion of
> reframing the ecosystem level question as "How should the entrypoints
> project request that 'refresh-entrypoints-cache' be run after every package
> installation or removal operation?", which in turn would integrate nicely
> with things like RPM file triggers (where the system `pip` package could
> set a file trigger that arranged for any properly registered Python package
> installation plugins to be run for every modification to site-packages
> while still appropriately managing the risk of running arbitrary code with
> elevated privileges).
> Cheers,
> Nick.

I have been trying to find time to do something like that within
stevedore for a while to solve some client-side startup performance
issues with the OpenStack client. I would be happy to help add it
to entrypoints instead and use it from there.

Thomas, please let me know how I can help.

