[Distutils] Entry points: specifying and caching

Thu Oct 19 14:09:51 EDT 2017

> On Oct 19, 2017, at 12:14 PM, Thomas Kluyver <thomas at kluyver.me.uk> wrote:
> 
> On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
>> I’m in favor, although one question I guess is whether it should be a
>> PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
>> without (2), it’s just another file in the .dist-info directory and that
>> doesn’t actually need to be standardized at all). I don’t think that this will
>> be a very controversial PEP though, and should be pretty easy.
> 
> I have opened a PR to document what is already there, without adding any
> new features. I think this is worth doing even if we don't change
> anything, since it's a de-facto standard used for different tools to
> interact.
> 
> https://github.com/pypa/python-packaging-user-guide/pull/390
> 
> We can still write a PEP for caching if necessary.

I think documenting what’s there is a reasonable goal, but if we’re going to add caching we should just PEP the whole thing, changing it from a de facto standard to an actual standard + caching. Generally we should only use non-PEP “specs” in places where we’re just trying to document what exists already, but where we’re not really happy with the current solution or we plan to alter it eventually.

For this, I think the entry points solution is generally a good one with some alterations (namely, the addition of caching)... Although now that I think about it, maybe this isn’t really a packaging problem at all, and I’m not sure that it benefits from standardization.

So stepping back a second, here’s what entrypoints provides today:

1. A way to define an interface that other packages can provide implementations for.
2. A way to specify script wrappers that will be automatically generated.
3. A way to define extras that must be installed in order for a particular entry point to be available.
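
To make those three features concrete, here is a minimal sketch of how an entry_points.txt file could be read with only the standard library. The file contents, the group name "mypkg.plugins" and the helper parse_entry_points are made up for illustration; this is not how pkg_resources itself implements it.

```python
import configparser

# Hypothetical contents of an entry_points.txt file in a .dist-info directory.
SAMPLE = """\
[console_scripts]
mytool = mypkg.cli:main

[mypkg.plugins]
pdf = mypkg_pdf.render:PdfRenderer [pdf]
"""


def parse_entry_points(text):
    """Return {group: {name: (module, attr, extras)}} from INI-style text."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)

    result = {}
    for group in parser.sections():
        entries = {}
        for name, value in parser.items(group):
            # "module:attr [extra1, extra2]" -> target plus optional extras
            target, _, extras_part = value.partition("[")
            extras = [
                e.strip()
                for e in extras_part.rstrip("]").split(",")
                if e.strip()
            ]
            module, _, attr = target.strip().partition(":")
            entries[name] = (module, attr, extras)
        result[group] = entries
    return result


entry_points = parse_entry_points(SAMPLE)
```

In this sample, the "mypkg.plugins" group corresponds to (1), the "console_scripts" group to (2), and the trailing "[pdf]" to (3).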

Off the bat I’m going to say we don’t need to worry about (2) in this hypothetical system, because I think the fact it is implemented currently via this system is mostly a historic accident, and it’s not something we should be looking at in the future. Script wrappers should have some dedicated metadata, not piggybacking off of the plugin system.

For (3) I don’t believe that what extras were installed is recorded anywhere, so I’m going to guess that this works by looking up what extras are *available* for a particular package and then seeing if all of the requirements of that distribution are satisfied. Assuming that’s the case then that’s not really something that requires deep integration with the packaging toolchain, it just needs the APIs to look those things up.
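
As a rough sketch of that guess (and it is only a guess at the behaviour), the check could look something like the following. The function name, the data layout and the package names are all hypothetical, and version specifiers are ignored entirely to keep it short.

```python
def extra_is_satisfied(extra, extras_requirements, installed):
    """Return True if every requirement listed for `extra` is installed.

    extras_requirements: {extra name: [required distribution names]}
    installed: set of lower-cased names of installed distributions
    """
    needed = extras_requirements.get(extra)
    if needed is None:
        # The distribution does not declare this extra at all.
        return False
    return all(name.lower() in installed for name in needed)


# Hypothetical metadata for one distribution and a hypothetical environment.
declared_extras = {"pdf": ["reportlab", "Pillow"], "yaml": ["PyYAML"]}
installed_names = {"reportlab", "pillow", "requests"}

pdf_ok = extra_is_satisfied("pdf", declared_extras, installed_names)
yaml_ok = extra_is_satisfied("yaml", declared_extras, installed_names)
```

Nothing here needs to know which extras the user asked for at install time; it only needs the declared extras and the list of installed distributions.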

Finally we come to (1), which is in my opinion the meat of what you’re hoping to achieve here (and what most people are using entry points for outside of console scripts). What I notice about (1) is that it really has absolutely nothing to do with packaging at all. It would likely use some of the APIs provided by the packaging toolchain (for instance, the ability to add custom files to a .dist-info directory, the ability to iterate over installed packages, etc.), but on the whole none of pip, setuptools, twine, PyPI, etc. need to know anything about it.

EXCEPT for the fact that, with the desire to cache things, it would be beneficial to “hook” into the lifecycle of a package install. However, I know that there are other plugin systems out there that would also like to be able to do that (Twisted Plugins come to mind), and I think such a mechanism is likely to be useful in general for other cases outside of plugin systems.

So here’s a different idea that is a bit more ambitious but that I think is better overall. Let entrypoints be a setuptools thing, and let’s define some key lifecycle hooks during the installation of a package and some mechanism in the metadata to let other tools subscribe to those hooks. Then a caching layer could be written for setuptools entrypoints to make that faster without requiring standardization, but a whole new, better plugin system could use the hooks too, Twisted plugins could benefit, etc. [1].
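
To illustrate the shape of that idea, here is a very small sketch of a hook registry with an entry point cache subscribing to it. Everything in it is hypothetical: no such hooks exist in pip or setuptools today, and the event names "post-install" and "post-uninstall", the HookRegistry class and the callbacks are invented for the example.

```python
from collections import defaultdict


class HookRegistry:
    """Hypothetical registry an installer could consult at lifecycle events."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        # In a real design the subscription would come from package metadata.
        self._subscribers[event].append(callback)

    def fire(self, event, dist_name):
        for callback in self._subscribers[event]:
            callback(dist_name)


# A stand-in for a plugin system's cache, keyed by distribution name.
entry_point_cache = {}


def refresh_cache(dist_name):
    # A real implementation would re-read the distribution's entry points.
    entry_point_cache[dist_name] = "entry points for " + dist_name


def drop_from_cache(dist_name):
    entry_point_cache.pop(dist_name, None)


hooks = HookRegistry()
hooks.subscribe("post-install", refresh_cache)
hooks.subscribe("post-uninstall", drop_from_cache)

# What an installer might do after installing a package.
hooks.fire("post-install", "mypkg-pdf")
```

The installer only needs to know about the events; what a subscriber does with them (a cache, a Twisted plugin index, something else) is up to the subscriber.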

One thing that I like about all of our work recently in packaging is that a lot of it has been about making it so there isn’t just one standard set of tools, and I think that providing lifecycle hooks is another step along that path.

> 
>> I’m also in favor of this. Although I would suggest SQLite rather than a
>> JSON file for the primary reason being that a JSON file isn’t
>> multiprocess safe without being careful (and possibly introducing
>> locking) whereas SQLite has already solved that problem.
> 
> SQLite was actually my first thought, but from experience in Jupyter &
> IPython I'm wary of it - its built-in locking does not work well over
> NFS, and it's easy to corrupt the database. I think careful use of
> atomic writing can be more reliable (though that has given us some
> problems too).
> 
> That may be easier if there's one cache per user, though - we can
> perhaps try to store it somewhere that's not NFS.
> 

I don’t have a lot of experience using SQLite in this way so it’s entirely possible it’s not as robust as we want/need it to be. I’m not wedded to this idea (but then if we do what I said above, this idea becomes something for any individual implementation of plugins to decide and we don’t need to pick a standard here at all!).
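
For comparison, the “careful use of atomic writing” approach for a JSON cache could be sketched roughly as below. The function name and cache layout are assumptions for illustration. It only ensures that readers never see a half-written file; it does not coordinate concurrent writers (the last one to finish wins), and it makes no claim about behaviour on NFS.

```python
import json
import os
import tempfile


def write_cache_atomically(path, data):
    """Write `data` as JSON to `path` via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_cache(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```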

[1] I realize the irony in saying a plugin system isn’t a packaging problem, so let’s define a plugin system for packaging hooks, but I think it can be very simple and not something designed to be reusable outside of that context and speed is less of a concern, etc.