I agree. The "malware" problem is really a "how do I understand which hooks run in each environment" problem. Hooks could slow things down, or confuse and frustrate people, in ways that have nothing to do with malicious intent.

The caching could just be a more efficient, lossless representation of the *.dist-info / *.egg-info data model.
Would something as simple as a file per sys.path entry recording the 'last modified by installer' date be helpful? You could check those files to determine whether your cache was out of date.
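
As a rough sketch of that timestamp idea (the marker file name and all three helpers are made up for illustration; no installer writes such a file today, and this assumes the marker's mtime is what carries the date):

```python
import os
import sys

# Hypothetical marker file name; nothing in pip/setuptools writes this today.
MARKER_NAME = ".last-modified-by-installer"


def read_stamps(paths=None):
    """Return {directory: marker mtime in ns, or None if there is no marker}."""
    stamps = {}
    for entry in (sys.path if paths is None else paths):
        if not os.path.isdir(entry):
            continue  # skip zip files, '' and entries that do not exist
        marker = os.path.join(entry, MARKER_NAME)
        try:
            stamps[entry] = os.stat(marker).st_mtime_ns
        except OSError:
            stamps[entry] = None
    return stamps


def touch_marker(directory):
    """What an installer would do after adding or removing a distribution."""
    marker = os.path.join(directory, MARKER_NAME)
    with open(marker, "a"):
        pass  # create the file if it is missing
    os.utime(marker, None)  # bump mtime to "now"


def cache_is_stale(cached_stamps, paths=None):
    """True if any directory's marker changed since the cache was built."""
    return cached_stamps != read_stamps(paths)
```

A cache builder would store the result of read_stamps() next to the cached entry points and call cache_is_stale() at startup, so the check costs one stat() per sys.path directory. The obvious weakness is that anything modifying site-packages without touching the marker goes unnoticed.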

Another option would be to investigate whether the per-sys.path operations that 'import x' has to do anyway can be cached and shared with pkg_resources.

On Thu, Oct 26, 2017 at 8:21 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
On 26 October 2017 at 18:33, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Nathaniel raises the point that it may be easier to convince other package managers to regenerate an entry points cache than to call arbitrary Python hooks on install.

At least for RPM, we have file triggers now, whereby system packages can register a hook to say "Any time another package touches a file under <path of interest> I want to know about it".

That means the exact semantics of any RPM integration would likely end up just living in a file trigger, so it wouldn't matter too much whether that trigger was "refresh these predefined caches" or "run any installed hooks based on the defined Python level metadata".

However, I expect it would be much easier to define an "optionally export data for caching in a more efficient key value store" API than it would be to define an API for arbitrary pre-/post- [un]install hooks. In particular, a caching API is much easier to *repair*, since the "source of truth" remains the installation DB itself - the cache is just to speed up runtime lookups.


Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
Distutils-SIG maillist  -  Distutils-SIG@python.org