On Oct 18, 2017, at 10:52 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
1. Specification
I’m in favor, although one question I guess is whether it should be a PEP or an ad hoc spec. Given (2), it should *probably* be a PEP (since without (2), it’s just another file in the .dist-info directory, and that doesn’t actually need to be standardized at all). I don’t think this will be a very controversial PEP though, and it should be pretty easy.
2. Caching
I’m also in favor of this, although I would suggest SQLite rather than a JSON file. The primary reason is that a JSON file isn’t multiprocess safe without being careful (and possibly introducing locking), whereas SQLite has already solved that problem.

One possible further enhancement to your proposal is to try to think of a way to have a single cache. Since we can include the sys.path entry as part of the data inside the cache, having a single cache means we can reduce the number of files we have to open down to one. The biggest problem I see with this is that it opens up questions about how we handle things like user installs… so maybe a cache DB per sys.path entry is the best way. I think we could use something like SQLite’s ATTACH DATABASE command to add multiple DBs to the same SQLite connection, so that we can query across all of the entries with a single query.

One downside is that SQLite is an optional module in Python, so it may not exist. We could implement this so that we just bypass the cache in that case (and probably raise a warning?), so things continue to work, they will just be slower. I know that Twisted has used a cache file for plugins for a while (so a similar use case), so I wonder if they would have any opinions or insight into this as well.
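
To make the per-sys.path idea a bit more concrete, here is a rough sketch of what I mean, not a proposed format. The file name `entry_points_cache.db`, the table layout, and the function names `write_cache` and `query_group` are all made up for illustration. It writes one small SQLite file per path entry, then attaches each one to a single connection and queries them together with UNION ALL. If the `sqlite3` module is missing, it warns and returns None so the caller can fall back to scanning.

```python
import os
import warnings

try:
    import sqlite3
except ImportError:  # sqlite3 is optional in some Python builds
    sqlite3 = None

# Hypothetical file name; nothing standardizes this.
CACHE_NAME = "entry_points_cache.db"


def write_cache(path_entry, rows):
    """Rebuild the cache DB for one sys.path entry.

    rows is an iterable of (group, name, target, dist) tuples.
    """
    db_path = os.path.join(path_entry, CACHE_NAME)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction: readers see old or new data, not a mix
            conn.execute(
                "CREATE TABLE IF NOT EXISTS entry_points "
                "(grp TEXT, name TEXT, target TEXT, dist TEXT)"
            )
            conn.execute("DELETE FROM entry_points")
            conn.executemany(
                "INSERT INTO entry_points VALUES (?, ?, ?, ?)", rows
            )
    finally:
        conn.close()
    return db_path


def query_group(path_entries, group):
    """Return (path_entry, name, target, dist) rows for one group.

    Returns None when sqlite3 is unavailable, so the caller can bypass
    the cache and scan the directories instead.
    """
    if sqlite3 is None:
        warnings.warn("sqlite3 not available; bypassing entry point cache")
        return None

    conn = sqlite3.connect(":memory:")
    try:
        selects, params = [], []
        for i, entry in enumerate(path_entries):
            db_path = os.path.join(entry, CACHE_NAME)
            if not os.path.exists(db_path):
                continue  # no cache for this entry; a real tool would scan it
            alias = "c%d" % i
            # Note: SQLite caps attached databases per connection (the
            # default compile-time limit is 10), so a long sys.path
            # would need to be queried in batches.
            conn.execute("ATTACH DATABASE ? AS %s" % alias, (db_path,))
            selects.append(
                "SELECT ? AS path_entry, name, target, dist "
                "FROM %s.entry_points WHERE grp = ?" % alias
            )
            params.extend([entry, group])
        if not selects:
            return []
        return conn.execute(" UNION ALL ".join(selects), params).fetchall()
    finally:
        conn.close()
```

Something like `query_group(sys.path, "console_scripts")` would then be the single query across all entries. Keeping the sys.path entry as a column in the result is what would let a caller apply its own precedence rules, for example between a user install and the system site-packages.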