[Distutils] Indexing modules in Python distributions

Thomas Kluyver thomas at kluyver.me.uk
Wed Feb 8 13:14:38 EST 2017

Thanks Steve, Chris,

On Tue, Feb 7, 2017, at 04:49 PM, Chris Wilcox wrote:

> I may be able to help jump-start this a bit and provide a platform for
> this to run on. I deployed a small service that scans PyPI to figure
> out statistics on Python 2 vs Python 3 support using PyPI Classifiers.
> The source is on GitHub:  https://github.com/crwilcox/PyPI-Gatherer.
> It watches the PyPI updates feed and refreshes entries for packages as
> they show up as modified. It should be possible to add your lib,
> query, and add an additional row or two to the result. I am happy to
> work together on this. Also, the data is stored in Azure Table
> Storage, which has REST endpoints (and a Python SDK) that make getting
> the published data straightforward.

I had a quick look through this, and it does look like it should provide
a useful framework for scanning PyPI and updating the results. :-)

What I'm proposing differs in that it would need to download files from
PyPI - basically all of them, if we're thorough about it. I imagine
that's going to involve a lot of data transfer. Do we know what order of
magnitude we're talking about? Is it so large that we should be thinking
of running the scanner in the same data centre as the file storage?
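
One rough way to get at the order of magnitude is to sum the file sizes
that PyPI reports in its per-project JSON metadata for a sample of
projects. The following is only a minimal sketch under assumptions that
are not part of this thread: it uses the https://pypi.org/pypi/<name>/json
endpoint, and the function names are hypothetical. The "urls" list covers
only the files of a project's latest release, so the result is a lower
bound for that project.

```python
import json
from urllib.request import urlopen

# Assumption: the per-project JSON endpoint on pypi.org. It is not
# mentioned in the original discussion.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_project_metadata(name):
    """Download and parse the JSON metadata for one project (network call)."""
    with urlopen(PYPI_JSON_URL.format(name=name)) as resp:
        return json.load(resp)


def total_file_size(metadata):
    """Sum the 'size' field (in bytes) of every file listed under 'urls'.

    'urls' describes the files of the latest release only; entries
    without a 'size' field are counted as zero.
    """
    return sum(f.get("size", 0) for f in metadata.get("urls", []))


def human_readable(num_bytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return "{:.1f} {}".format(size, unit)
        size /= 1024
```

Calling human_readable(total_file_size(fetch_project_metadata(name)))
for a random sample of project names, and scaling the average by the
number of projects on PyPI, would give a crude estimate of the transfer
needed to scan the latest release of everything.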
