Thanks Steve, Chris,

On Tue, Feb 7, 2017, at 04:49 PM, Chris Wilcox wrote:

I may be able to help jump-start this a bit and provide a platform for this to run on. I deployed a small service that scans PyPI to figure out statistics on Python 2 vs Python 3 support using PyPI classifiers. The source is on GitHub: It watches the PyPI updates feed and refreshes entries for packages as they show up as modified. It should be possible to add your lib, query, and add an additional row or two to the result. I am happy to work together on this. Also, the data is stored in Azure Table Storage, which has REST endpoints (and a Python SDK) that make getting the published data straightforward.

I had a quick look through this, and it does look like it should provide a useful framework for scanning PyPI and updating the results. :-)
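For anyone following along, here is a minimal sketch of the kind of classifier check I understand the service to be doing. It is my own illustration, not code from Chris's repository; the function name `python_support` is hypothetical, and I'm assuming the input is simply the list of trove classifier strings from a package's metadata.

```python
PREFIX = "Programming Language :: Python :: "


def python_support(classifiers):
    """Return (supports_py2, supports_py3) based on trove classifiers.

    Hypothetical helper: it only looks at version classifiers such as
    "Programming Language :: Python :: 2.7" and ignores everything else
    (for example the "Implementation :: ..." classifiers).
    """
    # Keep only the part after the common prefix, e.g. "2.7" or "3 :: Only".
    versions = [c[len(PREFIX):].strip() for c in classifiers if c.startswith(PREFIX)]
    supports_py2 = any(v.startswith("2") for v in versions)
    supports_py3 = any(v.startswith("3") for v in versions)
    return supports_py2, supports_py3
```

A package that declares no version classifiers at all comes out as (False, False), which is one reason metadata alone can't tell the whole story.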

What I'm proposing differs in that it would need to download files from PyPI - basically all of them, if we're thorough about it. I imagine that's going to involve a lot of data transfer. Do we know what order of magnitude we're talking about? Is it so large that we should be thinking of running the scanner in the same data centre as the file storage?
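One way to get a first estimate without downloading anything might be to add up the file sizes reported in package metadata. The sketch below is only an illustration under assumptions: it assumes the per-project JSON document at https://pypi.org/pypi/NAME/json has a "releases" mapping from version to a list of file entries, each with a "size" in bytes, and the function names are my own.

```python
import json
import urllib.request


def fetch_metadata(name):
    """Fetch one project's JSON metadata (assumed URL layout, see above)."""
    url = f"https://pypi.org/pypi/{name}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def total_release_bytes(metadata):
    """Sum the reported size of every file in every release of one project."""
    releases = metadata.get("releases", {})
    return sum(f.get("size", 0) for files in releases.values() for f in files)


def human_readable(num_bytes):
    """Format a byte count with a binary-scaled unit, e.g. 1536 -> '1.5 KB'."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024
```

Summing `total_release_bytes(fetch_metadata(name))` over a sample of project names and scaling up by the total number of projects would give a rough order of magnitude, though a random sample could easily miss the handful of very large packages.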