On Oct 25, 2013, at 9:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

Most downloads happen through the Fastly CDN, so the numbers are derived from the Fastly logs rather than being counted directly by PyPI. The code that does that log analysis is in https://bitbucket.org/pypa/pypi/src (Donald would be able to provide a more direct reference to the relevant source).


It buckets the data and puts it into Redis. It's really “dumb”, but it works well enough until better infrastructure can be put in place.
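
To give a rough idea of what "bucketing" means here, below is a minimal sketch, not the actual PyPI code. It assumes the log has already been parsed into (project, timestamp) pairs and uses an in-memory Counter as a stand-in for the Redis counters; the key layout and the function names are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_key(project, timestamp):
    """Build an hourly bucket key (hypothetical key layout)."""
    return "downloads:hour:{}:{}".format(timestamp.strftime("%Y-%m-%d-%H"), project)


def bucket_downloads(events):
    """Count downloads per (hour, project) bucket.

    `events` is an iterable of (project, timestamp) pairs; parsing the
    CDN log lines into such pairs is assumed to happen elsewhere.
    """
    counts = Counter()  # stand-in for a set of Redis counters
    for project, timestamp in events:
        counts[bucket_key(project, timestamp)] += 1
    return counts


# Made-up sample events, not real download data.
events = [
    ("requests", datetime(2013, 10, 25, 21, 5, tzinfo=timezone.utc)),
    ("requests", datetime(2013, 10, 25, 21, 40, tzinfo=timezone.utc)),
    ("requests", datetime(2013, 10, 25, 22, 1, tzinfo=timezone.utc)),
    ("pip", datetime(2013, 10, 25, 21, 59, tzinfo=timezone.utc)),
]
counts = bucket_downloads(events)
```

With a real Redis server, each increment would presumably be something like an INCR on the bucket key, but the idea is the same: one counter per project per time window.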

However, separating downloads between mirroring, automatic deployments and integration and actual direct downloads isn't something PyPI has ever done, or is really able to do in a systematic way. "pip install thatproject" (and equivalent commands for other tools) looks the same to PyPI regardless of whether it's a human or a script running the command.

That's why Donald's recent download analysis was able to split it up by tools, but not by purpose.

Yeah, this part is hard/impossible :/
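
A small sketch of why this is the case, assuming the only signal available is the User-Agent string in the logs. The marker table and the sample User-Agent strings are illustrative assumptions, not the real analysis code: the tool can be guessed from the string, but a human and a CI job running the same pip send exactly the same thing.

```python
# Hypothetical marker -> tool table; real User-Agent formats vary by tool version.
KNOWN_TOOLS = [
    ("bandersnatch/", "bandersnatch"),
    ("setuptools/", "setuptools"),
    ("pip/", "pip"),
]


def tool_from_user_agent(user_agent):
    """Guess the client tool from a User-Agent string, or return "unknown"."""
    ua = user_agent.lower()
    for marker, tool in KNOWN_TOOLS:
        if marker in ua:
            return tool
    return "unknown"


# Made-up example strings: the human and the CI job are indistinguishable.
human = tool_from_user_agent("pip/1.4.1 CPython/2.7.5 Linux/3.8.0")
ci_job = tool_from_user_agent("pip/1.4.1 CPython/2.7.5 Linux/3.8.0")
mirror = tool_from_user_agent("bandersnatch/1.0.5 (CPython 2.7.5, Linux x86_64)")
```

Here human and ci_job both come out as "pip", which is all the log can say; nothing in the request records the purpose of the download.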

Now, exposing more of that analytical data to package owners on an ongoing basis is an interesting idea, but one that would be a *very* long way down the priority list for the current development team.

Nice analytics for package owners are on the roadmap but, as you mentioned, they're a ways down it.

However, if someone else were to figure out a way to expose the data users needed to do their own analysis, it might be possible to support that. It may be better to look at offering that through Warehouse (aka PyPI.next, https://github.com/dstufft/warehouse) rather than the existing PyPI software. There's a demo instance (using live data) running at preview-pypi.python.org, but at this point that's mostly focused on backwards compatibility testing for the tool APIs rather than being navigable through a web browser.

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA