[Distutils] Inflated download counts

Nick Coghlan ncoghlan at gmail.com
Sat Oct 26 03:35:12 CEST 2013

On 26 Oct 2013 04:51, "Dustin Oprea" <myselfasunder at gmail.com> wrote:
> Is there any way that we can use the user-agent to either identify users
> or identify mirrors?
> Can we pass a flag or signature from "pip"? It won't reflect downloads
> from the website, but this probably won't affect the numbers much. In this
> case, we might just reword it to "pip Downloads".
> This is a distressing issue. It doesn't seem like package owners have any
> usable usage data.

Most downloads happen through the Fastly CDN, so the numbers are derived
from the Fastly logs rather than being counted directly by PyPI. The code
that does that log analysis is in https://bitbucket.org/pypa/pypi/src
(Donald would be able to provide a more direct reference to the relevant
source).

However, separating mirroring, automated deployment and integration runs,
and actual direct downloads isn't something PyPI has ever done, nor
something it is really able to do in a systematic way. "pip install
thatproject" (and equivalent commands for other tools) looks the same to
PyPI regardless of whether it's a human or a script running the command.

That's why Donald's recent download analysis was able to split it up by
tools, but not by purpose.
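
To illustrate the distinction, here is a minimal sketch of grouping
download requests by client tool using only the User-Agent string. It is
not the actual PyPI log analysis code: the tool names in KNOWN_TOOLS and
the sample User-Agent strings are assumptions made up for the example, and
real clients may identify themselves differently.

```python
from collections import Counter

# Hypothetical User-Agent prefixes, chosen for illustration only.
KNOWN_TOOLS = ("pip", "setuptools", "distribute", "bandersnatch")


def tool_from_user_agent(user_agent):
    """Return a coarse tool name for a User-Agent string, or 'other'."""
    ua = user_agent.strip().lower()
    for tool in KNOWN_TOOLS:
        if ua.startswith(tool):
            return tool
    return "other"


def count_by_tool(user_agents):
    """Count requests per tool across an iterable of User-Agent strings."""
    return Counter(tool_from_user_agent(ua) for ua in user_agents)


# Made-up sample data standing in for User-Agent values pulled from logs.
sample = [
    "pip/1.4.1 CPython/2.7.5 linux2/3.8.0",
    "pip/1.4.1 CPython/2.7.5 linux2/3.8.0",
    "bandersnatch/1.0 (CPython 2.7.5)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
counts = count_by_tool(sample)
```

Note that the two "pip" entries are indistinguishable here: nothing in the
string says whether a person typed the command or a deployment script ran
it, which is why a split by purpose isn't available from this data.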

Now, exposing more of that analytical data to package owners on an ongoing
basis is an interesting idea, but one that would be a *very* long way down
the priority list for the current development team.

However, if someone else were to figure out a way to expose the data that
users need to do their own analysis, it might be possible to support that.
It may be better to offer it through Warehouse (aka PyPI.next,
https://github.com/dstufft/warehouse) rather than the existing PyPI
software. There's a demo instance (using live data) running at
preview-pypi.python.org, but at this point it is mostly focused on
backwards compatibility testing for the tool APIs rather than being
navigable through a web browser.


> Dustin Oprea
> On Oct 25, 2013 1:49 PM, "Donald Stufft" <donald at stufft.io> wrote:
>> Mostly new packages will get roughly 2-3k of downloads from what appears
>> to be mirroring infrastructure. I’m hesitant to mess with the traffic
>> numbers at all because I don’t want them to be inaccurate *and*
>> artificial vs just inaccurate (assuming you think it’s the number of
>> people downloading your project).
>> On Oct 25, 2013, at 1:22 PM, Dustin Oprea <dustin at randomingenuity.com> wrote:
>>> It seems like the download counts on PyPI aren't accurate. Though the
>>> really useful packages seem to have higher numbers than the packages
>>> that only apply to a specific target audience, I'm fairly certain that
>>> the numbers are more affected by robots and such than actual users.
>>> Recently I started a service that requires membership. In the last
>>> month, PyPI reports 3000 downloads of the client, yet Google Analytics
>>> only reports a handful of visits to the website. I have even fewer
>>> membership signups (as expected, so soon after launch). Why are the
>>> download counts so
>>> What has to be done to get this to be accurate?
>>> I've included two screenshots of PyPI and GA.
>>> Dustin Oprea
>>> Distutils-SIG maillist  -  Distutils-SIG at python.org
>>> https://mail.python.org/mailman/listinfo/distutils-sig
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372