On 26 Oct 2013 04:51, "Dustin Oprea" <myselfasunder@gmail.com> wrote:
>
> Is there any way that we can use the user-agent to either identify users or identify mirrors?
>
> Can we pass a flag or signature from "pip"? It won't reflect downloads from the website, but this probably won't affect the numbers much. In this case, we might just reword it to "pip Downloads".
>
> This is a distressing issue. It doesn't seem like package owners have any usable usage data.

Most downloads happen through the Fastly CDN, so the numbers are derived from the Fastly logs rather than being counted directly by PyPI. The code that does that log analysis is in https://bitbucket.org/pypa/pypi/src (Donald would be able to provide a more direct reference to the relevant source).

However, separating downloads into mirroring, automated deployment and integration, and actual direct downloads isn't something PyPI has ever done, or is really able to do in a systematic way. "pip install thatproject" (and equivalent commands for other tools) looks the same to PyPI regardless of whether it's a human or a script running the command.

That's why Donald's recent download analysis was able to split it up by tools, but not by purpose.
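
To illustrate the distinction, here is a minimal sketch of a per-tool split. It is not the actual PyPI log analysis code: the marker table, the function names and the sample user-agent strings are all assumptions made up for illustration, and real user-agent strings vary by tool version.

```python
from collections import Counter

# Hypothetical substring -> tool table. Order matters: the first match wins.
TOOL_MARKERS = [
    ("bandersnatch", "bandersnatch"),  # mirroring client
    ("pip/", "pip"),
    ("setuptools/", "setuptools"),
    ("Mozilla/", "browser"),
]


def classify_user_agent(user_agent):
    """Return a tool name for a user-agent string, or "unknown"."""
    ua = user_agent or ""
    for marker, tool in TOOL_MARKERS:
        if marker in ua:
            return tool
    return "unknown"


def count_by_tool(user_agents):
    """Count downloads per tool from an iterable of user-agent strings."""
    return Counter(classify_user_agent(ua) for ua in user_agents)


# Made-up sample values, not taken from real logs.
sample = [
    "pip/1.4.1 CPython/2.7.5 linux2/3.8.0",
    "pip/1.4.1 CPython/2.7.5 linux2/3.8.0",
    "Python-urllib/2.7 setuptools/1.1.6",
    "bandersnatch/1.0 (CPython 2.7.5)",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "",
]

if __name__ == "__main__":
    for tool, count in count_by_tool(sample).most_common():
        print(tool, count)
```

Note that the two identical pip entries in the sample could be one person and one deployment script, or two scripts: nothing in the user-agent string says which, so the purpose can't be recovered this way.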

Now, exposing more of that analytical data to package owners on an ongoing basis is an interesting idea, but one that would be a *very* long way down the priority list for the current development team.

That said, if someone else were to figure out a way to expose the data that users need to do their own analysis, it might be possible to support that. It may be better to offer it through Warehouse (aka PyPI.next, https://github.com/dstufft/warehouse) rather than the existing PyPI software. There's a demo instance (using live data) running at preview-pypi.python.org, but at this point that's mostly focused on backwards compatibility testing for the tool APIs rather than being navigable through a web browser.

Cheers,
Nick.

>
> Dustin Oprea
>
> On Oct 25, 2013 1:49 PM, "Donald Stufft" <donald@stufft.io> wrote:
>>
>> Mostly new packages will get roughly 2-3k of downloads from what appears to be
>> mirroring infrastructure. I'm hesitant to mess with the traffic numbers at all because
>> I don't want them to be inaccurate *and* artificial vs just inaccurate (assuming you
>> think it's the number of people downloading your project).
>>
>> On Oct 25, 2013, at 1:22 PM, Dustin Oprea <dustin@randomingenuity.com> wrote:
>>
>>> It seems like the download counts on PyPI aren't accurate. Though the really useful packages seem to have higher numbers than the packages that only apply to a specific target audience, I'm fairly certain that the numbers are more affected by robots and such than actual users.
>>>
>>> Recently I started a service that requires membership. In the last month, PyPI reports 3000 downloads of the client, yet Google Analytics only reports a handful of visits to the website. I have even fewer membership signups (as expected, so soon after launch). Why are the download counts so inflated?
>>>
>>> What has to be done to get this to be accurate?
>>>
>>> I've included two screenshots of PyPI and GA.
>>>
>>>
>>>
>>> Dustin Oprea
>>> <Selection_001.png><Selection_002.png>
>>> _______________________________________________
>>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>>> https://mail.python.org/mailman/listinfo/distutils-sig
>>
>>
>>
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>>
>>
>>
>
>