[Distutils] PyPI Download Counts

Noah Kantrowitz noah at coderanger.net
Mon May 27 10:55:04 CEST 2013


On May 27, 2013, at 12:27 AM, holger krekel wrote:

> Hi Donald,
> 
> On Sun, May 26, 2013 at 20:08 -0400, Donald Stufft wrote:
>> Hello!
>> 
>> As you may have noticed, the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However, I have now removed the download counts from the PyPI web UI, and their use via the API is considered deprecated.
>> 
>> There are numerous reasons for their removal/deprecation, some of which are:
>>    - Technically hard to make work with the new CDN
>>        - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access
> 
> What would be involved money/effort wise to get such access?
> 
>>        - The work around for not having log access would greatly reduce the utility of the CDN
>>    - Highly inaccurate
>>        - A number of things prevent the download counts from being accurate, some of which include:
>>            - pip download cache
>>            - Internal or unofficial mirrors
>>            - Packages not hosted on PyPI (for comparison's sake)
>>            - Mirrors or unofficial grab scripts causing inflated counts (last I looked, 25% of the downloads were from a known mirroring script).
> 
> Given the CDN, usage of mirrors may drop soon.
> 
>>    - Not particularly useful
>>        - Just because a project has been downloaded a lot doesn't mean it's good
>>        - Similarly just because a project hasn't been downloaded a lot doesn't mean it's bad
>> 
>> In short, because its value is low for various reasons and the tradeoffs required to make it work are high, it has not been an effective use of resources.
>> 
>> The API will continue to return values for the download counts in order to not break scripts; however, in the future all these values will be set to 0. The Web UI has been modified to no longer display them.
> 
> While download counts do have the weaknesses you describe, they also
> provide a rough indication of usage which many of us referred to.
> I used it to determine interest and it partly drove my development
> efforts.  From that angle i am not happy about the change but of course
> i see the benefits.
> 
> Not having download counts maybe lets us think harder about
> better metrics.  The number of projects using a package as a dep
> might be one.

We do still get some indication of package activity from looking through the logs; it just no longer has a direct correlation with the number of downloads. We will see one request hit the backend servers from each shield node per hour while that package is being requested. At some point we could recycle this into some kind of abstract popularity count, but I don't think that's a development priority for anyone right now.
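
For illustration only, here is a rough sketch of how such an abstract popularity count might be derived. It assumes the backend logs can be reduced to (timestamp, shield node, package) records; that record format and the function name are made up for this example and are not how the logs actually look. Since each shield node sends at most about one request per package per hour, the sketch counts distinct (node, hour) slots per package rather than raw requests.

```python
from collections import defaultdict
from datetime import datetime


def popularity(records):
    """Count distinct (shield node, hour) slots in which each package was requested.

    `records` is an iterable of (iso_timestamp, shield_node, package) tuples.
    This record format is hypothetical, chosen only for the example.
    """
    slots = defaultdict(set)
    for timestamp, node, package in records:
        # Truncate the timestamp to the hour so repeated requests from the
        # same node within one hour collapse into a single slot.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        slots[package].add((node, hour))
    return {package: len(seen) for package, seen in slots.items()}


if __name__ == "__main__":
    sample = [
        ("2013-05-27T10:05:00", "node-a", "pytest"),
        ("2013-05-27T10:45:00", "node-a", "pytest"),
        ("2013-05-27T11:05:00", "node-a", "pytest"),
        ("2013-05-27T10:10:00", "node-b", "pytest"),
        ("2013-05-27T10:10:00", "node-a", "tox"),
    ]
    print(popularity(sample))
```

The result is only a coarse activity signal: it saturates at the number of shield nodes times the number of hours observed, so it can rank packages roughly but says nothing about how many downloads actually happened.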

--Noah
