[Distutils] PyPI Download Counts

Alex Clark aclark at aclark.net
Thu May 30 13:53:53 CEST 2013


Hi,

anatoly techtonik <techtonik <at> gmail.com> writes:

> 
> On Mon, May 27, 2013 at 3:08 AM, Donald Stufft <donald <at> stufft.io> wrote:
> 
> Hello!
> 
> 
> As you may have noticed, the download counts on PyPI are no longer
updating. Originally this was due to an issue with the script that processes
these download counts. However, I have now removed the download counts from
the PyPI webui, and their use via the API is considered deprecated.
> 
> 
> 
> This was the only motivation to continue supporting my packages. :(
> Of course it was an illusion that these were useful to someone, but it was
so sweet.


+1-ish


> 
> 
>  
> 
> There are numerous reasons for their removal/deprecation some of which are:
> 
>     - Technically hard to make work with the new CDN
>         - The CDN is being donated to the PSF, and the donated tier does
not offer any form of log access
>         - The work around for not having log access would greatly reduce
the utility of the CDN
> 
> 
> 
> I don't believe that CDN clients don't want access to download stats - it
is an essential feature for measuring performance and rates of any download
service. Who is this provider who doesn't support them?
> 
>  
> 
>     - Highly inaccurate
> 
>         - A number of things prevent the download counts from being
accurate, some of which include:
>             - pip download cache
>             - Internal or unofficial mirrors
>             - Packages not hosted on PyPI (for comparison's sake)
> 
>             - Mirrors or unofficial grab scripts causing inflated counts
(Last I looked 25% of the downloads were from a known mirroring script).
> 
> 
> For less popular packages these factors are not that important. Also, the
exact count of human downloads is rarely interesting, and everybody
realizes there is no guarantee that the download rate is not inflated. Still,
these counts provide a good enough overview of relative package popularity.
> 
> 
> I wouldn't say that counts are highly inaccurate. For relative comparisons
they are sane enough.
> 
> Having inaccurate stats is much better than not having stats at all.
Exposing download counts with a note about their accuracy will increase
the chances that people will be interested in working on improving the stats.
> 
>  
> 
> 
>     - Not particularly useful
> 
>         - Just because a project has been downloaded a lot doesn't mean
it's good
>         - Similarly just because a project hasn't been downloaded a lot
doesn't mean it's bad
> 
> 
> 
> 
> How about download count for a package released 7 years ago? The download
count proves it is useful.


Well for now at least we have the history. You can use `vanity` [1] to get
those stats... and graph them against zero, for today ;-)


>  
> 
> 
> In short, because its value is low for various reasons, and the tradeoffs
required to make it work are high, it has not been an effective use of resources.
> 
> 
> 
> What are the tradeoffs? I'd like to preserve counts - that's why I got here.


I think a large enough group of folks has chimed in here
already to indicate that losing download stats is not entirely acceptable,
but it is certainly a reasonable trade-off when someone volunteers to do a lot of
important and necessary hard work to improve our overall infrastructure :-).

I'm sure we'll be able to re-add them at some point, but personally I'm
going to let the CDN dust settle, and be thankful these folks are
volunteering their time to do all the work (Donald, Noah, et al, thank you!)


Alex

---

Alex Clark * http://about.me/alex.clark