[Distutils] [Python-Dev] Python 3.x Adoption for PyPI and PyPI Download Numbers

Donald Stufft donald at stufft.io
Wed Apr 22 05:46:27 CEST 2015

> On Apr 21, 2015, at 11:35 PM, Gregory P. Smith <greg at krypto.org> wrote:
> On Tue, Apr 21, 2015 at 10:55 AM Donald Stufft <donald at stufft.io> wrote:
> Just thought I'd share this since it shows how what people are using to
> download things from PyPI has changed over the past year. Of particular
> interest to most people will be the final graphs showing what percentage of
> downloads from PyPI are for Python 3.x or 2.x.
> As always it's good to keep in mind, "Lies, Damn Lies, and Statistics". I've
> tried not to bias the results too much, but some bias is unavoidable. Of
> particular note is that a lot of these numbers come from pip, and as of version
> 6.0 of pip, pip will cache downloads by default. This would mean that older
> versions of pip are more likely to "inflate" the downloads than newer versions
> since they don't cache by default. In addition if a project has a file which
> is used for both 2.x and 3.x and they do a ``pip install`` on the 2.x version
> first then it will show up as counted under 2.x but not 3.x due to caching (and
> of course the inverse is true, if they install on 3.x first it won't show up
> on 2.x).
> Here's the link: https://caremad.io/2015/04/a-year-of-pypi-downloads/
> Anyways, I'll have access to the data set for another day or two before I
> shut down the (expensive) server that I have to use to crunch the numbers so if
> there's anything anyone else wants to see before I shut it down, speak up soon.
> Thanks!
> I like your focus on particular packages of note such as django and requests.
> How do CDNs influence these "lies"?  I thought the download counts on PyPI were effectively meaningless due to CDN mirrors fetching and hosting things?
> Do we have user-agent logs from all PyPI package CDN mirrors or just from the master?
> -gps

We took the download counts offline for a while because of the CDN; however, within a month or two (now almost two years ago) Fastly enabled logging on our account, which let us bring them back. So these numbers come from the CDN edge and reflect the “true” traffic. I say “true” because, although we have logs, logging isn’t considered an essential service, so when there are problems it can be reduced or disabled completely (some weeks in the data set show a massive drop; this was due to a day or two of missing logs).

That being said, on top of the Fastly-provided CDN, PyPI can also be mirrored (mirror clients show up as bandersnatch or others in the logs), and if someone installs from a mirror we don’t see that data at all. In addition, all versions of pip prior to 6.0 had an opt-in download cache, which means that, for people who opted in, we wouldn’t see downloads served from their cache; since 6.0 the cache is opt-out instead.

As for the mirror network itself, it represents about 20% of the total traffic on PyPI. However, we can tell when a download came from a mirror, and those downloads show up as “Unknown” in the other charts: since it’s a mirror client, we don’t know what the final target environment will be.
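To make the “Unknown” bucket concrete, here is a rough sketch of how a single log line's user agent could be bucketed by Python major version. This is not the script behind the graphs. The user-agent formats it recognizes (a legacy `pip/<ver> CPython/<ver> ...` string, a newer `pip/<ver> {json}` string with a `"python"` key, and a `bandersnatch/` prefix for the mirror client) are assumptions for illustration, and `classify` / `MIRROR_PREFIXES` are hypothetical names.

```python
import json
import re
from collections import Counter

# Hypothetical mirror-client prefixes; bandersnatch is the one named above,
# anything else added here would be a guess.
MIRROR_PREFIXES = ("bandersnatch/",)

# Assumed legacy pip format, e.g. "pip/1.5.6 CPython/2.7.6 Darwin/13.1.0".
LEGACY_PIP = re.compile(r"^pip/\S+ CPython/(\d+)\.")


def classify(user_agent: str) -> str:
    """Bucket one download as "2.x", "3.x", or "Unknown" (hypothetical helper)."""
    if user_agent.startswith(MIRROR_PREFIXES):
        # A mirror fetches on behalf of clients we never see, so the final
        # target environment cannot be known.
        return "Unknown"
    if user_agent.startswith("pip/"):
        _, _, rest = user_agent.partition(" ")
        if rest.startswith("{"):
            # Assumed newer format: "pip/<ver> {json}" with a "python" key.
            try:
                data = json.loads(rest)
            except ValueError:
                return "Unknown"
            major = str(data.get("python", "")).split(".", 1)[0]
            return major + ".x" if major in ("2", "3") else "Unknown"
        match = LEGACY_PIP.match(user_agent)
        if match and match.group(1) in ("2", "3"):
            return match.group(1) + ".x"
    # Anything unrecognized (other tools, browsers) is left as "Unknown".
    return "Unknown"


if __name__ == "__main__":
    # Made-up sample user agents, not real log data.
    sample = [
        "pip/1.5.6 CPython/2.7.6 Darwin/13.1.0",
        'pip/6.0.8 {"python":"3.4.3","implementation":{"name":"CPython"}}',
        "bandersnatch/1.8 (CPython 2.7.6-final0, Linux x86_64)",
    ]
    print(Counter(classify(ua) for ua in sample))
```

Note that nothing here can correct for the pip cache: a cached install never reaches the CDN, so it never produces a log line to classify in the first place.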

This might mean that future snapshots will look at API accesses instead, or perhaps we'll try to implement some sort of optional popcon, or maybe we'll continue to look at package installs and simply interpret the data knowing that these factors are at play.

Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
