Mailman 3 PyPI Download Counts - Distutils-SIG

PyPI Download Counts

Donald Stufft

27 May 2013

12:08 a.m.

Hello!

As you may have noticed, the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However, I have now removed the download counts from the PyPI web UI, and their use via the API is considered deprecated.

There are numerous reasons for their removal/deprecation, some of which are:

- Technically hard to make work with the new CDN
  - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access
  - The workaround for not having log access would greatly reduce the utility of the CDN
- Highly inaccurate
  - A number of things prevent the download counts from being accurate, some of which include:
    - pip download cache
    - Internal or unofficial mirrors
    - Packages not hosted on PyPI (for comparison's sake)
    - Mirrors or unofficial grab scripts causing inflated counts (last I looked, 25% of the downloads were from a known mirroring script)
- Not particularly useful
  - Just because a project has been downloaded a lot doesn't mean it's good
  - Similarly, just because a project hasn't been downloaded a lot doesn't mean it's bad

In short, because its value is low for various reasons and the tradeoffs required to make it work are high, it has not been an effective use of resources.

The API will continue to return values for it in order to not break scripts; however, in the future all these values will be set to 0. The Web UI has been modified to no longer display it.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
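For scripts that read these values, one defensive option is to treat a count of 0 as "no data" rather than "nobody downloaded this". The following is only a minimal sketch, assuming the release metadata has already been fetched into a dict; the `downloads` key and the helper name `download_count` are hypothetical illustrations, not a documented PyPI field layout.

```python
from typing import Optional


def download_count(release_info: dict, key: str = "downloads") -> Optional[int]:
    """Return a usable download count, or None when no real data is available.

    `release_info` stands in for whatever dict a script already receives
    from the API; the "downloads" key is an assumption for illustration.
    """
    value = release_info.get(key)
    # Missing, non-integer, or zero/negative values are all treated as
    # "unknown", since the deprecated field is slated to be set to 0.
    if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
        return None
    return value


if __name__ == "__main__":
    print(download_count({"downloads": 1234}))  # a count that looks real
    print(download_count({"downloads": 0}))     # deprecated placeholder value
    print(download_count({}))                   # field absent entirely
```

Returning `None` instead of 0 keeps downstream code from ranking or filtering projects on a placeholder value.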


holger krekel

27 May

7:27 a.m.

Hi Donald, On Sun, May 26, 2013 at 20:08 -0400, Donald Stufft wrote:

...

Hello!

As you may have noticed, the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However, I have now removed the download counts from the PyPI web UI, and their use via the API is considered deprecated.

There are numerous reasons for their removal/deprecation, some of which are:

- Technically hard to make work with the new CDN
  - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access

What would be involved money/effort wise to get such access?

...

  - The workaround for not having log access would greatly reduce the utility of the CDN
- Highly inaccurate
  - A number of things prevent the download counts from being accurate, some of which include:
    - pip download cache
    - Internal or unofficial mirrors
    - Packages not hosted on PyPI (for comparison's sake)
    - Mirrors or unofficial grab scripts causing inflated counts (last I looked, 25% of the downloads were from a known mirroring script)

Given the CDN, usage of mirrors may drop soon.

...

- Not particularly useful
  - Just because a project has been downloaded a lot doesn't mean it's good
  - Similarly, just because a project hasn't been downloaded a lot doesn't mean it's bad

In short, because its value is low for various reasons and the tradeoffs required to make it work are high, it has not been an effective use of resources.

The API will continue to return values for it in order to not break scripts, however in the future all these values will be set to 0. The Web UI has been modified to no longer display it.

While download counts do have the weaknesses you describe, they also provide a rough indication of usage which many of us referred to. I used it to determine interest and it partly drove my development efforts. From that angle i am not happy about the change but of course i see the benefits.

Not having download counts maybe lets us think harder about better metrics. The number of projects using a package as a dep might be one.

cheers,
holger
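To make the "number of projects using a package as a dep" idea concrete, here is a small sketch. It assumes a mapping from project name to declared dependencies has already been extracted somehow, which is the hard part and is not addressed here; the function name and the project names in the usage example are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def reverse_dependency_counts(requires: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each package, how many distinct projects depend on it.

    `requires` maps a project name to the names it declares as
    dependencies. How that mapping is obtained is outside this sketch.
    """
    counts: Counter = Counter()
    for project, deps in requires.items():
        # Compare names case-insensitively, count each dependency once per
        # project, and ignore a project listing itself.
        unique = {dep.lower() for dep in deps} - {project.lower()}
        counts.update(unique)
    return counts


if __name__ == "__main__":
    example = {
        "app-one": ["lib-x", "Lib-X", "lib-y"],
        "app-two": ["lib-x"],
        "lib-x": [],
    }
    for name, n in reverse_dependency_counts(example).most_common():
        print(name, n)
```

A count like this says nothing about how widely the depending projects are themselves used, so it is a rough signal in the same way download counts were.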


_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig

Nick Coghlan

7:41 a.m.

On Mon, May 27, 2013 at 5:27 PM, holger krekel <holger@merlinux.eu> wrote:

...

Not having download counts maybe lets us think harder about better metrics. The number of projects using a package as a dep might be one.

With the current downside being that it's hard for PyPI to figure out that number, too :) Agreed it would be a good number to publish once it's more readily available, too. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

holger krekel

7:59 a.m.

On Mon, May 27, 2013 at 17:41 +1000, Nick Coghlan wrote:

...

On Mon, May 27, 2013 at 5:27 PM, holger krekel <holger@merlinux.eu> wrote:

...
Not having download counts maybe lets us think harder about better metrics. The number of projects using a package as a dep might be one.

With the current downside being that it's hard for PyPI to figure out that number, too :)

Yip. But something like Vinay's red-dove approach or Marius' get_deps.py could provide a base. We might think about a docker instance which would let us quickly spawn new light VMs so we can isolate setup.py runs. (Yes, it's only Linux but it'd be a start).

...

Agreed it would be a good number to publish once it's more readily available, too.

I think "dep" numbers are mostly interesting for libraries, not so much for applications like django or pyramid or tools like nose/pytest. Another more practical data point would be "does this package even install on win32/linux/osx py26/py27/py33" and even better, do its automated tests pass? If we could evolve to have this info published on pypi.python.org it would be quite useful i think. I am actually currently implementing a system which enables this (the "devpi" system) so i don't mean this all just as "nice to have" theory. I aim to present the status of this work at EuroPython. best, holger


Florian Friesdorf

8:36 a.m.

Hi Holger, holger krekel <holger@merlinux.eu> writes:

...

On Mon, May 27, 2013 at 17:41 +1000, Nick Coghlan wrote:

...
On Mon, May 27, 2013 at 5:27 PM, holger krekel <holger@merlinux.eu> wrote:

...
Not having download counts maybe lets us think harder about better metrics. The number of projects using a package as a dep might be one.

With the current downside being that it's hard for PyPI to figure out that number, too :)

Yip. But something like Vinaj's red-dove approach or Marius' get_deps.py could provide a base. We might think about a docker instance which could allow to quickly spawn new light VMs so we can isolate setup.py runs. (Yes, it's only Linux but it'd be a start).

nix and nixpkgs allow this isolation on top of Linux, FreeBSD, OS X and theoretically also Cygwin (not sure how well Cygwin is supported at the moment).

http://nixos.org/nix/
http://nixos.org/nixpkgs/

From nixos.org: Nix is a purely functional package manager. This means that it can ensure that an upgrade to one package cannot break others, that you can always roll back to previous version, that multiple versions of a package can coexist on the same system, and much more.

Nixpkgs is a large collection of packages that can be installed with the Nix package manager.

...

...
Agreed it would be a good number to publish once it's more readily available, too.

I think "dep" numbers are mostly interesting for libraries, not so much for applications like django or pyramid or tools like nose/pytest.

Another more practical data point would be "does this package even install on win32/linux/osx py26/py27/py33" and even better, do its automated tests pass?

http://hydra.nixos.org/build/5062796

...

If we could evolve to have this info published on pypi.python.org it would be quite useful i think. I am actually currently implementing a system which enables this (the "devpi" system) so i don't mean this all just as "nice to have" theory. I aim to present the status of this work at EuroPython.

Nice! Looking forward to that.

If you have any questions about nix/nixpkgs/nixos, especially about the way python packages are packaged, please let me know. Also, it's not set in stone.

Personally, I'd love to see hydra.python.org providing builds of all pypi packages and would be happy to help. Also including Domen and Rok, for whom I assume the same.

You might have other tools that are better suited for you.

regards
florian

--
Florian Friesdorf <flo@chaoflow.net>
GPG FPR: 7A13 5EEE 1421 9FC2 108D BAAF 38F8 99A3 0C45 F083
Jabber/XMPP: flo@chaoflow.net
IRC: chaoflow on freenode,ircnet,blafasel,OFTC

holger krekel

3:46 p.m.

Hi Florian, On Mon, May 27, 2013 at 10:36 +0200, Florian Friesdorf wrote:

...

Hi Holger,

holger krekel <holger@merlinux.eu> writes:

...
On Mon, May 27, 2013 at 17:41 +1000, Nick Coghlan wrote:

...
On Mon, May 27, 2013 at 5:27 PM, holger krekel <holger@merlinux.eu> wrote:

...
Not having download counts maybe lets us think harder about better metrics. The number of projects using a package as a dep might be one.

With the current downside being that it's hard for PyPI to figure out that number, too :)

Yip. But something like Vinaj's red-dove approach or Marius' get_deps.py could provide a base. We might think about a docker instance which could allow to quickly spawn new light VMs so we can isolate setup.py runs. (Yes, it's only Linux but it'd be a start).

nix and nixpkgs allow this isolation on-top off linux, freebsd, OS X and theoretically also cygwin (not sure how good cygwin is supported at the moment).

http://nixos.org/nix/ http://nixos.org/nixpkgs/

From nixos.org: Nix is a purely functional package manager. This means that it can ensure that an upgrade to one package cannot break others, that you can always roll back to previous version, that multiple versions of a package can coexist on the same system, and much more.

Nixpkgs is a large collection of packages that can be installed with the Nix package manager.

Interesting stuff, didn't know about it. Did you post this as a suggestion for provisioning an environment to run setup.py (on nix-supported platforms)? If so, i am not sure how it would help exactly. I guess myself i'd aim for a 80% solution for discovering dependencies first. Simplest/quickest wins there :)

...

...
...
Agreed it would be a good number to publish once it's more readily available, too.

I think "dep" numbers are mostly interesting for libraries, not so much for applications like django or pyramid or tools like nose/pytest.

Another more practical data point would be "does this package even install on win32/linux/osx py26/py27/py33" and even better, do its automated tests pass?

http://hydra.nixos.org/build/5062796

...
If we could evolve to have this info published on pypi.python.org it would be quite useful i think. I am actually currently implementing a system which enables this (the "devpi" system) so i don't mean this all just as "nice to have" theory. I aim to present the status of this work at EuroPython.

Nice! Looking forward to that.

If you have any questions about nix/nixpkgs/nixos, especially about the way python packages are packaged, please let me know. Also, it's not set in stone.

are you going to be at EP? It's a long conference and i am more than happy to sit together on this topic for a bit sometimes. best, holger


Domen Kožar

4:04 p.m.

I'll also be at EP and can help to explain how Nix could solve the isolation problem.

Lennart Regebro

11:39 a.m.

On Mon, May 27, 2013 at 9:59 AM, holger krekel <holger@merlinux.eu> wrote:

...

Another more practical data point would be "does this package even install on win32/linux/osx py26/py27/py33" and even better, do its automated tests pass?

Those are interesting metrics, but they don't indicate a package's popularity at all.

//Lennart

holger krekel

1:34 p.m.

On Mon, May 27, 2013 at 13:39 +0200, Lennart Regebro wrote:

...

On Mon, May 27, 2013 at 9:59 AM, holger krekel <holger@merlinux.eu> wrote:

...
Another more practical data point would be "does this package even install on win32/linux/osx py26/py27/py33" and even better, do its automated tests pass?

Those are interesting metrics, but they don't indicate a package's popularity at all.

right. And to be clear, I wish we would get back download counters even if they are just rough indicators. For me this is a negative aspect of the CDN move as i was using a fully functional mirroring cache -- which currently is broken due to the very move. holger

Noah Kantrowitz

8:55 a.m.

On May 27, 2013, at 12:27 AM, holger krekel wrote:


Not having download counts maybe lets us think harder about better metrics. The number of projects using a package as a dep might be one.

We do still get some indication of package activity from looking through the logs; it just no longer has a direct correlation. We will see one request hit the backend servers from each shield node per hour when that package is being requested. At some point we could recycle this into some kind of abstract popularity count, but I don't think that's a development priority for anyone right now.

--Noah
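As an illustration of how such an abstract popularity count could be derived, the sketch below counts distinct (shield node, hour) pairs per package. It assumes backend requests are available as `(package, shield_node, timestamp)` tuples; that record shape, the node names, and the function name are hypothetical and do not describe PyPI's actual log format.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record shape: (package name, shield node id, request time).
BackendRequest = Tuple[str, str, datetime]


def popularity_scores(requests: Iterable[BackendRequest]) -> Dict[str, int]:
    """Score each package by the number of distinct (node, hour) pairs seen.

    With roughly one backend hit per shield node per hour, this measures in
    how many node-hours a package was requested, not how often.
    """
    seen: Dict[str, Set[Tuple[str, datetime]]] = defaultdict(set)
    for package, node, when in requests:
        # Truncate the timestamp to the start of its hour.
        hour = when.replace(minute=0, second=0, microsecond=0)
        seen[package].add((node, hour))
    return {package: len(pairs) for package, pairs in seen.items()}


if __name__ == "__main__":
    sample = [
        ("pkg-a", "node-1", datetime(2013, 5, 27, 10, 5)),
        ("pkg-a", "node-1", datetime(2013, 5, 27, 10, 50)),  # same node-hour
        ("pkg-a", "node-2", datetime(2013, 5, 27, 10, 7)),
        ("pkg-b", "node-1", datetime(2013, 5, 27, 10, 0)),
    ]
    print(popularity_scores(sample))
```

Collapsing repeats within a node-hour makes the score insensitive to cache refresh behaviour, which is also why it cannot be read as a download count.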

Jim Fulton

30 May

12:05 p.m.

On Mon, May 27, 2013 at 3:27 AM, holger krekel <holger@merlinux.eu> wrote:

...

Hi Donald,

On Sun, May 26, 2013 at 20:08 -0400, Donald Stufft wrote:

...
Hello!

As you have have noticed the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However I have now removed the download counts from the PyPI webui and their use via the API is considered deprecated.

There are numerous reasons for their removal/deprecation some of which are: - Technically hard to make work with the new CDN - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access

What would be involved money/effort wise to get such access?

I didn't see an answer to this. Any idea how much this would cost? With access to logs, we could compute download counts. Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton

Donald Stufft

3:56 p.m.

On May 30, 2013, at 8:05 AM, Jim Fulton <jim@zope.com> wrote:


Any idea how much this would cost? With access to logs, we could compute download counts.


Fastly has given us access to their streaming log support. Infrastructure will be setting up a secure method for receiving these logs at which point PyPI can use them. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
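Once logs are available, computing counts could look roughly like the sketch below, which also skips requests from known mirroring tools, one of the inflation sources mentioned at the start of the thread. It assumes each log line has already been parsed into a dict with `project` and `user_agent` keys; those key names, the marker strings, and the function name are hypothetical and do not describe Fastly's log format.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def count_downloads(
    records: Iterable[Mapping[str, str]],
    mirror_markers: Sequence[str] = ("mirror",),
) -> Counter:
    """Count downloads per project, skipping requests that look like mirrors.

    Each record is assumed to be an already-parsed log line. A request is
    skipped when its user agent contains any of `mirror_markers`.
    """
    counts: Counter = Counter()
    for record in records:
        agent = record.get("user_agent", "").lower()
        if any(marker.lower() in agent for marker in mirror_markers):
            continue  # likely a mirroring script, not an end-user install
        counts[record["project"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"project": "pkg-a", "user_agent": "installer/1.0"},
        {"project": "pkg-a", "user_agent": "example-mirror-tool/0.3"},
        {"project": "pkg-b", "user_agent": "installer/1.0"},
    ]
    print(count_downloads(sample))
```

Filtering by user agent only removes mirrors that identify themselves; local caches and internal mirrors would still go uncounted.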

Lennart Regebro

4:35 p.m.

On Thu, May 30, 2013 at 5:56 PM, Donald Stufft <donald@stufft.io> wrote:

...

Fastly has given us access to their streaming log support.

Infrastructure will be setting up a secure method for receiving these logs at which point PyPI can use them.

Excellent! Awesomeness unfolds like a FantomenK beat! //Lennart

anatoly techtonik

8 Jun

10:57 p.m.

On Thu, May 30, 2013 at 6:56 PM, Donald Stufft <donald@stufft.io> wrote:


Fastly has given us access to their streaming log support.

That's another deal. This is just what is expected in exchange for the "Real-time CDN by Fastly" phrase at the bottom of the page, to make it reflect reality.

I'd say that not getting access to download stats from the start is a failure on the side of the PSF or whoever was involved in coordinating with Fastly. There are many CDN providers out there and I suspect so far only Fastly was contacted. The primary responsibility of that coordinator is to correctly send the message to the CDN provider that PyPI is a public exhibition of their service quality, and not a tax exemption for charity. I'd say Fastly should be interested in helping to expose download stats to us in a convenient, API-friendly way, because real-time stats are the key feature of their marketing advantage as I see it.

Infrastructure will be setting up a secure method for receiving these logs at which point PyPI can use them.

Is it done already? I am free to some degree to help with that. What is the current problem to tackle?

Alex Clark

9 Jun

12:53 p.m.

anatoly techtonik <techtonik <at> gmail.com> writes:

...

I'd say that not getting access to download stats from the start is a fail on the side of PSF or whoever involved in coordination with Fastly. There are many CDN providers out there and I suspect so far only Fastly was contacted. The primary responsibility of that coordinator is correctly sending the message for CDN provider that PyPI is a public exhibition of their service quality, and not a tax exemption for charity. I'd say Fastly should be interested to help with making us download stats exposed in a convenient API friendly way, because real-time stats is the key feature of their marketing advantage as I see it.

...

Infrastructure will be setting up a secure method for receiving these logs at which point PyPI can use them.

...

Is it done already? I am free to some degree to help with that. What is the current problem to tackle on?

+1

<small_rant>
But: this happens all the time in open source communities, even mature ones. Someone volunteers to perform some complex task and hand-waves-away some requirement they don't personally feel is important under the "too hard" or "I don't have time for that, I'm doing this" umbrella.

Someone else comes along and says "wait, what happened to XXX?" at which point others join in to emphasize the importance of XXX. At this point, it's more clear whether or not XXX is actually important. But during this time: XXX is broken and many people on both sides are unhappy.

GOLDEN RULE: That's why even in open source, the "golden rule of OPS" is still important: PLEASE DO NOT BREAK SHIT. Even if you are fixing other more important functionality and don't think XXX is important. Or if XXX and the thing you want to do cannot co-exist, then make sure you get extensive buy-in from a large percentage of folks before you consider removing XXX. If for no other reason, then you will most certainly draw unwanted attention to yourself (which is especially frustrating in the context of "trying to do good things").
</small_rant>

Anyway, I've seen this so many times I don't even get angry anymore, because I know everyone involved has nothing but the best intentions. In this case, I'd like to see the download count functionality restored. (I spent a full year building a website whose almost-sole-functionality relies on download counts existing and working properly.) So, please call me when they are back. :-) And if I can help in any way to make that happen, please let me know.

Alex


--- Alex Clark · http://about.me/alex.clark

Donald Stufft

6:41 p.m.

On Jun 9, 2013, at 8:53 AM, Alex Clark <aclark@aclark.net> wrote:


Anyway, I've seen this so many times I don't even get angry anymore, because I know everyone involved has nothing but the best intentions. In this case, I'd like to see the download count functionality restored. (I spent a full year building a website whose almost-sole-functionality relies on download counts existing and working properly.) So, please call me when they are back. :-) And if I can help in any way to make that happen, please let me know.

PyPI's central purpose is to act as a repository and index of packages. Download counts are an auxiliary feature. Prior to the CDN there were large swathes of people, primarily in non-US locations, who simply could not use PyPI because of bad routes between them and PyPI. Some folks were getting download speeds more reminiscent of dialup than broadband. Moving to a CDN made PyPI reasonably functional again for those people.

So yes, I broke download counts because they were not more important than people being able to actually use PyPI to install from. And, thanks to Fastly generously giving us access to streaming logs, I will be bringing them back once the issues are sorted out.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Alex Clark

8:04 p.m.

Donald Stufft <donald <at> stufft.io> writes:

...

So yes. I broke Download counts because they were not more important than people being able to actually use PyPI to install from.

FWIW:

You missed the moral of the story: when you make a decision like this, someone will *always* disagree with you (even over the most trivial things). And even if they don't, they may disagree with your approach (e.g. why not sort out the problems with download counts before enabling the CDN?). So the only way to make everyone happy is to consider everyone who will be affected by your actions, before you take action.

(I'm not that concerned with this particular incident, just fascinated by open source community relations in general.)

Alex


Noah Kantrowitz

8:08 p.m.

On Jun 9, 2013, at 1:04 PM, Alex Clark <aclark@aclark.net> wrote:

...

Donald Stufft <donald <at> stufft.io> writes:

...
So yes. I broke Download counts because they were not more important than people being able to actually use PyPI to install from.

FWIW:

You missed the moral of the story: when you make a decision like this, someone will *always* disagree with you (even over the most trivial things). And even if they don't, they may disagree with your approach (e.g. why not sort problems with download counts before enabling the CDN) So the only way to make everyone happy is to consider everyone who will be affected by your actions, before you take action.

There is another way: make awesome and wait for history to determine who was happy :)

--Noah

Alex Clark

8:57 p.m.

New subject: Open source community relations: Was: Re: PyPI Download Counts

Noah Kantrowitz <noah <at> coderanger.net> writes:

...

There is another way, make awesome and wait for history to determine who was happy :)

Tee hee hee! :-)

OK, this is my 3rd (and likely last) attempt to steer this thread away from the specific, and contemplate the broader implications. The PyPI CDN is awesome (seriously, I appreciate all the hard work and progress from Donald, Noah, Fastly, et al, as I've said recently in another thread). And I'm prepared to be the only one interested in discussing the broader implications, at which point I'll go OT and shut up.

But FWIW I bother to comment here for a reason: to learn from this incident and apply that knowledge to future incidents, and to help others do the same.

Alex


Nick Coghlan

10 Jun

1:35 a.m.

On 10 June 2013 06:04, Alex Clark <aclark@aclark.net> wrote:

...

Donald Stufft <donald <at> stufft.io> writes:

...
So yes. I broke Download counts because they were not more important than people being able to actually use PyPI to install from.

FWIW:

You missed the moral of the story: when you make a decision like this, someone will *always* disagree with you (even over the most trivial things). And even if they don't, they may disagree with your approach (e.g. why not sort problems with download counts before enabling the CDN) So the only way to make everyone happy is to consider everyone who will be affected by your actions, before you take action.

And, indeed, we plan to run future changes of this magnitude through the PEP process for exactly this reason. We can't promise not to break some features in order to achieve gains we think are worth the loss, but we *can* promise not to break such features without advance warning and explicit consideration of alternatives that may allow us to avoid the breakage in the first place. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

holger krekel

7:34 a.m.

On Mon, Jun 10, 2013 at 11:35 +1000, Nick Coghlan wrote:

...

On 10 June 2013 06:04, Alex Clark <aclark@aclark.net> wrote:

...
Donald Stufft <donald <at> stufft.io> writes:

...
So yes. I broke Download counts because they were not more important than people being able to actually use PyPI to install from.

FWIW:

You missed the moral of the story: when you make a decision like this, someone will *always* disagree with you (even over the most trivial things). And even if they don't, they may disagree with your approach (e.g. why not sort problems with download counts before enabling the CDN) So the only way to make everyone happy is to consider everyone who will be affected by your actions, before you take action.

And, indeed, we plan to run future changes of this magnitude through the PEP process for exactly this reason. We can't promise not to break some features in order to achieve gains we think are worth the loss, but we *can* promise not to break such features without advance warning and explicit consideration of alternatives that may allow us to avoid the breakage in the first place.

Nick, i welcome this intention but could you now and in the future be explicit about who "we" is? It's not even obvious to me and i have written a recent PEP in this area and am following most of the mails here for a while, also participating in some actions behind the scene. thanks, holger

Nick Coghlan

7:51 a.m.

On 10 June 2013 17:34, holger krekel <holger@merlinux.eu> wrote:

...

On Mon, Jun 10, 2013 at 11:35 +1000, Nick Coghlan wrote:

...
On 10 June 2013 06:04, Alex Clark <aclark@aclark.net> wrote:

...
Donald Stufft <donald <at> stufft.io> writes:

...
So yes. I broke Download counts because they were not more important than people being able to actually use PyPI to install from.

FWIW:

You missed the moral of the story: when you make a decision like this, someone will *always* disagree with you (even over the most trivial things). And even if they don't, they may disagree with your approach (e.g. why not sort problems with download counts before enabling the CDN) So the only way to make everyone happy is to consider everyone who will be affected by your actions, before you take action.

And, indeed, we plan to run future changes of this magnitude through the PEP process for exactly this reason. We can't promise not to break some features in order to achieve gains we think are worth the loss, but we *can* promise not to break such features without advance warning and explicit consideration of alternatives that may allow us to avoid the breakage in the first place.

Nick, i welcome this intention but could you now and in the future be explicit about who "we" is? It's not even obvious to me and i have written a recent PEP in this area and am following most of the mails here for a while, also participating in some actions behind the scene.

In the specific case of PyPI, mainly Donald (as the one doing almost all the development work for the recent PyPI improvements), Richard (as the PyPI development lead) and Noah (as the python.org infrastructure lead). I'm less directly involved in PyPI changes (I'm still working mostly on the next draft of PEP 426 rather than anything more near term), but I'm still involved enough to want to say "we" rather than "they" :) The main issue that arose with the PyPI CDN change was that while it *was* discussed, the discussion happened on infrastructure-sig rather than here, so most PyPI client developers didn't hear about it until after the switch was already flipped. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

anatoly techtonik

14 Jun 14 Jun

1:59 p.m.

On Mon, Jun 10, 2013 at 10:51 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On 10 June 2013 17:34, holger krekel <holger@merlinux.eu> wrote:

...
On Mon, Jun 10, 2013 at 11:35 +1000, Nick Coghlan wrote:

...
On 10 June 2013 06:04, Alex Clark <aclark@aclark.net> wrote:

...
Donald Stufft <donald <at> stufft.io> writes:

...
So yes. I broke Download counts because they were not more important than people being able to actually use PyPI to install from.

FWIW:

You missed the moral of the story: when you make a decision like this, someone will *always* disagree with you (even over the most trivial things). And even if they don't, they may disagree with your approach (e.g. why not sort problems with download counts before enabling the CDN) So the only way to make everyone happy is to consider everyone who will be affected by your actions, before you take action.

And, indeed, we plan to run future changes of this magnitude through the PEP process for exactly this reason. We can't promise not to break some features in order to achieve gains we think are worth the loss, but we *can* promise not to break such features without advance warning and explicit consideration of alternatives that may allow us to avoid the breakage in the first place.

And the necessary precondition for that is status.python.org: a heartbeat indicator, a development roadmap and a dashboard with current issues that need a response from the community.

...

In the specific case of PyPI, mainly Donald (as the one doing almost all the development work for the recent PyPI improvements), Richard (as the PyPI development lead) and Noah (as the python.org infrastructure lead). I'm less directly involved in PyPI changes (I'm still working mostly on the next draft of PEP 426 rather than anything more near term), but I'm still involved enough to want to say "we" rather than "they" :)

The main issue that arose with the PyPI CDN change was that while it *was* discussed, the discussion happened on infrastructure-sig rather than here, so most PyPI client developers didn't hear about it until after the switch was already flipped.

Right. You need to scrape several mailing lists and then explicitly monitor the interesting threads to be aware of what is happening. Send thanks to people writing infrastructure reports - they are really helpful. status.python.org can contain such summaries prepared collaboratively for release using Etherpad. In this case you can expect more people to pop in. Also, I must add that the PSF blog and Python Insider blogs are not interesting. Sorry guys, but the stats don't lie. There should be a more technical and concise blog resource. The Python community should be more inclusive of people without advanced English, and provide something tasty for them too. -- anatoly t.

Lennart Regebro

9 Jun 9 Jun

8:09 p.m.

On Sun, Jun 9, 2013 at 8:41 PM, Donald Stufft <donald@stufft.io> wrote:

...

So yes. I broke Download counts because they were not more important than people being able to actually use PyPI to install from.

I approve of this message. //Lennart

holger krekel

1:20 p.m.

On Sun, Jun 09, 2013 at 01:57 +0300, anatoly techtonik wrote:

...

On Thu, May 30, 2013 at 6:56 PM, Donald Stufft <donald@stufft.io> wrote:

...
On May 30, 2013, at 8:05 AM, Jim Fulton <jim@zope.com> wrote:

On Mon, May 27, 2013 at 3:27 AM, holger krekel <holger@merlinux.eu> wrote:

Hi Donald,

On Sun, May 26, 2013 at 20:08 -0400, Donald Stufft wrote:

Hello!

As you have noticed the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However I have now removed the download counts from the PyPI webui and their use via the API is considered deprecated.

There are numerous reasons for their removal/deprecation some of which are:

- Technically hard to make work with the new CDN
  - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access

What would be involved money/effort wise to get such access?

I didn't see an answer to this.

Any idea how much this would cost? With access to logs, we could compute download counts.

Jim

-- Jim Fulton http://www.linkedin.com/in/jimfulton

Fastly has given us access to their streaming log support.

That's another deal. This is just what is expected in exchange for the "Real-time CDN by Fastly" phrase at the bottom of the page to make it reflect the reality.

I'd say that not getting access to download stats from the start is a fail on the side of PSF or whoever involved in coordination with Fastly.

I also was surprised when download counts were gone but download stats aren't very reliable indicators, anyway. To begin with, one might configure a CI system to cache pypi packages, another might re-download all the time. Some popular packages are rather used from system distros, others downloaded from pypi. And there are many more issues. Seems like there are not overwhelmingly many people unhappy about its removal. Doesn't mean they shouldn't be reinstated, just saying.

...

There are many CDN providers out there and I suspect so far only Fastly was contacted. The primary responsibility of that coordinator is to make clear to the CDN provider that PyPI is a public exhibition of their service quality, and not a tax exemption for charity. I'd say Fastly should be interested in helping expose download stats to us in a convenient, API-friendly way, because real-time stats are the key feature of their marketing advantage as I see it.

FYI Fastly has been responding swiftly and responsibly so far, as far as i could follow it. And Donald has been helping people on various fronts (not only Fastly related) and been the main driver of the CDN move, thankfully. Things are starting to work more smoothly now and i guess some bumps in the road were unavoidable because they only show up in real life. cheers, holger

...

Infrastructure will be setting up a secure method for receiving these logs at which point PyPI can use them.

Is it done already? I am free to some degree to help with that. What is the current problem to tackle?

...


Donald Stufft

6:33 p.m.

On Jun 9, 2013, at 9:20 AM, holger krekel <holger@merlinux.eu> wrote:

...

...
There are many CDN providers out there and I suspect so far only Fastly was contacted. The primary responsibility of that coordinator is to make clear to the CDN provider that PyPI is a public exhibition of their service quality, and not a tax exemption for charity. I'd say Fastly should be interested in helping expose download stats to us in a convenient, API-friendly way, because real-time stats are the key feature of their marketing advantage as I see it.

FYI Fastly has been responding swiftly and responsibly so far, as far as i could follow it. And Donald has been helping people on various fronts (not only Fastly related) and been the main driver of the CDN move, thankfully. Things are starting to work more smoothly now and i guess some bumps in the road were unavoidable because they only show up in real life.

Fastly has been a dream to work with. They've been fast at diagnosing and fixing issues, have helped tune the config to get a higher hit rate, and when they heard that people were upset that download counts had to be turned off they offered to turn on logging support for our account. The infrastructure is set up to receive the logs (and in fact was receiving logs for a day or so) but upgrades to the VM that runs PyPI need to occur before we can continue receiving them. The disk that PyPI has is too small to handle the volume of request data coming in from Fastly, and it filled up in under 24 hours. Upgrading that space requires powering off the VM so we (the Infra team) are working on doing that, ideally without downtime on PyPI.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

anatoly techtonik

30 May 30 May

10:01 a.m.

On Mon, May 27, 2013 at 3:08 AM, Donald Stufft <donald@stufft.io> wrote:

...

Hello!

As you have noticed the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However I have now removed the download counts from the PyPI webui and their use via the API is considered deprecated.

This was the only motivation to continue supporting my packages. :( Of course it was an illusion that these were useful to someone, but it was so sweet.

...

There are numerous reasons for their removal/deprecation some of which are:

- Technically hard to make work with the new CDN
  - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access
  - The work around for not having log access would greatly reduce the utility of the CDN

I don't believe that CDN clients don't want access to download stats - it is an essential feature for measuring performance and rates of any download service. Who is this provider who doesn't support them?

...

- Highly inaccurate
  - A number of things prevent the download counts from being accurate, some of which include:
    - pip download cache
    - Internal or unofficial mirrors
    - Packages not hosted on PyPI (for comparison's sake)
    - Mirrors or unofficial grab scripts causing inflated counts (Last I looked 25% of the downloads were from a known mirroring script).

For less popular packages these factors are not that important. Also the exact count of human downloads is rarely interesting. Also everybody realizes there is no guarantee that download rate is not inflated. And still these counts provide good enough overview of relative package popularity. I wouldn't say that counts are highly inaccurate. For relative comparisons they are sane enough. Having inaccurate stats is much better than not having stats at all. Exposing download counts with a note about their accuracy will increase chances that people will be interested to work on improving the stats.

...

- Not particularly useful
  - Just because a project has been downloaded a lot doesn't mean it's good
  - Similarly just because a project hasn't been downloaded a lot doesn't mean it's bad

How about download count for a package released 7 years ago? The download count proves it is useful.

...

In short, because its value is low for various reasons and the tradeoffs required to make it work are high, it has not been an effective use of resources.

What are the tradeoffs? I'd like to preserve counts - that's why I got here.

Alex Clark

11:53 a.m.

Hi, anatoly techtonik <techtonik <at> gmail.com> writes:

...

On Mon, May 27, 2013 at 3:08 AM, Donald Stufft <donald <at> stufft.io> wrote:

Hello!

As you have noticed the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However I have now removed the download counts from the PyPI webui and their use via the API is considered deprecated.

...

This was the only motivation to continue supporting my packages. :( Of course it was an illusion that these were useful to someone, but it was so sweet.

+1-ish

...

There are numerous reasons for their removal/deprecation some of which are:

- Technically hard to make work with the new CDN
  - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access
  - The work around for not having log access would greatly reduce the utility of the CDN

...

I don't believe that CDN clients don't want access to download stats - it is an essential feature for measuring performance and rates of any download service. Who is this provider who doesn't support them?

...

- Highly inaccurate
  - A number of things prevent the download counts from being accurate, some of which include:
    - pip download cache
    - Internal or unofficial mirrors
    - Packages not hosted on PyPI (for comparison's sake)
    - Mirrors or unofficial grab scripts causing inflated counts (Last I looked 25% of the downloads were from a known mirroring script).

...

For less popular packages these factors are not that important. Also the exact count of human downloads is rarely interesting. Also everybody realizes there is no guarantee that download rate is not inflated. And still these counts provide good enough overview of relative package popularity.

I wouldn't say that counts are highly inaccurate. For relative comparisons they are sane enough.

Having inaccurate stats is much better than not having stats at all. Exposing download counts with a note about their accuracy will increase chances that people will be interested to work on improving the stats.

...

- Not particularly useful
  - Just because a project has been downloaded a lot doesn't mean it's good
  - Similarly just because a project hasn't been downloaded a lot doesn't mean it's bad

How about download count for a package released 7 years ago? The download count proves it is useful.

Well for now at least we have the history. You can use `vanity` [1] to get those stats... and graph them against zero, for today ;-)

...

In short, because its value is low for various reasons and the tradeoffs required to make it work are high, it has not been an effective use of resources.

What are the tradeoffs? I'd like to preserve counts - that's why I got here.

I think there is a large enough group of folks that have chimed in here already to indicate that losing download stats is not entirely acceptable, but certainly a reasonable trade off when someone volunteers to do a lot of important and necessary hard work to improve our overall infrastructure :-). I'm sure we'll be able to re-add them at some point, but personally I'm going to let the CDN dust settle, and be thankful these folks are volunteering their time to do all the work (Donald, Noah, et al, thank you!) Alex

...


--- Alex Clark * http://about.me/alex.clark

Brian Sutherland

12:54 p.m.

On Sun, May 26, 2013 at 08:08:35PM -0400, Donald Stufft wrote:

...

Hello!

As you have noticed the download counts on PyPI are no longer updating. Originally this was due to an issue with the script that processes these download counts. However I have now removed the download counts from the PyPI webui and their use via the API is considered deprecated.

There are numerous reasons for their removal/deprecation some of which are:

- Technically hard to make work with the new CDN
  - The CDN is being donated to the PSF, and the donated tier does not offer any form of log access
  - The work around for not having log access would greatly reduce the utility of the CDN
- Highly inaccurate
  - A number of things prevent the download counts from being accurate, some of which include:
    - pip download cache
    - Internal or unofficial mirrors
    - Packages not hosted on PyPI (for comparison's sake)
    - Mirrors or unofficial grab scripts causing inflated counts (Last I looked 25% of the downloads were from a known mirroring script).
- Not particularly useful
  - Just because a project has been downloaded a lot doesn't mean it's good
  - Similarly just because a project hasn't been downloaded a lot doesn't mean it's bad

In short, because its value is low for various reasons and the tradeoffs required to make it work are high, it has not been an effective use of resources.

The API will continue to return values for it in order to not break scripts, however in the future all these values will be set to 0. The Web UI has been modified to no longer display it.

There are other ways to get usage statistics. Debian has Popularity Contest (http://popcon.debian.org) statistics for python libraries, e.g. babel and twisted:

http://qa.debian.org/popcon.php?package=python-babel
http://qa.debian.org/popcon.php?package=twisted

Of course, these are probably still "Highly inaccurate", just in a different way ;)

-- Brian Sutherland


participants (11)

Alex Clark
anatoly techtonik
Brian Sutherland
Domen Kožar
Donald Stufft
Florian Friesdorf
holger krekel
Jim Fulton
Lennart Regebro
Nick Coghlan
Noah Kantrowitz
