On May 8, 2014, at 5:22 PM, Donald Stufft wrote:
Socially, this change does not seem to be having the effect of persuading more package developers to host on PyPI. The stick doesn't appear to have worked, maybe we should be trying to find a carrot?
Do you have any data showing that it hasn’t worked? To see what impact it has had, I’m re-running the scripts I ran a year ago; I can already see they are processing MUCH faster than last year.
The data has finished processing; it covers a span of approximately one year. The pip release that prompted all of this came out about 4-5 months ago.
Overall, PyPI has seen 50% growth in installable projects in that time. If the change had had no effect, we'd expect to see a ~50% increase across the board. Instead, we've seen a 60% increase (10 percentage points above expected) in projects that can only be installed from PyPI, and a 12% decrease (62 percentage points below expected) in projects that have any unsafe files.
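The "expected" figures work by treating the overall 50% growth as a baseline and measuring each category against it in percentage points. A minimal sketch of that arithmetic, using only the percentages quoted above (the function names are illustrative, not taken from the actual scripts):

```python
def pct_change(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


def deviation_from_baseline(observed_pct: float, baseline_pct: float) -> float:
    """Percentage points by which a category grew faster (+) or slower (-) than the baseline."""
    return observed_pct - baseline_pct


baseline = 50.0  # overall growth in installable projects over the year
print(deviation_from_baseline(60.0, baseline))   # PyPI-only projects -> 10.0
print(deviation_from_baseline(-12.0, baseline))  # projects with any unsafe files -> -62.0
```

A decrease measured against a growing baseline is why the second figure is so large: the category would have been expected to grow by ~50%, but shrank by 12% instead.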
Furthermore, if pip were to change the default of --allow-all-external (i.e., allow all externally hosted files by default), it would take 23 projects from not installable by default to installable by default. This represents 0.2% of installable projects on PyPI. For another 40 projects, it would make one or more additional files downloadable by default.
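To make the two kinds of impact concrete, here is a minimal sketch of how one might classify projects before and after such a default change. It assumes, following PEP 438, that externally hosted files are either verifiable or unverifiable, and that --allow-all-external admits only the verifiable ones. The `Hosting` enum, the function names, and the example projects are hypothetical; this is not pip's code nor the scripts used for the numbers above.

```python
from enum import Enum


class Hosting(Enum):
    """Hypothetical per-file hosting classification."""
    PYPI = "pypi"
    EXTERNAL_VERIFIABLE = "external_verifiable"
    EXTERNAL_UNVERIFIABLE = "external_unverifiable"


def downloadable(files, allow_all_external=False):
    """Return the files a client would accept under the given setting.

    Assumption: by default only PyPI-hosted files are accepted; enabling
    allow_all_external also accepts verifiable external files, never
    unverifiable ones.
    """
    allowed = {Hosting.PYPI}
    if allow_all_external:
        allowed.add(Hosting.EXTERNAL_VERIFIABLE)
    return [f for f in files if f in allowed]


def impact_of_default_change(projects):
    """Split projects into (newly installable, gaining extra files)."""
    newly_installable, gains_files = [], []
    for name, files in projects.items():
        before = downloadable(files)
        after = downloadable(files, allow_all_external=True)
        if not before and after:
            newly_installable.append(name)   # nothing usable before, something now
        elif before and len(after) > len(before):
            gains_files.append(name)         # already installable, more files now
    return newly_installable, gains_files


# Hypothetical example projects, one list entry per file.
projects = {
    "only-pypi": [Hosting.PYPI],
    "only-external": [Hosting.EXTERNAL_VERIFIABLE],
    "mixed": [Hosting.PYPI, Hosting.EXTERNAL_VERIFIABLE],
    "unsafe-only": [Hosting.EXTERNAL_UNVERIFIABLE],
}
print(impact_of_default_change(projects))
```

In this framing the 23 projects correspond to the first list and the additional 40 to the second; projects whose only files are unverifiable stay uninstallable either way.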
Some other data points:
Looking at these numbers, I think it's safe to say that over this period the "hosting hygiene" of PyPI projects has improved compared to a year ago. We cannot say for certain whether this is because of this change, but given that the fallout is ~23 (or ~63) projects out of 38,835, I think it is very reasonable to leave the defaults alone, since there is a reasonably high chance that they played at least some part in that improvement.
I'd love to get to the point where 100% of projects are installable strictly from PyPI (or at least 100% installable safely). At 92% (or 92.2%) we're getting pretty close, and hopefully that number will keep growing until it hits 100%.
For reference, here are the raw numbers along with a summary of the data:
And here is the repository containing the raw data and the scripts used to collect and process it:
linkcollector.py collects the data, linkwriter.py writes out the JSON file, and stats2.py processes it and produces the numbers in the gist above. links.json is the data from a year ago, and 2014-05-08.links.json is the data from today.
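The JSON layout produced by linkwriter.py isn't described here, so the sketch below assumes a hypothetical schema mapping each project name to a list of per-file hosting labels ("pypi", "external_verifiable", "external_unverifiable"). It is not the real stats2.py; it only illustrates how two snapshots could be summarized and compared. It also assumes, for illustration, that a project counts as installable if it has at least one file of any kind.

```python
import json
from pathlib import Path


def load_snapshot(path):
    """Load a snapshot in the hypothetical {"project": ["pypi", ...]} format."""
    return json.loads(Path(path).read_text())


def summarize(snapshot):
    """Count projects in each category for one snapshot."""
    installable = sum(1 for kinds in snapshot.values() if kinds)
    pypi_only = sum(
        1 for kinds in snapshot.values()
        if kinds and all(k == "pypi" for k in kinds)
    )
    any_unsafe = sum(
        1 for kinds in snapshot.values() if "external_unverifiable" in kinds
    )
    return {"installable": installable, "pypi_only": pypi_only, "any_unsafe": any_unsafe}


def compare(old, new):
    """Percentage change per category; categories empty in `old` are skipped."""
    old_s, new_s = summarize(old), summarize(new)
    return {
        k: 100.0 * (new_s[k] - old_s[k]) / old_s[k]
        for k in old_s
        if old_s[k]
    }
```

With files in that hypothetical format, `compare(load_snapshot("links.json"), load_snapshot("2014-05-08.links.json"))` would return the percentage change for each category, which is the kind of figure quoted earlier in this message.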
Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA