pip: cdecimal an externally hosted file and may be unreliable [sic]
Just a warning, in case any of the new packaging team forgot to contact http://cve.mitre.org/ . Stefan Krah
Hi,
I don't understand your email. Can you please elaborate?
Victor
2014-05-06 23:35 GMT+02:00 Stefan Krah
Just a warning, in case any of the new packaging team forgot to contact http://cve.mitre.org/ .
Stefan Krah
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.co...
Victor Stinner
I don't understand your email. Can you please elaborate?
There is nothing wrong with the package. The remark is a joke provoked by a long history of a campaign [1] against external packages on distutils-sig. Many tools (like crate.io, when it was still up) have made derogatory remarks about external packages. Now the latest version of the officially sanctioned download tool (pip) spits out copious warnings, one of which is the subject of this thread.

External packages are being singled out unfairly:

1) Anyone can upload any package to PyPI (i.e. the index is not curated at all).
2) Last time I looked, access credentials (via "lost login") were sent out in plaintext.
3) AFAIK people can upload a different (malicious) version of a package with *the exact same name*.
4) pip generally downloads the latest version, so a malicious person can provide a good package for several years until people trust him, then change to a trojaned version.
5) Looking at the list of certificates that is in my default cert store, I don't find SSL trustworthy at all.
6) D.J. Bernstein, who is somewhat security minded, has been shipping his software *for years* with just plain HTTP and published checksums.

To sum it up:

1) Don't use pip to install packages directly from PyPI if security really matters.
2) The best security we currently get is either
   a) with package signatures (*if* you can get the author's key via a trustworthy channel, which is rarely the case), or
   b) with decent checksums that are recorded on public mailing lists at the time the package is announced (it would be hard for an attacker to modify all mailing list archives).

Whether a package is internal or external is orthogonal to both points. With all these points, I find it questionable for an "official" install tool to make security related remarks about just one category of weaknesses. After all, people might be led to believe that pip is some sort of apt-get and all uploaded packages are safe.
Stefan Krah [1] Note that the joke is quite innocent in comparison to what I've read on distutils-sig about the subject.
On May 8, 2014, at 8:12 AM, Stefan Krah
Victor Stinner
wrote: I don't understand your email. Can you please elaborate?
There is nothing wrong with the package. The remark is a joke provoked by a long history of a campaign [1] against external packages on distutils-sig.
Many tools (like crate.io, when it was still up) have made derogatory remarks about external packages. Now the latest version of the officially sanctioned download tool (pip) spits out copious warnings, one of which is the subject of this thread.
pip doesn't install externally hosted packages by default because people should be aware of what servers they depend on. Installs that depend on externally hosted packages are brittle and more prone to failure. Every single external server adds *another* SPOF into any particular install set. Even if every external server has 99.9% uptime, when you combine several of them the total uptime of any particular set of requirements drops quickly. This has been a major complaint that people have had over time. However, these packages are not a security issue, so it is easy to turn them back on globally if a person wishes to make their own deployment more brittle (--allow-all-external).

In addition to externally hosted packages, there are also unverifiable packages. These are packages where there is no path to verify the download. Generally an unverifiable package is one that either is linked to directly but has no hash information encoded in the URL in a way that pip understands or, the more common case, is a package that is linked from a page that is linked to on PyPI. Since we cannot build a path of trust to verify the package, it is trivial for an attacker to MITM it and execute arbitrary Python code on the end user's machine. This case *is* a security issue, and pip goes out of its way to *greatly* discourage depending on packages that require this kind of install.
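The compounding-availability argument above can be sketched numerically; this is an illustration only, assuming n independent external servers that each have 99.9% uptime:

```python
# Sketch of the SPOF argument: if an install set depends on n independent
# external servers, each with 99.9% uptime, the probability that *every*
# server is reachable at install time is 0.999 ** n, which decays quickly.

def combined_uptime(n_servers: int, per_server: float = 0.999) -> float:
    """Probability that all n independent servers are up simultaneously."""
    return per_server ** n_servers

for n in (1, 5, 10, 50):
    print(f"{n:3d} servers: {combined_uptime(n):.4f}")
```

With 50 external hosts in the mix, the combined availability already drops below 96%, even though each individual host looks very reliable.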
External packages are being singled out unfairly:
1) Anyone can upload any package to PyPI (i.e. the index is not curated at all).
Sure, the threat model of PyPI isn't that you can install anything willy-nilly and be safe. It's that when you ask for package X version Y, you should get what the *author* published and not what an attacker who happens to be in a position to MITM you sends you.
2) Last time I looked, access credentials (via "lost login") were sent out in plaintext.
The existence of other security issues is not an excuse to not fix a security issue. There are other problems and we're slowly working on trying to clear them out.
3) AFAIK people can upload a different (malicious) version of a package with *the exact same name*.
Yes, a malicious author is necessarily outside of the threat model for PyPI/pip.
4) pip generally downloads the latest version, so a malicious person can provide a good package for several years until people trust him, then change to a trojaned version.
Yes, a malicious author is necessarily outside of the threat model for PyPI/pip.
5) Looking at the list of certificates that is in my default cert store, I don't find SSL trustworthy at all.
TLS may not be the best method of authenticating package downloads, however it is an easy one and it is significantly better than having no authentication at all. You move your attack surface from "anyone who has a privileged network position" to "anyone who has a privileged network position and also has the ability to generate erroneously signed CA certificates". That is a significantly smaller attack surface.
6) D.J. Bernstein, who is somewhat security minded, has been shipping his software *for years* with just plain HTTP and published checksums.
Argument from authority doesn't really hold up very well when DJB doesn't distribute his software in a way that is intended to be consumed mechanically. Also, while he may be a crypto expert, he is far from an expert on successfully distributing software, unless you also think that the signature checking in most OS-provided package managers is pointless.
To sum it up:
1) Don't use pip to install packages directly from PyPI if security really matters.
"Security" isn't a useful term on its own. You have to define a threat model. A better answer here is: don't use pip to install packages directly from PyPI if a malicious author is within your threat model (modulo other issues we're still working on, but as I said above, the existence of a second security issue isn't a reason not to fix the first one).
2) The best security we currently get is either
a) with package signatures (*if* you can get the author's key via a trustworthy channel, which is rarely the case).
b) with decent checksums that are recorded on public mailing lists at the time the package is announced (it would be hard for an attacker to modify all mailing list archives.)
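Option (b) can be carried out mechanically once the digest is in hand; a minimal sketch using only the standard library (the file name and the idea of copying the digest from an announcement email are illustrative, not any tool's actual workflow):

```python
# Verify a downloaded tarball against a checksum published out of band,
# e.g. on a release-announcement mailing list. The file name below is
# hypothetical; the digest would be copied from the announcement.
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in chunks so large tarballs don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# expected = "...digest copied from the announcement email..."
# if sha256_of("cdecimal-2.3.tar.gz") != expected:
#     raise SystemExit("checksum mismatch -- do not install")
```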
Whether a package is internal or external is orthogonal to both points.
Again, as I said above, internal vs. external isn't a security-related decision; it is an "uptime"-related decision. Verifiable vs. unverifiable *is* a security-related decision.
With all these points, I find it questionable for an "official" install tool to make security related remarks about just one category of weaknesses.
After all, people might be led to believe that pip is some sort of apt-get and all uploaded packages are safe.
Stefan Krah
[1] Note that the joke is quite innocent in comparison to what I've read on distutils-sig about the subject.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Well, to be fair and leaving aside uptime concerns and the general desire to always install packages from some server instead of a safe and trusted local directory (probably too obvious ;-), it would certainly be possible to add support for trusted externally hosted packages. However, for some reason there's a strong resistance against doing this, which I frankly don't understand. I agree with Stefan that the warning message wording is less than ideal. You'd normally call such blanket statements FUD, esp. since there are plenty external hosting services which are reliable and safe to use. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2014)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On Thu, May 8, 2014 at 11:39 PM, M.-A. Lemburg
I agree with Stefan that the warning message wording is less than ideal. You'd normally call such blanket statements FUD, esp. since there are plenty external hosting services which are reliable and safe to use.
No, it's not FUD. Every external dependency adds another thing that can fail. I was just arguing this with one of my younger brothers this evening; he had some stuff stored in Google Docs, and he couldn't access it because of some outage. With a local git/hg repository, he would have had all his data right there, and even losing access to a central hub server just means you can't pull/push. Using someone else's server makes you depend on that server and everything in between. Obviously that's often a cost worth paying (hey, I'm posting this from Gmail, so I completely lose access to all these emails if it's inaccessible), but it's a legitimate concern, not FUD. (That doesn't stop the wording from being "less than ideal", though.) ChrisA
On 8 May 2014 23:39, M.-A. Lemburg
However, for some reason there's a strong resistance against doing this, which I frankly don't understand.
Because we're taking responsibility for the end-to-end user experience of PyPI, and are expressly trying to eliminate the elements of that user experience that are beyond the control of the PyPI admin team (even the question of "does this software actually work?" is in our sights if you consider a long enough time span). That's hard enough with just a couple of service providers (Fastly and Rackspace) in the mix - it quickly becomes impossible if every new dependency from an installation request adds a new point of failure. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 08.05.2014 15:57, Nick Coghlan wrote:
On 8 May 2014 23:39, M.-A. Lemburg
wrote: However, for some reason there's a strong resistance against doing this, which I frankly don't understand.
Because we're taking responsibility for the end-to-end user experience of PyPI, and are expressly trying to eliminate the elements of that user experience that are beyond the control of the PyPI admin team
Oh, I guess you'd have to rewrite most of those 40k packages then :-) Seriously, the word "eliminate" in there does not sit well with our goals for openness. External services like github, sourceforge, bitbucket, dropbox, CDNs, etc. are not per se evil and unreliable. pip should acknowledge this and not try to "eliminate" all hosting services in the world by default [sound of Empire Strikes Back theme] ;-)
(even the question of "does this software actually work?" is in our sights if you consider a long enough time span). That's hard enough with just a couple of service providers (Fastly and Rackspace) in the mix - it quickly becomes impossible if every new dependency from an installation request adds a new point of failure.
Like I said: the best option is to use a local directory which only contains package files that you have inspected and actually trust :-) -- Marc-Andre Lemburg
On 9 May 2014 00:52, "M.-A. Lemburg"
On 08.05.2014 15:57, Nick Coghlan wrote:
(even the question of "does this software actually work?" is in our sights if you consider a long enough time span). That's hard enough with just a couple of service providers (Fastly and Rackspace) in the mix - it quickly becomes impossible if every new dependency from an installation request adds a new point of failure.
Like I said: the best option is to use a local directory which only contains packages files that you have inspected and actually trust :-)
Correct, but that raises the barrier to entry too high. The pip defaults are aimed at providing an experience with the fewest points of failure that is currently achievable, with a minimal learning curve. We still have a long way to go, but if people want to influence those design decisions, the relevant lists are pypa-dev (for pip specific discussions) and distutils-sig (for higher level cross-tool design decisions) Cheers, Nick.
On May 8, 2014, at 9:39 AM, M.-A. Lemburg
Well, to be fair and leaving aside uptime concerns and the general desire to always install packages from some server instead of a safe and trusted local directory (probably too obvious ;-), it would certainly be possible to add support for trusted externally hosted packages.
There is support for trusted externally hosted packages; you put the URL in PyPI and include a hash in the fragment like so:

http://www.bytereef.org/software/mpdecimal/releases/cdecimal-2.3.tar.gz#md5=...

The hash can be md5 or any of the sha-2 family. Now this does not mean that ``pip install cdecimal`` will automatically install this, because whether or not you're willing to install from servers other than PyPI[1] is a policy decision for the end user of pip. The only real contention point there is whether installing from other servers should be on or off by default. PEP 438 selected off by default, and I agree with that decision. Installing externally hosted files, which are able to be safely downloaded[2], was a surprising behavior to *everyone* I've talked to who hadn't already discovered that pip/easy_install did that. Even for the people it wasn't surprising to, they said it was surprising when they had originally discovered it[3].

[1] To be specific, other than the configured index(es), which happens to default to PyPI.
[2] For the definition of safe that PyPI/pip operate under, which is that the author of a package is assumed to be trusted by the person electing to download their package.
[3] I suspect people who were around when PyPI *couldn't* host files and was only an index would be the exception to this.
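The `#<algorithm>=<digest>` fragment convention described above can be pulled apart with the standard library; this is a simplified sketch of the idea (the helper names and the exact set of accepted algorithms are illustrative, not pip's actual internals):

```python
# Extract the hash fragment that a trusted external link carries,
# e.g. .../cdecimal-2.3.tar.gz#md5=<digest>. A link with a recognized
# fragment gives pip a path to verify the download; one without does not.
from urllib.parse import urlparse

# md5 plus the sha-2 family, per the message above.
ALLOWED = {"md5", "sha224", "sha256", "sha384", "sha512"}

def hash_from_url(url: str):
    """Return (algorithm, digest) if the URL has a usable hash fragment."""
    frag = urlparse(url).fragment
    if "=" not in frag:
        return None
    algo, _, digest = frag.partition("=")
    return (algo, digest) if algo in ALLOWED and digest else None

def is_verifiable(url: str) -> bool:
    return hash_from_url(url) is not None
```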
However, for some reason there's a strong resistance against doing this, which I frankly don't understand.
I agree with Stefan that the warning message wording is less than ideal. You'd normally call such blanket statements FUD, esp. since there are plenty external hosting services which are reliable and safe to use.
I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:

cdecimal an externally hosted file and may be unreliable

Which is true as far as I can tell: it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I'm happy to have it and will gladly commit it myself to pip.

[1] In my experience dealing with complaints from pip's users, one of the big ones was that some dependency they use was, typically unknown to them, hosted externally, and they found out it was hosted externally because the server it was hosted on went down.
On May 8, 2014, at 9:58 AM, Donald Stufft
Now this does not mean that ``pip install cdecimal`` will automatically install this, because whether or not you're willing to install from servers other than PyPI[1] is a policy decision for the end user of pip.
I forgot to add: for externally hosted files that are able to be safely downloaded, pip's users can elect to use any/all of them via:

$ pip install --allow-all-external cdecimal
$ PIP_ALLOW_ALL_EXTERNAL=1 pip install cdecimal
$ printf -- "--allow-all-external\ncdecimal" > requirements.txt
$ pip install -r requirements.txt
$ printf "[global]\nallow-all-external = true" > ~/.pip/pip.conf
$ pip install cdecimal

They can also elect to allow externally hosted files for *only* cdecimal using:

$ pip install --allow-external cdecimal cdecimal
$ PIP_ALLOW_EXTERNAL=cdecimal pip install cdecimal
$ printf -- "--allow-external cdecimal\ncdecimal" > requirements.txt
$ pip install -r requirements.txt
$ printf "[global]\nallow-external = cdecimal" > ~/.pip/pip.conf
$ pip install cdecimal

I may have the syntax on the non-command-flag options slightly wrong; those are auto-generated for us in our option system and I don't personally use those options. Also, the reason for the extra verbosity in ``--allow-external`` is so you can allow it on something you don't directly depend on, like:

$ pip install --allow-external cdecimal foobar-which-depends-on-cdecimal
On Thu, 08 May 2014 09:58:08 -0400, Donald Stufft
I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:
cdecimal an externally hosted file and may be unreliable
Which is true as far as I can tell, it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I’m happy to have it and will gladly commit it myself to pip.
[1] In my experience dealing with complaints of pip's users, one of their big ones was that some dependency they use was, typically unknown to them, hosted externally and they found out it was hosted externally because the server it was hosted on went down.
"unreliable" reads as "not safe", ie: insecure. You probably want something like "and access to it may be unreliable". --David
On May 8, 2014, at 10:11 AM, R. David Murray
On Thu, 08 May 2014 09:58:08 -0400, Donald Stufft
wrote: I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:
cdecimal an externally hosted file and may be unreliable
Which is true as far as I can tell, it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I’m happy to have it and will gladly commit it myself to pip.
[1] In my experience dealing with complaints of pip's users, one of their big ones was that some dependency they use was, typically unknown to them, hosted externally and they found out it was hosted externally because the server it was hosted on went down.
"unreliable" reads as "not safe", ie: insecure.
You probably want something like "and access to it may be unreliable".
--David
Done: https://github.com/pypa/pip/commit/69bf7067
On Thu, 08 May 2014 10:11:39 -0400, "R. David Murray"
On Thu, 08 May 2014 09:58:08 -0400, Donald Stufft
wrote: I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:
cdecimal an externally hosted file and may be unreliable
Which is true as far as I can tell, it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I'm happy to have it and will gladly commit it myself to pip.
[1] In my experience dealing with complaints of pip's users, one of their big ones was that some dependency they use was, typically unknown to them, hosted externally and they found out it was hosted externally because the server it was hosted on went down.
"unreliable" reads as "not safe", ie: insecure.
You probably want something like "and access to it may be unreliable".
Actually, thinking about this some more, *most* end-users aren't going to care that there's another point of failure here, they only care if it works or not when they try to install it. So something like "cdecimal is not hosted on pypi; download may fail if external server is unavailable" might be clearer. And once you're at that point, as a user I'm going to grumble, "Well, why the heck didn't you just try?", as I figure out how to re-execute the command so that it does try. --David
On Thu, 08 May 2014 10:21:34 -0400
"R. David Murray"
"unreliable" reads as "not safe", ie: insecure.
You probably want something like "and access to it may be unreliable".
Actually, thinking about this some more, *most* end-users aren't going to care that there's another point of failure here, they only care if it works or not when they try to install it. So something like "cdecimal is not hosted on pypi; download may fail if external server is unavailable" might be clearer.
And once you're at that point, as a user I'm going to grumble, "Well, why the heck didn't you just try?", as I figure out how to re-execute the command so that it does try.
Agreed. That warning looks rather pointless and only aimed at trying to enforce the pip developers' ideological preferences. Regards Antoine.
On May 8, 2014, at 10:31 AM, Antoine Pitrou
On Thu, 08 May 2014 10:21:34 -0400 "R. David Murray"
wrote: "unreliable" reads as "not safe", ie: insecure.
You probably want something like "and access to it may be unreliable".
Actually, thinking about this some more, *most* end-users aren't going to care that there's another point of failure here, they only care if it works or not when they try to install it. So something like "cdecimal is not hosted on pypi; download may fail if external server is unavailable" might be clearer.
And once you're at that point, as a user I'm going to grumble, "Well, why the heck didn't you just try?", as I figure out how to re-execute the command so that it does try.
Agreed. That warning looks rather pointless and only aimed at trying to enforce the pip developers' ideological preferences.
Regards
Antoine.
The pip developers didn't make this decision. It was discussed on distutils-sig, hammered out in a PEP, and then accepted. We took part in that discussion, but ultimately we implemented PEP 438.
On May 8, 2014, at 10:21 AM, R. David Murray
On Thu, 08 May 2014 10:11:39 -0400, "R. David Murray"
wrote: On Thu, 08 May 2014 09:58:08 -0400, Donald Stufft
wrote: I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:
cdecimal an externally hosted file and may be unreliable
Which is true as far as I can tell, it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I'm happy to have it and will gladly commit it myself to pip.
[1] In my experience dealing with complaints of pip's users, one of their big ones was that some dependency they use was, typically unknown to them, hosted externally and they found out it was hosted externally because the server it was hosted on went down.
"unreliable" reads as "not safe", ie: insecure.
You probably want something like "and access to it may be unreliable".
Actually, thinking about this some more, *most* end-users aren't going to care that there's another point of failure here, they only care if it works or not when they try to install it. So something like "cdecimal is not hosted on pypi; download may fail if external server is unavailable" might be clearer.
And once you're at that point, as a user I'm going to grumble, "Well, why the heck didn't you just try?", as I figure out how to re-execute the command so that it does try.
--David
Most users are not going to care up until the point where the external server is unavailable, and then they care a whole lot. On the tin it sounds reasonable to just download the external file if the server is up, however we've done that for a long time and the end result has been end user pain.

Now, requiring someone to add a flag in order to download an externally hosted file is also end user pain. The difference between the two pains is when they happen. The requiring-a-flag pain happens at the point of decision, when you decide to make your deployment depend on something hosted externally. The default-to-allow pain happens sometime in the future, when you may or may not have any idea why suddenly your installs aren't working (and when you look, PyPI is up, so you're really very confused why this particular file doesn't work). Even worse is the case when a project has some old files on PyPI but the newer versions aren't hosted there, and suddenly people are getting very old releases, which is even more confusing to the end users.
On Thu, 08 May 2014 10:37:15 -0400, Donald Stufft
Most users are not going to care up until the point where the external server is unavailable, and then they care a whole lot. On the tin it sounds reasonable to just download the external file if the server is up however we've done that for a long time and the end result has been end user pain.
Now requiring someone to add a flag in order to download an externally hosted file is also end user pain. The difference between the two pains is when they happen. The requiring a flag pain happens at the point of decision, when you decide to make your deployment depend on something hosted externally. The default to allow pain happens sometime in the future, when you may or may not have any idea why suddenly your installs aren't working (and when you look, PyPI is up so you're really very confused why this particular file doesn't work). Even worse is the case when a project has some old files, but the newer versions aren't hosted and suddenly people are getting very old releases which is even more confusing to the end users.
Ah, I understand now. Your perspective is as someone who is using pip for *deployment*. I'm speaking from a python-plus-pip end-user perspective, which is going to be even more common now that it is part of the Python distribution.

I'm not sure how you reconcile these two worlds. I would venture to suggest that if you are using it for deployment you really ought to be using a local package repository[*], not the global one; but, as someone observed, the sensible thing to do and what people actually do are often very different; and, since I haven't done this particular kind of deployment scenario in Python myself, there may be reasons I'm missing.

However, your last mention of "end users" confuses me. Why are "end users" getting old packages in a deployment situation? Isn't it the developer/operations people (and the latter would, I assume, control the 'external packages' flag) who would be seeing that? Maybe you mean something by deployment different from how I use the word?

--David

[*] I found it *such* a pain to do this for perl/cpan. I have a project for a customer where I have to do this, and the hoops I had to jump through to get a reliable deployment (where packages wouldn't be unexpectedly upgraded under my feet) were nasty. (This was several years ago and I haven't revisited it, so maybe things have gotten better, or I just missed something.) I haven't had to do it for python yet, oddly enough, so I don't know how hard it is for Python.
On May 8, 2014, at 11:21 AM, R. David Murray
On Thu, 08 May 2014 10:37:15 -0400, Donald Stufft
wrote: Most users are not going to care up until the point where the external server is unavailable, and then they care a whole lot. On the tin it sounds reasonable to just download the external file if the server is up however we've done that for a long time and the end result has been end user pain.
Now requiring someone to add a flag in order to download an externally hosted file is also end user pain. The difference between the two pains is when they happen. The requiring a flag pain happens at the point of decision, when you decide to make your deployment depend on something hosted externally. The default to allow pain happens sometime in the future, when you may or may not have any idea why suddenly your installs aren't working (and when you look, PyPI is up so you're really very confused why this particular file doesn't work). Even worse is the case when a project has some old files, but the newer versions aren't hosted and suddenly people are getting very old releases which is even more confusing to the end users.
Ah, I understand now.
Your perspective is as someone who is using pip for *deployment*.
Deployment, or any kind of situation where you want to have a reproducible build. Generally via deployment yes.
I'm speaking from a python-plus-pip end-user perspective, which is going to be even more common now that it is part of the Python distribution.
I'm not sure how you reconcile these two worlds. I would venture to suggest that if you are using it for deployment you really ought to be using a local package repository[*], not the global one; but, as someone observed, the sensible thing to do and what people actually do are often very different; and, since I haven't done this particular kind of deployment scenario in Python myself, there may be reasons I'm missing.
People simply don’t do this[1]. Especially in a world with things like Heroku existing which makes it stupid simple to use pip to install from PyPI but installing from your own server requires standing up some infrastructure.
However, your last mention of "end users" confuses me. Why are "end users" getting old packages in a deployment situation? Isn't it the developer/operations people (and the latter would, I assume, control the 'external packages' flag) who would be seeing that? Maybe you mean something by deployment different from how I use the word?
Someone using pip; this may be a developer who is just trying to recreate their production environment locally, this may be someone using chef/puppet to automate installing via pip, this may be someone pushing to Heroku. The old versions thing is more that it's really confusing when you type ``pip install foo`` on a Monday and get 2.0, and ``pip install foo`` on Wednesday and get 0.4.
--David
[*] I found it *such* a pain to do this for perl/cpan. I have a project for a customer where I have to do this, and the hoops I had to jump through to get a reliable deployment (where packages wouldn't be unexpectedly upgraded under my feet) were nasty. (This was several years ago and I haven't revisited it, so maybe things have gotten better, or I just missed something.)
I haven't had to do it for python yet, oddly enough, so I don't know how hard it is for Python.
For Python with pip you can use a requirements.txt file to create a set of dependencies that are pinned to exact versions, like: foo==2.0 bar==2.3. And pip will (theoretically, our dep solving is real bad ATM) install exactly those versions from your index server. Generally this means PyPI, which means the author can delete the version and upload a new file with the same version number. However it's also trivial to stand up your own server. It can be as easy as pointing nginx/Apache at a static directory with autoindex on. (See: https://wheels.caremad.io/). On top of that there is peep, which adds secure hash checking on top of pip to make sure that the author/index didn't swap things out on you, and there is some discussion about how best to add that to pip itself.
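The "point a web server at a static directory" setup described above can be sketched with nothing but the standard library, since pip's --find-links option can consume a plain directory listing. This is an illustration only; the directory name, port, and package names are placeholders:

```python
# Sketch: serve a directory of sdists/wheels over HTTP so pip can use it
# as a --find-links source. The stdlib directory listing acts as the index.
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def serve_package_dir(directory, port=0):
    """Serve `directory` over HTTP in a background thread; returns the server.

    port=0 lets the OS pick a free port; read it from server.server_address.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# Hypothetical usage:
#   server = serve_package_dir("./wheelhouse")
#   pip install --no-index --find-links http://127.0.0.1:<port>/ foo==2.0
```

This is the minimal version of the nginx/Apache autoindex approach mentioned above; a production setup would of course use a real web server and TLS.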
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Thu, 08 May 2014 11:32:28 -0400, Donald Stufft
On May 8, 2014, at 11:21 AM, R. David Murray
wrote: Ah, I understand now.
Your perspective is as someone who is using pip for *deployment*.
Deployment, or any kind of situation where you want to have a reproducible build. Generally via deployment yes. [...] For Python with pip you can use a requirements.txt file to create a set of dependencies that are pinned to exact versions like:
foo==2.0 bar==2.3
And pip will (theoretically, our dep solving is real bad ATM) install exactly those versions from your index server. Generally this means PyPI which
OK, this makes sense, then. (I wish perl/cpan had something similar...maybe it does, but I couldn't find it at the time.) This still leaves the fact that there is a disconnect between the "needs" of two different audiences for PIP: people who deploy things, and everyone else who just uses pip to install stuff. The second group is going to overwhelm the first group, if it doesn't already. And I think that's all the comments I have on this issue :) --David
On May 8, 2014, at 12:42 PM, R. David Murray
On Thu, 08 May 2014 11:32:28 -0400, Donald Stufft
wrote: On May 8, 2014, at 11:21 AM, R. David Murray
wrote: Ah, I understand now.
Your perspective is as someone who is using pip for *deployment*.
Deployment, or any kind of situation where you want to have a reproducible build. Generally via deployment yes. [...] For Python with pip you can use a requirements.txt file to create a set of dependencies that are pinned to exact versions like:
foo==2.0 bar==2.3
And pip will (theoretically, our dep solving is real bad ATM) install exactly those versions from your index server. Generally this means PyPI which
OK, this makes sense, then. (I wish perl/cpan had something similar...maybe it does, but I couldn't find it at the time.)
This still leaves the fact that there is a disconnect between the "needs" of two different audiences for PIP: people who deploy things, and everyone else who just uses pip to install stuff.
Yup, balancing between the two is something we have to do in every decision we make. When PEP438 was being discussed I did a pretty extensive amount of investigation into what effect this change would have [1]. What I found was that:

- The sizable majority of projects hosted things on PyPI.
- There was a significant chunk of projects where a single release or two was only available externally, and it was an accident that they weren't uploaded.
- Of the links that were available externally, very few were available in a way that was verifiable, and they were thus insecure to install.

Because of this it was determined that simply allowing externally hosted files, without also allowing externally hosted and unverified files, would not actually have a significant impact for the vast bulk of the projects that were not hosted on PyPI.
The second group is going to overwhelm the first group, if it doesn't already.
Generally yes, because not everyone who uses pip to install locally uses pip to deploy, but most people who use pip to deploy also use pip locally.
And I think that's all the comments I have on this issue :)
[1]: https://github.com/dstufft/pypi.linkcheck ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft
There is support for trusted externally hosted packages, you put the URL in PyPI and include a hash in the fragment like so:
http://www.bytereef.org/software/mpdecimal/releases/cdecimal-2.3.tar.gz#md5=...
That is exactly the mode I was using until today. This mode produced the subject's warning message. Today I've switched to manual install mode with manual sha256sum verification which is *far* safer than anything you get via pip right now.
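The manual verification Stefan describes amounts to something like the following sketch (the file name and expected digest here are placeholders, not real release values):

```python
# Sketch: hash a downloaded tarball and compare against the digest
# published in the release announcement, as in a manual sha256sum check.
import hashlib

def sha256_matches(path, expected_hexdigest):
    """Return True if the file at `path` has the given SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large tarballs don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hexdigest.lower()

# Hypothetical usage:
#   sha256_matches("cdecimal-2.3.tar.gz", "<digest from the announcement>")
```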
[2] For the definition of safe that PyPI/pip operate under, which is that the author of a package is assumed to be trusted by the person electing to download their package.
No, there are other holes, which you have conceded in your previous mail.
I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:
cdecimal an externally hosted file and may be unreliable
Which is true as far as I can tell, it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I'm happy to have it and will gladly commit it myself to pip.
Do you honestly not see a difference between the cited warning and the *intended* warning "the server's availability may be unreliable"? Even the latter is FUD or a truism (it applies to any server). The real question is: Why is there a warning if the person running pip has explicitly allowed external packages? Stefan Krah
On May 8, 2014, at 10:36 AM, Stefan Krah
Donald Stufft
wrote: There is support for trusted externally hosted packages, you put the URL in PyPI and include a hash in the fragment like so:
http://www.bytereef.org/software/mpdecimal/releases/cdecimal-2.3.tar.gz#md5=...
That is exactly the mode I was using until today. This mode produced the subject's warning message.
Today I've switched to manual install mode with manual sha256sum verification which is *far* safer than anything you get via pip right now.
It is not safer in any meaningful way. If someone is in a position to compromise the integrity of PyPI's TLS, they can replace the hash on that page with something else. Now you've attempted to work around this by telling people to go look up the release announcement hash. However if someone can compromise the integrity of PyPI's TLS, they can also compromise the integrity of https://mail.python.org/, or GMane, or any other TLS based website[1]. All of that assumes that the end user is going to bother to verify the hash *at all*, which almost none of them will; they'll just check the http url into their requirements.txt file and be downloading things over HTTP and be vulnerable to arbitrary code execution via MITM.
[2] For the definition of safe that PyPI/pip operate under, which is that the author of a package is assumed to be trusted by the person electing to download their package.
No, there are other holes, which you have conceded in your previous mail.
The presence of other holes is not a useful argument to avoid closing a hole. We're working to close all of them, and that ends up meaning we close one at a time.
I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:
cdecimal an externally hosted file and may be unreliable
Which is true as far as I can tell, it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I'm happy to have it and will gladly commit it myself to pip.
Do you honestly not see a difference between the cited warning and the *intended* warning "the server's availability may be unreliable"?
Do I? No I don’t. However I’ve since adjusted the message based on R David Murray’s feedback to make sure it specifically says that access may be unreliable instead of just that the package itself may be unreliable.
Even the latter is FUD or a truism (it applies to any server).
No, because the use of an external host *adds* additional unreliability. If PyPI is down, then all packages are down, including ones that host externally. If the cdecimal server is down, then that one specific package is unavailable. It is impossible to do anything but reduce the overall availability by adding additional SPOFs.
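The SPOF point above can be illustrated with a toy availability calculation (the uptime figures here are made up for illustration): a package whose file lives on an external host is only installable when both PyPI (the index) and that host are up.

```python
# Toy illustration: an externally hosted package needs PyPI *and* the
# external host to be up; a PyPI-hosted package needs only PyPI.
# Uptime probabilities below are invented, assuming independent failures.
pypi_uptime = 0.999
external_uptime = 0.99

internal_pkg = pypi_uptime                    # hosted on PyPI
external_pkg = pypi_uptime * external_uptime  # needs both servers up

# Multiplying by any probability < 1 can only lower availability.
assert external_pkg < internal_pkg
```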
The real question is: Why is there a warning if the person running pip has explicitly allowed external packages?
Why is there a warning? Originally that warning was there because the first part of the transition to this "mode" of defaults was to give an option to disable externally hosted files, but leave it on by default. In this phase we gave this warning to tell the people who just leave things to their default about this file. Should the warning itself still exist? I don't know, but a better avenue for asking for a change in pip is via our issue tracker instead of whining on python-dev.
Stefan Krah
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft
Today I've switched to manual install mode with manual sha256sum verification which is *far* safer than anything you get via pip right now.
It is not safer in any meaningful way.
If someone is in a position to compromise the integrity of PyPI's TLS, they can replace the hash on that page with something else. Now you've attempted to work around this by telling people to go look up the release announcement hash. However if someone can compromise the integrity of PyPI's TLS, they can also compromise the integrity of https://mail.python.org/, or GMane, or any other TLS based website[1].
Of course it is safer. Suppose a file is stored on PyPI: 1) Attacker guesses my username (or is it even visible, I'm not sure). 2) Clicks on "lost login". 3) Intercepts mail (difficult, but far from the TLS attack category). Maybe on a home or university network. Or a rogue person at a mail provider. 4) Changes the uploaded file together with the hash. pip would be perfectly happy, checking the hash via Google would turn up a mismatch. Stefan Krah
On May 8, 2014, at 11:34 AM, Stefan Krah
Donald Stufft
wrote: Today I've switched to manual install mode with manual sha256sum verification which is *far* safer than anything you get via pip right now.
It is not safer in any meaningful way.
If someone is in a position to compromise the integrity of PyPI's TLS, they can replace the hash on that page with something else. Now you've attempted to work around this by telling people to go look up the release announcement hash. However if someone can compromise the integrity of PyPI's TLS, they can also compromise the integrity of https://mail.python.org/, or GMane, or any other TLS based website[1].
Of course it is safer. Suppose a file is stored on PyPI:
1) Attacker guesses my username (or is it even visible, I'm not sure).
2) Clicks on "lost login".
3) Intercepts mail (difficult, but far from the TLS attack category). Maybe on a home or university network. Or a rogue person at a mail provider.
4) Changes the uploaded file together with the hash.
pip would be perfectly happy, checking the hash via Google would turn up a mismatch.
I said "meaningful". Almost nobody is ever going to bother googling it, and the likelihood that someone is able to MITM *you* specifically is far less than the likelihood that someone is going to MITM one of the cdecimal users. Additionally your messages aren't signed and email isn't an authenticated protocol, so if someone was able to get your password they could simply spoof an email from you to the mailing list with new hashes, or edit out the description telling people to go google some stuff. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft
I said "meaningful". Almost nobody is ever going to bother googling it, and the likelihood that someone is able to MITM *you* specifically is far less than the likelihood that someone is going to MITM one of the cdecimal users.
I'm doing this for important installs. -- That is how I installed qmail and djbdns.
Additionally your messages aren't signed and email isn't an authenticated protocol, so if someone was able to get your password they could simply spoof an email from you to the mailing list with new hashes, or edit out the description telling people to go google some stuff.
Signing messages is pointless if the key isn't well connected. Also, I'm reading the lists and would notice a "release". Most importantly, the checksum mismatch would still be found, since the old messages with the correct sum would still exist under the scenario we're talking about (i.e. not GCHQ hacking into Belgacom routers). Stefan Krah
On May 8, 2014, at 12:03 PM, Stefan Krah
Donald Stufft
wrote: I said "meaningful". Almost nobody is ever going to bother googling it, and the likelihood that someone is able to MITM *you* specifically is far less than the likelihood that someone is going to MITM one of the cdecimal users.
I'm doing this for important installs. -- That is how I installed qmail and djbdns.
Additionally your messages aren't signed and email isn't an authenticated protocol, so if someone was able to get your password they could simply spoof an email from you to the mailing list with new hashes, or edit out the description telling people to go google some stuff.
Signing messages is pointless if the key isn't well connected. Also, I'm reading the lists and would notice a "release". Most importantly, the checksum mismatch would still be found, since the old messages with the correct sum would still exist under the scenario we're talking about (i.e. not GCHQ hacking into Belgacom routers).
I’m unsure if you’re being willfully dense or if you’re just not understanding what I mean when I say “almost”. Of course there are going to be a few outliers where people do bother to do that, but it’s not going to be commonplace at all. But whatever, I’ve removed the warning that occurs when you install an externally hosted file [1] and it will be included in pip 1.6. I have not changed the defaults for --allow-all-external nor have I removed the warning that occurs when someone elects to install an unverifiable download. [1] https://github.com/pypa/pip/commit/9f56b79e8d ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft
I'm unsure if you're being willfully dense or if you're just not understanding what I mean when I say "almost". Of course there are going to be a few outliers where people do bother to do that, but it's not going to be commonplace at all.
I suggest that you confine that style of discussion to distutils-sig. Stefan Krah
On 08.05.2014 15:58, Donald Stufft wrote:
On May 8, 2014, at 9:39 AM, M.-A. Lemburg
wrote: Well, to be fair and leaving aside uptime concerns and the general desire to always install packages from some server instead of a safe and trusted local directory (probably too obvious ;-), it would certainly be possible to add support for trusted externally hosted packages.
There is support for trusted externally hosted packages, you put the URL in PyPI and include a hash in the fragment like so:
http://www.bytereef.org/software/mpdecimal/releases/cdecimal-2.3.tar.gz#md5=...
The hash can be md5 or any of the sha-2 family.
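The fragment convention Donald describes can be illustrated with a small parser; this is a sketch for explanation, not pip's actual implementation, and the URLs used are placeholders:

```python
# Sketch: read the "#<algo>=<hexdigest>" fragment from a download URL,
# as in http://host/pkg-1.0.tar.gz#md5=... registered on PyPI.
from urllib.parse import urlparse

# md5 plus the SHA-2 family, per the convention described in the thread.
SUPPORTED = {"md5", "sha224", "sha256", "sha384", "sha512"}

def hash_from_url(url):
    """Return (algorithm, hexdigest) from the URL fragment, or None."""
    fragment = urlparse(url).fragment
    algo, sep, digest = fragment.partition("=")
    if sep and algo in SUPPORTED and digest:
        return algo, digest
    return None
```

An installer with such a pair can download the file and refuse to install it if the computed digest doesn't match, which is what makes this hosting mode verifiable.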
Now this does not mean that ``pip install cdecimal`` will automatically install this, because whether or not you're willing to install from servers other than PyPI[1] is a policy decision for the end user of pip.
Hmm, if you call that feature "trusted externally hosted packages", pip really should trust them, right? ;-) I can understand that pip defaults to not trusting URLs which don't meet the above feature requirements, but not that it still warns about unreliable externally hosted packages even if the above feature is used. At the moment, pip will refuse to use an externally hosted file even if the package author uses the above hashed URLs; even with HTTPS and a proper SSL certificate chain.
The only real contention point there is whether installing from other servers should be on or off by default. PEP438 selected off by default, and I agree with that decision. Installing externally hosted files, which are able to be safely downloaded[2], was a surprising behavior to *everyone* I've talked to who hadn't already discovered that pip/easy_install did that. For the people it wasn't surprising to, they said it was surprising when they had originally discovered it[3].
[1] To be specific, other than the configured index(es), which happens to default to PyPI. [2] For the definition of safe that PyPI/pip operate under, which is that the author of a package is assumed to be trusted by the person electing to download their package. [3] I suspect people who were around when PyPI *couldn't* host files and were only an index would be the exception to this.
However, for some reason there's a strong resistance against doing this, which I frankly don't understand.
I agree with Stefan that the warning message wording is less than ideal. You'd normally call such blanket statements FUD, esp. since there are plenty external hosting services which are reliable and safe to use.
I don't think the warning is FUD, and it doesn't mention anything security related at all. The exact text of the warning is in the subject of the email here:
cdecimal an externally hosted file and may be unreliable
Which is true as far as I can tell, it is externally hosted, and it may be unreliable[1]. If there is a better wording for that I’m happy to have it and will gladly commit it myself to pip.
The current version of pip writes:

Downloading/unpacking pkg
Could not find any downloads that satisfy the requirement pkg
Some externally hosted files were ignored (use --allow-external pkg to allow).
Cleaning up...
No distributions at all found for pkg

This wording is fine, IMO. The wording Stefan quoted gets generated for dependencies. This should probably be changed to the same wording (including the reference to the right command line option to use).
[1] In my experience dealing with complaints of pip's users, one of their big ones was that some dependency they use was, typically unknown to them, hosted externally and they found out it was hosted externally because the server it was hosted on went down.
I think that's a general problem, not one of some server being down: users put too much trust into the dependencies of packages they use. Regardless of how safe/reliable we make things w/r to file hosting, this problem does not go away. It's just too easy for people to get tricked into trusting packages they don't even know about. Nothing we'll ever change, though. People are lazy and easily drop all such concerns for ease of use :-( -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2014)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On 08.05.2014 16:42, M.-A. Lemburg wrote:
On 08.05.2014 15:58, Donald Stufft wrote:
On May 8, 2014, at 9:39 AM, M.-A. Lemburg
wrote: Well, to be fair and leaving aside uptime concerns and the general desire to always install packages from some server instead of a safe and trusted local directory (probably too obvious ;-), it would certainly be possible to add support for trusted externally hosted packages.
There is support for trusted externally hosted packages, you put the URL in PyPI and include a hash in the fragment like so:
http://www.bytereef.org/software/mpdecimal/releases/cdecimal-2.3.tar.gz#md5=...
The hash can be md5 or any of the sha-2 family.
Now this does not mean that ``pip install cdecimal`` will automatically install this, because whether or not you're willing to install from servers other than PyPI[1] is a policy decision for the end user of pip.
Hmm, if you call that feature "trusted externally hosted packages", pip really should trust them, right? ;-)
I can understand that pip defaults to not trusting URLs which don't meet the above feature requirements, but not that it still warns about unreliable externally hosted packages even if the above feature is used.
At the moment, pip will refuse to use an externally hosted file even if the package author uses the above hashed URLs; even with HTTPS and a proper SSL certificate chain.
Could this perhaps be changed/reconsidered for pip ? Note that easy_install/setuptools does not have such problems. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2014)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On May 8, 2014, at 11:37 AM, M.-A. Lemburg
On 08.05.2014 16:42, M.-A. Lemburg wrote:
On 08.05.2014 15:58, Donald Stufft wrote:
On May 8, 2014, at 9:39 AM, M.-A. Lemburg
wrote: Well, to be fair and leaving aside uptime concerns and the general desire to always install packages from some server instead of a safe and trusted local directory (probably too obvious ;-), it would certainly be possible to add support for trusted externally hosted packages.
There is support for trusted externally hosted packages, you put the URL in PyPI and include a hash in the fragment like so:
http://www.bytereef.org/software/mpdecimal/releases/cdecimal-2.3.tar.gz#md5=...
The hash can be md5 or any of the sha-2 family.
Now this does not mean that ``pip install cdecimal`` will automatically install this, because whether or not you're willing to install from servers other than PyPI[1] is a policy decision for the end user of pip.
Hmm, if you call that feature "trusted externally hosted packages", pip really should trust them, right? ;-)
I can understand that pip defaults to not trusting URLs which don't meet the above feature requirements, but not that it still warns about unreliable externally hosted packages even if the above feature is used.
At the moment, pip will refuse to use an externally hosted file even if the package author uses the above hashed URLs; even with HTTPS and a proper SSL certificate chain.
Could this perhaps be changed/reconsidered for pip ?
Note that easy_install/setuptools does not have such problems.
Anything can be changed or reconsidered, of course. I feel pretty strongly that an installer should not install things from places other than the index without a specific opt-in. That discussion would be best done on distutils-sig as it would require reversing the decision in PEP438. I really don't feel strongly one way or the other about the *warning* that happens when you allow an external file. It exists primarily because, at the time it was implemented, externally hosted files defaulted to allowed. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 8 May 2014 16:46, Donald Stufft
Anything can be changed or reconsidered, of course. I feel pretty strongly that an installer should not install things from places other than the index without a specific opt-in. That discussion would be best done on distutils-sig as it would require reversing the decision in PEP438.
I think it's worth reconsidering. Since this behaviour was implemented, there have been many instances of confusion and unhappiness with the situation, both from package developers and pip users. I don't think that's good for pip. I would like to see PEP 438 reviewed with the intention of working out how to fix the user experience (ideally while retaining the reliability enhancements, but accepting that compromises may be needed).

Some points:

1. Often a user won't know that a package is externally hosted until an install fails. Having to add a flag and retry is a lousy experience. Why not at a minimum have an "externally hosted - are you sure?" prompt?

2. The way the flags are implemented (notably the need to repeat the package name) is clearly confusing and irritating for users, even if the reasons are logical. We should look at fixing that.

3. The user complaints against pip are significant and ongoing. I don't think they should be ignored. If PEP 438 cannot be reconsidered in the light of user feedback, then I think the pip developers may need to have a discussion about whether we conform to the PEP (clearly Donald thinks we should, I'm not sure I do, and I don't know about the others). But I don't think it's healthy for pip to be looking at deliberately ignoring an accepted PEP, so I'd prefer it if the debate was around addressing the issues in the PEP itself.

4. Maybe distutils-sig *isn't* the right place to discuss revising PEP 438. The issues getting raised are end-user problems, and the users most affected are unlikely to be on that list.

Socially, this change does not seem to be having the effect of persuading more package developers to host on PyPI. The stick doesn't appear to have worked, maybe we should be trying to find a carrot? Or maybe we have to accept that some developers have sound reasons for not hosting on PyPI and work with them to find an acceptable compromise? Has anyone checked what Stefan's reasons are for not hosting cdecimal on PyPI?
Do they represent a use case that the PEP hasn't considered?
I really don't feel strongly one way or the other about the *warning* that happens when you allow an external file. It exists primarily because, at the time it was implemented, externally hosted files defaulted to allowed.
I think it's reasonable to remove the warning. If the user chooses to allow an external file, it makes sense to assume they understand the implications and not nag them about their decision. Particularly given the level of controversy the warning is generating. On a personal note, I'm uncomfortable with the way this change is perceived as a case of *pip* enforcing a behaviour that the pip developers feel should be required. I actually don't like this change particularly. So having pip implement the behaviour required by that PEP is to me simply a case of compliance with the agreed standard. But now, as a pip developer, being held responsible for the resulting user pain, and being expected to defend it, does not make me happy. Paul
On May 8, 2014, at 5:02 PM, Paul Moore
On 8 May 2014 16:46, Donald Stufft
wrote: Anything can be changed or reconsidered, of course. I feel pretty strongly that an installer should not install things from places other than the index without a specific opt-in. That discussion would be best done on distutils-sig as it would require reversing the decision in PEP438.
I think it's worth reconsidering. Since this behaviour was implemented, there have been many instances of confusion and unhappiness with the situation, both from package developers and pip users. I don't think that's good for pip. I would like to see PEP 438 reviewed with the intention of working out how to fix the user experience (ideally while retaining the reliability enhancements, but accepting that compromises may be needed).
I think most of the confusion has been over the fact that --allow-external takes a package name, not that it exists at all.
Some points:
1. Often a user won't know that a package is externally hosted until an install fails. Having to add a flag and retry is a lousy experience. Why not at a minimum have an "externally hosted - are you sure?" prompt?
A prompt is OK with me.
2. The way the flags are implemented (notably the need to repeat the package name) is clearly confusing and irritating for users, even if the reasons are logical. We should look at fixing that.
Yea, there’s a ticket for this.
3. The user complaints against pip are significant and ongoing. I don't think they should be ignored. If PEP 438 cannot be reconsidered in the light of user feedback, then I think the pip developers may need to have a discussion about whether we conform to the PEP (clearly Donald thinks we should, I'm not sure I do, and I don't know about the others). But I don't think it's healthy for pip to be looking at deliberately ignoring an accepted PEP, so I'd prefer it if the debate was around addressing the issues in the PEP itself.
I don’t think the problem is with the PEP.
4. Maybe distutils-sig *isn't* the right place to discuss revising PEP 438. The issues getting raised are end-user problems, and the users most affected are unlikely to be on that list.
Socially, this change does not seem to be having the effect of persuading more package developers to host on PyPI. The stick doesn't appear to have worked, maybe we should be trying to find a carrot?
Do you have any data to point to that says it hasn’t worked? Just to see what impact it has had, I’m re-running the scripts I ran a year ago; already I can see they are processing MUCH faster than last year.
Or maybe we have to accept that some developers have sound reasons for not hosting on PyPI and work with them to find an acceptable compromise? Has anyone checked what Stefan's reasons are for not hosting cdecimal on PyPI? Do they represent a use case that the PEP hasn't considered?
If I recall correctly his reasoning is that he finds the legal requirements associated with uploading to PyPI to be unsatisfactory.
I really don't feel strongly one way or the other about the *warning* that happens when you allow an external file. It exists primarily because, at the time it was implemented, externally hosted files defaulted to allowed.
I think it's reasonable to remove the warning. If the user chooses to allow an external file, it makes sense to assume they understand the implications and not nag them about their decision. Particularly given the level of controversy the warning is generating.
The warning is gone as of a few hours ago.
On a personal note, I'm uncomfortable with the way this change is perceived as a case of *pip* enforcing a behaviour that the pip developers feel should be required. I actually don't like this change particularly. So having pip implement the behaviour required by that PEP is to me simply a case of compliance with the agreed standard. But now, as a pip developer, being held responsible for the resulting user pain, and being expected to defend it, does not make me happy.
Paul
I think the pain is being overrepresented and the positives are being ignored. The benefits of this PEP are much like the benefits of TLS: for the vast majority of people, nothing seems different except that installing things is faster and more reliable. They don't attribute that to the PEP or this decision; they just internalize it as the new norm. However, the people who this does affect will seek out why it broke and raise an issue citing that thing specifically. This creates a perception of lots of pain for no gain, when the reality is otherwise.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 9 May 2014 07:23, "Donald Stufft"
On May 8, 2014, at 5:02 PM, Paul Moore
wrote: Or maybe we have to accept that some developers have sound reasons for not hosting on PyPI and work with them to find an acceptable compromise? Has anyone checked what Stefan's reasons are for not hosting cdecimal on PyPI? Do they represent a use case that the PEP hasn't considered?
If I recall correctly his reasoning is that he finds the legal requirements associated with uploading to PyPI to be unsatisfactory.
I actually need to follow up on that, because the terms *were* legally questionable last time I looked (also too hard to review since, as far as I am aware, they're only presented during new user sign-up). I'll deal with that at work today.

Cheers,
Nick.

P.S. Still the wrong list, folks. Most of the people you need to convince to get changes made to pip, PyPI and the packaging ecosystem in general aren't here.
On May 8, 2014, at 6:20 PM, Nick Coghlan
On 9 May 2014 07:23, "Donald Stufft"
wrote: On May 8, 2014, at 5:02 PM, Paul Moore
wrote: Or maybe we have to accept that some developers have sound reasons for not hosting on PyPI and work with them to find an acceptable compromise? Has anyone checked what Stefan's reasons are for not hosting cdecimal on PyPI? Do they represent a use case that the PEP hasn't considered?
If I recall correctly his reasoning is that he finds the legal requirements associated with uploading to PyPI to be unsatisfactory.
I actually need to follow up on that, because the terms *were* legally questionable last time I looked (also too hard to review, since as far as I am aware, they're only presented during new user sign-up).
I'll deal with that at work today.
I’m pretty sure VanL wrote the terms and has explicitly said they won’t change and are exactly as broad as they need to be without being any broader [1]. They are linked from the footer of every UI-centric PyPI page.

[1] https://mail.python.org/pipermail/python-legal-sig/2013-March/000003.html
On 9 May 2014 08:22, "Donald Stufft"
On May 8, 2014, at 6:20 PM, Nick Coghlan
wrote:
I actually need to follow up on that, because the terms *were* legally
questionable last time I looked (also too hard to review, since as far as I am aware, they're only presented during new user sign-up).
I'll deal with that at work today.
I’m pretty sure VanL wrote the terms and has explicitly said they won’t change and are exactly as broad as they need to be without being any broader[1]. They are linked to from the footer of every UI centric PyPI
page.

Thanks for the additional references. It should be possible to clarify the terms to address Red Hat's (and other users') concerns while still addressing the points Van is worried about (for example, adding a footnote explaining the use cases that need to be covered, and being explicit that users must respect *both* licenses). However, this subthread is now even further off-topic for this list, so I'll take it to python-legal-sig later today with some specific proposed wording changes.

Cheers,
Nick.
[1]
https://mail.python.org/pipermail/python-legal-sig/2013-March/000003.html
On May 8, 2014, at 5:22 PM, Donald Stufft
Socially, this change does not seem to be having the effect of persuading more package developers to host on PyPI. The stick doesn't appear to have worked, maybe we should be trying to find a carrot?
Do you have any data to point to that says it hasn’t worked? Just to see what impact it has had, I’m running my scripts again that I ran a year ago to see what has changed, already I can see they are processing MUCH faster than last year.
The data has finished processing; it represents a time diff of approximately one year. The pip release that caused all of this was released about 4-5 months ago.

Overall, PyPI has seen 50% growth in installable projects in that time. If the change had had no effect, we'd expect to see a ~50% increase across the board. However, what we've seen is a 60% (+10% of expected) increase in projects that can only be installed from PyPI and a 12% decrease in projects that have any unsafe files (-62% of expected).

Furthermore, we can see that if pip were to change the default of --allow-all-external, it would take 23 projects from not installable by default to installable by default. This represents 0.2% of installable projects on PyPI. It would take an additional 40 projects and make one or more of their additional files downloadable by default.

Some other data points:

* We've gone from 86% of projects being installable from PyPI to 92%.
* We've gone from 5% of projects being only unsafely installable to 3%.
* We've gone from 14% of projects having any files unsafe to install to 8%.
* We've gone from 0.004% of projects being safely hosted externally to 0.2%.

Looking at these numbers, I think it's safe to say that the "hosting hygiene" of PyPI projects is likely in a better state than it was a year ago. We cannot state for a fact whether this is because of this change or not; however, given that the fallout is ~23 (or ~63) projects out of 38,835, I think it is incredibly reasonable to leave the defaults alone, since there is a reasonably high chance that they played at least some part in that change.

I'd love to get these numbers to the point where the number of projects installable strictly from PyPI is 100% (or at least 100% installable safely), but 92% (or 92.2%) is getting pretty close, and hopefully that number will just continue to grow until it hits 100%.
For reference, here are the raw numbers as well as a summary of the data: https://gist.github.com/dstufft/b14008d11c0a5760dbed

The repository with the raw data and the scripts used to collect and process it is here: https://github.com/dstufft/pypi.linkcheck

linkcollector.py collects the links while linkwriter.py writes out the JSON file, and stats2.py processes it and gives the numbers from the gist above. links.json is the data from a year ago, and 2014-05-08.links.json is the data from today.
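Donald's expected-vs-actual comparison above is simple arithmetic; here is a sketch of it (the figures come from his message, the variable names are mine):

```python
# Rough reproduction of the expected-vs-actual growth comparison.
expected_growth = 0.50        # overall PyPI growth: the "no effect" baseline

pypi_only_growth = 0.60       # observed: projects installable only from PyPI
unsafe_files_growth = -0.12   # observed: projects with any unsafe files

# Deviation of each category from the "no effect" baseline.
pypi_only_delta = pypi_only_growth - expected_growth
unsafe_delta = unsafe_files_growth - expected_growth

print(f"PyPI-only projects: {pypi_only_delta:+.0%} vs expected")
print(f"Projects with unsafe files: {unsafe_delta:+.0%} vs expected")
```

This reproduces the "+10% of expected" and "-62% of expected" figures quoted in the message.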
On May 9, 2014, at 12:34 AM, Donald Stufft
The data has finished processing, it represents a time diff of approximately one year. The pip release that caused all of this was released about 4-5 months ago.
Oh, I forgot to mention: in order to make the comparison as accurate as possible, I've used the same collection script as I did a year ago. This predates the real advent of Wheels, so these numbers do not take Wheels into account at all (which pip can also install), but they *do* include Eggs, which pip cannot install. Furthermore, the data also includes #egg=dev URLs which point to a VCS (and the script considers those an unverifiable/unsafe download).
On 9 May 2014 05:34, Donald Stufft
On May 8, 2014, at 5:22 PM, Donald Stufft
wrote: Socially, this change does not seem to be having the effect of persuading more package developers to host on PyPI. The stick doesn't appear to have worked, maybe we should be trying to find a carrot?
Do you have any data to point to that says it hasn’t worked? Just to see what impact it has had, I’m running my scripts again that I ran a year ago to see what has changed, already I can see they are processing MUCH faster than last year.
The data has finished processing, it represents a time diff of approximately one year. The pip release that caused all of this was released about 4-5 months ago.
Overall PyPI has seen a 50% growth in installable projects in that time. If the change would have had no effect we'd expect to see a ~50% increase across the board. However what we've seen is a 60% (+10% of expected) increase in projects that can only be installed from PyPI and a 12% decrease in projects that have any unsafe files (-62% of expected).
Donald, thanks for taking the time to get those figures. It does appear that fewer cases would be affected than the number of complaints would imply. The only concern I have about this type of analysis is that it doesn't "weight" projects. It may be (and again, I have no data to back this up) that the projects that are affected detrimentally by this change are unusually popular or otherwise significant. There's obviously no way to assess this sensibly other than by making a judgement on the level of complaints. But arguing numbers was never my intention here, so let's just say that I concede that the change has had a positive effect, which is great.

Paul
On May 9, 2014, at 5:01 AM, Paul Moore
On 9 May 2014 05:34, Donald Stufft
wrote: On May 8, 2014, at 5:22 PM, Donald Stufft
wrote: Socially, this change does not seem to be having the effect of persuading more package developers to host on PyPI. The stick doesn't appear to have worked, maybe we should be trying to find a carrot?
Do you have any data to point to that says it hasn’t worked? Just to see what impact it has had, I’m running my scripts again that I ran a year ago to see what has changed, already I can see they are processing MUCH faster than last year.
The data has finished processing, it represents a time diff of approximately one year. The pip release that caused all of this was released about 4-5 months ago.
Overall PyPI has seen a 50% growth in installable projects in that time. If the change would have had no effect we'd expect to see a ~50% increase across the board. However what we've seen is a 60% (+10% of expected) increase in projects that can only be installed from PyPI and a 12% decrease in projects that have any unsafe files (-62% of expected).
Donald, Thanks for taking the time to get those figures. It does appear that there are less cases that would be affected than the number of complaints would imply.
Of course, I don’t like making claims without backing them up if I can :)
The only concern I have about this type of analysis is that it doesn't "weight" projects. It may be (and again, I have no data to back this up) that the projects that are affected detrimentally by this change are unusually popular or otherwise significant. There's obviously no way to assess this sensibly other than by making a judgement on the level of complaints.
Yeah, I don’t have a good way to weight those projects. Normally I could get some sort of estimate by looking at the download numbers from PyPI, but well ;)

For the record, here’s the list of projects that are hosted *only* safely externally or that have *any* safely externally hosted files: https://gist.github.com/dstufft/1b16c305f97fff6cef2f

Most of these don’t stand out to me at all. The only ones that do are:

* pyOpenSSL, which has one older release that is hosted that way
* argparse, which has the latest release hosted this way but has older releases hosted on PyPI
* New Relic, which only hosts older releases externally
* beautifulsoup4, which hosts things safely externally *and* on PyPI
* Paste, which has one “external” file that is only external because it used a cheeseshop.python.org link instead of a pypi.python.org link
* ipython, which has one older release hosted safely externally, but the latest is on PyPI
* netifaces, which has one older release hosted safely externally, but the rest are on PyPI
But arguing numbers was never my intention here, so let's just say that I concede that the change has had a positive effect, which is great. Paul
I didn’t mean to imply that it was :) I just wanted to make sure that *my* claims were true, or if they weren’t, to be able to say that I was wrong. Since I had the numbers computed already, it didn’t make any sense not to share them here.
On 08.05.2014 23:22, Donald Stufft wrote:
On a personal note, I'm uncomfortable with the way this change is perceived as a case of *pip* enforcing a behaviour that the pip developers feel should be required. I actually don't like this change particularly. So having pip implement the behaviour required by that PEP is to me simply a case of compliance with the agreed standard. But now, as a pip developer, being held responsible for the resulting user pain, and being expected to defend it, does not make me happy.
I think the pain is being overrepresented and the positives are being ignored. The problem is the benefits of this PEP are much like the benefits of TLS too. For the vast majority of people they don’t notice anything different except installing things is faster and more reliable. They don’t attribute that to the PEP or this decision, they just internalize it as the new norm. However the people who this does affect will seek out why it broke and raise an issue citing that thing specifically. This creates a perception of lots of pain for no gain when the reality is not that.
Donald: I don't think anyone is arguing that hosting packages on PyPI is a bad thing, and PyPI as a service has gotten a lot better than it was a few years ago. However, I find it troubling that we as Python developers are *forcing* the whole Python world to put their code into PyPI. There are plenty of good reasons not to do this, and sometimes it's even impossible if you want to stay legal (see the PEP for details).

Accordingly, we should respect those reasons and make it possible for Python packages to live elsewhere, without having our tools put those packages into a bad light or making it harder for Python users to install such packages than needed. With the checksum uploaded to PyPI, the only argument against fetching packages from other places on the Internet is reliability, not security. PyPI is not the only reliable file hosting system on the Internet, so this argument is rather weak.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 09 2014)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
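To illustrate the checksum point above: PEP 438's "verifiable" external links carry a hash fragment (e.g. ``#md5=...``) published on the PyPI index page, so an installer can fetch the file from any host and still detect tampering. A minimal sketch of that check (the function names are mine; a real installer also handles redirects, retries, and stronger hash algorithms):

```python
import hashlib
import urllib.request


def md5_ok(data: bytes, expected_md5: str) -> bool:
    """True if *data* matches the digest recorded on the trusted index."""
    return hashlib.md5(data).hexdigest() == expected_md5


def fetch_verified(url: str, expected_md5: str) -> bytes:
    """Fetch *url* from any host, but refuse the file unless the
    checksum published on the (trusted) index matches."""
    data = urllib.request.urlopen(url).read()
    if not md5_ok(data, expected_md5):
        raise ValueError("checksum mismatch; refusing external download")
    return data
```

As long as the hash comes from a trusted channel (the index served over HTTPS), the external host only has to be *available*, not *trusted*, which is exactly MAL's point that the remaining argument is reliability rather than security.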
On May 9, 2014, at 4:12 AM, M.-A. Lemburg
On 08.05.2014 23:22, Donald Stufft wrote:
On a personal note, I'm uncomfortable with the way this change is perceived as a case of *pip* enforcing a behaviour that the pip developers feel should be required. I actually don't like this change particularly. So having pip implement the behaviour required by that PEP is to me simply a case of compliance with the agreed standard. But now, as a pip developer, being held responsible for the resulting user pain, and being expected to defend it, does not make me happy.
I think the pain is being overrepresented and the positives are being ignored. The problem is the benefits of this PEP are much like the benefits of TLS too. For the vast majority of people they don’t notice anything different except installing things is faster and more reliable. They don’t attribute that to the PEP or this decision, they just internalize it as the new norm. However the people who this does affect will seek out why it broke and raise an issue citing that thing specifically. This creates a perception of lots of pain for no gain when the reality is not that.
Donald: I don't think anyone is arguing that hosting packages on PyPI is a bad thing and PyPI as a service has gotten a lot better than it was a few years ago.
Didn’t mean to imply that I thought it was otherwise.
However, I find it troubling that we as Python developers are *forcing* the whole Python world to put their code into PyPI.
Forcing is a strong word. As of right now we’re "forcing" you to put it onto PyPI only if you want it installable without *any* flags to pip. You're more than capable of hosting it externally if you wish, and pip will even tell the people who try to install it what they need to enable in order to install it. It even allows people to say "I don't care about any of this crap, just make all the external stuff work". Even if pip removed the external link handling altogether and required you to do something like:

$ pip install --extra-index-url https://example.com/our-packages/ foo
$ pip install --find-links https://example.com/our-packages/ foo

we still wouldn't be forcing anyone to upload things to PyPI. We are, however, discouraging people from not hosting on PyPI and providing incentives for doing so.
There are plenty good reasons not to do this, and sometimes it's even impossible if you want to stay legal (see the PEP for details).
I re-read the reasons and I'm not sure I really agree with most (or all?) of them. However, if people really want to do it, there is nothing stopping them. It's their choice, and the people on the *other* side who are installing these packages also get to choose whether they want to allow it or not.
Accordingly, we should respect those reasons make it possible for Python packages to live elsewhere, without having our tools put those packages into a bad light or making it harder for Python users to install such packages than needed.
I'm not sure what "put it into a bad light" is supposed to mean here. Is it about the warning? I've already removed that completely.

As far as making it harder than it "needs" to be, that's a hard line to draw. I personally know people for whom externally hosted things have caused so much pain that they automatically spider PyPI for their own dependencies every hour or so, in order to isolate themselves from random dependencies suddenly no longer being installable. This breakage is going to happen for basically any project that installs its dependencies with any frequency. One of the big projects that I'm aware of that has had this problem is OpenStack, who, in their CI, install things several hundred times a day. If a service they depend on goes down, that causes significant disruption for them. They've solved it by hosting their own mirror of PyPI that includes spidering for externally installable things. They have one or two dependencies left that don't host directly on PyPI, at which point they'll no longer need that, AFAIK.

I don't think telling every downstream of any size that in order to get reliable installs they are going to have to work around PyPI, and in some cases even develop custom software to do so effectively, is a very user-friendly position.
With the checksum uploaded to PyPI, the only argument against fetching packages from other places on the Internet is reliability, not security.
I've never said differently. There are some minor privacy things in there but primarily it's about reliability and that people should generally be aware where their packages are coming from.
PyPI is not the only reliable file hosting system on the Internet, so this argument is rather weak.
It actually doesn't matter how reliable the other systems are. Reliability-wise, the best possible outcome is that the external host has 100% uptime (extremely unlikely) and has no effect on reliability. The realistic outcome is that external hosts will not have 100% uptime and will in fact decrease the total availability of the install sets on PyPI. This isn't an opinion; it is fact.

In reality, reliability also covers more than the server temporarily going down. It includes things like the maintainer no longer maintaining the package, leaving Python altogether, or even dying. There are a decent number of packages on PyPI that people use which have no active maintainer at all. If people rely on a package that isn't hosted on PyPI, they take a larger risk that their dependencies will become uninstallable for the reasons above. When scanning the unverifiable links, it is amazing how many of them are now 404s or 500s or even "nodename not found"s. All of these cause issues for people and are not hypothetical; I've dealt with people complaining about each of them.

When things are hosted on PyPI, we (the infra team) are able to ensure that ``pip install <something>`` actually works and continues to work. When things are hosted externally, the best we can do is say sorry. At that point the only major thing outside of our control that can cause someone's installations to suddenly start failing is the package author willfully removing their things from PyPI. That is unfortunate, but it is also their right, and there's nothing to be done about it. Thankfully that case is far less common than "an external site just isn't there".

I think it's important to point out that one of the driving factors that caused me to finally push for changes, and that led to PEP 438 being created, was that Mercurial's external hosting was being extremely flaky. I can't remember the exact details, but I want to say that over the span of a week or two I was getting massive numbers of users complaining that ``pip install Mercurial`` was suddenly failing. This isn't to knock the Mercurial folks, but simply to point out that these problems don't just happen to (under|un)maintained software, nor are they hypothetical. This PEP was born of the frustration that was being relayed to me by end users of PyPI/pip.
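The availability argument above is just the product rule for independent failures: an install that needs both PyPI and an external host succeeds only when both are up, so adding a host can never raise availability. A quick sketch (the uptime figures are invented for illustration):

```python
def install_availability(*uptimes: float) -> float:
    """An install succeeds only if every host involved is up, so the
    combined availability is the product of the individual uptimes
    (assuming independent failures)."""
    result = 1.0
    for uptime in uptimes:
        result *= uptime
    return result


pypi = 0.999      # hypothetical PyPI uptime
external = 0.98   # hypothetical external host uptime

print(install_availability(pypi))            # PyPI-only install
print(install_availability(pypi, external))  # install needing both hosts
```

Even a quite reliable external host drags the combined figure below PyPI's own uptime, which is why "the external host is reliable too" doesn't rescue the install set as a whole.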
On 9 May 2014 12:44, Donald Stufft
We still wouldn't be forcing anyone to upload things to PyPI. We are, however, discouraging people from not hosting on PyPI and providing incentives to doing that.
But you're doing so by inflicting pain on people using pip to install those packages. Those users complain about *pip*, not about the packages. Better to directly impact the package maintainers, rather than their users (who are innocent victims). Better still of course to encourage people to improve things, not to punish them for not doing so.
I think it's important to point out that one of the driving factors that caused me to finally push for changes and what lead to PEP438 being created was that Mercurial's external hosted was being extremely flaky. I can't remember the exact details but I want to say that over the span of a week or two I was getting massive numbers of users complaining that ``pip install Mercurial`` was suddenly failing. This isn't to knock on the Mercurial folks or anything but to simply point out that these problems aren't things that just happen to (under|un)maintained software nor are they hypothetical. This PEP was born of the frustration that was being relayed to me by end users of PyPI/pip.
So now "pip install Mercurial" always fails? And adding a flag allows it to work as well as before, but no better? How did that fix the issue? Seriously - I'm missing something here. Paul
On 9 May 2014 21:55, Paul Moore
On 9 May 2014 12:44, Donald Stufft
wrote: I think it's important to point out that one of the driving factors that caused me to finally push for changes and what lead to PEP438 being created was that Mercurial's external hosted was being extremely flaky. I can't remember the exact details but I want to say that over the span of a week or two I was getting massive numbers of users complaining that ``pip install Mercurial`` was suddenly failing. This isn't to knock on the Mercurial folks or anything but to simply point out that these problems aren't things that just happen to (under|un)maintained software nor are they hypothetical. This PEP was born of the frustration that was being relayed to me by end users of PyPI/pip.
So now "pip install Mercurial" always fails? And adding a flag allows it to work as well as before, but no better? How did that fix the issue? Seriously - I'm missing something here.
Mercurial downloads are now available directly from PyPI.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On May 9, 2014, at 7:55 AM, Paul Moore
On 9 May 2014 12:44, Donald Stufft
wrote: We still wouldn't be forcing anyone to upload things to PyPI. We are, however, discouraging people from not hosting on PyPI and providing incentives to doing that.
But you're doing so by inflicting pain on people using pip to install those packages. Those users complain about *pip*, not about the packages. Better to directly impact the package maintainers, rather than their users (who are innocent victims). Better still of course to encourage people to improve things, not to punish them for not doing so.
We can’t directly impact the package maintainers, and the vast bulk of the people who have had a problem and complained to pip about it also need to add the --allow-unverified flag, so they would not be helped simply by allowing safely externally hosted files. Looking at the numbers and at which packages are hosted externally, allowing safely externally hosted files would have practically no benefit to pip’s end users. The only case that I can see with a quick scan is that it would allow the latest version of argparse.

TBH, I think the biggest reduction in confusion would come from removing the “safely externally hosted” category altogether and just making it: hosted on PyPI -> installed by default; hosted off PyPI -> requires a per-package flag. However, I’m sure the vocal minority would have a problem with that.
I think it's important to point out that one of the driving factors that caused me to finally push for changes and what lead to PEP438 being created was that Mercurial's external hosted was being extremely flaky. I can't remember the exact details but I want to say that over the span of a week or two I was getting massive numbers of users complaining that ``pip install Mercurial`` was suddenly failing. This isn't to knock on the Mercurial folks or anything but to simply point out that these problems aren't things that just happen to (under|un)maintained software nor are they hypothetical. This PEP was born of the frustration that was being relayed to me by end users of PyPI/pip.
So now "pip install Mercurial" always fails? And adding a flag allows it to work as well as before, but no better? How did that fix the issue? Seriously - I'm missing something here.
No. This caused Mercurial to upload their packages to PyPI.
On 9 May 2014 13:06, Donald Stufft
I think it's important to point out that one of the driving factors that caused me to finally push for changes and what lead to PEP438 being created was that Mercurial's external hosted was being extremely flaky. I can't remember the exact details but I want to say that over the span of a week or two I was getting massive numbers of users complaining that ``pip install Mercurial`` was suddenly failing. This isn't to knock on the Mercurial folks or anything but to simply point out that these problems aren't things that just happen to (under|un)maintained software nor are they hypothetical. This PEP was born of the frustration that was being relayed to me by end users of PyPI/pip.
So now "pip install Mercurial" always fails? And adding a flag allows it to work as well as before, but no better? How did that fix the issue? Seriously - I'm missing something here.
No, This caused Mercurial to upload their packages to PyPI.
You're claiming that Mercurial moved to hosting on PyPI solely because users suddenly needed to add a flag to install from pip? As opposed to because PyPI gave them a more reliable hosting platform, for example? OK. I certainly can't give any evidence to dispute that claim, although I'm surprised. Paul
On May 9, 2014, at 8:21 AM, Paul Moore
On 9 May 2014 13:06, Donald Stufft
wrote: I think it's important to point out that one of the driving factors that caused me to finally push for changes and what lead to PEP438 being created was that Mercurial's external hosted was being extremely flaky. I can't remember the exact details but I want to say that over the span of a week or two I was getting massive numbers of users complaining that ``pip install Mercurial`` was suddenly failing. This isn't to knock on the Mercurial folks or anything but to simply point out that these problems aren't things that just happen to (under|un)maintained software nor are they hypothetical. This PEP was born of the frustration that was being relayed to me by end users of PyPI/pip.
So now "pip install Mercurial" always fails? And adding a flag allows it to work as well as before, but no better? How did that fix the issue? Seriously - I'm missing something here.
No, This caused Mercurial to upload their packages to PyPI.
You're claiming that Mercurial moved to hosting on PyPI solely because users suddenly needed to add a flag to install from pip? As opposed to because PyPI gave them a more reliable hosting platform, for example? OK. I certainly can't give any evidence to dispute that claim, although I'm surprised.
Paul
I don’t know that for a fact, but if my memory is correct, that’s a reasonable assumption based on the timeline.
On 9 May 2014 13:25, Donald Stufft
You're claiming that Mercurial moved to hosting on PyPI solely because users suddenly needed to add a flag to install from pip? As opposed to because PyPI gave them a more reliable hosting platform, for example? OK. I certainly can't give any evidence to dispute that claim, although I'm surprised.
I don’t know that for a fact but If my memory is correct that’s a reasonable assumption based on the timeline.
As Stefan pointed out, improved PyPI infrastructure such as the CDN seems like a much more likely reason for Mercurial to have switched. Paul
On Fri, 9 May 2014 13:47:49 +0100 Paul Moore
On 9 May 2014 13:25, Donald Stufft
wrote: You're claiming that Mercurial moved to hosting on PyPI solely because users suddenly needed to add a flag to install from pip? As opposed to because PyPI gave them a more reliable hosting platform, for example? OK. I certainly can't give any evidence to dispute that claim, although I'm surprised.
I don’t know that for a fact but If my memory is correct that’s a reasonable assumption based on the timeline.
As Stefan pointed out, improved PyPI infrastructure such as the CDN seems like a much more likely reason for Mercurial to have switched.
Here is the only reference I could find: http://www.selenic.com/pipermail/mercurial-devel/2013-July/052129.html

Regards,
Antoine.
Paul Moore
You're claiming that Mercurial moved to hosting on PyPI solely because users suddenly needed to add a flag to install from pip? As opposed to because PyPI gave them a more reliable hosting platform, for example? OK. I certainly can't give any evidence to dispute that claim, although I'm surprised.
That is my understanding of the posted statistics, too. PyPI was overloaded, the improved infrastructure fixed that, so more projects now host on PyPI. Stefan Krah
On 09.05.2014 13:44, Donald Stufft wrote:
On May 9, 2014, at 4:12 AM, M.-A. Lemburg wrote:
Donald: I don't think anyone is arguing that hosting packages on PyPI is a bad thing and PyPI as a service has gotten a lot better than it was a few years ago.
Didn’t mean to imply that I thought it was otherwise.
However, I find it troubling that we as Python developers are *forcing* the whole Python world to put their code into PyPI.
Forcing is a strong word. As of right now we’re "forcing" you to put it onto PyPI if you want to install it without *any* flags to pip.
Which is what most people expect to be able to do.
You're more than capable of hosting it externally if you wish, and pip will even tell the people who try to install it what they need to enable in order to install it. It even allows people to say "I don't care about any of this crap, just make all the external stuff work".
Even if pip removed the external link handling altogether and required you to do something like:
$ pip install --extra-index-url https://example.com/our-packages/ foo
$ pip install --find-links https://example.com/our-packages/ foo
We still wouldn't be forcing anyone to upload things to PyPI. We are, however, discouraging people from not hosting on PyPI and providing incentives to doing that.
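The link-handling behaviour described above can be sketched roughly as follows. This is an illustrative reconstruction of the PEP 438-era rules, not pip's actual code; `PYPI_HOSTS` and the function name are invented for the example. Files hosted on PyPI itself install with no flags, links to other hosts carrying a checksum fragment needed --allow-external <pkg>, and everything else additionally needed --allow-unverified <pkg>.

```python
from urllib.parse import urlparse

# Hypothetical shorthand for "hosted on PyPI itself" (illustrative only).
PYPI_HOSTS = {"pypi.python.org"}

def classify_link(url, from_simple_page=True):
    """Rough PEP 438-style classification of a candidate download link."""
    parsed = urlparse(url)
    # A checksum fragment like #md5=... lets the installer verify the file.
    has_hash = any(("#%s=" % algo) in url for algo in ("md5", "sha1", "sha256"))
    if parsed.hostname in PYPI_HOSTS:
        return "internal"      # installable with no extra flags
    if from_simple_page and has_hash:
        return "external"      # needed --allow-external <pkg>
    return "unverified"        # needed --allow-unverified <pkg> as well
```

The point of the split being that "external" links can still be verified against the published hash, while "unverified" links (scraped from project home pages) cannot.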
This is all true, but in reality, you are making externally hosted packages second-class citizens in Python land by requiring extra options even for package listings that are perfectly safe to download from other servers. As mentioned before: I can understand that you want to make downloads safe for users, but if a package is hosted on some other reliable servers and pip can check that it's a valid package, there's little to argue for not allowing such downloads.
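The "pip can check that it's a valid package" part refers to download links that embed a checksum fragment such as #md5=<hexdigest>. Verifying a download against such a fragment is only a few lines; this sketch is illustrative and the function name is invented:

```python
import hashlib
from urllib.parse import urldefrag

def verify_download(link, data):
    """Check downloaded bytes against the #<algo>=<hexdigest> fragment on a link."""
    _, fragment = urldefrag(link)      # e.g. "md5=5d41402abc4b2a76b9719d911017c592"
    algo, _, expected = fragment.partition("=")
    actual = hashlib.new(algo, data).hexdigest()
    return actual == expected
```

This is how an externally hosted file listed on PyPI with a hash fragment can be just as tamper-evident as one uploaded to PyPI itself.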
There are plenty of good reasons not to do this, and sometimes it's even impossible if you want to stay legal (see the PEP for details).
I re-read the reasons. I'm not sure I really agree with most (or all?) of them; however, if people really want to do it, then there is nothing stopping them. It's their choice, and people on the *other* side of that who are installing these packages also get to make a choice if they want to allow it or not.
You don't have to agree with those reasons. Fact is, they exist and prevent package authors from uploading to PyPI, and we as Python developers should respect those reasons. With the new infrastructure, it's far more attractive to host packages on PyPI than it was before, so people who do not host on PyPI will have carefully thought about this.
Accordingly, we should respect those reasons and make it possible for Python packages to live elsewhere, without having our tools put those packages in a bad light or making it harder than necessary for Python users to install them.
I'm not sure what "put it into a bad light" is supposed to mean here. Is it about the warning? I've already removed that completely.
It's using the same unfortunate strategy that setuptools used by requiring an option called --single-version-externally-managed to be able to install a package without all the .pth and egg logic (an option that pip now uses per default, BTW ;-)).
I snipped the rest of the discussion about reliability, using unmaintained packages, and projects using their own mirrors (which should really be the standard, not an exceptional case), because it's not really leading anywhere: At the time we discussed the PEP, security was the main concern, not reliability. Now that there is a secure way to reference external distribution files, I think we should revisit the defaults decision in the light of the new possibilities.
BTW: The PEP mentions re-hosting tools to upload their packages to PyPI and also says "This re-hosting tool MUST be available before automated hosting-mode changes are announced to package maintainers." I am not aware of such tools.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 09 2014)
Python Projects, Consulting and Support ... http://www.egenix.com/ mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On May 9, 2014, at 9:58 AM, M.-A. Lemburg
However, I find it troubling that we as Python developers are *forcing* the whole Python world to put their code into PyPI.
Forcing is a strong word. As of right now we’re "forcing" you to put it onto PyPI if you want to install it without *any* flags to pip.
Which is what most people expect to be able to do.
Sure, but sometimes it’s better to make an informed choice about things being installed instead of having them installed by default, and sometimes it’s better to disallow doing something at all. Further, most people don’t expect an install to touch any server other than PyPI. As far as I’m aware none of the other package repositories feature this, and even with the fact that we support it, barely anyone even cares to utilize it.
This is all true, but in reality, you are making externally hosted packages second class citizens in Python land by requiring extra options even for package listings that are perfectly safe to download from other servers.
As mentioned before: I can understand that you want to make downloads safe for users, but if a package is hosted on some other reliable servers and pip can check that it's a valid package, there's little to argue for not allowing such downloads.
Except the fact that the only people I’ve ever seen *happy* that people can host packages externally are a handful (<5) of people (tbh primarily you and Stefan), and the feedback I get from every other person around the web is unequivocally "yes, please get rid of externally hosted files; they’ve done nothing but cause me pain". It’s not reasonable to allow a minority of users to negatively impact the majority.
You don't have to agree with those reasons. Fact is, they exist, prevent package authors from uploading to PyPI and we as Python developers should respect those reasons.
No, I disagree that they are reasons at all. Half of the ones you enumerated are nonsensical or irrelevant, and the other half can be fixed by adding features to or fixing PyPI. One or two read like reasons why someone might actually decide not to upload something to PyPI, but where the reason isn’t really all that reasonable, and finally a grand total of one or two of them look like actually legitimate reasons, and those only apply to crypto software. I mean, your reasoning included things like:
PyPI doesn’t let you upload two files with the same name, you gave the example of UC2 vs UC4
The thing being, if PyPI did allow you to do that, then we’d have no way to determine which one is the right one, and half of the people would just get randomly failing installs because we guessed the wrong one.
They don’t know it’s possible to upload to PyPI
I don’t even know how to respond to this one.
PyPI used to be unreliable but isn’t now and they don’t know about it
Nor this one.
Not wanting people to download a package at all and instead check out from a VCS repo
This’ll randomly fail for people depending on whether they happen to have that VCS installed. Also, I don’t see any evidence that people are actually doing this outside of supporting a foo==dev install, which pip won’t install by default either.
Not wanting to add latency for yourself for downloading from PyPI
Instead you’re going to inflict extra latency on everyone else when you could just run a mirror or upload the packages to your own server.
File types that PyPI doesn’t support and which utilize a third party package format, PyPM being a given example.
Seriously? Should an apt repo support external links because you can’t install RPMs with apt?
Wanting people to have to read something before installing the package because it relies on some additional setup.
This is completely nonsensical. If they want people to have to read something before they attempt to pip install it, then they don’t want pip to discover the external file at all.
It currently takes too long uploading that many files to PyPI. This causes a problem, since in order to start the upload, we have to register the release on PyPI, which tools will then immediately find. However, during the upload time, they won't necessarily find the right files to download and then fail.
So if this is an actual problem file a bug and we can make it so PyPI lets you upload things as “drafts” and then you can publish them all at once or some such thing.
having a strong need to know the number of downloads per package and associated statistics such as downloads per country, per year/month/day/hour
If additional statistics are desired for PyPI file a bug and we can add them.
not wanting to give up access to the download log files
This appears to be repeating the last reason with different wording.
the PyPI uploads not being compatible to their release process
I’m not sure how it’s possible to have a release process that is incompatible with uploading to an additional server. Maybe ones that don’t currently do it, but ones that are fundamentally incompatible? Almost all of your “reasons” read like someone who maybe had one corner case where it might have made sense, but didn’t want to admit that it was a corner case, so tried to think up any random reason they could in order to make the list look longer and more impressive than it actually was.
I snipped the rest of the discussion about reliability, using unmaintained packages, and projects using their own mirrors (which should really be the standard, not an exceptional case), because it's not really leading anywhere:
Using your own mirror shouldn’t be the standard if all you’re doing is automatically updating that mirror. It’s a hack to get around unreliability, and it should be seen as a sign of a failure to provide a service that people can rely on; that’s how I see it. People depend on this service, and it’s irresponsible not to treat it as a critical piece of infrastructure.
At the time we discussed the PEP, security was the main concern, not reliability.
The main concern, not the only concern.
Now that there is a secure way to reference external distribution files, I think we should revisit the defaults decision in the light of the new possibilities.
Re-evaluate all you want. If they change it will be over my strenuous objections. The feedback I get from users is overwhelmingly positive. Even the ones who get stuck needing to add extra flags tell me how glad they are that pip and PyPI are moving towards a more reliable model.
BTW: The PEP mentions re-hosting tools to upload their packages to PyPI and also says "This re-hosting tool MUST be available before automated hosting-mode changes are announced to package maintainers." I am not aware of such tools.
It appears this got missed. I’ll add it to my stack to remedy this.
Donald, I'm out of this discussion. I have one last request: please don't gossip about core devs in public as long as you have commit privs: https://botbot.me/freenode/python-requests/ Stefan Krah
On May 9, 2014, at 12:11 PM, Stefan Krah
Donald, I'm out of this discussion. I have one last request: please don't gossip about core devs in public as long as you have commit privs:
I don’t really know how to respond to this. I was talking to some people I know and I gave them a summary of what was happening. I stand by everything I said there, and I don’t think any of it was wrong, or even gossip at all. If people feel that was inappropriate, then go ahead and take my commit privs. I have them to make it easier for me to write PEPs and to update ensurepip. If they’re going to be used as an excuse to attempt to censor me, then I’d rather not have them, as I generally speak my mind and I won’t stop doing so.
Stefan,
If the only way you can think of to invalidate Donald's (vastly
superior) arguments is to accuse him of "gossip", you should
probably reconsider your arguments. Looking at the conversation you
didn't actually link to
(https://botbot.me/freenode/python-requests/msg/14389415/) there is no
gossip. There are no insinuations about your character. All that is
there is a succinct description of the topic of this conversation.
Cheers,
Ian
On 09.05.2014 17:39, Donald Stufft wrote:
Which is what most people expect to be able to do.
Sure, but sometimes it’s better to make an informed choice about things being installed instead of having it be installed by default and sometimes it’s better to disallow doing something at all.
Further, most people don’t expect an install to touch any server other than PyPI.
For some idea of "most people" that's probably true. I'd argue that most people probably don't really care all that much where their packages are coming from, as long as they are getting the packages registered with PyPI, the Python package index, and can be sure that no one has tampered with them. But neither argument really counts for much, since it's all speculation.
As far as I’m aware none of the other package repositories feature this, and even with the fact we support it barely anyone even cares to utilize it.
Most Linux repos come with a list of standard repos to include. In Python land, Plone uses zc.buildout with a similar default list of repos.
This is all true, but in reality, you are making externally hosted packages second class citizens in Python land by requiring extra options even for package listings that are perfectly safe to download from other servers.
As mentioned before: I can understand that you want to make downloads safe for users, but if a package is hosted on some other reliable servers and pip can check that it's a valid package, there's little to argue for not allowing such downloads.
Except the fact that the only people I’ve ever seen *happy* that people can host packages externally are a handful (<5) of people (tbh primarily you and Stefan), and the feedback I get from every other person around the web is unequivocally "yes, please get rid of externally hosted files; they’ve done nothing but cause me pain".
It’s not reasonable to allow a minority of users to negatively impact the majority.
I think you're getting this wrong: There *are* package authors who would like to host PyPI-indexed packages external to PyPI, and they all have good reasons to do so. It may well be a minority, but that's fine. Changing the default to allow secure external downloads is *not* impacting any majority. At worst, it's impacting the users of those packages, but that's really within the realm and responsibility of the package authors, not PyPI or the infrastructure team, so I don't understand why you are strongly objecting to this.
No I disagree that they are reasons at all. Half of the ones you enumerated are nonsensical or irrelevant, the other half can be fixed by adding features to or fixing PyPI. One or two read like reasons why someone might actually decide not to upload something to PyPI but which that reason isn’t really all that reasonable and finally a grand total of one or two of them look like an actual legit reasons and it only applies to Crypto software.
I guess continuing this discussion doesn't really make any sense. I'm trying to respect your reasoning, but you are apparently not respecting mine. Without such respect there's no basis for a discussion. Thanks, -- Marc-Andre Lemburg
On Fri, 09 May 2014 11:39:02 -0400, Donald Stufft wrote: Using your own mirror shouldn’t be the standard if all you’re doing is automatically updating that mirror. It’s a hack to get around unreliability, and it should be seen as a sign of a failure to provide a service that people can rely on; that’s how I see it. People depend on this service, and it’s irresponsible not to treat it as a critical piece of infrastructure.
I don't understand this. Why it is our responsibility to provide a free service for a large project to repeatedly download a set of files they need? Why does it not make more sense for them to download them once, and only update their local copies when they change? That's almost completely orthogonal to making the service we do provide reliable. For perspective, Gentoo requests that people only do an emerge sync at most once a day, and if they have multiple machines to update, that they only do one pull, and they update the rest of their infrastructure from their local copy. As another point of information for comparison, Gentoo downloads files from wherever they are hosted first, and only if that fails falls back to a Gentoo provided mirror (if I remember correctly...I think the Gentoo mirror copy doesn't always exist?). --David
On May 9, 2014, at 1:28 PM, R. David Murray wrote:
I don't understand this. Why is it our responsibility to provide a free service for a large project to repeatedly download a set of files they need? Why does it not make more sense for them to download them once, and only update their local copies when they change? That's almost completely orthogonal to making the service we do provide reliable.
Well, here’s the thing. The large projects repeatedly downloading the same set of files are a canary. If any particular project goes uninstallable on PyPI (or if PyPI itself goes down) then nobody can install it, neither the people installing things over and over every day nor the people who just happened to be installing it during that downtime. However, intermittent failures and general instability are going to be noticed quicker and more easily by the projects who install things over and over again, and thus it becomes a lot easier to use them as a general gauge for what the average “uptime” is. IOW if PyPI goes unavailable for 10 minutes 5 times a day, you might get a handful of “small” installers (e.g. not the big projects) in each downtime, but a different set each time, who are likely to shrug it off and just treat it as the norm even though it’s very disruptive to what they’re doing. However, the big project is highly likely to hit every single one of those downtimes and be able to say “wow, PyPI is flaky as hell”.
To expand further on that, if we assume that we want ``pip install <foo>`` to be reliable, and not work sometimes and fail at other times, then we’re aiming for as high uptime as possible. PyPI gets enough traffic that any single large project isn’t a noticeable drop in our bucket, and due to the way our caching works it actually helps us to be faster and more reliable to have people constantly hitting packages, because they’ll be in cache and able to be served without hitting the origin servers.
Just for the record, PyPI gets roughly 350 req/s basically 24/7. In the month of April we served 71.4TB of data across 877.4 million requests, of which 80.5% never made it to the actual servers that run PyPI and were served directly out of the geo-distributed CDN that sits in front of PyPI. We are vastly better positioned to maintain a reliable infrastructure than to ask every large project that uses Python to do the same.
The reason that it’s our responsibility for providing it is because we chose to provide it. There isn’t a moral imperative to run PyPI, but running PyPI badly seems like a crummy thing to do.
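As a sanity check, the April figures quoted above are internally consistent; this is just arithmetic on the numbers in the message, no new data:

```python
# Figures quoted above for April (30 days).
total_requests = 877.4e6   # requests served in the month
cdn_share = 0.805          # fraction answered straight from the CDN edge

seconds = 30 * 24 * 3600
req_per_second = total_requests / seconds           # consistent with "roughly 350 req/s"
origin_requests = total_requests * (1 - cdn_share)  # about 171 million reached the origin

print(req_per_second, origin_requests)
```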
For perspective, Gentoo requests that people only do an emerge sync at most once a day, and if they have multiple machines to update, that they only do one pull, and they update the rest of their infrastructure from their local copy.
To be clear, there are other reasons to run a local mirror, but I don’t think that it’s reasonable to expect anyone who wants a reliable install using pip to stand up their own infrastructure. Further to this point, I’m currently working on adding caching by default to pip so that we minimize how often different people hit PyPI, and we do it automatically, in a way that doesn’t generally require people to think about it and doesn’t require them to stand up their own infra.
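A minimal sketch of the kind of client-side caching being described, under stated assumptions: pip's actual work in this area was HTTP-level caching that honours Cache-Control headers, whereas the `FileCache` class and `fetch` callback here are invented for illustration and use a fixed freshness window instead.

```python
import hashlib
import os
import time

class FileCache:
    """Toy download cache: serve a fresh local copy instead of re-hitting the index."""

    def __init__(self, directory, fetch, max_age=600):
        self.directory = directory
        self.fetch = fetch        # callable(url) -> bytes, e.g. an HTTP GET
        self.max_age = max_age    # seconds a cached copy stays fresh
        os.makedirs(directory, exist_ok=True)

    def _path(self, url):
        # Key the cache file by a hash of the URL.
        return os.path.join(self.directory,
                            hashlib.sha256(url.encode()).hexdigest())

    def get(self, url):
        path = self._path(url)
        if os.path.exists(path) and time.time() - os.path.getmtime(path) < self.max_age:
            with open(path, "rb") as f:
                return f.read()   # cache hit: the index is not contacted at all
        data = self.fetch(url)    # cache miss: go to the network
        with open(path, "wb") as f:
            f.write(data)
        return data
```

With something like this in place, a second `get()` of the same URL within `max_age` seconds never touches the network, which is the effect being described: users get the benefit of a local mirror without having to stand one up themselves.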
--David
On 5/9/2014 2:12 PM, Donald Stufft wrote:
On May 9, 2014, at 1:28 PM, R. David Murray wrote:
I don't understand this. Why is it our responsibility to provide a free service for a large project to repeatedly download a set of files they need? Why does it not make more sense for them to download them once, and only update their local copies when they change? That's almost completely orthogonal to making the service we do provide reliable.
Well, here’s the thing. The large projects repeatedly downloading the same set of files are a canary. If any particular project goes uninstallable on PyPI (or if PyPI itself goes down) then nobody can install it, neither the people installing things over and over every day nor the people who just happened to be installing it during that downtime. However, intermittent failures and general instability are going to be noticed quicker and more easily by the projects who install things over and over again, and thus it becomes a lot easier to use them as a general gauge for what the average “uptime” is.
I have had the same question as David, so I also appreciate your answer.
The reason that it’s our responsibility for providing it is because we chose to provide it. There isn’t a moral imperative to run PyPI, but running PyPI badly seems like a crummy thing to do.
Agreed.
For perspective, Gentoo requests that people only do an emerge sync at most once a day, and if they have multiple machines to update, that they only do one pull, and they update the rest of their infrastructure from their local copy.
To be clear, there are other reasons to run a local mirror but I don’t think that it’s reasonable to expect anyone who wants a reliable install using pip to stand up their own infrastructure.
Ok, you are not saying that caching is bad, but that having everyone reinvent caching, and possibly doing it badly, or at least not in the best way, is bad.
Further to this point here I’m currently working on adding caching by default for pip so that we minimize how often different people hit PyPI and we do it automatically and in a way that doesn’t generally require people to think about it and that also doesn’t require them to stand up their own infra.
This seems like the right solution. It would sort of make each machine a micro-CDN node. -- Terry Jan Reedy
On May 9, 2014, at 4:20 PM, Terry Reedy
On 5/9/2014 2:12 PM, Donald Stufft wrote:
On May 9, 2014, at 1:28 PM, R. David Murray
wrote: I don't understand this. Why is it our responsibility to provide a free service for a large project to repeatedly download a set of files they need? Why does it not make more sense for them to download them once, and only update their local copies when they change? That's almost completely orthogonal to making the service we do provide reliable.
Well here’s the thing, right: the large projects repeatedly downloading the same set of files are a canary. If any particular project goes uninstallable on PyPI (or if PyPI itself goes down) then nobody can install it, neither the people installing things over and over every day nor the people who just happened to be installing it during that downtime. However, intermittent failures and general instability are going to be noticed more quickly by the projects that install things over and over again, and thus it becomes a lot easier to use them as a general gauge for what the average “uptime” is.
I have had the same question as David, so I also appreciate your answer.
IOW if PyPI goes unavailable for 10 minutes 5 times a day, you might get a handful of “small” installers (e.g. not the big projects) in each downtime, but a different set each time, who are likely to shrug it off and just treat it as the norm even though it’s very disruptive to what they’re doing. However, the big project is highly likely to hit every single one of those downtimes and be able to say “wow, PyPI is flaky as hell”.
To expand further on that: if we assume that we want ``pip install <foo>`` to be reliable, and not work sometimes and fail at other times, then we’re aiming for as high an uptime as possible. PyPI gets enough traffic that any single large project isn’t a noticeable drop in our bucket, and due to the way our caching works it actually helps us be faster and more reliable to have people constantly hitting packages, because they’ll be in cache and able to be served without hitting the origin servers.
Just for the record, PyPI gets roughly 350 req/s basically 24/7. In the month of April we served 71.4TB of data across 877.4 million requests, of which 80.5% never made it to the actual servers that run PyPI and were served directly out of the geo-distributed CDN that sits in front of PyPI. We are vastly better positioned to maintain a reliable infrastructure than to ask every large project that uses Python to do the same.
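Those figures are internally consistent; a quick back-of-the-envelope check, using only the numbers quoted above:

```python
# Back-of-the-envelope check of the April traffic figures quoted above.
total_requests = 877_400_000   # requests served in April
cdn_hit_rate = 0.805           # fraction served straight from the CDN

origin_requests = total_requests * (1 - cdn_hit_rate)
print(f"requests reaching the origin servers: {origin_requests:,.0f}")  # ~171 million

seconds_in_april = 30 * 24 * 3600
print(f"average rate: {total_requests / seconds_in_april:.0f} req/s")  # ~339, close to the quoted ~350
```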
The reason that it’s our responsibility to provide it is that we chose to provide it. There isn’t a moral imperative to run PyPI, but running PyPI badly seems like a crummy thing to do.
Agreed.
For perspective, Gentoo requests that people only do an emerge sync at most once a day, and if they have multiple machines to update, that they only do one pull, and they update the rest of their infrastructure from their local copy.
To be clear, there are other reasons to run a local mirror but I don’t think that it’s reasonable to expect anyone who wants a reliable install using pip to stand up their own infrastructure.
Ok, you are not saying that caching is bad, but that having everyone reinvent caching, and possibly doing it badly, or at least not in the best way, is bad.
Yea, caching isn’t in general a bad thing, and actually PyPI uses it heavily. All access to /simple/ and /packages/ is cached for 24 hours by our CDN unless someone uploads or deletes a file, in which case we selectively purge those URLs from the CDN cache so that they (nearly) instantly get the updated results. Warehouse (aka PyPI 2.0) is designed to utilize our CDN cache even further and I’m hoping to get our cache rate even higher using it.
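The "cache for 24 hours, selectively purge on upload/delete" scheme described here can be sketched roughly as follows. The PURGE request method is a convention of Varnish/Fastly-style caches, but the host name, helper names, and the idea of calling it directly like this are illustrative assumptions, not PyPI's actual code:

```python
# Rough sketch of long-lived CDN caching with selective purge on upload.
# The PURGE method is a Varnish/Fastly cache convention; the host name and
# helper functions here are hypothetical.
import urllib.request

# 24-hour CDN caching for /simple/ and /packages/ responses
CACHE_CONTROL = "public, max-age=86400"

def purge(url: str) -> None:
    """Ask the CDN to evict one cached URL."""
    req = urllib.request.Request(url, method="PURGE")
    urllib.request.urlopen(req)

def on_file_uploaded(project: str) -> None:
    # Only the affected index page needs purging; everything else stays
    # cached and keeps being served without touching the origin.
    purge(f"https://pypi.example.org/simple/{project}/")
```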
Further to this point, I’m currently working on adding caching by default to pip, so that we minimize how often different people hit PyPI, and we do it automatically, in a way that doesn’t generally require people to think about it and doesn’t require them to stand up their own infra.
This seems like the right solution. It would sort of make each machine a micro-CDN node.
Yes, it’s bog-standard HTTP stuff, just like a browser does it. The major difference is we’re limiting the maximum lifetime of a cache item in the client (pip) for the index pages, but we are not doing that for the package files themselves. This is done to prevent a misconfigured server from causing pip to not see new versions for hours/days/weeks/years/whatever. Additionally this change also makes pip smarter about HTTP requests: if we have a stale item in the cache which has a Last-Modified or an ETag header, we’ll do a conditional GET, which will hopefully be answered with an HTTP 304 Not Modified, so that we can simply refresh the stale item in the cache and use it again instead of needing to download the entire response body again.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
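The conditional-GET dance described in that last paragraph is plain HTTP and can be illustrated with nothing but the standard library. This is a sketch of the idea, not pip's actual cache code; the URL and header values are placeholders:

```python
# Revalidate a stale cache entry with a conditional GET instead of
# re-downloading the whole body. URL and header values are placeholders.
import urllib.request
from urllib.error import HTTPError

def revalidate(url, etag=None, last_modified=None):
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()  # 200: server sent a fresh body
    except HTTPError as err:
        if err.code == 304:
            return 304, None                 # Not Modified: reuse the cached body
        raise
```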
On 05/08/2014 02:02 PM, Paul Moore wrote:
Socially, this change does not seem to be having the effect of persuading more package developers to host on PyPI. The stick doesn't appear to have worked, maybe we should be trying to find a carrot? Or maybe we have to accept that some developers have sound reasons for not hosting on PyPI and work with them to find an acceptable compromise? Has anyone checked what Stefan's reasons are for not hosting cdecimal on PyPI? Do they represent a use case that the PEP hasn't considered?
Well, I do host a small handful of modules on PyPI, but I can say that some of my pain points are:

- getting a good name: the obvious ones are taken, so the search begins for a name that is not taken and yet still feels at least somewhat appropriate:
    my OO path module ended up being called strpath (dumb name); eventually changed to antipathy
    my script param line parser is called scription (okay name)
    my enum backport is called enum34 (blah)
    my dbf package is called dbf (lucky lucky lucky!)

- reputation of PyPI is not that great: anybody can upload packages, no curating is done, once there the name is taken for all time, no standard version numbering, . . . (granted, these are my impressions and may not be entirely valid)

Anyway, I know it's a volunteer effort and I don't want to criticize those actually running it, but these are some of the problems that I see with the service itself.

--
~Ethan~
On Fri, May 09, 2014 at 06:09:28AM -0700, Ethan Furman
On 05/08/2014 02:02 PM, Paul Moore wrote: Well, I do host a small handful of modules on PyPI, but I can say that some of my pain points are:
- getting a good name: the obvious ones are taken, so the search begins to find a name that is not taken and yet still feels at least somewhat appropriate:
my OO path module ended up being called strpath (dumb name); eventually changed to antipathy
my script param line parser is called scription (okay name)
my enum backport is called enum34 (blah)
my dbf package is called dbf (lucky lucky lucky!)
For many reasons I avoid github/gitorious/bitbucket, but there is one thing they do right: two-level namespaces. "Namespaces are one honking great idea -- let's do more of those!"

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
On May 8, 2014, at 11:19 AM, Stefan Krah
Donald Stufft
wrote: hosted packages are brittle and more prone to failure. Every single external server adds *another* SPOF into any particular install set. Even if every external server has a 99.9% uptime, when you combine multiple of them the total uptime of any particular set of requirements drops quickly. This has been a major complaint that people have had over time.
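The arithmetic behind "drops quickly" is simple compounding: if an install set depends on N independent servers, each at 99.9% uptime, the probability that all of them are up at once is 0.999**N. For example:

```python
# Combined availability of an install set that depends on N independent
# servers, each individually at 99.9% uptime.
uptime = 0.999
for n in (1, 5, 10, 20):
    print(f"{n:2d} servers: {uptime ** n:.4f} combined uptime")
# With 20 servers the combined uptime is already down to about 98%,
# i.e. roughly twenty times the downtime of a single server.
```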
We have been through that many times; to me this is an indication that people are using pip under circumstances when they should not. pip is not apt-get.
[I am aware that *you* know that, just stating it again for the broader audience.]
2) Last time I looked, access credentials (via "lost login") were sent out in plaintext.
The existence of other security issues is not an excuse to not fix a security issue. There are other problems and we're slowly working on trying to clear them out.
It is, however, a reason to avoid error messages that could *imply* (rightly or wrongly) to users that the combination of pip and internal packages is safe.
3) AFAIK people can upload a different (malicious) version of a package with *the exact same name*.
Yes, a malicious author is necessarily outside of the threat model for PyPI/pip.
How so? I'm avoiding this attack by publishing sha256sums at release time. The point is that I *cannot* change cdecimal-2.3.tar.gz without a user digging up a checksum mismatch from the mailing list archives.
Practically speaking exactly zero people will ever bother to dig up the checksum from a mailing list archive, or even verify the hash that you have right on PyPI itself. Worse security that people actually use is infinitely better than better security that nobody uses.
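For what it's worth, the verification step itself is trivial once you have a digest published out-of-band. The sketch below is illustrative, not Stefan's actual release process; the paths and digests are placeholders:

```python
# Check a downloaded tarball against a sha256 digest published out-of-band
# (e.g. in a release announcement). Paths and digests are placeholders.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, published_digest: str) -> bool:
    return sha256_of(path) == published_digest.strip().lower()
```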
6) D.J. Bernstein, who is somewhat security minded, has been shipping his software *for years* with just plain HTTP and published checksums.
Argument from authority doesn’t really hold up very well when DJB doesn’t distribute his software in a way that is intended to be consumed mechanically. Also, while he may be a crypto expert, he is far from an expert on successfully distributing software, unless you also think that the signature checking in most OS-provided package managers is pointless.
That is sort of a strawman. The whole point *is* that certain distribution models don't lend themselves to mechanical consumption. I cannot speak for DJB, perhaps he is just thinking that GPG signing is pointless if users can't validate the signature and SSL is pointless because one cannot trust governments.
OS package signing is useful since the packages are curated. If anyone could upload a package to Debian, whereupon it would be signed with the official key, apt-get would lose its usefulness.
People are going to mechanically consume them. You can tell them it’s wrong all you want but they are going to do it.
Stefan Krah
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
participants (13)

- Antoine Pitrou
- Chris Angelico
- Donald Stufft
- Ethan Furman
- Ian Cordasco
- M.-A. Lemburg
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- R. David Murray
- Stefan Krah
- Terry Reedy
- Victor Stinner