On 4 September 2014 10:00, Ethan Furman email@example.com wrote:
On 09/03/2014 04:36 PM, Antoine Pitrou wrote:
On Thu, 4 Sep 2014 09:19:56 +1000 Nick Coghlan firstname.lastname@example.org wrote:
Python is routinely updated to bugfix releases by Linux distributions and other distribution channels; you usually have no say over what's shipped in those updates. This is not like changing the major version used for executing the script, which is normally a manual change.
We can potentially deal with the more conservative part of the user base on the redistributor side - so long as the PEP says it's OK for us to not apply this particular change if we deem it appropriate to do so.
So people would believe python.org that they would get HTTPS cert validation by default, but their upstream distributor would have disabled it for them? That's even worse...
I agree. If the vendors don't want to have validation by default, they should stick with 2.7.8.
Yes, that's the way it would work in practice - we'd call it 2.7.8, and backport fixes from upstream 2.7.x as needed (as someone put it to me recently, a useful way to think of component version numbers in RHEL is as referring to the *oldest* pieces of code in that component).
I've spent quite a bit more time considering the proposal, and I'm now satisfied that making this change upstream isn't likely to cause any major problems, because the folks most likely to suffer from this change aren't likely to even be on 2.7 yet.
While we managed not to be last this time, the RHEL/CentOS ecosystem is still a useful benchmark for the rollout of Python versions into conservative enterprise organisations, and the more conservative users within *that* ecosystem are likely to wait for the x.1 release (at the earliest) rather than upgrading as soon as x.0 is out. RHEL 7.0 only came out in June, so most of those conservative environments are still going to be on Python 2.6 in RHEL 6. While we shipped 2.7 support well before the release of RHEL 7 as part of Software Collections and OpenShift, the kinds of environments where properly validating SSL by default may cause problems aren't likely to be on the leading edge of adopting new approaches to software deployment like SCL and PaaS.
Fixing the HTTPS validation defaults would have several significant positive consequences:
- lowers a major barrier to Python adoption (and continued usage) for public internet focused services
- fixes a latent security defect for Python applications operating over the public internet
- fixes a latent security defect for Python applications operating in a properly configured intranet environment
- reveals a latent security defect for Python applications communicating with improperly configured public internet services
- reveals a latent security defect for Python applications operating in improperly configured intranet environments
The debate is thus solely over how to manage the consequences of the last two, since going from "silent failure" to "noisy failure" looks like going from "working" to "broken" to someone that isn't already intimately familiar with the underlying issues.
That question needs to be considered separately for 3 different versions:
- 3.5
- 3.4
- 2.7
3.5 is not controversial: the standard library's handling of HTTPS URLs should change to verify certificates properly. No ifs, buts, or maybes - Python 3.5 should automatically verify all HTTPS connections, with explicit developer action required to skip (or otherwise relax) the validation check.
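To make "explicit developer action" concrete, here is a rough sketch of what verifying by default versus opting out could look like using the existing `ssl` module API (`ssl.create_default_context()` is already available in 3.4). The helper names `verified_context` and `unverified_context` are made up for illustration and are not part of any proposal:

```python
import ssl


def verified_context():
    # Hypothetical helper: the context a "verify by default" client would use.
    # create_default_context() requires a valid certificate chain and
    # checks that the certificate matches the requested hostname.
    return ssl.create_default_context()


def unverified_context():
    # Hypothetical helper: the explicit, visible opt-out a developer would
    # have to write to skip validation.
    ctx = ssl.create_default_context()
    # Hostname checking has to be switched off first; setting CERT_NONE
    # while check_hostname is still True raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

A caller would then pass one of these contexts to whichever HTTPS client API accepts one. The point of the sketch is only that skipping validation becomes a deliberate, greppable choice in the calling code rather than the silent default.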
So far, we have assumed that 3.4 will get at most a warning. However, I have changed my mind on that, because Python 3 is still largely an early adopter driven technology (it's making inroads into more conservative environments, but it's still at least a few years away from catching up to Python 2 on that front). As a result, the kinds of environments RDM and I are worried about will generally *not* be using Python 3, or if they are, it will be for custom scripts that they can fix. I wouldn't suggest actually making that change without getting an explicit +1 from the Canonical folks (since 3.4 is in Ubuntu LTS 14.04), but I would now personally be +1 on just *fixing it* in 3.4.2, rather than doing a bunch of additional development work purely so we can make folks wait another year for the Python 3 standard library to correctly handle HTTPS.
That leaves Python 2.7, and I have to say I'm now persuaded that a backport (including any required httplib and urllib features) is the right way to go. One of the tasks I'd been dreading as a follow-on from PEP 466 was organising the code audit to make sure our existing Python 2 applications are properly configuring SSL. If we instead change Python 2.7.9 to validate certificates by default, then the need to do that audit *goes away*, replaced by the far more mundane task of doing integration testing on 2.7.9, which we'd have to do *anyway*. Systematically solving the Python 2 HTTPS problem ceases to be something special, and instead just becomes a regular upstream bug fix that will be covered by our normal regression testing processes.
There's also the fact that Python 2.7.9 is becoming, in effect, the 2.8 several folks have been asking for (from an HTTPS perspective, anyway), but done in such a way that it feeds more cleanly into the redistributor channels, rather than having the multi-year lead time (and massive additional overhead) that a 2.8 release would suffer.
On the redistributor side, we (as in Red Hat) *specifically* offer paid services to help users manage the risks associated with this kind of change (for example https://access.redhat.com/support/policy/updates/errata#Extended_Update_Supp...), and we charge them extra for it because the extra complexity it introduces *is* a pain to support.
If, as a vendor, we're not willing to do something like that as part of our base subscription, then I don't think upstream should feel any obligation to do it for free - the whole *point* of redistributors from a community perspective is for us to handle the boring & annoying stuff whenever possible, so that upstream don't need to worry about it so much. Seriously - we *don't want* the extremely high touch users that need lots of reassurance as direct upstream consumers, as meeting their expectations requires such a high level of responsiveness and stress that it simply isn't ethical to ask people to do it for free. When someone genuinely *wants* that kind of customer-vendor relationship, trying to employ a colleague-colleague community style relationship instead ends up being incredibly unpleasant for both sides.
I freely admit that I'm heavily biased on this point - the fact that self-organising community projects are understandably less interested in the boring & annoying bits that some users want or need is ultimately what pays my salary. But this kind of thing is genuinely difficult for a collaborative community driven project to provide, and I think it's reasonable to expect that people who want a slower pace and/or greater selectivity in their upgrades pay for the privilege (or at least rely on a slower moving, freely available, filtered and curated platform environment like CentOS or Ubuntu LTS).
P.S. I actually believe this is also why we see so much open source development being done on Mac OS X these days, and even a growing presence of Windows - at the platform level, many open source devs *don't want* a collegial relationship with a community, they want a commercial relationship with a vendor so their computer "just works". Take the mindset many of us have towards our OS, move up the stack a few more layers, and we can see how many task oriented application developers might feel about programming language runtimes, web frameworks, etc - for many folks, they're just tools that help to get a job done, not communities to participate in (and that's OK - it just means we need to fully consider the benefits of relying on commercial rather than community focused approaches to meeting their needs).