[Python-ideas] Python's Source of Randomness and the random.py module Redux

Tue Sep 15 13:04:49 CEST 2015

On 14 September 2015 at 23:39, Paul Moore <p.f.moore at gmail.com> wrote:
> (The rest of your emails, I'm going to read fully and digest before
> responding. Might take a day or so.)

Point by point responses exhaust and frustrate me, and don't really serve much
purpose other than to perpetuate the debate. So I'm going to make some final
points, and then stop. This is based on having read the various emails
responding to my earlier comments. If it looks like I haven't read something,
please assume I have but either you didn't get your point across, or maybe I
simply don't agree with you.

Why now?
--------

First of all, the big question for me is why now? The random module has been
around in its current form for many, many years. Security issues are not new;
maybe they are slowly increasing, but there's been no step change. The only
thing that seems to have changed is that someone (Theo) has drawn attention to
the random module.

So I feel that the onus is on the people proposing change to address that.
Show me the evidence that we've had an actual problem for many years, and
demonstrate that it's a good job we spotted it at last, and now have a chance
to fix it. Explain to me what has been going wrong all these years that I'd
never even noticed. Arguments that people are misusing the module aren't
sufficient in themselves - they've (presumably) been doing that for years. In
all that time, who was hacked, and who lost data, as a result of random.random
being a PRNG rather than being crypto-secure?
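
For anyone unsure what the PRNG versus crypto-secure distinction means in
practice, here is a minimal sketch using only the standard library. The helper
name seeded_sequence and the seed value 12345 are made up for illustration;
they are not taken from any proposal in this thread.

```python
import random


def seeded_sequence(seed, n=3):
    """Return n floats from a Mersenne Twister generator seeded with `seed`."""
    rng = random.Random(seed)  # deterministic PRNG, same algorithm as random.random()
    return [rng.random() for _ in range(n)]


# The same seed reproduces the same sequence. Simulations and tests rely on this.
first = seeded_sequence(12345)
second = seeded_sequence(12345)
print(first == second)

# SystemRandom draws from the operating system's entropy source (os.urandom).
# It cannot be seeded to reproduce its output.
sys_rng = random.SystemRandom()
token = sys_rng.getrandbits(128)
print(hex(token))
```

The first generator is predictable by design, which is exactly what makes it
useful for reproducible runs and unsuitable for secrets. The second is the
existing opt-in route for security-sensitive values.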

I'm not asking for an unassailable argument, just acknowledgement that it's
*your* job to address that question, and not mine to persuade you that "we've
been alright so far" is a compelling reason to reject your proposal.

Incorrect code on SO etc
------------------------

As regards people picking up insecure code snippets from the internet and
using them, there's no news there. I can look round and find hundreds of bits
of incorrect code in any area you want. People copy/paste garbage code all the
time. To my embarrassment, I've done it myself in the past :-(

But I'm reminded of https://xkcd.com/386/ - "somebody is wrong on the
internet!"

This proposal, and in particular the suggestion that we need to
retrospectively make the code snippets quoted here secure, strikes me as a
huge exercise in trying to correct all the people who are wrong on the
internet. There's certainly value in "safe by default" APIs, I don't disagree
with that, but I honestly fail to see how quoting incorrect code off the
internet is a compelling argument for anything.

Millions of users are affected
------------------------------

The numbers game is also a frustrating exercise here. We keep hearing that
"millions of users are affected by bad code", that scans of Google almost
immediately find sites with vulnerabilities.

But I don't see anyone pointing at a single documented case of an actual
exploit caused by Python's random module. There's no bug report. There's no
security alert notification.

How are those millions of users affected? Their level of risk is increased?
Who can tell that? Are any of the sites identified holding personal data? Not
all websites on the internet are *worth* hacking.

And I feel that expressing that view is somehow frowned on. That "it doesn't
matter" is an unacceptable view to hold. And so the responses to my questions
feel like criticisms of me personally, as if I'm being unprofessional. I
don't want to make this a big deal, but the code of conduct
says "we're tactful when approaching differing views", and it really doesn't
feel like that.

I understand that the whole security thing is a numbers game. And that it's
about assessing risk. But what risk is enough to trigger a response? A 10%
increased chance of any given website being hacked? 5%? 1%? Again, I'm not
asking to use the information to veto a change. I'm asking to *understand
your position*. To better assess your arguments, so that I can be open to
persuasion, and to *agree* with you, if your arguments are sound.

Furthermore, should we not take into account other languages and approaches at
this point? Isn't PHP a well-known "soft target"? Aren't phishing and social
engineering the best approaches to hacking these days, rather than cracking
RNGs? I don't know, and I look to security experts for advice here. So please
explain to me: how are you assessing the risks, and why do you judge this
specific risk high enough to warrant a response?

The impression I get is that the security view is that *any* risk, no matter
how small, once identified, warrants a response. "Do nothing" is never an
option. If that's your position, then I'm sorry, but I simply don't agree with
you. I don't want to live in a world that paranoid, and I'm unsure how to get
past this point to have a meaningful dialog.

History, and security's "bad rep"
---------------------------------

Donald asked if I was experiencing some level of spill-over from
distutils-sig, where there has *also* been a lot of security churn (far more
than here). Yes, I am. No doubt about that. On distutils-sig, and pip in
particular, it's easy to see a lot of frustration from users with the
long-running series of security changes. The tone of bug reports is frustrated
and annoyed. Users want a break from being forced to make changes.

Outside of Python, and speaking purely from my own experience in the corporate
world, security is pretty uniformly seen as an annoying overhead, and a block
on actually getting the job done. You can dismiss that as misguided, but it's
a fact. "We need to do this for security" is a direct challenge to people to
dismiss it as unnecessary, and often to immediately start looking for ways to
bypass the requirement "so that it doesn't get in the way". I try not to take
that attitude in this sort of debate, but at the same time, I do try to
*represent* that view and ask for help in addressing it.

The level of change in core Python is far less than on distutils-sig, and has
been relatively isolated from "non-web" areas. People understand (and are
grateful for) increases in "secure by default" behaviour in code like urllib
and ssl. They know that these are places where security is important, where
getting it right is harder than you'd think, and where trusting experts to do
the hard thinking for you is important.

But things like hash randomisation and the random module are less obviously
security related. The feedback from hash randomisation focused on "why did you
break my code?". It wasn't a big deal: people were relying on undocumented
behaviour and accepted that, but they did see it as a breakage from a security
fix. I expect the same to be true with the random module, but with the added
dimension that we're proposing changing documented behaviour this time.

As a result of similar arguments applying to every security change, and those
arguments never *really* seeming to satisfy people, there's a lot of
reiterated debate. And that's driving interested but non-expert people away
from contributing to the discussion. So we end up with a lack of checks and
balances because people without a vested interest in tightening security "tune
out" of the debates. I see that as a problem. But ultimately, if we can't find
a better way of running these discussions, I don't know how we fix it. I
certainly can't continue being devil's advocate every time.

Anyway, that's me done on this thread. I hope I've added more benefit than
cost to the discussion. Thanks to everyone for responding to my questions -
even if we all felt like we were just repeating the same thing, it's a
lot of effort doing so and I appreciate your time.

Paul