[Python-ideas] Python's Source of Randomness and the random.py module Redux

Tue Sep 15 14:27:44 CEST 2015

On September 15, 2015 at 7:04:52 AM, Paul Moore (p.f.moore at gmail.com) wrote:
> On 14 September 2015 at 23:39, Paul Moore wrote:
> > (The rest of your emails, I'm going to read fully and digest before
> > responding. Might take a day or so.)
>  
> Point by point responses exhaust and frustrate me, and don't really serve much
> purpose other than to perpetuate the debate. So I'm going to make some final
> points, and then stop. This is based on having read the various emails
> responding to my earlier comments. If it looks like I haven't read something,
> please assume I have but either you didn't get your point across, or maybe I
> simply don't agree with you.
>  
> Why now?
> --------
>  
> First of all, the big question for me is why now? The random module has been
> around in its current form for many, many years. Security issues are not new,
> maybe they are slowly increasing, but there's been no step change. The only
> thing that seems to have changed is that someone (Theo) has drawn attention to
> the random module.
>  
> So I feel that the onus is on the people proposing change to address that.
> Show me the evidence that we've had an actual problem for many years, and
> demonstrate that it's a good job we spotted it at last, and now have a chance
> to fix it. Explain to me what has been going wrong all these years that I'd
> never even noticed. Arguments that people are misusing the module aren't
> sufficient in themselves - they've (presumably) been doing that for years. In
> all that time, who was hacked? Who lost data? As a result of random.random
> being a PRNG rather than being crypto-secure?
>  
> I'm not asking for an unassailable argument, just acknowledgement that it's
> *your* job to address that question, and not mine to persuade you that "we've
> been alright so far" is a compelling reason to reject your proposal.

The answer to "Why Now?" is basically because someone brought it up. I realize
that's a pretty arbitrary thing but I'm not sure what answer would even be
acceptable here. When is an OK time to do it in your eyes? Is it only after
there is a public, known attack against the RNG? Is it only when the module is
first being added?

The sad state of affairs is that it's only been relatively recently that our
industry as a whole has really taken security seriously, so there are a lot of
things out there that are not well designed from a security POV. We can't go
back in time and change the original mistake, but we can repair it going into
the future.

>  
> Incorrect code on SO etc
> ------------------------
>  
> As regards people picking up insecure code snippets from the internet and
> using them, there's no news there. I can look round and find hundreds of bits
> of incorrect code in any area you want. People copy/paste garbage code all the
> time. To my embarrassment, I've done it myself in the past :-(
>  
> But I'm reminded of https://xkcd.com/386/ - "somebody is wrong on the
> internet!"
>  
> This proposal, and in particular the suggestion that we need to
> retrospectively make the code snippets quoted here secure, strikes me as a
> huge exercise in trying to correct all the people who are wrong on the
> internet. There's certainly value in "safe by default" APIs, I don't disagree
> with that, but I honestly fail to see how quoting incorrect code off the
> internet is a compelling argument for anything.

The argument is basically that security is an important part of API design, and
that if you look at what people are doing in practice, it gives you an idea of
how people think they should use the API. It's kind of like looking at a
situation like this: https://i.imgur.com/0gnb7Us.jpg and concluding that maybe
we should pave that worn down footpath, because people are going to use it
anyways.

>  
> Millions of users are affected
> ------------------------------
>  
> The numbers game is also a frustrating exercise here. We keep hearing that
> "millions of users are affected by bad code", that scans of Google almost
> immediately find sites with vulnerabilities.
>  
> But I don't see anyone pointing at a single documented case of an actual
> exploit caused by Python's random module. There's no bug report. There's no
> security alert notification.

So a big part of this is certainly preventative. It's a relatively recent
development that hacking went from individuals or small teams going after big
targets to a business of its own. There are literally giant office complexes
in places like Russia and China filled with employees in cubicles, but they
aren't writing software like at a normal company, they are just trawling
around the internet, looking for targets, trying to expand botnets, looking
for anything and everything they can get their hands on.

It's also true that there isn't going to be a big fanfare for *most* actual
hacked computers/sites. Most of the time the people running the site simply
won't ever know, they'll just be silently hosting malware or having their
users' passwords fed into other sites. Very few exploits actually get noticed,
and when they are noticed it's unlikely that they get public attention.

I'd also suggest that for changes like these, if someone was exploited by this
they'd probably look at the documentation for random.py, see that they were
accidentally using the module wrong, and then blame themselves and never
bother to file a bug report. It is my opinion that it's not really their fault
that the API led them to believe that what they were doing was right.

>  
> How are those millions of users affected? Their level of risk is increased?
> Who can tell that? Are any of the sites identified holding personal data? Not
> all websites on the internet are *worth* hacking.

Actually, all sites on the internet *are* worth hacking, depending on what you
call hacking. Malware is constantly being hosted on tiny sites that most
wouldn't call "worth" hacking, but malware authors were able to hack in some
way and then they uploaded their malware there. If there are user logins it's
likely that people reused usernames and passwords, so if you can get the
passwords from one smaller site, it's possible you can use that as a door into
a larger, more important site. Plus, there's also the desire for botnets to
add more and more nodes into their swarm, they don't care what site you're
hosting, they just want the machine.

One key problem to the security of the internet as a whole is that there are a
lot of small sites without dedicated security teams, or anyone who really knows
security at all. These are easy targets for people and most languages and
libraries make it far too easy for people to do the wrong thing.

>  
> And I feel that expressing that view is somehow frowned on. That "it doesn't
> matter" is an unacceptable view to hold. And so, the responses to my questions
> feel personal, they feel like criticisms of me personally, that I'm being
> unprofessional. I don't want to make this a big deal, but the code of conduct
> says "we're tactful when approaching differing views", and it really doesn't
> feel like that.
>  
> I understand that the whole security thing is a numbers game. And that it's
> about assessing risk. But what risk is enough to trigger a response? A 10%
> increased chance of any given website being hacked? 5%? 1%? Again, I'm not
> asking to use the information to veto a change. I'm asking to *understand
> your position*. To better assess your arguments, so that I can be open to
> persuasion, and to *agree* with you, if your arguments are sound.

It's basically a gut feeling since we can't get any hard data here. Being able
to look online and find code in the wild that does this wrong within minutes
gives us an idea of how likely it is. So does reasoning about what people who
don't know the difference between ``random.random()`` and
``random.SystemRandom().random()`` are likely to do, plus a little bit of
guessing based on experience with similar situations.

Another input into this equation is how likely it is that this change would
break someone and, once broken, how easy it would be to fix things.

I sadly can't give anything more specific than that here, because it's a bit of
an artform crossed with personal biases :(

>  
> Furthermore, should we not take into account other languages and approaches at
> this point? Isn't PHP a well-known "soft target"? Isn't phishing and social
> engineering the best approach to hacking these days, rather than cracking
> RNGs? I don't know, and I look to security experts for advice here. So please
> explain for me, how are you assessing the risks, and why do you judge this
> specific risk high enough to warrant a response?
>  
> The impression I get is that the security view is that *any* risk, no matter
> how small, once identified, warrants a response. "Do nothing" is never an
> option. If that's your position, then I'm sorry, but I simply don't agree with
> you. I don't want to live in a world that paranoid, and I'm unsure how to get
> past this point to have a meaningful dialog.

Do nothing is absolutely an option, but most security focused folks don't take
a scorched earth view of security so we often times don't bother to even
mention a possible change unless we think that doing nothing is the wrong
answer. As an example, go back to PEP 476, where we enabled TLS verification by
default on HTTPS: we limited it to *only* HTTPS even though TLS is used by
many other protocols, because it was our opinion that doing nothing for those
protocols was the right call. Those protocols are still insecure by
default, but doing something about that by default would break too much for us
to be willing to even suggest it.
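
As a rough sketch of what "verification by default" means in terms of the
``ssl`` module (the variable names are just for illustration, and this only
inspects context settings, it doesn't open any connections):

```python
import ssl

# A default context requires a valid certificate and checks the hostname.
# This is the kind of context the stdlib HTTPS clients use since PEP 476.
verified = ssl.create_default_context()
assert verified.verify_mode == ssl.CERT_REQUIRED
assert verified.check_hostname is True

# Opting out has to be done explicitly. check_hostname must be turned off
# before verify_mode can be relaxed to CERT_NONE.
unverified = ssl.create_default_context()
unverified.check_hostname = False
unverified.verify_mode = ssl.CERT_NONE
```

Code that really needs the old behaviour can pass a context like
``unverified`` explicitly, for example through the ``context`` argument of
``urllib.request.urlopen()``.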

On top of that, we tend to want to prioritize the things we do try to have
happen, so we focus on things with the smallest fallout or the biggest upsides
and we ignore other things until later.

This is probably why it can look like doing nothing is never an option: we
already self-select what we choose to push forward, because we *do* care about
backwards compatibility too.

>  
> History, and security's "bad rep"
> ---------------------------------
>  
> Donald asked if I was experiencing some level of spill-over from
> distutils-sig, where there has *also* been a lot of security churn (far more
> than here). Yes, I am. No doubt about that. On distutils-sig, and pip in
> particular, it's clear to see a lot of frustration from users with the
> long-running series of security changes. The tone of bug reports is frustrated
> and annoyed. Users want a break from being forced to make changes.

I think a lot of these changes are paying down technical debt of two decades of
(industry standard) lack of focus on security. It sucks, but when we come out
the other side (because hopefully, new APIs and modules will be better designed
with security in mind given our new landscape) we should hopefully be in a much
better situation.

On the distutils-sig side, I think that PEP 470 is the last breaking change
that I can think of that we'll need to do in the name of security. We've paid
down that particular bit of technical debt, and once that lands we'll have a
pretty decent story. We still have other kinds of technical debt to pay down
though :(

>  
> Outside of Python, and speaking purely from my own experience in the corporate
> world, security is pretty uniformly seen as an annoying overhead, and a block
> on actually getting the job done. You can dismiss that as misguided, but it's
> a fact. "We need to do this for security" is a direct challenge to people to
> dismiss it as unnecessary, and often to immediately start looking for ways to
> bypass the requirement "so that it doesn't get in the way". I try not to take
> that attitude in this sort of debate, but at the same time, I do try to
> *represent* that view and ask for help in addressing it.
>  
> The level of change in core Python is far less than on distutils-sig, and has
> been relatively isolated from "non-web" areas. People understand (and are
> grateful for) increases in "secure by default" behaviour in code like urllib
> and ssl. They know that these are places where security is important, where
> getting it right is harder than you'd think, and where trusting experts to do
> the hard thinking for you is important.
>  
> But things like hash randomisation and the random module are less obviously
> security related. The feedback from hash randomisation focused on "why did you
> break my code?". It wasn't a big deal, people were relying on undocumented
> behaviour and accepted that, but they did see it as a breakage from a security
> fix. I expect the same to be true with the random module, but with the added
> dimension that we're proposing changing documented behaviour this time.
>  
> As a result of similar arguments applying to every security change, and those
> arguments never *really* seeming to satisfy people, there's a lot of
> reiterated debate. And that's driving interested but non-expert people away
> from contributing to the discussion. So we end up with a lack of checks and
> balances because people without a vested interest in tightening security "tune
> out" of the debates. I see that as a problem. But ultimately, if we can't find
> a better way of running these discussions, I don't know how we fix it. I
> certainly can't continue being devil's advocate every time.

Things don't really satisfy people because they oftentimes fundamentally
don't care about security. That is perfectly reasonable, so don't think that I
expect everyone to care about security, but they simply don't. However, in my
opinion we have a moral obligation to try and do what we reasonably can to
protect people. It's a bit like social safety nets, one person might ask why
they are being asked to pay taxes, after all they never needed government
assistance but by asking every citizen to pay in, they can try and help people
from falling through the cracks. This isn't a social safety net, it's a
security safety net.

>  
> Anyway, that's me done on this thread. I hope I've added more benefit than
> cost to the discussion. Thanks to everyone for responding to my questions -
> even if we all felt like we were both just repeating the same thing, it's a
> lot of effort doing so and I appreciate your time.
>  
> Paul
>  

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA