[Python-ideas] Python's Source of Randomness and the random.py module Redux

Donald Stufft donald at stufft.io
Tue Sep 15 02:14:32 CEST 2015

On September 14, 2015 at 6:39:28 PM, Paul Moore (p.f.moore at gmail.com) wrote:
> (The rest of your emails, I'm going to read fully and digest before
> responding. Might take a day or so.)
> On 14 September 2015 at 21:36, Donald Stufft wrote:
> > I think maybe a problem here is a difference in how we look at the data. It
> > seems that you might focus on the probability of you personally (or the things
> > you work on) getting attacked and thus benefiting from these changes, whereas
> > I, and I suspect the others like me, think about the probability of *anyone*
> > being attacked.
> This may be true, in some sense. But I'm not willing to accept that
> you are thinking about everyone, but I'm somehow selfishly only
> thinking of myself. If that's what you were implying, then frankly
> it's a pretty offensive way of disregarding my viewpoint. Knowing you,
> I'm sure that's *not* how you meant it - but do you see how easy it is
> for the way you word something to make it nearly impossible for me to
> see past your wording to get to the actual meaning of what you're
> trying to say? I didn't even consciously notice the implication
> myself, at first. I simply started writing a pretty argumentative
> rebuttal, because I felt that somehow I needed to correct what you
> said, but I couldn't quite say why.

No, I don’t mean it in the way of you being selfish. I'm not quite sure of the
right wording here; essentially it's the probability of an event happening to a
particular individual vs the probability of an event occurring at all. To use your
lottery example, I *think*, and perhaps I'm wrong, that you're looking at it in
terms of: the chance of any particular person participating in the lottery
winning it is low, so why should each of these people, as individuals, make
plans for how to get the money when they win, since as individuals they are
unlikely to win. Whereas I flip it around and think that someone, somewhere is
likely going to win the lottery, so the lottery system should make plans for
how to get them the money when they do.

I'm not sure of the right "name" for each type, and I don't want to keep trying
to ham-fist it, because I don't mean it in an offensive or an "I'm better
than you" way, and I fear putting my foot in my mouth again :(

> Looking at the reality of what I focus on, I'd say it's more like
> this. I mistrust arguments that work on the basis that "someone,
> somewhere, might do X bad thing, therefore we must all pay cost Y".
> The reasons are complex (and I don't know that I fully understand all
> of my thought processes here) but some aspects that immediately strike
> me are:
> * The probability of X isn't really quantified. I may win the lottery,
> but I don't quit my job - the probability is low. The probability of X
> matters.
> * My experience of the probability of X happening varies wildly from
> that of whoever's making the point. Who is right? Why must one of us
> "win" and be right? Can't it simply be that my data implies that over
> the full data set, the actual probability of X is lower than you
> thought?
> * The people paying cost Y are not the cause of, nor are they impacted
> by, X (except in an abstract "we all suffer if bad things happen"
> sense). I believe in the general principle of "you pay for what you
> use", so to me you're arguing for the wrong people to be made to pay.
> Hopefully, those are relatively objective measures. More subjectively,
> * It's way too easy to say "if X happens once, we have a problem". If
> you take the stance that we have to prevent X from *ever* happening,
> you allow yourself the freedom to argue with vague phrases like
> "might", while leaving the burden of absolute proofs on me. (In the
> context of RNG proposals, this is where arguments like "let's
> implement a secure secret library" get dismissed - they still leave
> open the possibility of *someone* using an inappropriate RNG, so "they
> don't solve the issue" - even if they reduce the chance of that
> happening by a certain amount - and neither you nor I can put a figure
> on how much, so let's not try).

Just to be clear, I don’t think that "if X happens once, it's a problem" is a
reasonable belief, and I don't personally hold it. It's a sliding scale where we
need to figure out where the right solution for Python is for each particular
problem. I certainly wouldn't want to use a language that took the approach
that if X can ever happen, we need to prevent X. I have seen enough users
incorrectly use the random.py module that I think the danger is "real". I also
think that, if this were a brand new module, it would be a no-brainer (but
perhaps I'm wrong) for the default, module-level API to be safe by default.
Going off that assumption, I think the question is really just "Is it worth
it?" not "Does this make more sense than the current API?".

> * There's little evidence that I can see of preventative security
> measures having improved things. Maybe this is because it's an "arms
> race" situation, and keeping up is all we can hope for. Maybe it's
> because it's hard to demonstrate a lack of evidence, so the demand for
> evidence is unreasonable. I don't know.

By preventive security measures, do you mean things like PEP 466? I don't quite
know how to accurately state it, but I'm certain that PEP 466 directly improved
the security of the entire internet (and continues to do so as it propagates).

> * For many years I ran my PC with no anti-virus software. I never got
> a virus. Does that prove anything? Probably not. The anti-virus
> software on my work PC is the source of *far* more issues than I have
> ever seen caused by a virus. Does *that* prove anything? Again,
> probably not. But my experience with at least *that* class of pressure
> to implement security is that the cure is worse than the disease.
> Where does that leave the burden of proof? Again, I don't know, but my
> experience should at least be considered as relevant data.

Antivirus is a particularly bad example of security software :/ It's a massive
failing of the security industry that it exists in the state it does. There's
a certain bias here though, because it is the job of security-sensitive code
to "break" things (as in, take otherwise valid input and make it not work). In
an ideal world, security software just sits there doing "nothing" from the
POV of someone who isn't a security engineer, and then will, often through no
fault of their own, pop up and make things go kabloom because it detected
something insecure happening. This means that for most people, the only
interaction they have with something designed to protect them is when it steps
in to make things stop working.

It is relevant data, but I think it goes back to the different ways of looking
at things (what is the individual chance of an event happening, vs the chance
of an event happening across the entire population). This might also be why
you'll see the backwards-compat folks focus more on experience-driven data and
security folks focus more on hypotheticals about what could happen.

> * Everyone I have ever encountered in a work context (as opposed to in
> open-source communities) seems to me to be in a similar situation to
> mine. I believe I'm speaking for them, but because it's a
> closed-source in house environment, I've got no public data to back my
> comments.
> And totally subjective,
> * I'm extremely tired of the relentless pressure of "we need to do X,
> because security". While the various examples of X may all have ended
> up being essentially of no disadvantage to me, feeling obliged to
> read, understand, and comment on the arguments presented every time,
> gets pretty wearing.

I'm not sure what to do about this :(

On one side, you're not obligated to read, understand, and comment on every
thing that's raised but I totally understand why you do, because I do too, but
I'm not sure how to help this without saying that people who care about
security shouldn't bring it up either?

> * I can't think of a single occasion where we *don't* do X. That may
> well be confirmation bias, but again subjectively, it feels like
> nobody's listening to the objections. I get that the original
> proposals get modified, but if never once has the result been "you're
> right, the cost is too high, we'll not do X" then that puts
> security-related proposals in a pretty unique position.

Off the top of my head, I remember the on-by-default hash randomization for
Python 2.x (or rather the actually secure hash randomization, since 2.x still
has the version where it is trivial to recover the original seed).
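As a side note, the effect of hash randomization is easy to observe from the
outside: with randomization enabled, string hashes differ between interpreter
runs, and the PYTHONHASHSEED environment variable controls the seed. A small
sketch (the hash_of helper is just for illustration; PYTHONHASHSEED and
subprocess are the real mechanisms):

```python
import os
import subprocess
import sys

def hash_of(s, seed):
    # Run a fresh interpreter with a fixed PYTHONHASHSEED and
    # report the hash of the given string in that process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", s],
        env=env, capture_output=True, text=True,
    )
    return int(out.stdout)

# The same seed always produces the same hash across processes,
# while PYTHONHASHSEED=random (the default since 3.3) varies per run.
print(hash_of("spam", 1) == hash_of("spam", 1))
```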

I don't actually remember that many cases where python-dev chose to break
backwards compatibility for security. The only ones I can think of are:

* The hash randomization on Python 3.x (sort of? Only if you depended on dict
  ordering, which wasn't a guaranteed property anyway).
* The HTTPS improvements, where we switched Python to default to verifying
  certificates.
* The backports of several security features to 2.7 (backport of 3.4's ssl
  module, hmac.compare_digest, os.urandom's persistent FD, hashlib.pbkdf2_hmac,
  hashlib.algorithms_guaranteed, hashlib.algorithms_available).
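For what it's worth, the value of something like the hmac.compare_digest
backport is easy to sketch. The verify_signature helper below is hypothetical,
but hmac.new and hmac.compare_digest are the real stdlib APIs:

```python
import hashlib
import hmac

def verify_signature(key: bytes, message: bytes, signature: bytes) -> bool:
    # Recompute the expected MAC, then compare in constant time.
    # A plain `==` short-circuits at the first differing byte,
    # which can leak timing information to an attacker; that is
    # the silent failure compare_digest was backported to prevent.
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

The point being: code using `==` here looks exactly like code using
compare_digest, works in every test, and is still wrong.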

There are probably things that I'm not thinking of, but the hash randomization
only broke things if you were depending on dict/set ordering, which isn't a
promised property of dict/set. The backports of security features were done in
a pretty minimally invasive way, where they would (ideally) only break things
if you relied on those names *not* existing on Python 2.7 (a nonzero but small
set of users). The HTTPS verification is the main thing I can think of where
python-dev actually broke backwards compatibility in an obvious way for people
relying on something that was documented to work a particular way.

Are there examples I'm not remembering (probably!)? Two sort-of backwards-
incompatible changes and one clearly backwards-incompatible change over the
lifetime of Python doesn't feel like that much to me.

Is there maybe some crossover with distutils-sig? I've focused a lot more on
pushing security on that side of things, both because it personally affects
me more and because I think insecure defaults there are a lot worse than
insecure defaults in any particular module in the Python standard library.

> Finally, in relation to that last point, and one thing I think is a
> key difference in our thinking. I do *not* believe that security
> proposals (as opposed to security bug fixes) are different from any
> other type of proposal. I believe that they should be subject to all
> the same criteria for acceptance that anything else is. I suspect that
> you don't agree with that stance, and believe that security proposals
> should be held to different standards (e.g., a demonstrated
> *probability* of benefit is sufficient, rather than evidence of actual
> benefit being needed). But please speak for yourself on this - I'm not
> trying to put words into your mouth, it's just my impression.

Well, I think that all proposals are judged on the probability that they're
going to help some particular percentage of people, and on whether they'll
help enough people to be worth the cost.

What I think is special about security is the cost of *not* doing something.
Security "fails open": if someone does something insecure, it's not going to
raise an exception or give different results or anything like that. It's going
to appear to "work" (in that you get the results you expect) while the user is
silently insecure. Compare this to, well, let's pretend that there was never a
deterministic RNG in the standard library. If a scientist or a game designer
inappropriately used random.py, they'd pretty quickly learn that they couldn't
give the RNG a seed, and even if it was a CSPRNG with an "add_seed" method that
might confuse them, it'd be pretty obvious on the second execution of their
program that it was giving them different results.
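To make that concrete, this is roughly the failure mode I mean, using the
stdlib as it exists today: random.Random is seedable and reproducible, while
random.SystemRandom (a CSPRNG reading from os.urandom) silently ignores any
seed you pass it.

```python
import random

# The deterministic Mersenne Twister: same seed, same sequence,
# so a scientist or game designer gets reproducible runs.
a = random.Random(42)
b = random.Random(42)
assert [a.random() for _ in range(3)] == [b.random() for _ in range(3)]

# SystemRandom accepts a seed argument but ignores it, so two
# "identically seeded" instances diverge immediately -- a loud,
# obvious failure rather than a silent one.
c = random.SystemRandom(42)
d = random.SystemRandom(42)
assert [c.random() for _ in range(3)] != [d.random() for _ in range(3)]
```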

I think that the bar *should* be lower for something that silently or subtly
does the "wrong" thing vs something that obviously and loudly does the wrong
thing, particularly when the downside of doing the "wrong" thing is as
potentially disastrous as it is with security.

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
