[Python-ideas] Python's Source of Randomness and the random.py module Redux

Mon Sep 14 22:23:17 CEST 2015

On September 14, 2015 at 3:14:45 PM, Paul Moore (p.f.moore at gmail.com) wrote:
>  
> Here's a point - it seems likely that the people arguing for this
> change are of the opinion that I'm not appreciating their position.
> (For the record, I'm not being deliberately obstructive in case anyone
> thought otherwise. In my view at least, I don't understand the
> security guys' position). Assuming that's the case, then I'm probably
> one of the people who needs educating. But I don't feel like anyone's
> trying to educate me, just that I'm being browbeaten until I give in.
>  
> Education != indoctrination.

For the record, I'm not sure what part you don't understand. I'm happy to try
and explain it, but I think I'm misunderstanding what you're not understanding
or something because I personally feel like I did explain what I think you're
misunderstanding.

Part of the problem (probably) here is that there isn't an exact person we're
trying to protect here. The general gist is that if you use the deterministic
APIs in a security sensitive situation, then you may be vulnerable depending
on exactly what you're doing. We think that, in particular, the API of the
random module will lead inexperienced or un(der)informed developers to use it
in situations where it's not appropriate and, as a result, end up with an
insecure piece of software. We're people who think that the defaults of the
software should be "generally" secure (as much so as is reasonable) and that
if you want to do something that isn't safe then you should explicitly opt in
to that (the flipside is, things shouldn't be so locked down as to be unusable
without having to turn off all of the security knobs, this is where the
"generally" in generally secure comes into play).

A particularly nasty side effect of this is that it's almost never the people
who wrote the software who are harmed by it being broken; it's almost always
their users, who didn't have anything to do with it.
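
To make the kind of misuse I mean concrete, here is a minimal sketch. The
``make_token`` helper and ``ALPHABET`` are names I made up for illustration;
the only real APIs used are the module-level ``random`` functions and
``random.SystemRandom``. Both calls run without error and return tokens of the
same shape, which is exactly the problem.

```python
import random
import string

# Hypothetical alphabet and helper, for illustration only.
ALPHABET = string.ascii_letters + string.digits


def make_token(rng, length=32):
    """Build a token by drawing characters from the given generator."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


# What an un(der)informed developer is likely to write: the module-level
# functions all share one deterministic Mersenne Twister instance.
insecure_token = make_token(random)

# The same code backed by the operating system's CSPRNG instead.
secure_token = make_token(random.SystemRandom())
```

Nothing in the output tells you which of the two tokens is safe to hand to an
untrusted user.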

So essentially the goal is to try and make it harder for people to accidentally
misuse the random module. If that doesn't answer your confusion, and you can
try to reword it to get it through my thick skull better, I'm happy to continue
to try and answer it (on or off list).

>  
> At its best, good security practice should *help* people write
> reliable, easy to use systems. Or at a minimum, not get in the way.
> But the PR message needs always to be "I understand the constraints
> you're dealing with", not "you must do this for your own good".
> Otherwise the "follow the rules until the auditors go away" attitude
> just gets reinforced. Hence my focus on seeing proof that breakages
> are justified *in the context of the target audience I am responsible
> for*.

Right, and this is actually trying to do that, by removing a possibly dangerous
default and making the default safer. Defaults matter a lot in security (and
sadly, a lot of software doesn't have safe defaults) because a lot of software
will never use anything but the defaults.

>  
> Conversely, you're right that I can't force anyone else to try to
> educate people in good security practices, however much better than me
> at it I might think they are. In actual fact, though, I think a lot of
> people do a lot of good work educating others - as I say, most of what
> I've learned has been from lists like these.
>  
> >> Honestly, this type of debate doesn't do the security community much
> >> good - there's too little willingness to compromise, and as a result
> >> the more neutral participants (which, frankly, is pretty much anyone
> >> who doesn't have a security agenda to promote) end up pushed into a
> >> "reject everything" stance simply as a reaction to the black and white
> >> argument style.
> >
> > Except you seem to have missed much of the compromises being discussed
> > and conceded by the security minded folks.
>  
> OK, you have a point - there have been changes to the proposals. But
> there are fundamental points that have (as far as I can see) never
> been acknowledged. As a result, the changes feel less like compromises
> based on understanding each other's viewpoints, and more like repeated
> attempts to push something through, even if it's not what was
> originally proposed. (I *know* this is an emotional position - please
> understand I'm fed up and not always managing to word things
> objectively).

I think part of this is that a lot of the folks proposing these changes are
also sensitive to the backwards compatibility needs and have already baked that
into their thoughts. We don't generally come into these with "scorched earth"
suggestions of fixing some situation where security could be improved but
instead try and figure out a decent balance of security and not breaking things
to try and cover most of the ground with as little cost as possible.

My very first email in this particular thread (that started this thread) was
the first one I had with a fully solid proposal in it. The last paragraph in
that proposal asked the question "Do we want to protect users by default?" My
next email presented two possible options, depending on which we considered to
be "less" breaking: either deprecating the module scoped functions completely
or changing their defaults to something secure. It also mentioned that if we
can't change the default, the user-land CSPRNG probably isn't a useful addition
because its benefit is primarily in being able to make it the default option.

I don't see anyone who is talking about making a change not also talking about
what areas of backwards compatibility it would actually break.

I think part of this too is that security is a bit weird: it's not a boolean
property, but there are particular bars you need to pass before it's an actual
solution to the problem. So a lot of us will figure out that bar, draw a line
in the sand, and say "If this proposal crosses this line, then doing nothing is
better than doing something" because it'd just be churn for churn's sake at
that point. That's why you'll see particular points that we essentially
won't give up, because if they are given up we might as well do nothing. In
this particular instance, the point is that the API of the random module leads
people to use it incorrectly, so unless we address that, we might as well just
leave it alone.

>  
> Specifically, I have been told that I can't argue my "convenience"
> over the weight of all the other people who could fall into security
> traps with the current API. Let's review that, shall we?

I think I was the one who said that to you, and I'd like to explain why I said
it (beyond the fact I was riled up). Essentially I had in my mind something
like what Nick has proposed, which you've said later on you think is relatively
unobtrusive and unlikely to cause serious compatibility issues, which I agree with.
Then I saw you arguing against what I felt was a pretty mundane API break that
was fairly trivial to work around, and it signaled to me that you were saying
that having to type a few extra letters was a bridge too far. This reads to me
like someone saying "Well I know how to use it correctly, it's their own fault
if others don't". I'm not saying that's what you actually think but that's how
it read to me.

>  
> * My argument is that breaking backward compatibility needs to be
> justified. People have different priorities. "Security risks should be
> fixed" isn't (IMO) a free pass. Why should it be? "Windows
> compatibility issues should be fixed" isn't a free pass. "PyPy/Jython
> compatibility issues should be fixed" isn't a free pass. Forcing me to
> adjust my priorities so that I care about security when I don't want
> (or IMO need) to isn't acceptable.

The justification is essentially that it will protect some people with minimal
impact to others. The main impact is that people who actually need a
deterministic RNG will have to use something like ``random.seeded_random``
instead of just ``random``. Importantly, that breaks in a fairly obvious
manner, as opposed to the silently wrong situation that people who are
currently using the top level API incorrectly are in today.
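
As a rough sketch of what "breaks in a fairly obvious manner" could look like:
the ``SecureByDefault`` class below is purely hypothetical, a stand-in for a
module whose default generator is a CSPRNG, and ``seeded_random`` here is only
my guess at the shape of something like ``random.seeded_random``. None of this
is an existing API; it just wraps ``random.SystemRandom`` and ``random.Random``.

```python
import random


class SecureByDefault:
    """Hypothetical stand-in for a random module with a CSPRNG default."""

    def __init__(self):
        self._rng = random.SystemRandom()

    def random(self):
        # Default draws come from the operating system's CSPRNG.
        return self._rng.random()

    def seed(self, *args, **kwargs):
        # Assumption: fail loudly instead of silently ignoring the seed.
        raise NotImplementedError(
            "the default generator cannot be seeded; use seeded_random(seed)"
        )

    def seeded_random(self, seed):
        # Assumption: the explicit opt-in returns an ordinary deterministic
        # generator, so reproducible streams remain available.
        return random.Random(seed)


rng_module = SecureByDefault()

try:
    rng_module.seed(1234)  # code that relied on seeding the default
except NotImplementedError as exc:
    error_message = str(exc)  # the failure is immediate and visible

# Opting in explicitly gives back a reproducible stream.
first = rng_module.seeded_random(1234)
second = rng_module.seeded_random(1234)
same_stream = [first.random() for _ in range(3)] == [
    second.random() for _ in range(3)
]
```

The person who needs determinism finds out on the first run and has a one-line
fix; the person who needed a CSPRNG never had to know there was a choice.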

As a bit of a divergence, the "silently wrong" part is why defaults tend to
matter a lot in security. Most people who aren't well versed in security don't
think about it, and since the code "works" they don't inquire further. Something
that is security sensitive and always "works" (as in, doesn't raise an error) is
broken, which is the inverse of how most people think about software. To put it
another way, it's the job of security sensitive APIs to break things, ideally
only in cases where it's important to break, but unless you're actually testing
that it breaks in those attack scenarios, secure and insecure look exactly the
same.
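
Here is a small sketch of what testing the attack scenario, rather than the
happy path, might look like. The ``next_token`` helper is made up for
illustration, and the attacker simply copies the generator state with
``getstate()``/``setstate()``; that shortcut stands in for recovering the state
from observed output, which is assumed here and not demonstrated.

```python
import random


def next_token(rng):
    """Hypothetical helper: format 128 random bits as 32 hex characters."""
    return "%032x" % rng.getrandbits(128)


# A deterministic generator, seeded from OS entropy at creation time.
victim = random.Random()

# Assumption: the attacker has somehow learned the generator's state.
attacker = random.Random()
attacker.setstate(victim.getstate())

# From that point on the attacker knows every "random" token in advance.
predicted = next_token(attacker)
actual = next_token(victim)
assert predicted == actual

# The OS-backed generator has no internal state to copy in the first place.
secure = random.SystemRandom()
try:
    secure.getstate()
except NotImplementedError:
    state_exposed = False
else:
    state_exposed = True
```

A test that only checks "did I get a 32 character token back" passes for both
generators; only the test that tries to predict the next value tells them
apart.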

> * The security arguments seem to be largely in the context of web
> application development (cookies, passwords, shared secrets, ...)
> That's not the only context that matters.

You're right, it's not the only context that matters; however, it's often
brought up for a few reasons:

* Security largely doesn't matter for software that doesn't accept input from
  or send data to some untrusted source, which narrows security down to mostly
  network based applications.

* The HTTP protocol is "eating the world" and we're seeing more and more things
  using it as their communication protocol (even for things that are not
  traditional browser based applications).

* Traditional Web Applications/Sites are a pretty large target audience for
  Python and in particular a lot of the security folks come from that world
  because the web is a hostile place.

But you can replace web application with anything that an untrusted user can
interact with over any protocol and the argument is basically the same.

> * As I said above, in my experience, a compatibility break "to make
> things more secure" is seen as equating security with inconvenience,
> and can actually harm attempts to educate users in better security
> practices.

Sadly, I don't think this is fully resolvable :(

It is the nature of security that its purpose is to take something that
otherwise "works" and make it no longer work because it doesn't satisfy the
constraints of the security system.

> * In many environments, reproducibility of random streams is
> important. I'm not an expert on those fields, although I've hit some
> situations where seeding is a requirement. As far as I am aware, most
> of those situations have no security implications. So for them, the
> PEP is all cost, no benefit. Sure the cost is small, but it's
> non-zero.

Right, and I don't think anyone is saying this isn't an important use case,
just that if you need a deterministic RNG and you don't get one, that is a
fairly obvious problem, but if you need a CSPRNG and you don't get one, that is
not obvious.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA