
On Jun 9, 2016, at 7:25 AM, Larry Hastings <larry@hastings.org> wrote:
> A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix".
Couple clarifications:

random.py
---------

In the abstract it doesn't hurt to seed MT with a CSPRNG, it just doesn't provide much (if any) benefit, and in this case it is hurting us because of the cost on import (which will exist on other platforms as well, no matter what we do here for Linux). There are a couple of solutions to this problem:

* Use getrandom(GRND_NONBLOCK) for random.Random, since it doesn't matter whether we get cryptographically secure random numbers or not.
* Switch it to use something other than a CSPRNG by default, since it doesn't need one.
* Instead of seeding itself from os.urandom on import, have it lazily do that the first time one of the random.rand* functions is called.
* Do nothing, and say that ``import random`` relies on having the kernel's urandom pool initialized.

Between these options, I have a slight preference for switching it to use a non-CSPRNG, but I really don't care that much which of these options we pick. Using random.Random is not secure, and none of the above options meaningfully change the security posture of something that accidentally uses it.

SipHash and the Interpreter Startup
-----------------------------------

I have complicated thoughts on what SipHash should do. For something like a Django process, we never want it to be initialized with "bad" entropy; however, reading straight from /dev/urandom or getrandom(GRND_NONBLOCK) means that we might get that if we start the process early enough in the boot process. The rub here is that I cannot think of a situation where, by the time you're starting up something like Django, you're even remotely likely to not have an initialized random pool. The other side of this issue is that we have Python scripts which do not need a secure random number being passed to SipHash running early enough in the boot process with systemd that we need to be able to have SipHash initialization not block waiting for /dev/urandom.
So I'm torn between the "Practicality beats purity" mindset, which says we should just let SipHash seed itself with whatever quality of randomness the urandom pool can currently provide, and the "Special cases aren't special enough to break the rules" mindset, which says that we should just make it easier for scripts in this edge case to declare that they don't care about hash randomization, removing the need for it (in other words, a CLI flag that matches PYTHONHASHSEED in functionality). An additional wrinkle in the mix is that we cannot get non-blocking random on many (any?) modern OS besides Linux, so we're going to run into this same problem if, say, FreeBSD decides to put a Python script early enough in the boot sequence.

In the end, both of these choices make me happy and unhappy in different ways, but I would lean towards adding a CLI flag for the special case and letting the systemd script that caused this problem invoke its Python with that flag. I think this because:

* It leaves the interpreter secure by default, but provides the relevant knobs to turn off this default in cases where a user doesn't need or want it.
* It solves the problem in a cross-platform way that doesn't rely on the nuances of the CSPRNG interface on one particular supported platform.

os.urandom
----------

There have been a lot of proposals thrown around, and people pointing to different sections of the documentation to justify different opinions. This is easily the most contentious question we have here.

It is my belief that reading from urandom is the right thing to do for generating cryptographically secure random numbers. This is a viewpoint held by every major security expert and cryptographer that I'm aware of. Most (all?)
major platforms besides Linux do not allow reading from their equivalent of /dev/urandom until it has been successfully initialized, and it is widely held by all security experts and cryptographers that I'm aware of that this property is a good one, that the Linux behavior of /dev/urandom is a wart/footgun, and that prior to getrandom() there simply wasn't a better option on Linux.

With that in mind, I think that we should, to the best of our ability given the platform we're on, ensure that os.urandom does not return bytes that the OS does not consider cryptographically secure. In practice this means that os.urandom should do one of two things in the very early boot process on Linux:

* Block waiting for the kernel to initialize the urandom pool, and then return the now-secure random bytes given to us.
* Raise an exception saying that the pool has not been initialized and thus os.urandom is not ready yet.

The key point in both of these options is that os.urandom never [1] returns bytes before the OS believes it can give us cryptographically secure random bytes. I believe I have a preference for blocking while waiting for the kernel to initialize the urandom pool, because that makes Linux behave similarly to the other platforms that I'm aware of.

I do not believe that adding additional public functions, as some other people have suggested, would be a good option. I think they muddy the waters, and I think they force us to try to convince people that "no really, yes, everyone says you should use urandom, but you actually want getrandom". This is particularly true since the outcome of these two functions would be exactly the same in all but a very narrow edge case on Linux.

Larry has suggested that os.py should only ever be a thin shell around OS-provided functionality, and thus os.urandom should simply mimic whatever the behavior of /dev/urandom is on that OS.
For os.urandom in particular this is already not the case, since it calls CryptGenRandom on Windows. But putting that aside, since that's a Windows-vs-POSIX difference, we're not talking about adding a great amount of functionality around something provided by the OS. We're only talking about using a different interface to access the same underlying functionality; in this case, an interface that better suits the actual use of os.urandom in the wild and provides better properties all around.

He's also pointed out that the documentation does not guarantee that the result of os.urandom will be cryptographically strong, citing the following quote:

    This function returns random bytes from an OS-specific randomness
    source. The returned data should be unpredictable enough for
    cryptographic applications, though its exact quality depends on the
    OS implementation.

My read of this quote is that it is a hedge against operating systems that have implemented their urandom pool in such a way that it does not return cryptographically secure random numbers, so that you don't come back and yell at Python for it. In other words, it's a hedge against /dev/urandom being https://xkcd.com/221/. I do not think this documentation excuses us from using a weaker interface to the OS-specific randomness source simply because its name happens to match the name of the function. Particularly since, earlier on, that same documentation states:

    Return a string of n random bytes suitable for cryptographic use.

and the Python standard library, and the entire ecosystem as I know it, as well as all security experts and crypto experts, believe you should treat it as such. This is largely because if your urandom pool is implemented in a way that, in the general case, provides insecure random values, then you're beyond the pale and there's nothing that Python, or anyone but your OS vendor, can do to help you.
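The two candidate behaviors for os.urandom described earlier can be sketched with the getrandom() interface, which later CPython exposes as os.getrandom (Linux-only, Python 3.6+). The function names here are illustrative only, and after early boot both variants behave identically, since the pool is already initialized:

```python
import os

def urandom_blocking(n):
    """Block until the kernel's urandom pool is initialized, then
    return n cryptographically secure bytes. This is the default
    getrandom() behavior, matching what other platforms already do."""
    return os.getrandom(n)

def urandom_or_raise(n):
    """Return n cryptographically secure bytes, or raise
    BlockingIOError if the pool has not yet been initialized,
    instead of ever returning insecure bytes."""
    return os.getrandom(n, os.GRND_NONBLOCK)
```

Either way, the property argued for above holds: neither function can hand back bytes the kernel does not consider cryptographically secure.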
Furthermore, I think that the behavior I want (that os.urandom is secure by default, to the best of our abilities) is trickier to get right and requires interfacing with C code. However, getting the exact semantics of /dev/urandom on Linux is trivial to do with a single line of Python code:

    def urandom(amt): return open("/dev/urandom", "rb").read(amt)

So if you're someone who is depending on the Linux urandom behavior in an edge case that almost nobody is going to hit, you can trivially get the old behavior back. Even better, if you're someone depending on this, you're going to get an *obvious* failure rather than silently getting insecure bytes.

On top of all of that, this only matters in a small edge case, most likely to only ever be hit by OS vendors themselves, who are in the best position to make informed decisions about how to work around the fact that the urandom entropy pool hasn't yet been initialized, rather than expecting every other user to try to ensure that they don't start their Python script too early.

[1] To the best of our ability, given the interfaces and implementation provided to us by the OS.

— Donald Stufft