
On Jun 9, 2016, at 7:25 AM, Larry Hastings <larry@hastings.org> wrote:
> A problem has surfaced just this week in 3.5.1. Obviously this is a good time to fix it for 3.5.2. But there's a big argument over what is "broken" and what is an appropriate "fix".
Couple clarifications:

random.py
---------

In the abstract it doesn't hurt to seed MT with a CSPRNG, it just doesn't provide much (if any) benefit, and in this case it is hurting us because of the cost on import (which will exist on other platforms as well, no matter what we do here for Linux). There are a couple of solutions to this problem:

* Use getrandom(GRND_NONBLOCK) for random.Random, since it doesn't matter whether we get cryptographically secure random numbers or not.
* Switch it to use something other than a CSPRNG by default, since it doesn't need one.
* Instead of seeding itself from os.urandom on import, have it lazily do that the first time one of the random.rand* functions is called.
* Do nothing, and say that ``import random`` relies on having the kernel's urandom pool initialized.

Between these options, I have a slight preference for switching it to use a non-CSPRNG, but I really don't care that much which of these options we pick. Using random.Random is not secure, and none of the above options meaningfully change the security posture of something that accidentally uses it.

SipHash and the Interpreter Startup
-----------------------------------

I have complicated thoughts on what SipHash should do. For something like a Django process, we never want it to be initialized with "bad" entropy; however, reading straight from /dev/urandom or getrandom(GRND_NONBLOCK) means that we might get that if we start the process early enough in the boot process. The rub here is that I cannot think of a situation where, by the time you're starting up something like Django, you're even remotely likely to not have an initialized random pool. The other side of this issue is that we have Python scripts which do not need a secure random number being passed to SipHash running early enough in the boot process with systemd that we need to be able to have SipHash initialization not block waiting for /dev/urandom.
So I'm torn between the "Practicality beats purity" mindset, which says we should just let SipHash seed itself with whatever quality of randomness the urandom pool can currently provide, and the "Special cases aren't special enough to break the rules" mindset, which says that we should just make it easier for scripts in this edge case to declare that they don't care about hash randomization, removing the need for it (in other words, a CLI flag that matches PYTHONHASHSEED in functionality). An additional wrinkle in the mix is that we cannot get non-blocking random on many (any?) modern OS besides Linux, so we're going to run into this same problem if, say, FreeBSD decides to put a Python script early enough in the boot sequence.

In the end, both of these choices make me happy and unhappy in different ways, but I would lean towards adding a CLI flag for the special case and letting the systemd script that caused this problem invoke its Python with that flag. I think this because:

* It leaves the interpreter secure by default, but provides the relevant knobs to turn off this default in cases where a user doesn't need or want it.
* It solves the problem in a cross-platform way that doesn't rely on the nuances of the CSPRNG interface on one particular supported platform.

os.urandom
----------

There have been a lot of proposals thrown around, and people pointing to different sections of the documentation to justify different opinions. This is easily the most contentious question we have here.

It is my belief that reading from urandom is the right thing to do for generating cryptographically secure random numbers. This is a viewpoint held by every major security expert and cryptographer that I'm aware of. Most (all?)
major platforms besides Linux do not allow reading from their equivalent of /dev/urandom until it has been successfully initialized, and it is widely held by all security experts and cryptographers that I'm aware of that this property is a good one, that the Linux behavior of /dev/urandom is a wart/footgun, and that prior to getrandom() there simply wasn't a better option on Linux.

With that in mind, I think that we should, to the best of our ability given the platform we're on, ensure that os.urandom does not return bytes that the OS does not consider cryptographically secure. In practice this means that os.urandom should do one of two things in the very early boot process on Linux:

* Block waiting for the kernel to initialize the urandom pool, and then return the now-secure random bytes given to us.
* Raise an exception saying that the pool has not been initialized and thus os.urandom is not ready yet.

The key point in both of these options is that os.urandom never [1] returns bytes before the OS believes it can give us cryptographically secure random bytes. I believe I have a preference for blocking while waiting for the kernel to initialize the urandom pool, because that makes Linux behave similarly to the other platforms that I'm aware of.

I do not believe that adding additional public functions, as some other people have suggested, would be a good option. I think they muddy the waters, and I think they force us to try to convince people that "no really, yes, everyone says you should use urandom, but you actually want getrandom". This is particularly true since the outcome of these two functions would be exactly the same in all but a very narrow edge case on Linux.

Larry has suggested that os.py should only ever be a thin shell around OS-provided functionality, and thus os.urandom should simply mimic whatever the behavior of /dev/urandom is on that OS.
For os.urandom in particular this is already not the case, since it calls CryptGenRandom on Windows. But putting that aside, since that's a Windows-vs-POSIX difference, we're not talking about adding a great amount of functionality around something provided by the OS. We're only talking about using a different interface to access the same underlying functionality; in this case, an interface that better suits the actual use of os.urandom in the wild and provides better properties all around.

He's also pointed out that the documentation does not guarantee that the result of os.urandom will be cryptographically strong, citing the following quote:

    This function returns random bytes from an OS-specific randomness
    source. The returned data should be unpredictable enough for
    cryptographic applications, though its exact quality depends on the
    OS implementation.

My read of this quote is that it is a hedge against operating systems that have implemented their urandom pool in such a way that it does not return cryptographically secure random numbers, so that you don't come back and yell at Python for it. In other words, it's a hedge against /dev/urandom being https://xkcd.com/221/. I do not think this documentation excuses us from using a weaker interface to the OS-specific randomness source simply because its name happens to match the name of the function. Particularly since, earlier on, that same documentation states:

    Return a string of n random bytes suitable for cryptographic use.

and the Python standard library, and the entire ecosystem as I know it, as well as all security experts and crypto experts, believe you should treat it as such. This is largely because if your urandom pool is implemented in a way that, in the general case, provides insecure random values, then you're beyond the pale and there's nothing that Python, or anyone but your OS vendor, can do to help you.
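The two candidate behaviors for os.urandom described earlier can be sketched with the getrandom() interface, which later CPython exposes as os.getrandom (Linux-only, Python 3.6+). The function names here are illustrative only, and after early boot both variants behave identically, since the pool is already initialized:

```python
import os

def urandom_blocking(n):
    """Block until the kernel's urandom pool is initialized, then
    return n cryptographically secure bytes. This is the default
    getrandom() behavior, matching what other platforms already do."""
    return os.getrandom(n)

def urandom_or_raise(n):
    """Return n cryptographically secure bytes, or raise
    BlockingIOError if the pool has not yet been initialized,
    instead of ever returning insecure bytes."""
    return os.getrandom(n, os.GRND_NONBLOCK)
```

Either way, the property argued for above holds: neither function can hand back bytes the kernel does not consider cryptographically secure.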
Furthermore, I think that the behavior I want (that os.urandom is secure by default, to the best of our abilities) is trickier to get right and requires interfacing with C code. However, getting the exact semantics of /dev/urandom on Linux is trivial to do with a single line of Python code:

    def urandom(amt): return open("/dev/urandom", "rb").read(amt)

So if you're someone who is depending on the Linux urandom behavior in an edge case that almost nobody is going to hit, you can trivially get the old behavior back. Even better, if you're someone depending on this, you're going to get an *obvious* failure rather than silently getting insecure bytes.

On top of all of that, this only matters in a small edge case, most likely to only ever be hit by OS vendors themselves, who are in the best position to make informed decisions about how to work around the fact that the urandom entropy pool hasn't yet been initialized, rather than expecting every other user to try to ensure that they don't start their Python script too early.

[1] To the best of our ability, given the interfaces and implementation provided to us by the OS.

— Donald Stufft