[Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

Thu Jun 16 06:04:39 EDT 2016

> On Jun 16, 2016, at 4:46 AM, Barry Warsaw <barry at python.org> wrote:
> 
> We can educate them through documentation, but I don't think it's appropriate
> to retrofit existing APIs to different behavior based on those faulty
> assumptions, because that has other negative effects, such as breaking the
> promises we make to experienced and knowledgeable developers.

You can’t document your way out of a usability problem. urllib was *documented* as not verifying certificates by default, but that didn’t matter, because a large set of users used it as though it did anyway.

In my opinion, this is a usability issue as well. You have a ton of third party documentation and effort around “just use urandom” for cryptographic random, which is generally the right (and best!) answer except for this one little niggle on Linux, where /dev/urandom *may* produce predictable bytes (but usually doesn’t). That documentation typically doesn’t tell people about this small niggle, because prior to getrandom(0) there wasn’t much they could do about it except use /dev/random, which is bad in every situation other than generating cryptographic keys during early boot.

Regardless of how we document it, people are going to use os.urandom for cryptographic purposes, because anyone who has any idea about cryptography at all, but who doesn’t keep up with exactly which modules are being added to Python, is going to look for a Python interface to urandom. That doesn’t even begin to touch the thousands upon thousands of uses already in the wild that assume os.urandom will always give them cryptographic random, and whose authors now *need* to write this instead:

try:
    from secrets import token_bytes
except ImportError:
    from os import urandom as token_bytes

That is what it takes to get the best cryptographic random available on their system. It assumes they even notice that there’s a new secrets module, and it requires each and every use of os.urandom to change.

Honestly, I think that the first sentence in the documentation should be the most pertinent one, and the first sentence here is “Return a string of n random bytes suitable for cryptographic use.” The bit about how the exact quality depends on the OS, and documenting what device it uses, is to my eyes obviously a hedge to say “Hey, if this gives you bad random it’s your OS’s fault, not ours; we can’t produce good random where your OS can’t give us some”, and to give people a suggestion of where to look to determine whether they’re going to get good random or not.

I do not think “uses /dev/urandom” is, or should be considered, a core part of this API. It already doesn’t use /dev/urandom on Windows, where it doesn’t exist, nor does it use /dev/urandom in 3.5+ if it can help it. Using getrandom(0), or using getrandom(GRND_NONBLOCK) and raising an exception on EAGAIN, still accesses the urandom CSPRNG with the same general runtime characteristics as /dev/urandom, except in the cases where it’s not safe to actually use /dev/urandom.
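
As a rough illustration of the second option (getrandom(GRND_NONBLOCK) plus an exception on EAGAIN), here is a minimal sketch from Python code. It assumes os.getrandom() and os.GRND_NONBLOCK, which Python exposes only in 3.6+ on Linux 3.17+. The function name strict_random_bytes and the choice of RuntimeError are hypothetical, picked for this example; this is not how os.urandom itself is implemented.

```python
import os


def strict_random_bytes(n):
    """Return n bytes from the kernel CSPRNG, raising instead of blocking.

    Hypothetical helper for illustration only. Where os.getrandom() is
    not available, it falls back to os.urandom().
    """
    if not hasattr(os, "getrandom"):
        # No getrandom() wrapper on this platform or Python version.
        return os.urandom(n)

    chunks = []
    remaining = n
    while remaining > 0:
        try:
            # GRND_NONBLOCK makes the call fail with EAGAIN instead of
            # waiting for the entropy pool to be initialized. Python
            # reports EAGAIN as BlockingIOError.
            chunk = os.getrandom(remaining, os.GRND_NONBLOCK)
        except BlockingIOError:
            raise RuntimeError(
                "kernel entropy pool is not initialized yet"
            ) from None
        # getrandom() may return fewer bytes than requested.
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Once the pool has been initialized, which is the normal case on a running system, a call such as strict_random_bytes(32) behaves like reading 32 bytes from /dev/urandom; the exception path only triggers in the early-boot window described above.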

Frankly, I think it’s a disservice to Python developers to leave in this footgun.

—
Donald Stufft