[Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

Thu Jun 16 03:53:38 EDT 2016

On Wed, Jun 15, 2016 at 11:45 PM, Barry Warsaw <barry at python.org> wrote:
> On Jun 15, 2016, at 01:01 PM, Nick Coghlan wrote:
>
>>No, this is a bad idea. Asking novice developers to make security
>>decisions they're not yet qualified to make when it's genuinely
>>possible for us to do the right thing by default is the antithesis of
>>good security API design, and os.urandom() *is* a security API
>>(whether we like it or not - third party documentation written by the
>>cryptographic software development community has made it so, since
>>it's part of their guidelines for writing security sensitive code in
>>pure Python).
>
> Regardless of what third parties have said about os.urandom(), let's look at
> what *we* have said about it.  Going back to pre-churn 3.4 documentation:
>
>     os.urandom(n)
>     Return a string of n random bytes suitable for cryptographic use.
>
>     This function returns random bytes from an OS-specific randomness
>     source. The returned data should be unpredictable enough for cryptographic
>     applications, though its exact quality depends on the OS
>     implementation. On a Unix-like system this will query /dev/urandom, and on
>     Windows it will use CryptGenRandom(). If a randomness source is not found,
>     NotImplementedError will be raised.
>
>     For an easy-to-use interface to the random number generator provided by
>     your platform, please see random.SystemRandom.
>
> So we very clearly provided platform-dependent caveats on the cryptographic
> quality of os.urandom().  We also made a strong claim that there's a direct
> connection between os.urandom() and /dev/urandom on "Unix-like system(s)".
>
> We broke that particular promise in 3.5 and semi-fixed it in 3.5.2.
>
>>Adding *new* APIs is also a bad idea, since "os.urandom() is the right
>>answer on every OS except Linux, and also the best currently available
>>answer on Linux" has been the standard security advice for generating
>>cryptographic secrets in pure Python code for years now, so we should
>>only change that guidance if we have extraordinarily compelling
>>reasons to do so, and we don't.
>
> Disagree.
>
> We have broken one long-term promise on os.urandom() ("On a Unix-like system
> this will query /dev/urandom") and changed another ("should be unpredictable
> enough for cryptographic applications, though its exact quality depends on OS
> implementations").
>
> We broke the experienced Linux developer's natural and long-standing link
> between the API called os.urandom() and /dev/urandom.  This breaks pre-3.5
> code that assumes read-from-/dev/urandom semantics for os.urandom().
>
> We have introduced churn.  Predicting a future SO question such as "Can
> os.urandom() block on Linux?" the answer is "No in Python 3.4 and earlier, yes
> possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of the
> 3.5.x series, and yes possibly in Python 3.6 and beyond".

It also depends on the kernel version, since it will never block on
old kernels that are missing getrandom(), but it might block on future
kernels if Linux's /dev/urandom ever becomes blocking. (Ted's said
that this is not going to happen now, but the only reason it isn't is
that he tried to make the change and it broke some distros that are
still in use -- so it seems entirely possible that it will happen a
few years from now.)
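
To make the kernel dependence concrete, here is a rough sketch of how
code could ask the kernel whether its entropy pool is initialized. It
assumes a Python new enough to expose os.getrandom() and
os.GRND_NONBLOCK (Linux only); the helper name urandom_pool_ready is
made up for illustration and is not part of any proposal in this
thread.

```python
import os


def urandom_pool_ready():
    """Return True or False if the kernel can tell us, else None.

    Hypothetical helper: True means a non-blocking getrandom() call
    succeeded, False means it would have blocked, and None means this
    platform or kernel gives us no way to ask.
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        # Non-Linux platform, or a Python without os.getrandom().
        return None
    try:
        # GRND_NONBLOCK makes the call fail instead of waiting for the
        # kernel's entropy pool to be initialized.
        getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    except OSError:
        # For example, an old kernel that is missing the syscall.
        return None
    return True
```

On a kernel without getrandom() this returns None, which is exactly the
case where reading /dev/urandom can never block and also can never
tell you that the bytes are weak.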

> We have a better answer for "cryptographically appropriate" use cases in
> Python 3.6 - the secrets module.  Trying to make os.urandom() "the right
> answer on every OS" weakens the promotion of secrets as *the* module to use
> for cryptographically appropriate use cases.
>
> IMHO it would be better to leave os.urandom() well enough alone, except for
> the documentation which should effectively say, a la 3.4:
>
>     os.urandom(n)
>     Return a string of n random bytes suitable for cryptographic use.
>
>     This function returns random bytes from an OS-specific randomness
>     source. The returned data should be unpredictable enough for cryptographic
>     applications, though its exact quality depends on the OS
>     implementation. On a Unix-like system this will query /dev/urandom, and on
>     Windows it will use CryptGenRandom(). If a randomness source is not found,
>     NotImplementedError will be raised.
>
>     Cryptographic applications should use the secrets module for stronger
>     guaranteed sources of randomness.
>
>     For an easy-to-use interface to the random number generator provided by
>     your platform, please see random.SystemRandom.

This is not an accurate docstring, though. The more accurate docstring
for your proposed behavior would be:

os.urandom(n)
Return a string of n bytes that will usually, but not always, be
suitable for cryptographic use.

This function returns random bytes from an OS-specific randomness
source. On non-Linux OSes, this uses the best available source of
randomness, e.g. CryptGenRandom() on Windows and /dev/urandom on OS X,
and thus will be strong enough for cryptographic use. However, on
Linux it uses a deprecated API (/dev/urandom) which in rare cases is
known to return bytes that look random, but aren't. There is no way to
know when this has happened; your code will just silently stop being
secure. In some unusual configurations, where Python is not configured
with any source of randomness, it will raise NotImplementedError.

You should never use this function. If you need unguessable random
bytes, then the 'secrets' module is always a strictly better choice --
unlike this function, it always uses the best available source of
cryptographic randomness, even on Linux. Alternatively, if you need
random bytes but it doesn't matter whether other people can guess
them, then the 'random' module is always a strictly better choice --
it will be faster, as well as providing useful features like
deterministic seeding.

---

In practice, your proposal means that ~all existing code that uses
os.urandom becomes incorrect and should be switched to either secrets
or random. This is *far* more churn for end-users than Nick's
proposal.
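
As a minimal sketch of what that switch would look like for a typical
caller (all three function names below are hypothetical, and the
secrets module is the one planned for 3.6):

```python
import os
import random
import secrets


def make_token_old(nbytes=32):
    # Existing pattern: secret material read straight from os.urandom().
    return os.urandom(nbytes)


def make_token_new(nbytes=32):
    # Replacement when the bytes must be unguessable.
    return secrets.token_bytes(nbytes)


def make_filler(nbytes=32, seed=12345):
    # Replacement when guessability does not matter: a seeded generator
    # gives reproducible bytes, which os.urandom() cannot.
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))
```

Every call site has to be audited to decide which of the two
replacements applies, which is where the churn comes from.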

...Anyway, since there's clearly going to be at least one PEP about
this, maybe we should stop rehashing bits and pieces of the argument
in these long threads that most people end up skipping and then
rehashing again later?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org