[Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

Wed Jun 15 19:12:57 EDT 2016

On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
[...]
> For 3.6+, we can instead make it so that the only things that actually
> rely on cryptographic quality randomness being available are:
>
> - calling a secrets module API
> - calling a random.SystemRandom method
> - calling os.urandom directly
>
> These are all APIs that were either created specifically for use in
> security sensitive situations (secrets module), or have long been
> documented (both within our own documentation, and in third party
> documentation, books and Q&A sites) as being an appropriate choice for
> use in security sensitive situations (os.urandom and
> random.SystemRandom).
>
> However, we don't need to make those block waiting for randomness to
> be available - we can update them to raise BlockingIOError instead
> (which makes it trivial for people to decide for themselves how they
> want to handle that case).
>
> Along with that change, we can make it so that starting the
> interpreter will never block waiting for cryptographic randomness to
> be available (since it doesn't need it), and importing the random
> module won't block waiting for it either.

This all seems exactly right to me, to the point that I've been
dreading having to find the time to write pretty much this exact
email. So thank you :-)

> To the best of our knowledge, on all operating systems other than
> Linux, encountering the new exception will still be impossible in
> practice, as there is no known opportunity to run Python code before
> the kernel random number generator is ready.
>
> On Linux, init scripts may still run before the kernel random number
> generator is ready, but will now throw an immediate BlockingIOError if
> they access an API that relies on crytographic randomness being
> available, rather than potentially deadlocking the init process. Folks
> encountering that situation will then need to make an explicit
> decision:
>
> - loop until the exception is no longer thrown
> - switch to reading from /dev/urandom directly instead of calling os.urandom()
> - switch to using a cross-platform non-cryptographic API (probably the
> random module)
>
> Victor has some additional technical details written up at
> http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to
> formalise this proposed approach as a PEP (the current reference is
> http://bugs.python.org/issue27282 )

I'd make two additional suggestions:

- one person did chime in on the thread to say that they've used
os.urandom for non-security-sensitive purposes, simply because it
provided a convenient "give me a random byte-string" API that is
missing from random. I think we should go ahead and add a .randbytes
method to random.Random that simply returns a random bytestring using
the regular RNG, to give these users a nice drop-in replacement for
os.urandom.

Rationale: I don't think the existence of these users should block
making os.urandom appropriate for generating secrets, because (1) a
glance at github shows that this is very unusual -- if you skim
through this search you get page after page of functions with names
like "generate_secret_key"

  https://github.com/search?l=python&p=2&q=urandom&ref=searchresults&type=Code&utf8=%E2%9C%93

and (2) for the minority of people who are using os.urandom for
non-security-sensitive purposes, if they find os.urandom raising an
error, then this is just a regular bug that they will notice
immediately and fix, and anyway it's basically never going to happen.
(As far as we can tell, this has never yet happened in the wild, even
once.) OTOH if os.urandom is allowed to fail silently, then people who
are using it to generate secrets will get silent catastrophic
failures, plus those users can't assume it will never happen because
they have to worry about active attackers trying to drive systems into
unusual states. So I'd much rather ask the non-security-sensitive
users to switch to using something in random, than force the
cryptographic users to switch to using secrets. But it does seem like
it would be good to give those non-security-sensitive users something
to switch to :-).

- It's not exactly true that the Python interpreter doesn't need
cryptographic randomness to initialize SipHash -- it's more that
*some* Python invocations need unguessable randomness (to first
approximation: all those which are exposed to hostile input), and some
don't. And since the Python interpreter has no idea which case it's
in, and since it's unacceptable for it to break invocations that don't
need unguessable hashes, then it has to err on the side of continuing
without randomness. All that's fine.

But, given that the interpreter doesn't know which state it's in,
there's also the possibility that this invocation *will* be exposed to
hostile input, and the 3.5.2+ behavior gives absolutely no warning
that this is what's happening. So instead of letting this potential
error pass silently, I propose that if SipHash fails to acquire real
randomness at startup, then it should issue a warning. In practice,
this will almost never happen. But in the rare cases it does, it at
least gives the user a fighting chance to realize that their system is
in a potentially dangerous state. And by using the warnings module, we
automatically get quite a bit of flexibility. If some particular
invocation (e.g. systemd-cron) has audited their code and decided that
they don't care about this issue, they can make the message go away:

   PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning

OTOH if some particular invocation knows that they do process
potentially hostile input early on (e.g. cloud-init, maybe?), then
they can explicitly promote the warning to an error:

  PYTHONWARNINGS=error::NoEntropyAtStartupWarning

(I guess the way to implement this would be for the SipHash
initialization code -- which runs very early -- to set some flag, and
then we expose that flag in sys._something, and later in the startup
sequence check for it after the warnings module is functional.
Exposing the flag at the Python level would also make it possible for
code like cloud-init to do its own explicit check and respond
appropriately.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org