[Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

Fri Jun 10 15:54:11 EDT 2016

I will observe that feelings have gotten a little heated, so without
making any suggestions to how the python-dev community should decide
things, let me offer some observations that might perhaps shed a
little light, and perhaps dispell a little bit of the heat.

As someone who has been working in security for a long time --- before
I started getting paid to hack Linux full-time, worked on Kerberos,
was on the Security Area Directorate of the IETF, where among other
things I was one of the working group chairs for the IP Security
(ipsec) working group --- I tend to cringe a bit when people talk
about security in terms of absolutes.  For example, the phrase
"improving Python's security".  Security is something that is best
talked about given a specific threat environment, where the value of
what you are trying to protect, the capabilities and resources of the
attackers, etc., are all well known.

This gets hard for those of us who work on infrastructure which can
get used in many different arenas, and so that's something that
applies both to the Linux Kernel and to C-Python, because how people
will use the tools that we spend so much of our passion crafting is
largely out of our control, and we may not even know how they are
using it.

As far as /dev/urandom is concerned, it's true that it doesn't block
before it has been initialized.  If you are a security academic who
likes to write papers about how great you are at finding defects in
other people's work.  This is definitely a weakness.

Is it a fatal weakness?  Well, first of all, on most server and
desktop deployments, we save 1 kilobyte or so of /dev/urandom output
during the shutdown sequence, and immediately after the init scripts
are completed.  This saved entropy is then piped back into /dev/random
infrastructure and used initialized /dev/random and /dev/urandom very
early in the init scripts.  On a freshly instaled machine, this won't
help, true, but in practice, on most systems, /dev/urandom will get
initialized from interrupt timing sampling within a few seconds after
boot.  For example, on a sample Google Compute Engine VM which is
booted into Debian and then left idle, /dev/urandom was initialized
within 2.8 seconds after boot, while the root file system was
remounted read-only 1.6 seconds after boot.

So even on Python pre-3.5.0, realistically speaking, the "weakness" of
os.random would only be an issue (a) if it is run within the first few
seconds of boot, and (b) os.random is used to directly generate a
long-term cryptographic secret.  If you are fork openssl or ssh-keygen
to generate a public/private keypair, then you aren't using os.random.

Furthermore, if you are running on a modern x86 system with RDRAND,
you'll also be fine, because we mix in randomness from the CPU chip
via the RDRAND instruction.

So this whole question of whether os.random should block *is*
important in certain very specific cases, and if you are generating
long-term cryptogaphic secrets in Python, maybe you should be worrying
about that.  But to be honest, there are lots of other things you
should be worrying about as well, and I would hope that people writing
cryptographic code would be asking questions of how the random nunmber
stack is working, not just at the C-Python interpretor level, but also
at the OS level.

My preference would be that os.random should block, because the odds
that people would be trying to generate long-term cryptographic
secrets within seconds after boot is very small, and if you *do* block
for a second or two, it's not the end of the world.  The problem that
triggered this was specifically because systemd was trying to use
C-Python very early in the boot process to initialize the SIPHASH used
for the dictionary, and it's not clear that really needed to be
extremely strong because it wasn't a long-term cryptogaphic secret ---
certainly not how systemd was using that specific script!

The reason why I think blocking is better is that once you've solved
the "don't hang the VM for 90 seconds until python has started up",
someone who is using os.random will almost certainly not be on the
blocking path of the system boot sequence, and so blocking for 2
seconds before generating a long-term cryptographic secret is not the
end of the world.

And if it does block by accident, in a security critical scenario it
will hopefully force the progammer to think, and and in a non-security
critical scenario, it should be easy to switch to either a totally
non-blocking interface, or switch to a pseudo-random interface hwich
is more efficient.

*HOWEVER*, on the flip side, if os.random *doesn't* block, in 99.999%
percent of the cases, the python script that is directly generating a
long-term secret will not be started 1.2 seconds after the root file
system is remounted read/write, so it is *also* not the end of the
world.  Realistically speaking, we do know which processes are likely
to be generating long-term cryptographic secrets imnmediately after
boot, and they'll most likely be using progams like openssl or
openssh-keygen, to actually generate the cryptogaphic key, and in both
of those places, (a) it's there problem to get it right, and (b)
blocking for two seconds is a completely reasonable thing to do, and
they will probably do it, so we're fine.

So either way, I think it will be fine.  I may have a preference, but
if Python choses another path, all will be well.  There is an old
saying that Academic politics are often so passionate because the
stakes are so small.  It may be that one of the reasons why this topic
has been so passionate is precisely because of Sayre's Law.

Peace,

						- Ted