[Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
njs at pobox.com
Sun Jun 12 14:07:22 EDT 2016
On Jun 11, 2016 11:13 PM, "Theodore Ts'o" <tytso at mit.edu> wrote:
> On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote:
> > It was a RaspberryPI that ran a shell script on boot that called
> > ssh-keygen. That shell script could have just as easily been a
> > Python script that called os.urandom via
> > https://github.com/sybrenstuvel/python-rsa instead of a shell script
> > that called ssh-keygen.
> So I'm going to argue that the primary bug was in how the systemd
> init scripts were configured. In general, creating keypairs at boot
> time is just a bad idea. They should be created lazily, in a
> just-in-time paradigm.
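To make the just-in-time suggestion concrete, here's a minimal sketch of what lazy key creation could look like in Python. The `LazyKeyStore` class is purely illustrative (nothing from the thread), and `os.urandom(32)` stands in for a real keypair generator like ssh-keygen or an RSA library:

```python
import os
import threading

class LazyKeyStore:
    """Generate key material on first use instead of at boot time.

    Illustrative sketch only: os.urandom(32) is a stand-in for a real
    keypair generator (e.g. ssh-keygen or python-rsa).
    """

    def __init__(self):
        self._key = None
        self._lock = threading.Lock()

    def get_key(self) -> bytes:
        # By the time a client actually connects and needs the key,
        # the system has usually gathered plenty of entropy, so the
        # early-boot blocking problem never arises.
        with self._lock:
            if self._key is None:
                self._key = os.urandom(32)
            return self._key
```

The point of the lock is that the first caller pays the generation cost and every later caller gets the cached key, so nothing in the boot path ever has to wait on the entropy pool.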
> Consider that if you assume that os.urandom can block, this isn't
> necessarily going to do the right thing either --- if you use
> getrandom and it blocks, and it's part of a systemd unit which is
> blocking further boot progress, then the system will hang for 90
> seconds, and while it's hanging, there won't be any interrupts, so the
> system will be dead in the water, just like the original bug report
> complaining that Python was hanging when it was using getrandom() to
> initialize its SipHash.
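For what it's worth, userspace can probe whether the pool is ready without committing to a hang, via getrandom()'s GRND_NONBLOCK flag (exposed in Python as os.getrandom on Linux, Python 3.6+). The fallback policy in this sketch is just one possible choice, not a recommendation:

```python
import os

def get_random_bytes(n: int) -> bytes:
    """Fetch n random bytes, probing the pool without blocking.

    With GRND_NONBLOCK, os.getrandom() raises BlockingIOError if the
    kernel's entropy pool is not yet initialized, instead of hanging.
    This sketch simply falls back to os.urandom(), which never blocks;
    a caller that needs initialized randomness could instead retry,
    log, or defer the operation.
    """
    if hasattr(os, "getrandom"):
        try:
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Pool not yet initialized (early boot); fall through.
            pass
    return os.urandom(n)
```

The nice property is that the 90-second stall becomes an explicit, catchable condition rather than a silent hang inside interpreter startup.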
From another perspective, I guess one could also argue that the best place
to fix this is in the kernel: if a process is blocked waiting for entropy
then the kernel probably shouldn't take that as its cue to turn off all the
entropy generation mechanisms, just like how if a process is blocked
waiting for disk I/O then we probably shouldn't power down the disk
controller. Obviously this is a weird case because the kernel is
architected in a way that makes the dependency between the disk controller
and the I/O request obvious, while the dependency between the random pool
and... well... everything else, more or less, is much more subtle and goes
outside the usual channels, and we wouldn't want to rearchitect everything
just for this. But for example, if a process is actively blocked waiting
for the initial entropy, one could spawn a kernel thread that keeps the
system from quiescing by attempting to scrounge up entropy as fast as
possible, via whatever mechanisms are locally appropriate (e.g. doing a
busy-loop racing two clocks against each other, or just scheduling lots of
interrupts -- which I guess is the same thing, more or less). And the
thread would go away again as soon as userspace wasn't blocked on entropy.
That way this deadlock wouldn't be possible.
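The "racing two clocks" idea can be illustrated purely in userspace (this is a toy sketch, not a proposed kernel implementation, and a real jitter-entropy source like the kernel's jitterentropy driver is far more careful than this):

```python
import time

def clock_jitter_samples(n: int) -> list:
    """Toy sketch of harvesting entropy by racing two clocks.

    Reads two independent timers back-to-back and keeps the low bits
    of their XOR; scheduling, cache, and interrupt effects make those
    low bits noisy. Illustrative only -- not a vetted entropy source.
    """
    samples = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        t1 = time.monotonic_ns()
        # The low-order bits of two adjacent clock reads are the
        # noisy part; mask off everything else.
        samples.append((t0 ^ t1) & 0xFF)
    return samples
```

A busy loop like this also generates timer activity, which is exactly the "keep the system from quiescing" effect described above.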
I guess someone *might* complain about the idea of the entropy pool
actually spending resources instead of being quietly parasitic, because
this is the kernel and someone will always complain about everything :-).
But complaining about this makes about as much sense as complaining about the
idea of spending resources trying to service I/O when a process is blocked
on that ("maybe if we wait long enough then some other part of the system
will just kind of accidentally page in the data we need as a side effect of
whatever it's doing, and then this thread will be able to proceed").
Is this an approach that you've considered?
> At which point there will be another bug complaining about how python
> was causing systemd to hang for 90 seconds, and there will be demand
> to make os.urandom no longer block. (Since by definition, systemd can
> do no wrong; it's always other programs that have to change to
> accommodate systemd. :-)
FWIW, the systemd thing is a red herring -- this was Debian's configuration
of a particular daemon that is not maintained by the systemd project, and
the exact same thing would have happened with sysvinit if Debian had tried
using Python 3.5 early in their rcS.