On Jun 11, 2016 11:13 PM, "Theodore Ts'o" <tytso@mit.edu> wrote:
>
> On Sat, Jun 11, 2016 at 05:46:29PM -0400, Donald Stufft wrote:
> >
> > It was a Raspberry Pi that ran a shell script on boot that called
> > ssh-keygen. That shell script could have just as easily been a
> > Python script that called os.urandom via
> > https://github.com/sybrenstuvel/python-rsa instead of a shell script
> > that called ssh-keygen.
>
> So I'm going to argue that the primary bug was in how the systemd
> init scripts were configured. In general, creating keypairs at boot
> time is just a bad idea. They should be created lazily, in a
> just-in-time paradigm.
>
> Consider that if you assume that os.urandom can block, this isn't
> necessarily going to do the right thing either --- if you use
> getrandom and it blocks, and it's part of a systemd unit which is
> blocking further boot progress, then the system will hang for 90
> seconds, and while it's hanging, there won't be any interrupts, so the
> system will be dead in the water, just like the original bug report
> complaining that Python was hanging when it was using getrandom() to
> initialize its SipHash.
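To make the "lazy, just-in-time" alternative concrete, here is a minimal sketch. It assumes Linux and Python 3.6 or later, where os.getrandom() and os.GRND_NONBLOCK exist (they postdate the Python 3.5 discussed in this thread). The helper names entropy_pool_ready() and host_key_seed() are hypothetical. The idea is to probe the pool without ever blocking, and to draw key material only at the moment it is first needed rather than from a boot-time unit.

```python
import functools
import os


def entropy_pool_ready():
    """Return True if the kernel CSPRNG has been initialized (Linux only).

    With GRND_NONBLOCK, getrandom() fails with EAGAIN, which Python raises
    as BlockingIOError, instead of blocking when the pool is not yet ready.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


@functools.lru_cache(maxsize=None)
def host_key_seed():
    # Hypothetical lazy key material: generated on first call and cached.
    # A plain getrandom() can still block if the pool is uninitialized, but
    # only when a caller actually needs the key, not during early boot.
    return os.getrandom(32)
```

A caller that must not stall boot could check entropy_pool_ready() and defer work until it returns True. Anything that simply needs the key calls host_key_seed() when the key is first used.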
Hi Ted,
From another perspective, I guess one could also argue that the best place to fix this is in the kernel: if a process is blocked waiting for entropy, then the kernel probably shouldn't take that as its cue to turn off all the entropy generation mechanisms, just as, if a process is blocked waiting for disk I/O, we probably shouldn't power down the disk controller. Obviously this is a weird case, because the kernel is architected in a way that makes the dependency between the disk controller and the I/O request obvious, while the dependency between the random pool and... well... everything else, more or less, is much more subtle and goes outside the usual channels, and we wouldn't want to rearchitect everything just for this. But, for example, if a process is actively blocked waiting for the initial entropy, one could spawn a kernel thread that keeps the system from quiescing by attempting to scrounge up entropy as fast as possible, via whatever mechanisms are locally appropriate (e.g. doing a busy-loop racing two clocks against each other, or just scheduling lots of interrupts, which I guess is the same thing, more or less). And the thread would go away again as soon as userspace wasn't blocked on entropy. That way this deadlock wouldn't be possible.
I guess someone *might* complain about the idea of the entropy pool actually spending resources instead of being quietly parasitic, because this is the kernel and someone will always complain about everything :-). But complaining about this makes about as much sense as complaining about the idea of spending resources trying to service I/O when a process is blocked on that ("maybe if we wait long enough then some other part of the system will just kind of accidentally page in the data we need as a side effect of whatever it's doing, and then this thread will be able to proceed").
Is this an approach that you've considered?
> At which point there will be another bug complaining about how python
> was causing systemd to hang for 90 seconds, and there will be demand
> to make os.urandom no longer block. (Since by definition, systemd can
> do no wrong; it's always other programs that have to change to
> accommodate systemd. :-)
FWIW, the systemd thing is a red herring: this was Debian's configuration of a particular daemon that is not maintained by the systemd project, and the exact same thing would have happened with sysvinit if Debian had tried using Python 3.5 early in their rcS.
-n