[Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

Thu Jun 16 15:33:16 EDT 2016

On 16 June 2016 at 18:03, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 16 June 2016 at 09:39, Paul Moore <p.f.moore at gmail.com> wrote:
>> I'm willing to accept the view of the security experts that there's a
>> problem here. But without a clear explanation of the problem, how can
>> a non-specialist like myself have an opinion? (And I hope the security
>> POV isn't "you don't need an opinion, just do as we say").
>
> If you're not writing Linux (and presumably *BSD) scripts and
> applications that run during system initialisation or on embedded ARM
> hardware with no good sources of randomness, then there's zero chance
> of any change made in relation to this affecting you (Windows and Mac
> OS X are completely immune, since they don't allow Python scripts to
> run early enough in the boot sequence for there to ever be a problem).

Understood. I could quite happily ignore this thread for all the
impact it will have on me. However, I've seen enough of these debates
(and witnessed the frustration of the security advocates) that I want
to try to understand the issues better - as much as anything so that I
don't end up adding uninformed opposition to these threads (in my day
job, unfortunately, security is generally the excuse for all sorts of
counter-productive rules, and never offers any practical benefits that
I am aware of, so I'm predisposed to rejecting arguments based on
security - that background isn't accurate in this environment and I'm
actively trying to counter it).

> The only question at hand is what CPython should do in the case where
> the operating system *does* let Python scripts run before the system
> random number generator is ready, and the application calls a security
> sensitive API that relies on that RNG:
>
> - throw BlockingIOError (so the script developer knows they have a
> potential problem to fix)
> - block (so the script developer has a system hang to debug)
> - return low quality random data (so the script developer doesn't even
> know they have a potential problem)
>
> The last option is the status quo, and has a remarkable number of
> vocal defenders.

Understood. It seems to me that there are two arguments here -
backward compatibility (which is always a pressure, but sometimes
applied too vigourously and not always consistently) and "we've always
done it that way" (aka "people will have to consider what happens when
they run under 3.4 anyway, so how will changing help?"). Jusging
backward compatibility is always a matter of trade-offs, hence my
interest in the actual benefits.

> The second option is what we changed the behaviour to in 3.5 as a side
> effect of switching to a syscall to save a file descriptor (and *also*
> inadvertently made a gating requirement for CPython starting at all,
> without which I'd be very surprised if anyone actually noticed the
> potentially blocking behaviour in os.urandom itself)

OK, so (given that the issue of CPython starting at all was an
accidental, and now corrected, side effect) why is this so bad? Maybe
not in a minor release, but at least for 3.6? How come this has caused
such a fuss? I genuinely don't understand why people see blocking as
such an issue (and as far as I can tell, Ted Tso seems to agree). The
one case where this had an impact was a quickly fixed bug - so as far
as I can tell, the risk of problems caused by blocking is purely
hypothetical.

> The first option is the one I'm currently writing a PEP for, since it
> makes the longstanding advice to use os.urandom() as the low level
> random data API for security sensitive operations unequivocally
> correct (as it will either do the right thing, or throw an exception
> which the developer can handle as appropriate for their particular
> application)

In my code, I typically prefer Python to make detailed decisions for
me (e.g. requests follows redirects by default, it doesn't expect me
to do so manually). Now certainly this is a low-level interface so the
rules are different, but I don't see why blocking by default isn't
"unequivocally correct" in the same way that it is on other platforms,
rather than raising an exception and requiring the developer to do the
wait manually. (What else would they do - fall back to insecure data?
I thought the point here was that that's the wrong thing to do?)
Having a blocking default with a non-blocking version seems just as
arguable, and has the advantage that naive users (I don't even know if
we're allowing for naive users here) won't get an unexpected exception
and handle it badly because they don't know what to do (a sadly common
practice in my experience).

OK. Guido has pronounced, you're writing a PEP. None of this debate is
really constructive any more. But I still don't understand the
trade-offs, which frustrates me. Surely security isn't so hard that it
can't be explained in a way that an interested layman like myself can
follow? :-(

Paul