[Python-ideas] Should our default random number generator be secure?

Mon Sep 14 08:38:25 CEST 2015

On Sun, Sep 13, 2015 at 10:29 PM, Tim Peters <tim.peters at gmail.com> wrote:
> [Stephen J. Turnbull <stephen at xemacs.org>]
>> it's easy to ignore the state access APIs, and call it the same way
>> that you would call a CSPRNG. In fact that's what's documented as
>> correct usage, as Paul Moore points out.  Thus, programmers who
>> are using a PRNG whose parameters can be inferred from its output,
>> and should not be doing so, generally won't know it until the
>> (potentially widespread) harm is done.  It would be nice if it wasn't
>> so easy for them to use the MT.
>
> And yet nobody so far has produced a single example of any harm done
> in any of the near-countless languages that supply non-crypto RNGs.  I
> know, my lawyer gets annoyed too when I point out there hasn't been a
> nuclear war ;-)

Here you go:
  https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf

They present real-world attacks on PHP applications that use something
like the "password generation" code we've been talking about as a way
to generate cookies and password reset nonces, including in particular
the case of applications that use a strongly-seeded Mersenne Twister
as their RNG:

"We develop a suite of new techniques and tools to mount attacks
against all PRNGs of the PHP core system even when it is hardened with
the Suhosin patch [which adds strong seeding] and apply our techniques
to create practical exploits for a number of the most popular PHP
applications (including Mediawiki, Gallery, osCommerce and Joomla)
focusing on the password reset functionality. Our exploits allow an
attacker to completely take over arbitrary user accounts."

"Section 5.3: ... In this section we give a description of the
Mersenne Twister generator and present an algorithm that allows the
recovery of the internal state of the generator even when the output
is truncated. Our algorithm also works in the presence of non
consecutive outputs ..."
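
To make the state-recovery idea concrete, here is a rough sketch of the
easiest version of the attack against CPython's own Mersenne Twister. It
is not the algorithm from the paper: theirs copes with truncated and
non-consecutive outputs, while this one assumes the attacker sees 624
consecutive, untruncated 32-bit outputs. The helper names (untemper,
clone_mt) are made up for illustration, and the state tuple layout
passed to setstate() is a CPython implementation detail.

```python
import random

def _undo_right_xor(y, shift):
    # Invert y = x ^ (x >> shift) for a 32-bit x.
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def _undo_left_xor_mask(y, shift, mask):
    # Invert y = x ^ ((x << shift) & mask) for a 32-bit x.
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y):
    # Undo the four MT19937 tempering steps in reverse order.
    y = _undo_right_xor(y, 18)
    y = _undo_left_xor_mask(y, 15, 0xEFC60000)
    y = _undo_left_xor_mask(y, 7, 0x9D2C5680)
    y = _undo_right_xor(y, 11)
    return y

def clone_mt(outputs):
    # Build a generator that continues the sequence `outputs` came from.
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    words = tuple(untemper(o) for o in outputs)
    clone = random.Random()
    # (version, 624 state words + index, gauss_next); index 624 forces a
    # regeneration of the state block on the next draw.
    clone.setstate((3, words + (624,), None))
    return clone

victim = random.Random()  # seeded from OS entropy, i.e. "strongly seeded"
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_mt(observed)
predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)
```

Note that the seed never enters into it: the clone is built purely from
observed output, which is why strong seeding doesn't help.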

Out of curiosity, I tried searching github for "random cookie
language:python". The 5th hit (out of ~100k) was a web project that
appears to use this insecure method to generate session cookies:
  https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/utils/cookie.py
  https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/api/router.py#L56-L66
(Fortunately this project doesn't appear to actually have any login or
permissions functionality, so I don't think this is an actual
CVE-worthy bug, but that's just luck -- I'm sure there are plenty of
projects that start out looking like this one and then add security
features without revisiting how they generate session ids.)
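
For anyone who hasn't seen the pattern, here is a minimal sketch of what
I mean. This is not the code from the project linked above; the function
names and the alphabet are invented for illustration. The two functions
differ only in which generator they draw from: the module-level
functions use the shared Mersenne Twister, while random.SystemRandom
draws from the OS entropy source.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

def insecure_session_id(length=32):
    # Uses the global Mersenne Twister: predictable once enough output
    # has been observed, however well it was seeded.
    return "".join(random.choice(ALPHABET) for _ in range(length))

# SystemRandom has the same methods but is backed by os.urandom().
_sysrand = random.SystemRandom()

def secure_session_id(length=32):
    # Same call shape, but there is no internal state to reconstruct.
    return "".join(_sysrand.choice(ALPHABET) for _ in range(length))

print(insecure_session_id())
print(secure_session_id())
```

The call sites look identical, which is exactly why the insecure
version is so easy to write by accident.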

There's a reason security people are so Manichean about these kinds of
things. If something is not intended to be secure or used in
security-sensitive ways, then fine, no worries. But if it is, then
there's no point in trying to mess around with "probably mostly
secure" -- either solve the problem right or don't bother. (See also:
the time Python wasted trying to solve hash randomization without
actually solving hash randomization [1].) If Tim Peters can get fooled
into thinking something like using MT to generate session ids is
"probably mostly secure", then what chance do the rest of us have?
<wink>

NB this isn't an argument for *whether* we should make random
cryptographically strong by default; it's just an argument against
wasting time debating whether it's already "secure enough". It's not
secure. Maybe that's okay, maybe it's not.

For the record though I do tend to agree with the idea that it's not
okay, because it's an increasingly hostile world out there, and
secure-random-by-default makes whole classes of these issues just
disappear. It's not often that you get to fix thousands of bugs with
one commit, including at least some with severity level "all your
users' private data just got uploaded to bittorrent".

I like Nick's proposal here:
    https://code.activestate.com/lists/python-ideas/35842/
as probably the most solid strategy for implementing that idea -- the
only projects that would be negatively affected are those that are
using the seeding functionality of the global random API, which is a
tiny fraction, and the effect on those projects is that they get
nudged into using the superior object-oriented API.
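
To illustrate what "using the object-oriented API" looks like in
practice, here is a small sketch. It doesn't reproduce Nick's proposal;
it only contrasts seeding the global generator with owning a seeded
random.Random instance, and the simulate() helper is hypothetical.

```python
import random

def simulate(rng, n=5):
    # Takes the generator as an argument instead of using global state.
    return [rng.randint(1, 6) for _ in range(n)]

# Global-API style: reseeds the generator shared by the whole process.
random.seed(42)
global_rolls = [random.randint(1, 6) for _ in range(5)]

# Object-oriented style: each caller owns its own reproducible stream.
rng_a = random.Random(42)
rng_b = random.Random(42)
rolls_a = simulate(rng_a)
rolls_b = simulate(rng_b)
print(rolls_a == rolls_b)
```

Code written the second way keeps its reproducibility no matter what
the module-level functions end up doing by default.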

-n

[1] https://lwn.net/Articles/574761/

-- 
Nathaniel J. Smith -- http://vorpus.org