[Python-ideas] Should our default random number generator be secure?

Nathaniel Smith njs at pobox.com
Thu Sep 10 01:15:31 CEST 2015

On Wed, Sep 9, 2015 at 3:19 PM, Tim Peters <tim.peters at gmail.com> wrote:
> Well, most arguments about random functions rely on fantasies ;-)  For
> example, yes, the Twister is provably equidistributed to 32 bits
> across 623 dimensions, but ... does it make a difference to anything?
> That's across the Twister's _entire period_, which couldn't actually
> be generated across the age of the universe.
> What may really matter to an application is whether it will see rough
> equidistribution across the infinitesimally small slice (of the
> Twister's full period) it actually generates.  And you'll find very
> little about _that_ (last time I searched, I found nothing).  For
> assurances about that, people rely on test suites developed to test
> generators.

Yeah, equidistribution is not a guarantee of anything on its own. For
example, an integer counter modulo 2**(623*32) is equidistributed to
32 bits across 623 dimensions, just like the Mersenne Twister. Mostly
people talk about equidistribution because for a deterministic RNG,
(a) being non-equidistributed would be bad, (b) equidistribution is
something you can reasonably hope to prove for simple
non-cryptographic generators, and mathematicians like writing proofs.
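The counter example above can be made concrete with a small sketch (mine, not from the post): a bare counter achieves perfect equidistribution over its period while being completely predictable, so equidistribution alone tells you nothing about randomness.

```python
# A counter mod `modulus` is trivially equidistributed: over one full
# period, every value appears exactly once. It is also the least
# "random" generator imaginable.

def counter_rng(modulus):
    """Yield successive 'random' values: just a counter mod `modulus`."""
    state = 0
    while True:
        yield state
        state = (state + 1) % modulus

gen = counter_rng(8)
period = [next(gen) for _ in range(8)]
counts = {v: period.count(v) for v in range(8)}
print(counts)  # every value occurs exactly once: perfect equidistribution
```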

OTOH equidistribution is not even well-defined for the OpenBSD
"arc4random" generator, because it is genuinely non-deterministic --
it regularly mixes new entropy into its state as it goes -- and
equidistribution by definition requires determinism. So it "fails"
this test of "randomness" because it is too random...
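The stdlib has an analogue of this kind of generator: `random.SystemRandom` draws from the OS entropy pool, so (like arc4random) it has no seedable, replayable state, and "equidistribution over the period" is not even a meaningful question to ask of it. A quick sketch:

```python
import random

# SystemRandom wraps os.urandom; per the docs, seed() is accepted but
# ignored, and getstate()/setstate() raise NotImplementedError -- there
# is no deterministic internal state to replay.
sr = random.SystemRandom()

sr.seed(12345)  # no effect
a = [sr.randrange(2**32) for _ in range(4)]

sr.seed(12345)  # still no effect
b = [sr.randrange(2**32) for _ in range(4)]

print(a != b)  # almost certainly True: re-seeding does not replay the stream
```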

In practice, the chances that your Monte Carlo simulation is going to
give bad results because of systematic biases in arc4random are much,
much lower than the chances that it will give bad results because of
subtle hardware failures that corrupt your simulation. And hey, if
arc4random *does* mess up your simulation, then congratulations, your
simulation is publishable as a cryptographic attack and will probably
get written up in the NYTimes :-).

The real reasons to prefer non-cryptographic RNGs are the auxiliary
features like determinism, speed, jumpahead, multi-thread
friendliness, etc. But the stdlib random module doesn't really provide
any of these (except determinism in strictly limited cases), so I'm
not sure it matters much.
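The "determinism in strictly limited cases" mentioned above is, roughly, what you get by seeding a private `random.Random` instance rather than the shared module-level generator (which any other code in the process may perturb). A minimal sketch:

```python
import random

# Two independently seeded Random instances produce identical streams.
# This is the reproducibility the stdlib does offer -- but only if you
# avoid the module-level functions and keep your own instance.
rng1 = random.Random(42)
rng2 = random.Random(42)

run1 = [rng1.random() for _ in range(5)]
run2 = [rng2.random() for _ in range(5)]
print(run1 == run2)  # True: same seed, same stream
```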

> The Twister's provably perfect equidistribution across its whole
> period also has its scary sides.  For example, run random.random()
> often enough, and it's _guaranteed_ you'll eventually reach a state
> where the output is exactly 0.0 hundreds of times in a row.  That will
> happen as often as it "should happen" by chance, but that's scant
> comfort if you happen to hit such a state.  Indeed, the Twister was
> patched relatively early in its life to try to prevent it from
> _starting_ in such miserable states.   Such states are nevertheless
> still reachable from every starting state.

This criticism seems a bit unfair though -- even a true stream of
random bits (e.g. from a true unbiased quantum source) has this
property, and trying to avoid this happening would introduce bias that
really could cause problems in practice. A good probabilistic program
is one that has a high probability of returning some useful result,
but it always has some low probability of returning something
weird. So this is just saying that most people don't understand
probability. Which is true, but there isn't much that the random
module can do about it :-)
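To put a number on "some low probability" (my arithmetic, not from the post): even an ideal random source must emit any fixed pattern with exactly the probability that pattern deserves. For instance, the chance that the next 100 uniform 32-bit outputs are all zero:

```python
from fractions import Fraction

# Probability that one uniform 32-bit word is all zeros, then that 100
# consecutive words are all zeros. A source that suppressed such runs
# would, by construction, be biased.
p_one_zero = Fraction(1, 2**32)
p_run = p_one_zero ** 100

# The probability is exactly 2**-3200 -- astronomically small, but any
# nonzero value means the event is guaranteed to occur somewhere in an
# equidistributed generator's full period.
print(p_run.denominator.bit_length() - 1)  # 3200
```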


Nathaniel J. Smith -- http://vorpus.org
