[Python-ideas] Should our default random number generator be secure?

Tim Peters tim.peters at gmail.com
Thu Sep 10 05:23:22 CEST 2015


[Nathaniel Smith <njs at pobox.com>]
> Yeah, equidistribution is not a guarantee of anything on its own. For
> example, an integer counter modulo 2**(623*32) is equidistributed to
> 32 bits across 623 dimensions, just like the Mersenne Twister.

The analogy is almost exact.  If you view MT's state as a single
19937-bit integer (=623*32 + 1), then MT's state "simply" cycles
through a specific permutation of

    range(1, 2**19937)

(with a different orbit consisting solely of 0).  That was the "hard
part" to prove.  Everything about equidistribution was more of an
observation following from that.  Doesn't say anything about
distribution "in the small" (across small slices), alas.


> ... And hey, if
> arc4random *does* mess up your simulation, then congratulations, your
> simulation is publishable as a cryptographic attack and will probably
> get written up in the NYTimes :-).

Heh.  In the NYT or a security wonk's blog, maybe.  But why would a
reputable journal believe me?  By design, the results of using the
OpenBSD arc4random can't be reproduced ;-)


> The real reasons to prefer non-cryptographic RNGs are the auxiliary
> features like determinism, speed, jumpahead, multi-thread
> friendliness, etc. But the stdlib random module doesn't really provide
> any of these (except determinism in strictly limited cases), so I'm
> not sure it matters much.

Python's implementation of MT has never changed anything about the
sequence produced from a given seed state, and indeed gives the same
sequence from the same seed state as every other correct
implementation of the same flavor of MT.  That is "strictly limited",
to perfection ;-)  At a higher level, it depends on the app.  People are
quite creative at defeating efforts to be helpful ;-)
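
To make that determinism concrete, a quick sketch (no particular output
values asserted, just that the two streams agree):

    import random

    # Two generators seeded identically produce the identical sequence --
    # today, tomorrow, and under any other correct MT19937 implementation.
    a = random.Random(123456)
    b = random.Random(123456)
    assert [a.random() for _ in range(10)] == [b.random() for _ in range(10)]

    # Reseeding replays the run from the start.
    a.seed(123456)
    b.seed(123456)
    assert a.random() == b.random()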


>> The Twister's provably perfect equidistribution across its whole
>> period also has its scary sides.  For example, run random.random()
>> often enough, and it's _guaranteed_ you'll eventually reach a state
>> where the output is exactly 0.0 hundreds of times in a row.  That will
>> happen as often as it "should happen" by chance, but that's scant
>> comfort if you happen to hit such a state.  Indeed, the Twister was
>> patched relatively early in its life to try to prevent it from
>> _starting_ in such miserable states.   Such states are nevertheless
>> still reachable from every starting state.

> This criticism seems a bit unfair though

Those are facts, not criticisms.  I like the Twister very much.  But
those who have no fear of it are dreaming - while those who have
significant fear of it are also dreaming.  It's my job to ensure
nobody is either frightened or complacent ;-)


> -- even a true stream of random bits (e.g. from a true unbiased
> quantum source) has this property,

But good generators with astronomically smaller periods do not.  In a
sense, MT made it possible to get results "far more random" than any
widely used deterministic generator before it.  The patch I mentioned
above was to fix real problems in real life, where people using simple
seeding schemes systematically ended up with results so transparently
ludicrous nobody could possibly accept them for any purpose.

"The fix" consisted of using scrambling functions to spray the bits in
the user-*supplied* seed all over the place, in a pseudo-random way,
to probabilistically ensure "the real" state wouldn't end up with "too
many" zero bits.  "A lot of zero bits" tends to persist across MT
state transitions for a long time.
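
A sketch of that style of scrambling, modeled on init_genrand() in the
reference mt19937ar code (that's where the odd multiplier 1812433253
comes from; CPython's C implementation does essentially the same thing):

    N = 624  # number of 32-bit words in the MT state

    def init_state(seed):
        """Spread the bits of `seed` across all 624 state words, in the
        style of the reference init_genrand() routine."""
        mt = [0] * N
        mt[0] = seed & 0xFFFFFFFF
        for i in range(1, N):
            # Each word depends nonlinearly (shift, xor, odd multiplier)
            # on the previous one, so even a seed with very few set bits
            # yields a state without long runs of zero words.
            mt[i] = (1812433253 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xFFFFFFFF
        return mt

    state = init_state(1)              # a "tiny" seed...
    print(sum(w == 0 for w in state))  # ...yet essentially no zero words survive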


> and trying to avoid this happening would introduce bias that really
> could cause problems in practice.

For example, nobody _needs_ a generator capable of producing hundreds
of 0.0 in a row.  Use even a variant of MT with a much smaller period,
and that problem goes away, with no bad effects on any app.

Push that more, and what many Monte Carlo applications _really_ want
is "low discrepancy":  some randomness is essential, but becomes a
waste of cycles and assaults "common sense" if it gets too far from
covering the domain more-or-less uniformly.  So there are many ways
known of generating "quasi-random" sequences instead, melding a notion
of randomness with guarantees of relatively low discrepancy (low
deviation from uniformity).  Nothing works best for all purposes -
except Python.
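
For a taste of what "quasi-random" means (nothing the stdlib provides;
this is just the textbook van der Corput construction):

    def van_der_corput(n, base=2):
        """n-th element of the base-`base` van der Corput sequence:
        reflect the base-`base` digits of n about the radix point."""
        x, denom = 0.0, 1.0
        while n:
            n, digit = divmod(n, base)
            denom *= base
            x += digit / denom
        return x

    # The first 16 points already cover [0, 1) quite evenly -- no clumps,
    # no large gaps -- which is the "low discrepancy" being talked about.
    points = [van_der_corput(i) for i in range(1, 17)]
    print(sorted(points))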


> A good probabilistic program is one that has a high probability
> of returning some useful result, but they always have some
> low probability of returning something weird. So this is just
> saying that most people don't understand probability.

Or nobody does.  Probability really is insufferably subtle.  Guido
should ban it.

> Which is true, but there isn't much that the random module can do
> about it :-)

Python should just remove it.  It's an "attractive nuisance" to everyone ;-)

