[Python-ideas] Should our default random number generator be secure?

Tim Peters tim.peters at gmail.com
Wed Sep 16 02:43:36 CEST 2015


...

[Marc-Andre]
> Ah, now we're getting somewhere :-)
>
> If we accept that non-guessable, but deterministic is a good
> compromise, then adding a cipher behind MT sounds like a reasonable
> way forward, even as default.
>
> For full crypto strength, people would still have to rely on
> solutions like /dev/urandom or the OpenSSL one (or reseed the
> default RNG every now and then). All others get the benefit of
> non-guessable, but keep the ability to seed the default RNG in
> Python.

I expect the only real reason "new entropy" is periodically mixed in
to OpenBSD's arc4random() is to guard against that a weakness in
ChaCha20 may be discovered later.  If there were any known
computationally feasible way whatsoever to distinguish bare-bones
ChaCha20's output from a "truly random" sequence, it wouldn't be
called "crypto" to begin with.

But reseeding MT every now and again is definitely not suitable for
crypto purposes.  You would need to reseed at least every 624 outputs,
and from a crypto-strength seed source.  In which case, why bother
with MT at all?  You could just as well use the crypto source
directly.


> Is there some research on this (MT + cipher or hash) ?

Oh, sure.  MT's creators noted from the start that it would suffice to
run MT's outputs through a crypto hash (like your favorite flavor of
SHA).  That's just as vulnerable to "poor seeding" attacks as plain
MT, but it's computationally infeasible to deduce the state from any
number of hashed outputs (unless your crypto hash is at  least partly
invertible, in which case it's not really a crypto hash ;-) ).;

For other approaches, search for CryptMT.  MT's creators suggested a
number of other schemes over the years.  The simplest throws away the
"tempering" part of MT (the 4 lines that map the raw state word into a
mildly scrambled output word - not because it needs to be thrown away,
but because they think it would no longer be needed given what
follows).  Then one byte is obtained via grabbing the next MT 32-bit
output, folding it into a persistent accumulator via multiplication,
and just revealing the top byte:

    accum = some_odd_integer
    while True:
        accum *= random.getrandbits(32) | 1
        yield accum >> 24

I did see one paper suggesting it was possible to distinguish the
output of that from a truly random sequence given 2**50 consecutive
outputs (but that's all - still no way to deduce the state).


More information about the Python-ideas mailing list