[Python-ideas] Should our default random number generator be secure?
Tim Peters
tim.peters at gmail.com
Sat Sep 12 05:23:42 CEST 2015
[Marc-Andre]
...
>> While implementing this, I found that there's a bit more trickery
>> involved due to the fact that the MT RNG in Python writes the
>> 624 words internal state in batches - once every 624 times
>> the .getrandbits() function is called.
>>
>> So you may need up to 624*2 - 1 output values to determine a
>> correct array of internal state values.
[Tim]
> Don't be too sure about that. From an information-theoretic view,
> "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31
> more bits than the internal state actually has. You don't need to
> reproduce Python's current internal MT state exactly, you only need to
> create _a_ MT state that will produce the same values forever after.
> Specifically, the index of the "current" vector element is an artifact
> of the implementation, and doesn't need to be reproduced. You're free
> to set that index to anything you like in _your_ MT state - the real
> goal is to get the same results.
Concrete proof of concept. First code to reconstruct state from 624
consecutive 32-bit outputs:
def invert(transform, output, n=100):
guess = output
for i in range(n):
newguess = transform(guess)
if newguess == output:
return guess
guess = newguess
raise ValueError("%r not invertible in %s tries" %
(output, n))
t1 = lambda y: y ^ (y >> 11)
t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680)
t3 = lambda y: y ^ ((y << 15) & 0xefc60000)
t4 = lambda y: y ^ (y >> 18)
def invert_mt(y):
y = invert(t4, y)
y = invert(t3, y)
y = invert(t2, y)
y = invert(t1, y)
return y
def guess_state(vec):
assert len(vec) == 624
return (3,
tuple(map(invert_mt, vec)) + (624,),
None)
Now we can try it:
import random
for i in range(129):
random.random()
That loop was just to move MT into "the middle" of its internal
vector. Now grab values:
vec = [random.getrandbits(32) for i in range(624)]
Note that the `guess_state()` function above _always_ sets the index
to 624. When it becomes obvious _why_ it does so, all mysteries will
vanish ;-)
Now create a distinct generator and force its state to the deduced state:
newrand = random.Random()
newrand.setstate(guess_state(vec))
And some quick sanity checks:
for i in range(1000000):
assert random.random() == newrand.random()
for i in range(1000000):
assert random.getrandbits(32) == newrand.getrandbits(32)
The internal states are _not_ byte-for-byte identical. But they don't
need to be. The artificial `index` bookkeeping variable allows
hundreds of distinct spellings of _semantically_ identical states.
More information about the Python-ideas
mailing list