[Python-ideas] Should our default random number generator be secure?

Tim Peters tim.peters at gmail.com
Sat Sep 12 03:19:19 CEST 2015


[Tim, on recovering MT state from outputs]
>>> https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html

[Marc-Andre]
>> Indeed very nice. Thanks for the pointer.
>>
>> I wonder why untwister doesn't use this. I gave it 1000 32-bit
>> integers, so it should have enough information to recover the
>> seed in a short while, but it's still trying to find the seed.
>> Oh, and it now shows: 5 days 21 hours left. I stopped it there.

As you went on to discover, while the writeup gives enough to convince
you it's possible, there are always details ;-)


> Turns out this will have to be named random.recover_state().
>
> Getting at the seed is too difficult, esp. for strings in Python 3,
> and not really worth the effort anyway.

It's flatly impossible to ever know what the seed was, unless you
_also_ know exactly how many times MT was invoked before the first
output you captured.  Think about that a bit, and I'm sure you'll see
that's obvious.  Even if you did know how many times, it would still
be impossible without more assumptions, since seed arguments can
contain any number of bits.


> While implementing this, I found that there's a bit more trickery
> involved due to the fact that the MT RNG in Python writes the
> 624 words internal state in batches - once every 624 times
> the .getrandbits() function is called.
>
> So you may need up to 624*2 - 1 output values to determine a
> correct array of internal state values.

Don't be too sure about that.  From an information-theoretic view,
"it's obvious" that 624 32-bit outputs is enough - indeed, that's 31
more bits than the internal state actually has. You don't need to
reproduce Python's current internal MT state exactly, you only need to
create _a_ MT state that will produce the same values forever after.
Specifically, the index of the "current" vector element is an artifact
of the implementation, and doesn't need to be reproduced.  You're free
to set that index to anything you like in _your_ MT state - the real
goal is to get the same results.


More information about the Python-ideas mailing list