[Python-ideas] Should our default random number generator be secure?

Sun Sep 13 03:00:17 CEST 2015

[Marc-Andre, puzzling over Tim's MT state-recovering hack]
> ...
> Since the calculation is forward looking, your trick will only
> work if you can make sure that i + 397 doesn't wrap around
> into the previous state before you trigger the recalc in
> newrand.
>
> Which is easy, of course, since you can control the current
> index of newrand and force it to do a recalc with the next
> call to .getrandbits() by setting it to 624.
>
> Clever indeed :-)

I'll suggest a different way to look at it:  suppose you wanted to
reproduce the state at _the start_ of the 624 values captured instead.
Well, we'd do exactly the same thing, except set the index to 0.  Then
it's utterly obvious that your MT instance would spit out exactly the
same 624 outputs as the ones captured.  That's all the internals do
when the index starts at 0:  march through the state vector one word
at a time, spitting out the tempered version of whichever 32-bit word
is current.  And increment the index each time (the only _mutation_ of
any part of the MT internals).

At the end of that, the only change to the internals is that the index
would be left at 624.  Which is exactly what "my code" sets it to.  It
acts exactly the same as if we had just finished generating the 624
captured outputs.

Since we (in our heads) just reproduced enough bits to cover the
entire internal state, it must be the case that we'll continue to
reproduce all future outputs too.  The "wrap around" is a red herring
;-)

> Here's a better way to do the inversion without guess work:

"Better" depends.  Despite the variable named "guess" in the code,
it's not guessing about anything ;-)  It's a single function that
doesn't care (and can't even be told) whether a left or right shift is
being used, what the shift count is, whether a mask is in use, or even
what the word size is.

In those senses it's "better":  it can be used without change for
"anything like this", including, e.g., the 64-bit variant of MT.  Just
paste the C tempering lines into the lambdas.  Nothing about the
inversion function needs to change.

But why it works efficiently is far from obvious.  It _can_ take as
many (but not more) iterations as there are bits in a word, but that's
almost never needed.  IIRC, it can never require more than 8
iterations to invert any of the tempering functions in the 32-bit MT,
and, e.g., always inverts the very weak "lambda y: y ^ (y >> 18)"
32-bit MT transform on the first try.

Nevertheless, you can - as you did - be more efficient by writing
distinct inversion functions for "left shift" and "right shift" cases,
and wiring in the word size.  But the expense of deducing the state is
just plain trivial here either way.  We're not consuming days or hours
here, we're not even consuming an appreciable fraction of a second at
Python speed :-)