[Python-ideas] Should our default random number generator be secure?
M.-A. Lemburg
mal at egenix.com
Sat Sep 12 13:35:48 CEST 2015
On 12.09.2015 13:31, M.-A. Lemburg wrote:
> On 12.09.2015 05:23, Tim Peters wrote:
>> [Marc-Andre]
>> ...
>>>> While implementing this, I found that there's a bit more trickery
>>>> involved due to the fact that the MT RNG in Python writes the
>>>> 624 words internal state in batches - once every 624 times
>>>> the .getrandbits() function is called.
>>>>
>>>> So you may need up to 624*2 - 1 output values to determine a
>>>> correct array of internal state values.
>>
>> [Tim]
>>> Don't be too sure about that. From an information-theoretic view,
>>> "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31
>>> more bits than the internal state actually has. You don't need to
>>> reproduce Python's current internal MT state exactly, you only need to
>>> create _a_ MT state that will produce the same values forever after.
>>> Specifically, the index of the "current" vector element is an artifact
>>> of the implementation, and doesn't need to be reproduced. You're free
>>> to set that index to anything you like in _your_ MT state - the real
>>> goal is to get the same results.
>>
>> Concrete proof of concept. First code to reconstruct state from 624
>> consecutive 32-bit outputs:
>>
>> def invert(transform, output, n=100):
>> guess = output
>> for i in range(n):
>> newguess = transform(guess)
>> if newguess == output:
>> return guess
>> guess = newguess
>> raise ValueError("%r not invertible in %s tries" %
>> (output, n))
>>
>> t1 = lambda y: y ^ (y >> 11)
>> t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680)
>> t3 = lambda y: y ^ ((y << 15) & 0xefc60000)
>> t4 = lambda y: y ^ (y >> 18)
>>
>> def invert_mt(y):
>> y = invert(t4, y)
>> y = invert(t3, y)
>> y = invert(t2, y)
>> y = invert(t1, y)
>> return y
>>
>> def guess_state(vec):
>> assert len(vec) == 624
>> return (3,
>> tuple(map(invert_mt, vec)) + (624,),
>> None)
>>
>> Now we can try it:
>>
>> import random
>> for i in range(129):
>> random.random()
>>
>> That loop was just to move MT into "the middle" of its internal
>> vector. Now grab values:
>>
>> vec = [random.getrandbits(32) for i in range(624)]
>>
>> Note that the `guess_state()` function above _always_ sets the index
>> to 624. When it becomes obvious _why_ it does so, all mysteries will
>> vanish ;-)
>>
>> Now create a distinct generator and force its state to the deduced state:
>>
>> newrand = random.Random()
>> newrand.setstate(guess_state(vec))
>>
>> And some quick sanity checks:
>>
>> for i in range(1000000):
>> assert random.random() == newrand.random()
>> for i in range(1000000):
>> assert random.getrandbits(32) == newrand.getrandbits(32)
>>
>> The internal states are _not_ byte-for-byte identical. But they don't
>> need to be. The artificial `index` bookkeeping variable allows
>> hundreds of distinct spellings of _semantically_ identical states.
>
> It's a rolling index, yes, but when creating the vector of output
> values, the complete internal state array will have undergone
> a recalc at one of the iterations.
>
> The guess_state(vec) function will thus return an internal
> state vector that is half state of the previous recalc run,
> half new recalc run, it is not obvious to me why you would
> still be able to get away with not synchronizing to the next
> recalc in order to have a complete state from the current recalc.
>
> Let's see...
>
> The values in the state array are each based on
>
> a) previous state[i]
> b) state[(i + 1) % 624]
> c) state[(i + 397) % 624]
>
> Since the calculation is forward looking, your trick will only
> work if you can make sure that i + 397 doesn't wrap around
> into the previous state before you trigger the recalc in
> newrand.
>
> Which is easy, of course, since you can control the current
> index of newrand and force it to do a recalc with the next
> call to .getrandbits() by setting it to 624.
>
> Clever indeed :-)
>
> Here's a better way to do the inversion without guess work:
>
> # 32-bits all set
> ALL_BITS_SET = 0xffffffffL
>
> def undo_bitshift_right_xor(value, shift, mask=ALL_BITS_SET):
>
> # Set shift high order bits; there's probably a better way to
> # do this, but this does the trick for now
> decoding_mask = (ALL_BITS_SET << (32 - shift)) & ALL_BITS_SET
> decoded_part = 0
> result = 0
> while decoding_mask > 0:
> decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
> result |= decoded_part
> decoded_part >>= shift
> decoding_mask >>= shift
> return result
>
> def undo_bitshift_left_xor(value, shift, mask=ALL_BITS_SET):
>
> # Set shift low order bits
> decoding_mask = ALL_BITS_SET >> (32 - shift)
> decoded_part = 0
> result = 0
> while decoding_mask > 0:
> decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
> result |= decoded_part
> decoded_part = (decoded_part << shift) & ALL_BITS_SET
> decoding_mask = (decoding_mask << shift) & ALL_BITS_SET
> return result
>
> def recover_single_state_value(value):
>
> value = undo_bitshift_right_xor(value, 18)
> value = undo_bitshift_left_xor(value, 15, 0xefc60000L)
> value = undo_bitshift_left_xor(value, 7, 0x9d2c5680L)
> value = undo_bitshift_right_xor(value, 11)
> return value
>
> def guess_state(data):
Hmm, the name doesn't fit anymore, better call it:
def recover_state(data):
> if len(data) < 624:
> raise TypeError('not enough data to recover state')
>
> # Only work with the 624 last entries
> data = data[-624:]
> state = [recover_single_state_value(x)
> for x in data]
> return (3,
> tuple(state) + (624,),
> None)
>
> This is inspired by the work of James Roper, but uses a slightly
> faster approach for the undo functions. Not that it matters much.
> It was fun, that's what matters :-)
>
> Oh, in Python 3, you need to remove the 'L' after the constants.
> Too bad that it doesn't recognize those old annotations anymore.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Sep 12 2015)
>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2015-09-18: PyCon UK 2015 ... 6 days to go
2015-10-21: Python Meeting Duesseldorf ... 39 days to go
::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
More information about the Python-ideas
mailing list