[Python-ideas] Should our default random number generator be secure?
M.-A. Lemburg
mal at egenix.com
Fri Sep 11 19:19:00 CEST 2015
On 11.09.2015 18:36, Tim Peters wrote:
> [M.-A. Lemburg <mal at egenix.com>]
>> ...
>> Now, leaving aside this bright future, what's reasonable today ?
>>
>> If you look at tools like untwister:
>>
>> https://github.com/bishopfox/untwister
>>
>> you can get a feeling for how long it takes to deduce the
>> seed from an output sequence. Bare in mind, that in order
>> to be reasonably sure that the seed is correct, the available
>> output sequence has to be long enough.
>>
>> That's a known plain text attack, so you need access to lots
>> of session keys to begin with.
>>
>> The tools is still running on an example set of 1000 32-bit
>> numbers and it says it'll be done in 1.5 hours, i.e. before
>> the sun goes down in my timezone. I'll leave it running to
>> see whether it can find my secret key.
>
> I'm only going to talk about current Python 3, because _any_ backward
> incompatible change is off limits for a bugfix release.
>
> So:
>
> 1. untwister appears _mostly_ to be probing for poor seeding schemes.
> Python 3's default "by magic" seeding is unimpeachable ;-) It's
> computationally infeasible to attack it.
>
> 2. If they knew they were targeting MT, and had 624 consecutive 32-bit
> outputs, they could compute MT's full internal state essentially
> instantly.
How would they do that ? MT's period is too large for
things like rainbow tables.
> #2 is hard to get, though. These "pick a passward" examples are only
> using a relative handful of bits from each 32-bit MT output. Attacks
> with such spotty info are "exponentially harder".
>
>> Untwister is only slightly smarter than bruteforce. Given
>> that MT has a seed size of 32 bits, it's not surprising that
>> a tool can find the seed within a day.
>
> No no no. MT's state is 19937 bits, and current .seed()
> implementations use every bit you pass to .seed().
Ah, right. I was looking at init_genrand() in the C implementation
which only takes a single 32-bit unsigned int as value.
The init_by_array() function does take seeds which use all
available bits.
I guess untwister indeed only tries the 32-bit unsigned int seeding
approach, as it keeps listing things like:
Progress: 50.99% [2190032137 / 4294967295] ~128296.63/sec 4 hours 33 minutes 30 [-]
> By default, current Python seeds the state with 2500 bytes (20000
> bits) from the system .urandom() (if available). That's why it's
> computationally infeasible for "poor seeding" searches to attack the
> default seeding: they have a space of 2**19937-1 (the all-0 state
> can't occur) to search through, each of which is equally likely
> (assuming the system .urandom() is doing _its_ job).
>
> Of course the user can screw that up by using their _own_ seed. But,
> by default, current Pythons already do the best possible seeding job.
>
>
>> Perhaps it's time to switch to a better version of MT, e.g.
>> a 64-bit version (with 64-bit internal state):
>>
>> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html
>>
>> or an even faster SIMD variant with better properties and
>> 128 bit internal state:
>>
>> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html
>>
>> Esp. the latter will help make brute force attacks practically
>> impossible.
>>
>> Tim ?
>
> We already have a 19937-bit internal state, and current seeding
> schemes don';t hide that.
Ouch. I confused internal state with the output size. Sorry.
So we're more than fine already and it's only the cracking
tools that are apparently broken :-)
> I would like to move to a different generator entirely someday, but
> not before some specific better-than-MT alternative gains significant
> traction outside the Python world ("better a follower than a leader"
> in this area).
Another candidate is the new WELL family:
http://www.iro.umontreal.ca/~panneton/WELLRNG.html
This has some nicer properties w/r to booting out of zeroland
(as they call it: too many 0 bits in the seed).
>> BTW: Looking at the sources of the _random module, I found that
>> the seed function uses the hash of non-integers such as e.g.
>> strings passed to it as seeds. Given the hash randomization
>> for strings this will create non-deterministic results, so it's
>> probably wise to only use 32-bit integers as seed values for
>> portability, if you need to rely on seeding the global Python
>> RNG.
>
> None of that applies to Python 3.
Well, it still does for the .seed() C implementation in _random.c,
but since that's overridden in Python 3's Random class, you can't
access it anymore :-)
> `seed()` string inputs go through
> this path now:
>
> if isinstance(a, (str, bytes, bytearray)):
> if isinstance(a, str):
> a = a.encode()
> a += _sha512(a).digest()
> a = int.from_bytes(a, 'big')
> super().seed(a)
>
> IOW, a crypto hash is _appended_ to the string, but no info from the
> original string is lost (but, if you ask me, this particular step is
> useless - it adds no "new entropy"). The whole mess is converted to a
> giant integer, again with no loss of input information. And every bit
> of the giant integer affects what `super().seed(a) does`.
As far as I'm concerned this maps to case closed.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Sep 11 2015)
>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2015-09-18: PyCon UK 2015 ... 7 days to go
::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
More information about the Python-ideas
mailing list