[Python-ideas] Should our default random number generator be secure?

Wed Sep 16 10:21:23 CEST 2015

On 16.09.2015 02:43, Tim Peters wrote:
> ...
> 
> [Marc-Andre]
>> Ah, now we're getting somewhere :-)
>>
>> If we accept that non-guessable, but deterministic is a good
>> compromise, then adding a cipher behind MT sounds like a reasonable
>> way forward, even as default.
>>
>> For full crypto strength, people would still have to rely on
>> solutions like /dev/urandom or the OpenSSL one (or reseed the
>> default RNG every now and then). All others get the benefit of
>> non-guessable, but keep the ability to seed the default RNG in
>> Python.
> 
> I expect the only real reason "new entropy" is periodically mixed in
> to OpenBSD's arc4random() is to guard against that a weakness in
> ChaCha20 may be discovered later.  If there were any known
> computationally feasible way whatsoever to distinguish bare-bones
> ChaCha20's output from a "truly random" sequence, it wouldn't be
> called "crypto" to begin with.
> 
> But reseeding MT every now and again is definitely not suitable for
> crypto purposes.  You would need to reseed at least every 624 outputs,
> and from a crypto-strength seed source.  In which case, why bother
> with MT at all?  You could just as well use the crypto source
> directly.
> 
> 
>> Is there some research on this (MT + cipher or hash) ?
> 
> Oh, sure.  MT's creators noted from the start that it would suffice to
> run MT's outputs through a crypto hash (like your favorite flavor of
> SHA).  That's just as vulnerable to "poor seeding" attacks as plain
> MT, but it's computationally infeasible to deduce the state from any
> number of hashed outputs (unless your crypto hash is at  least partly
> invertible, in which case it's not really a crypto hash ;-) ).;
> 
> For other approaches, search for CryptMT.  MT's creators suggested a
> number of other schemes over the years.  The simplest throws away the
> "tempering" part of MT (the 4 lines that map the raw state word into a
> mildly scrambled output word - not because it needs to be thrown away,
> but because they think it would no longer be needed given what
> follows).  Then one byte is obtained via grabbing the next MT 32-bit
> output, folding it into a persistent accumulator via multiplication,
> and just revealing the top byte:
> 
>     accum = some_odd_integer
>     while True:
>         accum *= random.getrandbits(32) | 1
>         yield accum >> 24
> 
> I did see one paper suggesting it was possible to distinguish the
> output of that from a truly random sequence given 2**50 consecutive
> outputs (but that's all - still no way to deduce the state).

> [Tim, on CryptMT]
>> I did see one paper suggesting it was possible to distinguish the
>> output of that from a truly random sequence given 2**50 consecutive
>> outputs (but that's all - still no way to deduce the state).
>
> Sorry:  not 2**50 consecutive outputs (which are bytes), but 2**50
> consecutive output bits, so only 2**47 outputs.

Thanks for the "CryptMT" pointers. I'll do some research after PyCon UK
on this.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/CRYPTMT/index.html

A quick glimpse at

http://www.ecrypt.eu.org/stream/p3ciphers/cryptmt/cryptmt_p3.pdf

suggests that this is a completely new stream cipher, though it
uses the typical elements (key + non-linear filter + feedback loop).

The approach is interesting, though: they propose an PRNG which
can then get used as stream cipher by XOR'ing the PRNG output with
the data stream. So the PRNG implies the cipher, not the other way
around as many other approaches to CSPRNGs.

That's probably also one of its perceived weaknesses: it's different
than the common approach.

On 16.09.2015 02:55, Tim Peters wrote:> [Tim]
>> ...
>> Oh, sure.  MT's creators noted from the start that it would suffice to
>> run MT's outputs through a crypto hash (like your favorite flavor of
>> SHA).  That's just as vulnerable to "poor seeding" attacks as plain
>> MT, but it's computationally infeasible to deduce the state from any
>> number of hashed outputs
>
> Although what's "computationally feasible" may well have changed since
> then!  These days I expect even a modestly endowed attacker could
> afford to store an exhaustive table of the 2**32 possible outputs and
> their corresponding hashes.  Then the hashes are 100% invertible via
> simple lookup, so are no better than not hashing at all.

Simply adding a hash doesn't sound like a good idea.
My initial thought was using a (well studied) stream cipher on
the output, not just a hash on the output.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Sep 16 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________
2015-09-14: Released mxODBC Plone/Zope DA 2.2.3   http://egenix.com/go84
2015-09-18: PyCon UK 2015 ...                               2 days to go
2015-09-26: Python Meeting Duesseldorf Sprint 2015         10 days to go

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/