[Python-ideas] Should our default random number generator be secure?

Mon Sep 14 19:15:48 CEST 2015

[This is getting off-topic, so I'll stop after this reply]

On 14.09.2015 14:26, Nathaniel Smith wrote:
> On Mon, Sep 14, 2015 at 3:37 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 14.09.2015 08:38, Nathaniel Smith wrote:
>>> If Tim Peters can get fooled
>>> into thinking something like using MT to generate session ids is
>>> "probably mostly secure", then what chance do the rest of us have?
>>> <wink>
>>
>> I don't think that Tim can get fooled into believing he is a
>> crypto wonk ;-)
>>
>> The thread reveals another misunderstanding:
>>
>>  Broken code doesn't get any better when you change the context
>>  in which it is run.
> 
> As an aphorism this sounds nice, but logically it makes no sense. If
> the broken thing about your code is that it assumes that the output of
> the RNG is unguessable, and you change the context by making the
> output of the RNG unguessable, then now the code it isn't broken.

It's still broken, because it's making wrong assumptions on the
documented context and given that it did in the first place,
suggests that this is not the only aspect of it being broken
(pure speculation, but experience shows that bugs usually
run around in groups ;-)).

> The code would indeed remain broken when run under e.g. older
> interpreters, but this is not an argument that we should make sure
> that it stays broken in the future.
> 
>> By fixing the RNG used in such broken code and making it
>> harder to run attacks, you are only changing the context in which
>> the code is run. The code itself still remains broken.
>>
>> Code which uses the output from an RNG as session id without adding
>> any additional security measures is broken, regardless of what kind
>> of RNG you are using. I bet such code will also take any session id
>> it receives as cookie and trust it without applying extra checks
>> on it.
> 
> Yes, that's... generally the thing you do with session cookies?
> They're shared secret string that you use as keys into some sort of
> server-side session database? What extra checks need to be applied?

You will at least want to add checks that the session id string was
indeed generated by the server and not some bot trying to
find valid session ids, e.g. by signing the session id and
checking the sig on incoming requests.

Other things you can do: fold timeouts into the id, add IP addresses,
browser sigs, request sequence numbers.

You also need to make sure that the session ids are taken from
a large enough set to make it highly unlikely that someone
can guess the id simply in case the number of active
sessions is significant compared to the universe
of possible ids, e.g. 32-bit ids are great for database indexes,
but a pretty bad idea if you have millions of active sessions.

>> Rather than trying to fix up the default RNG in Python by replacing
>> it with a crypto RNG, it's better to open bug reports to get the
>> broken software fixed.
>>
>> Replacing the default Python RNG with a new unstudied crypto one,
>> will likely introduce problems into working code which rightly
>> assumes the proven statistical properties of the MT.
>>
>> Just think of the consequences of adding unwanted bias to simulations.
>> This is far more likely to go unnoticed than a session highjack due
>> to a broken system and can easily cost millions (or earn you
>> millions - it's all probability after all :-)).
> 
> I'm afraid you just don't understand what you're talking about here.
> 
> When it comes to adding bias to simulations, all crypto RNGs have
> *better* statistical properties than MT. A crypto RNG which was merely
> as statistically-well-behaved as MT would be considered totally
> broken, because MT doesn't even pass black-box tests of randomness
> like TestU01.

I am well aware that MT doesn't satisfy all empirical tests
and also that it is not a CSPRNG (see the code Tim and I discussed
in this thread showing how easy it is to synchronize to an existing
MT RNG if you can gain knowledge of 624 output values).

However, it has been extensively studied and it is proven to be
equidistributed which is a key property needed for it to be used as
basis for other derived probability distributions (as it done by the
random module).

For CSPRNGs you can empirically test properties, but due to their
nature not prove e.g. them being equidistributed - even though they
usually will pass standard frequency tests. For real-life purposes,
you're probably right with them not being biased. I'm a mathematician,
though, so like provable more than empirical :-)

The main purpose of CSPRNGs is producing output which you cannot guess,
not to produce output which has provable distribution qualities.
They do this in a more efficient way than having to wait for enough
entropy to be collected - basically making true random number
generators practically usable.

There's a new field which appears to be popular these days:
"Chaotic Pseudo Random Number Generators" (CPRNGs). These are based
on chaotic systems and are great for making better use of available
entropy.

I'm sure we'll have something similar to the MT for these
chaotic systems come out of this research in a while and then Python
should follow this by implementing it in a new module.

Until then, I think it's worthwhile using the existing rand
code in OpenSSL and exposing this through the ssl module:

https://www.openssl.org/docs/man1.0.1/crypto/rand.html

It interfaces to platform hardware true RNGs where available, falls
back to an SHA-1 based 1k pool based generator where needed. It's
being used for SSL session keys, key generation, etc.,
trusted by millions of people and passes the NIST tests.

This paper explains the algorithm in more detail:

http://webpages.uncc.edu/yonwang/papers/lilesorics.pdf

The downside of the OpenSSL implementation is that it can
fail if there isn't enough entropy available.

Here's a slightly better algorithm, but it's just one of many
which you can find when searching for CPRNGs:

https://eprint.iacr.org/2012/471.pdf

>> Now, pointing people who write broken code to a new module which
>> provides a crypto RNG probably isn't much better either. They'd feel
>> instantly secure because it says "crypto" on the box and forget
>> about redesigning their insecure protocol as well. Nothing much you
>> can do about that, I'm afraid.
> 
> Yes, improving the RNG only helps with some problems, not others; it
> might merely make a system harder to attack, rather than impossible to
> attack. But giving people unguessable random numbers by default does
> solve real problems.

Drop the "by default" and I agree, as will probably everyone else
in this thread :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Sep 14 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________
2015-09-14: Released mxODBC Plone/Zope DA 2.2.3   http://egenix.com/go84
2015-09-18: PyCon UK 2015 ...                               4 days to go
2015-09-26: Python Meeting Duesseldorf Sprint 2015         12 days to go

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/