[Python-ideas] Python's Source of Randomness and the random.py module Redux

Thu Sep 10 16:21:11 CEST 2015

On September 10, 2015 at 9:44:13 AM, Paul Moore (p.f.moore at gmail.com) wrote:
> On 10 September 2015 at 14:10, Donald Stufft wrote:
> >> I don't understand the phrase "if you needed determinism, it would
> >> hurt you to say so". Could you clarify?
> >
> > I transposed some words, fixed:
> >
> > "If you needed determinism, would it hurt you to say so?"
>  
> Thanks.
>  
> In one sense, no it wouldn't. Nor would it matter to me if "the
> default random number generator" was fast and cryptographically
> secure. What matters is just that I get a load of random (enough)
> numbers.
>  
> What hurts somewhat (not enormously, I'll admit) is up front having to
> think about whether I need to be able to capture a seed and replay it.
> That's nearly always something I'd think of way down the line, as a
> "wouldn't it be nice if I could get the user to send me a reproducible
> test case" or something like that. And of course it's just a matter of
> switching the underlying RNG at that point.
> 
> None of this is hard. But once again, I'm currently using the module
> correctly, as documented.

This is actually exactly why Theo suggested using a modern, userland CSPRNG:
it can generate random numbers faster than /dev/urandom can and, unless you
need deterministic results, there's little downside to doing so.

There are really two possible ideas here, depending on what sort of balance
we'd want to strike. We can make a default "I don't want to think about it"
implementation of random that is both *generally* secure and fast; however, it
won't be deterministic and you won't be able to explicitly seed it. This would
be a backwards compatible change [1] for people who are simply calling these
functions [2]:

    random.getrandbits
    random.randrange
    random.randint
    random.choice
    random.shuffle
    random.sample
    random.random
    random.uniform
    random.triangular
    random.betavariate
    random.expovariate
    random.gammavariate
    random.gauss
    random.lognormvariate
    random.normalvariate
    random.vonmisesvariate
    random.paretovariate
    random.weibullvariate

If this were all that the top level functions in random.py provided, we could
simply replace the default and people wouldn't notice; they'd just
automatically get safer randomness, whether that's actually useful for their
use case or not.
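
As a rough illustration of why those calls wouldn't notice the swap, here is a
minimal sketch using what the stdlib has today. Note the assumption:
random.SystemRandom (which reads from the OS CSPRNG) is only standing in for
the userland CSPRNG being proposed, which does not exist in random.py. The
point is just that none of these calls need a seed:

```python
import random

# Stand-in only: SystemRandom pulls from the OS CSPRNG (os.urandom). The
# proposed fast userland CSPRNG is not in the stdlib, so this is an assumption
# made purely for illustration.
secure = random.SystemRandom()

# The same calls people make on the module today, with no seeding anywhere.
bits = secure.getrandbits(16)
roll = secure.randint(1, 6)
pick = secure.choice(["a", "b", "c"])

deck = list(range(10))
secure.shuffle(deck)             # shuffles in place

hand = secure.sample(range(100), 5)
x = secure.random()              # float in [0.0, 1.0)
g = secure.gauss(0.0, 1.0)
```

Code written like this has no way to tell which generator sits underneath,
which is what makes swapping the default backwards compatible for it.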

However, random.py also has these functions:

    random.seed
    random.getstate
    random.setstate
    random.jumpahead

and these functions are where the problem comes in. They only really make
sense for deterministic sources of random, which are not "safe" for use in
security sensitive applications. So pretending for a moment that we've already
decided to do "something" about this, the question boils down to what we do
about these 4 functions. Either we can change the default to a secure CSPRNG
and break these functions (and the people using them), which is however easily
fixed by changing ``import random`` to
``import random; random = random.DeterministicRandom()``, or we can deprecate
the top level functions and try to guide people to choose up front what kind
of random they need. Either of these solutions will end up with people being
safer. If we pretend we've agreed to do "something", it comes down to whether
we'd prefer breaking compatibility for some people while keeping a default
random generator that is probably good enough for most people, or whether we'd
prefer to not break compatibility and try to push people to always decide
what kind of random they want.
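
To make the "say so up front" idea concrete, here is a small sketch with
today's stdlib. random.DeterministicRandom is only a proposed name and does
not exist; the existing random.Random class (the seedable MT generator) stands
in for it here, and random.SystemRandom stands in for a secure default:

```python
import random

# Hypothetical mapping: random.Random plays the role of the proposed
# random.DeterministicRandom(), which is not a real stdlib name.
det = random.Random()

# Seeding makes the sequence reproducible.
det.seed(12345)
first = [det.randint(1, 100) for _ in range(5)]
det.seed(12345)
replay = [det.randint(1, 100) for _ in range(5)]

# Capturing and restoring state replays from an arbitrary point.
state = det.getstate()
a = det.random()
det.setstate(state)
b = det.random()

# A non-deterministic source has no state to capture, so the call fails.
secure = random.SystemRandom()
try:
    secure.getstate()
    state_supported = True
except NotImplementedError:
    state_supported = False
```

Here first and replay come out identical, as do a and b, while getstate() on
the SystemRandom instance raises NotImplementedError. That is roughly the
breakage people using the top level seed/getstate/setstate would see if the
default changed underneath them.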

Of course, we still haven't decided that we should do "something". I think that
we should, because I think that secure by default (or at least, not insecure by
default) is a good situation to be in. Over the history of computing it's been
shown time and time again that trying to document or educate users is error
prone and doesn't scale. You get much further if you can design APIs to make
the "right" thing the obvious, opt-out default and require opting in to
specialist [3] cases which require some particular property.

[1] Assuming Theo's claim of the speed of the ChaCha based arc4random function
    is accurate, which I haven't tested but I assume he's smart enough to know
    what he's talking about WRT the speed of it.

[2] I believe anyways, I don't think that any of these rely on the properties
    of MT or a deterministic source of random, just a source of random.

[3] In this case, there are two specialist use cases: those that require
    deterministic results and those that require specific security properties
    that are not satisfied by a userland CSPRNG, because a userland CSPRNG is
    not as secure as /dev/urandom but is able to be much faster.

>  
> I've omitted most of the rest of your response largely because we're
> probably just going to have to agree to differ. I'm probably too worn
> out being annoyed at the way that everything ends up needing to be
> security related, and the needs of people who won't read the docs
> determines API design, to respond clearly and rationally :-(
>  
> Paul
>  

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA