[Python-ideas] Globally configurable random number generation

Nick Coghlan ncoghlan at gmail.com
Mon Sep 14 15:32:17 CEST 2015


This is an expansion of the random module enhancement idea I
previously posted to Donald's thread:
https://mail.python.org/pipermail/python-ideas/2015-September/035969.html

I'll write it up as a full PEP later, but I think it's just as useful
in this form for now.

= Defining the problem =

We're moving into an era where the easiest way to publish software is
as a web application, with "deployment" to client systems done at
runtime via a web browser. It's regularly the case that "learn to
program" classes (especially those aimed at adults picking up
programming for the first time) will introduce folks to both a web
development framework and how to deploy web applications on a
developer focused service with a free hosting tier, like Heroku or
OpenShift.

It's also the case that we live in an era where there's a lot of
well-intentioned-but-actually-bad advice on the internet when it comes
to generating security sensitive tokens, and the folks receiving that
advice through forums like Stack Overflow aren't necessarily ever
going to see the "don't do that" guidance in the standard library's
random module documentation, or the docs for the cryptography library,
or the docs for a web framework like Flask, Django or Pyramid.

One of the ways we know that many folks doing web development don't
take admonitions in documentation seriously is that one of the most
popular web servers for Python on these kinds of services is Django's
"runserver", even though Django's docs specifically say only to use
that for local development. It isn't OK to say "the developers
deserve the consequences that come to them", as in many cases it isn't
the developers that suffer the consequences, but the users of their
applications.

One reason we know weak RNGs can be a problem in practice is that
the same kind of concern exists in PHP web applications, where
https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf
shows how the relative predictability of password reset tokens can be
used to compromise administrator accounts.

Rather than playing whack-a-mole with individual web applications (many
of which will be written by inexperienced developers), or attempting
to demonstrate that a deterministic PRNG is "secure enough" for these
use cases (when the research on PHP and deterministic PRNGs in general
indicates that it isn't), it is proposed to migrate Python to a
default random implementation that *is* known to be secure enough for
these kinds of use cases.

At the same time, deterministic random number generation is still
desirable in many situations, and we also don't want to require that
folks learning Python in the future take a crash course in web
application security theory first. Thus, it is also proposed that the
abstraction used to present these differences to end users minimise
the references to the underlying security concepts.

A key outcome of this proposal is that it will retroactively upgrade a
lot of existing instructions on the internet for generating default
passwords and other sensitive tokens in Python from "actively harmful"
to "not necessarily ideal, but at least not wrong if you're using
Python 3.6+".
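
To make the "existing instructions" concrete, here is a minimal sketch
of the kind of recipe commonly found online, next to a variant that
already draws from the system RNG today. The function names and the
alphabet are illustrative assumptions, not taken from any particular
tutorial.

```python
import random
import string

# Illustrative alphabet; real recipes vary.
ALPHABET = string.ascii_letters + string.digits


def make_password_commonly_advised(length=12):
    # Uses the module level functions, i.e. the default instance
    # (the deterministic Mersenne Twister at the time of writing).
    return "".join(random.choice(ALPHABET) for _ in range(length))


def make_password_system(length=12):
    # Same recipe, but explicitly backed by the operating system's RNG.
    rng = random.SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(length))
```

Under the proposal, the first function would stop being a problem
without any change to its source, because only the default instance
behind random.choice() changes.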

This *is* a compatibility break for the sake of correcting default
behaviours that are fine when developing applications for local use,
but problematic from a network service security perspective, just as
happened with the introduction of hash randomisation. Unlike the hash
randomisation change, this one is readily addressed in old versions on
a case by case basis, so it is only proposed to make the change in a
future feature release of Python, not in any current maintenance
releases.

= Core abstraction =

The core concept of this proposal involves classifying random number
generators in Python as follows:

* seedable
* seedless
* system

These terms are chosen to make sense to folks that have *no idea*
about the way different kinds of random number generator work and how
that affects their security properties, but do know whether or not
they need to be able to pass in a particular fixed seed in order to
regenerate the same series of outputs.

The guidance to Python users is then:

* we use the seedless RNG by default as it provides the best balance
of speed and security
* if you need to be able to exactly reproduce output sequences, use
the seedable RNG
* if you know you're doing security sensitive work, use the system RNG
directly to eliminate Python's seedless RNG as a potential source of
vulnerabilities
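
As a rough illustration using only what the standard library offers
today, the seedable and system categories correspond to random.Random
and random.SystemRandom. The seedless category has no current
equivalent, so it is not shown. The seed value and variable names are
arbitrary choices for this sketch.

```python
import random

# Seedable: two generators given the same seed produce the same outputs.
first = random.Random(12345)
second = random.Random(12345)
seedable_a = [first.randrange(100) for _ in range(5)]
seedable_b = [second.randrange(100) for _ in range(5)]

# System: every call is served by the operating system, and there is
# no seed that would let anyone replay the sequence.
system_rng = random.SystemRandom()
token = "%032x" % system_rng.getrandbits(128)
```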

Importantly, there are relatively simple answers to the following two
questions (which could be added to the Design FAQ):

Q: Why isn't the seedable RNG the default random implementation (any more)?
A: The same properties that make it possible to provide an explicit
seed to the seedable RNG and get a predictable series of outputs make
it inappropriate for tasks like generating session IDs and password
reset tokens in web applications. Since folks continued to use the
default RNG for those cases, even after years of the core development
team, web framework developers and security engineers saying "Don't do
that, use the system RNG instead", we eventually changed the default
behaviour to just make those cases OK.

Q: Why isn't the system RNG the default implementation?
A: Due to the way operating systems work, calling into the kernel to
get a random number is always going to be slower than generating one
within the Python runtime. The default seedless generator provides
most of the same benefits as using the system RNG directly, but is an
order of magnitude faster as it doesn't need to call into the kernel
as often.

= Proposed change for Python 3.6 =

* add a random.SeedlessRandom API that omits the seed(), getstate()
and setstate() methods and uses a cryptographically secure PRNG
internally (such as the ChaCha20 algorithm implemented by OpenBSD)
* rename random.Random to random.SeedableRandom
* make random.Random a subclass of SeedableRandom that deprecates
seed(), getstate() and setstate()
* deprecate the seed(), getstate() and setstate() methods on SystemRandom
* expose the global SeedableRandom instance as random.seedable_random
* expose a global SeedlessRandom instance as random.seedless_random
* expose a global SystemRandom instance as random.system_random
* provide a random.set_default_instance() API that makes it possible
to specify the instance used by the module level methods
* the module level seed(), getstate(), and setstate() functions will
raise RuntimeError if the corresponding method is missing from the
default instance

In 3.6, "random.set_default_instance(random.seedless_random)" will opt
in to the CSPRNG when using the module level functions process wide,
while "from random import seedless_random as random" will do so on a
module by module basis.

"from random import system_random as random" also becomes available as
a simple upgrade path for security sensitive modules.
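
The dispatch behaviour described in the list above could look roughly
like the following sketch. None of these names exist in the random
module today: set_default_instance() and the global instances follow
the proposal, while SeedlessStandIn, module_random() and module_seed()
are hypothetical helpers for this example. The stand-in simply wraps
SystemRandom so that it has no seed() method; it is not the proposed
CSPRNG.

```python
import random as _stdlib_random


class SeedlessStandIn:
    """Hypothetical stand-in that exposes no seed()/getstate()/setstate()."""

    def __init__(self):
        self._rng = _stdlib_random.SystemRandom()

    def random(self):
        return self._rng.random()


seedable_random = _stdlib_random.Random()
seedless_random = SeedlessStandIn()

# The 3.6 default in the proposal is still the seedable instance.
_default = seedable_random


def set_default_instance(instance):
    """Choose the instance used by the module level functions."""
    global _default
    _default = instance


def module_random():
    # Stands in for the module level random() function.
    return _default.random()


def module_seed(*args):
    # Stands in for the module level seed() function.
    method = getattr(_default, "seed", None)
    if method is None:
        raise RuntimeError("default random instance does not support seed()")
    return method(*args)
```

With this shape, code that never calls seed() keeps working after
set_default_instance(seedless_random), while code that depends on
seeding fails loudly instead of silently losing reproducibility.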

Appropriate helpers would be added to the six and future projects to
allow single source Python 2/3 projects to easily cope with the change
in behaviour when using the seedable RNG for its intended purposes. For
many projects, compatibility code will consist of the following lines
in a compatibility module:

    try:
        from random import seedable_random as random
    except ImportError:
        import random

It would also be desirable for the seedless random number generator to
be made available as a PyPI package for use on older Python versions.

= Proposed change for Python 3.7 =

* random.Random becomes an alias for random.SeedlessRandom
* the default instance changes to be random.seedless_random

In 3.7, "random.set_default_instance(random.seedable_random)" will opt
back in to the deterministic PRNG when using the module level
functions process wide, while "from random import seedable_random as
random" will do so on a module by module basis.

= Seedable random number generation =

This is what we have today. The Mersenne Twister (MT) implementation
supports explicit seeding, state retrieval, and state restoration. It
doesn't automatically mix in additional system entropy as it operates.

This is the right choice for use cases like computer games, map
generation, and randomising the order of test execution, as in these
situations, it's desirable to be able to reproduce a past sequence
exactly.
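
For reference, the three operations mentioned above look like this
with the existing random.Random class. The seed value and variable
names are arbitrary.

```python
import random

rng = random.Random(2015)            # explicit seeding
first = [rng.random() for _ in range(3)]

saved = rng.getstate()               # state retrieval
second = [rng.random() for _ in range(3)]

rng.setstate(saved)                  # state restoration
replayed = [rng.random() for _ in range(3)]   # same values as `second`

rng.seed(2015)                       # reseeding restarts the sequence
restarted = [rng.random() for _ in range(3)]  # same values as `first`
```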

= Seedless random number generators =

This is the key proposed new addition: a cryptographically secure,
non-deterministic, userspace PRNG. It's faster than the system RNG as
it avoids the need to make a system API call.

The "seedless" name comes from the fact that the inability to feed in
a fixed seed is the most obvious API difference relative to
deterministic RNGs, and hence provides a mental hook for people to
remember which is which, without needing to know the relevant
background security theory (which is arcane enough to be opaque even
to developers with decades of experience and hence isn't something we
want to be inflicting on folks in the process of learning to program).
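
To show the intended API shape only, here is a sketch of a generator
with no seed(), getstate() or setstate(). It is an assumption-laden
stand-in: rather than implementing ChaCha20, it buffers bytes from
os.urandom() so that most calls are served from userspace, and it
ignores concerns such as fork safety that a real implementation would
have to handle. The class name and buffer size are hypothetical.

```python
import os


class BufferedSeedlessRandom:
    """Sketch only: serves random bits from a buffer of os.urandom() bytes."""

    def __init__(self, buffer_size=4096):
        self._buffer_size = buffer_size
        self._buffer = b""
        self._offset = 0

    def _take(self, nbytes):
        # Hand out buffered bytes, refilling from the kernel when empty.
        out = bytearray()
        while len(out) < nbytes:
            if self._offset >= len(self._buffer):
                self._buffer = os.urandom(self._buffer_size)
                self._offset = 0
            need = nbytes - len(out)
            chunk = self._buffer[self._offset:self._offset + need]
            self._offset += len(chunk)
            out += chunk
        return bytes(out)

    def getrandbits(self, k):
        if k <= 0:
            raise ValueError("number of bits must be greater than zero")
        nbytes = (k + 7) // 8
        value = int.from_bytes(self._take(nbytes), "big")
        # Drop the surplus low bits so exactly k bits remain.
        return value >> (nbytes * 8 - k)

    def random(self):
        # 53 random bits mapped onto a float in [0.0, 1.0).
        return self.getrandbits(53) / (1 << 53)
```

Because the class defines no seed() at all, hasattr(rng, "seed") is
False, which is the "mental hook" the name is meant to provide.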

= System random number generator =

The only proposed change here is providing a default instance to
enable the "from random import system_random as random" pattern.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

