I moved my os.urandom PEP to the regular GitHub repository so it gets
available on python.org:
Changes since version 2:
* add os.getrandom()
* random.Random now always use non-blocking system urandom and so
remove the "Never use blocking urandom in the random module" section
* add a new "Examples using os.getrandom()" section
* various fixes and minor updates
Title: Make os.urandom() blocking on Linux
Author: Victor Stinner <victor.stinner(a)gmail.com>
Type: Standards Track
Modify ``os.urandom()`` to block on Linux 3.17 and newer until the OS
urandom is initialized to increase the security.
Add also a new ``os.getrandom()`` function (for Linux and Solaris) to be
able to choose how to handle when ``os.urandom()`` is going to block on
Python 3.5.0 was enhanced to use the new ``getrandom()`` syscall
introduced in Linux 3.17 and Solaris 11.3. The problem is that users
started to complain that Python 3.5 blocks at startup on Linux in
virtual machines and embedded devices: see issues `#25420
<http://bugs.python.org/issue25420>`_ and `#26839
On Linux, ``getrandom(0)`` blocks until the kernel initialized urandom
with 128 bits of entropy. The issue #25420 describes a Linux build
platform blocking at ``import random``. The issue #26839 describes a
short Python script used to compute a MD5 hash, systemd-cron, script
called very early in the init process. The system initialization blocks
on this script which blocks on ``getrandom(0)`` to initialize Python.
The Python initilization requires random bytes to implement a
counter-measure against the hash denial-of-service (hash DoS), see:
* `Issue #13703: Hash collision security issue
* `PEP 456: Secure and interchangeable hash algorithm
Importing the ``random`` module creates an instance of
``random.Random``: ``random._inst``. On Python 3.5, random.Random
constructor reads 2500 bytes from ``os.urandom()`` to seed a Mersenne
Twister RNG (random number generator).
Other platforms may be affected by this bug, but in practice, only Linux
systems use Python scripts to initialize the system.
Status in Python 3.5.2
Python 3.5.2 behaves like Python 2.7 and Python 3.4. If the system
urandom is not initialized, the startup does not block, but
``os.urandom()`` can return low-quality entropy (even it is not easily
The following use cases are used to help to choose the right compromise
between security and practicability.
Use Case 1: init script
Use a Python 3 script to initialize the system, like systemd-cron. If
the script blocks, the system initialize is stuck too. The issue #26839
is a good example of this use case.
Use case 1.1: No secret needed
If the init script doesn't have to generate any secure secret, this use
case is already handled correctly in Python 3.5.2: Python startup
doesn't block on system urandom anymore.
Use case 1.2: Secure secret required
If the init script has to generate a secure secret, there is no safe
Falling back to weak entropy is not acceptable, it would
reduce the security of the program.
Python cannot produce itself secure entropy, it can only wait until
system urandom is initialized. But in this use case, the whole system
initialization is blocked by this script, so the system fails to boot.
The real answer is that the system iniitalization must not be blocked by
such script. It is ok to start the script very early at system
initialization, but the script may blocked a few seconds until it is
able to generate the secret.
Reminder: in some cases, the initialization of the system urandom never
occurs and so programs waiting for system urandom blocks forever.
Use Case 2: Web server
Run a Python 3 web server serving web pages using HTTP and HTTPS
protocols. The server is started as soon as possible.
The first target of the hash DoS attack was web server: it's important
that the hash secret cannot be easily guessed by an attacker.
If serving a web page needs a secret to create a cookie, create an
encryption key, ..., the secret must be created with good entropy:
again, it must be hard to guess the secret.
A web server requires security. If a choice must be made between
security and running the server with weak entropy, security is more
important. If there is no good entropy: the server must block or fail
with an error.
The question is if it makes sense to start a web server on a host before
system urandom is initialized.
The issues #25420 and #26839 are restricted to the Python startup, not
to generate a secret before the system urandom is initialized.
Fix system urandom
Load entropy from disk at boot
Collecting entropy can take up to several minutes. To accelerate the
system initialization, operating systems store entropy on disk at
shutdown, and then reload entropy from disk at the boot.
If a system collects enough entropy at least once, the system urandom
will be initialized quickly, as soon as the entropy is reloaded from
Virtual machines don't have a direct access to the hardware and so have
less sources of entropy than bare metal. A solution is to add a
<https://fedoraproject.org/wiki/Features/Virtio_RNG>`_ to pass entropy
from the host to the virtual machine.
A solution for embedded devices is to plug an hardware RNG.
For example, Raspberry Pi have an hardware RNG but it's not used by
default. See: `Hardware RNG on Raspberry Pi
Denial-of-service when reading random
Don't use /dev/random but /dev/urandom
The ``/dev/random`` device should only used for very specific use cases.
Reading from ``/dev/random`` on Linux is likely to block. Users don't
like when an application blocks longer than 5 seconds to generate a
secret. It is only expected for specific cases like generating
explicitly an encryption key.
When the system has no available entropy, choosing between blocking
until entropy is available or falling back on lower quality entropy is a
matter of compromise between security and practicability. The choice
depends on the use case.
On Linux, ``/dev/urandom`` is secure, it should be used instead of
``/dev/random``. See `Myths about /dev/urandom
<http://www.2uo.de/myths-about-urandom/>`_ by Thomas Hühn: "Fact:
/dev/urandom is the preferred source of cryptographic randomness on
getrandom(size, 0) can block forever on Linux
The origin of the Python issue #26839 is the `Debian bug
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=822431>`_: in fact,
``getrandom(size, 0)`` blocks forever on the virtual machine. The system
succeeded to boot because systemd killed the blocked process after 90
Solutions like `Load entropy from disk at boot`_ reduces the risk of
On Linux, reading the ``/dev/urandom`` can return "weak" entropy before
urandom is fully initialized, before the kernel collected 128 bits of
entropy. Linux 3.17 adds a new ``getrandom()`` syscall which allows to
block until urandom is initialized.
On Python 3.5.2, os.urandom() uses the
``getrandom(size, GRND_NONBLOCK)``, but falls back on reading the
non-blocking ``/dev/urandom`` if ``getrandom(size, GRND_NONBLOCK)``
fails with ``EAGAIN``.
Security experts promotes ``os.urandom()`` to genereate cryptographic
keys because it is implemented with a `Cryptographically secure
pseudo-random number generator (CSPRNG)
By the way, ``os.urandom()`` is preferred over ``ssl.RAND_bytes()`` for
This PEP proposes to modify os.urandom() to use ``getrandom()`` in
blocking mode to not return weak entropy, but also ensure that Python
will not block at startup.
Make os.urandom() blocking on Linux
All changes described in this section are specific to the Linux
* Modify os.urandom() to block until system urandom is initialized:
``os.urandom()`` (C function ``_PyOS_URandom()``) is modified to
always call ``getrandom(size, 0)`` (blocking mode) on Linux and
* Add a new private ``_PyOS_URandom_Nonblocking()`` function: try to
call ``getrandom(size, GRND_NONBLOCK)`` on Linux and Solaris, but
falls back on reading ``/dev/urandom`` if it fails with ``EAGAIN``.
* Initialize hash secret from non-blocking system urandom:
``_PyRandom_Init()`` is modified to call
* ``random.Random`` constructor now uses non-blocking system urandom: it
is modified to use internally the new ``_PyOS_URandom_Nonblocking()``
function to seed the RNG.
Add a new os.getrandom() function
A new ``os.getrandom(size, flags=0)`` function is added: use
``getrandom()`` syscall on Linux and ``getrandom()`` C function on
The function comes with 2 new flags:
* ``os.GRND_RANDOM``: read bytes from ``/dev/random`` rather than
* ``os.GRND_NONBLOCK``: raise a BlockingIOError if ``os.getrandom()``
The ``os.getrandom()`` is a thin wrapper on the ``getrandom()``
syscall/C function and so inherit of its behaviour. For example, on
Linux, it can return less bytes than requested if the syscall is
interrupted by a signal.
Examples using os.getrandom()
Example of a portable non-blocking RNG function: try to get random bytes
from the OS urandom, or fallback on the random module.
# getrandom() is only available on Linux and Solaris
if not hasattr(os, 'getrandom'):
result = bytearray()
# need a loop because getrandom() can return less bytes than
# requested for different reasons
data = os.getrandom(size, os.GRND_NONBLOCK)
result += data
size -= len(data)
# OS urandom is not initialized yet:
# fallback on the Python random module
data = bytes(random.randrange(256) for byte in range(size))
result += data
This function *can* block in theory on a platform where
``os.getrandom()`` is not available but ``os.urandom()`` can block.
Example of function waiting *timeout* seconds until the OS urandom is
initialized on Linux or Solaris::
def wait_for_system_rng(timeout, interval=1.0):
if not hasattr(os, 'getrandom'):
deadline = time.monotonic() + timeout
if time.monotonic() > deadline:
raise Exception('OS urandom not initialized after %s seconds'
This function is *not* portable. For example, ``os.urandom()`` can block
on FreeBSD in theory, at the early stage of the system initialization.
Create a best-effort RNG
Simpler example to create a non-blocking RNG on Linux: choose between
``Random.SystemRandom`` and ``Random.Random`` depending if
``getrandom(size)`` would block.
if not hasattr(os, 'getrandom'):
This function is *not* portable. For example, ``random.SystemRandom``
can block on FreeBSD in theory, at the early stage of the system
Leave os.urandom() unchanged, add os.getrandom()
os.urandom() remains unchanged: never block, but it can return weak
entropy if system urandom is not initialized yet.
Only add the new ``os.getrandom()`` function (wrapper to the
``getrandom()`` syscall/C function).
The ``secrets.token_bytes()`` function should be used to write portable
The problem with this change is that it expects that users understand
well security and know well each platforms. Python has the tradition of
hiding "implementation details". For example, ``os.urandom()`` is not a
thin wrapper to the ``/dev/urandom`` device: it uses
``CryptGenRandom()`` on Windows, it uses ``getentropy()`` on OpenBSD, it
tries ``getrandom()`` on Linux and Solaris or falls back on reading
``/dev/urandom``. Python already uses the best available system RNG
depending on the platform.
This PEP does not change the API:
* ``os.urandom()``, ``random.SystemRandom`` and ``secrets`` for security
* ``random`` module (except ``random.SystemRandom``) for all other usages
Raise BlockingIOError in os.urandom()
`PEP 522: Allow BlockingIOError in security sensitive APIs on Linux
Python should not decide for the developer how to handle `The bug`_:
raising immediatly a ``BlockingIOError`` if ``os.urandom()`` is going to
block allows developers to choose how to handle this case:
* catch the exception and falls back to a non-secure entropy source:
read ``/dev/urandom`` on Linux, use the Python ``random`` module
(which is not secure at all), use time, use process identifier, etc.
* don't catch the error, the whole program fails with this fatal
More generally, the exception helps to notify when sometimes goes wrong.
The application can emit a warning when it starts to wait for
For the use case 2 (web server), falling back on non-secure entropy is
not acceptable. The application must handle ``BlockingIOError``: poll
``os.urandom()`` until it completes. Example::
print("Wait for system urandom initialiation: move your "
"mouse, use your keyboard, use your disk, ...")
# Avoid busy-loop: sleep 1 ms
For correctness, all applications which must generate a secure secret
must be modified to handle ``BlockingIOError`` even if `The bug`_ is
The case of applications using ``os.urandom()`` but don't really require
security is not well defined. Maybe these applications should not use
``os.urandom()`` at the first place, but always the non-blocking
``random`` module. If ``os.urandom()`` is used for security, we are back
to the use case 2 described above: `Use Case 2: Web server`_. If a
developer doesn't want to drop ``os.urandom()``, the code should be
return bytes(random.randrange(256) for byte in range(n))
The question is if `The bug`_ is common enough to require that so many
applications have to be modified.
Another simpler choice is to refuse to start before the system urandom
print("Fatal error: the system urandom is not initialized")
print("Wait a bit, and rerun the program later.")
Compared to Python 2.7, Python 3.4 and Python 3.5.2 where os.urandom()
never blocks nor raise an exception on Linux, such behaviour change can
be seen as a major regression.
Add an optional block parameter to os.urandom()
See the `issue #27250: Add os.urandom_block()
Add an optional block parameter to os.urandom(). The default value may
be ``True`` (block by default) or ``False`` (non-blocking).
The first technical issue is to implement ``os.urandom(block=False)`` on
all platforms. Only Linux 3.17 (and newer) and Solaris 11.3 (and newer)
have a well defined non-blocking API (``getrandom(size,
As `Raise BlockingIOError in os.urandom()`_, it doesn't seem worth it to
make the API more complex for a theorical (or at least very rare) use
As `Leave os.urandom() unchanged, add os.getrandom()`_, the problem is
that it makes the API more complex and so more error-prone.
Operating system random functions
``os.urandom()`` uses the following functions:
* `OpenBSD: getentropy()
* `Linux: getrandom()
<http://man7.org/linux/man-pages/man2/getrandom.2.html>`_ (Linux 3.17)
-- see also `A system call for random numbers: getrandom()
* Solaris: `getentropy()
(both need Solaris 11.3)
* UNIX, BSD: /dev/urandom, /dev/random
* Windows: `CryptGenRandom()
On Linux, commands to get the status of ``/dev/random`` (results are
number of bytes)::
$ cat /proc/sys/kernel/random/entropy_avail
$ cat /proc/sys/kernel/random/poolsize
Why using os.urandom()?
Since ``os.urandom()`` is implemented in the kernel, it doesn't have
issues of user-space RNG. For example, it is much harder to get its
state. It is usually built on a CSPRNG, so even if its state is
"stolen", it is hard to compute previously generated numbers. The kernel
has a good knowledge of entropy sources and feed regulary the entropy
That's also why ``os.urandom()`` is preferred over ``ssl.RAND_bytes()``.
This document has been placed in the public domain.
I read the summary of Christian Heimes's talk at the language summit:
"The Python security response team"
Extract: "Some of the problems that have occurred are things like bug
reports being sent to the list, but that couldn't be reproduced, or
distributions not updating their Python packages because it wasn't
clear to them that there was a security fix made in an upstream
release. Heimes suggested that security fixes be clearly marked in the
"News" file that accompanies releases."
I suggest to add a new Security section to Misc/NEWS. So packagers
should be able to quickly identify changes which should be backported
(if they maintain a Python version which is no more supported
upstream, or if you cannot use the latest version).
Christian proposed to simply prefix changes with "[Security]".
What do you think?
I think I've found a better explanation than mere timing or
coincidence for why we haven't had any problem reports regarding
Python 3.5.1 blocking in Fedora or in the Software Collections builds
for RHEL and CentOS: those builds currently aren't even trying to use
the new syscall, and are instead always using the older non-blocking
My assumption in https://bugzilla.redhat.com/show_bug.cgi?id=1350123
is that this is due to those binaries being built against a version of
the kernel that doesn't have that syscall defined, which means the
config script doesn't define HAVE_GETRANDOM_SYSCALL, which means we
compile out the code that tries calling it at runtime.
Does skipping trying the new syscall at runtime just because the build
server is running an older kernel actually make sense? Or we would be
better off defining a different TRY_GETRANDOM_SYSCALL that looks for
some other indicator that this is a build for a platform where
getrandom() might be available at runtime, even if it's not available
at build time.
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia