[Python-ideas] PEP 506: some thoughts on the output length of token_bytes (and why I think it should be split in half)

Sat Mar 10 10:35:47 EST 2018

Hi all,

PEP 506, which introduces the secrets module, states the following:

    Default arguments

    One difficult question is "How many bytes should my token be?". We can
help with this question by providing a
    default amount of entropy for the "token_*" functions. If the nbytes
argument is None or not given, the default
    entropy will be used. This default value should be large enough to be
expected to be secure for medium-security
    uses, but is expected to change in the future, possibly even in a
maintenance release [14].

Currently, the default output size of the secrets.token_* functions is 32
bytes. This is definitely more than enough to
protect against brute-force attacks in almost any feasible scenario, and
that is a good thing. It's always better to
pick a conservative default. If smaller tokens could introduce a security
risk in 1% of use cases, I agree smaller
tokens should not be used.

However, I would like to argue that 32-byte tokens are unnecessarily long
for far more than 99% of use cases. Instead,
I'd suggest using a default length of 16 bytes (128 bits), which already
has a very high security margin.

Brute-force attacks and cryptography
------------------------------------

Most cryptographic systems are inherently vulnerable to brute-force
attacks. For example, if you were somehow capable
of performing 2^128 AES decryption operations, you would be able to crack
AES. If you could perform the same amount of
SHA256 hash iterations, you could find a hash collision.

Doing 2^128 computations of any kind is completely and utterly infeasible.
All (super-)computers in the world
combined would not even come close in a million years. That's why
cryptographers typically use this 128 bits
"security level" as a de facto standard for how many brute-force operations
a cryptosystem should tolerate. This
influences properties such as key length and signature sizes.

However, there is an important difference between "offline" and "online"
brute-force attacks. In an online scenario, an
attacker can not immediately determine whether their guess was correct;
they first have to submit the result to another
party (e.g. a server) which can either accept or reject it. In such a case,
it is acceptable when an attacker needs to
perform 2^64 operations in order to break the system (cryptographic
authenticator CBC-MAC-AES can be broken after this
many forgery attempt, for example). Note that when 2^64 guesses will break
the system, an attacker doing random guesses
is expected to perform the correct guess halfway through (i.e. after about
2^63 attempts).

Basically, a 128 bit security level entails the following assumptions:

- An attacker is not capable of performing 2^64 online guesses.
- An attacker (without a quantum computer) is not capable of performing
2^128 offline guesses.
- An attacker with a large quantum computer, capable of performing Grover's
algorithm, is not capable of doing a search
  through a space of 2^256 elements (which is why quantum-resistant
symmetric ciphers use 32-byte keys).

The birthday paradox
--------------------

When you keep drawing random integers between 0 and 2^N, it will take about
sqrt(2^N) = 2^(N/2) draws before you
encounter a number a second time (see
https://en.wikipedia.org/wiki/Birthday_problem#An_upper_bound). This is
relevant
for hash collisions, for example, and the reason we have SHA256 instead of
SHA128.

If a system would allow an attacker to create a resource that is identified
by a 64-bit UUID, they would only need to
create about 2^32 resources (not infeasible) before they would have two
that share an identifier, which could lead to all
sorts of problems. So this is something that should be taken into account
when generating random tokens.

Use-cases of secret tokens
--------------------------

Let's consider some of the typical security-sensitive applications for
unguessable tokens, and see how long tokens
should be in order to achieve the security level described above:

Authenticity proof
~~~~~~~~~~~~~~~~~~

A user needs to prove that they know a specific secret value, such as a
generated password, license code or
anti-CSRF token.

An 8-byte random secret is sufficient here: guessing the secret only has a
1 in 2^64 chance of success.

Secret index
~~~~~~~~~~~~

The user provides a secret which identifies them or a certain resource. The
application stores many different secret
indices, and someone who knows secret A should not be able to guess secret
B. These secrets could e.g. be session
identifiers, password reset tokens, or secret URL's,

16 random bytes are certainly sufficient here, even in the extreme case
that a server would store 2^64 (more than nine
quintillion) records; since the chance of the attacker guessing one of them
is again 1 in 2^64.

Nonces and salts
~~~~~~~~~~~~~~~~

A nonce is a value that should only be used once, and is often used for
symmetric ciphers or for replay attack protection.
Salts used for password hashing are also an example of nonces. Seeds for
deterministic CSRNG's can also be considered
as (secret) nonces.

When nonces are randomly generated (the easiest way to come up with a
unique one), making them 16 bytes long is
sufficient: due to the birthday problem one would need to generate about
2^64 of them before a collision occurs.

Cryptographic keys
~~~~~~~~~~~~~~~~~~

The required length of a key depends on the cryptographic system for which
it is used. If a crypto library does not
enforce length requirements, I'd consider that a bug in the library.

Well, okay. The HMAC algorithm poses no length requirements on keys and the
Python library reflects that. For HMAC, 16
byte keys are okay, unless you want post-quantum security (which you won't
get anyway when you use HMAC-MD5, which is
still the default).

Except for the far-fetched post-quantum HMAC scenario, I can't really think
of a reasonably realistic situation where
a security issue is introduced because the result of secrets.token_bytes is
16 bytes long instead of 32. Can anyone
else here think of an example?

Wouldn't changing this result in compatibility issues?
------------------------------------------------------

The docs describe the default value of nbytes as "a reasonable default" and
state that
"That default is subject to change at any time, including during
maintenance releases.". Since the secrets module is
still relatively new, this would be a good moment to change it, right?

32 > 16, so what's the problem?
-------------------------------

First of all, I'd like to point out that the three examples in the docs
(https://docs.python.org/3.6/library/secrets.html#generating-tokens)
explicitly generate 16-byte tokens. If the current
default is the most secure solution, why deviate from it here?

I think there a quite some situations where unnecessarily long codes can
cause usability problems: when using a system
where users have to manually type in a random code (I frequently have to do
that when using the password manager on my
phone, for example) it's nice if you can save half of the time they have to
spend on that. Shorter codes can also be
converted to smaller QR codes, and to nicer URLs.

Sure, a user can always choose to just pick a value for nbytes and set
their own length. However, if they want to pick
a secure value, they are being tasked with considering the implications. A
user should be able to rely on the library
picking a good secure default.

I think security functionality should be made as ergonomic as possible,
without compromising security. When security
features unnecessarily create usability issues, people are discouraged from
using them in the first place.

So yeah, that was my argument for making tokens 16 bytes long instead of
32. Please let me know if and why I am
completely wrong:P

Regards,
Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20180310/54bd5642/attachment-0001.html>