PEP 506: some thoughts on the output length of token_bytes (and why I think it should be split in half)

Hi all,

PEP 506, which introduces the secrets module, states the following:

    Default arguments

    One difficult question is "How many bytes should my token be?". We can help with this question by providing a default amount of entropy for the "token_*" functions. If the nbytes argument is None or not given, the default entropy will be used. This default value should be large enough to be expected to be secure for medium-security uses, but is expected to change in the future, possibly even in a maintenance release [14].

Currently, the default output size of the secrets.token_* functions is 32 bytes. This is definitely more than enough to protect against brute-force attacks in almost any feasible scenario, and that is a good thing. It's always better to pick a conservative default. If smaller tokens could introduce a security risk in 1% of use cases, I agree smaller tokens should not be used.

However, I would like to argue that 32-byte tokens are unnecessarily long for far more than 99% of use cases. Instead, I'd suggest using a default length of 16 bytes (128 bits), which already has a very high security margin.

Brute-force attacks and cryptography
------------------------------------

Most cryptographic systems are inherently vulnerable to brute-force attacks. For example, if you were somehow capable of performing 2^128 AES decryption operations, you would be able to crack AES. If you could perform the same number of SHA256 hash iterations, you could find a hash collision.

Doing 2^128 computations of any kind is completely and utterly infeasible. All (super-)computers in the world combined would not even come close in a million years. That's why cryptographers typically use this 128-bit "security level" as a de facto standard for how many brute-force operations a cryptosystem should tolerate. This influences properties such as key length and signature sizes.

However, there is an important difference between "offline" and "online" brute-force attacks.
In an online scenario, an attacker cannot immediately determine whether their guess was correct; they first have to submit the result to another party (e.g. a server), which can either accept or reject it. In such a case, it is acceptable when an attacker needs to perform 2^64 operations in order to break the system (the cryptographic authenticator CBC-MAC-AES can be broken after this many forgery attempts, for example). Note that when 2^64 guesses will break the system, an attacker doing random guesses is expected to make the correct guess halfway through (i.e. after about 2^63 attempts).

Basically, a 128-bit security level entails the following assumptions:

- An attacker is not capable of performing 2^64 online guesses.
- An attacker (without a quantum computer) is not capable of performing 2^128 offline guesses.
- An attacker with a large quantum computer, capable of performing Grover's algorithm, is not capable of doing a search through a space of 2^256 elements (which is why quantum-resistant symmetric ciphers use 32-byte keys).

The birthday paradox
--------------------

When you keep drawing random integers between 0 and 2^N, it will take about sqrt(2^N) = 2^(N/2) draws before you encounter a number a second time (see https://en.wikipedia.org/wiki/Birthday_problem#An_upper_bound). This is relevant for hash collisions, for example, and the reason we have SHA256 instead of SHA128.

If a system allowed an attacker to create a resource that is identified by a 64-bit UUID, they would only need to create about 2^32 resources (not infeasible) before they would have two that share an identifier, which could lead to all sorts of problems. So this is something that should be taken into account when generating random tokens.
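
The 2^(N/2) rule of thumb above can be checked numerically. The following is a minimal sketch, assuming the usual approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^N)) for the chance of at least one repeat among n uniform draws from 2^N values; the function name `collision_probability` is made up for this illustration.

```python
import math


def collision_probability(n_draws: int, bits: int) -> float:
    """Approximate chance of at least one repeated value among n_draws
    uniform draws from a space of 2**bits values (birthday approximation)."""
    space = 2 ** bits
    exponent = -n_draws * (n_draws - 1) / (2 * space)
    # -expm1(x) equals 1 - exp(x), but stays accurate when x is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # 2**32 identifiers drawn from a 64-bit space: a collision is likely.
    print(collision_probability(2 ** 32, 64))
    # The same number of identifiers drawn from a 128-bit space: negligible.
    print(collision_probability(2 ** 32, 128))
    # Only around 2**64 draws does a 128-bit space become collision-prone.
    print(collision_probability(2 ** 64, 128))
```

With n = 2^(N/2) the exponent is close to -1/2, so the probability comes out near 0.39 regardless of N, which is why the square root of the space is the relevant threshold.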

Use-cases of secret tokens
--------------------------

Let's consider some of the typical security-sensitive applications for unguessable tokens, and see how long tokens should be in order to achieve the security level described above:

Authenticity proof
~~~~~~~~~~~~~~~~~~

A user needs to prove that they know a specific secret value, such as a generated password, license code or anti-CSRF token. An 8-byte random secret is sufficient here: guessing the secret only has a 1 in 2^64 chance of success.

Secret index
~~~~~~~~~~~~

The user provides a secret which identifies them or a certain resource. The application stores many different secret indices, and someone who knows secret A should not be able to guess secret B. These secrets could e.g. be session identifiers, password reset tokens, or secret URLs. 16 random bytes are certainly sufficient here, even in the extreme case that a server would store 2^64 (more than eighteen quintillion) records, since the chance of the attacker guessing one of them is again 1 in 2^64.

Nonces and salts
~~~~~~~~~~~~~~~~

A nonce is a value that should only be used once, and is often used for symmetric ciphers or for replay attack protection. Salts used for password hashing are also an example of nonces. Seeds for deterministic CSPRNGs can also be considered as (secret) nonces. When nonces are randomly generated (the easiest way to come up with a unique one), making them 16 bytes long is sufficient: due to the birthday problem one would need to generate about 2^64 of them before a collision occurs.

Cryptographic keys
~~~~~~~~~~~~~~~~~~

The required length of a key depends on the cryptographic system for which it is used. If a crypto library does not enforce length requirements, I'd consider that a bug in the library. Well, okay. The HMAC algorithm poses no length requirements on keys and the Python library reflects that.
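
As an illustration of the HMAC point, here is a small sketch using a 16-byte key from `secrets.token_bytes`. The choice of SHA-256, the placeholder message and the helper name `verify` are assumptions made for this example, not something the thread prescribes; the digest is passed explicitly rather than relying on any default.

```python
import hashlib
import hmac
import secrets

# 128-bit key; the length is chosen explicitly instead of using the default.
key = secrets.token_bytes(16)
message = b"example message"  # placeholder payload for the illustration

# hmac.new accepts keys of any length; the digest is named explicitly here.
tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    print(verify(key, message, tag))           # genuine message
    print(verify(key, b"tampered", tag))       # altered message
```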
For HMAC, 16-byte keys are okay, unless you want post-quantum security (which you won't get anyway when you use HMAC-MD5, which is still the default).

Except for the far-fetched post-quantum HMAC scenario, I can't really think of a reasonably realistic situation where a security issue is introduced because the result of secrets.token_bytes is 16 bytes long instead of 32. Can anyone else here think of an example?

Wouldn't changing this result in compatibility issues?
------------------------------------------------------

The docs describe the default value of nbytes as "a reasonable default" and state that "That default is subject to change at any time, including during maintenance releases." Since the secrets module is still relatively new, this would be a good moment to change it, right?

32 > 16, so what's the problem?
-------------------------------

First of all, I'd like to point out that the three examples in the docs (https://docs.python.org/3.6/library/secrets.html#generating-tokens) explicitly generate 16-byte tokens. If the current default is the most secure solution, why deviate from it here?

I think there are quite a few situations where unnecessarily long codes can cause usability problems: when using a system where users have to manually type in a random code (I frequently have to do that when using the password manager on my phone, for example), it's nice if you can save half of the time they have to spend on that. Shorter codes can also be converted to smaller QR codes, and to nicer URLs.

Sure, a user can always choose to just pick a value for nbytes and set their own length. However, if they want to pick a secure value, they are being tasked with considering the implications. A user should be able to rely on the library picking a good secure default.

I think security functionality should be made as ergonomic as possible, without compromising security.
When security features unnecessarily create usability issues, people are discouraged from using them in the first place.

So yeah, that was my argument for making tokens 16 bytes long instead of 32. Please let me know if and why I am completely wrong :P

Regards,

Tom

On 11 March 2018 at 01:35, Tom Tervoort <tomtervoort@gmail.com> wrote:
1. For readability (as you say, shorter tokens *are* easier to read)
2. For independence from the default length setting

> I think security functionality should be made as ergonomic as possible,

"32 bytes is the conservative default, while 16 bytes is a common explicit override to improve token readability with little or no reduction in pragmatic security" is still pretty easy to use.

With the API token generation examples already explicitly specifying 16 bytes, the only change needed to get to that state would be to amend https://docs.python.org/3/library/secrets.html#how-many-bytes-should-tokens-... to mention the readability question, and the fact that 16 is a reasonable choice in most cases.

With the current setup, the ideal adoption cycle will see someone going:

* It works, yay!
* The tokens are a bit hard to read though, how do I improve that?
* OK, I don't need the super-conservative 32 bytes, 16 bytes is fine, so I'll pass that in explicitly

I don't think you're completely wrong, and I suspect if anyone had gone through this level of detailed analysis prior to 3.6, we might have made the default 128 bits of entropy instead of 256.

As it is though, I think you've made the case for a docs change to make it explicit that 16 bytes of entropy is almost certainly going to be fine for practical purposes (which will benefit all users of 3.6+), but not for a reduction in the actual default (which would require invoking the "Hey, we said we might change it at any time!" caveat on the default length, and we included that caveat because we weren't sure of the potential future security implications of quantum computing, not because 32 byte tokens are harder to read than 16 byte ones).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
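
The "conservative default, explicit override" pattern described in this message amounts to a one-argument change. A minimal sketch using `secrets.token_hex`; the variable names are illustrative, and the length of the default token depends on the module's current default (32 bytes at the time of this thread).

```python
import secrets

# Default: whatever entropy the module currently considers conservative.
default_token = secrets.token_hex()

# Explicit override: 16 bytes (128 bits), rendered as 32 hex characters.
short_token = secrets.token_hex(16)

if __name__ == "__main__":
    print(len(default_token), default_token)
    print(len(short_token), short_token)
```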

On Tue, Mar 13, 2018 at 12:45 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
+1 on the docs change.

Is there value in exposing a couple of guiding constants, secrets.DEFAULT_ENTROPY and secrets.MINIMUM_ENTROPY? The former is already present (and would simply be documented as public), and the latter would be "whatever seems like a basic minimum", such that MIN <= DEFAULT, and the two of them could be increased at any time (together or independently). For applications where readability and practicality are more important than absolute best quality security, secrets.token_hex(secrets.MINIMUM_ENTROPY) would be future-proofed while still being shorter than secrets.token_hex().

Unlike the pure docs change, this would only benefit users of 3.8+ (or maybe 3.7+ if this is considered a trivial change), but it'd give a way to avoid hardcoding a length that might later be considered weak.

ChrisA
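
A rough sketch of how the proposal above might be used. Note that `MINIMUM_ENTROPY` is hypothetical: it is defined locally here with an assumed value of 16 and is not part of the secrets module, whereas `secrets.DEFAULT_ENTROPY` is the existing (undocumented) constant mentioned in the message. The helper name `short_token_hex` is also made up for this example.

```python
import secrets

# Hypothetical constant: NOT provided by the secrets module. The value 16
# is an assumption taken from the 16-byte figure discussed in this thread.
MINIMUM_ENTROPY = 16


def short_token_hex() -> str:
    """Return a hex token using the smaller of the two entropy settings,
    which preserves the proposed invariant MIN <= DEFAULT."""
    nbytes = min(MINIMUM_ENTROPY, secrets.DEFAULT_ENTROPY)
    return secrets.token_hex(nbytes)


if __name__ == "__main__":
    print(short_token_hex())      # shorter, "basic minimum" token
    print(secrets.token_hex())    # conservative default token
```

The point of the design is that callers reference a named constant instead of hardcoding 16, so a later increase of the minimum would propagate without code changes.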

On Sat, 10 Mar 2018 16:35:47 +0100 Tom Tervoort <tomtervoort@gmail.com> wrote:
If you want shorter codes for specific scenarios then it's your responsibility (as an application developer) to adapt the token width *and* ensure that the chosen code length is still non-vulnerable.

I think defaulting to 32 bytes for the Python standard library is good as:

1) it's more future-proof, even in the face of algorithm weaknesses which may make available better-than-brute-force methods in the future
2) it teaches developers the value of having sufficient entropy in tokens

Also I disagree with the claim that 16 bytes is somehow better for usability. It's still a terribly long random string to type by hand and I hope nobody is inflicting that on their users.

(I'm not sure what "smaller QR code" means. Given a QR code is basically a computer analysis-friendly glyph that you show your phone or other device to perform pattern recognition on, why does it matter whether the QR code is "small" or not?)

> A user should be able to rely on the library picking a good secure default.

And that's exactly what the library does, apparently!

Regards

Antoine.

On Tue, Mar 13, 2018 at 1:04 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
A QR code encoding more data requires finer resolution at the same size. That means the camera needs to be closer to it, all else being equal. If you keep stuffing more data into it, eventually the QR code ends up looking like the patented Monitor Grit that we do our best to avoid. :)

ChrisA

On Tue, 13 Mar 2018 01:10:33 +1100 Chris Angelico <rosuav@gmail.com> wrote:
Is that important here? I would expect the user to be (physically) close to the QR code. It's not like a QR code containing secret credentials will be posted on a wall in a random street or subway station (at least I hope so :-)).

Regards

Antoine.

On Tue, Mar 13, 2018 at 2:10 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Depends what you mean by "secret". Let's suppose you host a video sharing site (we'll call it, say, "me tube") and need to create URLs for videos as they get uploaded. These URLs should be impossible to predict, but easy to share. How long do they need to be? If they're encoded using token_urlsafe (base 64), you get six bits of randomness per character of URL; the default entropy looks like http://metube.example/SoO8IclkLFcfPX2pA7okFHdoSrZjKtrAmDdmFvC2O6Y which is going to make a large and complicated QR code that you have to be very close to. But you don't really need these to be THAT secure. It'd be fine to use token_urlsafe(16) to make something like http://metube.example/9IoJVtQrhic4Xi633mJ7MQ; and our nearest competitor uses even shorter URLs like http://youtu.be/B7xai5u_tnk (about equivalent to token_urlsafe(9)).

Let's look at those URLs:

32: http://metube.example/SoO8IclkLFcfPX2pA7okFHdoSrZjKtrAmDdmFvC2O6Y
16: http://metube.example/9IoJVtQrhic4Xi633mJ7MQ
09: http://metube.example/ziCHRKMlr8rX
YT: http://youtu.be/B7xai5u_tnk

Using the 'secrets' module to generate URLs like this isn't wrong; since these URLs have to be unguessable (you shouldn't be able to type http://metube.example/aaaaac and get someone's secret unlisted video), their identifiers have to be functionally equivalent to session IDs and such. And since advertisers *do* want to put links to their videos onto billboards, QR codes are definitely a thing; and companies won't use metube if its competitor's QR codes can be scanned reliably from two platforms across and ours need to be scanned from right up next to it.

As you can see from this analysis, the boundary for "good enough" is incredibly rubbery, but there is definitely value in making shorter URLs.

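
The "six bits per character" arithmetic can be reproduced directly. This is a small sketch, assuming that `secrets.token_urlsafe` returns unpadded URL-safe base64 (which is how the URLs above were produced); the helper name `urlsafe_length` is made up for the illustration, and metube.example is the placeholder host from the message.

```python
import math
import secrets


def urlsafe_length(nbytes: int) -> int:
    """Characters needed to encode nbytes bytes at six bits per character,
    i.e. the length of unpadded base64 output."""
    return math.ceil(nbytes * 8 / 6)


if __name__ == "__main__":
    for nbytes in (32, 16, 9):
        token = secrets.token_urlsafe(nbytes)
        # The generated token should match the predicted length.
        assert len(token) == urlsafe_length(nbytes)
        print(f"{nbytes:02d} bytes, {len(token)} chars: "
              f"http://metube.example/{token}")
```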
32: https://chart.googleapis.com/chart?cht=qr&chl=http%3A%2F%2Fmetube.example%2FSoO8IclkLFcfPX2pA7okFHdoSrZjKtrAmDdmFvC2O6Y&chs=180x180&choe=UTF-8&chld=L|2
16: https://chart.googleapis.com/chart?cht=qr&chl=http%3A%2F%2Fmetube.example%2F9IoJVtQrhic4Xi633mJ7MQ&chs=180x180&choe=UTF-8&chld=L|2
09: https://chart.googleapis.com/chart?cht=qr&chl=http%3A%2F%2Fmetube.example%2FziCHRKMlr8rX&chs=180x180&choe=UTF-8&chld=L|2
(and YT: https://chart.googleapis.com/chart?cht=qr&chl=http%3A%2F%2Fyoutu.be%2FB7xai5u_tnk&chs=180x180&choe=UTF-8&chld=L|2 for comparison)

The longer the URL, the noisier the image, and thus the nearer you need to be for a reliable scan.

ChrisA
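
To put rough numbers on "the longer the URL, the noisier the image", the sketch below estimates which QR version each of the four URLs would need. It assumes byte-mode encoding at error correction level L (the `chld=L` setting in the chart links) and uses the standard byte-mode capacities for versions 1 to 10; a real encoder may pick a different mode or version, and the function names are made up for this example.

```python
# Byte-mode capacity in bytes at error correction level L, QR versions 1-10.
BYTE_CAPACITY_LEVEL_L = [17, 32, 53, 78, 106, 134, 154, 192, 230, 271]

# The four URLs quoted in the message above.
URLS = {
    "32": "http://metube.example/SoO8IclkLFcfPX2pA7okFHdoSrZjKtrAmDdmFvC2O6Y",
    "16": "http://metube.example/9IoJVtQrhic4Xi633mJ7MQ",
    "09": "http://metube.example/ziCHRKMlr8rX",
    "YT": "http://youtu.be/B7xai5u_tnk",
}


def qr_version(payload: str) -> int:
    """Smallest QR version (1-10) whose level-L byte capacity fits payload."""
    size = len(payload.encode("utf-8"))
    for version, capacity in enumerate(BYTE_CAPACITY_LEVEL_L, start=1):
        if size <= capacity:
            return version
    raise ValueError("payload too long for QR versions 1-10")


def modules_per_side(version: int) -> int:
    """A version-v QR symbol is (17 + 4*v) modules wide."""
    return 17 + 4 * version


if __name__ == "__main__":
    for label, url in URLS.items():
        version = qr_version(url)
        side = modules_per_side(version)
        print(f"{label}: {len(url)} chars, version {version}, "
              f"{side}x{side} modules")
```

More modules per side at the same printed size means smaller modules, which is the finer resolution the scanning argument refers to.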

On Tue, 13 Mar 2018 05:03:21 +1100 Chris Angelico <rosuav@gmail.com> wrote:
Yeah. So people building such a platform can use a custom token length.

Still, I think it's better to have a future-proof default token length. People will know if they need to shorten it for usability reasons. However, if we default to shorter tokens, people won't know whether they need to ask for a longer length for security reasons.

"Secure by default, better usability with a simple parameter tweak" sounds like a sane API guideline.

Regards

Antoine.
