Mailman 3 March 2018 - Python-ideas

Descouraging the implicit string concatenation
by Facundo Batista March 16, 2018

March 16, 2018

Hello! What would you think about formally descouraging the following idiom? long_string = ( "some part of the string " "with more words, actually is the same " "string that the compiler puts together") We should write the following, instead: long_string = ( "some part of the string " + "with more words, actually is the same " + "string that the compiler puts together") I know that "no change to Python itself" is needed, but having a … [View More]

24 48

Implicit string literal concatenation considered harmful?
by Guido van Rossum March 14, 2018

March 14, 2018

I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b'). This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden). Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' … [View More]

51 165

Adding shallow argument in filecmp.dircmp
by Robert Vanden Eynde March 14, 2018

March 14, 2018

In module filecmp, the normal functions cmp and cmpfiles do have a shallow attribute. But the dircmp class doesn't provide one, could we add it in the constructor ? In the implementation, it would only change the call to cmpfiles in def phase3 ? It could be useful for performance (same reason, the "shallow" parameter exists in the other two functions), when quickly comparing two directories.

1 0

Pythonic access protection
by Neil Girdhar March 14, 2018

March 14, 2018

Now that Python has type annotations has there been any thought into having access protection annotations. This would be for same reason as type annotations: to give the linter the opportunity to catch bugs early, and to give the programmer a uniform way to specify invariants. Unlike type annotations, this wouldn't need any new syntax, but it should be well thought-out. My rough idea is to have some decorator on methods and properties describing who is allowed to call those methods and … [View More]

1 0

PEP 506: some thoughts on the output length of token_bytes (and why I think it should be split in half)
by Tom Tervoort March 12, 2018

March 12, 2018

Hi all, PEP 506, which introduces the secrets module, states the following: Default arguments One difficult question is "How many bytes should my token be?". We can help with this question by providing a default amount of entropy for the "token_*" functions. If the nbytes argument is None or not given, the default entropy will be used. This default value should be large enough to be expected to be secure for medium-security uses, but is expected to change in the future, … [View More]possibly even in a maintenance release [14]. Currently, the default output size of the secrets.token_* functions is 32 bytes. This is definitely more than enough to protect against brute-force attacks in almost any feasible scenario, and that is a good thing. It's always better to pick a conservative default. If smaller tokens could introduce a security risk in 1% of use cases, I agree smaller tokens should not be used. However, I would like to argue that 32-byte tokens are unnecessarily long for far more than 99% of use cases. Instead, I'd suggest using a default length of 16 bytes (128 bits), which already has a very high security margin. Brute-force attacks and cryptography ------------------------------------ Most cryptographic systems are inherently vulnerable to brute-force attacks. For example, if you were somehow capable of performing 2^128 AES decryption operations, you would be able to crack AES. If you could perform the same amount of SHA256 hash iterations, you could find a hash collision. Doing 2^128 computations of any kind is completely and utterly infeasible. All (super-)computers in the world combined would not even come close in a million years. That's why cryptographers typically use this 128 bits "security level" as a de facto standard for how many brute-force operations a cryptosystem should tolerate. This influences properties such as key length and signature sizes. However, there is an important difference between "offline" and "online" brute-force attacks. In an online scenario, an attacker can not immediately determine whether their guess was correct; they first have to submit the result to another party (e.g. a server) which can either accept or reject it. In such a case, it is acceptable when an attacker needs to perform 2^64 operations in order to break the system (cryptographic authenticator CBC-MAC-AES can be broken after this many forgery attempt, for example). Note that when 2^64 guesses will break the system, an attacker doing random guesses is expected to perform the correct guess halfway through (i.e. after about 2^63 attempts). Basically, a 128 bit security level entails the following assumptions: - An attacker is not capable of performing 2^64 online guesses. - An attacker (without a quantum computer) is not capable of performing 2^128 offline guesses. - An attacker with a large quantum computer, capable of performing Grover's algorithm, is not capable of doing a search through a space of 2^256 elements (which is why quantum-resistant symmetric ciphers use 32-byte keys). The birthday paradox -------------------- When you keep drawing random integers between 0 and 2^N, it will take about sqrt(2^N) = 2^(N/2) draws before you encounter a number a second time (see https://en.wikipedia.org/wiki/Birthday_problem#An_upper_bound). This is relevant for hash collisions, for example, and the reason we have SHA256 instead of SHA128. If a system would allow an attacker to create a resource that is identified by a 64-bit UUID, they would only need to create about 2^32 resources (not infeasible) before they would have two that share an identifier, which could lead to all sorts of problems. So this is something that should be taken into account when generating random tokens. Use-cases of secret tokens -------------------------- Let's consider some of the typical security-sensitive applications for unguessable tokens, and see how long tokens should be in order to achieve the security level described above: Authenticity proof ~~~~~~~~~~~~~~~~~~ A user needs to prove that they know a specific secret value, such as a generated password, license code or anti-CSRF token. An 8-byte random secret is sufficient here: guessing the secret only has a 1 in 2^64 chance of success. Secret index ~~~~~~~~~~~~ The user provides a secret which identifies them or a certain resource. The application stores many different secret indices, and someone who knows secret A should not be able to guess secret B. These secrets could e.g. be session identifiers, password reset tokens, or secret URL's, 16 random bytes are certainly sufficient here, even in the extreme case that a server would store 2^64 (more than nine quintillion) records; since the chance of the attacker guessing one of them is again 1 in 2^64. Nonces and salts ~~~~~~~~~~~~~~~~ A nonce is a value that should only be used once, and is often used for symmetric ciphers or for replay attack protection. Salts used for password hashing are also an example of nonces. Seeds for deterministic CSRNG's can also be considered as (secret) nonces. When nonces are randomly generated (the easiest way to come up with a unique one), making them 16 bytes long is sufficient: due to the birthday problem one would need to generate about 2^64 of them before a collision occurs. Cryptographic keys ~~~~~~~~~~~~~~~~~~ The required length of a key depends on the cryptographic system for which it is used. If a crypto library does not enforce length requirements, I'd consider that a bug in the library. Well, okay. The HMAC algorithm poses no length requirements on keys and the Python library reflects that. For HMAC, 16 byte keys are okay, unless you want post-quantum security (which you won't get anyway when you use HMAC-MD5, which is still the default). Except for the far-fetched post-quantum HMAC scenario, I can't really think of a reasonably realistic situation where a security issue is introduced because the result of secrets.token_bytes is 16 bytes long instead of 32. Can anyone else here think of an example? Wouldn't changing this result in compatibility issues? ------------------------------------------------------ The docs describe the default value of nbytes as "a reasonable default" and state that "That default is subject to change at any time, including during maintenance releases.". Since the secrets module is still relatively new, this would be a good moment to change it, right? 32 > 16, so what's the problem? ------------------------------- First of all, I'd like to point out that the three examples in the docs (https://docs.python.org/3.6/library/secrets.html#generating-tokens) explicitly generate 16-byte tokens. If the current default is the most secure solution, why deviate from it here? I think there a quite some situations where unnecessarily long codes can cause usability problems: when using a system where users have to manually type in a random code (I frequently have to do that when using the password manager on my phone, for example) it's nice if you can save half of the time they have to spend on that. Shorter codes can also be converted to smaller QR codes, and to nicer URLs. Sure, a user can always choose to just pick a value for nbytes and set their own length. However, if they want to pick a secure value, they are being tasked with considering the implications. A user should be able to rely on the library picking a good secure default. I think security functionality should be made as ergonomic as possible, without compromising security. When security features unnecessarily create usability issues, people are discouraged from using them in the first place. So yeah, that was my argument for making tokens 16 bytes long instead of 32. Please let me know if and why I am completely wrong:P Regards, Tom [View Less]

4 8

Add MutableSet.update?
by Lucas Wiman March 10, 2018

March 10, 2018

I was trying to use `collections.abc.MutableSet` today, and noticed that it does not define an `update` method. This is a very useful method that is present on the builtin `set` class, and seems to fit naturally with the other methods. Was omitting this method intentional? Thanks & best wishes, Lucas Wiman

5 5

Argparse subparsers add required keyword
by George Leslie-Waksman March 8, 2018

March 8, 2018

With the stdlib argparse, subparsers can be defined and they can be marked as required (though this is not documented) but they do not support a "required" keyword. I think it would make everything more consistent if the keyword existed. This won't require any functional changes under the hood. Right now, this works (and behaves as expected): parser = argparse.ArgumentParser(...) subparsers = parser.add_subparsers(...) subparsers.required = True but this does not: parser = argparse.… [View More]

1 0

Consider adding reverse argument to heapq methods.
by Neil Girdhar March 8, 2018

March 8, 2018

heapq creates and works with min-heaps. Currently, the only way to do this is to use _heapify_max instead of heapify, heapq._heappop_max instead of heapq.heappop, etc. These methods should be exposed using a reverse keyword argument rather than as private methods just like sort. Best, Neil

2 1

Class autoload
by Jamesie Pic March 5, 2018

March 5, 2018

Hello everybody, I thought perhaps we could allow the usage of a "new" keyword to instanciate an object, ie: obj = new yourmodule.YourClass() In this case, it would behave the same as from yourmodule import YourClass; obj = YourClass(), except that it wouldn't need to be imported. This would also eliminate the need to manage an import list at the beginning of a script in most case. I'm really not proud of this idea but PHP has had autoload for years and when i open scripts with hundred … [View More]

12 14

Re: [Python-ideas] An alternative to PEP 572's Statement-Local Name Bindings
by Greg Ewing March 5, 2018

March 5, 2018

> 2018-03-03 8:40 GMT+01:00 Nick Coghlan <ncoghlan(a)gmail.com>: >> pairs = [(f(y), g(y)) for x in things with bind(h(x)) as y] I don't mucn like "with bind(h(x)) as y" because it's kind of like an abstraction inversion -- you're building something complicated on top of something complicated in order to get something simple, instead of just having the simple thing to begin with. If nothing else, it has a huge runtime cost for the benefit it gives. -- Greg

4 5