I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
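A minimal reproduction of the pitfall (foo is a hypothetical function, just for illustration):

```python
def foo(a, b):
    return a + b

# Intended: foo('a', 'b'). The missing comma makes Python concatenate
# the adjacent string literals, so foo receives the single argument 'ab'.
try:
    foo('a' 'b')
except TypeError as err:
    print(err)  # foo() missing 1 required positional argument: 'b'
```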
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C, but the reason it's needed
there -- mainly splitting long literals across lines -- doesn't really
apply to Python.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
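The compile-time folding of '+' on literals can be observed directly (a CPython-specific check via the compiled code object's constants):

```python
# 'a' + 'b' is folded to the single constant 'ab' by the compiler,
# so the unambiguous spelling costs nothing at runtime.
code = compile("s = 'a' + 'b'", "<demo>", "exec")
print('ab' in code.co_consts)  # True
```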
--Guido van Rossum (python.org/~guido)
In the filecmp module, the regular functions cmp and cmpfiles do have a
shallow parameter, but the dircmp class doesn't provide one. Could we
add it to the constructor?
In the implementation, it would only change the call to cmpfiles in
phase3, right?
It could be useful for performance (the same reason the "shallow"
parameter exists in the other two functions) when quickly comparing
large directory trees.
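For reference, here is the existing shallow parameter on cmpfiles that the proposal would forward from dircmp (the file name and contents are made up for the demo):

```python
import filecmp
import os
import tempfile

# cmp() and cmpfiles() already accept shallow=; the proposal is to let
# dircmp pass a similar flag through to its internal cmpfiles() call.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for d in (a, b):
        with open(os.path.join(d, "f.txt"), "w") as fh:
            fh.write("same contents")
    # shallow=False forces a byte-by-byte comparison instead of
    # comparing os.stat() signatures.
    match, mismatch, errors = filecmp.cmpfiles(a, b, ["f.txt"], shallow=False)
    print(match)  # ['f.txt']
```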
Now that Python has type annotations has there been any thought into having
access protection annotations. This would be for same reason as type
annotations: to give the linter the opportunity to catch bugs early, and to
give the programmer a uniform way to specify invariants.
Unlike type annotations, this wouldn't need any new syntax, but it should
be well thought-out. My rough idea is to have some decorator on methods
and properties describing who is allowed to call those methods and access
those properties. The allowed-caller list could include things like
base classes.
I think this would need to be better thought-out, but something like this
would be really useful for me as it's hard to keep track of who is allowed
to call what.
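One possible shape for such a decorator, sketched with frame inspection (callable_only_from, Friend, and Widget are all hypothetical names; a real design would likely be lint-time only, not a runtime check):

```python
import functools
import inspect

def callable_only_from(*allowed):
    """Hypothetical decorator sketching the idea: reject calls whose
    immediate caller is not a method of one of the allowed classes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            # Look at the caller's frame and check its 'self', if any.
            caller_self = inspect.stack()[1].frame.f_locals.get("self")
            if not isinstance(caller_self, allowed):
                raise PermissionError(
                    f"{func.__name__} may not be called from here")
            return func(self, *args, **kwargs)
        return wrapper
    return decorator

class Friend:
    def poke(self, widget):
        return widget.internal()

class Widget:
    @callable_only_from(Friend)
    def internal(self):
        return 42

print(Friend().poke(Widget()))  # 42
```

A linter could enforce the same annotation statically, with none of the runtime cost of inspecting the call stack.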
PEP 506, which introduces the secrets module, states the following:
One difficult question is "How many bytes should my token be?". We can
help with this question by providing a
default amount of entropy for the "token_*" functions. If the nbytes
argument is None or not given, the default
entropy will be used. This default value should be large enough to be
expected to be secure for medium-security
uses, but is expected to change in the future, possibly even in a
maintenance release.
Currently, the default output size of the secrets.token_* functions is 32
bytes. This is definitely more than enough to
protect against brute-force attacks in almost any feasible scenario, and
that is a good thing. It's always better to
pick a conservative default. If smaller tokens could introduce a security
risk in 1% of use cases, I agree smaller
tokens should not be used.
However, I would like to argue that 32-byte tokens are unnecessarily long
for far more than 99% of use cases. Instead,
I'd suggest using a default length of 16 bytes (128 bits), which already
has a very high security margin.
Brute-force attacks and cryptography
Most cryptographic systems are inherently vulnerable to brute-force
attacks. For example, if you were somehow capable
of performing 2^128 AES decryption operations, you would be able to crack
AES. If you could perform the same amount of
SHA256 hash iterations, you could find a hash collision.
Doing 2^128 computations of any kind is completely and utterly infeasible.
All (super-)computers in the world
combined would not even come close in a million years. That's why
cryptographers typically use this 128-bit "security level" as a de facto
standard for how many brute-force operations a cryptosystem should
tolerate. This
influences properties such as key length and signature sizes.
However, there is an important difference between "offline" and "online"
brute-force attacks. In an online scenario, an
attacker can not immediately determine whether their guess was correct;
they first have to submit the result to another
party (e.g. a server) which can either accept or reject it. In such a case,
it is acceptable when an attacker needs to
perform 2^64 operations in order to break the system (the cryptographic
authenticator CBC-MAC-AES can be broken after this many forgery
attempts, for example). Note that when 2^64 guesses will break the
system, an attacker doing random guesses is expected to hit the correct
one halfway through (i.e. after about 2^63 attempts).
Basically, a 128-bit security level entails the following assumptions:
- An attacker is not capable of performing 2^64 online guesses.
- An attacker (without a quantum computer) is not capable of performing
2^128 offline guesses.
- An attacker with a large quantum computer, capable of performing Grover's
algorithm, is not capable of doing a search
through a space of 2^256 elements (which is why quantum-resistant
symmetric ciphers use 32-byte keys).
The birthday paradox
When you keep drawing random integers between 0 and 2^N, it will take about
sqrt(2^N) = 2^(N/2) draws before you
encounter a number a second time (see
https://en.wikipedia.org/wiki/Birthday_problem#An_upper_bound). This is
relevant for hash collisions, for example, and the reason we have
SHA256 rather than a 128-bit hash: 256 output bits still leave 128 bits
of collision resistance.
If a system would allow an attacker to create a resource that is identified
by a 64-bit UUID, they would only need to
create about 2^32 resources (not infeasible) before they would have two
that share an identifier, which could lead to all
sorts of problems. So this is something that should be taken into account
when generating random tokens.
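The birthday estimate above can be checked numerically (a sketch using the standard expected-first-collision approximation sqrt(pi/2 * 2^n)):

```python
import math

# Drawing uniformly from a space of size 2**n, the expected number of
# draws before the first repeat is about sqrt(pi/2 * 2**n), which is
# on the order of 2**(n/2).
n = 64
expected_draws = math.sqrt(math.pi / 2 * 2**n)
print(math.log2(expected_draws))  # roughly 32.3
```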
Use-cases of secret tokens
Let's consider some of the typical security-sensitive applications for
unguessable tokens, and see how long tokens
should be in order to achieve the security level described above:
A user needs to prove that they know a specific secret value, such as a
generated password, license code, or similar credential.
An 8-byte random secret is sufficient here: guessing the secret only has
a 1 in 2^64 chance of success.
The user provides a secret which identifies them or a certain resource. The
application stores many different secret
indices, and someone who knows secret A should not be able to guess secret
B. These secrets could e.g. be session
identifiers, password reset tokens, or secret URLs.
16 random bytes are certainly sufficient here, even in the extreme case
that a server would store 2^64 (more than nine
quintillion) records; since the chance of the attacker guessing one of them
is again 1 in 2^64.
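For concreteness, this is what the token sizes look like under the current default and the proposed 16-byte length:

```python
import secrets

# The no-argument form uses the module-wide default entropy;
# passing nbytes=16 gives the length this post argues for.
print(len(secrets.token_bytes()))      # secrets.DEFAULT_ENTROPY bytes
print(len(secrets.token_bytes(16)))    # 16 bytes
print(len(secrets.token_hex(16)))      # 32 hex characters
print(len(secrets.token_urlsafe(16)))  # 22 urlsafe characters
```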
Nonces and salts
A nonce is a value that should only be used once, and is often used for
symmetric ciphers or for replay attack protection.
Salts used for password hashing are also an example of nonces. Seeds
for deterministic CSPRNGs can also be considered (secret) nonces.
When nonces are randomly generated (the easiest way to come up with a
unique one), making them 16 bytes long is
sufficient: due to the birthday problem one would need to generate about
2^64 of them before a collision occurs.
Keys
The required length of a key depends on the cryptographic system for which
it is used. If a crypto library does not
enforce length requirements, I'd consider that a bug in the library.
Well, okay. The HMAC algorithm poses no length requirements on keys and the
Python library reflects that. For HMAC, 16-byte keys are okay, unless
you want post-quantum security (which you won't
get anyway when you use HMAC-MD5, which is
still the default).
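A short example of HMAC with a 16-byte key, passing the digest explicitly rather than relying on any default (the payload is made up):

```python
import hashlib
import hmac
import secrets

# A 16-byte (128-bit) key is a common choice for HMAC in practice.
key = secrets.token_bytes(16)
tag = hmac.new(key, b"payload", hashlib.sha256).digest()
print(len(tag))  # 32 bytes for HMAC-SHA256

# Verification should use a constant-time comparison:
expected = hmac.new(key, b"payload", hashlib.sha256).digest()
print(hmac.compare_digest(tag, expected))  # True
```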
Except for the far-fetched post-quantum HMAC scenario, I can't really think
of a reasonably realistic situation where
a security issue is introduced because the result of secrets.token_bytes is
16 bytes long instead of 32. Can anyone
else here think of an example?
Wouldn't changing this result in compatibility issues?
The docs describe the default value of nbytes as "a reasonable default" and
"That default is subject to change at any time, including during
maintenance releases.". Since the secrets module is
still relatively new, this would be a good moment to change it, right?
32 > 16, so what's the problem?
First of all, I'd like to point out that the three examples in the docs
explicitly generate 16-byte tokens. If the current
default is the most secure solution, why deviate from it here?
I think there are quite a few situations where unnecessarily long codes can
cause usability problems: when using a system
where users have to manually type in a random code (I frequently have to do
that when using the password manager on my
phone, for example) it's nice if you can save half of the time they have to
spend on that. Shorter codes can also be
converted to smaller QR codes, and to nicer URLs.
Sure, a user can always choose to just pick a value for nbytes and set
their own length. However, if they want to pick
a secure value, they are being tasked with considering the implications. A
user should be able to rely on the library
picking a good secure default.
I think security functionality should be made as ergonomic as possible,
without compromising security. When security
features unnecessarily create usability issues, people are discouraged from
using them in the first place.
So yeah, that was my argument for making tokens 16 bytes long instead
of 32. Please let me know if and why I am wrong.
I was trying to use `collections.abc.MutableSet` today, and noticed that it
does not define an `update` method. This is a very useful method that is
present on the builtin `set` class, and seems to fit naturally with the
rest of the `MutableSet` interface.
Was omitting this method intentional?
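To illustrate, here is a minimal `MutableSet` subclass with the kind of `update` method the builtin `set` provides (the `MySet` class is hypothetical, just to show where the method would fit):

```python
from collections.abc import MutableSet

class MySet(MutableSet):
    def __init__(self, iterable=()):
        self._data = set(iterable)
    def __contains__(self, x): return x in self._data
    def __iter__(self): return iter(self._data)
    def __len__(self): return len(self._data)
    def add(self, x): self._data.add(x)
    def discard(self, x): self._data.discard(x)
    # The method the post asks for; it is a one-liner on top of add(),
    # which is why it seems natural as a MutableSet mixin method.
    def update(self, *iterables):
        for it in iterables:
            for x in it:
                self.add(x)

s = MySet([1, 2])
s.update([2, 3], [4])
print(sorted(s))  # [1, 2, 3, 4]
```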
Thanks & best wishes,
With the stdlib argparse, subparsers can be defined and they can be
marked as required (though this is not documented), but add_subparsers()
does not accept a "required" keyword. I think it would make everything
more consistent if it did.
This won't require any functional changes under the hood.
Right now, this works (and behaves as expected):
parser = argparse.ArgumentParser(...)
subparsers = parser.add_subparsers(...)
subparsers.required = True
but this does not:
parser = argparse.ArgumentParser(...)
subparsers = parser.add_subparsers(..., required=True)
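A complete, runnable version of today's workaround (program and subcommand names are made up for the demo):

```python
import argparse

# Flip the attribute after the fact, and give the group a dest so the
# error message for a missing subcommand makes sense.
parser = argparse.ArgumentParser(prog="tool")
subparsers = parser.add_subparsers(dest="command")
subparsers.required = True
subparsers.add_parser("run")

args = parser.parse_args(["run"])
print(args.command)  # run
```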
heapq creates and works with min-heaps. Currently, the only way to work
with max-heaps is to use the private functions: heapq._heapify_max
instead of heapq.heapify, heapq._heappop_max instead of heapq.heappop,
and so on.
These functions should be exposed via a reverse keyword argument rather
than as private functions, just as list.sort does.
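For comparison, the status quo looks like this (the private helpers are CPython implementation details, which is exactly the problem):

```python
import heapq

# CPython-specific: the private max-heap helper.
data = [5, 1, 9, 3]
heapq._heapify_max(data)
print(data[0])  # 9 -- largest element at the root

# Portable workaround: negate values and use the public min-heap API.
neg = [-x for x in [5, 1, 9, 3]]
heapq.heapify(neg)
print(-heapq.heappop(neg))  # 9
```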
I thought perhaps we could allow the usage of a "new" keyword to
instantiate an object, i.e.:
obj = new yourmodule.YourClass()
In this case, it would behave the same as from yourmodule import
YourClass; obj = YourClass(), except that the import wouldn't need to
be written out. This would also eliminate the need to manage an import
list at the beginning of a script in most cases.
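Without new syntax, the behaviour can be approximated today with a small helper (the `new` function below is hypothetical, not a stdlib facility):

```python
import importlib

def new(qualname, *args, **kwargs):
    """Approximate the proposed 'new' keyword: lazily import the
    module, then look up and instantiate the named class."""
    modname, clsname = qualname.rsplit(".", 1)
    cls = getattr(importlib.import_module(modname), clsname)
    return cls(*args, **kwargs)

obj = new("collections.OrderedDict", a=1)
print(type(obj).__name__)  # OrderedDict
```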
I'm really not proud of this idea, but PHP has had autoloading for
years, and when I open scripts with hundreds of lines of imports, it
makes me think Python could do something about this.
Thanks in advance for your feedback
> 2018-03-03 8:40 GMT+01:00 Nick Coghlan <ncoghlan(a)gmail.com>:
>> pairs = [(f(y), g(y)) for x in things with bind(h(x)) as y]
I don't much like "with bind(h(x)) as y" because it's kind of an
abstraction inversion -- you're building something simple on top of
something complicated, instead of just having the simple thing to
begin with. If nothing else, it has a huge runtime cost
for the benefit it gives.
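For context, the existing idiom for binding a subexpression inside a comprehension uses a one-element "for" clause (f, g, and h below are placeholder functions, not from the quoted thread):

```python
def f(y): return y + 1
def g(y): return y * 2
def h(x): return x * x

things = [1, 2, 3]
# 'for y in [h(x)]' binds y = h(x) once per x, so h is not re-evaluated.
pairs = [(f(y), g(y)) for x in things for y in [h(x)]]
print(pairs)  # [(2, 2), (5, 8), (10, 18)]
```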
I was thinking: perhaps it would be nice to be able to quickly split a
string, do some slicing, and then obtain the joined string back.
Say we have the string: "docs.python.org", and we want to change "docs" to
"wiki". Of course, there are a ton of simpler ways to solve this particular
need, but perhaps str could have something like this:
spam = "docs.python.org"
eggs = "wiki." + spam['.'][1:]
A quick implementation to get the idea and try it (with the missing
class statements restored):

class Mystr(str):
    def __getitem__(self, item):
        if isinstance(item, str):
            return Mystr_helper(self, item)
        return super().__getitem__(item)

class Mystr_helper:
    def __init__(self, obj, sep):
        self.obj = obj
        self.sep = sep

    def __getitem__(self, item):
        return self.sep.join(self.obj.split(self.sep)[item])
What are your thoughts?
Greetings from Argentina.