Hello and sorry if this is an old and over-discussed topic. I could not find such discussions here.
This was discussed and rejected at https://bugs.python.org/issue16385, but I am not convinced that the discussion presented all arguments properly. I tried reopening the bug at http://bugs.python.org/issue26910, but was told I should discuss it here instead.
The original problem description:
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
These are the arguments presented in favor of the rejection, followed by my rebuttals.
1. "An error is out of the question for compatibility reasons". No real rebuttal here, except I wonder if exceptions are ever made and under what circumstances. I should point out that a warning may also create incompatibilities.
2. "There are ways to rewrite this as a loop on a list". Yes of course, but the entire point of the dictionary literal is to offer a concise and convenient notation for entering a dictionary as program text.
3. "Being able to re-write keys is fundamental to Python dicts and why they can be used for Python's mutable namespaces". This is fine but it applies to the data structure, not the literal constructor.
4. "A code generator could depend on being able to write duplicate keys without having to go back and erase previous output". Yes, but it seems wrong to facilitate automated code generation at the expense of human code writing. For hand-written code, I claim that in most (all?) cases it would be preferable to have the compiler detect key duplications. It is easier for an automated code generator to check for duplications than it is for a human.
5. "There is already pylint". True, but it forces an additional step.
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
(additional search keywords: dict constructor, dict literal)
With Python 3.5, you can unpack in dictionaries:
d = {**foo, **bar}
It's now the defacto way of merging 2 dicts, and it requires that you can use twice the same key for ovious reasons.
So even if you have good points, it's not possible to do this.
Le 02/05/2016 23:36, Luigi Semenzato a écrit :
Hello and sorry if this is an old and over-discussed topic. I could not find such discussions here.
This was discussed and rejected at https://bugs.python.org/issue16385, but I am not convinced that the discussion presented all arguments properly. I tried reopening the bug at http://bugs.python.org/issue26910, but was told I should discuss it here instead.
The original problem description:
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
These are the arguments presented in favor of the rejection, followed by my rebuttals.
- "An error is out of the question for compatibility reasons". No
real rebuttal here, except I wonder if exceptions are ever made and under what circumstances. I should point out that a warning may also create incompatibilities.
- "There are ways to rewrite this as a loop on a list". Yes of
course, but the entire point of the dictionary literal is to offer a concise and convenient notation for entering a dictionary as program text.
- "Being able to re-write keys is fundamental to Python dicts and why
they can be used for Python's mutable namespaces". This is fine but it applies to the data structure, not the literal constructor.
- "A code generator could depend on being able to write duplicate
keys without having to go back and erase previous output". Yes, but it seems wrong to facilitate automated code generation at the expense of human code writing. For hand-written code, I claim that in most (all?) cases it would be preferable to have the compiler detect key duplications. It is easier for an automated code generator to check for duplications than it is for a human.
- "There is already pylint". True, but it forces an additional step.
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
(additional search keywords: dict constructor, dict literal) _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
With Python 3.5, you can unpack in dictionaries:
d = {**foo, **bar}
It's now the defacto way of merging 2 dicts, and it requires that you can use twice the same key for ovious reasons.
So even if you have good points, it's not possible to do this.
Le 02/05/2016 23:36, Luigi Semenzato a écrit :
Hello and sorry if this is an old and over-discussed topic. I could not find such discussions here.
This was discussed and rejected at https://bugs.python.org/issue16385, but I am not convinced that the discussion presented all arguments properly. I tried reopening the bug at http://bugs.python.org/issue26910, but was told I should discuss it here instead.
The original problem description:
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
These are the arguments presented in favor of the rejection, followed by my rebuttals.
- "An error is out of the question for compatibility reasons". No
real rebuttal here, except I wonder if exceptions are ever made and under what circumstances. I should point out that a warning may also create incompatibilities.
- "There are ways to rewrite this as a loop on a list". Yes of
course, but the entire point of the dictionary literal is to offer a concise and convenient notation for entering a dictionary as program text.
- "Being able to re-write keys is fundamental to Python dicts and why
they can be used for Python's mutable namespaces". This is fine but it applies to the data structure, not the literal constructor.
- "A code generator could depend on being able to write duplicate
keys without having to go back and erase previous output". Yes, but it seems wrong to facilitate automated code generation at the expense of human code writing. For hand-written code, I claim that in most (all?) cases it would be preferable to have the compiler detect key duplications. It is easier for an automated code generator to check for duplications than it is for a human.
- "There is already pylint". True, but it forces an additional step.
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
(additional search keywords: dict constructor, dict literal) _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
It'd still be possible to do it for literal listed keys only if someone wanted to figure out how to do it at parsing time rather than runtime.
On Tue, May 3, 2016 at 12:06 AM Michel Desmoulin desmoulinmichel@gmail.com wrote:
With Python 3.5, you can unpack in dictionaries:
d = {**foo, **bar}
It's now the defacto way of merging 2 dicts, and it requires that you can use twice the same key for ovious reasons.
So even if you have good points, it's not possible to do this.
Le 02/05/2016 23:36, Luigi Semenzato a écrit :
Hello and sorry if this is an old and over-discussed topic. I could not find such discussions here.
This was discussed and rejected at https://bugs.python.org/issue16385, but I am not convinced that the discussion presented all arguments properly. I tried reopening the bug at http://bugs.python.org/issue26910, but was told I should discuss it here instead.
The original problem description:
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
These are the arguments presented in favor of the rejection, followed by my rebuttals.
- "An error is out of the question for compatibility reasons". No
real rebuttal here, except I wonder if exceptions are ever made and under what circumstances. I should point out that a warning may also create incompatibilities.
- "There are ways to rewrite this as a loop on a list". Yes of
course, but the entire point of the dictionary literal is to offer a concise and convenient notation for entering a dictionary as program text.
- "Being able to re-write keys is fundamental to Python dicts and why
they can be used for Python's mutable namespaces". This is fine but it applies to the data structure, not the literal constructor.
- "A code generator could depend on being able to write duplicate
keys without having to go back and erase previous output". Yes, but it seems wrong to facilitate automated code generation at the expense of human code writing. For hand-written code, I claim that in most (all?) cases it would be preferable to have the compiler detect key duplications. It is easier for an automated code generator to check for duplications than it is for a human.
- "There is already pylint". True, but it forces an additional step.
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
(additional search keywords: dict constructor, dict literal) _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
That would introduce a very weird special case. All those would work:
foo = 1 bar = "1"
{1: True, 2 - 1: True, foo: True, int(bar): True, int("1"): True, **{1: True}}
But not this:
{1: True, 1: True}
It would be very confusing.
Le 03/05/2016 09:08, Gregory P. Smith a écrit :
It'd still be possible to do it for literal listed keys only if someone wanted to figure out how to do it at parsing time rather than runtime.
On Tue, May 3, 2016 at 12:06 AM Michel Desmoulin <desmoulinmichel@gmail.com mailto:desmoulinmichel@gmail.com> wrote:
With Python 3.5, you can unpack in dictionaries: d = {**foo, **bar} It's now the defacto way of merging 2 dicts, and it requires that you can use twice the same key for ovious reasons. So even if you have good points, it's not possible to do this. Le 02/05/2016 23:36, Luigi Semenzato a écrit : > Hello and sorry if this is an old and over-discussed topic. I could > not find such discussions here. > > This was discussed and rejected at https://bugs.python.org/issue16385, > but I am not convinced that the discussion presented all arguments > properly. I tried reopening the bug at > http://bugs.python.org/issue26910, but was told I should discuss it > here instead. > > The original problem description: > > lives_in = { 'lion': ['Africa', 'America'], > 'parrot': ['Europe'], > #... 100+ more rows here > 'lion': ['Europe'], > #... 100+ more rows here > } > > The above constructor overwrites the first 'lion' entry silently, > often causing unexpected behavior. > > These are the arguments presented in favor of the rejection, followed > by my rebuttals. > > 1. "An error is out of the question for compatibility reasons". No > real rebuttal here, except I wonder if exceptions are ever made and > under what circumstances. I should point out that a warning may also > create incompatibilities. > > 2. "There are ways to rewrite this as a loop on a list". Yes of > course, but the entire point of the dictionary literal is to offer a > concise and convenient notation for entering a dictionary as program > text. > > 3. "Being able to re-write keys is fundamental to Python dicts and why > they can be used for Python's mutable namespaces". This is fine but > it applies to the data structure, not the literal constructor. > > 4. "A code generator could depend on being able to write duplicate > keys without having to go back and erase previous output". Yes, but > it seems wrong to facilitate automated code generation at the expense > of human code writing. For hand-written code, I claim that in most > (all?) cases it would be preferable to have the compiler detect key > duplications. It is easier for an automated code generator to check > for duplications than it is for a human. > > 5. "There is already pylint". True, but it forces an additional step. > > For context, someone ran into this problem in my team at Google (we > fixed it using pylint). I haven't seen any valid reason (in the bug > or elsewhere) in favor of these constructor semantics. From the > discussions I have seen, it seems just an oversight in the > implementation/specification of dictionary literals. I'd be happy to > hear stronger reasoning in favor of the status quo. > > (additional search keywords: dict constructor, dict literal) > _______________________________________________ > Python-ideas mailing list > Python-ideas@python.org <mailto:Python-ideas@python.org> > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ Python-ideas mailing list Python-ideas@python.org <mailto:Python-ideas@python.org> https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
That would introduce a very weird special case. All those would work:
foo = 1 bar = "1"
{1: True, 2 - 1: True, foo: True, int(bar): True, int("1"): True, **{1: True}}
But not this:
{1: True, 1: True}
It would be very confusing.
Le 03/05/2016 09:08, Gregory P. Smith a écrit :
It'd still be possible to do it for literal listed keys only if someone wanted to figure out how to do it at parsing time rather than runtime.
On Tue, May 3, 2016 at 12:06 AM Michel Desmoulin <desmoulinmichel@gmail.com mailto:desmoulinmichel@gmail.com> wrote:
With Python 3.5, you can unpack in dictionaries: d = {**foo, **bar} It's now the defacto way of merging 2 dicts, and it requires that you can use twice the same key for ovious reasons. So even if you have good points, it's not possible to do this. Le 02/05/2016 23:36, Luigi Semenzato a écrit : > Hello and sorry if this is an old and over-discussed topic. I could > not find such discussions here. > > This was discussed and rejected at https://bugs.python.org/issue16385, > but I am not convinced that the discussion presented all arguments > properly. I tried reopening the bug at > http://bugs.python.org/issue26910, but was told I should discuss it > here instead. > > The original problem description: > > lives_in = { 'lion': ['Africa', 'America'], > 'parrot': ['Europe'], > #... 100+ more rows here > 'lion': ['Europe'], > #... 100+ more rows here > } > > The above constructor overwrites the first 'lion' entry silently, > often causing unexpected behavior. > > These are the arguments presented in favor of the rejection, followed > by my rebuttals. > > 1. "An error is out of the question for compatibility reasons". No > real rebuttal here, except I wonder if exceptions are ever made and > under what circumstances. I should point out that a warning may also > create incompatibilities. > > 2. "There are ways to rewrite this as a loop on a list". Yes of > course, but the entire point of the dictionary literal is to offer a > concise and convenient notation for entering a dictionary as program > text. > > 3. "Being able to re-write keys is fundamental to Python dicts and why > they can be used for Python's mutable namespaces". This is fine but > it applies to the data structure, not the literal constructor. > > 4. "A code generator could depend on being able to write duplicate > keys without having to go back and erase previous output". Yes, but > it seems wrong to facilitate automated code generation at the expense > of human code writing. For hand-written code, I claim that in most > (all?) cases it would be preferable to have the compiler detect key > duplications. It is easier for an automated code generator to check > for duplications than it is for a human. > > 5. "There is already pylint". True, but it forces an additional step. > > For context, someone ran into this problem in my team at Google (we > fixed it using pylint). I haven't seen any valid reason (in the bug > or elsewhere) in favor of these constructor semantics. From the > discussions I have seen, it seems just an oversight in the > implementation/specification of dictionary literals. I'd be happy to > hear stronger reasoning in favor of the status quo. > > (additional search keywords: dict constructor, dict literal) > _______________________________________________ > Python-ideas mailing list > Python-ideas@python.org <mailto:Python-ideas@python.org> > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ Python-ideas mailing list Python-ideas@python.org <mailto:Python-ideas@python.org> https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On 03/05/2016 08:22, Michel Desmoulin wrote:
That would introduce a very weird special case. All those would work:
foo = 1 bar = "1"
{1: True, 2 - 1: True, foo: True, int(bar): True, int("1"): True, **{1: True}}
But not this:
{1: True, 1: True}
Perfectly true.
It would be very confusing.
It could conceivably confuse someone, but the example quoted by the OP with all literal keys is realistic and yours is contrived. There is practical benefit in spotting duplicate literal keys, if it can be done with either an error or a warning (whether at compile time or run time). There is even a kind of analogy:
dict(lion=1, lion=2) SyntaxError: keyword argument repeated
Is that also confusing?
I am still not seeing good objections to not adding at least a warning.
On Tue, May 3, 2016 at 12:06 AM Michel Desmoulin <desmoulinmichel@gmail.com mailto:desmoulinmichel@gmail.com> wrote:
With Python 3.5, you can unpack in dictionaries: d = {**foo, **bar}
and similarly:
foo = 1 bar = "1"
{1: True, 2 - 1: True, foo: True, int(bar): True, int("1"): True, **{1: True}}
But not this:
{1: True, 1: True}
Probably I didn't make it sufficiently clear in my OP, but this is only for duplicate literal keys in the same constructor, which would cover the vast majority of user errors.
A well-known precedent is the divide-by-zero warning in C. The GNU compiler produces a warning when, after constant folding, an expression contains a division by zero. For instance:
test.c:9:12: warning: division by zero [-Wdiv-by-zero] int z = 1/0;
and this:
test.c:9:12: warning: division by zero [-Wdiv-by-zero] int z = 1/(1 - 1);
but this code doesn't produce a warning:
const int x = 0; int z = 1/x;
For the dict constructor, I don't think that the constant folding is even particularly useful, although it wouldn't hurt.
Implementation-wise, checking for duplicate literals while parsing is just a matter of maintaining a hash table for that scope. This is already done for identifiers, and probably heavily optimized, so duplicate detection wouldn't add much code or be particularly expensive.
On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato luigi@semenzato.com wrote:
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
Do you feel that "prefer status quo" is not strong reasoning? http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...
Your word choice of "[no] valid reason" is interesting. In the bug tracker discussion, folks argue that the problem is easily solved with code style standards and static analyzers. I thought that was valid.
Rob Cliffe mentioned the fact that repeated keyword arguments causes a syntax error. I see the parallel here, except that keyword arguments must be valid identifiers. Literal dicts syntactically may have any expression as the key. Creating a special case for the parser to enforce uniqueness of number and string literals as keys seems more trouble than its worth.
On 05/03/2016 12:21 PM, Michael Selik wrote:
On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato wrote:
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
Do you feel that "prefer status quo" is not strong reasoning? http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...
It is not strong enough to prevent every good idea, or Python would be a static language (as in, no more changes except maybe bug fixes).
Rob Cliffe mentioned the fact that repeated keyword arguments causes a syntax error. I see the parallel here, except that keyword arguments must be valid identifiers. Literal dicts syntactically may have any expression as the key.
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
Creating a special case for the parser to enforce uniqueness of number and string literals as keys
I'm pretty sure the OP would be happy with uniqueness of keys, whether those keys were string literals, numeric literals, or function objects.
seems more trouble than its worth.
Maybe. That is what we are discussing.
-- ~Ethan~
On Tue, May 3, 2016 at 4:00 PM Ethan Furman ethan@stoneleaf.us wrote:
On 05/03/2016 12:21 PM, Michael Selik wrote:
On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato wrote:
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
Do you feel that "prefer status quo" is not strong reasoning?
http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...
It is not strong enough to prevent every good idea, or Python would be a static language (as in, no more changes except maybe bug fixes).
Of course. I'm no more saying "stop change" than you are saying "all change is good". Luigi, as with many who have ideas for new features, appears to be trying to shift the burden of proof. I think it's well established that the burden of proof (proving it's a good idea) rests on the one advocating change.
Rob Cliffe mentioned the fact that repeated keyword arguments causes a syntax error. I see the parallel here, except that keyword arguments must be valid identifiers. Literal dicts syntactically may have any expression as the key.
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
Creating a special case for the parser to enforce uniqueness of number and string literals as keys
I'm pretty sure the OP would be happy with uniqueness of keys, whether those keys were string literals, numeric literals, or function objects.
How would you handle an expression that evaluates differently for each call? For example:
{random(): 0, random(): 1}
seems more trouble than its worth.
Maybe. That is what we are discussing.
Indeed. Let me flip the original request: I'd like to hear stronger arguments for change, please. I'd be particularly interested in hearing how often Pylint has caught this mistake.
On 05/03/2016 01:43 PM, Michael Selik wrote:
On Tue, May 3, 2016 at 4:00 PM Ethan Furman wrote:
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
I'm willing to say that should be done with an existing dict, not in a literal.
I'm pretty sure the OP would be happy with uniqueness of keys, whether those keys were string literals, numeric literals, or function objects.
How would you handle an expression that evaluates differently for each call? For example:
{random(): 0, random(): 1}
Easy: Don't Do That. ;)
Let me flip the original request: I'd like to hear stronger arguments for change, please. I'd be particularly interested in hearing how often
Pylint
has caught this mistake.
Well, when this happened to me I spent a loooonnnnnngggg time figuring out what the problem is. One just doesn't expect duplicate keys to not raise:
--> dict(one=1, one='uno') File "<stdin>", line 1 SyntaxError: keyword argument repeated
So I guess the current advice is: Don't use dict literals, use dict() instead. And that just seems silly. ;)
-- ~Ethan~
On Tue, May 03, 2016 at 02:27:16PM -0700, Ethan Furman wrote:
On 05/03/2016 01:43 PM, Michael Selik wrote:
On Tue, May 3, 2016 at 4:00 PM Ethan Furman wrote:
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
I'm willing to say that should be done with an existing dict, not in a literal.
But are you willing to say that the compiler should enforce that stylistic judgement?
How would you handle an expression that evaluates differently for each call? For example:
{random(): 0, random(): 1}
Easy: Don't Do That. ;)
I see your wink, so I presume you're not actually suggesting that the compiler (1) prohibit all function calls in dict displays, or (2) hard-code the function name "random" in a black list.
So what are you actually suggesting? Michael is raising a good point here. If you don't like random as an example, how about:
d = {spam(a): 'a', spam(b): 'BB', spam(c): 'Ccc'}
I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do?
It's one thing to say "duplicate keys should be prohibited", and another to come up with a detailed explanation of what precisely should happen.
Let me flip the original request: I'd like to hear stronger arguments for change, please. I'd be particularly interested in hearing how often Pylint has caught this mistake.
Well, when this happened to me I spent a loooonnnnnngggg time figuring out what the problem is.
Okay, but how often does this happen? Daily? Weekly? Once per career?
What's a loooonnnnnngggg time? Twenty minutes? Twenty man-weeks?
One just doesn't expect duplicate keys to not raise:
--> dict(one=1, one='uno') File "<stdin>", line 1 SyntaxError: keyword argument repeated
Huh, I had completely forgotten that duplicate keyword arguments raise a syntax error. I thought you got a runtime TypeError.
There's a difference though. Keyword argument *names* must be identifiers, not expressions, and duplicates can be recognised by the compiler at compile-time:
py> def spam(**kw): pass ... py> spam(eggs=print("side-effect")) side-effect py> spam(eggs=print("side-effect"), eggs=print('another side-effect')) File "<stdin>", line 1 SyntaxError: keyword argument repeated
Duplicate keys are not, and in general can't be:
py> d = {'eggs': print("side-effect"), ... 'eggs': print('another side-effect')} side-effect another side-effect py> d {'eggs': None}
If you think of the dict constructor as something like:
for key,item in initial_values: self[key] = item
then the current behaviour is perfectly valid, and the advice is, if you don't want duplicate keys, don't use them in the first place.
If you think of it as:
for key,item in initial_values: if key in self: raise TypeError('duplicate key') else: self[key] = item
then you have to deal with the fact that you might only notice a duplicate after mutating your dict, which may include side-effects.
On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
On Tue, May 03, 2016 at 02:27:16PM -0700, Ethan Furman wrote:
On 05/03/2016 01:43 PM, Michael Selik wrote:
On Tue, May 3, 2016 at 4:00 PM Ethan Furman wrote:
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
I'm willing to say that should be done with an existing dict, not in a literal.
But are you willing to say that the compiler should enforce that stylistic judgement?
How would you handle an expression that evaluates differently for each call? For example:
{random(): 0, random(): 1}
Easy: Don't Do That. ;)
I see your wink, so I presume you're not actually suggesting that the compiler (1) prohibit all function calls in dict displays, or (2) hard-code the function name "random" in a black list.
Indeed.
So what are you actually suggesting? Michael is raising a good point here. If you don't like random as an example, how about:
d = {spam(a): 'a', spam(b): 'BB', spam(c): 'Ccc'}
I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do?
Since the dict created by that dict display happens at run time, I am suggesting that during the process of creating that dict that any keys, however generated or retrieved, that are duplicates of keys already in the dict, cause an appropriate error (to be determined).
Also, any dict display that is able to be used for dict creation at compile time (thereby avoiding the run-time logic) should have the same checks and raise the same error if duplicate keys are found (I imagine both run-time and compile-time dict creation from dict displays would use the same underlying function).
It's one thing to say "duplicate keys should be prohibited", and another to come up with a detailed explanation of what precisely should happen.
Hopefully my explanation is detail enough.
Let me flip the original request: I'd like to hear stronger arguments for change, please. I'd be particularly interested in hearing how often Pylint has caught this mistake.
Well, when this happened to me I spent a loooonnnnnngggg time figuring out what the problem is.
Okay, but how often does this happen? Daily? Weekly? Once per career?
Once a year, maybe. The experience is painful enough to remember when the subject arises, but not painful enough to remember to be careful. ;)
What's a loooonnnnnngggg time? Twenty minutes? Twenty man-weeks?
Longer than an hour, shorter than a day.
One just doesn't expect duplicate keys to not raise:
--> dict(one=1, one='uno') File "<stdin>", line 1 SyntaxError: keyword argument repeated
There's a difference though. Keyword argument *names* must be identifiers, not expressions, and duplicates can be recognised by the compiler at compile-time:
I am largely unconcerned by the run-time/compile-time distinction in this case -- whenever it happens, check that the incoming key doesn't already exist.
If you think of it as:
for key,item in initial_values: if key in self: raise TypeError('duplicate key') else: self[key] = item
then you have to deal with the fact that you might only notice a duplicate after mutating your dict, which may include side-effects.
Not sure what you mean by "mutating your dict" -- we're still talking about initial "dict display" to "dict" conversion, yes?
-- ~Ethan~
On Tue, May 3, 2016 at 6:40 PM, Ethan Furman ethan@stoneleaf.us wrote:
Since the dict created by that dict display happens at run time, I am suggesting that during the process of creating that dict that any keys, however generated or retrieved, that are duplicates of keys already in the dict, cause an appropriate error (to be determined).
Also, any dict display that is able to be used for dict creation at compile time (thereby avoiding the run-time logic) should have the same checks and raise the same error if duplicate keys are found (I imagine both run-time and compile-time dict creation from dict displays would use the same underlying function).
I agree with Ethan that this is the correct way of looking at this problem, rather than my original, confusing presentation. Duplicate keys in a dict display should be an error, period. In some cases the error can be detected at parse time, which is nicer. In other cases it can be detected only at run time, but it should always be an error.
Also, Steven D'Aprano steve@pearwood.info wrote:
Duplicate keys are special because, unlike the other errors, the dict constructor CAN guard against them.
That would be a reasonable answer. But I'm not sure if it is special *enough* to justify violating the Zen of Python. That may be a matter of opinion and taste.
Thank you, I will gladly take the "reasonable answer" qualification offered here. And I agree that often these issues build down to aesthetics. But it's hard for me to see how the current semantics support the Zen of Python. I wouldn't presume that all original design decisions were infallible, and I still haven't heard a single good argument in favor of *allowing* duplicate keys.
On Tue, May 03, 2016 at 06:58:43PM -0700, Luigi Semenzato wrote:
I still haven't heard a single good argument in favor of *allowing* duplicate keys.
Because duplicate keys are not necessarily an error. Maybe you don't care if there are duplicates. Maybe you expect duplicates, and want the last one seen to win.
Sometimes experienced programmers forget what it is like to be a beginning programmer. One of the most useful things a beginner can do is experiment at the interactive interpreter. The more things the language prohibits, the more errors the beginner gets, the less they learn, and the more intimidating the language is. Duplicate keys are not necessarily an error. It just means one value or the other is dropped.
How do dict displays construct the dict? Do they add keys from left to right, or right to left? I can read the source, if I can find it, and locate the right section of the right file, and if I can read C. Or I can do:
py> {1: "left", 1: "right"} {1: 'right'}
and in one line of code in the Python interactive interpreter I've learned more than in probably an hour of digging through C code.
To a beginner, being able to do things like this without Big Brother shouting at them "No! That's Wrong! Have An Exception!!!" all the time is literally without price. Who cares that it's a useless thing to do in production code? "Useless" is not an error.
Errors, and warnings, should be left for things which really are errors.
Even if you think my example doesn't matter, and you think that duplicate keys are always an error, it may be that this case is not bad enough to prohibit. You don't get this check for free. It requires code, and that code has to run, and it will run for every single dict display whether needed or not. Depending on how it is implemented, it might be quite expensive.
All those programmers who have carefully inspected their source code to ensure that the data used in the dict is correct, with no duplicates and no misspellings and no missed keys, they have no need of this check. But you would force it on them anyway, because one of your colleagues was sloppy and made a mistake and had difficulty debugging it. Okay, I have sympathy for you and your colleague, but that doesn't mean its okay for everyone to carry the cost of checking for your error.
Maybe it is justified. Maybe it isn't.
If you are really serious that you think duplicate keys are an error, then to be consistent you should also want to raise an error in the dict constructor, dict(obj, **kwargs). If not, what is your reasoning for why duplicate keys should be allowed in dict(...)?
Steven D'Aprano steve@pearwood.info writes:
How do dict displays construct the dict? Do they add keys from left to right, or right to left? I can read the source, if I can find it, and locate the right section of the right file, and if I can read C. Or I can do:
py> {1: "left", 1: "right"} {1: 'right'}
and in one line of code in the Python interactive interpreter I've learned more than in probably an hour of digging through C code.
The language reference explicitly says that the order from left to right and duplicate keys are also explicitly supported [1]:
If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.
[1] https://docs.python.org/3/reference/expressions.html#dictionary-displays
Akira
On 04/05/2016 02:40, Ethan Furman wrote:
On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
On Tue, May 03, 2016 at 02:27:16PM -0700, Ethan Furman wrote:
On 05/03/2016 01:43 PM, Michael Selik wrote:
On Tue, May 3, 2016 at 4:00 PM Ethan Furman wrote:
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
I'm willing to say that should be done with an existing dict, not in a literal.
But are you willing to say that the compiler should enforce that stylistic judgement?
How would you handle an expression that evaluates differently for each call? For example:
{random(): 0, random(): 1}
Easy: Don't Do That. ;)
I see your wink, so I presume you're not actually suggesting that the compiler (1) prohibit all function calls in dict displays, or (2) hard-code the function name "random" in a black list.
Indeed.
So what are you actually suggesting? Michael is raising a good point here. If you don't like random as an example, how about:
d = {spam(a): 'a', spam(b): 'BB', spam(c): 'Ccc'}
I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do?
Since the dict created by that dict display happens at run time, I am suggesting that during the process of creating that dict that any keys, however generated or retrieved, that are duplicates of keys already in the dict, cause an appropriate error (to be determined).
Sorry, I favour raising an error for duplicate *literal constant* keys (presumably at compile time). But I think making the above example cause a runtime error if two keys happened to be equal would break too much existing code. Rob
On 4 May 2016 at 11:40, Ethan Furman ethan@stoneleaf.us wrote:
On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do?
Since the dict created by that dict display happens at run time, I am suggesting that during the process of creating that dict that any keys, however generated or retrieved, that are duplicates of keys already in the dict, cause an appropriate error (to be determined).
I was curious as to whether or not it was technically feasible to implement this "duplicate keys" check solely for dict displays in CPython without impacting other dict use cases, and it turns out it should be.
The key point is that BUILD_MAP already has its own PyDict_SetItem() loop in ceval.c (iterating over stack entries), and hence doesn't rely on the "dict_common_update" code shared by dict.update and the dict constructor(s).
As such, the technical change being requested here (at least for CPython), is that BUILD_MAP be updated to do a PyDict_Contains() check prior to each key insertion, and:
1. Report a DeprecationWarning in 3.6 2. Report a [RuntimeError? ValueError?] in 3.7+
For code generators emitting dict displays, running their content through OrderedDict (or an equivalent if the code generator is written in something other than Python) prior to writing it out will be sufficient to eliminate duplicates.
Since PyDict_Contains() is O(1), this shouldn't introduce a major slowdown in dict display evaluation (although the impact should still be measued prior to making any changes).
All-in-all, the concept gets a +0 from me, although to allow for implementations that don't share CPython's distinction between dict displays and the normal dict constructor, any such behaviour (if introduced) should likely be flagged as an implementation detail that may vary across interpreters (as I would have been -1 if BUILD_MAP's implementation wasn't already independent of dict_common_update).
Cheers, Nick.
On Wed, May 4, 2016 at 12:04 AM Nick Coghlan ncoghlan@gmail.com wrote:
As such, the technical change being requested here (at least for CPython), is that BUILD_MAP be updated to do a PyDict_Contains() check prior to each key insertion, and:
- Report a DeprecationWarning in 3.6
- Report a [RuntimeError? ValueError?] in 3.7+
To be clear, this implementation would potentially raise an error in the random example -- ``{random(): 0, random(): 1}`` -- correct? If so, then code smell or not, I think that's too much breakage.
We could restrict the PyDict_Contains() call to only enforcing uniqueness of string or numeric literals, but then we'd still have some oddball situations:
def a(): return 'a'
{a(): 0, 'a': 1}
On 05/03/2016 09:20 PM, Michael Selik wrote:
On Wed, May 4, 2016 at 12:04 AM Nick wrote:
As such, the technical change being requested here (at least for CPython), is that BUILD_MAP be updated to do a PyDict_Contains() check prior to each key insertion, and:
- Report a DeprecationWarning in 3.6
- Report a [RuntimeError? ValueError?] in 3.7+
To be clear, this implementation would potentially raise an error in the random example -- ``{random(): 0, random(): 1}`` -- correct? If so, then code smell or not, I think that's too much breakage.
Either fix it all the way or don't bother. The rule "duplicates are allowed" or "duplicates are not allowed" is much simpler and easier to remember than "duplicates are allowed except when..." or, contrariwise, "duplicates are not allowed unless ...".
-- ~Ethan~
On 04/05/2016 07:35, Ethan Furman wrote:
On 05/03/2016 09:20 PM, Michael Selik wrote:
On Wed, May 4, 2016 at 12:04 AM Nick wrote:
As such, the technical change being requested here (at least for CPython), is that BUILD_MAP be updated to do a PyDict_Contains() check prior to each key insertion, and:
- Report a DeprecationWarning in 3.6
- Report a [RuntimeError? ValueError?] in 3.7+
To be clear, this implementation would potentially raise an error in the random example -- ``{random(): 0, random(): 1}`` -- correct? If so, then code smell or not, I think that's too much breakage.
Either fix it all the way or don't bother. The rule "duplicates are allowed" or "duplicates are not allowed" is much simpler and easier to remember than "duplicates are allowed except when..." or, contrariwise, "duplicates are not allowed unless ...".
I disagree, on the grounds that 'practicality beats purity' (sorry to repeat myself). Here's a thought. We are rightly concerned with changes breaking existing code. The proposed change could lead to some existing code being *repaired*, by pointing out a mistake that the author doesn't know is there. Rob Cliffe
On 4 May 2016 at 10:05, Rob Cliffe rob.cliffe@btinternet.com wrote:
Either fix it all the way or don't bother. The rule "duplicates are allowed" or "duplicates are not allowed" is much simpler and easier to remember than "duplicates are allowed except when..." or, contrariwise, "duplicates are not allowed unless ...".
I disagree, on the grounds that 'practicality beats purity' (sorry to repeat myself). Here's a thought. We are rightly concerned with changes breaking existing code. The proposed change could lead to some existing code being *repaired*, by pointing out a mistake that the author doesn't know is there.
The problem is that it could also break code that works at the moment. Consider a mapping of constants to names:
colors = { RED: "Red", BLUE: "Blue", PINK: "Pink", LIGHT_BLUE: "Light Blue", DARK_BLUE: "Dark Blue", }
On a certain output device, there may be no "light blue", and in that case the code sets LIGHT_BLUE = BLUE. That's entirely reasonable, and the author may be perfectly happy with the current behavior in that case.
So I think it's important for backward compatibility that this change be *only* for the case of literal values as keys (which is what the OP proposed, of course). And given that it needs to just be for literal values as keys, I personally think that makes it a case of "Special cases aren't special enough to break the rules", so I'm -0 on the proposal (I don't think it's a *bad* thing, in the constants-only form, just that it's not worth the effort - but if someone else makes the effort, I won't object).
Paul
I think "special cases aren't special enough to break the rules" make this a -1, there's no reason for the behaviour change besides enabling bad code.
On 4 May 2016 at 11:48, Paul Moore p.f.moore@gmail.com wrote:
On 4 May 2016 at 10:05, Rob Cliffe rob.cliffe@btinternet.com wrote:
Either fix it all the way or don't bother. The rule "duplicates are
allowed" or
"duplicates are not allowed" is much simpler and easier to remember than "duplicates are allowed except when..." or, contrariwise, "duplicates
are not
allowed unless ...".
I disagree, on the grounds that 'practicality beats purity' (sorry to
repeat
myself). Here's a thought. We are rightly concerned with changes breaking existing code. The proposed change could lead to some existing code being *repaired*, by pointing out a mistake that the author doesn't know is there.
The problem is that it could also break code that works at the moment. Consider a mapping of constants to names:
colors = { RED: "Red", BLUE: "Blue", PINK: "Pink", LIGHT_BLUE: "Light Blue", DARK_BLUE: "Dark Blue", }
On a certain output device, there may be no "light blue", and in that case the code sets LIGHT_BLUE = BLUE. That's entirely reasonable, and the author may be perfectly happy with the current behavior in that case.
So I think it's important for backward compatibility that this change be *only* for the case of literal values as keys (which is what the OP proposed, of course). And given that it needs to just be for literal values as keys, I personally think that makes it a case of "Special cases aren't special enough to break the rules", so I'm -0 on the proposal (I don't think it's a *bad* thing, in the constants-only form, just that it's not worth the effort - but if someone else makes the effort, I won't object).
Paul _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On 05/04/2016 02:48 AM, Paul Moore wrote:
The problem is that it could also break code that works at the moment. Consider a mapping of constants to names:
colors = { RED: "Red", BLUE: "Blue", PINK: "Pink", LIGHT_BLUE: "Light Blue", DARK_BLUE: "Dark Blue", }
On a certain output device, there may be no "light blue", and in that case the code sets LIGHT_BLUE = BLUE. That's entirely reasonable, and the author may be perfectly happy with the current behavior in that case.
Actually, that code is setting BLUE = "Light Blue". ;)
So I think it's important for backward compatibility that this change be *only* for the case of literal values as keys (which is what the OP proposed, of course). And given that it needs to just be for literal values as keys, I personally think that makes it a case of "Special cases aren't special enough to break the rules", so I'm -0 on the proposal (I don't think it's a *bad* thing, in the constants-only form, just that it's not worth the effort - but if someone else makes the effort, I won't object).
I object. Having only /some/ duplicates checked for is overcomplicating without very good reason.
Perhaps the way forward is to check for all duplicates, but issue a Warning instead of raising an Exception?
To be honest, though, I'm starting to agree with Steven D'Aprano that sufficiently large dicts should be verified by other means, and those other means can also check for duplicates.
-- ~Ethan~
On 4 May 2016 at 13:42, Ethan Furman ethan@stoneleaf.us wrote:
On 05/04/2016 02:48 AM, Paul Moore wrote:
The problem is that it could also break code that works at the moment. Consider a mapping of constants to names:
colors = { RED: "Red", BLUE: "Blue", PINK: "Pink", LIGHT_BLUE: "Light Blue", DARK_BLUE: "Dark Blue", }
On a certain output device, there may be no "light blue", and in that case the code sets LIGHT_BLUE = BLUE. That's entirely reasonable, and the author may be perfectly happy with the current behavior in that case.
Actually, that code is setting BLUE = "Light Blue". ;)
You missed my point, I think. BLUE and LIGHT_BLUE are "constant" values (set once at the top of the program). In some circumstances, the programmer deliberately wants BLUE and LIGHT_BLUE to be the same value (but not in others). The programmer is happy that a constant BLUE will be decoded as "Light Blue" by the colors mapping in that case.
This is IMO perfectly valid code, and having it rejected by an over-zealous "duplicate check" is unacceptable.
As far as I am concerned, this example clearly demonstrates that any duplicate check should only be applied to actual constant literal values used as keys.
So I think it's important for backward compatibility that this change be *only* for the case of literal values as keys (which is what the OP proposed, of course). And given that it needs to just be for literal values as keys, I personally think that makes it a case of "Special cases aren't special enough to break the rules", so I'm -0 on the proposal (I don't think it's a *bad* thing, in the constants-only form, just that it's not worth the effort - but if someone else makes the effort, I won't object).
I object. Having only /some/ duplicates checked for is overcomplicating without very good reason.
Perhaps the way forward is to check for all duplicates, but issue a Warning instead of raising an Exception?
Warning on perfectly good code is just as obnoxious IMO.
To be honest, though, I'm starting to agree with Steven D'Aprano that sufficiently large dicts should be verified by other means, and those other means can also check for duplicates.
As far as I'm aware, the people who have looked at the implementation details seem to be saying that checking for duplicates on literals only is too hard to be practical, and that's the only variation I'd be willing to accept, so I agree - do duplicate checking by other means (assertions in the code being the obvious option).
Paul
On 05/04/2016 07:54 AM, Paul Moore wrote:
On 4 May 2016 at 13:42, Ethan Furman wrote:
On 05/04/2016 02:48 AM, Paul Moore wrote:
The problem is that it could also break code that works at the moment. Consider a mapping of constants to names:
colors = { RED: "Red", BLUE: "Blue", PINK: "Pink", LIGHT_BLUE: "Light Blue", DARK_BLUE: "Dark Blue", }
On a certain output device, there may be no "light blue", and in that case the code sets LIGHT_BLUE = BLUE. That's entirely reasonable, and the author may be perfectly happy with the current behavior in that case.
Actually, that code is setting BLUE = "Light Blue". ;)
You missed my point, I think. BLUE and LIGHT_BLUE are "constant" values (set once at the top of the program).
No, I got your point, and I agree. But one of us is confused about your explanation of that example (it could easily be me). Here's how I think it works:
RED = 1 BLUE = 2 PINK = 3 LIGHT_BLUE = 4 DARK_BLUE = 5
# oops, light blue not supported on this machine, just use blue LIGHT_BLUE = 2
colors = {
RED: "Red", BLUE: "Blue", PINK: "Pink", LIGHT_BLUE: "Light Blue", DARK_BLUE: "Dark Blue", }
colors
{1: 'Red', 2: 'Light Blue', 3: 'Pink', 5: 'Dark Blue'}
So my issue was not with your point, but with your explanation of the example of your point.
-- ~Ethan~
On 4 May 2016 at 17:06, Ethan Furman ethan@stoneleaf.us wrote:
colors
{1: 'Red', 2: 'Light Blue', 3: 'Pink', 5: 'Dark Blue'}
So my issue was not with your point, but with your explanation of the example of your point.
Ah, sorry. I hadn't checked and wasn't sure myself whether the first pairing or the last took priority. I tried to be vague enough that it didn't matter that I wasn't sure (my theoretical programmer who relied on this behaviour clearly *would* be sure :-)) but I obviously failed.
The moral of the story is to check my examples before posting :-) Paul
2016-05-04 6:03 GMT+02:00, Nick Coghlan ncoghlan@gmail.com:
I was curious as to whether or not it was technically feasible to implement this "duplicate keys" check solely for dict displays in CPython without impacting other dict use cases, and it turns out it should be.
[...]
- Report a DeprecationWarning in 3.6
- Report a [RuntimeError? ValueError?] in 3.7+
But what about SyntaxError instead of RuntimeError? Could it be easily done too?
PS. Probably nobody is interested in my tests with unicode literals which are "NFKC equivalent" but it help me to understand how things are done behind the scene.
def f(**kwd): for i in kwd: print(i, kwd[i])
def tst(): f(ij=1, ij=2) # this is syntax error
def tst2(): f(**{1:1}) # this is runtime error
def tst3(): f(**{'ij':1, 'ij':2}) # this is legal (for me a little unexpected, maybe bug?)
On Wed, May 04, 2016 at 02:03:59PM +1000, Nick Coghlan wrote:
I was curious as to whether or not it was technically feasible to implement this "duplicate keys" check solely for dict displays in CPython without impacting other dict use cases, and it turns out it should be.
[...]
All-in-all, the concept gets a +0 from me, although to allow for implementations that don't share CPython's distinction between dict displays and the normal dict constructor, any such behaviour (if introduced) should likely be flagged as an implementation detail that may vary across interpreters (as I would have been -1 if BUILD_MAP's implementation wasn't already independent of dict_common_update).
What do you think of MAL's suggestion of adding a CHECK_MAP op-code and raising a warning?
I'd be much more comfortable with a warning than an error, because I really don't think that duplicate keys is *necessarily* an error. I think Paul Moore is onto something with his example of BLUE/LIGHT_BLUE, even if he got the order the wrong way around :-)
This has been the documented behaviour of dicts since at least 1.5:
https://docs.python.org/release/1.5/ref/ref-7.html#HEADING7-35
and it matches the behaviour of both the dict constructor and dict comprehensions. I don't think anyone has made the case for turning it into an error, although I concede that it's perhaps worthy of a warning.
(At least, I won't object to it being a warning, provided we can turn it off :-)
On 05/04/2016 07:20 AM, Steven D'Aprano wrote:
On Wed, May 04, 2016 at 02:03:59PM +1000, Nick Coghlan wrote:
This has been the documented behaviour of dicts since at least 1.5:
https://docs.python.org/release/1.5/ref/ref-7.html#HEADING7-35
and it matches the behaviour of both the dict constructor and dict comprehensions.
Okay, we had dict displays and dicts, and now you say dict constructor... do you mean dict()? Because:
dict(a=1, a=1)
File "<stdin>", line 1 SyntaxError: keyword argument repeated
is certainly not allowing duplicates.
-- ~Ethan~
On Wed, May 04, 2016 at 09:09:22AM -0700, Ethan Furman wrote:
On 05/04/2016 07:20 AM, Steven D'Aprano wrote:
and it matches the behaviour of both the dict constructor and dict comprehensions.
Okay, we had dict displays and dicts, and now you say dict constructor... do you mean dict()? Because:
I was thinking of dict with a list of tuples as argument:
py> dict([(1, 'a'), (2, 'b'), (1, 'c')]) {1: 'c', 2: 'b'}
or an object with a keys method:
py> class K: ... def keys(self): ... return [1, 2, 1] ... def __getitem__(self, i): ... return 'abcd'[i] ... py> dict(K()) {1: 'b', 2: 'c'}
It's all about typing and reading.
Suppose Python didn't have a special syntax for dict displays. Then folks would likely write this:
dict(( (1, 11), (2, 22), ... ))
but then suppose someone says: "hey, folks do this a lot, let's make it more convenient" and introduce a curly braces notation, and suppose it allows overwriting, like it does now.
The new notation exists *exclusively* to make it easier to type and read dictionaries in the code---by a human. Except that with these semantics it makes it impossible to detect duplicate key errors, which the vast majority of users would prefer to notice (a bold and unproven statement, but a reasonable guess). What's worse, people don't realize the problem until they have many large dict displays, and many bugs. So they feel annoyed and betrayed.
Suppose instead that the new notation does NOT allow overwriting. If folks rely on the overwrite semantics, in either human-written or machine-generated code, they will immediately notice the problem and use dict(), with very little inconvenience to code generators. (And some to the humans, but remember, they are a tiny minority :)
The strict semantics are maybe marginally harder to explain, and maybe marginally harder to implement, but neither should be a show-stopper.
However, that's not where we are. There is the legacy code issue, and the legacy mindset. I don't know how to evaluate these issues, and I don't even know how much experience the community has with backward-incompatible changes. So, while I am fairly positive that it would have been a better choice to start with, I have no idea whether, when, and how this proposal should have any effect.
P.S. The issue of detecting duplicate keys at parse time is completely separate and it was my mistake to focus on it first.
P.P.S. The issue of whether a set display should allow duplicate elements is less important, since allowing them does not have destructive consequences.
On Wed, May 4, 2016, 1:37 PM Luigi Semenzato luigi@semenzato.com wrote:
It's all about typing and reading.
Suppose Python didn't have a special syntax for dict displays. Then folks would likely write this:
dict(( (1, 11), (2, 22), ... ))
but then suppose someone says: "hey, folks do this a lot, let's make it more convenient" and introduce a curly braces notation, and suppose it allows overwriting, like it does now.
The new notation exists *exclusively* to make it easier to type and read dictionaries in the code---by a human. Except that with these semantics it makes it impossible to detect duplicate key errors, which the vast majority of users would prefer to notice (a bold and unproven statement, but a reasonable guess).
I'm part of the minority(?) that doesn't care. I hope that helps adjust your odds ratio for that guess.
What's worse, people don't
realize the problem until they have many large dict displays, and many bugs. So they feel annoyed and betrayed.
Suppose instead that the new notation does NOT allow overwriting. If folks rely on the overwrite semantics, in either human-written or machine-generated code, they will immediately notice the problem and use dict(), with very little inconvenience to code generators. (And some to the humans, but remember, they are a tiny minority :)
Even a tiny minority of the Python community might be thousands of people.
The strict semantics are maybe marginally harder to explain, and maybe marginally harder to implement, but neither should be a show-stopper.
However, that's not where we are. There is the legacy code issue, and the legacy mindset. I don't know how to evaluate these issues, and I don't even know how much experience the community has with backward-incompatible changes.
Well, there's that transition from 2 to 3. You do that a few times and you get a reputation for instability.
I like to imagine a hidden line in the Zen of Python. Between the title and "Beautiful is better..." I mentally insert, "Backwards compatibility is important." How important, well, that's why it's Zen and not Rules.
So, while I am fairly positive that it
would have been a better choice to start with, I have no idea whether, when, and how this proposal should have any effect.
P.S. The issue of detecting duplicate keys at parse time is completely separate and it was my mistake to focus on it first.
P.P.S. The issue of whether a set display should allow duplicate elements is less important, since allowing them does not have destructive consequences. _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On May 4, 2016 10:59 AM, "Michael Selik" michael.selik@gmail.com wrote:
On Wed, May 4, 2016, 1:37 PM Luigi Semenzato luigi@semenzato.com wrote:
read dictionaries in the code---by a human. Except that with these semantics it makes it impossible to detect duplicate key errors, which the vast majority of users would prefer to notice (a bold and unproven statement, but a reasonable guess).
I'm part of the minority(?) that doesn't care. I hope that helps adjust
your odds ratio for that guess.
I'm also part of that group (I question whether it is minority) who wishes to allow duplicate keys at the syntax level, but to let linters complain. While *most* duplicate literal keys in dict displays probably are bugs, the legitimate and intended uses are not rare enough to prohibit them altogether.
On 5/4/2016 12:03 AM, Nick Coghlan wrote:
On 4 May 2016 at 11:40, Ethan Furman ethan@stoneleaf.us wrote:
On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do?
Since the dict created by that dict display happens at run time, I am suggesting that during the process of creating that dict that any keys, however generated or retrieved, that are duplicates of keys already in the dict, cause an appropriate error (to be determined).
I was curious as to whether or not it was technically feasible to implement this "duplicate keys" check solely for dict displays in CPython without impacting other dict use cases, and it turns out it should be.
The key point is that BUILD_MAP already has its own PyDict_SetItem() loop in ceval.c (iterating over stack entries), and hence doesn't rely on the "dict_common_update" code shared by dict.update and the dict constructor(s).
Changing only BUILD_MAP would invalidate current code equivalences and currently sensible optimizations. A toy example:
d1 = {'a': 'a1'} d2 = {f(): 'a2'} d1.update(d2) d1
{'a': 'a2'}
Sensible and comprehensible code transformation rule are important. Currently, the following rather trivial optimization of the above works.
d1 = {'a': 'a1', f(): 'a2'} d1
{'a': 'a2'}
The special rule for dict displays would invalidate this.
In my post yesterday in response to Luigi (where I began 'The thread...'), I gave 4 equivalent other ways to initialize a dict using a Python loop (include a dict comprehension). Using a dict display amounts to un-rolling any of the loops and replacing the Python loop with the C loop buried in ceval. Changing the operation of just that loop would break the current equivalence.
I suspect that the proposed change would introduce more bugs than it exposes.
On 05/05/2016 02:36, Terry Reedy wrote:
On 5/4/2016 12:03 AM, Nick Coghlan wrote:
On 4 May 2016 at 11:40, Ethan Furman ethan@stoneleaf.us wrote:
On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do?
Since the dict created by that dict display happens at run time, I am suggesting that during the process of creating that dict that any keys, however generated or retrieved, that are duplicates of keys already in the dict, cause an appropriate error (to be determined).
I was curious as to whether or not it was technically feasible to implement this "duplicate keys" check solely for dict displays in CPython without impacting other dict use cases, and it turns out it should be.
The key point is that BUILD_MAP already has its own PyDict_SetItem() loop in ceval.c (iterating over stack entries), and hence doesn't rely on the "dict_common_update" code shared by dict.update and the dict constructor(s).
Changing only BUILD_MAP would invalidate current code equivalences and currently sensible optimizations. A toy example:
d1 = {'a': 'a1'} d2 = {f(): 'a2'} d1.update(d2) d1
{'a': 'a2'}
Sensible and comprehensible code transformation rule are important. Currently, the following rather trivial optimization of the above works.
d1 = {'a': 'a1', f(): 'a2'} d1
{'a': 'a2'}
The special rule for dict displays would invalidate this.
In my post yesterday in response to Luigi (where I began 'The thread...'), I gave 4 equivalent other ways to initialize a dict using a Python loop (include a dict comprehension). Using a dict display amounts to un-rolling any of the loops and replacing the Python loop with the C loop buried in ceval. Changing the operation of just that loop would break the current equivalence.
I suspect that the proposed change would introduce more bugs than it exposes.
The OP mentioned (even if he didn't explicitly produce it, understandable if it was very long) a real-life use case where a warning/error would have aided debugging. I find this case realistic. Can anyone produce a single real-life use case where repeated literal keys needed to be accepted without a warning or error? Here I'm throwing down the gauntlet to those who theorise about what (breakable) code *might* be out in the wild, and asking them to produce just one real-life case. Rob Cliffe
On Wed, May 4, 2016 at 10:40 PM Rob Cliffe rob.cliffe@btinternet.com wrote:
The OP mentioned (even if he didn't explicitly produce it, understandable if it was very long) a real-life use case where a warning/error would have aided debugging. I find this case realistic.
Certainly was. But it occurs relatively rarely and can be solved with linters and by stylistic changes like alphabetizing dict literals by key.
Can anyone produce a single real-life use case where repeated literal
keys needed to be accepted without a warning or error? Here I'm throwing down the gauntlet to those who theorise about what (breakable) code *might* be out in the wild, and asking them to produce just one real-life case.
I thought Paul Moore gave a good example. You can see him and Ethan Furman discussing it in a fork of this thread. There were a few code-generator examples. Were these not realistic enough?
On Thu, May 05, 2016 at 03:38:05AM +0100, Rob Cliffe wrote:
Here I'm throwing down the gauntlet
You can throw down whatever you like, but Guido has pronounced that this change is not going to happen without a PEP. So unless somebody is willing to write a PEP, further discussion is a waste of time.
On 5 May 2016 at 03:38, Rob Cliffe rob.cliffe@btinternet.com wrote:
The OP mentioned (even if he didn't explicitly produce it, understandable if it was very long) a real-life use case where a warning/error would have aided debugging. I find this case realistic. Can anyone produce a single real-life use case where repeated literal keys needed to be accepted without a warning or error? Here I'm throwing down the gauntlet to those who theorise about what (breakable) code *might* be out in the wild, and asking them to produce just one real-life case.
Not literal keys no. If the proposal is *solely* that duplicate literals are detected and objected to (I reserve judgement for now on the question of warning vs error) then I don't think there will be significant code breakage.
But even knowing that, the next steps are:
1. Produce a PEP (Guido has said this is a requirement) 2. Explain how this would be implemented in that PEP - because the only implementation techniques so far mentioned in this thread fail to restrict the check to literals. 3. Manage discussion on said PEP.
I'll await a PEP before contributing any further to the discussion. Paul
Late, but here are a few more notes:
1. The gnumpy package (for Py2) used expressions in a `set` constructor which could evaluate to duplicates. It isn't hard to imagine a similar trick with `dict`, though I don't know if it'd be as useful. From gnumpy.py: ``` _floatTypes = set((types.FloatType, numpy.float64, numpy.float32, getattr(numpy, 'float128', numpy.float64), getattr(numpy, 'float96', numpy.float64))) ```
Refactored a bit: ``` _floatTypes = {float, numpy.float64, numpy.float32, getattr(numpy, 'float128', float), getattr(numpy, 'float96', float)} ```
2. Even if the warning/error is compiler-only, `eval` and `exec` would have the same warning/error, and they are NOT always evaluating dev-written source code. (Though you can argue that `exec` and `eval` on strings are where you should be on high alert with regard to data validation.) On the other hand, a linter is only run on manually-written code.
3. A compile-time check doesn't necessarily give a false sense of security. It just needs the proper wording to indicate that it's a compile-time check. "Warning: Duplicate (constant|literal|string literal) keys detected."
On 5/3/2016 8:23 PM, Steven D'Aprano wrote:
One just doesn't expect duplicate keys to not raise:
--> dict(one=1, one='uno') File "<stdin>", line 1 SyntaxError: keyword argument repeated
Huh, I had completely forgotten that duplicate keyword arguments raise a syntax error. I thought you got a runtime TypeError.
In a sense, it should be.
def f(x): pass
f(1, x=2)
Traceback (most recent call last): File "<pyshell#12>", line 1, in <module> f(1, x=2) TypeError: f() got multiple values for argument 'x'
But the case of multiple values via duplicate keywords can be and is caught when parsing, and parse errors are always (except maybe for ...) SyntaxErrors.
On Tue, May 3, 2016, at 20:23, Steven D'Aprano wrote:
On Tue, May 03, 2016 at 02:27:16PM -0700, Ethan Furman wrote:
On 05/03/2016 01:43 PM, Michael Selik wrote:
How would you handle an expression that evaluates differently for each call? For example:
{random(): 0, random(): 1}
Easy: Don't Do That. ;)
I see your wink, so I presume you're not actually suggesting that the compiler (1) prohibit all function calls in dict displays, or (2) hard-code the function name "random" in a black list.
How about prohibit it anyway (on the grounds of the two key expressions being identical - no comment on spam(a) vs spam(b)), and if the user *really* wants it they'll do {random(): x for x in (0, 1)} [or {other: stuff, **{that}}]
On Tue, May 3, 2016 at 1:43 PM, Michael Selik michael.selik@gmail.com wrote:
On Tue, May 3, 2016 at 4:00 PM Ethan Furman ethan@stoneleaf.us wrote:
On 05/03/2016 12:21 PM, Michael Selik wrote:
On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato wrote:
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
Do you feel that "prefer status quo" is not strong reasoning?
Correct. In this particular case I think that the change is better. It is simple, helpful, and doesn't seem to have huge implications (if any) in terms of documentation.
http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...
It is not strong enough to prevent every good idea, or Python would be a static language (as in, no more changes except maybe bug fixes).
Correct.
Of course. I'm no more saying "stop change" than you are saying "all change is good". Luigi, as with many who have ideas for new features, appears to be trying to shift the burden of proof. I think it's well established that the burden of proof (proving it's a good idea) rests on the one advocating change.
Which gives me one more opportunity to present this case:
A dictionary literal is a convenient input format for structured data, and is used as such in many cases, for instance instead of equivalent JSON, which requires an additional module, an additional parsing step, and learning an additional syntax (a lot of Python programmers don't know JSON). The convenience, however, is hampered by the duplicate key issue.
Some of the strong points of Python are: 1. easy to learn; 2. does the right thing in most cases. It just seems to me that the right thing here would be to signal duplicate literal keys.
Rob Cliffe mentioned the fact that repeated keyword arguments causes a syntax error. I see the parallel here, except that keyword arguments must be valid identifiers. Literal dicts syntactically may have any expression as the key.
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
Yes, this is absolutely true, but probably very rare, and arguably not a recommendable coding practice, therefore IMO it offers very little support for the status quo.
Creating a special case for the parser to enforce uniqueness of number and string literals as keys
I'm pretty sure the OP would be happy with uniqueness of keys, whether those keys were string literals, numeric literals, or function objects.
How would you handle an expression that evaluates differently for each call? For example:
{random(): 0, random(): 1}
We discussed this already. My request only applies to keys that are literals. (He's talking about function objects, not function calls.)
seems more trouble than its worth.
Maybe. That is what we are discussing.
Indeed. Let me flip the original request: I'd like to hear stronger arguments for change, please. I'd be particularly interested in hearing how often Pylint has caught this mistake.
Pylint catches this mistake fine. But many Python programmers don't know about pylint either. For instance, a bunch of hardware engineers that wrote some python as a driver for a circuit simulator.
Anyway, thank you for making these points.
2016-05-03 21:21 GMT+02:00, Michael Selik michael.selik@gmail.com: [...]
Rob Cliffe mentioned the fact that repeated keyword arguments causes a syntax error. I see the parallel here, except that keyword arguments must be valid identifiers. Literal dicts syntactically may have any expression as the key. Creating a special case for the parser to enforce uniqueness of number and string literals as keys seems more trouble than its worth.
Just for curiosity: keys could be valid identifier, could be different (at least in one POV) and still cause Syntax error as keyword argument repeated.
{'ij':1, 'ij':2} # this is dict with 2 keys and 2 different values
dict(ij=1, ij=2) # SyntaxError: keyword argument repeated
It is because parser is doing NFKC normalization for non-ascii identifiers - see https://www.python.org/dev/peps/pep-3131/
('\u0069\u006a' == 'ij' != 'ij' == '\u0133')
On May 3, 2016 3:21 PM, "Michael Selik" michael.selik@gmail.com wrote:
On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato luigi@semenzato.com
wrote:
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
Do you feel that "prefer status quo" is not strong reasoning?
http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...
Your word choice of "[no] valid reason" is interesting. In the bug
tracker discussion, folks argue that the problem is easily solved with code style standards and static analyzers. I thought that was valid.
"X is detectable", whether by tools, or by magic imps yelling at you when you do it, is not an argument for X.
When asking for a useful purpose for the current semantics, "prefer the status quo" is not strong enough. "Errors should never pass silently," and there is an argument on the table that duplicate spelled-out keys are probably written in error. He isn't shifting the burden, he's putting the ball in your court.
Literal dicts syntactically may have any expression as the key. Creating
a special case for the parser to enforce uniqueness of number and string literals as keys seems more trouble than its worth.
That is an argument. Literal keys, or a single name repeated in the key list, might be an error, but you can't say the same about expressions. If you only error on literal keys, then there is an inconsistency between literals and expressions.
By the way, unless [[the name is guaranteed not to be rebound in a closure called within the dict display]], multiple items with the same variable name as key isn't necessarily an error:
def f(): x=1 def g(): nonlocal x x += 1 return x return {x: g(), x: g(), x: g()}
print(f()) # {2: 2, 3: 3, 4: 4}
"Don't do that" doesn't say what the compiler behavior should be: should it analyze for nonlocal closures, or should it only check literals?
Hi Luigi,
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
[...]
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
[...]
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
As far as I can see, you haven't specified in *detail* what change you wish to propose. It is difficult for me to judge your proposal when I don't know precisely what you are proposing.
Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?
What counts as "duplicate keys"? I presume that you mean that two keys count as duplicate if they hash to the same value, and are equal. But you keep mentioning "literals" -- does this mean you care more about whether they look the same rather than are the same?
# duplicate literals, forbidden d = {100: 1, 100: 2}
# duplicate non-literals, allowed d = {100: 1, len("ab")*50: 2}
You keep mentioning "dictionary literal", but there actually is no such thing in Python. I think you mean a dict display. (Don't worry, I make the same mistake.) But the point is, a "dict literal" (display) can contain keys which are not themselves literals, as above. Those keys can have arbitrarily complex semantics, including side-effects. What do you expect to happen?
Hi,
On Tue, May 3, 2016 at 4:51 PM, Steven D'Aprano steve@pearwood.info wrote:
Hi Luigi,
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
[...]
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
[...]
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
As far as I can see, you haven't specified in *detail* what change you wish to propose. It is difficult for me to judge your proposal when I don't know precisely what you are proposing.
Yes, and I apologize, because on further reflection I don't even know what is possible within the model of the Python interpreter.
Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?
Is there such a thing as a SyntaxWarning? From my perspective it would be fine to make it a SyntaxError, but I am not sure it would be overall a good choice for legacy code (i.e. as an error it might break old code, and I don't know how many other things a new language specification is going to break).
It could also be a run-time error, but it might be nicer to detect it earlier. Maybe both.
What counts as "duplicate keys"? I presume that you mean that two keys count as duplicate if they hash to the same value, and are equal. But you keep mentioning "literals" -- does this mean you care more about whether they look the same rather than are the same?
Correct. The errors that I am guessing matter the most are those for which folks copy-paste a key-value pair, where the key is a literal string, intending to change the key, and then forget to change it.
# duplicate literals, forbidden d = {100: 1, 100: 2}
Correct. Possibly caught during parsing.
# duplicate non-literals, allowed d = {100: 1, len("ab")*50: 2}
For sure allowed during parsing, possibly caught at run-time.
You keep mentioning "dictionary literal", but there actually is no such thing in Python. I think you mean a dict display.
Yes sorry, bad use of the term. I meant the curly-brace constructor---and only that.
(Don't worry, I make the same mistake.) But the point is, a "dict literal" (display) can contain keys which are not themselves literals, as above. Those keys can have arbitrarily complex semantics, including side-effects. What do you expect to happen?
I'd mainly like to catch the "obvious" duplicates, just like the GNU C compiler catches the "obvious" divisions by zero. And by that I mean occurrences of duplicate string literals as keys. For instance:
At parse time:
{"foo": 333, "foo": 444} # forbidden---must catch {"foo": 333, "f" + "oo": 444} # OK to miss but also OK to catch {"foo": 333, function_returning_foo(): 444} # not caught (i.e. no miracles expected)
At run time, I still slightly suspect that it may be more useful to prohibit duplicate keys in the same constructor than it is to allow them, but I don't have a clear opinion, because of both legacy code and other possible semantic issues.
Thanks!
-- Steve _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
--
You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/yPH6ukUAQjc/unsubscribe. To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
On 5/3/2016 8:46 PM, Luigi Semenzato wrote:
On Tue, May 3, 2016 at 4:51 PM, Steven D'Aprano steve@pearwood.info wrote:
The thread seems to partly be based on a confusion between literals and displays, which led to dict displays being called dict literals. A literal defines an int or string at the initial lexing or tokenizing phase of the parser. A display, like comprehensions, abbreviates runtime code for creating a composite collection object.
What counts as "duplicate keys"? I presume that you mean that two keys count as duplicate if they hash to the same value, and are equal. But you keep mentioning "literals" -- does this mean you care more about whether they look the same rather than are the same?
Correct. The errors that I am guessing matter the most are those for which folks copy-paste a key-value pair, where the key is a literal string, intending to change the key, and then forget to change it.
# duplicate literals, forbidden d = {100: 1, 100: 2}
Correct. Possibly caught during parsing.
The above is equivalent to and an abbreviation of
d = {}; d[100] = 1; d[100] = 2
A bit silly, but quite legal. Every value replacement necessarily involves a 'duplicate key'. Both of the above are also equivalent to
d = {a:b for a,b in ((100,1), (100,2))}
This is also equivalent to
d = {} for a, b in ((100,1), (100,2)): d[a] = b
which is close to the python version of how the comprehension would implemented in python.
Here is a 5th version, which I expect is close to the actual CPython code for a dict display.
tem = [100, 1, 100, 2] # as in the top of the virtual machine stack n = 4 # number of items in display = 2 * number of pairs d = {} for i in range(-n, 0, 2): d[tem[i]] = tem[i+1]
We have 5 equivalent snippets, each of which could have the same bug. Why single out just one to raise an exception? My guess is that it is partly from mistakenly thinking of the one as a like a lexer literal rather than as runtime code.
On Wed, May 4, 2016 at 1:23 PM, Terry Reedy tjreedy@udel.edu wrote:
Why single out just one to raise an exception? My guess is that it is partly from mistakenly thinking of the one as a like a lexer literal rather than as runtime code.
But the lexer's definition of "literal" is extremely narrow, and doesn't reflect the way people think about them. Most people's idea of a literal is closer to what ast.literal_eval can accept:
Help on function literal_eval in module ast:
literal_eval(node_or_string) Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
ast.literal_eval("[1,-2,(3+4j)]")
[1, -2, (3+4j)]
Neither -2 nor (3+4j) is technically a literal, and certainly list display isn't a literal, but "literal_eval" is happy to parse them. And that's how people think about them. Duplicate keys in dict display is more similar to duplicate keyword arguments in a function call than to reassignment. Can you show any good code that reuses a key in a single dict display? Wouldn't it at best be code smell?
That doesn't necessarily mean that the compiler should stop you from doing it; but I do think that there's a fundamental difference in expectation here. And if the check happens, it wants to be a compile-time check, not a run-time one.
ChrisA
On Tue, May 3, 2016, at 23:31, Chris Angelico wrote:
Duplicate keys in dict display is more similar to duplicate keyword arguments in a function call than to reassignment.
To play devil's advocate for a minute here...
Why *don't* we allow duplicate keyword arguments in a function call, anyway? CALL_FUNCTION doesn't seem to actually care, if I create this situation by monkey-patching the calling function's consts to get duplicate keywords. All the arguments that have been raised here for *allowing* duplicate keyword arguments seem to apply just as well to f(**{'a': 1}, **{'a': 2}) [where 'a' may not be the only key present and they may not be literal^Wdisplays], yet that is prohibited. Why not remove that check, too?
If we can blithely pass the same constant key twice to BUILD_MAP, why *shouldn't* we be able to do the same to CALL_FUNCTION? Any argument against the latter applies equally to the former, as far as I can see.
On Wed, May 4, 2016 at 12:19 AM Random832 random832@fastmail.com wrote:
On Tue, May 3, 2016, at 23:31, Chris Angelico wrote:
Duplicate keys in dict display is more similar to duplicate keyword arguments in a function call than to reassignment.
If we can blithely pass the same constant key twice to BUILD_MAP, why *shouldn't* we be able to do the same to CALL_FUNCTION?
Because keyword arguments can only be valid identifiers, rather than expressions. There's absolutely no reason to use the same identifier as keyword twice, but there could be some oddball reasons to use the same expression as dict key twice.
[I know I've said that before, but it's a long thread. My apologies if you'd already read that.]
On Wed, May 4, 2016 at 12:40 AM, Michael Selik michael.selik@gmail.com wrote:
On Wed, May 4, 2016 at 12:19 AM Random832 random832@fastmail.com wrote:
On Tue, May 3, 2016, at 23:31, Chris Angelico wrote:
Duplicate keys in dict display is more similar to duplicate keyword arguments in a function call than to reassignment.
If we can blithely pass the same constant key twice to BUILD_MAP, why *shouldn't* we be able to do the same to CALL_FUNCTION?
Because keyword arguments can only be valid identifiers, rather than expressions. There's absolutely no reason to use the same identifier as keyword twice, but there could be some oddball reasons to use the same expression as dict key twice.
The uniqueness rule applies to splatted args, too.
def printn(*args, **kwargs): print(*args, **kwargs, sep='\n')
printn('hello', 'world', sep='\n') # => TypeError: print() got multiple values for keyword argument 'sep'
Not a very useful example, but if you want to override or ignore incoming kwargs...
......
For semantics, I see a few possibilities (either error or warning):
0. Detect duplicate literal keys at compile-time.
1. Detect duplicate literals and hereditarily-literal displays and expressions at compile-time. (This includes keys which are tuple displays containing only hereditarily-literal elements, expressions of literals, and, er, implicit string concatenation like `'hello' " " 'world'`. As keys, these things are guaranteed to be immutable.)
2. Also statically analyze for duplicate variables as keys. (I think if it's a local variable that isn't captured by a closure using `nonlocal`, it is safe to assume that it's constant for that dict display.)
3. Constant-folding. (This leads to the dark path of static analysis shipping with the language.)
float('inf'). Analyze at runtime.
1j. Declare that duplicate keys in a dict display causes underdefined behavior, and let interpreters decide how to handle it.
-1. Identical expressions are warned against as duplicate keys.
(By the way, anything that happens for dicts should also happen for sets.)
On 05/03/2016 09:18 PM, Random832 wrote:
To play devil's advocate for a minute here...
Why *don't* we allow duplicate keyword arguments in a function call, anyway? CALL_FUNCTION doesn't seem to actually care, if I create this situation by monkey-patching the calling function's consts to get duplicate keywords. All the arguments that have been raised here for *allowing* duplicate keyword arguments seem to apply just as well to
f(**{'a': 1}, **{'a': 2})
Actually, I believe that is allowed now, although I'm not sure how far it goes.
-- ~Ethan~
On Wed, May 4, 2016 at 2:32 AM, Ethan Furman ethan@stoneleaf.us wrote:
On 05/03/2016 09:18 PM, Random832 wrote:
To play devil's advocate for a minute here...
Why *don't* we allow duplicate keyword arguments in a function call, anyway? CALL_FUNCTION doesn't seem to actually care, if I create this situation by monkey-patching the calling function's consts to get duplicate keywords. All the arguments that have been raised here for *allowing* duplicate keyword arguments seem to apply just as well to
f(**{'a': 1}, **{'a': 2})
Actually, I believe that is allowed now, although I'm not sure how far it goes.
As of Python 3.5.1, it is still not allowed:
C:\Users\Jonathan>python Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information.
def func(**kwargs):
... pass ...
func(**{'a': 1}, **{'a': 2})
Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: func() got multiple values for keyword argument 'a'
On 05/03/2016 11:39 PM, Jonathan Goble wrote:
On Wed, May 4, 2016 at 2:32 AM, Ethan Furman wrote:
On 05/03/2016 09:18 PM, Random832 wrote:
To play devil's advocate for a minute here...
Why *don't* we allow duplicate keyword arguments in a function call, anyway? CALL_FUNCTION doesn't seem to actually care, if I create this situation by monkey-patching the calling function's consts to get duplicate keywords. All the arguments that have been raised here for *allowing* duplicate keyword arguments seem to apply just as well to
f(**{'a': 1}, **{'a': 2})
Actually, I believe that is allowed now, although I'm not sure how far it goes.
As of Python 3.5.1, it is still not allowed:
C:\Users\Jonathan>python Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information.
def func(**kwargs):
... pass ...
func(**{'a': 1}, **{'a': 2})
Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: func() got multiple values for keyword argument 'a'
Ah, right. The PEP was changed:
http://bugs.python.org/issue2292#msg234413
-- ~Ethan~
On Wed, May 04, 2016 at 12:18:59AM -0400, Random832 wrote:
On Tue, May 3, 2016, at 23:31, Chris Angelico wrote:
Duplicate keys in dict display is more similar to duplicate keyword arguments in a function call than to reassignment.
To play devil's advocate for a minute here...
Why *don't* we allow duplicate keyword arguments in a function call, anyway? CALL_FUNCTION doesn't seem to actually care, if I create this situation by monkey-patching the calling function's consts to get duplicate keywords. All the arguments that have been raised here for *allowing* duplicate keyword arguments seem to apply just as well to f(**{'a': 1}, **{'a': 2}) [where 'a' may not be the only key present and they may not be literal^Wdisplays], yet that is prohibited. Why not remove that check, too?
(1) Because status quo wins. In the absence of a really compelling reason to remove that check, we don't have to justify prohibing duplicate keyword args, we just have to note that it is already in place and so we shouldn't change it.
(2) Duplicate keys in a dict may have side-effects, which mean they do something. Duplicate keyword arguments don't have side-effects (not in the keys at least). So that's a difference.
(3) The analogy between function calls and dicts is not really that strong. The dict constuctor intentionally allows certain duplicates:
py> dict({'a': None}, a=2) {'a': 2}
for good reason (think of over-riding the values provided in the input dict with keyword arguments), but it's not clear that there is an analogy for function calls. Perhaps there is, and if it is a good enough analogy, we might have to re-think the duplicate kwargs situation.
On Wed, May 4, 2016, at 08:22, Steven D'Aprano wrote:
(1) Because status quo wins. In the absence of a really compelling reason to remove that check, we don't have to justify prohibing duplicate keyword args, we just have to note that it is already in place and so we shouldn't change it.
(2) Duplicate keys in a dict may have side-effects, which mean they do something. Duplicate keyword arguments don't have side-effects (not in the keys at least). So that's a difference.
f(a=1, b=2, **{foo(): bar()}) where foo may return 'a' or 'b', though this is the first that I've heard that the side-effect argument only applies to the keys.
(3) The analogy between function calls and dicts is not really that strong. The dict constuctor intentionally allows certain duplicates:
py> dict({'a': None}, a=2) {'a': 2}
for good reason
but not {**{'a': None}, **{'a': 2}})
(think of over-riding the values provided in the input dict with keyword arguments), but it's not clear that there is an analogy for function calls. Perhaps there is, and if it is a good enough analogy, we might have to re-think the duplicate kwargs situation.
foo(**{'a':None}, a=2) seems like a perfect analogy to me.
On 5/3/2016 11:31 PM, Chris Angelico wrote:
On Wed, May 4, 2016 at 1:23 PM, Terry Reedy tjreedy@udel.edu wrote:
Why single out just one to raise an exception? My guess is that it is partly from mistakenly thinking of the one as a like a lexer literal rather than as runtime code.
This was a preface to the main points that followed, and which were snipped from this followup.
But the lexer's definition of "literal" is extremely narrow, and doesn't reflect the way people think about them. Most people's idea of a literal is closer to what ast.literal_eval can accept:
[snip]
I am obviously aware of that. My main point, not addressed by this response, is that 1. dict displays are currently equivalence to several forms of Python code with more explicit looping, including at least two that are commonly used to initialize dicts; and 2. that changing the semantics of dict displays and only dict displays will break this equivalence.
In my response just not to Nick, I was more explicit that breaking this equivalence now, after 20+ years, will break code that depends on the current equivalence and break people's currently true expectations.
On Tue, May 03, 2016 at 05:46:33PM -0700, Luigi Semenzato wrote:
[...]
Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?
Is there such a thing as a SyntaxWarning?
Yes there is.
From my perspective it would be fine to make it a SyntaxError, but I am not sure it would be overall a good choice for legacy code (i.e. as an error it might break old code, and I don't know how many other things a new language specification is going to break).
It could also be a run-time error, but it might be nicer to detect it earlier. Maybe both.
There are serious limits to what the compiler can detect at compile-time. So unless you have your heart set on a compile-time SyntaxError (or Warning) it might be best to forget all about compile-time detection, ignore the question of "literals", and just focus on run-time TypeError if a duplicate key is detected.
Otherwise, you will have the situation where Python only detects *some* duplicate keys, and your colleague will be cursing that Python does such a poor job of detecting duplicates. And it will probably be implementation-dependent, e.g. if your Python compiler does constant folding it might detect {1+1: None, 2: None} as duplicates, but if it doesn't have constant folding (or you have turned it off), it won't.
So with implementation-dependent compile-time checks, you can't even guarantee what will be detected. In that case, you might as well use a linter.
As far as I am concerned, a compile-time check is next-to-useless. It will be more annoying than useful, since it will give people a false sense of security, while still missing duplicates.
So I intend to only discuss a run-time solution, which has the advantage that Python will detect a much wider range of duplicates: not just:
{"SPAM": None, "SPAM": None}
for example, but also:
{"SPAM": None, ("spa".upper() + "AM"[1:]): None}
But now we're getting out of the realm of "detect obvious typos and duplicates" and into a more judgemental area. Once you start prohibiting complex expressions or function calls that happen return duplicate keys, you can no longer be *quite* so sure that this is an error.
Maybe the programmer has some reason for allowing duplicates. Not necessarily a *good* reason, but people write all sorts of ugly, bad, fragile, silly code without the compiler giving them an error. "Consenting adults" applies, and Python generally allows us to shoot ourselves in the foot.
Let's say I write something like this:
with open(path) as f: d = { f.write(a): 'alpha', f.write(b): 'beta', f.write(c): 'gamma', f.write(d): 'delta', }
and purely by my bad luck, it turns out that len(b) and len(c) are equal, so that there are two duplicate keys. Personally, I think this is silly code, but there's no rule against silly code in Python, and maybe I have a (good/bad/stupid) reason for writing this. Maybe I don't care that duplicate keys are over-written.
If we introduce your rule, that's the same as saying "this code is so awful, that we have to ban it... but only *sometimes*, when the lengths happen to be equal, the rest of the time it's fine".
Are you okay with saying that? (Not a rhetorical question, and you are allowed to say "Yes".)
So which is worse?
- most of the time the code works fine, but sometimes it fails, raising an exception and leaving the file in a half-written state; or
- the code always works fine, except that sometimes a duplicate key over-writes the previous value, which I may not even care about.
I don't know which is worse. If there is no clear answer that is obviously right, then the status quo wins, even if the status quo is less than perfect.
Even if the status quo is *awful*, it may be that all the alternatives are just as bad.
I think a compile-time check is just enough to give people a false sense of security, and so is *worse* than what we have now. And I'm right on the fence regarding a run-time check.
So to me, my judgement is: with no clear winner, the status quo stays.
What counts as "duplicate keys"? I presume that you mean that two keys count as duplicate if they hash to the same value, and are equal. But you keep mentioning "literals" -- does this mean you care more about whether they look the same rather than are the same?
Correct. The errors that I am guessing matter the most are those for which folks copy-paste a key-value pair, where the key is a literal string, intending to change the key, and then forget to change it.
With respect, I think that is a harmful position to take. That leaves the job half-done: the constructor will complain about *some* duplicates, but not all, and worse, which ones it detects may depend on the implementation you happen to be using!
If it is worth complaining about
{0: None, 0: None}
then it must also be worth complaining about:
{0: None, 0.0: None, int(): None, 1-1: None, len(""): None}
etc. Otherwise, I guarantee that somebody will be pulling their hair out as to why Python only sometimes detects duplicate keys. Better to never detect them at all (so you know you have to test for duplicates yourself) than to give a false sense of security.
On Wed, May 4, 2016 at 10:08 PM, Steven D'Aprano steve@pearwood.info wrote:
Let's say I write something like this:
with open(path) as f: d = { f.write(a): 'alpha', f.write(b): 'beta', f.write(c): 'gamma', f.write(d): 'delta', }
and purely by my bad luck, it turns out that len(b) and len(c) are equal, so that there are two duplicate keys. Personally, I think this is silly code, but there's no rule against silly code in Python, and maybe I have a (good/bad/stupid) reason for writing this. Maybe I don't care that duplicate keys are over-written.
If we introduce your rule, that's the same as saying "this code is so awful, that we have to ban it... but only *sometimes*, when the lengths happen to be equal, the rest of the time it's fine".
Are you okay with saying that? (Not a rhetorical question, and you are allowed to say "Yes".)
My response: Yes. I am okay with saying that. Because the question really is: What is this dictionary being used for? Why are you using keys that might, under some circumstances, overwrite each other? The point of a dict is to retrieve values when given keys. How can you do this, if those keys might collide? I've yet to see any real-world code that does this usefully, in a dict display. And the best "well, maybe if" examples have all involved generated code, so it wouldn't be hard to change your codegen to use this instead:
d = dict(( (f.write(a), 'alpha'), (f.write(b), 'beta'), (f.write(c), 'gamma'), (f.write(d), 'delta'), ))
which, of course, will continue to function as expected.
ChrisA
On 05/04/2016 05:08 AM, Steven D'Aprano wrote:
If it is worth complaining about
{0: None, 0: None}
then it must also be worth complaining about:
{0: None, 0.0: None, int(): None, 1-1: None, len(""): None}
etc. Otherwise, I guarantee that somebody will be pulling their hair out as to why Python only sometimes detects duplicate keys. Better to never detect them at all (so you know you have to test for duplicates yourself) than to give a false sense of security.
Agreed. Either fix it all they way, or leave it alone.
-- ~Ethan~
On 04/05/2016 00:51, Steven D'Aprano wrote:
Hi Luigi,
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
[...]
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
[...]
For context, someone ran into this problem in my team at Google (we fixed it using pylint). I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo.
As far as I can see, you haven't specified in *detail* what change you wish to propose. It is difficult for me to judge your proposal when I don't know precisely what you are proposing.
Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?
What counts as "duplicate keys"? I presume that you mean that two keys count as duplicate if they hash to the same value, and are equal. But you keep mentioning "literals" -- does this mean you care more about whether they look the same rather than are the same?
# duplicate literals, forbidden d = {100: 1, 100: 2}
# duplicate non-literals, allowed d = {100: 1, len("ab")*50: 2}
You keep mentioning "dictionary literal", but there actually is no such thing in Python. I think you mean a dict display. (Don't worry, I make the same mistake.) But the point is, a "dict literal" (display) can contain keys which are not themselves literals, as above. Those keys can have arbitrarily complex semantics, including side-effects. What do you expect to happen?
My initial reaction to the OP was negative, as in most contexts where keys are added to dictionaries, repeated keys silently overwrite, and consistency is a virtue. However, the original use case (a long dict literal - (sorry, transcribe that into 'dict display' if you wish; to me 'dict literal' is an evocative and comprehensible term and I will continue to use it, not meaning to offend)) is a completely plausible one, and it now seems to me that detecting duplicates is a case where 'practicality beats purity'. I agree with everything in Luigi's post of 30-May 2016 22:29 (my local time, sorry). I would just add that as someone who has used a linter (pylint) occasionally - I am trying to discipline myself to use it regularly - even if you are aware of it, there is still a barrier to its use: linters tend to produce a huge amount of mostly useless guff which you have to search to find the few nuggets that you really need.
Steven, I am sure that this is not your intention, but it feels as if your requests for clarification are nitpicking and attempts to throw dust in our eyes, and avoiding a serious attempt to address the OP. But let me (presuming,if he will forgive me, to speak for the OP) attempt to answer by making a concrete proposal, as a starting point for discussion:
*In a dictionary literal, it should be a syntax error to have two keys which are literal int or literal basestring values and which are equal.
*(I say basestring, meaning that { 'lion' : 'x', u'lion' : 'y' } would be an error. So of course would { 18L : 'x', 0x12 : 'y' } etc.)
From this starting point, bikeshedding can begin: (1) Is there any other kind of literal (i.e. other than int or string) which should be included? (Float and complex? Decimal?? What have I missed?) (2) Should constant int/string expressions which are folded to constants at compile time also be included i.e. would { 2 : 'x', 1+1 : 'y } also be an error? (Suggestion: This is an implementation detail, undefined. It's irrelevant to the practical case.) (3) Should a runtime error be raised instead of a SyntaxError ? (I can't imagine why, but if it turns out to be easier to implement, fine.) (4) Should a warning be raised instead of an error? (5) Should the action be different if not just the keys, but the values also are constant and equal (i.e. a no-effect repeat), e.g. a warning instead of an error? ( I suggest no. It's still likely to be unintentional, i.e. a mistake that the programmer would want to know about.) (6) Are there any other changes to the proposal which would simplify implementation, while still addressing the original use case? (7) If the proposal is accepted, should it appear complete in the next Python release, or should there be a more gradual adoption process? (8) Can/should anything be done to mitigate breaking automatic code generators which do generate duplicate literal keys?
Later: While I was composing the above, 1 more post arrived from Luigi and from Steven. I must reply to two points from Steven:
(1) "When you have significant amount of data in a dict (or any other data structure, such as a list, tree, whatever), the programmer has to take responsibility for the data validation. Not the compiler. Out of all the possible errors, why is "duplicate key" so special? Your colleague could have caused unexpected behaviour in many ways"
No, no, no. Computers detect our errors and we would be helpless if they didn't. If this argument were taken to its logical conclusion, you would say that it is the programmer's responsibility to ensure that an entire (10000-line missile guidance?) program is syntactically correct, and all the compiler need do is either say "OK" or "Syntax Error" (or worse, try to run a syntactically incorrect program). Yes, Luigi's colleague could have made many mistakes, but surely it is a good thing for as many of them as is reasonably possible to be detected by the computer? Then we just have to decide what "reasonably possible" is.
(2) "So unless you have a good answer to the question "Why are duplicate keys so special that the dict constructor has to guard against them, when it doesn't guard against all the other failures we have to check for?", I think the status quo should stand. There is one obvious answer: Duplicate keys are special because, unlike the other errors, the dict constructor CAN guard against them. That would be a reasonable answer. But I'm not sure if it is special *enough* to justify violating the Zen of Python. That may be a matter of opinion and taste."
Sure, it is a matter of opinion. I ask: In the OP, what is the probability that (1) the programmer deliberately included a duplicate literal key (2) the programmer made a mistake? I concede that the answer may be different for automatic code generators (although if I wrote one that generated duplicate literal keys, I wouldn't be proud of it).
Best wishes Rob Cliffe
On Wed, May 4, 2016 at 11:54 AM, Rob Cliffe rob.cliffe@btinternet.com wrote:
In a dictionary literal, it should be a syntax error to have two keys which are literal int or literal basestring values and which are equal.
(I say basestring, meaning that { 'lion' : 'x', u'lion' : 'y' } would be an error. So of course would { 18L : 'x', 0x12 : 'y' } etc.)
(7) If the proposal is accepted, should it appear complete in the next
Python release, or should there be a more gradual adoption process?
This is the very soonest it can possibly appear, which means that notions of "long integer" and "basestring" aren't really applicable - they're Python 2 concepts. Text strings and byte strings in Python 3 are completely different:
{'lion': 'x', b'lion': 'y'}
{'lion': 'x', b'lion': 'y'}
and there aren't any short/long integers. So here's my rewording of your proposal:
*** Identical string or numeric literals used as keys in the same dictionary display MAY cause a SyntaxError. ***
Whether constants get folded before this check, and whether {1:'x', 1.0:'y'} causes an error, can be implementation-defined. Implementations are welcome to not check for this at all if they wish.
Valid types for this check would be str, bytes, int, float, and complex - there's no way that the same literal could result in unequal values, so these are all safe.
I'm still not sure whether I'm in favour of the proposal or not, but that's how I'd word it.
ChrisA
On Tue, May 3, 2016, at 22:22, Chris Angelico wrote:
Whether constants get folded before this check, and whether {1:'x', 1.0:'y'} causes an error, can be implementation-defined.
For *general* constant folding, maybe, but what's the point of having it if it doesn't apply to negative numbers and nonimaginary complex numbers? How about tuple displays consisting of only numeric and string literals, empty tuple displays, and tuple displays which themselves satisfy this definition?
On Wed, May 4, 2016 at 12:44 PM, Random832 random832@fastmail.com wrote:
On Tue, May 3, 2016, at 22:22, Chris Angelico wrote:
Whether constants get folded before this check, and whether {1:'x', 1.0:'y'} causes an error, can be implementation-defined.
For *general* constant folding, maybe, but what's the point of having it if it doesn't apply to negative numbers and nonimaginary complex numbers? How about tuple displays consisting of only numeric and string literals, empty tuple displays, and tuple displays which themselves satisfy this definition?
That's why I say implementation-defined. I would generally expect -5 to count as a literal, but if (2+3) or ("a", "b") doesn't, so be it.
Of course, I would expect 'a' and "a" to be seen as identical. Likewise 100 and 0x64. Beyond that, it's basically up to whatever the implementation finds easy; as long as the keys *would be* folded, and are literals, it's permissible to raise the error.
ChrisA
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
The original problem description:
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
Did your colleague really have 200+ items in the dict? No matter, I suppose. The same principle applies.
When you have significant amount of data in a dict (or any other data structure, such as a list, tree, whatever), the programmer has to take responsibility for the data validation. Not the compiler. Out of all the possible errors, why is "duplicate key" so special? Your colleague could have caused unexpected behaviour in many ways:
lives_in = { # missed value 'lion': ['Africa', 'America'], # misspelled value 'parrot': ['Eruope'], # misspelled key 'kangeroo': ['Australia'], # invalid key 'kettle': ['Arctic'], # invalid value 'aardvark': 'South Africa', # missed key # oops, forgot 'tiger' altogether }
Where was your colleague's data validation? I'm sorry that your colleague lost a lot of time debugging this failure, but you might have had exactly the same result from any of the above errors.
Unless somebody can demonstrate that "duplicate keys" is a systematic and common failure among Python programmers, I think that it is perfectly reasonable to put the onus on detecting duplicates on the programmer, just like all those other data errors.
The data validation need not be a big burden. In my own code, unless the dict is so small that I can easily see that it is correct with my own eyes, I always follow it with an assertion:
assert len(lives_in) == 250
which is a cheap test for at least some duplicate, missed or extra keys. But depending on your use-case, it may be that dict is the wrong data structure to use, and you need something that will validate items as they are entered. Unless your dict is small enough that you can see it is correct by eye, you need some sort of check that your data is valid, that you haven't forgotten keys, or misspelled them.
The dict constructor won't and can't do that for you, so you need to do it youself. Once you're doing that, then it is no extra effort to check for duplicates.
So unless you have a good answer to the question "Why are duplicate keys so special that the dict constructor has to guard against them, when it doesn't guard against all the other failures we have to check for?", I think the status quo should stand.
There is one obvious answer:
Duplicate keys are special because, unlike the other errors, the dict constructor CAN guard against them.
That would be a reasonable answer. But I'm not sure if it is special *enough* to justify violating the Zen of Python. That may be a matter of opinion and taste.
On 04.05.2016 03:09, Steven D'Aprano wrote:
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
The original problem description:
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
Issuing a warning for this would probably help, but raising an error introduced a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.
For the implementation, there are two possibilities I can think of:
1. have STORE_MAP run a test for an already existing entry and issue a warning
2. add a final CHECK_MAP byte code to check that the number of keys in the mapping correspond to the number of keys added via the dict literal
The first one causes a lot of overhead, esp. for larger mappings such as the ones used by the codecs or other static data tables. Most of the times these are generated anyway, so the warning case would not occur at all, but you'd still have to check for a collision N times.
The second one is cheap regarding performance, but may not be accurate, since STORE_MAP may well be working on dynamically generated keys. It does work well for constants, and if we're just issuing a warning, may be good enough.
On Wed, May 04, 2016 at 01:56:56PM +0200, M.-A. Lemburg wrote:
Issuing a warning for this would probably help, but raising an error introduced a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.
For the implementation, there are two possibilities I can think of:
have STORE_MAP run a test for an already existing entry and issue a warning
add a final CHECK_MAP byte code to check that the number of keys in the mapping correspond to the number of keys added via the dict literal
The first one causes a lot of overhead, esp. for larger mappings such as the ones used by the codecs or other static data tables. Most of the times these are generated anyway, so the warning case would not occur at all, but you'd still have to check for a collision N times.
The second one is cheap regarding performance, but may not be accurate, since STORE_MAP may well be working on dynamically generated keys. It does work well for constants, and if we're just issuing a warning, may be good enough.
I'm not sure I understand what you mean by this. Are you suggesting that if I write:
{1: None, 1: None}
Python knows that it got two keys as argument, and so CHECK_MAP can determine that there is a duplicate? (i.e. that 2 keys were arguments, but the finished dict only has length 1). But if I write:
{x: None, x: None}
Python *cannot* tell that there were two keys given (since they are expressions, not constants) and CHECK_MAP can't do anything?
On 04.05.2016 14:15, Steven D'Aprano wrote:
On Wed, May 04, 2016 at 01:56:56PM +0200, M.-A. Lemburg wrote:
Issuing a warning for this would probably help, but raising an error introduced a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.
For the implementation, there are two possibilities I can think of:
have STORE_MAP run a test for an already existing entry and issue a warning
add a final CHECK_MAP byte code to check that the number of keys in the mapping correspond to the number of keys added via the dict literal
The first one causes a lot of overhead, esp. for larger mappings such as the ones used by the codecs or other static data tables. Most of the times these are generated anyway, so the warning case would not occur at all, but you'd still have to check for a collision N times.
The second one is cheap regarding performance, but may not be accurate, since STORE_MAP may well be working on dynamically generated keys. It does work well for constants, and if we're just issuing a warning, may be good enough.
I'm not sure I understand what you mean by this. Are you suggesting that if I write:
{1: None, 1: None}
Python knows that it got two keys as argument, and so CHECK_MAP can determine that there is a duplicate? (i.e. that 2 keys were arguments, but the finished dict only has length 1). But if I write:
{x: None, x: None}
Python *cannot* tell that there were two keys given (since they are expressions, not constants) and CHECK_MAP can't do anything?
Well, the CHECK_MAP byte code would only check whether the number of assignments you are doing is equal to the number of keys found in the final dict. That's a very coarse test and not perfect. It can also not tell you which key causes the mismatch.
On the plus side, the extra check is a simply size check and doesn't have to be applied per added mapping.
On Wed, May 04, 2016 at 02:37:56PM +0200, M.-A. Lemburg wrote:
On 04.05.2016 14:15, Steven D'Aprano wrote:
On Wed, May 04, 2016 at 01:56:56PM +0200, M.-A. Lemburg wrote:
Issuing a warning for this would probably help, but raising an error introduced a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.
I agree with you that an error is unjustified but a warning might be.
For the implementation, there are two possibilities I can think of:
- have STORE_MAP run a test for an already existing entry and issue a warning
You have suggested that's too expensive, so I'm happy to rule that out.
- add a final CHECK_MAP byte code to check that the number of keys in the mapping correspond to the number of keys added via the dict literal
[...]
The second one is cheap regarding performance, but may not be accurate, since STORE_MAP may well be working on dynamically generated keys. It does work well for constants, and if we're just issuing a warning, may be good enough.
I'm not sure I understand what you mean by this. Are you suggesting that if I write:
{1: None, 1: None}
Python knows that it got two keys as argument, and so CHECK_MAP can determine that there is a duplicate? (i.e. that 2 keys were arguments, but the finished dict only has length 1). But if I write:
{x: None, x: None}
Python *cannot* tell that there were two keys given (since they are expressions, not constants) and CHECK_MAP can't do anything?
Well, the CHECK_MAP byte code would only check whether the number of assignments you are doing is equal to the number of keys found in the final dict. That's a very coarse test and not perfect. It can also not tell you which key causes the mismatch.
On the plus side, the extra check is a simply size check and doesn't have to be applied per added mapping.
Sorry MAL, perhaps I'm a bit slow tonight, but I'm still lost. {x: None, x: None} -- would CHECK_MAP detect that as a duplicate or not?
If the answer is "Yes", then this seems to me to be a cheap way of detecting the existence of at least one duplicate (but not which key was duplicate). That strikes me as good enough, in which case I would support raising a warning.
On 04.05.2016 16:00, Steven D'Aprano wrote:
On Wed, May 04, 2016 at 02:37:56PM +0200, M.-A. Lemburg wrote:
On 04.05.2016 14:15, Steven D'Aprano wrote:
On Wed, May 04, 2016 at 01:56:56PM +0200, M.-A. Lemburg wrote:
Issuing a warning for this would probably help, but raising an error introduced a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.
I agree with you that an error is unjustified but a warning might be.
For the implementation, there are two possibilities I can think of:
- have STORE_MAP run a test for an already existing entry and issue a warning
You have suggested that's too expensive, so I'm happy to rule that out.
- add a final CHECK_MAP byte code to check that the number of keys in the mapping correspond to the number of keys added via the dict literal
[...]
The second one is cheap regarding performance, but may not be accurate, since STORE_MAP may well be working on dynamically generated keys. It does work well for constants, and if we're just issuing a warning, may be good enough.
I'm not sure I understand what you mean by this. Are you suggesting that if I write:
{1: None, 1: None}
Python knows that it got two keys as argument, and so CHECK_MAP can determine that there is a duplicate? (i.e. that 2 keys were arguments, but the finished dict only has length 1). But if I write:
{x: None, x: None}
Python *cannot* tell that there were two keys given (since they are expressions, not constants) and CHECK_MAP can't do anything?
Well, the CHECK_MAP byte code would only check whether the number of assignments you are doing is equal to the number of keys found in the final dict. That's a very coarse test and not perfect. It can also not tell you which key causes the mismatch.
On the plus side, the extra check is a simply size check and doesn't have to be applied per added mapping.
Sorry MAL, perhaps I'm a bit slow tonight, but I'm still lost. {x: None, x: None} -- would CHECK_MAP detect that as a duplicate or not?
If the answer is "Yes", then this seems to me to be a cheap way of detecting the existence of at least one duplicate (but not which key was duplicate). That strikes me as good enough, in which case I would support raising a warning.
Yes:
dis.dis('{x:None, x:None}')
1 0 BUILD_MAP 2 3 LOAD_CONST 0 (None) 6 LOAD_NAME 0 (x) 9 STORE_MAP 10 LOAD_CONST 0 (None) 13 LOAD_NAME 0 (x) 16 STORE_MAP 17 RETURN_VALUE
d = {x:None, x:None} len(d)
1
Since 1 != 2 -> issue warning
We'd just have to tell the CHECK_MAP byte code to check for len(map) == 2. The byte code compiler will be able to determine this and add the correct byte code entry.
After following the long thread, I'm solidly -1, at least on an error. A warning wouldn't be so bad, but would still feel like a nuisance not a feature. Such a warning should belong to linters, not to the language itself. Use those to automatically catch statically detectible probably-bad patterns like duplicate literals as keys of dictionary displays.
The thing that nudged me to -1 is code something like the following, which I've actually written for completely valid use-cases:
def code_generate(data_source, output=open('data.py','w')): """Write data to loadable Python module
data_source may contain multiple occurrences of each key.
Later occurrences are "better" than earlier ones
""" print("data = {", file=output) for key, val in data_source: print(" %s: %s," % (key, val), file=output) print("}", file=output)
I really don't want the Python interpreter to complain about my perfectly valid 'data.py' module that I auto-generated knowing full well it might have duplicate literal keys. Yes, of course I could write a more complex function that explicitly pruned keys if they were duplicates before writing to 'data.py'. But I do not want to have to think about that when a well defined semantics of Python since version 0.9 does not require that extra work.
On Wed, May 4, 2016 at 4:56 AM, M.-A. Lemburg mal@egenix.com wrote:
Issuing a warning for this would probably help, but raising an error introduced a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.
On Thu, May 5, 2016 at 12:36 AM, David Mertz mertz@gnosis.cx wrote:
The thing that nudged me to -1 is code something like the following, which I've actually written for completely valid use-cases:
def code_generate(data_source, output=open('data.py','w')): """Write data to loadable Python module
data_source may contain multiple occurrences of each key. Later occurrences are "better" than earlier ones """ print("data = {", file=output) for key, val in data_source: print(" %s: %s," % (key, val), file=output) print("}", file=output)
I really don't want the Python interpreter to complain about my perfectly valid 'data.py' module that I auto-generated knowing full well it might have duplicate literal keys. Yes, of course I could write a more complex function that explicitly pruned keys if they were duplicates before writing to 'data.py'.
Or, much more simply, you could just dump it straight out like this:
print("data = dict(%r)" % list(data_source), file=output)
If the dict display starts rejecting duplicates, these kinds of code generators should be easily tweakable. The only risk would be if duplicates can *theoretically* happen, but are incredibly rare, such that the code runs for years without anyone noticing.
This is a way of catching compile-time-detectable errors. Python has generally been pretty helpful with that kind of thing, even to the point of special-casing print-used-as-a-statement to tell you to put parens around it. While I fully understand that there are reasons against doing this (including "that's the job of linters, not the core language"), I do _not_ understand the apparent attitude that it's a fundamentally bad thing to do. Most cases where a dict display is used, it's intended to have unique keys.
ChrisA
On Wed, May 4, 2016 at 10:59 AM Chris Angelico rosuav@gmail.com wrote:
On Thu, May 5, 2016 at 12:36 AM, David Mertz mertz@gnosis.cx wrote:
The thing that nudged me to -1 is code something like the following,
which
I've actually written for completely valid use-cases: def code_generate(data_source, output=open('data.py','w')):
This is a way of catching compile-time-detectable errors. Python has generally been pretty helpful with that kind of thing, even to the point of special-casing print-used-as-a-statement to tell you to put parens around it. While I fully understand that there are reasons against doing this (including "that's the job of linters, not the core language"), I do _not_ understand the apparent attitude that it's a fundamentally bad thing to do. Most cases where a dict display is used, it's intended to have unique keys.
You accidentally snipped out Mertz' statement that he didn't want the interpreter to complain about "probably-bad" patterns. This implies the distinction that linters should complain about probably-bad things while the interpreter only complains about always-bad things.
I guess you could argue that warnings exist so that the interpreter can complain about probably-bad things as well. If that's what you'd like, we shouldn't call it a "compile-time-detectable error" but instead, uh, ... hard to find the right vocabulary here.
All this discussion is certainly very educational for the participants, but I want to remind everyone that this is not going to happen without a PEP.
Personally I'm -0 on the whole business. Some quick notes:
- In ABC, this was considered, and the rule was that {k1: v1; k2: v2} was allowed to have k1==k2 as long as v1==v2. Otherwise it was an error. (But ABC did not have comprehensions.)
- Franklin? brought up sets and suggested the same thing should happen there. I'm not so sure though. When teaching Python it's nice to be able to demonstrate the fundamental property of sets by showing it:
{1, 1, 1}
{1}
On Wed, May 4, 2016 at 11:19 AM Guido van Rossum guido@python.org wrote:
- In ABC, this was considered, and the rule was that {k1: v1; k2: v2} was
allowed to have k1==k2 as long as v1==v2. Otherwise it was an error. (But ABC did not have comprehensions.)
Was that enforced by comparing the expressions or the resulting values?
I honestly don't recall and I assume the docs are vague, but "comparing expressions" was not a thing in ABC. I'm sure they compared the values. (This is why I showed it with variables instead of constants.)
On Wed, May 4, 2016 at 8:29 AM, Michael Selik michael.selik@gmail.com wrote:
On Wed, May 4, 2016 at 11:19 AM Guido van Rossum guido@python.org wrote:
- In ABC, this was considered, and the rule was that {k1: v1; k2: v2} was
allowed to have k1==k2 as long as v1==v2. Otherwise it was an error. (But ABC did not have comprehensions.)
Was that enforced by comparing the expressions or the resulting values?
On 5/3/2016 9:09 PM, Steven D'Aprano wrote:
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
The original problem description:
lives_in = { 'lion': ['Africa', 'America'], 'parrot': ['Europe'], #... 100+ more rows here 'lion': ['Europe'], #... 100+ more rows here }
The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior.
Did your colleague really have 200+ items in the dict? No matter, I suppose. The same principle applies.
When you have significant amount of data in a dict (or any other data structure, such as a list, tree, whatever), the programmer has to take responsibility for the data validation. Not the compiler. Out of all the possible errors, why is "duplicate key" so special? Your colleague could have caused unexpected behaviour in many ways:
I often use large literal dicts, with literal string keys. There are many times (a couple times a month) I add a duplicate key because I am prone to making mistakes like that. Now, I am lucky because I purchased an IDE that highlights those duplicates as errors. If I had to rely on a separate linter; with my duplicate keys hiding in a long list of other trivial formatting mistakes, then the time cost of reviewing the linting output on every code change is greater than the time cost of simply debugging the program when I see it inevitably malfunction. Instead, I would use data validation in my own code.
The data validation need not be a big burden. In my own code, unless the dict is so small that I can easily see that it is correct with my own eyes, I always follow it with an assertion:
assert len(lives_in) == 250
I would not use this check, it is error prone, given the number of times I update the dicts during development. Either I will loose count, or the line math is wrong (because of extra spaces), or I would count the duplicate. A better consistency check requires I construct all literal dicts like:
def add_item(dict_, key, value): if key in dict_: raise Exception() dict_[key] = value
lives_in = {} add_item(lives_in, 'lion', ['Africa', 'America']) add_item(lives_in, 'parrot', ['Europe']) # ... 100+ more rows here add_item(lives_in, 'lion', ['Europe']) # ... 100+ more rows here
Which is inelegant, but necessary to minimize the programmer time wasted trying to detect these duplicate keys manually.
On Thu, May 5, 2016 at 1:27 AM, Kyle Lahnakoski klahnakoski@mozilla.com wrote:
Now, I am lucky because I purchased an IDE that highlights those duplicates as errors. If I had to rely on a separate linter...
For the record, an IDE with the capability to highlight this kind of error *is* a linter. When language discussions result in "let a linter pick that up", that includes smart editors.
ChrisA
On 5/4/2016 11:45 AM, Chris Angelico wrote:
On Thu, May 5, 2016 at 1:27 AM, Kyle Lahnakoski klahnakoski@mozilla.com wrote:
Now, I am lucky because I purchased an IDE that highlights those duplicates as errors. If I had to rely on a separate linter...
For the record, an IDE with the capability to highlight this kind of error *is* a linter. When language discussions result in "let a linter pick that up", that includes smart editors.
Thinking about examples [1], it turns out I was mistaken. My IDE does NOT have an *integrated* linter that detects duplicate literal keys! My IDE does detect duplicate literal keys for Javascript, which I obviously got confused.
Oh dear! That means my dicts are full of duplicates!
https://github.com/klahnakoski/TestLog-ETL/blob/d7e8a99f7532a0551c98dfdccefe...