dictionary constructor should not allow duplicate keys

Hello, and sorry if this is an old and over-discussed topic; I could not find such discussions here. This was discussed and rejected at https://bugs.python.org/issue16385, but I am not convinced that the discussion presented all arguments properly. I tried reopening the bug at http://bugs.python.org/issue26910, but was told I should discuss it here instead.

The original problem description:

```
lives_in = { 'lion': ['Africa', 'America'],
             'parrot': ['Europe'],
             #... 100+ more rows here
             'lion': ['Europe'],
             #... 100+ more rows here
           }
```

The above constructor overwrites the first 'lion' entry silently, often causing unexpected behavior. These are the arguments presented in favor of the rejection, followed by my rebuttals:

1. "An error is out of the question for compatibility reasons." No real rebuttal here, except I wonder if exceptions are ever made and under what circumstances. I should point out that a warning may also create incompatibilities.

2. "There are ways to rewrite this as a loop on a list." Yes, of course, but the entire point of the dictionary literal is to offer a concise and convenient notation for entering a dictionary as program text.

3. "Being able to re-write keys is fundamental to Python dicts and why they can be used for Python's mutable namespaces." This is fine, but it applies to the data structure, not the literal constructor.

4. "A code generator could depend on being able to write duplicate keys without having to go back and erase previous output." Yes, but it seems wrong to facilitate automated code generation at the expense of human code writing. For hand-written code, I claim that in most (all?) cases it would be preferable to have the compiler detect key duplications. It is easier for an automated code generator to check for duplications than it is for a human.

5. "There is already pylint." True, but it forces an additional step. For context, someone ran into this problem in my team at Google (we fixed it using pylint).
I haven't seen any valid reason (in the bug or elsewhere) in favor of these constructor semantics. From the discussions I have seen, it seems just an oversight in the implementation/specification of dictionary literals. I'd be happy to hear stronger reasoning in favor of the status quo. (additional search keywords: dict constructor, dict literal)
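A minimal runnable illustration of the silent overwrite described above, using the same lives_in example:

```python
# Duplicate literal keys in a dict display: the first entry is silently lost.
lives_in = {
    'lion': ['Africa', 'America'],
    'parrot': ['Europe'],
    'lion': ['Europe'],  # overwrites the entry above, with no warning of any kind
}

print(lives_in)       # {'lion': ['Europe'], 'parrot': ['Europe']}
print(len(lives_in))  # 2, not 3
```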

With Python 3.5, you can unpack in dictionaries: d = {**foo, **bar} It's now the de facto way of merging 2 dicts, and it requires that you can use the same key twice, for obvious reasons. So even if you have good points, it's not possible to do this. On 02/05/2016 23:36, Luigi Semenzato wrote:

I am still not seeing good objections to adding at least a warning.
and similarly:
Probably I didn't make it sufficiently clear in my OP, but this is only for duplicate literal keys in the same constructor, which would cover the vast majority of user errors.

A well-known precedent is the divide-by-zero warning in C. The GNU compiler produces a warning when, after constant folding, an expression contains a division by zero. For instance, this:

```
int z = 1/0;
```

produces:

```
test.c:9:12: warning: division by zero [-Wdiv-by-zero]
```

and so does this:

```
int z = 1/(1 - 1);
```

but this code doesn't produce a warning:

```
const int x = 0;
int z = 1/x;
```

For the dict constructor, I don't think that the constant folding is even particularly useful, although it wouldn't hurt. Implementation-wise, checking for duplicate literals while parsing is just a matter of maintaining a hash table for that scope. This is already done for identifiers, and probably heavily optimized, so duplicate detection wouldn't add much code or be particularly expensive.
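As a sketch of how cheap the literal-only check is, here is a linter-style pass over the parse tree (using Python's ast module for illustration, rather than the C parser the post has in mind):

```python
import ast

def duplicate_literal_keys(source):
    """Report (lineno, key) pairs for repeated constant keys in dict displays."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                # node.keys holds None for **-unpacking entries; skip those,
                # and skip non-literal key expressions entirely.
                if isinstance(key, ast.Constant):
                    if key.value in seen:
                        duplicates.append((key.lineno, key.value))
                    seen.add(key.value)
    return duplicates

print(duplicate_literal_keys("{'lion': 1, 'parrot': 2, 'lion': 3}"))
# [(1, 'lion')]
```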

On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato <luigi@semenzato.com> wrote:
Do you feel that "prefer status quo" is not strong reasoning? http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm... Your word choice of "[no] valid reason" is interesting. In the bug tracker discussion, folks argue that the problem is easily solved with code style standards and static analyzers. I thought that was valid. Rob Cliffe mentioned the fact that repeated keyword arguments cause a syntax error. I see the parallel here, except that keyword arguments must be valid identifiers. Literal dicts syntactically may have any expression as the key. Creating a special case for the parser to enforce uniqueness of number and string literals as keys seems more trouble than it's worth.

On 05/03/2016 12:21 PM, Michael Selik wrote:
It is not strong enough to prevent every good idea, or Python would be a static language (as in, no more changes except maybe bug fixes).
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
Creating a special case for the parser to enforce uniqueness of number and string literals as keys
I'm pretty sure the OP would be happy with uniqueness of keys, whether those keys were string literals, numeric literals, or function objects.
seems more trouble than it's worth.
Maybe. That is what we are discussing. -- ~Ethan~

On Tue, May 3, 2016 at 4:00 PM Ethan Furman <ethan@stoneleaf.us> wrote:
Of course. I'm no more saying "stop change" than you are saying "all change is good". Luigi, as with many who have ideas for new features, appears to be trying to shift the burden of proof. I think it's well established that the burden of proof (proving it's a good idea) rests on the one advocating change.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
How would you handle an expression that evaluates differently for each call? For example: {random(): 0, random(): 1}
seems more trouble than it's worth.
Maybe. That is what we are discussing.
Indeed. Let me flip the original request: I'd like to hear stronger arguments for change, please. I'd be particularly interested in hearing how often Pylint has caught this mistake.
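Michael's point about impure key expressions can be made concrete with a deterministic stand-in for random() (a hypothetical counter, not anything from the thread):

```python
from itertools import count

_counter = count()

def impure_key():
    # Returns a different value on every call, the way random() would.
    return next(_counter)

# Two calls to the same impure expression: no duplicate keys at all.
d = {impure_key(): 'first', impure_key(): 'second'}
print(d)  # {0: 'first', 1: 'second'}
```

With random() the two keys would collide only with vanishing probability, so a compile-time "same expression" check would reject code that is almost never actually erroneous.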

On 05/03/2016 01:43 PM, Michael Selik wrote:
On Tue, May 3, 2016 at 4:00 PM Ethan Furman wrote:
I'm willing to say that should be done with an existing dict, not in a literal.
Easy: Don't Do That. ;)
Well, when this happened to me I spent a loooonnnnnngggg time figuring out what the problem was. One just doesn't expect duplicate keys to not raise:

```
--> dict(one=1, one='uno')
  File "<stdin>", line 1
SyntaxError: keyword argument repeated
```

So I guess the current advice is: Don't use dict literals, use dict() instead. And that just seems silly. ;) -- ~Ethan~

On Tue, May 03, 2016 at 02:27:16PM -0700, Ethan Furman wrote:
But are you willing to say that the compiler should enforce that stylistic judgement?
I see your wink, so I presume you're not actually suggesting that the compiler (1) prohibit all function calls in dict displays, or (2) hard-code the function name "random" in a black list. So what are you actually suggesting? Michael is raising a good point here. If you don't like random as an example, how about: d = {spam(a): 'a', spam(b): 'BB', spam(c): 'Ccc'} I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do? It's one thing to say "duplicate keys should be prohibited", and another to come up with a detailed explanation of what precisely should happen.
Okay, but how often does this happen? Daily? Weekly? Once per career? What's a loooonnnnnngggg time? Twenty minutes? Twenty man-weeks?
Huh, I had completely forgotten that duplicate keyword arguments raise a syntax error. I thought you got a runtime TypeError. There's a difference though. Keyword argument *names* must be identifiers, not expressions, and duplicates can be recognised by the compiler at compile-time:

```
py> def spam(**kw): pass
...
py> spam(eggs=print("side-effect"))
side-effect
py> spam(eggs=print("side-effect"), eggs=print('another side-effect'))
  File "<stdin>", line 1
SyntaxError: keyword argument repeated
```

Duplicate keys are not, and in general can't be:

```
py> d = {'eggs': print("side-effect"),
...      'eggs': print('another side-effect')}
side-effect
another side-effect
py> d
{'eggs': None}
```

If you think of the dict constructor as something like:

```
for key, item in initial_values:
    self[key] = item
```

then the current behaviour is perfectly valid, and the advice is: if you don't want duplicate keys, don't use them in the first place. If you think of it as:

```
for key, item in initial_values:
    if key in self:
        raise TypeError('duplicate key')
    else:
        self[key] = item
```

then you have to deal with the fact that you might only notice a duplicate after mutating your dict, which may include side-effects. -- Steve

On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
Indeed.
Since the dict created by that dict display happens at run time, I am suggesting that, during the creation of that dict, any keys, however generated or retrieved, that duplicate keys already in the dict cause an appropriate error (to be determined). Also, any dict display that can be used for dict creation at compile time (thereby avoiding the run-time logic) should have the same checks and raise the same error if duplicate keys are found (I imagine both run-time and compile-time dict creation from dict displays would use the same underlying function).
It's one thing to say "duplicate keys should be prohibited", and another to come up with a detailed explanation of what precisely should happen.
Hopefully my explanation is detailed enough.
Once a year, maybe. The experience is painful enough to remember when the subject arises, but not painful enough to remember to be careful. ;)
What's a loooonnnnnngggg time? Twenty minutes? Twenty man-weeks?
Longer than an hour, shorter than a day.
I am largely unconcerned by the run-time/compile-time distinction in this case -- whenever it happens, check that the incoming key doesn't already exist.
Not sure what you mean by "mutating your dict" -- we're still talking about initial "dict display" to "dict" conversion, yes? -- ~Ethan~

On Tue, May 3, 2016 at 6:40 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
I agree with Ethan that this is the correct way of looking at this problem, rather than my original, confusing presentation. Duplicate keys in a dict display should be an error, period. In some cases the error can be detected at parse time, which is nicer. In other cases it can be detected only at run time, but it should always be an error. Also, Steven D'Aprano <steve@pearwood.info> wrote:
Thank you, I will gladly take the "reasonable answer" qualification offered here. And I agree that often these issues boil down to aesthetics. But it's hard for me to see how the current semantics support the Zen of Python. I wouldn't presume that all original design decisions were infallible, and I still haven't heard a single good argument in favor of *allowing* duplicate keys.

On Tue, May 03, 2016 at 06:58:43PM -0700, Luigi Semenzato wrote:
I still haven't heard a single good argument in favor of *allowing* duplicate keys.
Because duplicate keys are not necessarily an error. Maybe you don't care if there are duplicates. Maybe you expect duplicates, and want the last one seen to win.

Sometimes experienced programmers forget what it is like to be a beginning programmer. One of the most useful things a beginner can do is experiment at the interactive interpreter. The more things the language prohibits, the more errors the beginner gets, the less they learn, and the more intimidating the language is. Duplicate keys are not necessarily an error. It just means one value or the other is dropped. How do dict displays construct the dict? Do they add keys from left to right, or right to left? I can read the source, if I can find it, and locate the right section of the right file, and if I can read C. Or I can do:

```
py> {1: "left", 1: "right"}
{1: 'right'}
```

and in one line of code in the Python interactive interpreter I've learned more than in probably an hour of digging through C code. To a beginner, being able to do things like this without Big Brother shouting at them "No! That's Wrong! Have An Exception!!!" all the time is literally without price. Who cares that it's a useless thing to do in production code? "Useless" is not an error. Errors, and warnings, should be left for things which really are errors.

Even if you think my example doesn't matter, and you think that duplicate keys are always an error, it may be that this case is not bad enough to prohibit. You don't get this check for free. It requires code, and that code has to run, and it will run for every single dict display whether needed or not. Depending on how it is implemented, it might be quite expensive. All those programmers who have carefully inspected their source code to ensure that the data used in the dict is correct, with no duplicates and no misspellings and no missed keys, have no need of this check. But you would force it on them anyway, because one of your colleagues was sloppy and made a mistake and had difficulty debugging it. Okay, I have sympathy for you and your colleague, but that doesn't mean it's okay for everyone to carry the cost of checking for your error. Maybe it is justified. Maybe it isn't.

If you are really serious that you think duplicate keys are an error, then to be consistent you should also want to raise an error in the dict constructor, dict(obj, **kwargs). If not, what is your reasoning for why duplicate keys should be allowed in dict(...)? -- Steve

Steven D'Aprano <steve@pearwood.info> writes:
The language reference explicitly says that the order from left to right and duplicate keys are also explicitly supported [1]: If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given. [1] https://docs.python.org/3/reference/expressions.html#dictionary-displays Akira

On 4 May 2016 at 11:40, Ethan Furman <ethan@stoneleaf.us> wrote:
I was curious as to whether or not it was technically feasible to implement this "duplicate keys" check solely for dict displays in CPython without impacting other dict use cases, and it turns out it should be. The key point is that BUILD_MAP already has its own PyDict_SetItem() loop in ceval.c (iterating over stack entries), and hence doesn't rely on the "dict_common_update" code shared by dict.update and the dict constructor(s).

As such, the technical change being requested here (at least for CPython) is that BUILD_MAP be updated to do a PyDict_Contains() check prior to each key insertion, and:

1. Report a DeprecationWarning in 3.6
2. Report a [RuntimeError? ValueError?] in 3.7+

For code generators emitting dict displays, running their content through OrderedDict (or an equivalent if the code generator is written in something other than Python) prior to writing it out will be sufficient to eliminate duplicates. Since PyDict_Contains() is O(1), this shouldn't introduce a major slowdown in dict display evaluation (although the impact should still be measured prior to making any changes).

All-in-all, the concept gets a +0 from me, although to allow for implementations that don't share CPython's distinction between dict displays and the normal dict constructor, any such behaviour (if introduced) should likely be flagged as an implementation detail that may vary across interpreters (as I would have been -1 if BUILD_MAP's implementation wasn't already independent of dict_common_update). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
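For illustration, here is a Python-level sketch (not CPython's actual C code) of the checked insertion loop Nick describes, with ValueError chosen arbitrarily from his two candidates:

```python
def build_map_checked(*pairs):
    """Build a dict from (key, value) pairs, rejecting duplicates.

    Mirrors the proposed BUILD_MAP change: a containment check
    (the PyDict_Contains() analogue, O(1) on average) before each insert.
    """
    d = {}
    for key, value in pairs:
        if key in d:
            raise ValueError(f"duplicate key in dict display: {key!r}")
        d[key] = value
    return d

print(build_map_checked(('lion', 'Africa'), ('parrot', 'Europe')))
# {'lion': 'Africa', 'parrot': 'Europe'}
```

Note that, as discussed elsewhere in the thread, this checks key *values* at run time, so it would also reject duplicates produced by non-literal key expressions.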

On Wed, May 4, 2016 at 12:04 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
To be clear, this implementation would potentially raise an error in the random example -- ``{random(): 0, random(): 1}`` -- correct? If so, then code smell or not, I think that's too much breakage. We could restrict the PyDict_Contains() call to only enforcing uniqueness of string or numeric literals, but then we'd still have some oddball situations:

```
def a():
    return 'a'

{a(): 0, 'a': 1}
```

On 05/03/2016 09:20 PM, Michael Selik wrote:
Either fix it all the way or don't bother. The rule "duplicates are allowed" or "duplicates are not allowed" is much simpler and easier to remember than "duplicates are allowed except when..." or, contrariwise, "duplicates are not allowed unless ...". -- ~Ethan~

On 04/05/2016 07:35, Ethan Furman wrote:
I disagree, on the grounds that 'practicality beats purity' (sorry to repeat myself). Here's a thought. We are rightly concerned with changes breaking existing code. The proposed change could lead to some existing code being *repaired*, by pointing out a mistake that the author doesn't know is there. Rob Cliffe

On 4 May 2016 at 10:05, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
The problem is that it could also break code that works at the moment. Consider a mapping of constants to names:

```
colors = {
    RED: "Red",
    BLUE: "Blue",
    PINK: "Pink",
    LIGHT_BLUE: "Light Blue",
    DARK_BLUE: "Dark Blue",
}
```

On a certain output device, there may be no "light blue", and in that case the code sets LIGHT_BLUE = BLUE. That's entirely reasonable, and the author may be perfectly happy with the current behavior in that case. So I think it's important for backward compatibility that this change be *only* for the case of literal values as keys (which is what the OP proposed, of course). And given that it needs to just be for literal values as keys, I personally think that makes it a case of "Special cases aren't special enough to break the rules", so I'm -0 on the proposal (I don't think it's a *bad* thing, in the constants-only form, just that it's not worth the effort - but if someone else makes the effort, I won't object). Paul

On 05/04/2016 02:48 AM, Paul Moore wrote:
Actually, that code is setting BLUE = "Light Blue". ;)
I object. Having only /some/ duplicates checked for is overcomplicating without very good reason. Perhaps the way forward is to check for all duplicates, but issue a Warning instead of raising an Exception? To be honest, though, I'm starting to agree with Steven D'Aprano that sufficiently large dicts should be verified by other means, and those other means can also check for duplicates. -- ~Ethan~

On 4 May 2016 at 13:42, Ethan Furman <ethan@stoneleaf.us> wrote:
You missed my point, I think. BLUE and LIGHT_BLUE are "constant" values (set once at the top of the program). In some circumstances, the programmer deliberately wants BLUE and LIGHT_BLUE to be the same value (but not in others). The programmer is happy that a constant BLUE will be decoded as "Light Blue" by the colors mapping in that case. This is IMO perfectly valid code, and having it rejected by an over-zealous "duplicate check" is unacceptable. As far as I am concerned, this example clearly demonstrates that any duplicate check should only be applied to actual constant literal values used as keys.
Warning on perfectly good code is just as obnoxious IMO.
As far as I'm aware, the people who have looked at the implementation details seem to be saying that checking for duplicates on literals only is too hard to be practical, and that's the only variation I'd be willing to accept, so I agree - do duplicate checking by other means (assertions in the code being the obvious option). Paul

On 05/04/2016 07:54 AM, Paul Moore wrote:
On 4 May 2016 at 13:42, Ethan Furman wrote:
On 05/04/2016 02:48 AM, Paul Moore wrote:
No, I got your point, and I agree. But one of us is confused about your explanation of that example (it could easily be me). Here's how I think it works:

```
RED = 1
BLUE = 2
PINK = 3
LIGHT_BLUE = 4
DARK_BLUE = 5

# oops, light blue not supported on this machine, just use blue
LIGHT_BLUE = 2
```

Then, building your colors dict from those constants:

```
py> colors
{1: 'Red', 2: 'Light Blue', 3: 'Pink', 5: 'Dark Blue'}
```
So my issue was not with your point, but with your explanation of the example of your point. -- ~Ethan~

On 4 May 2016 at 17:06, Ethan Furman <ethan@stoneleaf.us> wrote:
Ah, sorry. I hadn't checked and wasn't sure myself whether the first pairing or the last took priority. I tried to be vague enough that it didn't matter that I wasn't sure (my theoretical programmer who relied on this behaviour clearly *would* be sure :-)) but I obviously failed. The moral of the story is to check my examples before posting :-) Paul

2016-05-04 6:03 GMT+02:00, Nick Coghlan <ncoghlan@gmail.com>:
[...]
1. Report a DeprecationWarning in 3.6 2. Report a [RuntimeError? ValueError?] in 3.7+
But what about SyntaxError instead of RuntimeError? Could it be easily done too?

PS. Probably nobody is interested in my tests with unicode literals which are "NFKC equivalent", but it helped me to understand how things are done behind the scenes.

```
def f(**kwd):
    for i in kwd:
        print(i, kwd[i])

def tst():
    f(ij=1, ĳ=2)  # this is a syntax error

def tst2():
    f(**{1: 1})  # this is a runtime error

def tst3():
    f(**{'ij': 1, 'ĳ': 2})  # this is legal (for me a little unexpected, maybe a bug?)
```

On Wed, May 04, 2016 at 02:03:59PM +1000, Nick Coghlan wrote:
What do you think of MAL's suggestion of adding a CHECK_MAP op-code and raising a warning? I'd be much more comfortable with a warning than an error, because I really don't think that duplicate keys is *necessarily* an error. I think Paul Moore is onto something with his example of BLUE/LIGHT_BLUE, even if he got the order the wrong way around :-) This has been the documented behaviour of dicts since at least 1.5: https://docs.python.org/release/1.5/ref/ref-7.html#HEADING7-35 and it matches the behaviour of both the dict constructor and dict comprehensions. I don't think anyone has made the case for turning it into an error, although I concede that it's perhaps worthy of a warning. (At least, I won't object to it being a warning, provided we can turn it off :-) -- Steve

On Wed, May 04, 2016 at 09:09:22AM -0700, Ethan Furman wrote:
On 05/04/2016 07:20 AM, Steven D'Aprano wrote:
I was thinking of dict with a list of tuples as argument:

```
py> dict([(1, 'a'), (2, 'b'), (1, 'c')])
{1: 'c', 2: 'b'}
```

or an object with a keys method:

```
py> class K:
...     def keys(self):
...         return [1, 2, 1]
...     def __getitem__(self, i):
...         return 'abcd'[i]
...
py> dict(K())
{1: 'b', 2: 'c'}
```

-- Steve

It's all about typing and reading. Suppose Python didn't have a special syntax for dict displays. Then folks would likely write this:

```
dict((
    (1, 11),
    (2, 22),
    ...
))
```

but then suppose someone says: "hey, folks do this a lot, let's make it more convenient" and introduces a curly-braces notation, and suppose it allows overwriting, like it does now. The new notation exists *exclusively* to make it easier to type and read dictionaries in the code, by a human. Except that with these semantics it makes it impossible to detect duplicate key errors, which the vast majority of users would prefer to notice (a bold and unproven statement, but a reasonable guess). What's worse, people don't realize the problem until they have many large dict displays, and many bugs. So they feel annoyed and betrayed.

Suppose instead that the new notation does NOT allow overwriting. If folks rely on the overwrite semantics, in either human-written or machine-generated code, they will immediately notice the problem and use dict(), with very little inconvenience to code generators. (And some to the humans, but remember, they are a tiny minority :) The strict semantics are maybe marginally harder to explain, and maybe marginally harder to implement, but neither should be a show-stopper.

However, that's not where we are. There is the legacy code issue, and the legacy mindset. I don't know how to evaluate these issues, and I don't even know how much experience the community has with backward-incompatible changes. So, while I am fairly positive that it would have been a better choice to start with, I have no idea whether, when, and how this proposal should have any effect.

P.S. The issue of detecting duplicate keys at parse time is completely separate and it was my mistake to focus on it first.

P.P.S. The issue of whether a set display should allow duplicate elements is less important, since allowing them does not have destructive consequences.

On Wed, May 4, 2016, 1:37 PM Luigi Semenzato <luigi@semenzato.com> wrote:
I'm part of the minority(?) that doesn't care. I hope that helps adjust your odds ratio for that guess. What's worse, people don't
Even a tiny minority of the Python community might be thousands of people.
Well, there's that transition from 2 to 3. You do that a few times and you get a reputation for instability. I like to imagine a hidden line in the Zen of Python. Between the title and "Beautiful is better..." I mentally insert, "Backwards compatibility is important." How important, well, that's why it's Zen and not Rules. So, while I am fairly positive that it

On May 4, 2016 10:59 AM, "Michael Selik" <michael.selik@gmail.com> wrote:
your odds ratio for that guess. I'm also part of that group (I question whether it is a minority) who wishes to allow duplicate keys at the syntax level, but to let linters complain. While *most* duplicate literal keys in dict displays probably are bugs, the legitimate and intended uses are not rare enough to prohibit them altogether.

On 5/4/2016 12:03 AM, Nick Coghlan wrote:
Changing only BUILD_MAP would invalidate current code equivalences and currently sensible optimizations. A toy example:
Sensible and comprehensible code transformation rules are important. Currently, the following rather trivial optimization of the above works.
The special rule for dict displays would invalidate this. In my post yesterday in response to Luigi (where I began 'The thread...'), I gave 4 equivalent other ways to initialize a dict using a Python loop (include a dict comprehension). Using a dict display amounts to un-rolling any of the loops and replacing the Python loop with the C loop buried in ceval. Changing the operation of just that loop would break the current equivalence. I suspect that the proposed change would introduce more bugs than it exposes. -- Terry Jan Reedy

On Wed, May 4, 2016 at 10:40 PM Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Certainly was. But it occurs relatively rarely and can be solved with linters and by stylistic changes like alphabetizing dict literals by key. Can anyone produce a single real-life use case where repeated literal
I thought Paul Moore gave a good example. You can see him and Ethan Furman discussing it in a fork of this thread. There were a few code-generator examples. Were these not realistic enough?

On Thu, May 05, 2016 at 03:38:05AM +0100, Rob Cliffe wrote:
Here I'm throwing down the gauntlet
You can throw down whatever you like, but Guido has pronounced that this change is not going to happen without a PEP. So unless somebody is willing to write a PEP, further discussion is a waste of time. -- Steve

On 5 May 2016 at 03:38, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Not literal keys no. If the proposal is *solely* that duplicate literals are detected and objected to (I reserve judgement for now on the question of warning vs error) then I don't think there will be significant code breakage. But even knowing that, the next steps are: 1. Produce a PEP (Guido has said this is a requirement) 2. Explain how this would be implemented in that PEP - because the only implementation techniques so far mentioned in this thread fail to restrict the check to literals. 3. Manage discussion on said PEP. I'll await a PEP before contributing any further to the discussion. Paul

Late, but here are a few more notes: 1. The gnumpy package (for Py2) used expressions in a `set` constructor which could evaluate to duplicates. It isn't hard to imagine a similar trick with `dict`, though I don't know if it'd be as useful. From gnumpy.py: ``` _floatTypes = set((types.FloatType, numpy.float64, numpy.float32, getattr(numpy, 'float128', numpy.float64), getattr(numpy, 'float96', numpy.float64))) ``` Refactored a bit: ``` _floatTypes = {float, numpy.float64, numpy.float32, getattr(numpy, 'float128', float), getattr(numpy, 'float96', float)} ``` 2. Even if the warning/error is compiler-only, `eval` and `exec` would have the same warning/error, and they are NOT always evaluating dev-written source code. (Though you can argue that `exec` and `eval` on strings are where you should be on high alert with regard to data validation.) On the other hand, a linter is only run on manually-written code. 3. A compile-time check doesn't necessarily give a false sense of security. It just needs the proper wording to indicate that it's a compile-time check. "Warning: Duplicate (constant|literal|string literal) keys detected."

On Tue, May 3, 2016, at 20:23, Steven D'Aprano wrote:
How about prohibit it anyway (on the grounds of the two key expressions being identical - no comment on spam(a) vs spam(b)), and if the user *really* wants it they'll do {random(): x for x in (0, 1)} [or {other: stuff, **{that}}]

On Tue, May 3, 2016 at 1:43 PM, Michael Selik <michael.selik@gmail.com> wrote:
Correct. In this particular case I think that the change is better. It is simple, helpful, and doesn't seem to have huge implications (if any) in terms of documentation.
Correct.
Which gives me one more opportunity to present this case: A dictionary literal is a convenient input format for structured data, and is used as such in many cases, for instance instead of equivalent JSON, which requires an additional module, an additional parsing step, and learning an additional syntax (a lot of Python programmers don't know JSON). The convenience, however, is hampered by the duplicate key issue. Some of the strong points of Python are: 1. easy to learn; 2. does the right thing in most cases. It just seems to me that the right thing here would be to signal duplicate literal keys.
Yes, this is absolutely true, but probably very rare, and arguably not a recommendable coding practice, therefore IMO it offers very little support for the status quo.
We discussed this already. My request only applies to keys that are literals. (He's talking about function objects, not function calls.)
Pylint catches this mistake fine. But many Python programmers don't know about pylint either. For instance, a bunch of hardware engineers that wrote some Python as a driver for a circuit simulator. Anyway, thank you for making these points.
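Incidentally, the JSON comparison above cuts both ways: Python's own json module also silently keeps the last value for a repeated key, although there it can at least be intercepted, via the object_pairs_hook parameter of json.loads:

```python
import json

# Default behavior: last duplicate wins, just like a dict display.
print(json.loads('{"lion": "Africa", "lion": "Europe"}'))
# {'lion': 'Europe'}

def reject_duplicates(pairs):
    """A hook that sees every key/value pair, duplicates included."""
    d = {}
    for key, value in pairs:
        if key in d:
            raise ValueError(f"duplicate key: {key!r}")
        d[key] = value
    return d

print(json.loads('{"parrot": "Europe"}', object_pairs_hook=reject_duplicates))
# {'parrot': 'Europe'}
```

Passing a document with a repeated key through the hook raises ValueError, which is roughly the opt-in version of what the proposal asks for dict displays.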

2016-05-03 21:21 GMT+02:00, Michael Selik <michael.selik@gmail.com>: [...]
Just for curiosity: keys can be valid identifiers, can be different (at least from one point of view), and still cause a SyntaxError as "keyword argument repeated":

```
{'ij': 1, 'ĳ': 2}  # this is a dict with 2 keys and 2 different values
dict(ij=1, ĳ=2)    # SyntaxError: keyword argument repeated
```

It is because the parser does NFKC normalization for non-ascii identifiers - see https://www.python.org/dev/peps/pep-3131/ ('\u0069\u006a' == 'ij' != 'ĳ' == '\u0133')
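The normalization can be reproduced directly with unicodedata, using the same LATIN SMALL LIGATURE IJ from the example:

```python
import unicodedata

ascii_ij = '\u0069\u006a'   # 'ij' as two ASCII characters
ligature = '\u0133'         # 'ij' as a single ligature character

# As string keys, the two spellings stay distinct:
d = {ascii_ij: 1, ligature: 2}
print(len(d))  # 2

# But NFKC normalization, which the parser applies to identifiers,
# folds the ligature into the two-character form:
print(unicodedata.normalize('NFKC', ligature) == ascii_ij)  # True
```

This is why dict(ij=1, ĳ=2) is a syntax error while the string-keyed display is not: identifiers are normalized before the duplicate check, string keys never are.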

On May 3, 2016 3:21 PM, "Michael Selik" <michael.selik@gmail.com> wrote:
On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato <luigi@semenzato.com> wrote:
http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...
Your word choice of "[no] valid reason" is interesting. In the bug tracker discussion, folks argue that the problem is easily solved with code style standards and static analyzers. I thought that was valid.

"X is detectable", whether by tools, or by magic imps yelling at you when you do it, is not an argument for X. When asking for a useful purpose for the current semantics, "prefer the status quo" is not strong enough. "Errors should never pass silently," and there is an argument on the table that duplicate spelled-out keys are probably written in error. He isn't shifting the burden, he's putting the ball in your court.

Creating a special case for the parser to enforce uniqueness of number and string literals as keys seems more trouble than it's worth.

That is an argument. Literal keys, or a single name repeated in the key list, might be an error, but you can't say the same about expressions. If you only error on literal keys, then there is an inconsistency between literals and expressions. By the way, unless the name is guaranteed not to be rebound in a closure called within the dict display, multiple items with the same variable name as key isn't necessarily an error:

```
def f():
    x = 1
    def g():
        nonlocal x
        x += 1
        return x
    return {x: g(), x: g(), x: g()}

print(f())  # {2: 2, 3: 3, 4: 4}
```

"Don't do that" doesn't say what the compiler behavior should be: should it analyze for nonlocal closures, or should it only check literals?

Hi Luigi, On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote: [...]
As far as I can see, you haven't specified in *detail* what change you wish to propose. It is difficult for me to judge your proposal when I don't know precisely what you are proposing. Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?

What counts as "duplicate keys"? I presume that you mean that two keys count as duplicate if they hash to the same value, and are equal. But you keep mentioning "literals" -- does this mean you care more about whether they look the same rather than are the same?

    # duplicate literals, forbidden
    d = {100: 1, 100: 2}

    # duplicate non-literals, allowed
    d = {100: 1, len("ab")*50: 2}

You keep mentioning "dictionary literal", but there actually is no such thing in Python. I think you mean a dict display. (Don't worry, I make the same mistake.) But the point is, a "dict literal" (display) can contain keys which are not themselves literals, as above. Those keys can have arbitrarily complex semantics, including side-effects. What do you expect to happen?

-- Steve
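For reference, the status quo the thread is debating, in runnable form: the last occurrence of an equal key silently wins, whether the keys are literals or arbitrary expressions.

```python
# Duplicate literal keys: no error, the last value wins.
d1 = {100: 1, 100: 2}
assert d1 == {100: 2}

# A non-literal expression that happens to evaluate to the same key
# behaves identically at run time.
d2 = {100: 1, len("ab") * 50: 2}
assert d2 == {100: 2}
```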

Hi, On Tue, May 3, 2016 at 4:51 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Yes, and I apologize, because on further reflection I don't even know what is possible within the model of the Python interpreter.
Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?
Is there such a thing as a SyntaxWarning? From my perspective it would be fine to make it a SyntaxError, but I am not sure it would be overall a good choice for legacy code (i.e. as an error it might break old code, and I don't know how many other things a new language specification is going to break). It could also be a run-time error, but it might be nicer to detect it earlier. Maybe both.
Correct. The errors that I am guessing matter the most are those for which folks copy-paste a key-value pair, where the key is a literal string, intending to change the key, and then forget to change it.
# duplicate literals, forbidden d = {100: 1, 100: 2}
Correct. Possibly caught during parsing.
# duplicate non-literals, allowed d = {100: 1, len("ab")*50: 2}
For sure allowed during parsing, possibly caught at run-time.
You keep mentioning "dictionary literal", but there actually is no such thing in Python. I think you mean a dict display.
Yes sorry, bad use of the term. I meant the curly-brace constructor---and only that.
I'd mainly like to catch the "obvious" duplicates, just like the GNU C compiler catches the "obvious" divisions by zero. And by that I mean occurrences of duplicate string literals as keys. For instance, at parse time:

    {"foo": 333, "foo": 444}                     # forbidden---must catch
    {"foo": 333, "f" + "oo": 444}                # OK to miss but also OK to catch
    {"foo": 333, function_returning_foo(): 444}  # not caught (i.e. no miracles expected)

At run time, I still slightly suspect that it may be more useful to prohibit duplicate keys in the same constructor than it is to allow them, but I don't have a clear opinion, because of both legacy code and other possible semantic issues. Thanks!
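The parse-time check sketched above is essentially what a linter implements. Here is a minimal sketch using the stdlib `ast` module (the function name is mine, not from the thread; `ast.Constant` requires Python 3.8+). It flags duplicate constant keys and deliberately ignores expression keys:

```python
import ast

def find_duplicate_literal_keys(source):
    """Return (lineno, key) pairs for repeated constant keys in dict displays."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                # A None key marks a **mapping unpacking; skip those and
                # any key that is not a plain constant.
                if key is None or not isinstance(key, ast.Constant):
                    continue
                if key.value in seen:
                    problems.append((key.lineno, key.value))
                seen.add(key.value)
    return problems

# The "obvious" duplicate is caught; the computed key is not
# (ast.parse does not constant-fold the "+" expression).
code = '{"foo": 333, "foo": 444, "f" + "oo": 555}'
assert find_duplicate_literal_keys(code) == [(1, "foo")]
```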

On 5/3/2016 8:46 PM, Luigi Semenzato wrote:
On Tue, May 3, 2016 at 4:51 PM, Steven D'Aprano <steve@pearwood.info> wrote:
The thread seems to partly be based on a confusion between literals and displays, which led to dict displays being called dict literals. A literal defines an int or string at the initial lexing or tokenizing phase of the parser. A display, like comprehensions, abbreviates runtime code for creating a composite collection object.
The above is equivalent to, and an abbreviation of:

    d = {}; d[100] = 1; d[100] = 2

A bit silly, but quite legal. Every value replacement necessarily involves a 'duplicate key'. Both of the above are also equivalent to

    d = {a: b for a, b in ((100, 1), (100, 2))}

This is also equivalent to

    d = {}
    for a, b in ((100, 1), (100, 2)):
        d[a] = b

which is close to the Python version of how the comprehension would be implemented in Python. Here is a 5th version, which I expect is close to the actual CPython code for a dict display.

    tem = [100, 1, 100, 2]  # as in the top of the virtual machine stack
    n = 4                   # number of items in display = 2 * number of pairs
    d = {}
    for i in range(-n, 0, 2):
        d[tem[i]] = tem[i+1]

We have 5 equivalent snippets, each of which could have the same bug. Why single out just one to raise an exception? My guess is that it is partly from mistakenly thinking of the one as like a lexer literal rather than as runtime code.

-- Terry Jan Reedy
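Terry's equivalence claim is easy to verify under the current semantics -- every spelling builds the same dict:

```python
# 1. dict display
d1 = {100: 1, 100: 2}

# 2. explicit item assignments
d2 = {}
d2[100] = 1
d2[100] = 2

# 3. dict comprehension
d3 = {a: b for a, b in ((100, 1), (100, 2))}

# 4. explicit loop
d4 = {}
for a, b in ((100, 1), (100, 2)):
    d4[a] = b

# All four agree: the later assignment to the duplicate key wins.
assert d1 == d2 == d3 == d4 == {100: 2}
```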

On Wed, May 4, 2016 at 1:23 PM, Terry Reedy <tjreedy@udel.edu> wrote:
But the lexer's definition of "literal" is extremely narrow, and doesn't reflect the way people think about them. Most people's idea of a literal is closer to what ast.literal_eval can accept:

    Help on function literal_eval in module ast:

    literal_eval(node_or_string)
        Safely evaluate an expression node or a string containing a Python
        expression.  The string or node provided may only consist of the
        following Python literal structures: strings, bytes, numbers,
        tuples, lists, dicts, sets, booleans, and None.
    >>> ast.literal_eval("[1,-2,(3+4j)]")
    [1, -2, (3+4j)]
Neither -2 nor (3+4j) is technically a literal, and certainly list display isn't a literal, but "literal_eval" is happy to parse them. And that's how people think about them. Duplicate keys in dict display is more similar to duplicate keyword arguments in a function call than to reassignment. Can you show any good code that reuses a key in a single dict display? Wouldn't it at best be code smell? That doesn't necessarily mean that the compiler should stop you from doing it; but I do think that there's a fundamental difference in expectation here. And if the check happens, it wants to be a compile-time check, not a run-time one. ChrisA
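The `literal_eval` behavior Chris describes can be confirmed directly: negated numbers, parenthesized complex constants, and list displays are all accepted, while genuine expressions are rejected.

```python
import ast

# Technically -2 is unary minus applied to a literal and [...] is a
# list display, yet literal_eval accepts the whole thing.
value = ast.literal_eval("[1, -2, (3+4j)]")
assert value == [1, -2, 3 + 4j]

# Arbitrary expressions (function calls, arithmetic on names) are rejected.
try:
    ast.literal_eval("len('ab') * 50")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError")
```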

On Tue, May 3, 2016, at 23:31, Chris Angelico wrote:
To play devil's advocate for a minute here... Why *don't* we allow duplicate keyword arguments in a function call, anyway? CALL_FUNCTION doesn't seem to actually care, if I create this situation by monkey-patching the calling function's consts to get duplicate keywords. All the arguments that have been raised here for *allowing* duplicate keyword arguments seem to apply just as well to f(**{'a': 1}, **{'a': 2}) [where 'a' may not be the only key present and they may not be literal^Wdisplays], yet that is prohibited. Why not remove that check, too? If we can blithely pass the same constant key twice to BUILD_MAP, why *shouldn't* we be able to do the same to CALL_FUNCTION? Any argument against the latter applies equally to the former, as far as I can see.

On Wed, May 4, 2016 at 12:19 AM Random832 <random832@fastmail.com> wrote:
Because keyword arguments can only be valid identifiers, rather than expressions. There's absolutely no reason to use the same identifier as keyword twice, but there could be some oddball reasons to use the same expression as dict key twice. [I know I've said that before, but it's a long thread. My apologies if you'd already read that.]

On Wed, May 4, 2016 at 12:40 AM, Michael Selik <michael.selik@gmail.com> wrote:
The uniqueness rule applies to splatted args, too.

    def printn(*args, **kwargs):
        print(*args, **kwargs, sep='\n')

    printn('hello', 'world', sep='\n')
    # => TypeError: print() got multiple values for keyword argument 'sep'

Not a very useful example, but if you want to override or ignore incoming kwargs... [...]

For semantics, I see a few possibilities (either error or warning):

0. Detect duplicate literal keys at compile-time.
1. Detect duplicate literals and hereditarily-literal displays and expressions at compile-time. (This includes keys which are tuple displays containing only hereditarily-literal elements, expressions of literals, and, er, implicit string concatenation like `'hello' " " 'world'`. As keys, these things are guaranteed to be immutable.)
2. Also statically analyze for duplicate variables as keys. (I think if it's a local variable that isn't captured by a closure using `nonlocal`, it is safe to assume that it's constant for that dict display.)
3. Constant-folding. (This leads to the dark path of static analysis shipping with the language.)
float('inf'). Analyze at runtime.
1j. Declare that duplicate keys in a dict display causes underdefined behavior, and let interpreters decide how to handle it.
-1. Identical expressions are warned against as duplicate keys.

(By the way, anything that happens for dicts should also happen for sets.)

On Wed, May 4, 2016 at 2:32 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
As of Python 3.5.1, it is still not allowed:

    C:\Users\Jonathan>python
    Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.

On 05/03/2016 11:39 PM, Jonathan Goble wrote:
On Wed, May 4, 2016 at 2:32 AM, Ethan Furman wrote:
On 05/03/2016 09:18 PM, Random832 wrote:
Ah, right. The PEP was changed: http://bugs.python.org/issue2292#msg234413 -- ~Ethan~

On Wed, May 04, 2016 at 12:18:59AM -0400, Random832 wrote:
(1) Because status quo wins. In the absence of a really compelling reason to remove that check, we don't have to justify prohibiting duplicate keyword args, we just have to note that it is already in place and so we shouldn't change it.

(2) Duplicate keys in a dict may have side-effects, which mean they do something. Duplicate keyword arguments don't have side-effects (not in the keys at least). So that's a difference.

(3) The analogy between function calls and dicts is not really that strong. The dict constructor intentionally allows certain duplicates:

    py> dict({'a': None}, a=2)
    {'a': 2}

for good reason (think of over-riding the values provided in the input dict with keyword arguments), but it's not clear that there is an analogy for function calls. Perhaps there is, and if it is a good enough analogy, we might have to re-think the duplicate kwargs situation.

-- Steve
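Steven's third point in runnable form: the `dict` constructor deliberately lets keyword arguments override keys from the positional mapping, which is handy for defaults-plus-overrides patterns (the `configure` helper below is mine, purely illustrative):

```python
# Keyword arguments intentionally override the input mapping.
assert dict({'a': None}, a=2) == {'a': 2}

# The common defaults-plus-overrides idiom relies on exactly this.
def configure(**overrides):
    return dict({'debug': False, 'verbose': False}, **overrides)

assert configure() == {'debug': False, 'verbose': False}
assert configure(debug=True) == {'debug': True, 'verbose': False}
```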

On Wed, May 4, 2016, at 08:22, Steven D'Aprano wrote:
f(a=1, b=2, **{foo(): bar()}) where foo may return 'a' or 'b', though this is the first that I've heard that the side-effect argument only applies to the keys.
but not {**{'a': None}, **{'a': 2}})
foo(**{'a':None}, a=2) seems like a perfect analogy to me.
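The asymmetry being discussed here, in runnable form: double-star unpacking a duplicate into a *call* fails at run time, while the same unpacking into a dict display silently overrides.

```python
def foo(**kwargs):
    return kwargs

# Dict display: later unpacked keys silently win (PEP 448).
assert {**{'a': None}, **{'a': 2}} == {'a': 2}

# Function call: the identical duplication is a run-time TypeError
# ("got multiple values for keyword argument 'a'").
try:
    foo(**{'a': None}, a=2)
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")
```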

On 5/3/2016 11:31 PM, Chris Angelico wrote:
This was a preface to the main points that followed, and which were snipped from this followup.
I am obviously aware of that. My main point, not addressed by this response, is that 1. dict displays are currently equivalent to several forms of Python code with more explicit looping, including at least two that are commonly used to initialize dicts; and 2. changing the semantics of dict displays and only dict displays will break this equivalence. In my response just now to Nick, I was more explicit that breaking this equivalence now, after 20+ years, will break code that depends on the current equivalence and break people's currently true expectations.

-- Terry Jan Reedy

On Tue, May 03, 2016 at 05:46:33PM -0700, Luigi Semenzato wrote: [...]
Yes there is.
There are serious limits to what the compiler can detect at compile-time. So unless you have your heart set on a compile-time SyntaxError (or Warning) it might be best to forget all about compile-time detection, ignore the question of "literals", and just focus on run-time TypeError if a duplicate key is detected.

Otherwise, you will have the situation where Python only detects *some* duplicate keys, and your colleague will be cursing that Python does such a poor job of detecting duplicates. And it will probably be implementation-dependent, e.g. if your Python compiler does constant folding it might detect {1+1: None, 2: None} as duplicates, but if it doesn't have constant folding (or you have turned it off), it won't. So with implementation-dependent compile-time checks, you can't even guarantee what will be detected. In that case, you might as well use a linter.

As far as I am concerned, a compile-time check is next-to-useless. It will be more annoying than useful, since it will give people a false sense of security, while still missing duplicates. So I intend to only discuss a run-time solution, which has the advantage that Python will detect a much wider range of duplicates: not just:

    {"SPAM": None, "SPAM": None}

for example, but also:

    {"SPAM": None, ("spa".upper() + "AM"[1:]): None}

But now we're getting out of the realm of "detect obvious typos and duplicates" and into a more judgemental area. Once you start prohibiting complex expressions or function calls that happen to return duplicate keys, you can no longer be *quite* so sure that this is an error. Maybe the programmer has some reason for allowing duplicates. Not necessarily a *good* reason, but people write all sorts of ugly, bad, fragile, silly code without the compiler giving them an error. "Consenting adults" applies, and Python generally allows us to shoot ourselves in the foot.
Let's say I write something like this:

    with open(path) as f:
        d = {
            f.write(a): 'alpha',
            f.write(b): 'beta',
            f.write(c): 'gamma',
            f.write(d): 'delta',
            }

and purely by my bad luck, it turns out that len(b) and len(c) are equal, so that there are two duplicate keys. Personally, I think this is silly code, but there's no rule against silly code in Python, and maybe I have a (good/bad/stupid) reason for writing this. Maybe I don't care that duplicate keys are over-written.

If we introduce your rule, that's the same as saying "this code is so awful, that we have to ban it... but only *sometimes*, when the lengths happen to be equal; the rest of the time it's fine". Are you okay with saying that? (Not a rhetorical question, and you are allowed to say "Yes".)

So which is worse?

- most of the time the code works fine, but sometimes it fails, raising an exception and leaving the file in a half-written state; or

- the code always works fine, except that sometimes a duplicate key over-writes the previous value, which I may not even care about.

I don't know which is worse. If there is no clear answer that is obviously right, then the status quo wins, even if the status quo is less than perfect. Even if the status quo is *awful*, it may be that all the alternatives are just as bad. I think a compile-time check is just enough to give people a false sense of security, and so is *worse* than what we have now. And I'm right on the fence regarding a run-time check. So to me, my judgement is: with no clear winner, the status quo stays.
With respect, I think that is a harmful position to take. That leaves the job half-done: the constructor will complain about *some* duplicates, but not all, and worse, which ones it detects may depend on the implementation you happen to be using! If it is worth complaining about {0: None, 0: None} then it must also be worth complaining about:

    {0: None, 0.0: None, int(): None, 1-1: None, len(""): None}

etc. Otherwise, I guarantee that somebody will be pulling their hair out as to why Python only sometimes detects duplicate keys. Better to never detect them at all (so you know you have to test for duplicates yourself) than to give a false sense of security.

-- Steve
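Steven's list of equal-but-different-looking keys really does collapse to a single entry; note also the asymmetry that the *first* key object is kept while the *last* value wins:

```python
d = {0: None, 0.0: None, int(): None, 1 - 1: None, len(""): None}
assert len(d) == 1   # five spellings, one key

# With distinct values the behavior is visible: last value, first key.
d2 = {0: 'a', 0.0: 'b', False: 'c'}
assert d2 == {0: 'c'}
assert type(next(iter(d2))) is int   # the int 0 is kept, not 0.0 or False
```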

On Wed, May 4, 2016 at 10:08 PM, Steven D'Aprano <steve@pearwood.info> wrote:
My response: Yes. I am okay with saying that. Because the question really is: What is this dictionary being used for? Why are you using keys that might, under some circumstances, overwrite each other? The point of a dict is to retrieve values when given keys. How can you do this, if those keys might collide?

I've yet to see any real-world code that does this usefully, in a dict display. And the best "well, maybe if" examples have all involved generated code, so it wouldn't be hard to change your codegen to use this instead:

    d = dict((
        (f.write(a), 'alpha'),
        (f.write(b), 'beta'),
        (f.write(c), 'gamma'),
        (f.write(d), 'delta'),
        ))

which, of course, will continue to function as expected.

ChrisA

On 04/05/2016 00:51, Steven D'Aprano wrote:
My initial reaction to the OP was negative, as in most contexts where keys are added to dictionaries, repeated keys silently overwrite, and consistency is a virtue. However, the original use case (a long dict literal - (sorry, transcribe that into 'dict display' if you wish; to me 'dict literal' is an evocative and comprehensible term and I will continue to use it, not meaning to offend)) is a completely plausible one, and it now seems to me that detecting duplicates is a case where 'practicality beats purity'. I agree with everything in Luigi's post of 30-May 2016 22:29 (my local time, sorry).

I would just add that as someone who has used a linter (pylint) occasionally - I am trying to discipline myself to use it regularly - even if you are aware of it, there is still a barrier to its use: linters tend to produce a huge amount of mostly useless guff which you have to search to find the few nuggets that you really need.

Steven, I am sure that this is not your intention, but it feels as if your requests for clarification are nitpicking and attempts to throw dust in our eyes, and avoiding a serious attempt to address the OP. But let me (presuming, if he will forgive me, to speak for the OP) attempt to answer by making a concrete proposal, as a starting point for discussion:

*In a dictionary literal, it should be a syntax error to have two keys which are literal int or literal basestring values and which are equal.*

(I say basestring, meaning that { 'lion' : 'x', u'lion' : 'y' } would be an error. So of course would { 18L : 'x', 0x12 : 'y' } etc.)

From this starting point, bikeshedding can begin:

(1) Is there any other kind of literal (i.e. other than int or string) which should be included? (Float and complex? Decimal?? What have I missed?)
(2) Should constant int/string expressions which are folded to constants at compile time also be included, i.e. would { 2 : 'x', 1+1 : 'y' } also be an error? (Suggestion: This is an implementation detail, undefined. It's irrelevant to the practical case.)
(3) Should a runtime error be raised instead of a SyntaxError? (I can't imagine why, but if it turns out to be easier to implement, fine.)
(4) Should a warning be raised instead of an error?
(5) Should the action be different if not just the keys, but the values also are constant and equal (i.e. a no-effect repeat), e.g. a warning instead of an error? (I suggest no. It's still likely to be unintentional, i.e. a mistake that the programmer would want to know about.)
(6) Are there any other changes to the proposal which would simplify implementation, while still addressing the original use case?
(7) If the proposal is accepted, should it appear complete in the next Python release, or should there be a more gradual adoption process?
(8) Can/should anything be done to mitigate breaking automatic code generators which do generate duplicate literal keys?

Later: While I was composing the above, 1 more post arrived from Luigi and from Steven. I must reply to two points from Steven:

(1) "When you have significant amount of data in a dict (or any other data structure, such as a list, tree, whatever), the programmer has to take responsibility for the data validation. Not the compiler. Out of all the possible errors, why is "duplicate key" so special? Your colleague could have caused unexpected behaviour in many ways."

No, no, no. Computers detect our errors and we would be helpless if they didn't. If this argument were taken to its logical conclusion, you would say that it is the programmer's responsibility to ensure that an entire (10000-line missile guidance?) program is syntactically correct, and all the compiler need do is either say "OK" or "Syntax Error" (or worse, try to run a syntactically incorrect program). Yes, Luigi's colleague could have made many mistakes, but surely it is a good thing for as many of them as is reasonably possible to be detected by the computer? Then we just have to decide what "reasonably possible" is.

(2) "So unless you have a good answer to the question "Why are duplicate keys so special that the dict constructor has to guard against them, when it doesn't guard against all the other failures we have to check for?", I think the status quo should stand. There is one obvious answer: Duplicate keys are special because, unlike the other errors, the dict constructor CAN guard against them. That would be a reasonable answer. But I'm not sure if it is special *enough* to justify violating the Zen of Python. That may be a matter of opinion and taste."

Sure, it is a matter of opinion. I ask: In the OP, what is the probability that (1) the programmer deliberately included a duplicate literal key, or (2) the programmer made a mistake? I concede that the answer may be different for automatic code generators (although if I wrote one that generated duplicate literal keys, I wouldn't be proud of it).

Best wishes
Rob Cliffe

On Wed, May 4, 2016 at 11:54 AM, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
This is the very soonest it can possibly appear, which means that notions of "long integer" and "basestring" aren't really applicable - they're Python 2 concepts. Text strings and byte strings in Python 3 are completely different:
    >>> {'lion': 'x', b'lion': 'y'}
    {'lion': 'x', b'lion': 'y'}
and there aren't any short/long integers. So here's my rewording of your proposal:

*** Identical string or numeric literals used as keys in the same dictionary display MAY cause a SyntaxError. ***

Whether constants get folded before this check, and whether {1:'x', 1.0:'y'} causes an error, can be implementation-defined. Implementations are welcome to not check for this at all if they wish. Valid types for this check would be str, bytes, int, float, and complex - there's no way that the same literal could result in unequal values, so these are all safe.

I'm still not sure whether I'm in favour of the proposal or not, but that's how I'd word it.

ChrisA

On Tue, May 3, 2016, at 22:22, Chris Angelico wrote:
Whether constants get folded before this check, and whether {1:'x', 1.0:'y'} causes an error, can be implementation-defined.
For *general* constant folding, maybe, but what's the point of having it if it doesn't apply to negative numbers and nonimaginary complex numbers? How about tuple displays consisting of only numeric and string literals, empty tuple displays, and tuple displays which themselves satisfy this definition?

On Wed, May 4, 2016 at 12:44 PM, Random832 <random832@fastmail.com> wrote:
That's why I say implementation-defined. I would generally expect -5 to count as a literal, but if (2+3) or ("a", "b") doesn't, so be it. Of course, I would expect 'a' and "a" to be seen as identical. Likewise 100 and 0x64. Beyond that, it's basically up to whatever the implementation finds easy; as long as the keys *would be* folded, and are literals, it's permissible to raise the error. ChrisA

On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
Did your colleague really have 200+ items in the dict? No matter, I suppose. The same principle applies. When you have a significant amount of data in a dict (or any other data structure, such as a list, tree, whatever), the programmer has to take responsibility for the data validation. Not the compiler. Out of all the possible errors, why is "duplicate key" so special? Your colleague could have caused unexpected behaviour in many ways:

    lives_in = {
        # missed value
        'lion': ['Africa', 'America'],
        # misspelled value
        'parrot': ['Eruope'],
        # misspelled key
        'kangeroo': ['Australia'],
        # invalid key
        'kettle': ['Arctic'],
        # invalid value
        'aardvark': 'South Africa',
        # missed key
        # oops, forgot 'tiger' altogether
        }

Where was your colleague's data validation? I'm sorry that your colleague lost a lot of time debugging this failure, but you might have had exactly the same result from any of the above errors. Unless somebody can demonstrate that "duplicate keys" is a systematic and common failure among Python programmers, I think that it is perfectly reasonable to put the onus of detecting duplicates on the programmer, just like all those other data errors.

The data validation need not be a big burden. In my own code, unless the dict is so small that I can easily see that it is correct with my own eyes, I always follow it with an assertion:

    assert len(lives_in) == 250

which is a cheap test for at least some duplicate, missed or extra keys. But depending on your use-case, it may be that dict is the wrong data structure to use, and you need something that will validate items as they are entered. Unless your dict is small enough that you can see it is correct by eye, you need some sort of check that your data is valid, that you haven't forgotten keys, or misspelled them. The dict constructor won't and can't do that for you, so you need to do it yourself. Once you're doing that, then it is no extra effort to check for duplicates.
So unless you have a good answer to the question "Why are duplicate keys so special that the dict constructor has to guard against them, when it doesn't guard against all the other failures we have to check for?", I think the status quo should stand. There is one obvious answer: Duplicate keys are special because, unlike the other errors, the dict constructor CAN guard against them. That would be a reasonable answer. But I'm not sure if it is special *enough* to justify violating the Zen of Python. That may be a matter of opinion and taste. -- Steve
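The validation Steven recommends can be centralized once instead of repeated per dict. A small sketch (`strict_dict` is my name, not an API from the thread) that rejects duplicates at run time:

```python
def strict_dict(pairs):
    """Build a dict from (key, value) pairs, raising on any duplicate key."""
    d = {}
    for key, value in pairs:
        if key in d:
            raise ValueError("duplicate key: %r" % (key,))
        d[key] = value
    return d

lives_in = strict_dict([
    ('lion', ['Africa', 'America']),
    ('parrot', ['Europe']),
])
assert lives_in['parrot'] == ['Europe']

# A repeated key is rejected instead of silently overwriting.
try:
    strict_dict([('lion', ['Africa']), ('lion', ['Europe'])])
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError")
```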

On 04.05.2016 03:09, Steven D'Aprano wrote:
Issuing a warning for this would probably help, but raising an error introduces a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.

For the implementation, there are two possibilities I can think of:

1. have STORE_MAP run a test for an already existing entry and issue a warning

2. add a final CHECK_MAP byte code to check that the number of keys in the mapping corresponds to the number of keys added via the dict literal

The first one causes a lot of overhead, esp. for larger mappings such as the ones used by the codecs or other static data tables. Most of the times these are generated anyway, so the warning case would not occur at all, but you'd still have to check for a collision N times.

The second one is cheap regarding performance, but may not be accurate, since STORE_MAP may well be working on dynamically generated keys. It does work well for constants, and if we're just issuing a warning, may be good enough.

-- Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Experts (#1, May 04 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
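MAL's second option -- a single size check after the display is built -- can be mimicked in pure Python (the function name is hypothetical; the real compiler would know the pair count statically):

```python
import warnings

def build_map_checked(*pairs):
    """Build a dict and warn, CHECK_MAP-style, if the finished dict has
    fewer entries than the display had key/value pairs."""
    d = dict(pairs)
    if len(d) != len(pairs):
        warnings.warn("duplicate key in dict display", stacklevel=2)
    return d

# No duplicates: silent.
assert build_map_checked((1, 'a'), (2, 'b')) == {1: 'a', 2: 'b'}

# A duplicate trips the coarse size check (it cannot say *which* key).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert build_map_checked((100, 1), (100, 2)) == {100: 2}
assert len(caught) == 1
```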

On Wed, May 04, 2016 at 01:56:56PM +0200, M.-A. Lemburg wrote:
I'm not sure I understand what you mean by this. Are you suggesting that if I write:

    {1: None, 1: None}

Python knows that it got two keys as argument, and so CHECK_MAP can determine that there is a duplicate? (i.e. that 2 keys were arguments, but the finished dict only has length 1). But if I write:

    {x: None, x: None}

Python *cannot* tell that there were two keys given (since they are expressions, not constants) and CHECK_MAP can't do anything?

-- Steve

On 04.05.2016 14:15, Steven D'Aprano wrote:
Well, the CHECK_MAP byte code would only check whether the number of assignments you are doing is equal to the number of keys found in the final dict. That's a very coarse test and not perfect. It can also not tell you which key causes the mismatch. On the plus side, the extra check is a simple size check and doesn't have to be applied per added mapping.

-- Marc-Andre Lemburg

On Wed, May 04, 2016 at 02:37:56PM +0200, M.-A. Lemburg wrote:
I agree with you that an error is unjustified but a warning might be.
You have suggested that's too expensive, so I'm happy to rule that out.
Sorry MAL, perhaps I'm a bit slow tonight, but I'm still lost. {x: None, x: None} -- would CHECK_MAP detect that as a duplicate or not? If the answer is "Yes", then this seems to me to be a cheap way of detecting the existence of at least one duplicate (but not which key was duplicate). That strikes me as good enough, in which case I would support raising a warning. -- Steve

On 04.05.2016 16:00, Steven D'Aprano wrote:
Yes:
Since 1 != 2 -> issue warning. We'd just have to tell the CHECK_MAP byte code to check for len(map) == 2. The byte code compiler will be able to determine this and add the correct byte code entry.

-- Marc-Andre Lemburg

After following the long thread, I'm solidly -1, at least on an error. A warning wouldn't be so bad, but would still feel like a nuisance not a feature. Such a warning should belong to linters, not to the language itself. Use those to automatically catch statically detectable probably-bad patterns like duplicate literals as keys of dictionary displays.

The thing that nudged me to -1 is code something like the following, which I've actually written for completely valid use-cases:

    def code_generate(data_source, output=open('data.py', 'w')):
        """Write data to loadable Python module

        data_source may contain multiple occurrences of each key.
        Later occurrences are "better" than earlier ones.
        """
        print("data = {", file=output)
        for key, val in data_source:
            print("    %s: %s," % (key, val), file=output)
        print("}", file=output)

I really don't want the Python interpreter to complain about my perfectly valid 'data.py' module that I auto-generated knowing full well it might have duplicate literal keys. Yes, of course I could write a more complex function that explicitly pruned keys if they were duplicates before writing to 'data.py'. But I do not want to have to think about that when a well defined semantics of Python since version 0.9 does not require that extra work.

On Wed, May 4, 2016 at 4:56 AM, M.-A. Lemburg <mal@egenix.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, May 5, 2016 at 12:36 AM, David Mertz <mertz@gnosis.cx> wrote:
Or, much more simply, you could just dump it straight out like this:

    print("data = dict(%r)" % list(data_source), file=output)

If the dict display starts rejecting duplicates, these kinds of code generators should be easily tweakable. The only risk would be if duplicates can *theoretically* happen, but are incredibly rare, such that the code runs for years without anyone noticing.

This is a way of catching compile-time-detectable errors. Python has generally been pretty helpful with that kind of thing, even to the point of special-casing print-used-as-a-statement to tell you to put parens around it. While I fully understand that there are reasons against doing this (including "that's the job of linters, not the core language"), I do _not_ understand the apparent attitude that it's a fundamentally bad thing to do. Most cases where a dict display is used, it's intended to have unique keys.

ChrisA

On Wed, May 4, 2016 at 10:59 AM Chris Angelico <rosuav@gmail.com> wrote:
You accidentally snipped out Mertz' statement that he didn't want the interpreter to complain about "probably-bad" patterns. This implies the distinction that linters should complain about probably-bad things while the interpreter only complains about always-bad things. I guess you could argue that warnings exist so that the interpreter can complain about probably-bad things as well. If that's what you'd like, we shouldn't call it a "compile-time-detectable error" but instead, uh, ... hard to find the right vocabulary here.

All this discussion is certainly very educational for the participants, but I want to remind everyone that this is not going to happen without a PEP. Personally I'm -0 on the whole business. Some quick notes: - In ABC, this was considered, and the rule was that {k1: v1; k2: v2} was allowed to have k1==k2 as long as v1==v2. Otherwise it was an error. (But ABC did not have comprehensions.) - Franklin? brought up sets and suggested the same thing should happen there. I'm not so sure though. When teaching Python it's nice to be able to demonstrate the fundamental property of sets by showing it:
>>> {1, 1, 1}
{1}
-- --Guido van Rossum (python.org/~guido)

I honestly don't recall and I assume the docs are vague, but "comparing expressions" was not a thing in ABC. I'm sure they compared the values. (This is why I showed it with variables instead of constants.) On Wed, May 4, 2016 at 8:29 AM, Michael Selik <michael.selik@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 5/3/2016 9:09 PM, Steven D'Aprano wrote:
I often use large literal dicts with literal string keys. There are many times (a couple of times a month) when I add a duplicate key, because I am prone to making mistakes like that. Now, I am lucky because I purchased an IDE that highlights those duplicates as errors. If I had to rely on a separate linter, with my duplicate keys hiding in a long list of other trivial formatting mistakes, then the time cost of reviewing the linting output on every code change would be greater than the time cost of simply debugging the program when I see it inevitably malfunction. Instead, I would use data validation in my own code.
I would not use this check; it is error prone, given the number of times I update the dicts during development. Either I will lose count, or the line math is wrong (because of extra spaces), or I would count the duplicate. A better consistency check requires that I construct all literal dicts like:

def add_item(dict_, key, value):
    if key in dict_:
        raise Exception()
    dict_[key] = value

lives_in = {}
add_item(lives_in, 'lion', ['Africa', 'America'])
add_item(lives_in, 'parrot', ['Europe'])
# ... 100+ more rows here
add_item(lives_in, 'lion', ['Europe'])
# ... 100+ more rows here

Which is inelegant, but necessary to minimize the programmer time wasted trying to detect these duplicate keys manually.

On 5/4/2016 11:45 AM, Chris Angelico wrote:
Thinking about examples [1], it turns out I was mistaken. My IDE does NOT have an *integrated* linter that detects duplicate literal keys! My IDE does detect duplicate literal keys for JavaScript, which is obviously what confused me. Oh dear! That means my dicts are full of duplicates! [1] https://github.com/klahnakoski/TestLog-ETL/blob/d7e8a99f7532a0551c98dfdccefe...

With Python 3.5, you can unpack in dictionaries:

d = {**foo, **bar}

It's now the de facto way of merging 2 dicts, and it requires that you can use the same key twice, for obvious reasons. So even if you have good points, it's not possible to do this. On 02/05/2016 23:36, Luigi Semenzato wrote:

I am still not seeing good objections to adding at least a warning.
and similarly:
Probably I didn't make it sufficiently clear in my OP, but this is only about duplicate literal keys in the same constructor, which would cover the vast majority of user errors. A well-known precedent is the divide-by-zero warning in C. The GNU compiler produces a warning when, after constant folding, an expression contains a division by zero. For instance, this:

    int z = 1/0;

produces:

test.c:9:12: warning: division by zero [-Wdiv-by-zero]

and so does this:

    int z = 1/(1 - 1);

but this code doesn't produce a warning:

    const int x = 0;
    int z = 1/x;

For the dict constructor, I don't think that constant folding is even particularly useful, although it wouldn't hurt. Implementation-wise, checking for duplicate literals while parsing is just a matter of maintaining a hash table for that scope. This is already done for identifiers, and probably heavily optimized, so duplicate detection wouldn't add much code or be particularly expensive.
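As a rough illustration of how cheap such a check can be outside the compiler, here is a small sketch using the stdlib `ast` module to flag repeated constant keys in dict displays (the function name and the reporting format are invented for illustration):

```python
import ast

def find_duplicate_literal_keys(source):
    """Report (lineno, key) pairs for repeated constant keys in dict displays."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                if key is None:                # a **unpacking entry, e.g. {**other}
                    continue
                if isinstance(key, ast.Constant):
                    if key.value in seen:
                        problems.append((key.lineno, key.value))
                    seen.add(key.value)
    return problems

src = "lives_in = {'lion': 1, 'parrot': 2, 'lion': 3}"
print(find_duplicate_literal_keys(src))  # [(1, 'lion')]
```

This is essentially what a linter does today; only constant keys are considered, so expressions like `int("1")` pass through unflagged.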

That would introduce a very weird special case. All of these would work:

foo = 1
bar = "1"
{1: True, 2 - 1: True, foo: True, int(bar): True, int("1"): True, **{1: True}}

But not this:

{1: True, 1: True}

It would be very confusing. On 03/05/2016 09:08, Gregory P. Smith wrote:

On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato <luigi@semenzato.com> wrote:
Do you feel that "prefer status quo" is not strong reasoning? http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm... Your word choice of "[no] valid reason" is interesting. In the bug tracker discussion, folks argue that the problem is easily solved with code style standards and static analyzers. I thought that was valid. Rob Cliffe mentioned the fact that repeated keyword arguments cause a syntax error. I see the parallel here, except that keyword arguments must be valid identifiers. Literal dicts syntactically may have any expression as the key. Creating a special case for the parser to enforce uniqueness of number and string literals as keys seems more trouble than it's worth.

On 05/03/2016 12:21 PM, Michael Selik wrote:
It is not strong enough to prevent every good idea, or Python would be a static language (as in, no more changes except maybe bug fixes).
Which seems irrelevant to your argument: a duplicate key is a duplicate key whether it's 123 or 'xyz'.
Creating a special case for the parser to enforce uniqueness of number and string literals as keys
I'm pretty sure the OP would be happy with uniqueness of keys, whether those keys were string literals, numeric literals, or function objects.
seems more trouble than it's worth.
Maybe. That is what we are discussing. -- ~Ethan~

On Tue, May 3, 2016 at 4:00 PM Ethan Furman <ethan@stoneleaf.us> wrote:
Of course. I'm no more saying "stop change" than you are saying "all change is good". Luigi, as with many who have ideas for new features, appears to be trying to shift the burden of proof. I think it's well established that the burden of proof (proving it's a good idea) rests on the one advocating change.
If an expression includes an impure function, then the duplication of assignment to that key may have a desirable side-effect.
How would you handle an expression that evaluates differently for each call? For example: {random(): 0, random(): 1}
seems more trouble than it's worth.
Maybe. That is what we are discussing.
Indeed. Let me flip the original request: I'd like to hear stronger arguments for change, please. I'd be particularly interested in hearing how often Pylint has caught this mistake.

On 05/03/2016 01:43 PM, Michael Selik wrote:
On Tue, May 3, 2016 at 4:00 PM Ethan Furman wrote:
I'm willing to say that should be done with an existing dict, not in a literal.
Easy: Don't Do That. ;)
Well, when this happened to me I spent a loooonnnnnngggg time figuring out what the problem was. One just doesn't expect duplicate keys not to raise:

--> dict(one=1, one='uno')
  File "<stdin>", line 1
SyntaxError: keyword argument repeated

So I guess the current advice is: Don't use dict literals, use dict() instead. And that just seems silly. ;) -- ~Ethan~

On Tue, May 03, 2016 at 02:27:16PM -0700, Ethan Furman wrote:
But are you willing to say that the compiler should enforce that stylistic judgement?
I see your wink, so I presume you're not actually suggesting that the compiler (1) prohibit all function calls in dict displays, or (2) hard-code the function name "random" in a black list. So what are you actually suggesting? Michael is raising a good point here. If you don't like random as an example, how about:

d = {spam(a): 'a', spam(b): 'BB', spam(c): 'Ccc'}

I'm intentionally not giving you the values of a, b or c, or telling you what spam() returns. Now you have the same information available to you as the compiler has at compile time. What do you intend to do? It's one thing to say "duplicate keys should be prohibited", and another to come up with a detailed explanation of what precisely should happen.
Okay, but how often does this happen? Daily? Weekly? Once per career? What's a loooonnnnnngggg time? Twenty minutes? Twenty man-weeks?
Huh, I had completely forgotten that duplicate keyword arguments raise a syntax error. I thought you got a runtime TypeError. There's a difference, though. Keyword argument *names* must be identifiers, not expressions, and duplicates can be recognised by the compiler at compile time:

py> def spam(**kw): pass
...
py> spam(eggs=print("side-effect"))
side-effect
py> spam(eggs=print("side-effect"), eggs=print('another side-effect'))
  File "<stdin>", line 1
SyntaxError: keyword argument repeated

Duplicate keys are not, and in general can't be:

py> d = {'eggs': print("side-effect"),
...      'eggs': print('another side-effect')}
side-effect
another side-effect
py> d
{'eggs': None}

If you think of the dict constructor as something like:

for key, item in initial_values:
    self[key] = item

then the current behaviour is perfectly valid, and the advice is: if you don't want duplicate keys, don't use them in the first place. If you think of it as:

for key, item in initial_values:
    if key in self:
        raise TypeError('duplicate key')
    else:
        self[key] = item

then you have to deal with the fact that you might only notice a duplicate after mutating your dict, which may include side-effects. -- Steve
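The second semantics can be played with today without touching the compiler; a minimal sketch of such a strict constructor (the name and the choice of TypeError are illustrative, not a proposal):

```python
def strict_dict(pairs):
    """Build a dict from (key, value) pairs, rejecting duplicate keys.

    A sketch of the second semantics described above.
    """
    d = {}
    for key, value in pairs:
        if key in d:
            raise TypeError('duplicate key: %r' % (key,))
        d[key] = value
    return d

print(strict_dict([('lion', 1), ('parrot', 2)]))  # {'lion': 1, 'parrot': 2}
```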

On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
Indeed.
Since the dict created by that dict display happens at run time, I am suggesting that during the process of creating that dict, any keys, however generated or retrieved, that are duplicates of keys already in the dict cause an appropriate error (to be determined). Also, any dict display that is able to be used for dict creation at compile time (thereby avoiding the run-time logic) should have the same checks and raise the same error if duplicate keys are found (I imagine both run-time and compile-time dict creation from dict displays would use the same underlying function).
It's one thing to say "duplicate keys should be prohibited", and another to come up with a detailed explanation of what precisely should happen.
Hopefully my explanation is detail enough.
Once a year, maybe. The experience is painful enough to remember when the subject arises, but not painful enough to remember to be careful. ;)
What's a loooonnnnnngggg time? Twenty minutes? Twenty man-weeks?
Longer than an hour, shorter than a day.
I am largely unconcerned by the run-time/compile-time distinction in this case -- whenever it happens, check that the incoming key doesn't already exist.
Not sure what you mean by "mutating your dict" -- we're still talking about initial "dict display" to "dict" conversion, yes? -- ~Ethan~

On Tue, May 3, 2016 at 6:40 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
I agree with Ethan that this is the correct way of looking at this problem, rather than my original, confusing presentation. Duplicate keys in a dict display should be an error, period. In some cases the error can be detected at parse time, which is nicer. In other cases it can be detected only at run time, but it should always be an error. Also, Steven D'Aprano <steve@pearwood.info> wrote:
Thank you, I will gladly take the "reasonable answer" qualification offered here. And I agree that often these issues boil down to aesthetics. But it's hard for me to see how the current semantics support the Zen of Python. I wouldn't presume that all original design decisions were infallible, and I still haven't heard a single good argument in favor of *allowing* duplicate keys.

On Tue, May 03, 2016 at 06:58:43PM -0700, Luigi Semenzato wrote:
I still haven't heard a single good argument in favor of *allowing* duplicate keys.
Because duplicate keys are not necessarily an error. Maybe you don't care if there are duplicates. Maybe you expect duplicates, and want the last one seen to win. Sometimes experienced programmers forget what it is like to be a beginning programmer. One of the most useful things a beginner can do is experiment at the interactive interpreter. The more things the language prohibits, the more errors the beginner gets, the less they learn, and the more intimidating the language is. Duplicate keys are not necessarily an error. It just means one value or the other is dropped. How do dict displays construct the dict? Do they add keys from left to right, or right to left? I can read the source, if I can find it, and locate the right section of the right file, and if I can read C. Or I can do:

py> {1: "left", 1: "right"}
{1: 'right'}

and in one line of code in the Python interactive interpreter I've learned more than in probably an hour of digging through C code. To a beginner, being able to do things like this without Big Brother shouting at them "No! That's Wrong! Have An Exception!!!" all the time is literally without price. Who cares that it's a useless thing to do in production code? "Useless" is not an error. Errors, and warnings, should be left for things which really are errors. Even if you think my example doesn't matter, and you think that duplicate keys are always an error, it may be that this case is not bad enough to prohibit. You don't get this check for free. It requires code, and that code has to run, and it will run for every single dict display whether needed or not. Depending on how it is implemented, it might be quite expensive. All those programmers who have carefully inspected their source code to ensure that the data used in the dict is correct, with no duplicates and no misspellings and no missed keys, they have no need of this check.
But you would force it on them anyway, because one of your colleagues was sloppy and made a mistake and had difficulty debugging it. Okay, I have sympathy for you and your colleague, but that doesn't mean it's okay for everyone to carry the cost of checking for your error. Maybe it is justified. Maybe it isn't. If you are really serious that you think duplicate keys are an error, then to be consistent you should also want to raise an error in the dict constructor, dict(obj, **kwargs). If not, what is your reasoning for why duplicate keys should be allowed in dict(...)? -- Steve

Steven D'Aprano <steve@pearwood.info> writes:
The language reference explicitly says that the order from left to right and duplicate keys are also explicitly supported [1]: If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given. [1] https://docs.python.org/3/reference/expressions.html#dictionary-displays Akira

On 4 May 2016 at 11:40, Ethan Furman <ethan@stoneleaf.us> wrote:
I was curious as to whether or not it was technically feasible to implement this "duplicate keys" check solely for dict displays in CPython without impacting other dict use cases, and it turns out it should be. The key point is that BUILD_MAP already has its own PyDict_SetItem() loop in ceval.c (iterating over stack entries), and hence doesn't rely on the "dict_common_update" code shared by dict.update and the dict constructor(s). As such, the technical change being requested here (at least for CPython) is that BUILD_MAP be updated to do a PyDict_Contains() check prior to each key insertion, and: 1. Report a DeprecationWarning in 3.6 2. Report a [RuntimeError? ValueError?] in 3.7+ For code generators emitting dict displays, running their content through OrderedDict (or an equivalent if the code generator is written in something other than Python) prior to writing it out will be sufficient to eliminate duplicates. Since PyDict_Contains() is O(1), this shouldn't introduce a major slowdown in dict display evaluation (although the impact should still be measured prior to making any changes). All-in-all, the concept gets a +0 from me, although to allow for implementations that don't share CPython's distinction between dict displays and the normal dict constructor, any such behaviour (if introduced) should likely be flagged as an implementation detail that may vary across interpreters (as I would have been -1 if BUILD_MAP's implementation wasn't already independent of dict_common_update). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
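The OrderedDict pass for code generators might look like this sketch (the helper name is invented; "last occurrence wins" is assumed, matching current dict display semantics):

```python
from collections import OrderedDict

def emit_dict_display(pairs, name='data'):
    """Emit source text for a dict display with duplicate keys collapsed.

    A hypothetical code-generator helper: OrderedDict keeps first-seen
    key order while letting later values replace earlier ones.
    """
    merged = OrderedDict(pairs)        # later pairs replace earlier values
    lines = ['%s = {' % name]
    for key, val in merged.items():
        lines.append('    %r: %r,' % (key, val))
    lines.append('}')
    return '\n'.join(lines)

print(emit_dict_display([(1, 'a'), (2, 'b'), (1, 'c')]))
```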

On Wed, May 4, 2016 at 12:04 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
To be clear, this implementation would potentially raise an error in the random example -- ``{random(): 0, random(): 1}`` -- correct? If so, then code smell or not, I think that's too much breakage. We could restrict the PyDict_Contains() call to only enforcing uniqueness of string or numeric literals, but then we'd still have some oddball situations:

def a():
    return 'a'

{a(): 0, 'a': 1}

On 05/03/2016 09:20 PM, Michael Selik wrote:
Either fix it all the way or don't bother. The rule "duplicates are allowed" or "duplicates are not allowed" is much simpler and easier to remember than "duplicates are allowed except when..." or, contrariwise, "duplicates are not allowed unless ...". -- ~Ethan~

On 04/05/2016 07:35, Ethan Furman wrote:
I disagree, on the grounds that 'practicality beats purity' (sorry to repeat myself). Here's a thought. We are rightly concerned with changes breaking existing code. The proposed change could lead to some existing code being *repaired*, by pointing out a mistake that the author doesn't know is there. Rob Cliffe

On 4 May 2016 at 10:05, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
The problem is that it could also break code that works at the moment. Consider a mapping of constants to names:

colors = {
    RED: "Red",
    BLUE: "Blue",
    PINK: "Pink",
    LIGHT_BLUE: "Light Blue",
    DARK_BLUE: "Dark Blue",
}

On a certain output device, there may be no "light blue", and in that case the code sets LIGHT_BLUE = BLUE. That's entirely reasonable, and the author may be perfectly happy with the current behavior in that case. So I think it's important for backward compatibility that this change be *only* for the case of literal values as keys (which is what the OP proposed, of course). And given that it needs to just be for literal values as keys, I personally think that makes it a case of "Special cases aren't special enough to break the rules", so I'm -0 on the proposal (I don't think it's a *bad* thing, in the constants-only form, just that it's not worth the effort - but if someone else makes the effort, I won't object). Paul

On 05/04/2016 02:48 AM, Paul Moore wrote:
Actually, that code is setting BLUE = "Light Blue". ;)
I object. Having only /some/ duplicates checked for is overcomplicating without very good reason. Perhaps the way forward is to check for all duplicates, but issue a Warning instead of raising an Exception? To be honest, though, I'm starting to agree with Steven D'Aprano that sufficiently large dicts should be verified by other means, and those other means can also check for duplicates. -- ~Ethan~

On 4 May 2016 at 13:42, Ethan Furman <ethan@stoneleaf.us> wrote:
You missed my point, I think. BLUE and LIGHT_BLUE are "constant" values (set once at the top of the program). In some circumstances, the programmer deliberately wants BLUE and LIGHT_BLUE to be the same value (but not in others). The programmer is happy that a constant BLUE will be decoded as "Light Blue" by the colors mapping in that case. This is IMO perfectly valid code, and having it rejected by an over-zealous "duplicate check" is unacceptable. As far as I am concerned, this example clearly demonstrates that any duplicate check should only be applied to actual constant literal values used as keys.
Warning on perfectly good code is just as obnoxious IMO.
As far as I'm aware, the people who have looked at the implementation details seem to be saying that checking for duplicates on literals only is too hard to be practical, and that's the only variation I'd be willing to accept, so I agree - do duplicate checking by other means (assertions in the code being the obvious option). Paul

On 05/04/2016 07:54 AM, Paul Moore wrote:
On 4 May 2016 at 13:42, Ethan Furman wrote:
On 05/04/2016 02:48 AM, Paul Moore wrote:
No, I got your point, and I agree. But one of us is confused about your explanation of that example (it could easily be me). Here's how I think it works:

RED = 1
BLUE = 2
PINK = 3
LIGHT_BLUE = 4
DARK_BLUE = 5
# oops, light blue not supported on this machine, just use blue
LIGHT_BLUE = 2
--> colors
{1: 'Red', 2: 'Light Blue', 3: 'Pink', 5: 'Dark Blue'}
So my issue was not with your point, but with your explanation of the example of your point. -- ~Ethan~

On 4 May 2016 at 17:06, Ethan Furman <ethan@stoneleaf.us> wrote:
Ah, sorry. I hadn't checked and wasn't sure myself whether the first pairing or the last took priority. I tried to be vague enough that it didn't matter that I wasn't sure (my theoretical programmer who relied on this behaviour clearly *would* be sure :-)) but I obviously failed. The moral of the story is to check my examples before posting :-) Paul

2016-05-04 6:03 GMT+02:00, Nick Coghlan <ncoghlan@gmail.com>:
[...]
1. Report a DeprecationWarning in 3.6 2. Report a [RuntimeError? ValueError?] in 3.7+
But what about SyntaxError instead of RuntimeError? Could it be easily done too? PS. Probably nobody is interested in my tests with unicode literals which are "NFKC equivalent", but they helped me understand how things are done behind the scenes.

def f(**kwd):
    for i in kwd:
        print(i, kwd[i])

def tst():
    f(ij=1, ĳ=2)  # this is a syntax error (keyword argument repeated)

def tst2():
    f(**{1: 1})  # this is a runtime error

def tst3():
    f(**{'ij': 1, 'ĳ': 2})  # this is legal (for me a little unexpected, maybe a bug?)

On Wed, May 04, 2016 at 02:03:59PM +1000, Nick Coghlan wrote:
What do you think of MAL's suggestion of adding a CHECK_MAP op-code and raising a warning? I'd be much more comfortable with a warning than an error, because I really don't think that duplicate keys is *necessarily* an error. I think Paul Moore is onto something with his example of BLUE/LIGHT_BLUE, even if he got the order the wrong way around :-) This has been the documented behaviour of dicts since at least 1.5: https://docs.python.org/release/1.5/ref/ref-7.html#HEADING7-35 and it matches the behaviour of both the dict constructor and dict comprehensions. I don't think anyone has made the case for turning it into an error, although I concede that it's perhaps worthy of a warning. (At least, I won't object to it being a warning, provided we can turn it off :-) -- Steve

On Wed, May 04, 2016 at 09:09:22AM -0700, Ethan Furman wrote:
On 05/04/2016 07:20 AM, Steven D'Aprano wrote:
I was thinking of dict with a list of tuples as argument:

py> dict([(1, 'a'), (2, 'b'), (1, 'c')])
{1: 'c', 2: 'b'}

or an object with a keys method:

py> class K:
...     def keys(self):
...         return [1, 2, 1]
...     def __getitem__(self, i):
...         return 'abcd'[i]
...
py> dict(K())
{1: 'b', 2: 'c'}

-- Steve

It's all about typing and reading. Suppose Python didn't have a special syntax for dict displays. Then folks would likely write this:

dict((
    (1, 11),
    (2, 22),
    ...
))

but then suppose someone says: "hey, folks do this a lot, let's make it more convenient" and introduces a curly-brace notation, and suppose it allows overwriting, like it does now. The new notation exists *exclusively* to make it easier to type and read dictionaries in the code---by a human. Except that with these semantics it becomes impossible to detect duplicate key errors, which the vast majority of users would prefer to notice (a bold and unproven statement, but a reasonable guess). What's worse, people don't realize the problem until they have many large dict displays, and many bugs. So they feel annoyed and betrayed. Suppose instead that the new notation does NOT allow overwriting. If folks rely on the overwrite semantics, in either human-written or machine-generated code, they will immediately notice the problem and use dict(), with very little inconvenience to code generators. (And some to the humans, but remember, they are a tiny minority :) The strict semantics are maybe marginally harder to explain, and maybe marginally harder to implement, but neither should be a show-stopper. However, that's not where we are. There is the legacy code issue, and the legacy mindset. I don't know how to evaluate these issues, and I don't even know how much experience the community has with backward-incompatible changes. So, while I am fairly positive that it would have been a better choice to start with, I have no idea whether, when, and how this proposal should have any effect. P.S. The issue of detecting duplicate keys at parse time is completely separate, and it was my mistake to focus on it first. P.P.S. The issue of whether a set display should allow duplicate elements is less important, since allowing them does not have destructive consequences.

On Wed, May 4, 2016, 1:37 PM Luigi Semenzato <luigi@semenzato.com> wrote:
I'm part of the minority(?) that doesn't care. I hope that helps adjust your odds ratio for that guess.
Even a tiny minority of the Python community might be thousands of people.
Well, there's that transition from 2 to 3. You do that a few times and you get a reputation for instability. I like to imagine a hidden line in the Zen of Python. Between the title and "Beautiful is better..." I mentally insert, "Backwards compatibility is important." How important? Well, that's why it's Zen and not Rules.

On May 4, 2016 10:59 AM, "Michael Selik" <michael.selik@gmail.com> wrote:
I hope that helps adjust your odds ratio for that guess.

I'm also part of that group (I question whether it is a minority) who wishes to allow duplicate keys at the syntax level, but to let linters complain. While *most* duplicate literal keys in dict displays probably are bugs, the legitimate and intended uses are not rare enough to prohibit them altogether.

On 5/4/2016 12:03 AM, Nick Coghlan wrote:
Changing only BUILD_MAP would invalidate current code equivalences and currently sensible optimizations. A toy example:
Sensible and comprehensible code transformation rules are important. Currently, the following rather trivial optimization of the above works.
The special rule for dict displays would invalidate this. In my post yesterday in response to Luigi (where I began 'The thread...'), I gave four other equivalent ways to initialize a dict using a Python loop (including a dict comprehension). Using a dict display amounts to unrolling any of those loops and replacing the Python loop with the C loop buried in ceval. Changing the operation of just that loop would break the current equivalence. I suspect that the proposed change would introduce more bugs than it exposes. -- Terry Jan Reedy

On Wed, May 4, 2016 at 10:40 PM Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Certainly was. But it occurs relatively rarely and can be solved with linters and by stylistic changes like alphabetizing dict literals by key. Can anyone produce a single real-life use case where repeated literal
I thought Paul Moore gave a good example. You can see him and Ethan Furman discussing it in a fork of this thread. There were a few code-generator examples. Were these not realistic enough?

On Thu, May 05, 2016 at 03:38:05AM +0100, Rob Cliffe wrote:
Here I'm throwing down the gauntlet
You can throw down whatever you like, but Guido has pronounced that this change is not going to happen without a PEP. So unless somebody is willing to write a PEP, further discussion is a waste of time. -- Steve

On 5 May 2016 at 03:38, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Not literal keys no. If the proposal is *solely* that duplicate literals are detected and objected to (I reserve judgement for now on the question of warning vs error) then I don't think there will be significant code breakage. But even knowing that, the next steps are: 1. Produce a PEP (Guido has said this is a requirement) 2. Explain how this would be implemented in that PEP - because the only implementation techniques so far mentioned in this thread fail to restrict the check to literals. 3. Manage discussion on said PEP. I'll await a PEP before contributing any further to the discussion. Paul

Late, but here are a few more notes:

1. The gnumpy package (for Py2) used expressions in a `set` constructor which could evaluate to duplicates. It isn't hard to imagine a similar trick with `dict`, though I don't know if it'd be as useful. From gnumpy.py:

```
_floatTypes = set((types.FloatType, numpy.float64, numpy.float32,
                   getattr(numpy, 'float128', numpy.float64),
                   getattr(numpy, 'float96', numpy.float64)))
```

Refactored a bit:

```
_floatTypes = {float, numpy.float64, numpy.float32,
               getattr(numpy, 'float128', float),
               getattr(numpy, 'float96', float)}
```

2. Even if the warning/error is compiler-only, `eval` and `exec` would raise the same warning/error, and they are NOT always evaluating dev-written source code. (Though you can argue that `exec` and `eval` on strings are where you should be on high alert with regard to data validation.) On the other hand, a linter is only run on manually-written code.

3. A compile-time check doesn't necessarily give a false sense of security. It just needs the proper wording to indicate that it's a compile-time check. "Warning: Duplicate (constant|literal|string literal) keys detected."
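The gnumpy trick relies on `getattr` defaults collapsing onto an element that is already in the set; a self-contained analogue without numpy (the attribute names here are deliberately made up, so the lookups fall back to the default):

```python
import math

# getattr() with a default collapses missing attributes onto an existing
# element, so the set's size depends on what the module actually provides.
float_types = {float,
               getattr(math, 'no_such_type', float),   # hypothetical missing name
               getattr(math, 'also_missing', float)}   # hypothetical missing name
print(float_types)  # {<class 'float'>}
```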

On Tue, May 3, 2016, at 20:23, Steven D'Aprano wrote:
How about prohibiting it anyway (on the grounds of the two key expressions being identical - no comment on spam(a) vs spam(b))? And if the user *really* wants it, they'll do {random(): x for x in (0, 1)} [or {other: stuff, **{that}}]

On Tue, May 3, 2016 at 1:43 PM, Michael Selik <michael.selik@gmail.com> wrote:
Correct. In this particular case I think that the change is better. It is simple, helpful, and doesn't seem to have huge implications (if any) in terms of documentation.
Correct.
Which gives me one more opportunity to present this case: A dictionary literal is a convenient input format for structured data, and is used as such in many cases, for instance instead of equivalent JSON, which requires an additional module, an additional parsing step, and learning an additional syntax (a lot of Python programmers don't know JSON). The convenience, however, is hampered by the duplicate key issue. Some of the strong points of Python are: 1. easy to learn; 2. does the right thing in most cases. It just seems to me that the right thing here would be to signal duplicate literal keys.
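For that data-entry use case, the stdlib already offers a safe way to parse a dict literal from text, though it inherits the same silent last-wins behaviour for duplicate keys (a sketch using `ast.literal_eval`; the sample data echoes the lives_in example):

```python
import ast

# A dict literal as a data-entry format, parsed safely from text --
# no JSON needed, but duplicate keys are still silently collapsed.
text = "{'lion': ['Africa'], 'parrot': ['Europe'], 'lion': ['Europe']}"
data = ast.literal_eval(text)
print(data)  # {'lion': ['Europe'], 'parrot': ['Europe']}
```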
Yes, this is absolutely true, but probably very rare, and arguably not a recommendable coding practice, therefore IMO it offers very little support for the status quo.
We discussed this already. My request only applies to keys that are literals. (He's talking about function objects, not function calls.)
Pylint catches this mistake fine. But many Python programmers don't know about pylint either. For instance, a bunch of hardware engineers that wrote some python as a driver for a circuit simulator. Anyway, thank you for making these points.

2016-05-03 21:21 GMT+02:00, Michael Selik <michael.selik@gmail.com>: [...]
Just out of curiosity: keys can be valid identifiers, can be different (at least from one point of view), and still cause a SyntaxError as "keyword argument repeated":

{'ij': 1, 'ĳ': 2}   # this is a dict with 2 keys and 2 different values
dict(ij=1, ĳ=2)     # SyntaxError: keyword argument repeated

This is because the parser does NFKC normalization on non-ASCII identifiers - see https://www.python.org/dev/peps/pep-3131/ ('\u0069\u006a' == 'ij' != 'ĳ' == '\u0133')
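The normalization behaviour can be checked directly. A sketch: the `compile` call below is expected to fail on CPython because the ligature identifier NFKC-normalizes to `ij`, making the keyword argument repeat.

```python
import unicodedata

ligature = '\u0133'  # the 'ij' ligature
assert unicodedata.normalize('NFKC', ligature) == 'ij'

# As string keys: no normalization, so two distinct entries.
d = {'ij': 1, ligature: 2}
assert len(d) == 2

# As identifiers: the parser normalizes, so the keyword repeats and
# compilation fails with SyntaxError.
try:
    compile("dict(ij=1, \u0133=2)", "<example>", "eval")
    repeated = False
except SyntaxError:
    repeated = True
assert repeated
```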

On May 3, 2016 3:21 PM, "Michael Selik" <michael.selik@gmail.com> wrote:
On Mon, May 2, 2016 at 5:36 PM Luigi Semenzato <luigi@semenzato.com>
wrote:
http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm...
Your word choice of "[no] valid reason" is interesting. In the bug tracker discussion, folks argue that the problem is easily solved with code style standards and static analyzers. I thought that was valid. "X is detectable", whether by tools, or by magic imps yelling at you when you do it, is not an argument for X. When asking for a useful purpose for the current semantics, "prefer the status quo" is not strong enough. "Errors should never pass silently," and there is an argument on the table that duplicate spelled-out keys are probably written in error. He isn't shifting the burden, he's putting the ball in your court.

Literal dicts syntactically may have any expression as the key. Creating a special case for the parser to enforce uniqueness of number and string literals as keys seems more trouble than it's worth. That is an argument. Literal keys, or a single name repeated in the key list, might be an error, but you can't say the same about expressions. If you only error on literal keys, then there is an inconsistency between literals and expressions.

By the way, unless [[the name is guaranteed not to be rebound in a closure called within the dict display]], multiple items with the same variable name as key isn't necessarily an error:

def f():
    x = 1
    def g():
        nonlocal x
        x += 1
        return x
    return {x: g(), x: g(), x: g()}

print(f())  # {2: 2, 3: 3, 4: 4}

"Don't do that" doesn't say what the compiler behavior should be: should it analyze for nonlocal closures, or should it only check literals?

Hi Luigi, On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote: [...]
As far as I can see, you haven't specified in *detail* what change you wish to propose. It is difficult for me to judge your proposal when I don't know precisely what you are proposing. Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?

What counts as "duplicate keys"? I presume that you mean that two keys count as duplicate if they hash to the same value, and are equal. But you keep mentioning "literals" -- does this mean you care more about whether they look the same rather than are the same?

# duplicate literals, forbidden
d = {100: 1, 100: 2}

# duplicate non-literals, allowed
d = {100: 1, len("ab")*50: 2}

You keep mentioning "dictionary literal", but there actually is no such thing in Python. I think you mean a dict display. (Don't worry, I make the same mistake.) But the point is, a "dict literal" (display) can contain keys which are not themselves literals, as above. Those keys can have arbitrarily complex semantics, including side-effects. What do you expect to happen?

-- Steve
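At run time the two cases Steven contrasts are indistinguishable, since the dict machinery only ever sees the evaluated key objects:

```python
# Duplicates overwrite whether or not they were spelled as literals;
# only a compile-time check could tell the two forms apart.
d1 = {100: 1, 100: 2}             # duplicate literals
d2 = {100: 1, len("ab") * 50: 2}  # same key, computed at run time

assert d1 == d2 == {100: 2}
```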

Hi, On Tue, May 3, 2016 at 4:51 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Yes, and I apologize, because on further reflection I don't even know what is possible within the model of the Python interpreter.
Should duplicate keys be a SyntaxError at compile time, or a TypeError at runtime? Or something else?
Is there such a thing as a SyntaxWarning? From my perspective it would be fine to make it a SyntaxError, but I am not sure it would be overall a good choice for legacy code (i.e. as an error it might break old code, and I don't know how many other things a new language specification is going to break). It could also be a run-time error, but it might be nicer to detect it earlier. Maybe both.
Correct. The errors that I am guessing matter the most are those for which folks copy-paste a key-value pair, where the key is a literal string, intending to change the key, and then forget to change it.
# duplicate literals, forbidden d = {100: 1, 100: 2}
Correct. Possibly caught during parsing.
# duplicate non-literals, allowed d = {100: 1, len("ab")*50: 2}
For sure allowed during parsing, possibly caught at run-time.
You keep mentioning "dictionary literal", but there actually is no such thing in Python. I think you mean a dict display.
Yes sorry, bad use of the term. I meant the curly-brace constructor---and only that.
I'd mainly like to catch the "obvious" duplicates, just like the GNU C compiler catches the "obvious" divisions by zero. And by that I mean occurrences of duplicate string literals as keys. For instance, at parse time:

{"foo": 333, "foo": 444}                     # forbidden---must catch
{"foo": 333, "f" + "oo": 444}                # OK to miss but also OK to catch
{"foo": 333, function_returning_foo(): 444}  # not caught (i.e. no miracles expected)

At run time, I still slightly suspect that it may be more useful to prohibit duplicate keys in the same constructor than it is to allow them, but I don't have a clear opinion, because of both legacy code and other possible semantic issues. Thanks!
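For what it's worth, the parse-time subset Luigi describes can be prototyped today as a linter-style pass over the AST. A sketch, assuming Python 3.8+ where literals parse as `ast.Constant`; `duplicate_literal_keys` is a hypothetical helper name, not an existing API:

```python
import ast

def duplicate_literal_keys(source):
    """Report (lineno, value) for repeated constant keys in dict displays."""
    dupes = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                # Skip **mapping entries (key is None) and computed keys.
                if key is None or not isinstance(key, ast.Constant):
                    continue
                if key.value in seen:
                    dupes.append((key.lineno, key.value))
                seen.add(key.value)
    return dupes

# The "must catch" case is caught; the computed-key cases are missed,
# exactly as proposed:
assert duplicate_literal_keys("{'foo': 333, 'foo': 444}") == [(1, 'foo')]
assert duplicate_literal_keys("{'foo': 333, 'f' + 'oo': 444}") == []
assert duplicate_literal_keys("{'foo': 333, function_returning_foo(): 444}") == []
```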

On 5/3/2016 8:46 PM, Luigi Semenzato wrote:
On Tue, May 3, 2016 at 4:51 PM, Steven D'Aprano <steve@pearwood.info> wrote:
The thread seems to partly be based on a confusion between literals and displays, which led to dict displays being called dict literals. A literal defines an int or string at the initial lexing or tokenizing phase of the parser. A display, like comprehensions, abbreviates runtime code for creating a composite collection object.
The above is equivalent to, and an abbreviation of,

d = {}; d[100] = 1; d[100] = 2

A bit silly, but quite legal. Every value replacement necessarily involves a 'duplicate key'. Both of the above are also equivalent to

d = {a: b for a, b in ((100, 1), (100, 2))}

This is also equivalent to

d = {}
for a, b in ((100, 1), (100, 2)):
    d[a] = b

which is close to the Python version of how the comprehension would be implemented in Python. Here is a 5th version, which I expect is close to the actual CPython code for a dict display.

tem = [100, 1, 100, 2]  # as in the top of the virtual machine stack
n = 4  # number of items in display = 2 * number of pairs
d = {}
for i in range(-n, 0, 2):
    d[tem[i]] = tem[i+1]

We have 5 equivalent snippets, each of which could have the same bug. Why single out just one to raise an exception? My guess is that it is partly from mistakenly thinking of the one as like a lexer literal rather than as runtime code.

-- Terry Jan Reedy
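Terry's five spellings can be checked directly; all of them produce the same final mapping, with the later assignment to key 100 winning:

```python
# 1. dict display with a duplicate key
d1 = {100: 1, 100: 2}

# 2. explicit item assignment
d2 = {}
d2[100] = 1
d2[100] = 2

# 3. dict comprehension
d3 = {a: b for a, b in ((100, 1), (100, 2))}

# 4. explicit loop
d4 = {}
for a, b in ((100, 1), (100, 2)):
    d4[a] = b

# 5. flat item list, as on the virtual machine stack
tem = [100, 1, 100, 2]
n = 4  # number of items in display = 2 * number of pairs
d5 = {}
for i in range(-n, 0, 2):
    d5[tem[i]] = tem[i + 1]

assert d1 == d2 == d3 == d4 == d5 == {100: 2}
```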

On Wed, May 4, 2016 at 1:23 PM, Terry Reedy <tjreedy@udel.edu> wrote:
But the lexer's definition of "literal" is extremely narrow, and doesn't reflect the way people think about them. Most people's idea of a literal is closer to what ast.literal_eval can accept: Help on function literal_eval in module ast: literal_eval(node_or_string) Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
>>> ast.literal_eval("[1,-2,(3+4j)]")
[1, -2, (3+4j)]
Neither -2 nor (3+4j) is technically a literal, and certainly list display isn't a literal, but "literal_eval" is happy to parse them. And that's how people think about them. Duplicate keys in dict display is more similar to duplicate keyword arguments in a function call than to reassignment. Can you show any good code that reuses a key in a single dict display? Wouldn't it at best be code smell? That doesn't necessarily mean that the compiler should stop you from doing it; but I do think that there's a fundamental difference in expectation here. And if the check happens, it wants to be a compile-time check, not a run-time one. ChrisA
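The point about `ast.literal_eval` accepting non-literals while still rejecting arbitrary expressions is easy to verify:

```python
import ast

# Unary minus and complex arithmetic are not lexer literals, and a list
# display certainly isn't, yet literal_eval parses all of them.
value = ast.literal_eval("[1, -2, (3+4j)]")
assert value == [1, -2, 3 + 4j]

# But arbitrary expressions (here a function call) are still rejected:
try:
    ast.literal_eval("[1, len('ab')]")
    rejected = False
except ValueError:
    rejected = True
assert rejected
```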

On Tue, May 3, 2016, at 23:31, Chris Angelico wrote:
To play devil's advocate for a minute here... Why *don't* we allow duplicate keyword arguments in a function call, anyway? CALL_FUNCTION doesn't seem to actually care, if I create this situation by monkey-patching the calling function's consts to get duplicate keywords. All the arguments that have been raised here for *allowing* duplicate keyword arguments seem to apply just as well to f(**{'a': 1}, **{'a': 2}) [where 'a' may not be the only key present and they may not be literal^Wdisplays], yet that is prohibited. Why not remove that check, too? If we can blithely pass the same constant key twice to BUILD_MAP, why *shouldn't* we be able to do the same to CALL_FUNCTION? Any argument against the latter applies equally to the former, as far as I can see.

On Wed, May 4, 2016 at 12:19 AM Random832 <random832@fastmail.com> wrote:
Because keyword arguments can only be valid identifiers, rather than expressions. There's absolutely no reason to use the same identifier as keyword twice, but there could be some oddball reasons to use the same expression as dict key twice. [I know I've said that before, but it's a long thread. My apologies if you'd already read that.]

On Wed, May 4, 2016 at 12:40 AM, Michael Selik <michael.selik@gmail.com> wrote:
The uniqueness rule applies to splatted args, too.

```
def printn(*args, **kwargs):
    print(*args, **kwargs, sep='\n')

printn('hello', 'world', sep='\n')
# => TypeError: print() got multiple values for keyword argument 'sep'
```

Not a very useful example, but if you want to override or ignore incoming kwargs...

For semantics, I see a few possibilities (either error or warning):

0. Detect duplicate literal keys at compile-time.
1. Detect duplicate literals and hereditarily-literal displays and expressions at compile-time. (This includes keys which are tuple displays containing only hereditarily-literal elements, expressions of literals, and, er, implicit string concatenation like `'hello' " " 'world'`. As keys, these things are guaranteed to be immutable.)
2. Also statically analyze for duplicate variables as keys. (I think if it's a local variable that isn't captured by a closure using `nonlocal`, it is safe to assume that it's constant for that dict display.)
3. Constant-folding. (This leads to the dark path of static analysis shipping with the language.)
float('inf'). Analyze at runtime.
1j. Declare that duplicate keys in a dict display causes underdefined behavior, and let interpreters decide how to handle it.
-1. Identical expressions are warned against as duplicate keys.

(By the way, anything that happens for dicts should also happen for sets.)
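The asymmetry under discussion is easy to demonstrate in one place: repeating a keyword identifier fails at compile time, colliding `**` unpackings fail at call time, and the equivalent dict display is silently accepted. A small sketch:

```python
def f(**kwargs):
    return kwargs

# Colliding ** unpackings: rejected at call time with TypeError.
try:
    f(**{'a': 1}, **{'a': 2})
    collided = False
except TypeError:
    collided = True
assert collided

# Repeated keyword identifier: rejected at compile time with SyntaxError.
try:
    compile("f(a=1, a=2)", "<example>", "eval")
    repeated = False
except SyntaxError:
    repeated = True
assert repeated

# The same collision in a dict display is silently accepted:
assert {**{'a': 1}, **{'a': 2}} == {'a': 2}
```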

On Wed, May 4, 2016 at 2:32 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
As of Python 3.5.1, it is still not allowed: C:\Users\Jonathan>python Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information.

On 05/03/2016 11:39 PM, Jonathan Goble wrote:
On Wed, May 4, 2016 at 2:32 AM, Ethan Furman wrote:
On 05/03/2016 09:18 PM, Random832 wrote:
Ah, right. The PEP was changed: http://bugs.python.org/issue2292#msg234413 -- ~Ethan~

On Wed, May 04, 2016 at 12:18:59AM -0400, Random832 wrote:
(1) Because status quo wins. In the absence of a really compelling reason to remove that check, we don't have to justify prohibiting duplicate keyword args; we just have to note that it is already in place and so we shouldn't change it.

(2) Duplicate keys in a dict may have side-effects, which means they do something. Duplicate keyword arguments don't have side-effects (not in the keys at least). So that's a difference.

(3) The analogy between function calls and dicts is not really that strong. The dict constructor intentionally allows certain duplicates:

py> dict({'a': None}, a=2)
{'a': 2}

for good reason (think of over-riding the values provided in the input dict with keyword arguments), but it's not clear that there is an analogy for function calls. Perhaps there is, and if it is a good enough analogy, we might have to re-think the duplicate kwargs situation.

-- Steve
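Steven's constructor example runs as shown, and the same "keyword arguments override the positional mapping" rule also drives `dict.update`:

```python
# Keyword arguments deliberately override entries in the positional mapping:
assert dict({'a': None}, a=2) == {'a': 2}

# update() relies on the same "last one wins" rule:
d = {'a': None}
d.update(a=2)
assert d == {'a': 2}
```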

On Wed, May 4, 2016, at 08:22, Steven D'Aprano wrote:
f(a=1, b=2, **{foo(): bar()}) where foo may return 'a' or 'b', though this is the first that I've heard that the side-effect argument only applies to the keys.
but not {**{'a': None}, **{'a': 2}}
foo(**{'a':None}, a=2) seems like a perfect analogy to me.

On 5/3/2016 11:31 PM, Chris Angelico wrote:
This was a preface to the main points that followed, and which were snipped from this followup.
I am obviously aware of that. My main point, not addressed by this response, is that 1. dict displays are currently equivalent to several forms of Python code with more explicit looping, including at least two that are commonly used to initialize dicts; and 2. that changing the semantics of dict displays and only dict displays will break this equivalence. In my response just now to Nick, I was more explicit that breaking this equivalence now, after 20+ years, will break code that depends on the current equivalence and break people's currently true expectations. -- Terry Jan Reedy

On Tue, May 03, 2016 at 05:46:33PM -0700, Luigi Semenzato wrote: [...]
Yes there is.
There are serious limits to what the compiler can detect at compile-time. So unless you have your heart set on a compile-time SyntaxError (or Warning), it might be best to forget all about compile-time detection, ignore the question of "literals", and just focus on a run-time TypeError if a duplicate key is detected. Otherwise, you will have the situation where Python only detects *some* duplicate keys, and your colleague will be cursing that Python does such a poor job of detecting duplicates. And it will probably be implementation-dependent, e.g. if your Python compiler does constant folding it might detect {1+1: None, 2: None} as duplicates, but if it doesn't have constant folding (or you have turned it off), it won't. So with implementation-dependent compile-time checks, you can't even guarantee what will be detected. In that case, you might as well use a linter.

As far as I am concerned, a compile-time check is next-to-useless. It will be more annoying than useful, since it will give people a false sense of security, while still missing duplicates. So I intend to only discuss a run-time solution, which has the advantage that Python will detect a much wider range of duplicates: not just

{"SPAM": None, "SPAM": None}

for example, but also

{"SPAM": None, ("spa".upper() + "AM"[1:]): None}

But now we're getting out of the realm of "detect obvious typos and duplicates" and into a more judgemental area. Once you start prohibiting complex expressions or function calls that happen to return duplicate keys, you can no longer be *quite* so sure that this is an error. Maybe the programmer has some reason for allowing duplicates. Not necessarily a *good* reason, but people write all sorts of ugly, bad, fragile, silly code without the compiler giving them an error. "Consenting adults" applies, and Python generally allows us to shoot ourselves in the foot.
Let's say I write something like this:

with open(path) as f:
    d = {
        f.write(a): 'alpha',
        f.write(b): 'beta',
        f.write(c): 'gamma',
        f.write(d): 'delta',
        }

and purely by my bad luck, it turns out that len(b) and len(c) are equal, so that there are two duplicate keys. Personally, I think this is silly code, but there's no rule against silly code in Python, and maybe I have a (good/bad/stupid) reason for writing this. Maybe I don't care that duplicate keys are over-written. If we introduce your rule, that's the same as saying "this code is so awful, that we have to ban it... but only *sometimes*, when the lengths happen to be equal; the rest of the time it's fine". Are you okay with saying that? (Not a rhetorical question, and you are allowed to say "Yes".)

So which is worse?

- most of the time the code works fine, but sometimes it fails, raising an exception and leaving the file in a half-written state; or
- the code always works fine, except that sometimes a duplicate key over-writes the previous value, which I may not even care about.

I don't know which is worse. If there is no clear answer that is obviously right, then the status quo wins, even if the status quo is less than perfect. Even if the status quo is *awful*, it may be that all the alternatives are just as bad. I think a compile-time check is just enough to give people a false sense of security, and so is *worse* than what we have now. And I'm right on the fence regarding a run-time check. So to me, my judgement is: with no clear winner, the status quo stays.
With respect, I think that is a harmful position to take. That leaves the job half-done: the constructor will complain about *some* duplicates, but not all, and worse, which ones it detects may depend on the implementation you happen to be using! If it is worth complaining about {0: None, 0: None} then it must also be worth complaining about: {0: None, 0.0: None, int(): None, 1-1: None, len(""): None} etc. Otherwise, I guarantee that somebody will be pulling their hair out as to why Python only sometimes detects duplicate keys. Better to never detect them at all (so you know you have to test for duplicates yourself) than to give a false sense of security. -- Steve
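Steven's list of equal-but-differently-spelled keys does collapse to a single entry at run time, which is why a literal-only check would catch just the first spelling:

```python
# Five spellings of the key 0: only a run-time check sees them all collide.
d = {0: None, 0.0: None, int(): None, 1 - 1: None, len(""): None}
assert len(d) == 1

# With distinct values, "last one wins" shows all five keys were equal:
d2 = {0: 'a', 0.0: 'b', int(): 'c', 1 - 1: 'd', len(""): 'e'}
assert d2 == {0: 'e'}
```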

On Wed, May 4, 2016 at 10:08 PM, Steven D'Aprano <steve@pearwood.info> wrote:
My response: Yes. I am okay with saying that. Because the question really is: What is this dictionary being used for? Why are you using keys that might, under some circumstances, overwrite each other? The point of a dict is to retrieve values when given keys. How can you do this, if those keys might collide? I've yet to see any real-world code that does this usefully, in a dict display. And the best "well, maybe if" examples have all involved generated code, so it wouldn't be hard to change your codegen to use this instead:

d = dict((
    (f.write(a), 'alpha'),
    (f.write(b), 'beta'),
    (f.write(c), 'gamma'),
    (f.write(d), 'delta'),
    ))

which, of course, will continue to function as expected. ChrisA

On 04/05/2016 00:51, Steven D'Aprano wrote:
My initial reaction to the OP was negative, as in most contexts where keys are added to dictionaries, repeated keys silently overwrite, and consistency is a virtue. However, the original use case (a long dict literal - (sorry, transcribe that into 'dict display' if you wish; to me 'dict literal' is an evocative and comprehensible term and I will continue to use it, not meaning to offend)) is a completely plausible one, and it now seems to me that detecting duplicates is a case where 'practicality beats purity'. I agree with everything in Luigi's post of 30-May 2016 22:29 (my local time, sorry).

I would just add that as someone who has used a linter (pylint) occasionally - I am trying to discipline myself to use it regularly - even if you are aware of it, there is still a barrier to its use: linters tend to produce a huge amount of mostly useless guff which you have to search to find the few nuggets that you really need.

Steven, I am sure that this is not your intention, but it feels as if your requests for clarification are nitpicking and attempts to throw dust in our eyes, avoiding a serious attempt to address the OP. But let me (presuming, if he will forgive me, to speak for the OP) attempt to answer by making a concrete proposal, as a starting point for discussion:

*In a dictionary literal, it should be a syntax error to have two keys which are literal int or literal basestring values and which are equal.*

(I say basestring, meaning that { 'lion' : 'x', u'lion' : 'y' } would be an error. So of course would { 18L : 'x', 0x12 : 'y' } etc.)

From this starting point, bikeshedding can begin:

(1) Is there any other kind of literal (i.e. other than int or string) which should be included? (Float and complex? Decimal?? What have I missed?)
(2) Should constant int/string expressions which are folded to constants at compile time also be included, i.e. would { 2 : 'x', 1+1 : 'y' } also be an error? (Suggestion: This is an implementation detail, undefined. It's irrelevant to the practical case.)
(3) Should a runtime error be raised instead of a SyntaxError? (I can't imagine why, but if it turns out to be easier to implement, fine.)
(4) Should a warning be raised instead of an error?
(5) Should the action be different if not just the keys, but the values also are constant and equal (i.e. a no-effect repeat), e.g. a warning instead of an error? (I suggest no. It's still likely to be unintentional, i.e. a mistake that the programmer would want to know about.)
(6) Are there any other changes to the proposal which would simplify implementation, while still addressing the original use case?
(7) If the proposal is accepted, should it appear complete in the next Python release, or should there be a more gradual adoption process?
(8) Can/should anything be done to mitigate breaking automatic code generators which do generate duplicate literal keys?

Later: While I was composing the above, one more post arrived from Luigi and from Steven. I must reply to two points from Steven:

(1) "When you have significant amount of data in a dict (or any other data structure, such as a list, tree, whatever), the programmer has to take responsibility for the data validation. Not the compiler. Out of all the possible errors, why is "duplicate key" so special? Your colleague could have caused unexpected behaviour in many ways"

No, no, no. Computers detect our errors and we would be helpless if they didn't. If this argument were taken to its logical conclusion, you would say that it is the programmer's responsibility to ensure that an entire (10000-line missile guidance?) program is syntactically correct, and all the compiler need do is either say "OK" or "Syntax Error" (or worse, try to run a syntactically incorrect program). Yes, Luigi's colleague could have made many mistakes, but surely it is a good thing for as many of them as is reasonably possible to be detected by the computer? Then we just have to decide what "reasonably possible" is.
(2) "So unless you have a good answer to the question "Why are duplicate keys so special that the dict constructor has to guard against them, when it doesn't guard against all the other failures we have to check for?", I think the status quo should stand. There is one obvious answer: Duplicate keys are special because, unlike the other errors, the dict constructor CAN guard against them. That would be a reasonable answer. But I'm not sure if it is special *enough* to justify violating the Zen of Python. That may be a matter of opinion and taste." Sure, it is a matter of opinion. I ask: In the OP, what is the probability that (1) the programmer deliberately included a duplicate literal key (2) the programmer made a mistake? I concede that the answer may be different for automatic code generators (although if I wrote one that generated duplicate literal keys, I wouldn't be proud of it). Best wishes Rob Cliffe

On Wed, May 4, 2016 at 11:54 AM, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
This is the very soonest it can possibly appear, which means that notions of "long integer" and "basestring" aren't really applicable - they're Python 2 concepts. Text strings and byte strings in Python 3 are completely different:
>>> {'lion': 'x', b'lion': 'y'}
{'lion': 'x', b'lion': 'y'}
and there aren't any short/long integers. So here's my rewording of your proposal: *** Identical string or numeric literals used as keys in the same dictionary display MAY cause a SyntaxError. *** Whether constants get folded before this check, and whether {1:'x', 1.0:'y'} causes an error, can be implementation-defined. Implementations are welcome to not check for this at all if they wish. Valid types for this check would be str, bytes, int, float, and complex - there's no way that the same literal could result in unequal values, so these are all safe. I'm still not sure whether I'm in favour of the proposal or not, but that's how I'd word it. ChrisA
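A quick check of the spellings Chris treats as identical literals (this demonstrates current behaviour, not the proposed error):

```python
# Different notations, same constant -- these keys merge today:
assert 'a' == "a" and 100 == 0x64
d = {100: 'x', 0x64: 'y'}
assert d == {100: 'y'} and len(d) == 1

# Whether 1 and 1.0 count as "identical" is left implementation-defined
# in the proposal; at run time they are simply equal keys:
assert {1: 'x', 1.0: 'y'} == {1: 'y'}
```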

On Tue, May 3, 2016, at 22:22, Chris Angelico wrote:
Whether constants get folded before this check, and whether {1:'x', 1.0:'y'} causes an error, can be implementation-defined.
For *general* constant folding, maybe, but what's the point of having it if it doesn't apply to negative numbers and nonimaginary complex numbers? How about tuple displays consisting of only numeric and string literals, empty tuple displays, and tuple displays which themselves satisfy this definition?

On Wed, May 4, 2016 at 12:44 PM, Random832 <random832@fastmail.com> wrote:
That's why I say implementation-defined. I would generally expect -5 to count as a literal, but if (2+3) or ("a", "b") doesn't, so be it. Of course, I would expect 'a' and "a" to be seen as identical. Likewise 100 and 0x64. Beyond that, it's basically up to whatever the implementation finds easy; as long as the keys *would be* folded, and are literals, it's permissible to raise the error. ChrisA

On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
Did your colleague really have 200+ items in the dict? No matter, I suppose. The same principle applies. When you have a significant amount of data in a dict (or any other data structure, such as a list, tree, whatever), the programmer has to take responsibility for the data validation. Not the compiler. Out of all the possible errors, why is "duplicate key" so special? Your colleague could have caused unexpected behaviour in many ways:

lives_in = {
    # missed value
    'lion': ['Africa', 'America'],
    # misspelled value
    'parrot': ['Eruope'],
    # misspelled key
    'kangeroo': ['Australia'],
    # invalid key
    'kettle': ['Arctic'],
    # invalid value
    'aardvark': 'South Africa',
    # missed key -- oops, forgot 'tiger' altogether
    }

Where was your colleague's data validation? I'm sorry that your colleague lost a lot of time debugging this failure, but you might have had exactly the same result from any of the above errors. Unless somebody can demonstrate that "duplicate keys" is a systematic and common failure among Python programmers, I think that it is perfectly reasonable to put the onus of detecting duplicates on the programmer, just like all those other data errors.

The data validation need not be a big burden. In my own code, unless the dict is so small that I can easily see that it is correct with my own eyes, I always follow it with an assertion:

assert len(lives_in) == 250

which is a cheap test for at least some duplicate, missed or extra keys. But depending on your use-case, it may be that dict is the wrong data structure to use, and you need something that will validate items as they are entered. Unless your dict is small enough that you can see it is correct by eye, you need some sort of check that your data is valid, that you haven't forgotten keys, or misspelled them. The dict constructor won't and can't do that for you, so you need to do it yourself. Once you're doing that, then it is no extra effort to check for duplicates.
So unless you have a good answer to the question "Why are duplicate keys so special that the dict constructor has to guard against them, when it doesn't guard against all the other failures we have to check for?", I think the status quo should stand. There is one obvious answer: Duplicate keys are special because, unlike the other errors, the dict constructor CAN guard against them. That would be a reasonable answer. But I'm not sure if it is special *enough* to justify violating the Zen of Python. That may be a matter of opinion and taste. -- Steve
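The "do it yourself" validation Steven describes can be packaged in a few lines; `checked_dict` below is a hypothetical helper name, not a stdlib function:

```python
def checked_dict(pairs):
    """Build a dict from (key, value) pairs, rejecting duplicate keys."""
    d = {}
    for key, value in pairs:
        if key in d:
            raise ValueError('duplicate key: %r' % (key,))
        d[key] = value
    return d

lives_in = checked_dict([
    ('lion', ['Africa', 'America']),
    ('parrot', ['Europe']),
])
assert len(lives_in) == 2

# A repeated key now fails loudly instead of silently overwriting:
try:
    checked_dict([('lion', ['Africa']), ('lion', ['Europe'])])
    caught = False
except ValueError:
    caught = True
assert caught
```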

On 04.05.2016 03:09, Steven D'Aprano wrote:
Issuing a warning for this would probably help, but raising an error introduces a backwards incompatibility which is not warranted IMO, given how seldom such situations occur in practice.

For the implementation, there are two possibilities I can think of:

1. have STORE_MAP run a test for an already existing entry and issue a warning
2. add a final CHECK_MAP byte code to check that the number of keys in the mapping corresponds to the number of keys added via the dict literal

The first one causes a lot of overhead, esp. for larger mappings such as the ones used by the codecs or other static data tables. Most of the time these are generated anyway, so the warning case would not occur at all, but you'd still have to check for a collision N times.

The second one is cheap regarding performance, but may not be accurate, since STORE_MAP may well be working on dynamically generated keys. It does work well for constants, and if we're just issuing a warning, may be good enough.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 04 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On Wed, May 04, 2016 at 01:56:56PM +0200, M.-A. Lemburg wrote:
I'm not sure I understand what you mean by this. Are you suggesting that if I write: {1: None, 1: None} Python knows that it got two keys as argument, and so CHECK_MAP can determine that there is a duplicate? (i.e. that 2 keys were arguments, but the finished dict only has length 1). But if I write: {x: None, x: None} Python *cannot* tell that there were two keys given (since they are expressions, not constants) and CHECK_MAP can't do anything? -- Steve

On 04.05.2016 14:15, Steven D'Aprano wrote:
Well, the CHECK_MAP byte code would only check whether the number of assignments you are doing is equal to the number of keys found in the final dict. That's a very coarse test and not perfect. It can also not tell you which key causes the mismatch. On the plus side, the extra check is a simple size check and doesn't have to be applied per added mapping.

-- Marc-Andre Lemburg

On Wed, May 04, 2016 at 02:37:56PM +0200, M.-A. Lemburg wrote:
I agree with you that an error is unjustified but a warning might be.
You have suggested that's too expensive, so I'm happy to rule that out.
Sorry MAL, perhaps I'm a bit slow tonight, but I'm still lost. {x: None, x: None} -- would CHECK_MAP detect that as a duplicate or not? If the answer is "Yes", then this seems to me to be a cheap way of detecting the existence of at least one duplicate (but not which key was duplicate). That strikes me as good enough, in which case I would support raising a warning. -- Steve

On 04.05.2016 16:00, Steven D'Aprano wrote:
Yes:
Since 1 != 2 -> issue warning. We'd just have to tell the CHECK_MAP byte code to check for len(map) == 2. The byte code compiler will be able to determine this and add the correct byte code entry.

-- Marc-Andre Lemburg
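MAL's CHECK_MAP idea can be sketched in pure Python: the compiler statically knows how many pairs the display listed, so a single length comparison after construction detects that *some* duplicate occurred, without identifying which key (`build_map_checked` is a hypothetical name for illustration):

```python
import warnings

def build_map_checked(pairs):
    """Build a dict and warn if any key collided, as CHECK_MAP would.

    len(pairs) plays the role of the statically-known pair count that
    the compiler would embed in the CHECK_MAP operand.
    """
    d = dict(pairs)
    if len(d) != len(pairs):
        warnings.warn("duplicate key(s) in dict display", stacklevel=2)
    return d

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    x = 1
    d = build_map_checked([(x, None), (x, None)])  # {x: None, x: None}

# The coarse length check fires even for non-constant keys like x:
assert len(caught) == 1
assert d == {1: None}
```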

After following the long thread, I'm solidly -1, at least on an error. A warning wouldn't be so bad, but would still feel like a nuisance, not a feature. Such a warning should belong to linters, not to the language itself. Use those to automatically catch statically detectable probably-bad patterns like duplicate literals as keys of dictionary displays.

The thing that nudged me to -1 is code something like the following, which I've actually written for completely valid use-cases:

```
def code_generate(data_source, output=open('data.py', 'w')):
    """Write data to loadable Python module

    data_source may contain multiple occurrences of each key.
    Later occurrences are "better" than earlier ones
    """
    print("data = {", file=output)
    for key, val in data_source:
        print(" %s: %s," % (key, val), file=output)
    print("}", file=output)
```

I really don't want the Python interpreter to complain about my perfectly valid 'data.py' module that I auto-generated knowing full well it might have duplicate literal keys. Yes, of course I could write a more complex function that explicitly pruned keys if they were duplicates before writing to 'data.py'. But I do not want to have to think about that when a well-defined semantics of Python since version 0.9 does not require that extra work.

On Wed, May 4, 2016 at 4:56 AM, M.-A. Lemburg <mal@egenix.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
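For what it's worth, the pruning Mertz mentions is nearly free in this case: dict() itself keeps the last value for each duplicate key, so the generator only needs to round-trip the pairs through a dict before writing. A minimal sketch (the function name is hypothetical, and %r is used instead of %s so the output is valid Python for arbitrary keys and values):

```python
import io

def code_generate_pruned(data_source, output):
    """Like code_generate above, but prunes duplicate keys first,
    keeping the later ("better") occurrence of each key."""
    pruned = dict(data_source)  # later pairs silently overwrite earlier ones
    print("data = {", file=output)
    for key, val in pruned.items():
        print("    %r: %r," % (key, val), file=output)
    print("}", file=output)

buf = io.StringIO()
code_generate_pruned([("lion", ["Africa"]), ("lion", ["Europe"])], buf)
print(buf.getvalue())
```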

On Thu, May 5, 2016 at 12:36 AM, David Mertz <mertz@gnosis.cx> wrote:
Or, much more simply, you could just dump it straight out like this:

    print("data = dict(%r)" % list(data_source), file=output)

If the dict display starts rejecting duplicates, these kinds of code generators should be easily tweakable. The only risk is if duplicates can *theoretically* happen but are incredibly rare, such that the code runs for years without anyone noticing.

This is a way of catching compile-time-detectable errors. Python has generally been pretty helpful with that kind of thing, even to the point of special-casing print-used-as-a-statement to tell you to put parens around it. While I fully understand that there are reasons against doing this (including "that's the job of linters, not the core language"), I do _not_ understand the apparent attitude that it's a fundamentally bad thing to do. In most cases where a dict display is used, it's intended to have unique keys.

ChrisA
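The "job of linters" position is easy to act on: duplicate *constant* keys in a dict display are statically detectable with the standard-library ast module. A rough sketch, not a production linter (the function name is mine; ast.Constant requires Python 3.8+, well after this 2016 thread):

```python
import ast

def duplicate_literal_keys(source):
    """Return (lineno, key) pairs for constant keys that are
    repeated within a single dict display in `source`."""
    dupes = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                # key is None for **unpacking; skip non-constant keys,
                # whose values are unknowable at compile time.
                if isinstance(key, ast.Constant):
                    if key.value in seen:
                        dupes.append((key.lineno, key.value))
                    seen.add(key.value)
    return dupes

src = "lives_in = {'lion': 1, 'parrot': 2, 'lion': 3}"
print(duplicate_literal_keys(src))  # → [(1, 'lion')]
```

Note that this catches only literal keys; `{k1: 1, k2: 2}` with `k1 == k2` is invisible to any static check, which is part of why the proposal is hard to pin down.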

On Wed, May 4, 2016 at 10:59 AM Chris Angelico <rosuav@gmail.com> wrote:
You accidentally snipped out Mertz's statement that he didn't want the interpreter to complain about "probably-bad" patterns. This implies a distinction: linters should complain about probably-bad things, while the interpreter only complains about always-bad things.

I guess you could argue that warnings exist so that the interpreter can complain about probably-bad things as well. If that's what you'd like, we shouldn't call it a "compile-time-detectable error" but instead, uh, ... hard to find the right vocabulary here.

All this discussion is certainly very educational for the participants, but I want to remind everyone that this is not going to happen without a PEP. Personally I'm -0 on the whole business. Some quick notes:

- In ABC, this was considered, and the rule was that {k1: v1; k2: v2} was allowed to have k1 == k2 as long as v1 == v2. Otherwise it was an error. (But ABC did not have comprehensions.)

- Franklin? brought up sets and suggested the same thing should happen there. I'm not so sure though. When teaching Python it's nice to be able to demonstrate the fundamental property of sets by showing it:

    >>> {1, 1, 1}
    {1}
--
--Guido van Rossum (python.org/~guido)
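As a point of reference for the proposal, these are the current CPython semantics the thread is arguing about: the last of several equal keys silently wins, and "duplicate" means *equal at runtime*, which is why the check cannot be purely syntactic in the general case:

```python
# In current Python, the last of several equal keys silently wins.
d = {'lion': ['Africa'], 'lion': ['Europe']}
print(d)  # {'lion': ['Europe']}

# "Duplicate" means runtime equality, not identical tokens:
# 1, 1.0 and True all hash and compare equal as dict keys.
print({1: 'int', 1.0: 'float', True: 'bool'})  # {1: 'bool'}

# With computed keys the collision is invisible to the compiler.
k1, k2 = 'li' + 'on', 'lion'
print({k1: 1, k2: 2})  # {'lion': 2}
```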

I honestly don't recall, and I assume the docs are vague, but "comparing expressions" was not a thing in ABC. I'm sure they compared the values. (This is why I showed it with variables instead of constants.)

On Wed, May 4, 2016 at 8:29 AM, Michael Selik <michael.selik@gmail.com> wrote:
--
--Guido van Rossum (python.org/~guido)

On 5/3/2016 9:09 PM, Steven D'Aprano wrote:
I often use large literal dicts with literal string keys. There are many times (a couple of times a month) I add a duplicate key, because I am prone to making mistakes like that. Now, I am lucky because I purchased an IDE that highlights those duplicates as errors. If I had to rely on a separate linter, with my duplicate keys hiding in a long list of other trivial formatting mistakes, then the time cost of reviewing the linting output on every code change would be greater than the time cost of simply debugging the program when I see it inevitably malfunction. Instead, I would use data validation in my own code.

I would not use this check; it is error-prone, given the number of times I update the dicts during development. Either I will lose count, or the line math will be wrong (because of extra spaces), or I will count the duplicate. A better consistency check requires that I construct all literal dicts like:

    def add_item(dict_, key, value):
        if key in dict_:
            raise Exception()
        dict_[key] = value

    lives_in = {}
    add_item(lives_in, 'lion', ['Africa', 'America'])
    add_item(lives_in, 'parrot', ['Europe'])
    # ... 100+ more rows here
    add_item(lives_in, 'lion', ['Europe'])
    # ... 100+ more rows here

Which is inelegant, but necessary to minimize the programmer time wasted trying to detect these duplicate keys manually.
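The add_item pattern can be collapsed into a single helper: build the dict from a list of pairs and compare lengths, which catches every duplicate in one check rather than one call per row. A sketch under my own naming (dict_strict is not from the thread):

```python
def dict_strict(pairs):
    """Build a dict from a list of (key, value) pairs,
    raising ValueError if any key occurs more than once."""
    d = dict(pairs)
    if len(d) != len(pairs):
        # Lengths differ, so at least one key repeated; name the culprits.
        seen, dupes = set(), []
        for key, _ in pairs:
            if key in seen:
                dupes.append(key)
            seen.add(key)
        raise ValueError("duplicate keys: %r" % dupes)
    return d

lives_in = dict_strict([
    ('lion', ['Africa', 'America']),
    ('parrot', ['Europe']),
])
print(sorted(lives_in))  # ['lion', 'parrot']
```

The trade-off is that the rows are now tuples in a list rather than a dict display, so an IDE no longer renders them as a dictionary literal.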

On 5/4/2016 11:45 AM, Chris Angelico wrote:
Thinking about examples [1], it turns out I was mistaken. My IDE does NOT have an *integrated* linter that detects duplicate literal keys! My IDE does detect duplicate literal keys for JavaScript, which is obviously what confused me. Oh dear! That means my dicts are full of duplicates!

[1] https://github.com/klahnakoski/TestLog-ETL/blob/d7e8a99f7532a0551c98dfdccefe...
participants (21)
- Akira Li
- Chris Angelico
- David Mertz
- Ethan Furman
- Franklin? Lee
- Gregory P. Smith
- Guido van Rossum
- Jonathan Goble
- João Santos
- Kyle Lahnakoski
- Luigi Semenzato
- M.-A. Lemburg
- Michael Selik
- Michel Desmoulin
- Nick Coghlan
- Paul Moore
- Pavol Lisy
- Random832
- Rob Cliffe
- Steven D'Aprano
- Terry Reedy