[Python-ideas] dictionary constructor should not allow duplicate keys

Tue May 3 21:40:32 EDT 2016

On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
> On Tue, May 03, 2016 at 02:27:16PM -0700, Ethan Furman wrote:
>> On 05/03/2016 01:43 PM, Michael Selik wrote:
>>> On Tue, May 3, 2016 at 4:00 PM Ethan Furman wrote:
>>>> Which seems irrelevant to your argument: a duplicate key is a duplicate
>>>> key whether it's 123 or 'xyz'.
>>>
>>> If an expression includes an impure function, then the duplication of
>>> assignment to that key may have a desirable side-effect.
>>
>> I'm willing to say that should be done with an existing dict, not in a
>> literal.
>
> But are you willing to say that the compiler should enforce that
> stylistic judgement?
>
>
>>> How would you handle an expression that evaluates differently for each
>>> call? For example:
>>>
>>>      {random(): 0, random(): 1}
>>
>> Easy:  Don't Do That.  ;)
>
> I see your wink, so I presume you're not actually suggesting that the
> compiler (1) prohibit all function calls in dict displays, or (2)
> hard-code the function name "random" in a black list.

Indeed.

> So what are you actually suggesting? Michael is raising a good point
> here. If you don't like random as an example, how about:
>
> d = {spam(a): 'a', spam(b): 'BB', spam(c): 'Ccc'}
>
> I'm intentionally not giving you the values of a, b or c, or telling
> you what spam() returns. Now you have the same information available
> to you as the compiler has at compile time. What do you intend to do?

Since the dict created from that dict display is built at run time, I am
suggesting that, while that dict is being created, any key (however
generated or retrieved) that duplicates a key already in the dict should
cause an appropriate error (to be determined).
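A minimal sketch of that run-time check, assuming a hypothetical `strict_dict()` helper that receives the already-evaluated (key, value) pairs in display order. `DuplicateKeyError` is a placeholder name, since the error type is left open above, and `spam()` is a made-up stand-in for the one in the quoted example:

```python
from typing import Any, Dict, Hashable, Iterable, Tuple


class DuplicateKeyError(KeyError):
    """Placeholder exception; the real error type is still to be determined."""


def strict_dict(pairs: Iterable[Tuple[Hashable, Any]]) -> Dict[Hashable, Any]:
    """Build a dict from (key, value) pairs, refusing any repeated key.

    "Repeated" follows normal dict rules (equal hash and ==), so 1, 1.0
    and True all count as the same key.
    """
    result: Dict[Hashable, Any] = {}
    for key, value in pairs:
        if key in result:
            raise DuplicateKeyError(key)
        result[key] = value
    return result


def spam(x: int) -> int:
    # Hypothetical stand-in for spam() in the quoted example.
    return x % 3


print(strict_dict([(spam(1), "a"), (spam(2), "BB")]))  # {1: 'a', 2: 'BB'}

# Today's dict display silently keeps the last value for key 1.
print({spam(1): "a", spam(2): "BB", spam(4): "Ccc"})  # {1: 'Ccc', 2: 'BB'}

# With the proposed check, the same pairs raise instead.
try:
    strict_dict([(spam(1), "a"), (spam(2), "BB"), (spam(4), "Ccc")])
except DuplicateKeyError as exc:
    print("duplicate key:", exc)
```

Because the check happens only after each key expression has been evaluated, it works the same whether the key came from a literal or from a call like `spam(a)`.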

Also, any dict display that can be turned into a dict at compile time
(thereby avoiding the run-time logic) should get the same checks and
raise the same error if duplicate keys are found (I imagine both run-time
and compile-time dict creation from dict displays would use the same
underlying function).

> It's one thing to say "duplicate keys should be prohibited", and another
> to come up with a detailed explanation of what precisely should happen.

Hopefully my explanation is detailed enough.

>>> Let me flip the original request: I'd like to hear stronger arguments for
>>> change, please. I'd be particularly interested in hearing how often
>>> Pylint has caught this mistake.
>>
>> Well, when this happened to me I spent a loooonnnnnngggg time figuring
>> out what the problem is.
>
> Okay, but how often does this happen? Daily? Weekly? Once per career?

Once a year, maybe.  The experience is painful enough to remember when 
the subject arises, but not painful enough to remember to be careful.  ;)

> What's a loooonnnnnngggg time? Twenty minutes? Twenty man-weeks?

Longer than an hour, shorter than a day.

>> One just doesn't expect duplicate keys to not raise:
>>
>> --> dict(one=1, one='uno')
>>    File "<stdin>", line 1
>> SyntaxError: keyword argument repeated
>
> There's a difference though. Keyword argument *names* must be
> identifiers, not expressions, and duplicates can be recognised by the
> compiler at compile-time:

I am largely unconcerned by the run-time/compile-time distinction in 
this case -- whenever it happens, check that the incoming key doesn't 
already exist.
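For anyone following along, the contrast discussed above is easy to reproduce in current CPython (the exact SyntaxError message differs between versions):

```python
# Repeated keyword argument names are rejected when the source is compiled,
# before any of it runs.
try:
    compile("dict(one=1, one='uno')", "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# A repeated key in a dict display compiles without complaint; at run time
# the later value silently replaces the earlier one.
print(eval("{'one': 1, 'one': 'uno'}"))  # {'one': 'uno'}
```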

> If you think of it as:
>
> for key,item in initial_values:
>      if key in self:
>          raise TypeError('duplicate key')
>      else:
>          self[key] = item
>
>
> then you have to deal with the fact that you might only notice a
> duplicate after mutating your dict, which may include side-effects.

Not sure what you mean by "mutating your dict" -- we're still talking 
about initial "dict display" to "dict" conversion, yes?
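One possible reading of the "mutating your dict" remark, sketched with Steven's loop turned into a standalone function (`update_strict` and this `spam` are made-up names; the TypeError follows his sketch). If the check runs as items are inserted, a duplicate is found only after earlier keys have been stored and earlier key expressions have already run. For a brand-new dict from a display the half-built dict would simply be discarded, leaving only the side effects visible; for an existing dict, the partial update would remain:

```python
calls = []


def spam(x):
    # Hypothetical key function with an observable side effect.
    calls.append(x)
    return x % 3


def update_strict(target, pairs):
    """Steven's loop as a plain function: insert pairs, rejecting repeats."""
    for key, value in pairs:
        if key in target:
            raise TypeError(f"duplicate key: {key!r}")
        target[key] = value


d = {"existing": 0}
# A generator, so each spam() call happens only when its pair is consumed.
pairs = ((spam(n), n) for n in (1, 2, 4, 5))
try:
    update_strict(d, pairs)
except TypeError as exc:
    print(exc)   # duplicate key: 1
print(d)         # {'existing': 0, 1: 1, 2: 2}
print(calls)     # [1, 2, 4]  (spam(5) never ran)
```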

--
~Ethan~