[Python-ideas] dictionary constructor should not allow duplicate keys

Tue May 3 21:54:39 EDT 2016

On 04/05/2016 00:51, Steven D'Aprano wrote:
> Hi Luigi,
>
> On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
>
> [...]
>> lives_in = { 'lion': ['Africa', 'America'],
>>               'parrot': ['Europe'],
>>               #... 100+ more rows here
>>               'lion': ['Europe'],
>>               #... 100+ more rows here
>>             }
>>
>> The above constructor overwrites the first 'lion' entry silently,
>> often causing unexpected behavior.
> [...]
>> For context, someone ran into this problem in my team at Google (we
>> fixed it using pylint).  I haven't seen any valid reason (in the bug
>> or elsewhere) in favor of these constructor semantics.  From the
>> discussions I have seen, it seems just an oversight in the
>> implementation/specification of dictionary literals.  I'd be happy to
>> hear stronger reasoning in favor of the status quo.
> As far as I can see, you haven't specified in *detail* what change you
> wish to propose. It is difficult for me to judge your proposal when I
> don't know precisely what you are proposing.
>
> Should duplicate keys be a SyntaxError at compile time, or a TypeError
> at runtime? Or something else?
>
> What counts as "duplicate keys"?  I presume that you mean that two keys
> count as duplicate if they hash to the same value, and are equal. But
> you keep mentioning "literals" -- does this mean you care more about
> whether they look the same rather than are the same?
>
> # duplicate literals, forbidden
> d = {100: 1, 100: 2}
>
> # duplicate non-literals, allowed
> d = {100: 1, len("ab")*50: 2}
>
>
> You keep mentioning "dictionary literal", but there actually is no
> such thing in Python. I think you mean a dict display. (Don't worry, I
> make the same mistake.) But the point is, a "dict literal" (display) can
> contain keys which are not themselves literals, as above. Those keys can
> have arbitrarily complex semantics, including side-effects. What do you
> expect to happen?
>
My initial reaction to the OP was negative, as in most contexts where 
keys are added to dictionaries, repeated keys silently overwrite, and 
consistency is a virtue.
However, the original use case (a long dict literal - (sorry, transcribe 
that into 'dict display' if you wish; to me 'dict literal' is an 
evocative and comprehensible term and I will continue to use it, not 
meaning to offend)) is a completely plausible one, and it now seems to 
me that detecting duplicates is a case where 'practicality beats 
purity'.  I agree with everything in Luigi's post of 30-May 2016 22:29 
(my local time, sorry).  I would just add that as someone who has used a 
linter (pylint) occasionally - I am trying to discipline myself to use 
it regularly - even if you are aware of it, there is still a barrier to 
its use: linters tend to produce a huge amount of mostly useless guff 
which you have to search to find the few nuggets that you really need.

Steven, I am sure that this is not your intention, but it feels as if 
your requests for clarification are nitpicking and attempts to throw 
dust in our eyes, and avoiding a serious attempt to address the OP.
But let me (presuming,if he will forgive me, to speak for the OP) 
attempt to answer by making a concrete proposal, as a starting point for 
discussion:

*In a dictionary literal, it should be a syntax error to have two keys 
which are literal int or literal basestring values and which are equal.

*(I say basestring, meaning that    { 'lion' : 'x', u'lion' : 'y' }    
would be an error.  So of course would    { 18L : 'x', 0x12 : 'y' }    etc.)

 From this starting point, bikeshedding can begin:
     (1) Is there any other kind of literal (i.e. other than int or 
string) which should be included?
           (Float and complex?  Decimal??  What have I missed?)
     (2) Should constant int/string expressions which are folded to 
constants at compile time also be included
                 i.e. would    { 2 : 'x', 1+1 : 'y }    also be an error?
            (Suggestion: This is an implementation detail, undefined.  
It's irrelevant to the practical case.)
     (3) Should a runtime error be raised instead of a SyntaxError ?
          (I can't imagine why, but if it turns out to be easier to 
implement, fine.)
     (4) Should a warning be raised instead of an error?
     (5) Should the action be different if not just the keys, but the 
values also are constant and equal (i.e. a no-effect repeat),
           e.g. a warning instead of an error?
          ( I suggest no.  It's still likely to be unintentional, i.e. a 
mistake that the programmer would want to know about.)
     (6) Are there any other changes to the proposal which would 
simplify implementation, while still addressing the original use case?
     (7) If the proposal is accepted, should it appear complete in the 
next Python release, or should there be a more gradual adoption process?
     (8) Can/should  anything be done to mitigate breaking automatic 
code generators which do generate duplicate literal keys?

Later:
While I was composing the above, 1 more post arrived from Luigi and from 
Steven.
I must reply to two points from Steven:

(1) "When you have significant amount of data in a dict (or any other 
data structure, such as a list, tree, whatever), the programmer has to 
take responsibility for the data validation. Not the compiler. Out of 
all the possible errors, why is "duplicate key" so special? Your 
colleague could have caused unexpected behaviour in many ways"

No, no, no.  Computers detect our errors and we would be helpless if 
they didn't.  If this argument were taken to its logical conclusion, you 
would say that it is the programmer's responsibility to ensure that an 
entire (10000-line missile guidance?) program is syntactically correct, 
and all the compiler need do is either say "OK" or "Syntax Error" (or 
worse, try to run a syntactically incorrect program).  Yes, Luigi's 
colleague could have made many mistakes, but surely it is a good thing 
for as many of them as is reasonably possible to be detected by the 
computer?  Then we just have to decide what "reasonably possible" is.

(2) "So unless you have a good answer to the question "Why are duplicate 
keys so special that the dict constructor has to guard against them, 
when it doesn't guard against all the other failures we have to check 
for?", I think the status quo should stand. There is one obvious answer: 
Duplicate keys are special because, unlike the other errors, the dict 
constructor CAN guard against them. That would be a reasonable answer. 
But I'm not sure if it is special *enough* to justify violating the Zen 
of Python. That may be a matter of opinion and taste."

Sure, it is a matter of opinion.  I ask: In the OP, what is the probability that (1) the programmer deliberately included a duplicate literal key (2) the programmer made a mistake?
I concede that the answer may be different for automatic code generators (although if I wrote one that generated duplicate literal keys, I wouldn't be proud of it).

Best wishes
Rob Cliffe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20160504/3edebfe1/attachment-0001.html>