<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 04/05/2016 00:51, Steven D'Aprano wrote:<br>
<blockquote cite="mid:20160503235120.GA12028@ando.pearwood.info"
type="cite">
<pre wrap="">Hi Luigi,
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
[...]
</pre>
<blockquote type="cite">
<pre wrap="">lives_in = { 'lion': ['Africa', 'America'],
'parrot': ['Europe'],
#... 100+ more rows here
'lion': ['Europe'],
#... 100+ more rows here
}
The above constructor overwrites the first 'lion' entry silently,
often causing unexpected behavior.
</pre>
</blockquote>
<pre wrap="">[...]
</pre>
<blockquote type="cite">
<pre wrap="">For context, someone ran into this problem in my team at Google (we
fixed it using pylint). I haven't seen any valid reason (in the bug
or elsewhere) in favor of these constructor semantics. From the
discussions I have seen, it seems just an oversight in the
implementation/specification of dictionary literals. I'd be happy to
hear stronger reasoning in favor of the status quo.
</pre>
</blockquote>
<pre wrap="">
As far as I can see, you haven't specified in *detail* what change you
wish to propose. It is difficult for me to judge your proposal when I
don't know precisely what you are proposing.
Should duplicate keys be a SyntaxError at compile time, or a TypeError
at runtime? Or something else?
What counts as "duplicate keys"? I presume that you mean that two keys
count as duplicate if they hash to the same value, and are equal. But
you keep mentioning "literals" -- does this mean you care more about
whether they look the same rather than are the same?
# duplicate literals, forbidden
d = {100: 1, 100: 2}
# duplicate non-literals, allowed
d = {100: 1, len("ab")*50: 2}
You keep mentioning "dictionary literal", but there actually is no
such thing in Python. I think you mean a dict display. (Don't worry, I
make the same mistake.) But the point is, a "dict literal" (display) can
contain keys which are not themselves literals, as above. Those keys can
have arbitrarily complex semantics, including side-effects. What do you
expect to happen?
</pre>
</blockquote>
My initial reaction to the OP was negative, as in most contexts
where keys are added to dictionaries, repeated keys silently
overwrite, and consistency is a virtue.<br>
However, the original use case (a long dict literal) is a completely
plausible one, and it now seems to me that detecting duplicates is a
case where 'practicality beats purity'. (Sorry, transcribe 'dict
literal' into 'dict display' if you wish; to me 'dict literal' is an
evocative and comprehensible term and I will continue to use it, not
meaning to offend.) I agree with everything in Luigi's
post of 30-May 2016 22:29 (my local time, sorry). I would just add,
speaking as someone who uses a linter (pylint) only occasionally,
though I am trying to discipline myself to use it regularly, that
even if you are aware of linters, there is still a barrier to their
use: they tend to produce a huge amount of mostly useless guff which
you have to search to find the few nuggets that you really need.<br>
<br>
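To illustrate the current behaviour referred to above, here is a small
sketch (Python 3, reusing the OP's example data with the elided rows
left out). The repeated key is accepted without any error or warning,
and the later value wins, just as it would with repeated assignment:<br>
<pre wrap="">
```python
# Dict display with a repeated literal key (data taken from the OP's example).
lives_in = {
    'lion': ['Africa', 'America'],
    'parrot': ['Europe'],
    'lion': ['Europe'],   # same key again: silently replaces the earlier value
}

print(lives_in['lion'])   # ['Europe']
print(len(lives_in))      # 2

# The equivalent sequence of assignments behaves the same way.
d = {}
d['lion'] = ['Africa', 'America']
d['parrot'] = ['Europe']
d['lion'] = ['Europe']
print(d == lives_in)      # True
```
</pre>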
Steven, I am sure that this is not your intention, but your requests
for clarification feel like nitpicking and an attempt to throw dust
in our eyes, rather than a serious attempt to address the OP.<br>
But let me (presuming, if he will forgive me, to speak for the OP)
attempt to answer by making a concrete proposal, as a starting point
for discussion:<br>
<br>
<b>In a dictionary literal, it should be a syntax error to have two
keys which are literal int or literal basestring values and which
are equal.<br>
<br>
</b>(I say basestring, meaning that { 'lion' : 'x', u'lion' : 'y'
} would be an error. So of course would { 18L : 'x', 0x12 :
'y' } etc.)<br>
<br>
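To make the rule above concrete, here is a rough sketch of it written
as a linter-style check with the standard ast module, not as a change
to the compiler. It assumes Python 3.8 or later, where basestring and
the 18L suffix no longer exist, so str and int stand in for them. The
name find_duplicate_literal_keys is hypothetical, and excluding
True/False (which are ints in Python) is my own assumption, not part
of the proposal. Keys such as 1+1 or len("ab")*50 are ignored because
they are not plain literals in the parsed tree:<br>
<pre wrap="">
```python
import ast


def find_duplicate_literal_keys(source):
    """Return (key, lineno) pairs for repeated int/str literal keys.

    Hypothetical helper: only dict displays are inspected, and only
    keys written as plain int or str literals are compared.
    """
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for a `**mapping` entry; expressions are skipped too.
            if not isinstance(key, ast.Constant):
                continue
            value = key.value
            # Assumption: bools are left out even though bool subclasses int.
            if isinstance(value, bool) or not isinstance(value, (int, str)):
                continue
            if value in seen:
                duplicates.append((value, key.lineno))
            else:
                seen.add(value)
    return duplicates


example = """lives_in = {
    'lion': ['Africa', 'America'],
    'parrot': ['Europe'],
    'lion': ['Europe'],
}
"""
print(find_duplicate_literal_keys(example))   # [('lion', 4)]
```
</pre>
Because keys are compared by value, { 18 : 'x', 0x12 : 'y' } is
reported as well, while { 1 : 'x', '1' : 'y' } is not.<br>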
From this starting point, bikeshedding can begin:<br>
(1) Is there any other kind of literal (i.e. other than int or
string) which should be included?<br>
(Float and complex? Decimal?? What have I missed?)<br>
(2) Should constant int/string expressions which are folded to
constants at compile time also be included<br>
i.e. would { 2 : 'x', 1+1 : 'y' } also be an
error?<br>
(Suggestion: This is an implementation detail,
undefined. It's irrelevant to the practical case.)<br>
(3) Should a runtime error be raised instead of a SyntaxError ?<br>
(I can't imagine why, but if it turns out to be easier to
implement, fine.)<br>
(4) Should a warning be raised instead of an error?<br>
(5) Should the action be different if not just the keys, but the
values also are constant and equal (i.e. a no-effect repeat),<br>
e.g. a warning instead of an error?<br>
( I suggest no. It's still likely to be unintentional,
i.e. a mistake that the programmer would want to know about.)<br>
(6) Are there any other changes to the proposal which would
simplify implementation, while still addressing the original use
case?<br>
(7) If the proposal is accepted, should it appear complete in
the next Python release, or should there be a more gradual adoption
process?<br>
(8) Can/should anything be done to mitigate breaking automatic
code generators which do generate duplicate literal keys?<br>
<br>
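For comparison with point (3), a runtime check can already be
approximated today with a small helper. This is only a sketch: the
name strict_dict and the choice of ValueError are my own assumptions,
and it works on a list of (key, value) pairs rather than on a dict
literal, so it catches duplicates among computed keys too:<br>
<pre wrap="">
```python
def strict_dict(pairs):
    """Build a dict from (key, value) pairs, raising on a repeated key.

    Hypothetical helper, not part of the standard library.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


lives_in = strict_dict([
    ('lion', ['Africa', 'America']),
    ('parrot', ['Europe']),
])

try:
    strict_dict([
        ('lion', ['Africa', 'America']),
        ('parrot', ['Europe']),
        ('lion', ['Europe']),
    ])
except ValueError as exc:
    message = str(exc)
    print(message)   # duplicate key: 'lion'
```
</pre>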
Later:<br>
While I was composing the above, one more post arrived from each of
Luigi and Steven.<br>
I must reply to two points from Steven:<br>
<br>
(1) "When you have significant amount of data in a dict (or any
other data structure, such as a list, tree, whatever), the
programmer has to take responsibility for the data validation. Not
the compiler. Out of all the possible errors, why is "duplicate key"
so special? Your colleague could have caused unexpected behaviour in
many ways"<br>
<br>
No, no, no. Computers detect our errors and we would be helpless if
they didn't. If this argument were taken to its logical conclusion,
you would say that it is the programmer's responsibility to ensure
that an entire (10000-line missile guidance?) program is
syntactically correct, and all the compiler need do is either say
"OK" or "Syntax Error" (or worse, try to run a syntactically
incorrect program). Yes, Luigi's colleague could have made many
mistakes, but surely it is a good thing for as many of them as is
reasonably possible to be detected by the computer? Then we just
have to decide what "reasonably possible" is.<br>
<br>
(2) "So unless you have a good answer to the question "Why are
duplicate keys so special that the dict constructor has to guard
against them, when it doesn't guard against all the other failures
we have to check for?", I think the status quo should stand.
There is one obvious answer:
Duplicate keys are special because, unlike the other errors, the
dict constructor CAN guard against them.
That would be a reasonable answer. But I'm not sure if it is special
<b class="moz-txt-star"><span class="moz-txt-tag">*</span>enough<span
class="moz-txt-tag">*</span></b> to justify violating the Zen of
Python. That may be a matter of opinion and taste."<br>
<pre wrap="">Sure, it is a matter of opinion. I ask: In the OP, what is the probability that (1) the programmer deliberately included a duplicate literal key (2) the programmer made a mistake?
I concede that the answer may be different for automatic code generators (although if I wrote one that generated duplicate literal keys, I wouldn't be proud of it).
</pre>
Best wishes<br>
Rob Cliffe<br>
</body>
</html>