[Python-ideas] dictionary constructor should not allow duplicate keys
Steven D'Aprano
steve at pearwood.info
Wed May 4 08:08:39 EDT 2016
On Tue, May 03, 2016 at 05:46:33PM -0700, Luigi Semenzato wrote:
[...]
> > Should duplicate keys be a SyntaxError at compile time, or a TypeError
> > at runtime? Or something else?
>
> Is there such a thing as a SyntaxWarning?
Yes there is.
> From my perspective it
> would be fine to make it a SyntaxError, but I am not sure it would be
> overall a good choice for legacy code (i.e. as an error it might break
> old code, and I don't know how many other things a new language
> specification is going to break).
>
> It could also be a run-time error, but it might be nicer to detect it
> earlier. Maybe both.
There are serious limits to what the compiler can detect at
compile-time. So unless you have your heart set on a compile-time
SyntaxError (or Warning) it might be best to forget all about
compile-time detection, ignore the question of "literals", and just
focus on run-time TypeError if a duplicate key is detected.
Otherwise, you will have the situation where Python only detects *some*
duplicate keys, and your colleague will be cursing that Python does such
a poor job of detecting duplicates. And it will probably be
implementation-dependent, e.g. if your Python compiler does constant
folding it might detect {1+1: None, 2: None} as duplicates, but if it
doesn't have constant folding (or you have turned it off), it won't.
So with implementation-dependent compile-time checks, you can't even
guarantee what will be detected. In that case, you might as well use a
linter.
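To make the point concrete, here is a small illustration (my own, not from the proposal): whether or not the compiler folds 1+1 into 2, by the time the dict exists the duplicate has already silently collapsed into a single entry, so only a run-time check ever sees both keys:

```python
# 1 + 1 and 2 are the same key; the later value silently wins,
# and the finished dict has a single entry either way.
d = {1 + 1: None, 2: "last one wins"}
print(len(d))  # 1
print(d)       # {2: 'last one wins'}
```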
As far as I am concerned, a compile-time check is next-to-useless. It
will be more annoying than useful, since it will give people a false
sense of security, while still missing duplicates.
So I intend to only discuss a run-time solution, which has the advantage
that Python will detect a much wider range of duplicates: not just:
{"SPAM": None, "SPAM": None}
for example, but also:
{"SPAM": None, ("spa".upper() + "AM"[1:]): None}
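A run-time check of the kind being discussed might look something like this sketch (`checked_dict` is a hypothetical helper of my own invention, not part of any proposed language change); because it inspects the evaluated keys rather than their source text, it catches both spellings of "SPAM":

```python
def checked_dict(pairs):
    """Build a dict from (key, value) pairs, raising TypeError
    on duplicate keys. Illustrative sketch only."""
    d = {}
    for key, value in pairs:
        if key in d:
            raise TypeError(f"duplicate key: {key!r}")
        d[key] = value
    return d

checked_dict([("SPAM", None)])  # fine
try:
    checked_dict([("SPAM", None), ("spa".upper() + "AM"[1:], None)])
except TypeError as e:
    print(e)  # duplicate key: 'SPAM'
```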
But now we're getting out of the realm of "detect obvious typos and
duplicates" and into a more judgemental area. Once you start prohibiting
complex expressions or function calls that happen to return duplicate keys,
you can no longer be *quite* so sure that this is an error.
Maybe the programmer has some reason for allowing duplicates. Not
necessarily a *good* reason, but people write all sorts of ugly, bad,
fragile, silly code without the compiler giving them an error.
"Consenting adults" applies, and Python generally allows us to shoot
ourselves in the foot.
Let's say I write something like this:
with open(path) as f:
    d = {
        f.write(a): 'alpha',
        f.write(b): 'beta',
        f.write(c): 'gamma',
        f.write(d): 'delta',
        }
and purely by my bad luck, it turns out that len(b) and len(c) are
equal, so that there are two duplicate keys. Personally, I think this is
silly code, but there's no rule against silly code in Python, and maybe
I have a (good/bad/stupid) reason for writing this. Maybe I don't care
that duplicate keys are over-written.
If we introduce your rule, that's the same as saying "this code is
so awful, that we have to ban it... but only *sometimes*, when the
lengths happen to be equal, the rest of the time it's fine".
Are you okay with saying that? (Not a rhetorical question, and you are
allowed to say "Yes".)
So which is worse?
- most of the time the code works fine, but sometimes it fails,
raising an exception and leaving the file in a half-written state; or
- the code always works fine, except that sometimes a duplicate key
over-writes the previous value, which I may not even care about.
I don't know which is worse. If there is no clear answer that is
obviously right, then the status quo wins, even if the status quo is
less than perfect.
Even if the status quo is *awful*, it may be that all the alternatives
are just as bad.
I think a compile-time check is just enough to give people a false sense
of security, and so is *worse* than what we have now. And I'm right on
the fence regarding a run-time check.
So my judgement is: with no clear winner, the status quo stays.
> > What counts as "duplicate keys"? I presume that you mean that two keys
> > count as duplicate if they hash to the same value, and are equal. But
> > you keep mentioning "literals" -- does this mean you care more about
> > whether they look the same rather than are the same?
>
> Correct. The errors that I am guessing matter the most are those for
> which folks copy-paste a key-value pair, where the key is a literal
> string, intending to change the key, and then forget to change it.
With respect, I think that is a harmful position to take. That leaves
the job half-done: the constructor will complain about *some*
duplicates, but not all, and worse, which ones it detects may depend on
the implementation you happen to be using!
If it is worth complaining about
{0: None, 0: None}
then it must also be worth complaining about:
{0: None, 0.0: None, int(): None, 1-1: None, len(""): None}
etc. Otherwise, I guarantee that somebody will be pulling their hair
out as to why Python only sometimes detects duplicate keys. Better to
never detect them at all (so you know you have to test for duplicates
yourself) than to give a false sense of security.
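For the record, every key in that second literal is equal to (and hashes the same as) plain 0, so at run time the dict quietly keeps just one entry:

```python
# All five expressions produce a key equal to 0 with the same hash,
# so the dict literal collapses to a single entry.
keys = [0, 0.0, int(), 1 - 1, len("")]
print(all(k == 0 and hash(k) == hash(0) for k in keys))  # True

d = {0: None, 0.0: None, int(): None, 1 - 1: None, len(""): None}
print(len(d))  # 1
```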
--
Steve