[Python-ideas] dictionary constructor should not allow duplicate keys
Steven D'Aprano
steve at pearwood.info
Tue May 3 21:09:20 EDT 2016
On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:
> The original problem description:
>
> lives_in = { 'lion': ['Africa', 'America'],
>              'parrot': ['Europe'],
>              #... 100+ more rows here
>              'lion': ['Europe'],
>              #... 100+ more rows here
>            }
>
> The above constructor overwrites the first 'lion' entry silently,
> often causing unexpected behavior.
Did your colleague really have 200+ items in the dict? No matter, I
suppose. The same principle applies.
When you have a significant amount of data in a dict (or any other data
structure, such as a list, tree, whatever), the programmer has to take
responsibility for the data validation. Not the compiler. Out of all the
possible errors, why is "duplicate key" so special? Your colleague could
have caused unexpected behaviour in many ways:
    lives_in = {
        # missed value
        'lion': ['Africa', 'America'],
        # misspelled value
        'parrot': ['Eruope'],
        # misspelled key
        'kangeroo': ['Australia'],
        # invalid key
        'kettle': ['Arctic'],
        # invalid value
        'aardvark': 'South Africa',
        # missed key
        # oops, forgot 'tiger' altogether
    }
Where was your colleague's data validation? I'm sorry that your
colleague lost a lot of time debugging this failure, but you might have
had exactly the same result from any of the above errors.
Unless somebody can demonstrate that "duplicate keys" is a systematic
and common failure among Python programmers, I think that it is
perfectly reasonable to put the onus of detecting duplicates on the
programmer, just like all those other data errors.
The data validation need not be a big burden. In my own code, unless the
dict is so small that I can easily see that it is correct with my own
eyes, I always follow it with an assertion:
    assert len(lives_in) == 250
which is a cheap test for at least some duplicate, missed or extra keys.
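
A minimal sketch of what that count check does and does not catch, using a three-row stand-in for the real table. The name EXPECTED_ROWS and the tiny tables are illustrative assumptions, not taken from the original code:

```python
# Three rows were typed, but 'lion' appears twice.
lives_in = {
    'lion': ['Africa', 'America'],
    'parrot': ['Europe'],
    'lion': ['Europe'],  # silently replaces the first 'lion' entry
}

EXPECTED_ROWS = 3  # hypothetical: the number of rows the author meant to enter

# The duplicate collapses two rows into one, so the count comes up short.
count_is_right = len(lives_in) == EXPECTED_ROWS

# A misspelled key leaves the count unchanged, so the same check misses it.
misspelled = {'lion': ['Africa'], 'parot': ['Europe'], 'tiger': ['Asia']}
misspelling_goes_unnoticed = len(misspelled) == EXPECTED_ROWS
```

In real code the comparison would sit inside an assert right after the dict, as in the line above; it is kept as a plain boolean here so both cases can be inspected.
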
But depending on your use-case, it may be that dict is the wrong data
structure to use, and you need something that will validate items as
they are entered. Unless your dict is small enough that you can see it
is correct by eye, you need some sort of check that your data is valid,
that you haven't forgotten keys, or misspelled them.
The dict constructor won't and can't do that for you, so you need to do
it yourself. Once you're doing that, then it is no extra effort to check
for duplicates.
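
As a rough sketch of that idea, assuming the data is written as a list of (key, value) pairs rather than as a dict display, one pass can check for duplicates along with the other errors listed earlier. The function name build_validated and the sets of allowed keys and values are hypothetical, chosen only for illustration:

```python
def build_validated(pairs, known_keys, known_values):
    """Build a dict from (key, value) pairs, validating each row as it is entered."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        if key not in known_keys:
            raise ValueError(f"invalid or misspelled key: {key!r}")
        if not isinstance(value, list):
            raise TypeError(f"value for {key!r} must be a list, got {type(value).__name__}")
        unknown = [item for item in value if item not in known_values]
        if unknown:
            raise ValueError(f"invalid or misspelled value(s) for {key!r}: {unknown!r}")
        result[key] = value
    # Every expected key must have been supplied.
    missing = set(known_keys) - set(result)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)!r}")
    return result


# Hypothetical reference data for the example.
ANIMALS = {'lion', 'parrot'}
CONTINENTS = {'Africa', 'America', 'Asia', 'Australia', 'Europe'}

lives_in = build_validated(
    [('lion', ['Africa', 'America']), ('parrot', ['Europe'])],
    ANIMALS,
    CONTINENTS,
)
```

The duplicate test is a single membership check inside a loop that has to exist anyway for the other validations. Note that this cannot catch a value that is valid but incomplete (the "missed value" case); that still needs a human or a second data source.
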
So unless you have a good answer to the question "Why are duplicate keys
so special that the dict constructor has to guard against them, when it
doesn't guard against all the other failures we have to check for?", I
think the status quo should stand.
There is one obvious answer:
Duplicate keys are special because, unlike the other errors, the dict
constructor CAN guard against them.
That would be a reasonable answer. But I'm not sure if it is special
*enough* to justify violating the Zen of Python ("Special cases aren't
special enough to break the rules"). That may be a matter of opinion
and taste.
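
For anyone who wants this particular check without changing the language, here is a hedged sketch of how repeated constant keys in a dict display could be found from the source text with the standard ast module. The function name duplicate_literal_keys is hypothetical, and the sketch assumes Python 3.8 or later, where the parser represents literal keys as ast.Constant nodes:

```python
import ast


def duplicate_literal_keys(source):
    """Return (key, line number) pairs for repeated constant keys in dict displays."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for "**mapping" unpacking; computed keys are skipped too,
            # since their values are not known until run time.
            if not isinstance(key, ast.Constant):
                continue
            if key.value in seen:
                duplicates.append((key.value, key.lineno))
            seen.add(key.value)
    return duplicates


example_source = (
    "lives_in = {\n"
    "    'lion': ['Africa', 'America'],\n"
    "    'parrot': ['Europe'],\n"
    "    'lion': ['Europe'],\n"
    "}\n"
)
found = duplicate_literal_keys(example_source)
```

This only sees keys written as literals, which is also the only case a compile-time check in the dict display could reasonably cover.
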
--
Steve