[Python-ideas] dictionary constructor should not allow duplicate keys

Nick Coghlan ncoghlan at gmail.com
Wed May 4 00:03:59 EDT 2016


On 4 May 2016 at 11:40, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
>> I'm intentionally not giving you the values of a, b or c, or telling
>> you what spam() returns. Now you have the same information available
>> to you as the compiler has at compile time. What do you intend to do?
>
> Since the dict created by that dict display happens at run time, I am
> suggesting that during the process of creating that dict that any keys,
> however generated or retrieved, that are duplicates of keys already in the
> dict, cause an appropriate error (to be determined).

I was curious as to whether or not it was technically feasible to
implement this "duplicate keys" check solely for dict displays in
CPython without impacting other dict use cases, and it turns out it
should be.

The key point is that BUILD_MAP already has its own PyDict_SetItem()
loop in ceval.c (iterating over stack entries), and hence doesn't rely
on the "dict_common_update" code shared by dict.update and the dict
constructor(s).

As such, the technical change being requested here (at least for
CPython), is that BUILD_MAP be updated to do a PyDict_Contains() check
prior to each key insertion, and:

1. Report a DeprecationWarning in 3.6
2. Report a [RuntimeError? ValueError?] in 3.7+

For code generators emitting dict displays, running their content
through OrderedDict (or an equivalent if the code generator is written
in something other than Python) prior to writing it out will be
sufficient to eliminate duplicates.

Since PyDict_Contains() is O(1), this shouldn't introduce a major
slowdown in dict display evaluation (although the impact should still
be measued prior to making any changes).

All-in-all, the concept gets a +0 from me, although to allow for
implementations that don't share CPython's distinction between dict
displays and the normal dict constructor, any such behaviour (if
introduced) should likely be flagged as an implementation detail that
may vary across interpreters (as I would have been -1 if BUILD_MAP's
implementation wasn't already independent of dict_common_update).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Python-ideas mailing list