[Python-ideas] dictionary constructor should not allow duplicate keys

Wed May 4 22:38:05 EDT 2016


On 05/05/2016 02:36, Terry Reedy wrote:
> On 5/4/2016 12:03 AM, Nick Coghlan wrote:
>> On 4 May 2016 at 11:40, Ethan Furman <ethan at stoneleaf.us> wrote:
>>> On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
>>>> I'm intentionally not giving you the values of a, b or c, or telling
>>>> you what spam() returns. Now you have the same information available
>>>> to you as the compiler has at compile time. What do you intend to do?
>>>
>>> Since the dict created by that dict display is built at run time, I am
>>> suggesting that, while that dict is being created, any keys, however
>>> generated or retrieved, that duplicate keys already in the dict cause
>>> an appropriate error (to be determined).
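Ethan's proposal can be sketched as a pure-Python builder that checks each key as it is inserted. This is a minimal sketch, not CPython's implementation: `build_map_strict` is a hypothetical name, and raising `ValueError` is an assumed choice, since the proposal leaves the error "to be determined". Steven's point still holds: `a` and `b` below are only known to collide once they have been evaluated.

```python
from typing import Any, Hashable, Iterable, Tuple


def build_map_strict(pairs: Iterable[Tuple[Hashable, Any]]) -> dict:
    """Build a dict from (key, value) pairs, refusing duplicate keys.

    Hypothetical helper: ValueError is an assumed choice, because the
    proposal leaves the exact error "to be determined".
    """
    result: dict = {}
    for key, value in pairs:
        # The keys are only known once evaluated, so the duplicate check
        # happens here, at run time, rather than in the compiler.
        if key in result:
            raise ValueError(f"duplicate key in dict display: {key!r}")
        result[key] = value
    return result


a, b = "x", "x"  # two names that happen to hold equal keys
try:
    build_map_strict([(a, 1), (b, 2)])
except ValueError as exc:
    print(exc)  # duplicate key in dict display: 'x'
```

Because the check uses ordinary `in` on the partially built dict, keys that compare equal (such as `1` and `True`) would also be rejected, matching how dicts already treat them.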
>>
>> I was curious as to whether or not it was technically feasible to
>> implement this "duplicate keys" check solely for dict displays in
>> CPython without impacting other dict use cases, and it turns out it
>> should be.
>>
>> The key point is that BUILD_MAP already has its own PyDict_SetItem()
>> loop in ceval.c (iterating over stack entries), and hence doesn't rely
>> on the "dict_common_update" code shared by dict.update and the dict
>> constructor(s).
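Nick's point that dict displays take their own path, separate from the `dict()` constructor, can be observed with the standard `dis` module. This sketch assumes a CPython whose compiler emits `BUILD_MAP` for a display with non-constant keys; opcode names are a CPython implementation detail and may differ between releases. The names `a` and `b` are only compiled, never evaluated.

```python
import dis

# Non-constant keys, so the compiler cannot treat the display as a
# constant-key map; the exact opcodes are CPython-version specific.
display = compile("{a: 1, b: 2}", "<display>", "eval")
call = compile("dict([(a, 1), (b, 2)])", "<call>", "eval")


def opnames(code):
    """Return the opcode names in a code object, in order."""
    return [ins.opname for ins in dis.get_instructions(code)]


# The display is built by BUILD_MAP inside the eval loop...
print("BUILD_MAP" in opnames(display))
# ...while dict(...) is an ordinary call into the dict type.
print("BUILD_MAP" in opnames(call))
```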
>
> Changing only BUILD_MAP would invalidate current code equivalences and 
> currently sensible optimizations.  A toy example:
>
> >>> d1 = {'a': 'a1'}
> >>> d2 = {f(): 'a2'}
> >>> d1.update(d2)
> >>> d1
> {'a': 'a2'}
>
> Sensible and comprehensible code transformation rules are important.
> Currently, the following rather trivial optimization of the above works.
>
> >>> d1 = {'a': 'a1', f(): 'a2'}
> >>> d1
> {'a': 'a2'}
>
> The special rule for dict displays would invalidate this.
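Terry's equivalence can be checked directly. Here `f` is a hypothetical stand-in that returns `'a'`, which is what the toy example implies; the point is that the compiler cannot know this when it builds the display.

```python
def f():
    # Hypothetical stand-in for the f() in the toy example: a call whose
    # result ('a') is only known at run time.
    return 'a'


# Two-step form: build a dict, then update it from a second display.
d1 = {'a': 'a1'}
d2 = {f(): 'a2'}
d1.update(d2)

# "Optimized" single display: later entries overwrite earlier ones.
d3 = {'a': 'a1', f(): 'a2'}

print(d1 == d3 == {'a': 'a2'})  # True
```

A display-only duplicate check would make the second form raise while the first keeps working, which is the broken equivalence described above.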
>
> In my post yesterday in response to Luigi (where I began 'The
> thread...'), I gave four other equivalent ways to initialize a dict
> using a Python loop (including a dict comprehension).  Using a dict
> display amounts to unrolling any of the loops and replacing the Python
> loop with the C loop buried in ceval.  Changing the operation of just
> that loop would break the current equivalence.
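The four loop-based forms from that earlier post are not reproduced here, so the ones below are assumed, plausible examples; the last one is the "unrolled" display of the same pairs. Under current semantics all of them give the same result, with the last value for a repeated key winning, which is the equivalence a display-only check would break.

```python
# Hypothetical input with a repeated key.
pairs = [('a', 'a1'), ('a', 'a2')]

# 1. Explicit loop with item assignment.
d_loop = {}
for k, v in pairs:
    d_loop[k] = v

# 2. The dict constructor.
d_ctor = dict(pairs)

# 3. dict.update on an empty dict.
d_upd = {}
d_upd.update(pairs)

# 4. A dict comprehension.
d_comp = {k: v for k, v in pairs}

# 5. The same pairs "unrolled" into a dict display.
(k1, v1), (k2, v2) = pairs
d_disp = {k1: v1, k2: v2}

results = [d_loop, d_ctor, d_upd, d_comp, d_disp]
print(all(d == {'a': 'a2'} for d in results))  # True
```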
>
> I suspect that the proposed change would introduce more bugs than it 
> exposes.

The OP mentioned a real-life use case (even if he didn't explicitly
reproduce it, which is understandable if it was very long) where a
warning or error would have aided debugging.
I find this case realistic.
Can anyone produce a single real-life use case where repeated literal 
keys needed to be accepted without a warning or error?
Here I'm throwing down the gauntlet to those who theorise about what 
(breakable) code *might* be out in the wild, and asking them to produce 
just one real-life case.
Rob Cliffe
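The challenge above is specifically about repeated *literal* keys, which, unlike the computed keys discussed earlier in the thread, can be found without running any code. A minimal lint-style sketch using the standard `ast` module, assuming Python 3.8+ for `ast.Constant`; `repeated_literal_keys` is a hypothetical helper, not an existing tool.

```python
import ast
from typing import List, Tuple


def repeated_literal_keys(source: str) -> List[Tuple[int, object]]:
    """Return (line, key) for each literal key repeated within one dict display.

    Only keys written as constants are compared, because computed keys
    (names, calls) cannot be known without running the code.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for ``**mapping`` entries; skip those and any
            # key that is not a literal constant.
            if not isinstance(key, ast.Constant):
                continue
            if key.value in seen:
                found.append((key.lineno, key.value))
            seen.add(key.value)
    return found


src = "d = {'a': 1, 'b': 2, 'a': 3}\ne = {f(): 1, f(): 2}\n"
print(repeated_literal_keys(src))  # [(1, 'a')]
```

The second display is not reported: its keys are calls, so whether they collide is exactly the run-time question raised at the start of the thread.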