[Python-ideas] dictionary constructor should not allow duplicate keys

Terry Reedy tjreedy at udel.edu
Wed May 4 21:36:05 EDT 2016


On 5/4/2016 12:03 AM, Nick Coghlan wrote:
> On 4 May 2016 at 11:40, Ethan Furman <ethan at stoneleaf.us> wrote:
>> On 05/03/2016 05:23 PM, Steven D'Aprano wrote:
>>> I'm intentionally not giving you the values of a, b or c, or telling
>>> you what spam() returns. Now you have the same information available
>>> to you as the compiler has at compile time. What do you intend to do?
>>
>> Since the dict created by that dict display happens at run time, I am
>> suggesting that during the process of creating that dict that any keys,
>> however generated or retrieved, that are duplicates of keys already in the
>> dict, cause an appropriate error (to be determined).
>
> I was curious as to whether or not it was technically feasible to
> implement this "duplicate keys" check solely for dict displays in
> CPython without impacting other dict use cases, and it turns out it
> should be.
>
> The key point is that BUILD_MAP already has its own PyDict_SetItem()
> loop in ceval.c (iterating over stack entries), and hence doesn't rely
> on the "dict_common_update" code shared by dict.update and the dict
> constructor(s).

Changing only BUILD_MAP would invalidate current code equivalences and 
currently sensible optimizations.  A toy example:

 >>> def f(): return 'a'   # assumed for this example: f() returns an existing key
 ...
 >>> d1 = {'a': 'a1'}
 >>> d2 = {f(): 'a2'}
 >>> d1.update(d2)
 >>> d1
{'a': 'a2'}

Sensible and comprehensible code transformation rules are important.
Currently, the following rather trivial optimization of the above works.

 >>> d1 = {'a': 'a1', f(): 'a2'}
 >>> d1
{'a': 'a2'}

The special rule for dict displays would invalidate this.
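
To make the breakage concrete, here is a minimal sketch that emulates the proposed rule in pure Python. The helper strict_display is hypothetical (it stands in for a BUILD_MAP that rejects duplicates), the choice of ValueError is an assumption since the error was left "to be determined", and f is assumed to return 'a' as the toy example requires.

```python
def f():
    # Assumed for illustration: f() yields a key that is already present.
    return 'a'

def strict_display(*pairs):
    """Hypothetical stand-in for a dict display that rejects duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            # The error type is an assumption; the proposal left it open.
            raise ValueError(f"duplicate key in dict display: {key!r}")
        result[key] = value
    return result

# Two-step form: each display has one key, so the proposed rule never fires.
d1 = strict_display(('a', 'a1'))
d1.update(strict_display((f(), 'a2')))

# Merged form: equal to the two-step form today...
current = {'a': 'a1', f(): 'a2'}

# ...but an error under the proposed rule.
try:
    merged = strict_display(('a', 'a1'), (f(), 'a2'))
except ValueError as exc:
    merged = exc
```

With current semantics both forms give {'a': 'a2'}; under the emulated rule the merged form raises instead, so the rewrite is no longer safe.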

In my post yesterday in response to Luigi (where I began 'The
thread...'), I gave 4 other equivalent ways to initialize a dict using a
Python loop (including a dict comprehension).  Using a dict display
amounts to unrolling any of those loops and replacing the Python loop
with the C loop buried in ceval.  Changing the operation of just that
loop would break the current equivalence.
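
The earlier post is not reproduced here, so the following is only an illustrative sketch of loop forms of this kind, not necessarily the four given there. The list pairs and the variable names are made up for the example. Each form lets the last value for a repeated key win, and so does the unrolled display.

```python
# Illustrative data: the key 'a' appears twice.
pairs = [('a', 'a1'), ('b', 'b1'), ('a', 'a2')]

# 1. Explicit loop with item assignment.
d_loop = {}
for k, v in pairs:
    d_loop[k] = v

# 2. Explicit loop with dict.update().
d_update = {}
for k, v in pairs:
    d_update.update({k: v})

# 3. dict constructor fed by an iterable of pairs.
d_ctor = dict(pairs)

# 4. Dict comprehension.
d_comp = {k: v for k, v in pairs}

# The same loop unrolled into a dict display; currently legal,
# and the later 'a' entry overwrites the earlier one.
d_display = {'a': 'a1', 'b': 'b1', 'a': 'a2'}
```

All five currently produce {'a': 'a2', 'b': 'b1'}. A duplicate check added only to the display would make the last form an error while leaving the other four unchanged.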

I suspect that the proposed change would introduce more bugs than it 
exposes.

-- 
Terry Jan Reedy


