[Python-Dev] More compact dictionaries with faster iteration

Wed Dec 12 23:09:37 CET 2012

On 12/12/2012 11:06 PM, Dag Sverre Seljebotn wrote:
> On 12/12/2012 10:31 PM, PJ Eby wrote:
>> On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> On 12/12/2012 01:15 AM, Nick Coghlan wrote:
>>>>
>>>> On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov at microsoft.com
>>>> <mailto:dinov at microsoft.com>> wrote:
>>>>
>>>>      OTOH changing certain dictionaries in IronPython (such as keyword
>>>>      args) to be
>>>>      ordered would certainly be possible.  Personally I just wouldn't
>>>>      want to see it
>>>>      be the default as that seems like unnecessary overhead when the
>>>>      specialized
>>>>      class exists.
>>>>
>>>>
>>>> Which reminds me, I was going to note that one of the main gains with
>>>> ordered keyword arguments, is their use in the construction of
>>>> string-keyed objects where you want to be able to control the order of
>>>> iteration (e.g. for serialisation or display purposes). Currently you
>>>> have to go the path of something like namedtuple where you define the
>>>> order of iteration in one operation, and set the values in another.
>>>
>>>
>>> So here's a brand new argument against ordered dicts: The existence of
>>> perfect hashing schemes. They fundamentally conflict with ordered dicts.
>>
>> If I understand your explanation, then they don't conflict with the
>> type of ordering described in this thread.  Raymond's optimization
>> separates the "hash table" part from the "contents" part of a
>> dictionary, and there is no requirement that these two parts be in the
>> same size or the same order.
>
> I don't fully agree.
>
> Perfect hashing already separates "hash table" from "contents" (sort
> of), and saves the memory in much the same way.

This was a bit inaccurate, but the point is: The perfect hash function 
doesn't need any fill-in to avoid collisions, you can (except in 
exceptional circumstances) fill the table 100%, so the memory is already 
saved.

Dag Sverre

>
> Yes, you can repeat the trick and have 2 levels of indirection, but that
> then requires an additional table of small ints which is pure overhead
> present for the sorting; in short, it's no longer an optimization but
> just overhead for the sortability.
>
> Dag Sverre
>
>>
>> Indeed, Raymond's split design lets you re-parameterize the hashing
>> all you want, without perturbing the iteration order at all.  You
>> would in fact be able to take a dictionary at any moment, and say,
>> "optimize the 'hash table' part to a non-colliding state based on the
>> current contents", without touching the 'contents' part at all.
>>
>> (One could do this at class creation time on a class dictionary, and
>> just after importing on a module dictionary, for example.)
>>
>