[Python-Dev] More compact dictionaries with faster iteration

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Dec 12 21:37:08 CET 2012

On 12/12/2012 01:15 AM, Nick Coghlan wrote:
> On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov at microsoft.com
> <mailto:dinov at microsoft.com>> wrote:
>     OTOH changing certain dictionaries in IronPython (such as keyword
>     args) to be
>     ordered would certainly be possible.  Personally I just wouldn't
>     want to see it
>     be the default as that seems like unnecessary overhead when the
>     specialized
>     class exists.
> Which reminds me, I was going to note that one of the main gains with
> ordered keyword arguments, is their use in the construction of
> string-keyed objects where you want to be able to control the order of
> iteration (e.g. for serialisation or display purposes). Currently you
> have to go the path of something like namedtuple where you define the
> order of iteration in one operation, and set the values in another.

So here's a brand new argument against ordered dicts: The existence of 
perfect hashing schemes. They fundamentally conflict with ordered dicts.

I played with using them for vtable dispatches in Cython this summer, 
and they can perform really, really well for branch-predicted lookups in 
hot loops, because you always/nearly always eliminate linear probing and 
so there's no branch misses or extra comparisons. (The overhead of a 
perfect hash table lookup over a traditional vtable lookup was only a 
couple of cycles in my highly artificial fully branch-predicted 

There's some overhead in setup; IIRC, ~20 microseconds for 64 elements, 
2 GHz CPU, though that was a first prototype implementation and both 
algorithmic improvements and tuning should be possible.

So it's not useful for everything, but perhaps for things like module 
dictionaries and classes an "optionally perfect" dict can make sense.

Note: I'm NOT suggesting the use of perfect hashing, just making sure 
it's existence is mentioned and that one is aware that if always-ordered 
dicts become the language standard it precludes this option far off in 
the future.

(Something like a sort() method could still work and make the dict 
"unperfect"; one could also have a pack() method that made the dict 
perfect again.).

That concludes the on-topic parts of my post.

Dag Sverre Seljebotn


Going off-topic for those who are interested, here's the longwinded and 
ugly details. My code [1] is based on the paper [2]  (psuedo-code in 
Appendix A), but I adapted it a bit to be useful for tens/hundreds of 
elements rather than billions.

The ingredients:

  1) You need the hash to be 32 bits (or 64) of good entropy (md5 or 
murmurhash or similar). (Yes, that's a tall order for CPython, I'm just 
describing the scheme.) (If the hash collides on all bits you *will* 
collide, so some fallback is still necesarry, just unlikely.)

  2) To lookup, the idea is (psuedo-code!)

typedef struct {
     int m_f m_g, r, k;
     int16_t d[k]; /* "small" int, like current proposal */
} table_header_t;

And then one computes index of an element with hash "h" using the function

((h >> tab->r) & tab->m_f) ^ tab->d[h & tab->m_g]

rather than the usual "h % n". While more arithmetic, arithmetic is 
cheap and branch misses are not.

  3) To set up/repack a table one needs to find the parameters. The 
general idea is:

  a) Partition the hashes into k bins by using "h & m_g". There will be 
collisions, but the number of bins with many collisions will be very 
small; most bins will have 2 or 1 or 0 elements.

  b) Starting with the largest bin, distribute the elements according to
the hash function. If a bin collides with the existing contents, try 
another value for d[binindex] until it doesn't.

The r parameter let's you try again 32 (or 64) times to find a solution. 
In my testcases there was ~0.1% chance of not finding a solution (that 
is, exhausting possible choices of r) with 64-bit hashes with 4 or 8 
elements and no empty table elements. For any other number of elements, 
or with some empty elements, the chance of failure was much lower.)

[1] It's not exactly a great demo, but it contains the algorithm. If 
there's much interest I should clean it up and make a proper benchmark 
demo out of it:


[2] Pagh (1999)


More information about the Python-Dev mailing list