[Python-Dev] More compact dictionaries with faster iteration

Thu Dec 13 06:11:23 CET 2012

On Wed, Dec 12, 2012 at 5:06 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> Perfect hashing already separates "hash table" from "contents" (sort of),
> and saves the memory in much the same way.
>
> Yes, you can repeat the trick and have 2 levels of indirection, but that
> then requires an additional table of small ints which is pure overhead
> present for the sorting; in short, it's no longer an optimization but just
> overhead for the sortability.

I'm confused.  I understood your algorithm to require repacking,
rather than it being a suitable algorithm for incremental change to an
existing dictionary.  ISTM that that would mean you still pay some
sort of overhead (either in time or space) while the dictionary is
still being mutated.

Also, I'm not sure how 2 levels of indirection come into it.   The
algorithm you describe has, as I understand it, only up to 12
perturbation values ("bins"), so it's a constant space overhead, not a
variable one.  What's more, you can possibly avoid the extra memory
access by using a different perfect hashing algorithm, at the cost of
a slower optimization step or using a little more memory.

> Note: I'm NOT suggesting the use of perfect hashing, just making
> sure it's existence is mentioned and that one is aware that if
> always-ordered dicts become the language standard it precludes
> this option far off in the future.

Not really.  It means that some forms of perfect hashing might require
adding a few more ints worth of overhead for the dictionaries that use
it.  If it's really a performance benefit for very-frequently-used
dictionaries, that might still be worthwhile.