[Python-ideas] dict.hash - optimized per module

Sun Oct 17 11:41:34 CEST 2010

On Sun, 17 Oct 2010 06:27:37 pm Jan Koprowski wrote:

>   After watching I made graph, using presented at conference library
> dictinfo, for __builtin__.__dict__.
>   When I saw few collisions I think "Why this module doesn't have
> their own hashing function implementation which allow to avoid
> collision in this set of names?". 

Python 2.6 has 143 builtin names, and zero collisions:

>>> hashes = {}
>>> import __builtin__
>>> for name in __builtin__.__dict__:
...     h = hash(name)
...     if h in hashes: print "Collision for", name
...     L = hashes.setdefault(h, [])
...     L.append(name)
...
>>> len(hashes)
143
>>> filter(lambda x: len(x) > 1, hashes.values())
[]
>>> next(hashes.iteritems())
(29257728, ['bytearray'])

> My second think was "Why each 
> Python module doesn't have their own internal hashing function which
> doesn't produce collisions in scope of his names".

Firstly, the occasional collision doesn't matter much.

Secondly, your idea would mean that every module would need it's own 
custom-made hash function. Writing good hash functions is hard. The 
Python hash function is very, very good. Expecting developers to 
produce *dozens* of hash functions equally as good is totally 
impractical.

-- 
Steven D'Aprano