hash issues [WAS] Re: [Tutor] hash()ing a list

jfouhy at paradise.net.nz jfouhy at paradise.net.nz
Tue Mar 29 10:14:36 CEST 2005

Quoting Brian van den Broek <bvande at po-box.mcgill.ca>:

> I had thought lookup was by hash value, and thus expected the access 
> to some_dict to cause troubles. Yet it worked. Is it that lookup is by 
> hash value, and then equality if need be so as to settle ambiguity, or 
> have I completely misunderstood the mechanism of dictionary lookup?

Hash functions are rarely perfect (in terms of mapping every input to a
different output) [1], so they have to deal with collisions.

AFAIK, the hash table lookup algorithm basically works like this:

    def lookup(self, key):
        hits = self.internalDataStructure[key]
        for key_, value in hits:
            if key_ == key:
                return value
        raise KeyError, 'Key not found'.

(except that it will be written in C and there may be advanced trickery involved
to improve efficiency in other ways)


[1] The pigeonhole principle [2] tells us that it is rarely possible to make
perfect hash functions.  For example, there are more possible strings than there
are integers [3,4], so, in principle, you could find arbitrarily many strings
all with the same hash value.  In that case, a dictionary with them as keys
would be O(n), not O(1).  But a good hash function will make that very unlikely.
[2] http://en.wikipedia.org/wiki/Pigeonhole_principle
[3] Because every integer n has a corresponding string "n", but there are
strings which do not (directly) correspond to an integer (eg, "foo").
[4] Unless we allow integers to go to infinity... But computers generally do not
have infinite memory :-)

More information about the Tutor mailing list