hash issues [WAS] Re: [Tutor] hash()ing a list

Brian van den Broek bvande at po-box.mcgill.ca
Tue Mar 29 10:59:16 CEST 2005


jfouhy at paradise.net.nz said unto the world upon 2005-03-29 03:14:
> Quoting Brian van den Broek <bvande at po-box.mcgill.ca>:
> 
> 
>>I had thought lookup was by hash value, and thus expected the access 
>>to some_dict to cause troubles. Yet it worked. Is it that lookup is by 
>>hash value, and then equality if need be so as to settle ambiguity, or 
>>have I completely misunderstood the mechanism of dictionary lookup?
> 
> 
> Hash functions are rarely perfect (in terms of mapping every input to a
> different output) [1], so they have to deal with collisions.
> 
> AFAIK, the hash table lookup algorithm basically works like this:
> 
>     def lookup(self, key):
>         hits = self.internalDataStructure[key]
>         for key_, value in hits:
>             if key_ == key:
>                 return value
>         raise KeyError, 'Key not found'.
> 
> (except that it will be written in C and there may be advanced trickery involved
> to improve efficiency in other ways)
> 
> HTH!


Thanks. The code snip puts into code what I meant by asking if "lookup 
is by hash value, and then equality if need be so as to settle 
ambiguity", but the code is clearer than the prose ;-)


> [1] The pigeonhole principle [2] tells us that it is rarely possible to make
> perfect hash functions.  For example, there are more possible strings than there
> are integers [3,4], so, in principle, you could find arbitrarily many strings
> all with the same hash value.  In that case, a dictionary with them as keys
> would be O(n), not O(1).  But a good hash function will make that very unlikely.
> [2] http://en.wikipedia.org/wiki/Pigeonhole_principle
> [3] Because every integer n has a corresponding string "n", but there are
> strings which do not (directly) correspond to an integer (eg, "foo").
> [4] Unless we allow integers to go to infinity... But computers generally do not
> have infinite memory :-)


I just about spat a beverage all over my keyboard when I read:

> there are more possible strings than there are integers

until I read the footnote and realized that you meant ints not 
integers (or, integer in the python sense, rather than in the 
mathematical sense).

I got tempted to start droning on about why I wanted to spit, but 
there is a limit to how off topic one can go ;-)  (For the curious or 
the insomniacs, due to some cool results in set-theory, if we had 
infinitely many integers in our computers, we could get a perfect hash 
function, not just for the strings or the integers, but for all 
recursively specifiable Python objects, in the same single hash function.)

Thanks for the reply,

Brian vdB



More information about the Tutor mailing list