hash issues [WAS] Re: [Tutor] hash()ing a list
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Tue Mar 29 10:37:31 CEST 2005
> *Almost* all ints are fixed points for the hashing function in the
> sense that hash(some_int) == some_int. Almost all as:
>
> >>> hash(-1)
> -2
>
> Any idea why -1 is the sole exception?
[warning: beginners, skip this. Completely inconsequential CPython detail
ahead.]
Hi Brian,
Yeah, I remember this from studying the Python C implementation. -1 is
used internally in the CPython implementation to represent that the hash
function failed in some bad way. It is a pure CPython implementation
detail, and is completely not important. *grin*
There's a really brief mention of this in 'Objects/object.c' in the
comment:
/******/
/* Set of hash utility functions to help maintaining the invariant that
if a==b then hash(a)==hash(b)
All the utility functions (_Py_Hash*()) return "-1" to signify an
error.
*/
/******/
In terms of "essential" versus "accidental" properties, this is an
"accidental" one that we shouldn't worry too much about.
> I had thought lookup was by hash value, and thus expected the access to
> some_dict to cause troubles. Yet it worked. Is it that lookup is by hash
> value, and then equality if need be so as to settle ambiguity,
Yes, exactly! That's why equality and hashing codes are tied together in
this kind of "if a==b then hash(a)==hash(b)" contract: there's bound to be
"collisions" in any hashing scheme. So we have to do something about
collisions. Equality is meant to settle things when two keys have the
same hash code.
I just did a quick Google search to see if someone else has written about
hash table theory, and trusty Wikipedia came up with a very nice article:
http://en.wikipedia.org/wiki/Hash_table
If you have more questions, please feel free to ask. Best of wishes to
you!
More information about the Tutor
mailing list