hash issues [WAS] Re: [Tutor] hash()ing a list

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Tue Mar 29 10:37:31 CEST 2005


> *Almost* all ints are fixed points for the hashing function in the
> sense that hash(some_int) == some_int. Almost all as:
>
>  >>> hash(-1)
> -2
>
> Any idea why -1 is the sole exception?

[warning: beginners, skip this.  Completely inconsequential CPython detail
ahead.]

Hi Brian,

Yeah, I remember this from studying the Python C implementation.  -1 is
used internally in the CPython implementation to represent that the hash
function failed in some bad way.  It is a pure CPython implementation
detail, and is completely not important. *grin*

There's a really brief mention of this in 'Objects/object.c' in the
comment:

/******/
/* Set of hash utility functions to help maintaining the invariant that
        if a==b then hash(a)==hash(b)

   All the utility functions (_Py_Hash*()) return "-1" to signify an
error.
*/
/******/

In terms of "essential" versus "accidental" properties, this is an
"accidental" one that we shouldn't worry too much about.


> I had thought lookup was by hash value, and thus expected the access to
> some_dict to cause troubles. Yet it worked. Is it that lookup is by hash
> value, and then equality if need be so as to settle ambiguity,

Yes, exactly!  That's why equality and hashing codes are tied together in
this kind of "if a==b then hash(a)==hash(b)" contract: there's bound to be
"collisions" in any hashing scheme.  So we have to do something about
collisions.  Equality is meant to settle things when two keys have the
same hash code.

I just did a quick Google search to see if someone else has written about
hash table theory, and trusty Wikipedia came up with a very nice article:

    http://en.wikipedia.org/wiki/Hash_table

If you have more questions, please feel free to ask.  Best of wishes to
you!



More information about the Tutor mailing list