[Tutor] hash()ing a list

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Sun Mar 27 22:38:09 CEST 2005



On Sun, 27 Mar 2005, Orri Ganel wrote:

> While I do not have a pressing need to hash a list, I am curious as to
> why, if lists are unhashable, there is a __hash__() method in the list
> class, which also does not work on lists, but results in a 'TypeError:
> list objects are unhashable'.  What's the point of including a
> __hash__() method in the list class if lists are unhashable?

[warning: my post is long.  Sorry!]

Hi Orri,

This is an interesting question!  Let's check something:

######
>>> [].__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: list objects are unhashable
######

Yes, lists have a __hash__() function that isn't really meant to be
called.


Let's imagine that our list object didn't have a __hash__() function.  If
we try our expermient again, we can imagine that we might get something
like this:

###### (thought experiment) --- not real Python!
>>> [].__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'list' object has no attribute '__hash__'
######

The error message here, in our thought experiment, is slightly less
informative: rather than knowing that lists aren't meant to be hashed, we
can imagine that we'd get a more generic AttributeError about lists not
having a __hash__() method.

So one possible explanation is that lists have a __hash__() to make sure
we get good error messages when we misuse them.  *grin*


I have to admit that I'm making all this up as a story: I'm not sure if
"better error message" is the real reason that lists have a customized
__hash__(), although, from looking at the list implementation, this seems
reasonable.

[some time passes with Danny fiddling with CPython]

>From my explorations below, my best story is that all Python objects must
have a __hash__(), so list.__hash__ exists so that it errors out
deliberately and early.  I kept running notes of what I was trying, so if
you're interested, read below.



[The material below is for folks who are familiar with C, and goes into
this in more rambling depth.  Skip this if it starts looking weird.
*grin*]

In the C implementation of hash(), the implementation in Object.c appears
to do the following:

    1.  Check to see if the object implements its own __hash__.
    2.  If it doesn't, check to see if it's allowed to just use the
        pointer address as a hash number --- basically check if the object
        type doesn't define an __eq__.
    3.  Finally, throw up its hands and just say "TypeError: unhashable type".

(See PyObject_Hash() in Objects/object.c for the details)

>From Objects/typeobject.c, it appears that all the base types in Python
get a default implementation of __hash__ under a certain set of
conditions:

/******/
        if (type->tp_flags & base->tp_flags & Py_TPFLAGS_HAVE_RICHCOMPARE)
{
                if (type->tp_compare == NULL &&
                    type->tp_richcompare == NULL &&
                    type->tp_hash == NULL)
                {
                        type->tp_compare = base->tp_compare;
                        type->tp_richcompare = base->tp_richcompare;
                        type->tp_hash = base->tp_hash;
                }
        }
/******/

but because the Python list type does define its own rich comparison
operators, no default hash function should be set.

If I go ahead and munge up listobject.c so that it no longer has a
tp_hash:

/******/
mumak:~/Desktop/downloads/Python-2.4/Objects dyoo$ !diff
diff -u listobject.c listobject.c2
--- listobject.c        Sun Mar 27 12:12:11 2005
+++ listobject.c2       Sun Mar 27 12:11:52 2005
@@ -2660,7 +2660,8 @@
        0,                                      /* tp_as_number */
        &list_as_sequence,                      /* tp_as_sequence */
        &list_as_mapping,                       /* tp_as_mapping */
-       list_nohash,                            /* tp_hash */
+       0,                                      /* tp_hash */
+       /*      list_nohash, */                 /* tp_hash */
        0,                                      /* tp_call */
        0,                                      /* tp_str */
        PyObject_GenericGetAttr,                /* tp_getattro */
/******/

and recompile the interpreter, I'd expect that we'd end up with an object
and who would respond to a hash() call with a "TypeError: unhashable type"
error.  Let's check that:

###### Warning: locally munged Python interpreter
######
mumak:~/local/Python-2.4 dyoo$ bin/python
Python 2.4 (#1, Mar 27 2005, 12:18:24)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1671)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> [].__hash__
<method-wrapper object at 0x38ac30>
>>> [].__hash__()
3720536
>>> hash([])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unhashable type
######

Weird.  *grin* Ok, some of that was expected, and some of that was very
unexpected.  There must be some code path in the interpreter that still
defines a default __hash__() somewhere, even if an implementation isn't
given.  But at least we do get an 'unhashable type' error if we use the
builtin hash() function.

So my best guess for Orri's question is: better error message.  *grin*


Anyway, I hope this helps!



More information about the Tutor mailing list