[Tutor] hash()ing a list

Orri Ganel singingxduck at gmail.com
Mon Mar 28 01:12:54 CEST 2005


So, any class that has 'rich comparison methods' defined is
unhashable? What gives? (See '[Tutor] unhashable objects')


On Sun, 27 Mar 2005 12:38:09 -0800 (PST), Danny Yoo
<dyoo at hkn.eecs.berkeley.edu> wrote:
> 
> 
> On Sun, 27 Mar 2005, Orri Ganel wrote:
> 
> > While I do not have a pressing need to hash a list, I am curious as to
> > why, if lists are unhashable, there is a __hash__() method in the list
> > class, which also does not work on lists, but results in a 'TypeError:
> > list objects are unhashable'.  What's the point of including a
> > __hash__() method in the list class if lists are unhashable?
> 
> [warning: my post is long.  Sorry!]
> 
> Hi Orri,
> 
> This is an interesting question!  Let's check something:
> 
> ######
> >>> [].__hash__()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: list objects are unhashable
> ######
> 
> Yes, lists have a __hash__() function that isn't really meant to be
> called.
> 
> Let's imagine that our list object didn't have a __hash__() function.  If
> we try our expermient again, we can imagine that we might get something
> like this:
> 
> ###### (thought experiment) --- not real Python!
> >>> [].__hash__()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'list' object has no attribute '__hash__'
> ######
> 
> The error message here, in our thought experiment, is slightly less
> informative: rather than knowing that lists aren't meant to be hashed, we
> can imagine that we'd get a more generic AttributeError about lists not
> having a __hash__() method.
> 
> So one possible explanation is that lists have a __hash__() to make sure
> we get good error messages when we misuse them.  *grin*
> 
> I have to admit that I'm making all this up as a story: I'm not sure if
> "better error message" is the real reason that lists have a customized
> __hash__(), although, from looking at the list implementation, this seems
> reasonable.
> 
> [some time passes with Danny fiddling with CPython]
> 
> From my explorations below, my best story is that all Python objects must
> have a __hash__(), so list.__hash__ exists so that it errors out
> deliberately and early.  I kept running notes of what I was trying, so if
> you're interested, read below.
> 
> [The material below is for folks who are familiar with C, and goes into
> this in more rambling depth.  Skip this if it starts looking weird.
> *grin*]
> 
> In the C implementation of hash(), the implementation in Object.c appears
> to do the following:
> 
>     1.  Check to see if the object implements its own __hash__.
>     2.  If it doesn't, check to see if it's allowed to just use the
>         pointer address as a hash number --- basically check if the object
>         type doesn't define an __eq__.
>     3.  Finally, throw up its hands and just say "TypeError: unhashable type".
> 
> (See PyObject_Hash() in Objects/object.c for the details)
> 
> From Objects/typeobject.c, it appears that all the base types in Python
> get a default implementation of __hash__ under a certain set of
> conditions:
> 
> /******/
>         if (type->tp_flags & base->tp_flags & Py_TPFLAGS_HAVE_RICHCOMPARE)
> {
>                 if (type->tp_compare == NULL &&
>                     type->tp_richcompare == NULL &&
>                     type->tp_hash == NULL)
>                 {
>                         type->tp_compare = base->tp_compare;
>                         type->tp_richcompare = base->tp_richcompare;
>                         type->tp_hash = base->tp_hash;
>                 }
>         }
> /******/
> 
> but because the Python list type does define its own rich comparison
> operators, no default hash function should be set.
> 
> If I go ahead and munge up listobject.c so that it no longer has a
> tp_hash:
> 
> /******/
> mumak:~/Desktop/downloads/Python-2.4/Objects dyoo$ !diff
> diff -u listobject.c listobject.c2
> --- listobject.c        Sun Mar 27 12:12:11 2005
> +++ listobject.c2       Sun Mar 27 12:11:52 2005
> @@ -2660,7 +2660,8 @@
>         0,                                      /* tp_as_number */
>         &list_as_sequence,                      /* tp_as_sequence */
>         &list_as_mapping,                       /* tp_as_mapping */
> -       list_nohash,                            /* tp_hash */
> +       0,                                      /* tp_hash */
> +       /*      list_nohash, */                 /* tp_hash */
>         0,                                      /* tp_call */
>         0,                                      /* tp_str */
>         PyObject_GenericGetAttr,                /* tp_getattro */
> /******/
> 
> and recompile the interpreter, I'd expect that we'd end up with an object
> and who would respond to a hash() call with a "TypeError: unhashable type"
> error.  Let's check that:
> 
> ###### Warning: locally munged Python interpreter
> ######
> mumak:~/local/Python-2.4 dyoo$ bin/python
> Python 2.4 (#1, Mar 27 2005, 12:18:24)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1671)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> [].__hash__
> <method-wrapper object at 0x38ac30>
> >>> [].__hash__()
> 3720536
> >>> hash([])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: unhashable type
> ######
> 
> Weird.  *grin* Ok, some of that was expected, and some of that was very
> unexpected.  There must be some code path in the interpreter that still
> defines a default __hash__() somewhere, even if an implementation isn't
> given.  But at least we do get an 'unhashable type' error if we use the
> builtin hash() function.
> 
> So my best guess for Orri's question is: better error message.  *grin*
> 
> Anyway, I hope this helps!
> 
> 


-- 
Email: singingxduck AT gmail DOT com
AIM: singingxduck
Programming Python for the fun of it.


More information about the Tutor mailing list