[Tutor] hash()ing a list
Orri Ganel
singingxduck at gmail.com
Mon Mar 28 01:12:54 CEST 2005
So, any class that has 'rich comparison methods' defined is
unhashable? What gives? (See '[Tutor] unhashable objects')
On Sun, 27 Mar 2005 12:38:09 -0800 (PST), Danny Yoo
<dyoo at hkn.eecs.berkeley.edu> wrote:
>
>
> On Sun, 27 Mar 2005, Orri Ganel wrote:
>
> > While I do not have a pressing need to hash a list, I am curious as to
> > why, if lists are unhashable, there is a __hash__() method in the list
> > class, which also does not work on lists, but results in a 'TypeError:
> > list objects are unhashable'. What's the point of including a
> > __hash__() method in the list class if lists are unhashable?
>
> [warning: my post is long. Sorry!]
>
> Hi Orri,
>
> This is an interesting question! Let's check something:
>
> ######
> >>> [].__hash__()
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: list objects are unhashable
> ######
>
> Yes, lists have a __hash__() function that isn't really meant to be
> called.
>
> Let's imagine that our list object didn't have a __hash__() function. If
> we try our expermient again, we can imagine that we might get something
> like this:
>
> ###### (thought experiment) --- not real Python!
> >>> [].__hash__()
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> AttributeError: 'list' object has no attribute '__hash__'
> ######
>
> The error message here, in our thought experiment, is slightly less
> informative: rather than knowing that lists aren't meant to be hashed, we
> can imagine that we'd get a more generic AttributeError about lists not
> having a __hash__() method.
>
> So one possible explanation is that lists have a __hash__() to make sure
> we get good error messages when we misuse them. *grin*
>
> I have to admit that I'm making all this up as a story: I'm not sure if
> "better error message" is the real reason that lists have a customized
> __hash__(), although, from looking at the list implementation, this seems
> reasonable.
>
> [some time passes with Danny fiddling with CPython]
>
> From my explorations below, my best story is that all Python objects must
> have a __hash__(), so list.__hash__ exists so that it errors out
> deliberately and early. I kept running notes of what I was trying, so if
> you're interested, read below.
>
> [The material below is for folks who are familiar with C, and goes into
> this in more rambling depth. Skip this if it starts looking weird.
> *grin*]
>
> In the C implementation of hash(), the implementation in Object.c appears
> to do the following:
>
> 1. Check to see if the object implements its own __hash__.
> 2. If it doesn't, check to see if it's allowed to just use the
> pointer address as a hash number --- basically check if the object
> type doesn't define an __eq__.
> 3. Finally, throw up its hands and just say "TypeError: unhashable type".
>
> (See PyObject_Hash() in Objects/object.c for the details)
>
> From Objects/typeobject.c, it appears that all the base types in Python
> get a default implementation of __hash__ under a certain set of
> conditions:
>
> /******/
> if (type->tp_flags & base->tp_flags & Py_TPFLAGS_HAVE_RICHCOMPARE)
> {
> if (type->tp_compare == NULL &&
> type->tp_richcompare == NULL &&
> type->tp_hash == NULL)
> {
> type->tp_compare = base->tp_compare;
> type->tp_richcompare = base->tp_richcompare;
> type->tp_hash = base->tp_hash;
> }
> }
> /******/
>
> but because the Python list type does define its own rich comparison
> operators, no default hash function should be set.
>
> If I go ahead and munge up listobject.c so that it no longer has a
> tp_hash:
>
> /******/
> mumak:~/Desktop/downloads/Python-2.4/Objects dyoo$ !diff
> diff -u listobject.c listobject.c2
> --- listobject.c Sun Mar 27 12:12:11 2005
> +++ listobject.c2 Sun Mar 27 12:11:52 2005
> @@ -2660,7 +2660,8 @@
> 0, /* tp_as_number */
> &list_as_sequence, /* tp_as_sequence */
> &list_as_mapping, /* tp_as_mapping */
> - list_nohash, /* tp_hash */
> + 0, /* tp_hash */
> + /* list_nohash, */ /* tp_hash */
> 0, /* tp_call */
> 0, /* tp_str */
> PyObject_GenericGetAttr, /* tp_getattro */
> /******/
>
> and recompile the interpreter, I'd expect that we'd end up with an object
> and who would respond to a hash() call with a "TypeError: unhashable type"
> error. Let's check that:
>
> ###### Warning: locally munged Python interpreter
> ######
> mumak:~/local/Python-2.4 dyoo$ bin/python
> Python 2.4 (#1, Mar 27 2005, 12:18:24)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1671)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> [].__hash__
> <method-wrapper object at 0x38ac30>
> >>> [].__hash__()
> 3720536
> >>> hash([])
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: unhashable type
> ######
>
> Weird. *grin* Ok, some of that was expected, and some of that was very
> unexpected. There must be some code path in the interpreter that still
> defines a default __hash__() somewhere, even if an implementation isn't
> given. But at least we do get an 'unhashable type' error if we use the
> builtin hash() function.
>
> So my best guess for Orri's question is: better error message. *grin*
>
> Anyway, I hope this helps!
>
>
--
Email: singingxduck AT gmail DOT com
AIM: singingxduck
Programming Python for the fun of it.
More information about the Tutor
mailing list