dictionary keys, __hash__, __cmp__

Jan-Erik Meyer-Lütgens python at meyer-luetgens.de
Wed Nov 5 05:55:43 EST 2003


Miika Keskinen wrote:
> On Tue, 04 Nov 2003 21:52:42 +0100, Jan-Erik Meyer-Lütgens wrote:
>  
>>In the Python Language Reference, I found the following statements about
>>using objects as dictionary keys:
>>
>>    1. "__hash__() should return a 32-bit integer."
>>
>>    2. "The only required property is that objects which
>>        compare equal have the same hash value."
>>
>>    3. "If a class does not define a __cmp__() method it
>>        should not define a __hash__() operation either."
>>
>>
>>Can I assume that:
>> 
>>  -- keys are interchangeable (equivalent),
>>     if the following is valid:
>>
>>         hash(key1) == hash(key2) and key1 == key2
> 
> Yes. Note that key1 == key2 implies hash(key1) == hash(key2).

... only if I define __cmp__() and __hash__() appropriately
(that is, without breaking the 2nd rule).
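For reference, a minimal sketch of what "appropriately" could look like: equality and hash are derived from the same data, so equal objects always hash alike. The class name Point and its fields are made up for illustration, and the sketch uses __eq__() instead of __cmp__() (an assumption, so that it also runs on Python versions without __cmp__()).

```python
class Point:
    """Hypothetical key type: equality and hash both use (x, y)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Derived from the same fields as __eq__, so the 2nd rule holds.
        return hash((self.x, self.y))


p1 = Point(1, 2)
p2 = Point(1, 2)

table = {p1: "first"}
table[p2] = "second"  # p2 equals p1 and hashes alike: same dictionary entry
```

Here p1 and p2 are interchangeable as keys, so the dictionary ends up with a single entry.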


>>  -- I can ignore the 2nd statement, if I am aware of
>>     the fact that: if objects are equal it doesn't mean that they are
>>     the same key.
> 
> 
> So you're introducing a scenario where different objects are considered
> equal by means of __cmp__ while having different hashes. I think that's not
> normal.
> 
That's the point. I know that this is not normal. But is it possible?

>>  -- I can safely ignore the 3rd statement, because python
>>     falls back to cmp(id(obj1), id(obj2)), if __cmp__() is not defined.
> 
> 
> Yes. id(obj1) != id(obj2), so obj1 != obj2. The only requirement left is that
> __hash__() returns a 32-bit integer. Personally I would emphasize the words
> SHOULD NOT in the third rule. I'm sure there are situations where it's
> perfectly normal to use id values and custom hashes. Anyway, you can redefine
> __cmp__() simply as follows (and thus avoid breaking the third rule):
> 
> def __cmp__(self, other):
> 	return id(self).__cmp__(id(other))
> 
> 

But I want __cmp__() for a different comparison. As a real-world example,
I have a file object which should be stored in a dictionary. __cmp__()
compares the contents of two files. Thus I must define the __hash__()
method. I use id(obj) as the hash function.

So I'm breaking the rule: objects which compare equal have the same hash.

I use the file objects as dictionary keys and it seems to work.
My assumption is that keys are interchangeable (equivalent) if
hash(key1) == hash(key2) and key1 == key2. In my example, keys are
equivalent only when they are identical, i.e. when
id(file_object1) == id(file_object2) and file1 has the same contents
as file2.
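
To make the setup concrete, here is a minimal sketch of what I mean. FileKey is a hypothetical stand-in for my file class: a plain string attribute replaces the real file contents, and __eq__() stands in for the content-comparing __cmp__() (an assumption, so that the sketch also runs on Python versions without __cmp__()).

```python
class FileKey:
    """Hypothetical stand-in for the file object described above."""

    def __init__(self, contents):
        self.contents = contents  # placeholder for real file contents

    def __eq__(self, other):
        # Equality by contents, as the content-comparing method would do.
        if not isinstance(other, FileKey):
            return NotImplemented
        return self.contents == other.contents

    def __hash__(self):
        # Identity-based hash: deliberately breaks the 2nd rule.
        return id(self)


f1 = FileKey("same text")
f2 = FileKey("same text")

table = {f1: "entry for f1"}
found_same = f1 in table    # lookup with the identical object
found_equal = f2 in table   # lookup with an equal but distinct object
table[f2] = "entry for f2"  # stored as a second, separate key
```

In this sketch f1 and f2 compare equal but are kept as two separate keys, and a lookup with f2 does not find the entry stored under f1.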

Is this assumption valid? Or is there some subtlety in the
Python implementation that sometimes breaks things, resulting in
the deletion of important files :-)

-- 
Jan-Erik




