keying by identity in dict and set
Steve White
stevan.white at gmail.com
Sat Oct 19 07:31:24 EDT 2019
Hi,
I have an application that would benefit from object instances
distinguished by identity being used in dict's and set's. To do this,
the __hash__ method must be overridden, the obvious return value being
the instance's id.
This works flawlessly in extensive tests on several platforms, and on
a couple of different Python versions and implementations.
The documentation seems to preclude a second requirement, however.
I also want to use the == operator on these objects to mean a natural
comparison of values, different from identity, so that two instances
comparing equivalent does not imply that they are identical.
But the documentation for __hash__ has:
"The only required property is that objects which compare equal
have the same hash value"
Yet it seems something more is going on, because as near as I can
tell, it just works, perfectly, every time, despite this requirement.
If the the __hash__ method returns the object's id, the __eq__ method
of the *never* called when the object is added to a dict. (See
examples below.)
It would appear that if __hash__ returns the id, then that id is used
internally as the key, and since the id is by definition unique, no
key collision ever occurs -- at least in every Python implementation
I've tried. It also seems that, for a class instance obj,
hash( hash( obj ) ) == hash( obj )
hash( id( obj ) ) == id( obj )
These are very strong and useful properties. Where are they documented?
It looks like the existing Python built-in containers do exactly what
I need, but the documentation suggests that they may not.
Taking the documentation at face value, I'm in the position of
choosing between abandoning the natural use of operator '==', or of
introducing an unfamiliar container implementation. As my package is
intended for the use by other people, neither option is attractive.
Questions:
Is all this apparent behaviour documented somewhere that I have missed?
Is there some fatal reason that this approach must never be used,
besides the lack of documentary support for it?
Is it in fact safe to assume that __eq__ is never called under these
circumstances?
===============================================================
from __future__ import print_function
from sys import stdout
class A( object ): # minimal hashable object
def __init__( self, v1, v2 ):
self.v = ( v1, v2 )
def __eq__( self, b ):
#return self.v[0] == b.v[0] and self.v[1] == b.v[1]
raise Exception( "CALLED __eq__" )
def __hash__( self ): # instances distinguished by identity
return id( self )
# It was suggested that set and dict internally use some representation
# of keys that contains less information than the id() value returned by
# __hash__, thus causing key collisions, which are resolved by calls to
# __eq__.
# In that case, one would expect __eq__ to be called eventually if enough
# objects were added to a set. I don't see that, though.
NINSTANCES = 3000000 # play with this number -- carefully!
STATUS_INTERVAL = 100000
def test():
""" hammer the set algorithms """
s = set()
instances = []
for i in range( 0, NINSTANCES ):
p = A( 1, 0 )
s.add( p )
instances.append( p )
if not i % STATUS_INTERVAL:
stdout.write( str( i // STATUS_INTERVAL ) + " " )
stdout.flush()
stdout.write( "\n" )
print( "length of set", len( s ) )
print( "number of instances", len( instances ) )
for i in instances:
if not i in s:
print( "INSTANCE DROPPED OUT!" )
test()
More information about the Python-list
mailing list