Planning a Python Course for Beginners
Marko Rauhamaa
marko at pacujo.net
Thu Aug 10 09:31:56 EDT 2017
Peter Otten <__peter__ at web.de>:
> Steven D'Aprano wrote:
>> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
>>
>>> Good point! A very good __hash__() implementation is:
>>>
>>>     def __hash__(self):
>>>         return id(self)
>>>
>>> In fact, I didn't know Python (kinda) did this by default already. I
>>> can't find that information in the definition of object.__hash__():
>>
>>
>> Hmmm... using id() as the hash would be a terrible hash function.
id() is actually an ideal return value of __hash__(). The only criterion
is that the returned number should be different whenever __eq__()
returns False. That is definitely true for id().
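A minimal sketch of that point, assuming a class that keeps the default identity-based __eq__() (the class name Node and the dictionary values are made up for illustration). Two live objects never share an id(), so their hashes differ and both work as separate dictionary keys:

```python
class Node:
    """Hypothetical class; __eq__ is inherited, so equality means identity."""

    def __hash__(self):
        # Use the object's identity as its hash, as suggested in the thread.
        return id(self)


a, b = Node(), Node()

# Distinct objects compare unequal and are stored under separate keys.
table = {a: "first", b: "second"}
lookup_a = table[a]
lookup_b = table[b]

print(a == b)              # identity comparison of two different objects
print(hash(a) != hash(b))  # both objects are alive, so their ids differ
print(lookup_a, lookup_b)
```

This only holds while __eq__() is the inherited identity comparison. A class that overrides __eq__() to compare by value would need a matching value-based __hash__().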
> It's actually id(self) >> 4 (almost, see C code below), to account for
> memory alignment.
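For anyone who wants to check the shift on their own interpreter, here is a small sketch. It assumes CPython, where id() is the object's memory address. Whether the comparison prints True depends on the implementation and version, so no result is claimed here:

```python
obj = object()

address = id(obj)        # in CPython, the memory address of the object
shifted = address >> 4   # discard the four low bits left constant by alignment

# Compare the default hash with the shifted address; the outcome may vary
# between Python implementations and versions.
print(hash(obj) == shifted)
```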
Memory alignment makes no practical difference. If it is any good, the
internal implementation will further scramble and scale the returned
hash value. For example:
    index = hash(obj) % prime_table_size
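A rough sketch of why the modulo step above absorbs alignment, using simulated values rather than real ones. The base address, the 16-byte alignment, and the table sizes 31 and 32 are all assumptions picked for illustration; only the name prime_table_size comes from the line above:

```python
# Simulated 16-byte-aligned addresses (assumed values, not real id() results).
base = 0x7F0000000000
addresses = [base + 16 * i for i in range(31)]

prime_table_size = 31  # hypothetical prime table size
pow2_table_size = 32   # power-of-two size, shown only for contrast

# Bucket indices each table would assign to the aligned addresses.
prime_buckets = {addr % prime_table_size for addr in addresses}
pow2_buckets = {addr % pow2_table_size for addr in addresses}

print(len(prime_buckets))  # 31: every address lands in its own bucket
print(len(pow2_buckets))   # 2: only indices 0 and 16 are ever used
```

Because 16 and 31 share no common factor, the aligned addresses spread over every bucket of the prime-sized table. With a power-of-two table and no further scrambling, the constant low bits would collapse them into two buckets.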
>> would fall into similar buckets if they were created at similar
>> times, regardless of their value, rather than being well distributed.
>
> If that were the problem it wouldn't be solved by the current approach:
It is not a problem. Hash values don't need to be well distributed; they
simply need to distinguish objects that differ even slightly.
> >>> sample = [object() for _ in range(10)]
> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
> [1, 1, 1, 1, 1, 1, 1, 1, 1]
Nice demo :-)
Marko