Planning a Python Course for Beginners
Peter Otten
__peter__ at web.de
Thu Aug 10 05:00:54 EDT 2017
Steven D'Aprano wrote:
> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
>
>> Good point! A very good __hash__() implementation is:
>>
>> def __hash__(self):
>> return id(self)
>>
>> In fact, I didn't know Python (kinda) did this by default already. I
>> can't find that information in the definition of object.__hash__():
>
>
> Hmmm... using id() as the hash would be a terrible hash function. Objects
It's actually id(self) >> 4 (almost, see C code below), to account for
memory alignment.
>>> obj = object()
>>> hex(id(obj))
'0x7f1f058070b0'
>>> hex(hash(obj))
'0x7f1f058070b'
>>> sample = (object() for _ in range(10))
>>> all(id(obj) >> 4 == hash(obj) for obj in sample)
True
> would fall into similar buckets if they were created at similar times,
> regardless of their value, rather than being well distributed.
If that were the problem it wouldn't be solved by the current approach:
>>> sample = [object() for _ in range(10)]
>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
[1, 1, 1, 1, 1, 1, 1, 1, 1]
Py_hash_t
_Py_HashPointer(void *p)
{
Py_hash_t x;
size_t y = (size_t)p;
/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
excessive hash collisions for dicts and sets */
y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
x = (Py_hash_t)y;
if (x == -1)
x = -2;
return x;
}
More information about the Python-list
mailing list