Hash of None varies per-machine

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Apr 3 21:02:53 EDT 2009


On Fri, 03 Apr 2009 10:50:08 -0700, ben.taylor wrote:

> 1. Is it correct that if you hash two things that are not equal they
> might give you the same hash value? 

Absolutely. From help(hash):

hash(...)
    hash(object) -> integer

    Return a hash value for the object.  Two objects with the same 
    value have the same hash value.  The reverse is not necessarily 
    true, but likely.


This is the pigeon-hole principle. On my PC, hash() returns a 32-bit 
integer between -2147483648 and 2147483647, so there are 2**32 unique 
hash values (the pigeon-holes). Presumably on 64-bit versions of Python, 
hash() will return a 64-bit result, giving 2**64 unique hash values. But 
there are an infinite number of possible objects which can be hashed (the 
pigeons), and so you have to have more than one pigeon per pigeon-hole.


> Like, for instance, None and the
> number 261862182320 (which is what my machine gives me if I hash None).
> Note this is just an example, I'm aware hashing integers is probably
> daft.

Hashing has a very important role in Python, and you will be using it 
very often behind the scenes. You couldn't use integers as keys in 
dictionaries if they couldn't be hashed: another name for a dict is a 
"hash table".


> I'm guessing that's fine, since you can't hash something to a
> number without colliding with that number (or at least without hashing
> the number to something else, like hashing every number to itself * 2,
> which would then mean you couldn't hash very large numbers)

You can hash numbers no matter how big they are.

>>> hash(float('inf'))
314159
>>> hash(2**999)
128



> 2. Should the hash of None vary per-machine? I can't think why you'd
> write code that would rely on the value of the hash of None, but you
> might I guess.

The value of hash(None) appears to be the value of id(None), which means 
it is the memory address that None happens to get, which means it will 
depend on the precise order that Python allocates things when it starts 
up, which will vary from platform to platform and version to version.



> 3. Given that presumably not all things can be hashed (since the
> documentation description of hash() says it gives you the hash of the
> object "if it can be hashed"), should None be hashable?

Any object can be hashed if it has a working __hash__ method. There's no 
reason not to have None hashable -- it costs nothing and allows you to 
use None as a dict key.



-- 
Steven



More information about the Python-list mailing list