hashkey/digest for a complex object

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Oct 10 11:32:43 EDT 2010


On Sun, 10 Oct 2010 13:49:09 +0000, kj wrote:

>>> to give only two of an infinite number of counter-examples.
> 
> (Infinite???)

I was mistaken. Given Arnaud's specification that we look only at the 
Python 2.x ints, I believe that there is only one counter-example, namely 
-1.

However, in Python 3.x I would be correct. The reasoning is a simple 
application of the pigeon-hole principle. Since hash(n) is limited to 
returning a finite number of distinct results, but there are an infinite 
number of Python 3 ints, it follows that there must be an infinite number 
of collisions.


>>And, in fact,
>>(-1) is the only int such that hash(x) != x.
> 
> Arnaud, how did you determine that -1 is the only such int?  I can't
> imagine any other approach other than a brute-force check of all ints...

Reading the source code is also a good approach.

I don't read C very well, but even I can work out what this is doing:

static long
int_hash(PyIntObject *v)
{
	/* XXX If this is changed, you also need to change the way
	   Python's long, float and complex types are hashed. */
	long x = v -> ob_ival;
	if (x == -1)
		x = -2;
	return x;
}


(from intobject.c in Python 2.6.1). It takes a Python int object, 
extracts the value of it as a C long, returns -2 if the value is -1 
otherwise returns the value.


>  When I tried this I ran into unforeseen limitations in xrange, etc. 
> It's all very doable, but still, it would take at least about 3 hours on
> my laptop.


What are you doing? This takes less than 20 seconds to test up to 2**24:

import time
def test(n):
    t = time.time()
    assert hash(0) == 0
    assert hash(1) == 1
    assert hash(-1) == -2
    for i in xrange(2, n):
        assert hash(i) == i, "failed %d" % i
        assert hash(-i) == -i, "failed %d" % -i
    t = time.time() - t
    return t


>>> test(2**24)
18.076412916183472

Since 2**31 is 2**7 = 128 times bigger, I estimate that testing 
everything up to 2**31 should take less than 45 minutes. And my PC is not 
exactly the fastest machine on the block.



-- 
Steven



More information about the Python-list mailing list