hashkey/digest for a complex object
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Oct 10 11:32:43 EDT 2010
On Sun, 10 Oct 2010 13:49:09 +0000, kj wrote:
>>> to give only two of an infinite number of counter-examples.
>
> (Infinite???)
I was mistaken. Given Arnaud's specification that we look only at the
Python 2.x ints, I believe that there is only one counter-example, namely
-1.
However, in Python 3.x I would be correct. The reasoning is a simple
application of the pigeon-hole principle. Since hash(n) is limited to
returning a finite number of distinct results, but there are an infinite
number of Python 3 ints, it follows that there must be an infinite number
of collisions.
>>And, in fact,
>>(-1) is the only int such that hash(x) != x.
>
> Arnaud, how did you determine that -1 is the only such int? I can't
> imagine any other approach other than a brute-force check of all ints...
Reading the source code is also a good approach.
I don't read C very well, but even I can work out what this is doing:
static long
int_hash(PyIntObject *v)
{
/* XXX If this is changed, you also need to change the way
Python's long, float and complex types are hashed. */
long x = v -> ob_ival;
if (x == -1)
x = -2;
return x;
}
(from intobject.c in Python 2.6.1). It takes a Python int object,
extracts the value of it as a C long, returns -2 if the value is -1
otherwise returns the value.
> When I tried this I ran into unforeseen limitations in xrange, etc.
> It's all very doable, but still, it would take at least about 3 hours on
> my laptop.
What are you doing? This takes less than 20 seconds to test up to 2**24:
import time
def test(n):
t = time.time()
assert hash(0) == 0
assert hash(1) == 1
assert hash(-1) == -2
for i in xrange(2, n):
assert hash(i) == i, "failed %d" % i
assert hash(-i) == -i, "failed %d" % -i
t = time.time() - t
return t
>>> test(2**24)
18.076412916183472
Since 2**31 is 2**7 = 128 times bigger, I estimate that testing
everything up to 2**31 should take less than 45 minutes. And my PC is not
exactly the fastest machine on the block.
--
Steven
More information about the Python-list
mailing list