[Python-Dev] Algorithmic Complexity Attack on Python

Tim Peters tim.one@comcast.net
Sat, 31 May 2003 14:50:01 -0400


[Jeff Epler]
> Is there at least a guarantee that the hashing algorithm won't change
> in a bugfix release?

Guido said "yes" weakly, but the issue hasn't come up in recent times.  In
the past we've changed the hash functions for at least strings, tuples, and
floats, based on systematic weaknesses uncovered by real-life ordinary data.
OTOH, there's a requirement that, for objects of types usable as dict keys,
two objects that compare equal must deliver equal hash codes.  So people
creating (mostly) number-like or (not sure if anyone does this) string-like
types have to duplicate the hash codes Python delivers for builtin numbers
and strings that compare equal to objects of their types.
For example, the author of a Rational class should arrange for

    hash(Rational(42, 1))

to deliver the same result as

    hash(42) == hash(42L) == hash(42.0) == hash(complex(42.0, 0.0))

Such code would break if we changed the int/long/float/complex hashes for
inputs that compare equal to integers.
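(In modern Python terms -- the 42L spelling above predates the int/long
unification -- the stdlib fractions.Fraction plays exactly the role of the
hypothetical Rational class, and the invariant can be checked directly.  A
minimal sketch:)

```python
from fractions import Fraction

# fractions.Fraction stands in for the Rational class: values that
# compare equal across numeric types must hash equally.
r = Fraction(42, 1)
assert r == 42 == 42.0 == complex(42.0, 0.0)
assert hash(r) == hash(42) == hash(42.0) == hash(complex(42.0, 0.0))
```

Any change to the int/float/complex hash algorithms would have to be
mirrored in Fraction (and in third-party number-like types) to keep those
assertions true.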

Tedious exercise for the reader:  find a set of bad datetime objects in 2.3
("bad" in the sense of their hash codes colliding; on a box where hash()
returns a 32-bit int, there must be collisions, since datetime objects have
way more than 32 independent bits of state).
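(The pigeonhole step can be made concrete by counting the state space of
naive datetime objects -- a sketch in modern Python, ignoring tzinfo:)

```python
from datetime import date
from math import log2

# How many distinct naive datetime values exist?  Years 1..9999 at
# microsecond resolution.
days = date(9999, 12, 31).toordinal()   # total days in years 1..9999
micros_per_day = 24 * 60 * 60 * 10**6   # microseconds per day
states = days * micros_per_day          # total distinct datetimes

# Well over 2**32 states, so a 32-bit hash() must collide somewhere;
# exhibiting actual colliding objects is the tedious part.
print(f"~{log2(states):.0f} bits of state")
assert states > 2**32
```

The count comes out to roughly 58 bits of state, comfortably beyond what a
32-bit hash can distinguish.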