[Python-Dev] Saving the hash value of tuples

Guido van Rossum guido at python.org
Sun Apr 2 19:38:04 CEST 2006


On 4/1/06, Noam Raphael <noamraph at gmail.com> wrote:
> I've found out that the hash value of tuples isn't saved after it's
> calculated. With strings it's different: the hash value of a string is
> calculated only on the first call to hash(string), and saved in the
> structure for future use. Saving the value makes dict lookup of tuples
> an operation with an amortized cost of O(1).

Have you done any measurements to confirm that this makes any
difference? I'm not particularly enamored of theoretical
optimizations. This one in particular has a definite cost in space
which needs to be weighed seriously.

> Saving the hash value means that if an item of the tuple changes its
> hash value, the hash value of the tuple won't be changed. I think it's
> ok, since:
>   1. Hash value of things shouldn't change.
>   2. Dicts assume that too.

Sure. (But you do realize that if a tuple contains a mutable value its
hash value raises an exception, right?)

> I tried the change, and it turned out that I had to change cPickle a
> tiny bit: it uses a 2-tuple which is allocated when the module
> initializes to lookup tuples in a dict. I changed it to properly use
> PyTuple_New and Py_DECREF, and now the complete test suite passes. I
> run test_cpickle before the change and after it, and it took the same
> time (0.89 seconds on my computer).

Not just cPickle. I believe enumerate() also reuses a tuple.

> What do you think? I see three possibilities:
>   1. Nothing should be done, everything is as it should be.
>   2. The cPickle module should be changed to not abuse the tuple, but
> there's no reason to add an extra word to the tuple structure and
> break binary backwards compatibility.
>   3. Both should be changed.

I'm -1 on the change. Tuples are pretty fundamental in Python and
hashing them is relatively rare. I think the extra required space for
all tuples isn't worth the potential savings for some cases.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-Dev mailing list