[Python-Dev] Saving the hash value of tuples

Noam Raphael noamraph at gmail.com
Sat Apr 1 22:32:19 CEST 2006


Hello,

I've found out that the hash value of tuples isn't saved after it's
calculated. With strings it's different: the hash value of a string is
calculated only on the first call to hash(string), and saved in the
structure for future use. Saving the value makes dict lookup of tuples
an operation with an amortized cost of O(1).

Saving the hash value means that if an item of the tuple changes its
hash value, the hash value of the tuple won't be changed. I think it's
ok, since:
  1. Hash value of things shouldn't change.
  2. Dicts assume that too.

I tried the change, and it turned out that I had to change cPickle a
tiny bit: it uses a 2-tuple which is allocated when the module
initializes to lookup tuples in a dict. I changed it to properly use
PyTuple_New and Py_DECREF, and now the complete test suite passes. I
run test_cpickle before the change and after it, and it took the same
time (0.89 seconds on my computer).

What do you think? I see three possibilities:
  1. Nothing should be done, everything is as it should be.
  2. The cPickle module should be changed to not abuse the tuple, but
there's no reason to add an extra word to the tuple structure and
break binary backwards compatibility.
  3. Both should be changed.

I will be happy to send a patch, if someone shows interest.

Have a good day,
Noam


More information about the Python-Dev mailing list