[Python-Dev] Hashing proposal: change only string-only dicts
"Martin v. Löwis"
martin at v.loewis.de
Tue Jan 17 21:59:28 CET 2012
I'd like to propose a different approach to seeding the string hashes:
only do so for dictionaries involving only strings, and leave the
tp_hash slot of strings unchanged.
Each string would get two hashes: the "public" hash, which is constant
across runs and bugfix releases, and the dict-hash, which is only used
by the dictionary implementation, and only if all keys to the dict are
strings. In order to allow caching of the hash, all dicts should use
the same hash (if caching wasn't necessary, each dict could use its own
seed).
There are several variants of that approach wrt. caching of the hash
1. add an additional field to all string objects, to cache the second
hash value.
a) variant: in 3.3, drop the extra field, and declare that hashes
may change across runs
2. only cache the dict-hash, recomputing the public hash each time
3. on a per-string choice, cache either the dict-hash or the public
hash, depending on which one gets computed first, and recompute
the other one every time it's needed.
As you can see, 1 vs. 2/3 is a classical time-space-tradeoff.
What do you think?
Regards,
Martin
More information about the Python-Dev
mailing list