[Python-Dev] Hashing proposal: change only string-only dicts
Gregory P. Smith
greg at krypto.org
Wed Jan 18 06:58:51 CET 2012
On Tue, Jan 17, 2012 at 12:59 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.
> Each string would get two hashes: the "public" hash, which is constant
> across runs and bugfix releases, and the dict-hash, which is only used
> by the dictionary implementation, and only if all keys to the dict are
> strings. In order to allow caching of the hash, all dicts should use
> the same hash (if caching wasn't necessary, each dict could use its own
> There are several variants of that approach wrt. caching of the hash
> 1. add an additional field to all string objects, to cache the second
> hash value.
yuck, our objects are large enough as it is.
> a) variant: in 3.3, drop the extra field, and declare that hashes
> may change across runs
+1 Absolutely. We can and should make 3.3 change hashes across runs
(behavior that can be disabled via a flag or environment variable).
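To make "hashes change across runs, unless switched off" concrete, here is a minimal sketch. It assumes an interpreter that honors the PYTHONHASHSEED environment variable (CPython 3.3 and later do; this message does not name any particular variable), and the helper name string_hash_in_child is purely illustrative.

```python
import os
import subprocess
import sys


def string_hash_in_child(seed, text="python-dev"):
    """Return hash(text) as computed by a fresh interpreter.

    `seed` is passed through PYTHONHASHSEED (assumption: the child
    interpreter supports it). A seed of 0 disables randomization.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # Same seed: the string hash is reproducible from run to run.
    print(string_hash_in_child(0) == string_hash_in_child(0))
    # Different seeds: the string hash is expected to differ.
    print(string_hash_in_child(1) != string_hash_in_child(2))
```

Pinning the seed is the "disable" switch for code that still depends on a fixed order; leaving it unset gives a different seed per process.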
I think the issue of doctests and such breaking even in 2.7 due to hash
order changes is being overblown. Code like that already needs to
fix its tests at least once when it wants them to pass on both 32-bit
and 64-bit Python VMs (they have different hashes). Do we have _any_
measure of how big a deal this will be before going too far here?
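For reference, the kind of fix such tests need is usually small. The sketch below is an illustration only (unique_tags is a made-up function, not code from any project mentioned here): a doctest that printed the set directly would depend on hash order, while sorting or comparing by equality does not.

```python
import doctest


def unique_tags(words):
    """Return the set of distinct words.

    Printing the set directly would tie the expected output to the
    iteration order, which follows the string hashes. Sorting first
    makes the doctest independent of hash values:

    >>> sorted(unique_tags(["b", "a", "b"]))
    ['a', 'b']

    Comparing against a literal also works, since set equality
    ignores order:

    >>> unique_tags(["b", "a", "b"]) == {"a", "b"}
    True
    """
    return set(words)


if __name__ == "__main__":
    # Run the doctests above; prints a summary of attempted/failed examples.
    print(doctest.testmod())
```

The same two patterns (sort before printing, or compare with ==) apply to dicts on interpreters where dict iteration order follows the hashes.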