[Python-Dev] Hashing proposal: change only string-only dicts

Gregory P. Smith greg at krypto.org
Wed Jan 18 06:58:51 CET 2012


On Tue, Jan 17, 2012 at 12:59 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.
>
> Each string would get two hashes: the "public" hash, which is constant
> across runs and bugfix releases, and the dict-hash, which is only used
> by the dictionary implementation, and only if all keys to the dict are
> strings. In order to allow caching of the hash, all dicts should use
> the same hash (if caching wasn't necessary, each dict could use its own
> seed).
>
> There are several variants of that approach wrt. caching of the hash
> 1. add an additional field to all string objects, to cache the second
>   hash value.
>

yuck, our objects are large enough as it is.
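
To make the cost concrete, here is a rough Python sketch of what variant 1
implies: every string carries a second cached hash next to the public one.
The class name, the seed, and the mixing function are all made up for
illustration; the real thing would be a field in the C string struct, and
this is not CPython's actual string hash.

```python
import os

# Hypothetical per-process seed shared by all dicts, so the dict-hash can be cached.
_DICT_SEED = int.from_bytes(os.urandom(8), "little")
_MASK = (1 << 64) - 1
_FNV_OFFSET = 0xCBF29CE484222325
_FNV_PRIME = 0x100000001B3


class TwoHashStr:
    """Illustrative stand-in for a string object with two cached hashes."""

    __slots__ = ("value", "_public_hash", "_dict_hash")

    def __init__(self, value):
        self.value = value
        self._public_hash = None  # the cache slot strings already have
        self._dict_hash = None    # the extra field variant 1 would add

    @staticmethod
    def _mix(data, start):
        # Toy FNV-1a-style loop, used here only as a placeholder hash.
        h = start & _MASK
        for byte in data:
            h = ((h ^ byte) * _FNV_PRIME) & _MASK
        return h

    def public_hash(self):
        # Unseeded, so it stays constant across runs.
        if self._public_hash is None:
            self._public_hash = self._mix(self.value.encode("utf-8"), _FNV_OFFSET)
        return self._public_hash

    def dict_hash(self):
        # Seeded, so it differs between runs; meant only for string-only dicts.
        if self._dict_hash is None:
            self._dict_hash = self._mix(
                self.value.encode("utf-8"), _FNV_OFFSET ^ _DICT_SEED
            )
        return self._dict_hash
```

The second slot is the per-object overhead I'm objecting to: it is paid by
every string, whether or not it ever ends up as a dict key.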


>   a) variant: in 3.3, drop the extra field, and declare that hashes
>   may change across runs
>

+1 Absolutely.  We can and should make 3.3 change hashes across runs
(behavior that can be disabled via a flag or environment variable).

I think the issue of doctests and such breaking even in 2.7 due to hash
order changes is being overblown.  Code like that has already needed to
fix its tests at least once if it wants them to pass on both 32-bit
and 64-bit Python VMs (they have different hashes).  Do we have _any_
measure of how big a deal this will be before going too far here?
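
For what it's worth, the fix on the test side is usually small. A minimal
sketch, assuming the test only cares about a dict's contents and its keys
are sortable (the helper name is mine, not from any existing test suite):
render the dict in sorted key order instead of relying on repr().

```python
def sorted_items_repr(d):
    """Render a dict with keys in sorted order, independent of hash order."""
    return "{" + ", ".join("%r: %r" % (k, d[k]) for k in sorted(d)) + "}"


print(sorted_items_repr({"b": 2, "a": 1}))  # prints {'a': 1, 'b': 2}
```

A doctest written against this output passes regardless of word size or
hash seed, since it never depends on dict iteration order.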

-gps