Hash stability

Chris Angelico rosuav at gmail.com
Sat Jan 14 19:36:00 EST 2012


On Sat, Jan 14, 2012 at 3:42 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On the Python Dev mailing list, there is a discussion going on about the
> stability of the hash function for strings.
>
> How many people rely on hash(some_string) being stable across Python
> versions? Does anyone have code that will be broken if the string hashing
> algorithm changes?

On reading your post I immediately thought that you could, if changing
algorithm, simultaneously fix the issue of malicious collisions, but
that appears to be what you're doing it for primarily :)

Suggestion: Create a subclass of dict, the SecureDict or something,
which could either perturb the hashes or even use a proper
cryptographic hash function; normal dictionaries can continue to use
the current algorithm. The description in Objects/dictnotes.txt
suggests that it's still well worth keeping the current system for
programmer-controlled dictionaries, and only change user-controlled
ones (such as POST data etc).

It would then be up to the individual framework and module authors to
make use of this, but it would not impose any cost on the myriad other
uses of dictionaries - there's no point adding extra load to every
name lookup just because of a security issue in an extremely narrow
situation. It would also mean that code relying on hash(str) stability
wouldn't be broken.

ChrisA



More information about the Python-list mailing list