[Python-Dev] Hash collision security issue (now public)

PJ Eby pje at telecommunity.com
Sat Dec 31 22:43:00 CET 2011


On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:

> Hash functions are already unstable across Python versions. Making
> them unstable across interpreter processes (multiprocessing doesn't
> share dicts, right?) doesn't sound like a big additional problem.
> Users who want a distributed hash table will need to pull their own
> hash function out of hashlib or re-implement a non-cryptographic hash
> instead of using the built-in one, but they probably need to do that
> already to allow themselves to upgrade Python.
>
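The quoted suggestion is to take a hash function from hashlib when a value must stay stable across processes and Python versions. It might look like the sketch below. The helper names `stable_hash` and `pick_shard` are hypothetical, and truncating SHA-256 to 64 bits is an arbitrary choice made for illustration.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Hash a string to a 64-bit int that does not depend on the built-in hash()."""
    # SHA-256 yields the same digest on every platform and Python version;
    # keep the first 8 bytes and read them as an unsigned 64-bit integer.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def pick_shard(key: str, num_shards: int) -> int:
    """Map a key to one of num_shards buckets, e.g. for a distributed hash table."""
    return stable_hash(key) % num_shards
```

Unlike `hash(key) % num_shards`, this mapping does not change when the built-in hash is seeded differently in each process.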

Here's an idea.  Suppose we add a sys.hash_seed or some such that's
settable to an int and defaults to whatever we're using now.  Then
programs that want a fix can just set it to a random number, and on Python
versions that support it, it takes effect.  Everywhere else it's a silent
no-op.
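To make the idea concrete, here is one way a seed could enter a simple multiplicative string hash, loosely in the style of the old CPython string hash. The function name `seeded_hash`, the 64-bit mask, and starting the accumulator from the seed are all assumptions for illustration, not CPython's actual code.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic

def seeded_hash(s: str, seed: int = 0) -> int:
    """Toy string hash whose result depends on a per-process seed."""
    # Start from the seed instead of a fixed constant.
    x = seed & MASK64
    for ch in s:
        # Multiplying by an odd constant and XOR-ing are both invertible
        # mod 2**64, so seeds that differ mod 2**64 give different results
        # for the same string.
        x = ((x * 1000003) ^ ord(ch)) & MASK64
    return x ^ len(s)
```

A program that sets a random seed at startup gets different hash values from one that keeps the default of 0. A toy mix like this only shows where the seed enters; by itself it does not guarantee that keys found to collide under one seed stop colliding under another.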

Downside: for a plain assignment to take effect, sys would need slots (or
some other hook) so the interpreter notices the change; does sys actually
have slots?  My memory's hazy on that.  I guess realistically it'd have to
be a function, sys.set_hash_seed().  But same basic idea.
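From the application side, the "silent no-op" behaviour could come from feature detection. This sketch assumes the proposed `sys.set_hash_seed()` function. No released Python has a function by that name, so on a real interpreter the helper takes the no-op branch.

```python
import random
import sys

def try_randomize_hash_seed() -> bool:
    """Set a random hash seed if the interpreter supports it; return whether it did."""
    # sys.set_hash_seed is the hypothetical API from the proposal, not a real function.
    setter = getattr(sys, "set_hash_seed", None)
    if setter is None:
        # Interpreter without the feature: do nothing, as the proposal intends.
        return False
    setter(random.SystemRandom().getrandbits(32))
    return True
```

Such a call would presumably need to run early, before any dicts are built, because strings cache their hash values once computed.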

Anyway, this would make fixing the problem *possible*, while still pushing
off the hard decisions to the app/framework developers.  ;-)

Downside: every hash operation includes one extra memory access (to read
the seed), but strings only compute their hash once anyway.
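The cost argument rests on hash caching: the seed is read only the first time a string is hashed. Below is a minimal sketch of that pattern. The `SeededStr` wrapper and the module-level `_hash_seed` are hypothetical stand-ins for the interpreter's internal string type and seed setting.

```python
MASK64 = (1 << 64) - 1
_hash_seed = 0  # hypothetical global seed standing in for the proposed setting

class SeededStr:
    """Hypothetical string wrapper that caches its seeded hash, like CPython's str."""
    __slots__ = ("text", "_hash", "computations")

    def __init__(self, text: str) -> None:
        self.text = text
        self._hash = None        # not computed yet
        self.computations = 0    # how many times the seed was actually consulted

    def __hash__(self) -> int:
        if self._hash is None:
            # The seed is read here, once per object, not on every lookup.
            x = _hash_seed & MASK64
            for ch in self.text:
                x = ((x * 1000003) ^ ord(ch)) & MASK64
            self._hash = x ^ len(self.text)
            self.computations += 1
        return self._hash

    def __eq__(self, other: object) -> bool:
        return isinstance(other, SeededStr) and self.text == other.text
```

After `k = SeededStr("spam")`, inserting `k` into a dict and looking it up repeatedly still leaves `k.computations` at 1.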

Given that changing dict won't help, and changing the default hash is a
non-starter, an option to set the seed is probably the way to go.  (Maybe
with an environment variable and/or command line option so users can work
around old code.)
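The environment-variable route is essentially what later CPython releases adopted. `PYTHONHASHSEED` fixes the seed for a process, and Python 3.3+ randomizes it by default when the variable is unset. The sketch below uses a hypothetical helper, `hash_in_child`, to run a child interpreter under different seeds and show the effect. It assumes Python 3.7+ for `capture_output`.

```python
import os
import subprocess
import sys

def hash_in_child(text: str, seed: str) -> int:
    """Return hash(text) as computed by a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

The same seed reproduces the same hash in every process, which is handy for working around old code that accidentally depends on hash values. Different seeds give different values.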

