[Python-3000] Performance Notes - new hash algorithm

Guido van Rossum guido at python.org
Fri Sep 7 22:53:45 CEST 2007


I'd like Tim Peters's input on this before we change it. I seem to
recall that there's an aspect of non-randomness to the existing hash
function that's important when you hash many closely related strings,
e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been
so long that I may misremember this, and perhaps it was related to the
dictionary implementation.

In any case we need to see the code as a patch, of course.

On 9/7/07, Gregory P. Smith <greg at krypto.org> wrote:
> On 9/4/07, Thomas Hunger <hto at arcor.de> wrote:
> >
> > Hello,
> >
> > I don't know much about python internals, so the following might be
> > bogus:
> >
> > I replaced unicode_hash and string_hash with the hash function from
> > here: http://www.azillionmonkeys.com/qed/hash.html.
> >
> > Then I ran the following micro-benchmark :
> >
> >     $ time ./python bench.py
> >
> > where bech.py is:
> >
> >     f = dict((line, nr) for nr, line
> >              in enumerate(open('/usr/share/dict/words',
> >
> encoding='latin1').readlines()))
> >
> > Python3k original hash: real    0m2.210s
> >               new hash: real    0m1.842s
> >
> > So maybe this is an interesting hash function?
> >
> > Tom
>
> Sounds like a great idea to me.  Can you submit it as a patch?
>
> We should run some more realistic perf tests and profiles but I imagine the
> impact will only be good.
>
> -gps
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-3000 mailing list