[Python-Dev] String hash function multiplier

Raymond Hettinger python at rcn.com
Tue Apr 13 22:16:16 EDT 2004


[Jeff Epler]
> > With -O2 -mcpu=i686 or newer, gcc uses "imul" for both 100003 and
> > 65599,
> > rather than shifts and adds.


[Bob Ippolito]
> It's not expected that GCC optimize an integer constant into shifts on
> its own. 

Right, the actual diff is:

*** 1145,1151 ****
        p = (unsigned char *) a->ob_sval;
        x = *p << 7;
        while (--len >= 0)
!               x = (1000003*x) ^ *p++;
        x ^= a->ob_size;
        if (x == -1)
                x = -2;
--- 1152,1158 ----
        p = (unsigned char *) a->ob_sval;
        x = *p << 7;
        while (--len >= 0)
!               x = (x << 6) + (x << 16) - x + (long)*p++;
        x ^= a->ob_size;
        if (x == -1)
                x = -2;



> I guess the real question for Raymond is, does it really make a
> measurable difference?

Yes, the loop runs 20% faster on my Pentium III.  The speedup ought to
be much more dramatic on the Pentium IV (where the integer ALU
instructions speedup from 1 clock to 0.5 clocks while the latency on
integer multiplication slows down from 4 clocks to 14-18 clocks).


>  And what effect does it have on pickled dicts
> (or other such hash-using data structures), if any?

The test suite runs fine.  Dicts may display in a different order than
in previous pythons.  That may upset some doctests if the writer didn't
take Tim's documented advice about making tests that do not depend on
display order.


Raymond




More information about the Python-Dev mailing list