[Python-Dev] talking about performance... [case closed]

Tim Peters tim_one@email.msn.com
Tue, 20 Jun 2000 12:47:47 -0400


[Fredrik Lundh]
> for the record, my little benchmark now runs about five
> times faster than before -- which means that 16-bit find
> is now ~30% faster than 8-bit find (barry? ;-)

[Andrew M. Kuchling]
> Speculation: the code in stringobject.c looks like this:
>                 for (; i <= last; ++i)
>                         if (s[i] == sub[0] &&
>                             (n == 1 || memcmp(&s[i+1], &sub[1], n-1) == 0))
>                                 return (long)i;
> ...
> the Unicode find is simpler, just doing the full memcmp.  I wonder
> if the extra n == 1 and i+1, n-1 arithmetic costs more than it
> saves?

So the intent of the code is to avoid the expense of calling memcmp if a
full n-character memcmp can't possibly find equality at i (due to mismatch
of first characters), and the assumption is that first characters mismatch
frequently.  Both seem like good calls to me!
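(Concretely:  scanning a megabyte of "a"s for "xyz" never calls memcmp at
all, since the s[i] == sub[0] test fails at every position.)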

> This is probably delicately tied to the speed of your memcmp().

It also depends on whether your memcmp() is an actual external function or
a full or partial inline expansion (e.g., the strcmp at KSR expanded into
code much like the "by-hand" stuff above, skipping "the real" strcmp if the
first characters weren't equal).
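
In outline, that kind of expansion looks something like this -- a sketch of
the trick (the FAST_STRCMP name is made up), not the actual KSR source:

    #include <string.h>

    /* Partially inlined strcmp:  resolve the common first-character
       mismatch without a function call, and punt to the real strcmp
       only when the first characters agree. */
    #define FAST_STRCMP(a, b)                                          \
        ((a)[0] != (b)[0]                                              \
             ? (int)((unsigned char)(a)[0] - (unsigned char)(b)[0])    \
             : strcmp((a), (b)))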

> Anyone want to try the simpler version and report on whether
> it makes a difference?

No <wink>.  But if someone does, a good compromise might be to split this
into two loops, one for the n==1 case and another for n>1.  The "+1/-1" are
dubious in any case (i.e., "memcmp(s+i, sub, n) == 0" saves the fiddly
arithmetic at the also-minor cost of making memcmp do one redundant
character compare).
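
In sketch form, with a hypothetical helper name and the variable names from
Andrew's snippet (untested, just to make the shape concrete):

    #include <string.h>

    /* Return the first index i (start <= i <= last) at which sub
       occurs in s, or -1 if it never does.  Same contract as the
       loop above; "find_substring" is an illustrative name. */
    static long
    find_substring(const char *s, long start, long last,
                   const char *sub, long n)
    {
        long i = start;

        if (n == 1) {
            /* One-character pattern:  a plain scan, no memcmp ever. */
            for (; i <= last; ++i)
                if (s[i] == sub[0])
                    return i;
        }
        else {
            /* Longer pattern:  the cheap first-character test still
               guards the memcmp; starting the memcmp at s+i redoes
               that one compare but drops the +1/-1 fiddling. */
            for (; i <= last; ++i)
                if (s[i] == sub[0] && memcmp(s + i, sub, n) == 0)
                    return i;
        }
        return -1;
    }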