[Python-Dev] Comparison speed

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 17 May 2001 08:41:15 +0200


> 1. String objects are also equal despite being different objects,
>    if their ob_sinterned pointers are equal and non-NULL.  So if
>    you're looking for every trick in & out of the book, that's
>    another one.

That does not help. In the entire test suite, there are 0 instances
where strings are compared which are not identical, but have equal
ob_sinterned pointers.

> > So 5% of the calls are with identical strings, for which I can
> > immediately decide the outcome.
>
> But also at the cost of doing a fruitless compare and branch in 95%
> of calls.

Whether there's a fruitless branch depends on your compiler. With gcc
3, you can write

	if (__builtin_expect(a == b, 0)) {

and then the body of the if block will be moved out of the way of
linear control flow.

> Any idea where those 800,000 virgin calls to oldcomp are coming
> from?  That's a lot.

As far as I could trace it, most of them come from lookdict_string (at
various locations inside this function).

> > #comps:                        2949421
> > #memcmps:                       917776
> >
> > So still, ca. 30% can be decided by first byte.
> 
> Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
> 2949421, in which case I would have expected you to say that 70% can be
> decided by first byte.

Oops, you are right.

> It's clearer that this is going to hurt sorting (& bisect etc), by
> adding yet another layer of function call to get Py_LT resolved (as
> for dict compares too, the string richcmp can't do anything to speed
> up Py_LT that string oldcmp can't do just as efficiently -- indeed,
> that's the great advantage oldcmp's "compare first character" test
> had: that *can* decide Py_LT in one byte much of the time (but
> length comparison cannot)).

So to support sorting better, I should special-case Py_LT in
string_richcompare also, to avoid the function call ?-)

> Note too earlier mail about how adding a richcmp slot to strings will
> suddenly slow cmp(string1, string2) (which is the usual way to program a
> search tree, because cmp() *used* to call a string comparison routine only
> once; but after adding a richcmp slot, each cmp(string1, string2) will call
> the richcmp slot from 1 thru 3 times (data-dependent)).

Yes, that is a serious problem. Fortunately, very few calls in my
programs go to string_compare through cmp() now. But then, your
programs are different, of course...

Regards,
Martin