[Python-Dev] Comparison speed

Tim Peters tim.one@home.com
Sun, 20 May 2001 05:31:25 -0400


[M0artin v. Loewis]
> ...
> If I set tp_richcompare of strings to 0, I get past this code, and do
>
> 		c = (*f)(v, w);
> 		if (PyErr_Occurred())

Note that the usual way to write this is

 		if (c < 0 && PyErr_Occurred())

More work for my artificial "ab" < "cd" case but a net win in real life (when
c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas,
when c < 0 there's no way in the cmp protocol to use c's value alone to
distinguish between "less than" and "error").

> 			return NULL;
> 		return convert_3way_to_object(op, c);
>
> Here, I get 3 function calls: f is string_compare, then
> PyErr_Occurred, finally convert_3way_to_object, which converts
> {-1,0,1} x Op -> {Py_True, Py_False}.

Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

> Indeed, when I inline convert_3way_to_object, I get the same speed in
> both cases (with the remaining differences attributed to measurement
> and gcc doing register usage differently in both functions).

OK, understood, and thanks for following up!

> I'd still be in favour of giving strings a richcompare, since it
> allows to optimize what I think is the single most frequent case:
> Py_EQ on strings.

In the absence of significant sorting, I agreed Py_EQ is most frequent.

> With a control flow like
>
> 		if (a->ob_size != b->ob_size)
>                    goto False;
>
> 		if (a->ob_size == 0)
>                    goto True;
>
> 		if (a->ob_sval[0] != b->ob_sval[0])
>                    goto False;
>
> 		if(memcmp(a->ob_sval, b->ob_sval, a->ob_size))
>                    goto False;
>                 else
>                    goto True;
>
> we can reduce the number of function calls

Suggest collapsing the third into the first:

		if (a->ob_size != b->ob_size
                || a->ob_sval[0] != b->ob_sval[0])
                    goto False;

There's no danger of over-indexing when ob_size==0, because it doesn't
include the trailing null byte Python always sticks at the end of string
objects; and the first-byte check is much more likely to pay off than the
zero-length check (comparison to a null string?  gotta be rare as clear
conclusions <wink>), and better to test for the more common case first.