[Python-Dev] Comparison speed

Tue, 15 May 2001 20:37:56 -0400

[Martin v. Loewis]
> ...
> I'd like to add another data point, answering the question what types
> are most frequently compared.

That varies wildly by app.  I have apps where int compares *overwhelmingly*
dominate, others where float compares do, many where strings compares do, and
the last code I wrote for Zope spends most of its (very substantial) time
doing lookups of "object ids" in dicts.  In Python terms, those are Pythong
lon (unbounded) ints today, and potentially Python ints on 64-bit boxes, and
that's another case where ceval.c's special-casing of int compares is
impotent.

Heck, sort a large homogeneous array once, and whatever element type that
array has will likely dominate comparisons for the whole app!

That's why I'm so keen to chop out a half dozen layers of blubber for *all*
types that don't play the richcmp game (which today includes every type I
mentioned above).

> The first set of data is for running the Python testsuite.
>
> riches      3040952  # Calls to PyType_RichCompare
> eqs         2828345  # Calls where the types are equal
>
> String      2323122
> Float        141507
> Int          125187
> Type          99477
> Tuple         84503
> Long          30325
> Unicode       10782
> Instance       9335
> List           2997
> None            383
> Class           318
> Complex         219
> Dict             57
> Array            49
> WeakRef          34
> Function         11
> File             11
> SRE_Pattern      10
> CFunction         9
> Lock              8
> Module            1
>
> So strings cover 82% of all the compare calls of equally-typed
> objects, followed by floats with 5%. Those calls together cover 93% of
> the richcompare calls.
>
> Since this might give a blurred view of what is actually used in
> applications,

Note that the top 4 types don't have a tp_richcompare slot today.  The tuples
are likely composed of simple scalar types, and the latter benefit too.  But
as above, we can't say anything in advance about the *specific* types a given
app is going to compare most often.  There is no "typical app" in that
respect.

> I ran the PyXML testsuite with that python binary
> also. Leaving out types that are not used, I get
>
> riches        88465
> eqs           59279
>
> String        48097
> Int            5681
> Type           3170
> Tuple           760
> List            492
> Float           332
> Instance        269
> Unicode         243
> None            225
> SRE_Pattern       4
> Long              3
> Complex           3
>
> The first observation here is that "only" 67% of the calls are with
> equally-typed objects.

Someone who cares about the speed of PyXML would be well advised to figure
out why <0.9 wink>:  there's no scheme on the horizon that will speed
mixed-type comparisons one whit.

> Of those, 80% are with strings, 9% with integers.

XML is a string-crunching app, right?

> The last example is idle, where I just did an "import httplib", for
> fun.
>
> riches        50923
> eqs           49882
>
> String        31198
> Tuple          8312
> Type           7978
> Int            1456
> None            600
> SRE_Pattern     210
> List            122
> Instance          4
> Float             1
> Instance method   1
>
> Roughly the same picture: 97% calls with equally-typed objects, of
> those 62% strings, 3% integers. Notice the 15% for tuples and types,
> each.

Surprising!

> So to speed-up the common case clearly means to speed-up string
> comparisons.

The only thing the apps I've tried have in common is that the types compared
most often do have tp_compare but not tp_richcompare functions.  The test
suite, XML and IDLE are all heavy string-slingers.

> If I'd need to optimize anything else afterwards, I'd look into type
> objects - most likely, they are compared for EQ, which can be done
> nicely and directly in a tp_richcompare also.

Would do just as well to give them a one-liner tp_compare function (in
conjunction with the posted patch).

> Those two optimizations together would give a richcompare to 95% of
> the objects in the IDLE case.

Since that's the exact opposite of what I want to do, it's at least
interesting <wink>.  Whatever, there needs to be a (very) fast path, and it
needs to pick on something that all common types implement, including at
least strings, ints, longs, floats and-- I guess --type objects.

I don't know about other people, but I have lots of code that uses the cmp()
function heavily.  That path has also gotten bloated, and tries each of
Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them to say "yes".
It does this now even if the tp_compare slot is defined.  The only thing
that's saving cmp()-slinging code from major sloth now is that the basic
types do *not* implement tp_richcompare, so try_rich_to_3way_compare gets out
early (before doing the three-way Py_EQ etc dance).  But give the basic
scalar types richcmp functions, and cmp() will slow down a lot (unless more
hacks are added to stop that).