[Python-Dev] Comparison speed
Tim Peters
tim.one@home.com
Tue, 15 May 2001 20:37:56 -0400
[Martin v. Loewis]
> ...
> I'd like to add another data point, answering the question what types
> are most frequently compared.
That varies wildly by app. I have apps where int compares *overwhelmingly*
dominate, others where float compares do, many where strings compares do, and
the last code I wrote for Zope spends most of its (very substantial) time
doing lookups of "object ids" in dicts. In Python terms, those are Pythong
lon (unbounded) ints today, and potentially Python ints on 64-bit boxes, and
that's another case where ceval.c's special-casing of int compares is
impotent.
Heck, sort a large homogeneous array once, and whatever element type that
array has will likely dominate comparisons for the whole app!
That's why I'm so keen to chop out a half dozen layers of blubber for *all*
types that don't play the richcmp game (which today includes every type I
mentioned above).
> The first set of data is for running the Python testsuite.
>
> riches 3040952 # Calls to PyType_RichCompare
> eqs 2828345 # Calls where the types are equal
>
> String 2323122
> Float 141507
> Int 125187
> Type 99477
> Tuple 84503
> Long 30325
> Unicode 10782
> Instance 9335
> List 2997
> None 383
> Class 318
> Complex 219
> Dict 57
> Array 49
> WeakRef 34
> Function 11
> File 11
> SRE_Pattern 10
> CFunction 9
> Lock 8
> Module 1
>
> So strings cover 82% of all the compare calls of equally-typed
> objects, followed by floats with 5%. Those calls together cover 93% of
> the richcompare calls.
>
> Since this might give a blurred view of what is actually used in
> applications,
Note that the top 4 types don't have a tp_richcompare slot today. The tuples
are likely composed of simple scalar types, and the latter benefit too. But
as above, we can't say anything in advance about the *specific* types a given
app is going to compare most often. There is no "typical app" in that
respect.
> I ran the PyXML testsuite with that python binary
> also. Leaving out types that are not used, I get
>
> riches 88465
> eqs 59279
>
> String 48097
> Int 5681
> Type 3170
> Tuple 760
> List 492
> Float 332
> Instance 269
> Unicode 243
> None 225
> SRE_Pattern 4
> Long 3
> Complex 3
>
> The first observation here is that "only" 67% of the calls are with
> equally-typed objects.
Someone who cares about the speed of PyXML would be well advised to figure
out why <0.9 wink>: there's no scheme on the horizon that will speed
mixed-type comparisons one whit.
> Of those, 80% are with strings, 9% with integers.
XML is a string-crunching app, right?
> The last example is idle, where I just did an "import httplib", for
> fun.
>
> riches 50923
> eqs 49882
>
> String 31198
> Tuple 8312
> Type 7978
> Int 1456
> None 600
> SRE_Pattern 210
> List 122
> Instance 4
> Float 1
> Instance method 1
>
> Roughly the same picture: 97% calls with equally-typed objects, of
> those 62% strings, 3% integers. Notice the 15% for tuples and types,
> each.
Surprising!
> So to speed-up the common case clearly means to speed-up string
> comparisons.
The only thing the apps I've tried have in common is that the types compared
most often do have tp_compare but not tp_richcompare functions. The test
suite, XML and IDLE are all heavy string-slingers.
> If I'd need to optimize anything else afterwards, I'd look into type
> objects - most likely, they are compared for EQ, which can be done
> nicely and directly in a tp_richcompare also.
Would do just as well to give them a one-liner tp_compare function (in
conjunction with the posted patch).
> Those two optimizations together would give a richcompare to 95% of
> the objects in the IDLE case.
Since that's the exact opposite of what I want to do, it's at least
interesting <wink>. Whatever, there needs to be a (very) fast path, and it
needs to pick on something that all common types implement, including at
least strings, ints, longs, floats and-- I guess --type objects.
I don't know about other people, but I have lots of code that uses the cmp()
function heavily. That path has also gotten bloated, and tries each of
Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them to say "yes".
It does this now even if the tp_compare slot is defined. The only thing
that's saving cmp()-slinging code from major sloth now is that the basic
types do *not* implement tp_richcompare, so try_rich_to_3way_compare gets out
early (before doing the three-way Py_EQ etc dance). But give the basic
scalar types richcmp functions, and cmp() will slow down a lot (unless more
hacks are added to stop that).