[Python-Dev] Comparison speed
Tim Peters
tim.one@home.com
Thu, 17 May 2001 12:02:38 -0400
[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete. I realize I
> probably haven't expressed this thought clearly, and I'm not going to
> push for this to happen quickly or forecefully, but it's nevertheless
> how I see things. I expect it would allow a tremendous cleanup of the
> comparison code. It will never reach the simplicity of cmp() -- but
> think of Einstein's (?) rule "things should be as simple as they can
> be, but no simpler." Clearly cmp() was too simple. :-)
>
> Anyway, it worries me whenever I hear someone express the thought that
> adding rich comparisons to a particular object type would be a bad
> idea because it would slow things down.
At the moment, "almost all" comparisons in the dynamic sense have no need of
richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple
<wink>. For now richcmps are a tail-wagging-the-dog phenomenon, or more like
the tail growing 10 pounds of dense matted hair, making the once-frisky puppy
slow to a crawl because its butt is scraping the ground <wink>.
Martin and I can resolve our differences wrt strings via getting rid of old
strcmp entirely. Do you like the implications?
1. Code using cmp(string1, string2) will clearly run significantly
slower, calling string comparison 1 (when == obtains), 2 (when <
obtains), or 3 (when > obtains) times instead of always once only.
Since == is the least likely outcome when using cmp() on strings
(you can conclude that by instrumenting Python, or by common
sense <0.5 wink>), the number of string compare calls more than
doubles in practice for string cmp()-slinging programs (which
includes existing well-written tree-based lookup schemes).
2. String dictionary lookup will, unlike the general non-dict case
Martin instrumented, never pass the new "are the pointers the
same?" richcmp Py_EQ test (because dict lookup already makes that
test inline). So if old strcmp goes away, dict lookups that
have to resort to strcmp will start paying for hopeless tests.
OTOH, the "pointers equal?" test looks of dubious value for the
non-dict string case anyway (where it succeeded only 1 in 20
times).
#2 is a special case that can be special-cased to death, but #1 likely
applies to code using cmp() for comparisons of objects of any type, and
that's the primary reason I've resisted adding richcmps to the
heavily-compared types (variously string, int, float, long, and type
objects). Also the case that adding "a fast path" shouldn't have to endure
wading thru multiple gimmicks (kinda defeats the idea of "fast" <wink>), so
the instant *one* heavily-compared basic type grows a richcmp (there are 0
such today), all should.
So that's what I'll aim at.