[Python-Dev] Comparison speed

Tim Peters tim.one@home.com
Thu, 17 May 2001 12:02:38 -0400


[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete.  I realize I
> probably haven't expressed this thought clearly, and I'm not going to
> push for this to happen quickly or forecefully, but it's nevertheless
> how I see things.  I expect it would allow a tremendous cleanup of the
> comparison code.  It will never reach the simplicity of cmp() -- but
> think of Einstein's (?) rule "things should be as simple as they can
> be, but no simpler."  Clearly cmp() was too simple. :-)
>
> Anyway, it worries me whenever I hear someone express the thought that
> adding rich comparisons to a particular object type would be a bad
> idea because it would slow things down.

At the moment, "almost all" comparisons in the dynamic sense have no need of
richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple
<wink>.  For now richcmps are a tail-wagging-the-dog phenomenon, or more like
the tail growing 10 pounds of dense matted hair, making the once-frisky puppy
slow to a crawl because its butt is scraping the ground <wink>.

Martin and I can resolve our differences wrt strings via getting rid of old
strcmp entirely.  Do you like the implications?

1. Code using cmp(string1, string2) will clearly run significantly
   slower, calling string comparison 1 (when == obtains), 2 (when <
   obtains), or 3 (when > obtains) times instead of always once only.
   Since == is the least likely outcome when using cmp() on strings
   (you can conclude that by instrumenting Python, or by common
   sense <0.5 wink>), the number of string compare calls more than
   doubles in practice for string cmp()-slinging programs (which
   includes existing well-written tree-based lookup schemes).

2. String dictionary lookup will, unlike the general non-dict case
   Martin instrumented, never pass the new "are the pointers the
   same?" richcmp Py_EQ test (because dict lookup already makes that
   test inline).  So if old strcmp goes away, dict lookups that
   have to resort to strcmp will start paying for hopeless tests.
   OTOH, the "pointers equal?" test looks of dubious value for the
   non-dict string case anyway (where it succeeded only 1 in 20
   times).

#2 is a special case that can be special-cased to death, but #1 likely
applies to code using cmp() for comparisons of objects of any type, and
that's the primary reason I've resisted adding richcmps to the
heavily-compared types (variously string, int, float, long, and type
objects).  Also the case that adding "a fast path" shouldn't have to endure
wading thru multiple gimmicks (kinda defeats the idea of "fast" <wink>), so
the instant *one* heavily-compared basic type grows a richcmp (there are 0
such today), all should.

So that's what I'll aim at.