Comparing strings from the back?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Sep 3 22:17:30 EDT 2012
On Mon, 03 Sep 2012 21:54:01 -0400, Roy Smith wrote:
> Let's assume you're testing two strings for equality. You've already
> done the obvious quick tests (i.e they're the same length), and you're
> down to the O(n) part of comparing every character.
>
> I'm wondering if it might be faster to start at the ends of the strings
> instead of at the beginning? If the strings are indeed equal, it's the
> same amount of work starting from either end. But, if it turns out that
> for real-life situations, the ends of strings have more entropy than the
> beginnings, the odds are you'll discover that they're unequal quicker by
> starting at the end.
And if the strings have the same end and different beginnings:
running
jumping
cooking
then you'll find out quicker by starting at the beginning.
No general-purpose string comparison operator can make assumptions about
the content of the strings, like "they are nearly always pathnames on
Unix-like systems like /home/user/x and /home/user/y".
I suppose you could have two different operators, == for "test from the
front" and === for "test from the back", and push that decision onto the
user, who will then spend hours angsting about the "most efficient"
equality test and days painstakingly profiling his data to save four
nanoseconds at runtime.
Besides, then somebody will say "Yes, but what about the cases where the
prefix and the suffix are both equal, but the middle will be different?"
and propose a third string-equality operator ==== and then there will be
bloodshed.
On average, string equality needs to check half the characters in the
string. Any individual comparisons may require fewer, or more, than that,
but whichever way you scan the string, some comparisons will take more
and some will take fewer. With no general way of telling ahead of time
which will be which, on average you can't do better than O(N/2) and no
complexity justification for picking "start from the end" from "start
from the start".
--
Steven
More information about the Python-list
mailing list