Case-insensitive string comparison

Avner Ben avner at skilldesign.com
Tue Apr 15 10:15:44 EDT 2003


"Duncan Booth" <duncan at NOSPAMrcp.co.uk> wrote in message
news:Xns935E66BE3E498duncanrcpcouk at 127.0.0.1...
> "Avner Ben" <avner at skilldesign.com> wrote in
> news:3e9b0e03$1 at news.012.net.il:
>
> That seems very strange to me as lower casing the strings has the
> potential to be much more efficient than a case insensitive compare even
> when the compare is done by a C routine. Perhaps if you posted a general
> outline of your code we could suggest ways to speed it up. The biggest
> potential for code speedup is often to use a different algorithm rather
> than trying to optimise individual operations.
>

    I am searching a string (averaging some 100 characters) for one or more
members of a list of names (the list may approach 1000 entries in a normal
application, with names averaging some 40 characters). For each name found in
the string, I do something to the string, using the original name (in its
proper case). At present, I look for each name exactly as stated, and the
performance is acceptable. However, this makes me miss many cases where the
author of the string did not care about casing (which happens frequently). I
have little problem with lowering or uppering the entire list once, because it
is not expected to change often, and when it does change once in a while, a
subscription mechanism can take care of that. However, this requires holding
two lists (I still need the original names as they are) and lowering or
uppering each string for comparison (while keeping its original version for
restructuring).
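
A minimal sketch of the two-list approach just described, written for current
Python 3. A dict stands in for the second list, mapping each lowered name back
to its original spelling. The names build_index and find_names are
hypothetical, and the sketch assumes text for which str.lower() does not
change the string length (always true for ASCII), so that offsets found in the
lowered copy can be applied to the original string.

```python
def build_index(names):
    """Lower-case the name list once, mapping each lowered form back to the
    original spelling. (If two names differ only in case, the last one wins.)"""
    return {name.lower(): name for name in names if name}


def find_names(text, index):
    """Return sorted (start, original_name) pairs for every case-insensitive
    occurrence of an indexed name in text (overlapping hits included)."""
    lowered = text.lower()
    # Offsets found in `lowered` are reused on `text`; this only works if
    # lower() leaves the length unchanged (always true for ASCII text).
    if len(lowered) != len(text):
        raise ValueError("lower() changed the string length; offsets unusable")
    hits = []
    for low_name, original in index.items():
        # str.find runs in C; only the loop over names is in Python.
        start = lowered.find(low_name)
        while start != -1:
            hits.append((start, original))
            start = lowered.find(low_name, start + 1)
    hits.sort()
    return hits


index = build_index(["Python-list", "Duncan Booth"])
print(find_names("duncan booth posted to PYTHON-LIST", index))
# [(0, 'Duncan Booth'), (23, 'Python-list')]
```

The index is built once and can be rebuilt whenever the name list changes; each
incoming string is lowered once per search, and the original string is kept
untouched for the restructuring step.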

    Obviously, the efficient algorithm here would be to skip the standard
comparisons altogether and write an algorithm that advances through the string
character by character and, at each position, attempts to match names from
the list. I have written this kind of algorithm in C++ before, and it performed
reasonably even with quantities larger than the present ones. However, when I
wrote this algorithm in Python, the performance became unacceptable. The
lesson I learnt from the experience was that the built-in string search and
comparison routines of the standard Python library (which, I take it, are
written in C) are at least four times faster than anything you can write
yourself that runs character by character in Python.
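
One way to get the "try every name at each position" scan without a
Python-level character loop is to hand it to the re module, whose matching
engine is implemented in C. This is a swapped-in technique, not the C++
algorithm described above: a minimal sketch for current Python 3 (the class
NameMatcher is hypothetical) that compiles the whole name list into a single
case-insensitive alternation, one capturing group per name.

```python
import re


class NameMatcher:
    """Hypothetical helper: compile the name list once, reuse it per string."""

    def __init__(self, names):
        # Longest names first: at a given position the alternation takes the
        # first alternative that matches, so "New York City" beats "New York".
        self.names = sorted({n for n in names if n}, key=lambda n: (-len(n), n))
        pattern = "|".join("(" + re.escape(n) + ")" for n in self.names)
        self.regex = re.compile(pattern, re.IGNORECASE) if self.names else None

    def find(self, text):
        """Return (start, end, original_name) for each non-overlapping match."""
        if self.regex is None:
            return []
        # Exactly one group participates in each match, so m.lastindex is
        # that group's number, i.e. the name's position in self.names plus 1.
        return [(m.start(), m.end(), self.names[m.lastindex - 1])
                for m in self.regex.finditer(text)]


matcher = NameMatcher(["New York", "New York City", "Avner Ben"])
print(matcher.find("avner ben moved to NEW YORK CITY"))
# [(0, 9, 'Avner Ben'), (19, 32, 'New York City')]
```

Because each name has its own group, the original spelling is recovered from
the group number rather than by lowering the matched text, and no lowered copy
of the string is needed. Unlike a str.find loop, finditer reports only
leftmost, non-overlapping matches; whether that suits the restructuring step,
and how it performs with around 1000 names, would need to be measured.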

    Avner.
