When Python outruns C++ ...

Julian Tibble chasm at rift.sytes.net
Tue Apr 1 08:58:27 EST 2003


In article <mailman.1049181423.2644.python-list at python.org>, Jp Calderone wrote:
>   It looks like this could easily be improved with the application of a
> slightly smarter searching algorithm, such as Rabin-Karp search or
> Boyer-Moore find.
> 
>   Both of these use a technique of doing a little extra work up front to
> make each pass through the loop much less expensive.  Google for details.

I can't see why.  Maybe for more complex patterns, but not for just finding
words.

All you have to do to match words is go through the file character by
character with a simple finite state machine.  The C snippet I used in
my comparison program was as follows:

/* C extract to count occurences of words
 * ht = hash table handle
 * t is one past the last character (set to '\0' in case there is a
 *                                   word right at the end)
 */

for (i = 0; i <= t; i++) {
        if (isalpha(buf[i])) {
                if (!in_word) {
                        in_word = 1;
                        start = &buf[i];
                }
                buf[i] = tolower(buf[i]);
        } else {
               if (in_word) {
                       buf[i] = '\0';
                       in_word = 0;
                       hash_countword(ht, start);
               }
        }
}

This is clearly O(n)

Rabin-Karp pattern search algorithm is an alternative to trying to match
a pattern at every position, which I'm not doing.


Julian




More information about the Python-list mailing list