[Python-Dev] other "magic strings" issues

Skip Montanaro skip at pobox.com
Fri Nov 7 14:47:27 EST 2003


    Raymond> Other than possibly upper and lower, the rest should be skipped
    Raymond> and left for tests like isdigit().  The tests are faster than
    Raymond> the usual linear search style of: if char in str.letters.

A couple of people have claimed that the .is*() string methods are faster
than testing a character against a string.  I'm sure that's true in some
cases, but it seems not to be true for string.ascii_letters.  Here are
several timeit.py runs, grouped by technique and ordered from slowest to
fastest.  Each technique is timed with both a positive test ('z') and a
negative test ('.').

    Using char in someset:
    % timeit.py -s 'import string, sets; pset = sets.Set(string.ascii_letters)' "'.' in pset"
    100000 loops, best of 3: 4.68 usec per loop
    % timeit.py -s 'import string, sets; pset = sets.Set(string.ascii_letters)' "'z' in pset"
    100000 loops, best of 3: 4.58 usec per loop

    Using char.isalpha() or char.islower():
    % timeit.py -s 'import string' "'z'.islower()"
    1000000 loops, best of 3: 0.93 usec per loop
    % timeit.py -s 'import string' "'.'.islower()"
    1000000 loops, best of 3: 0.928 usec per loop
    % timeit.py -s 'import string' "'z'.isalpha()"
    1000000 loops, best of 3: 0.893 usec per loop
    % timeit.py -s 'import string' "'.'.isalpha()"
    1000000 loops, best of 3: 0.96 usec per loop

    Using char in somestring:
    % timeit.py -s 'import string; pset = string.ascii_letters' "'z' in pset"
    1000000 loops, best of 3: 0.617 usec per loop
    % timeit.py -s 'import string; pset = string.ascii_letters' "'.' in pset"
    1000000 loops, best of 3: 0.747 usec per loop

    Using char in somedict:
    % timeit.py -s 'import string; pset = dict(zip(string.ascii_letters,string.ascii_letters))' "'.' in pset"
    1000000 loops, best of 3: 0.502 usec per loop
    % timeit.py -s 'import string; pset = dict(zip(string.ascii_letters,string.ascii_letters))' "'z' in pset"
    1000000 loops, best of 3: 0.509 usec per loop

The only clear loser is the 'char in set' case, no doubt because sets.Set is
currently implemented in Python.  Testing a character for membership in a
short string, however, seems to me to be faster than using the .is*()
methods.
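
For anyone who wants to repeat the comparison without retyping the command
lines, here is a minimal sketch that drives the same statements through the
timeit module's Python API.  It is only a sketch under some assumptions: the
names best_usec, run_cases and CASES are made up for illustration, the loop
count is fixed rather than auto-selected as timeit.py does, and the built-in
set type stands in for sets.Set, so the set numbers will not be comparable
to the ones above.

```python
import timeit

# (label, setup, statement) triples mirroring the command lines above.
# Assumption: the built-in set replaces sets.Set, which is a different
# implementation from the one timed in the original runs.
CASES = [
    ("char in someset", "import string; pset = set(string.ascii_letters)", "'z' in pset"),
    ("char in someset", "import string; pset = set(string.ascii_letters)", "'.' in pset"),
    ("char.islower()", "pass", "'z'.islower()"),
    ("char.islower()", "pass", "'.'.islower()"),
    ("char.isalpha()", "pass", "'z'.isalpha()"),
    ("char.isalpha()", "pass", "'.'.isalpha()"),
    ("char in somestring", "import string; pset = string.ascii_letters", "'z' in pset"),
    ("char in somestring", "import string; pset = string.ascii_letters", "'.' in pset"),
    ("char in somedict",
     "import string; pset = dict(zip(string.ascii_letters, string.ascii_letters))",
     "'z' in pset"),
    ("char in somedict",
     "import string; pset = dict(zip(string.ascii_letters, string.ascii_letters))",
     "'.' in pset"),
]


def best_usec(stmt, setup="pass", number=100000, repeat=3):
    """Hypothetical helper: best per-loop time in microseconds."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    # Each timing is the total for `number` loops; report the best run.
    return min(timings) / number * 1e6


def run_cases(cases=CASES, number=100000, repeat=3):
    """Time every case and return {(label, stmt): usec_per_loop}."""
    return {
        (label, stmt): best_usec(stmt, setup, number, repeat)
        for label, setup, stmt in cases
    }


if __name__ == "__main__":
    for (label, stmt), usec in run_cases().items():
        print("%-20s %-16s %.3f usec per loop" % (label, stmt, usec))
```

Absolute times, and possibly the ordering, will depend on the interpreter
version and the machine, so treat any output as a local measurement rather
than a confirmation of the figures quoted above.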

Skip


