Replacing words from strings except 'and' / 'or' / 'and not'

Jp Calderone exarkun at divmod.com
Sat Nov 27 19:28:02 CET 2004


On Sat, 27 Nov 2004 12:54:02 -0500, Peter Hansen <peter at engcorp.com> wrote:
>Peter Maas wrote:
> > Diez B. Roggisch schrieb:
> >> import sets
> >> KEYWORDS = sets.Set(['and', 'or', 'not'])
> >>...
> >> def decorate(w):
> >>     if w in KEYWORDS:
> >>         return w
> >>     return "*%s*" % w
> >>
> > Is there a reason to use sets here? I think lists will do as well.
> 
> Sets are implemented using dictionaries, so the "if w in KEYWORDS"
> part would be O(1) instead of O(n) as with lists...
> 
> (I.e. searching a list is a brute-force operation, whereas
> sets are not.)

  And yet... using sets here is slower in every possible case:

exarkun at boson:~$ timeit.py -s 'from sets import Set; x = Set(["and", "or", "not"])' 'None in x'
1000000 loops, best of 3: 1.54 usec per loop
exarkun at boson:~$ timeit.py -s 'from sets import Set; x = Set(["and", "or", "not"])' '"not" in x'
1000000 loops, best of 3: 1.49 usec per loop
exarkun at boson:~$ timeit.py -s 'from sets import Set; x = Set(["and", "or", "not"])' '"or" in x'
1000000 loops, best of 3: 1.48 usec per loop
exarkun at boson:~$ timeit.py -s 'from sets import Set; x = Set(["and", "or", "not"])' '"and" in x'
1000000 loops, best of 3: 1.46 usec per loop
exarkun at boson:~$ timeit.py -s 'x = ["and", "or", "not"]' 'None in x'
1000000 loops, best of 3: 0.5 usec per loop
exarkun at boson:~$ timeit.py -s 'x = ["and", "or", "not"]' '"not" in x'
1000000 loops, best of 3: 0.235 usec per loop
exarkun at boson:~$ timeit.py -s 'x = ["and", "or", "not"]' '"or" in x'
1000000 loops, best of 3: 0.21 usec per loop
exarkun at boson:~$ timeit.py -s 'x = ["and", "or", "not"]' '"and" in x'
10000000 loops, best of 3: 0.186 usec per loop
exarkun at boson:~$ 

  This is a pretty clear example of premature optimization.  When developing any software, the first goal should be to make it work, as simply as possible.  Then, when a need for improved performance is identified, the software should be profiled.  Finally, when real hot spots are identified, they should be optimized.

  Going out of order is often detrimental, and the above example is a clear case of this: the "optimization" slowed down the function by a factor of no less than 3 and no more than 8.2.

  Jp



More information about the Python-list mailing list