[Tutor] file.read..... Abort Problem

Kent Johnson kent37 at tds.net
Thu Oct 20 22:34:34 CEST 2005


Kent Johnson wrote:
> Here is a program that scans a string for test chars, either using a single regex search or by individually searching for the test chars. The test data set doesn't include any of the test chars so it is a worst case (neither scan terminates early):
> 
> # FindAny.py
> import re, string
> 
> data = string.letters * 2500
> 
> testchars = string.digits + string.whitespace
> testRe = re.compile('[' + testchars + ']')
> 
> def findRe():
>     return testRe.search(data) is not None
> 
> def findScan():
>     for c in testchars:
>         if c in data:
>             return True
>     return False
> 
> 
> and here are the results of timing calls to findRe() and findScan():
> 
> F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findRe()"
> 100 loops, best of 3: 2.29 msec per loop
> 
> F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findScan()"
> 100 loops, best of 3: 2.04 msec per loop
> 
> Surprised the heck out of me!
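The quoted FindAny.py is Python 2 code: string.letters no longer exists in Python 3. Below is a sketch of a Python 3 port, offered only as an illustration. The optional text parameter is my addition so the functions can be checked on other strings. re.escape is also an addition; it is not needed for digits and whitespace, but it keeps the class valid for any set of test chars. The timing loop calls timeit.repeat directly instead of using the command line.

```python
import re
import string
import timeit

# string.letters is gone in Python 3; ascii_letters is the nearest equivalent
data = string.ascii_letters * 2500

testchars = string.digits + string.whitespace
# re.escape keeps the class valid even if testchars held ']', '\\', '^' or '-'
testRe = re.compile('[' + re.escape(testchars) + ']')


def findRe(text=data):
    """True if any char of testchars occurs in text (one regex search)."""
    return testRe.search(text) is not None


def findScan(text=data):
    """True if any char of testchars occurs in text (one substring test per char)."""
    for c in testchars:
        if c in text:
            return True
    return False


if __name__ == '__main__':
    for func in (findRe, findScan):
        # best of 3 runs of 100 calls each, like the "best of 3" figures above
        best = min(timeit.repeat(func, repeat=3, number=100)) / 100
        print(f"{func.__name__}: {best * 1e3:.3f} msec per loop")
```

Absolute numbers will depend on the machine and the Python version, so the ratio between the two functions is the interesting part.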

On the other hand, if the set of characters you are searching for is large (and perhaps Unicode), the regex solution wins:

# FindAny.py (new version)
# From a recipe by Martin v. Löwis, who claims that this regex match is "time independent of the number of characters in the class"
# http://tinyurl.com/7jqgt
import re, string, sys

data = string.digits * 2500
data = data.decode('ascii')

uppers = []
for i in range(sys.maxunicode + 1):   # + 1 so the range includes sys.maxunicode
    c = unichr(i)
    if c.isupper():
        uppers.append(c)
uppers = u"".join(uppers)
uppers_re = re.compile(u'[' + uppers + u']')

def findRe():
    return uppers_re.search(data) is not None

def findScan():
    for c in uppers:
        if c in data:
            return True
    return False
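This version also depends on Python 2 (unichr, decoding a byte string to get Unicode). A sketch of the same program for Python 3 follows, assuming Python 3.6 or later. The names match the program above; the optional text parameter and the re.escape call are additions for illustration. The timings below were measured with the original Python 2 code, not with this port.

```python
import re
import string
import sys

# str is already Unicode in Python 3, so the decode('ascii') step is unnecessary
data = string.digits * 2500

# Every uppercase code point; + 1 so range() includes sys.maxunicode itself.
# (Python 2's unichr() is chr() in Python 3.)
uppers = ''.join(c for c in map(chr, range(sys.maxunicode + 1)) if c.isupper())
uppers_re = re.compile('[' + re.escape(uppers) + ']')


def findRe(text=data):
    """True if text contains any uppercase character (one regex search)."""
    return uppers_re.search(text) is not None


def findScan(text=data):
    """True if text contains any uppercase character (one substring test per char)."""
    for c in uppers:
        if c in text:
            return True
    return False
```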


F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findRe()"
1000 loops, best of 3: 442 usec per loop

F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findScan()"
10 loops, best of 3: 36 msec per loop

Now the scanning solution takes about 80 times as long as the regex (36 msec vs. 442 usec)!
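The claim quoted in the program comment, that the regex match time is independent of the number of characters in the class, can be checked directly. Here is a Python 3 sketch. class_search_times is a hypothetical helper, not part of the original program. It grows the class by taking longer prefixes of the same uppercase list, so only the class size changes between runs. The data contains no uppercase letters, so every search fails and scans the whole string. If the claim holds on your interpreter, the per-search times should stay roughly flat as the class grows. Results vary by machine and Python version, so no particular numbers are claimed here.

```python
import re
import string
import sys
import timeit

data = string.digits * 2500  # no uppercase letters: every search is a worst case
uppers = ''.join(c for c in map(chr, range(sys.maxunicode + 1)) if c.isupper())


def class_search_times(sizes, number=100):
    """Map each class size to the best per-search time (seconds) over 3 runs."""
    times = {}
    for size in sizes:
        # Character class built from the first `size` uppercase characters
        pattern = re.compile('[' + re.escape(uppers[:size]) + ']')
        assert pattern.search(data) is None
        # p=pattern binds the current pattern, not whatever the loop holds later
        runs = timeit.repeat(lambda p=pattern: p.search(data), repeat=3, number=number)
        times[size] = min(runs) / number
    return times


if __name__ == '__main__':
    for size, secs in class_search_times((10, 100, 1000, len(uppers))).items():
        print(f"{size:5d} chars in class: {secs * 1e6:8.1f} usec per search")
```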

I'm-a-junkie-for-timings-ly-yrs,
Kent
