[Tutor] file.read..... Abort Problem
Kent Johnson
kent37 at tds.net
Thu Oct 20 22:34:34 CEST 2005
Kent Johnson wrote:
> Here is a program that scans a string for test chars, either using a single regex search or by individually searching for the test chars. The test data set doesn't include any of the test chars so it is a worst case (neither scan terminates early):
>
> # FindAny.py
> import re, string
>
> data = string.letters * 2500
>
> testchars = string.digits + string.whitespace
> testRe = re.compile('[' + testchars + ']')
>
> def findRe():
>     return testRe.search(data) is not None
>
> def findScan():
>     for c in testchars:
>         if c in data:
>             return True
>     return False
>
>
> and here are the results of timing calls to findRe() and findScan():
>
> F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findRe()"
> 100 loops, best of 3: 2.29 msec per loop
>
> F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findScan()"
> 100 loops, best of 3: 2.04 msec per loop
>
> Surprised the heck out of me!
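For readers on Python 3, where `string.letters` no longer exists, a rough port of the first FindAny.py might look like the sketch below. It is not part of the original post. It assumes `string.ascii_letters` as the replacement for `string.letters`, passes the test characters through `re.escape` as a precaution, and drives the timing with the `timeit` module instead of the command line. The repetition counts are arbitrary, and absolute numbers will not match the 2005 figures above.

```python
# Python 3 sketch of the first FindAny.py benchmark.
# Assumption: string.ascii_letters stands in for Python 2's string.letters.
import re
import string
import timeit

data = string.ascii_letters * 2500

# None of these characters occur in data, so both searches must run
# to the end of the string (the worst case described above).
testchars = string.digits + string.whitespace
# re.escape is a precaution; digits and whitespace are not special in a class.
testRe = re.compile('[' + re.escape(testchars) + ']')


def findRe():
    """A single regex search over data using a character class."""
    return testRe.search(data) is not None


def findScan():
    """One substring test ('in') over data per test character."""
    for c in testchars:
        if c in data:
            return True
    return False


if __name__ == '__main__':
    for func in (findRe, findScan):
        # Best of 3 runs of 100 calls each, reported per call.
        best = min(timeit.repeat(func, number=100, repeat=3)) / 100
        print(f'{func.__name__}: {best * 1000:.3f} msec per call')
```

Only the relative order of the two timings is meaningful here. Which one wins can change between interpreter versions and machines.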
On the other hand, if the number of characters you are searching for is large (and Unicode?), the regex solution wins:
# FindAny.py (new version)
# From a recipe by Martin v. Löwis, who claims that this regex match is "time independent of the number of characters in the class"
# http://tinyurl.com/7jqgt
import re, string, sys

data = string.digits * 2500
data = data.decode('ascii')

uppers = []
for i in range(sys.maxunicode):
    c = unichr(i)
    if c.isupper(): uppers.append(c)
uppers = u"".join(uppers)
uppers_re = re.compile(u'[' + uppers + u']')

def findRe():
    return uppers_re.search(data) is not None

def findScan():
    for c in uppers:
        if c in data:
            return True
    return False

F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findRe()"
1000 loops, best of 3: 442 usec per loop

F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findScan()"
10 loops, best of 3: 36 msec per loop

Now the scanning solution takes about 80 times as long as the regex (36 msec vs. 442 usec)!
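The crossover has a simple reason: findScan makes one full pass over data per test character, so its cost should grow roughly linearly with the size of the character set, while findRe makes a single pass whatever the class size (per the recipe's claim quoted in the code comment above). The Python 3 sketch below, which is not from the post, is one way to check this on your own machine. It builds the uppercase set with `chr` (Python 3's replacement for `unichr`) and times both approaches for growing prefixes of that set; the helper name `make_finders`, the prefix sizes and the repeat counts are illustrative choices, not anything from the original. One deliberate deviation: a single emoji (U+1F600, not uppercase) is appended to data. CPython 3 may answer `c in data` without scanning when `c` needs a wider internal string storage than `data` (PEP 393), which would hide the effect being measured.

```python
# Python 3 sketch: how findRe and findScan scale with the number of
# characters searched for. make_finders, the sizes and the repeat counts
# are illustrative assumptions, not taken from the original post.
import re
import string
import sys
import timeit

# Digits only, so no uppercase character ever matches (worst case again).
# The trailing emoji (not uppercase) forces 4-byte internal storage, so
# that 'c in data' really scans instead of possibly short-circuiting.
data = string.digits * 2500 + '\U0001F600'

# Every uppercase code point; chr() replaces Python 2's unichr().
uppers = ''.join(c for c in map(chr, range(sys.maxunicode + 1)) if c.isupper())


def make_finders(chars):
    """Return (findRe, findScan) functions that look for any of chars in data."""
    pattern = re.compile('[' + re.escape(chars) + ']')

    def findRe():
        # One pass over data, testing each position against the class.
        return pattern.search(data) is not None

    def findScan():
        # One pass over data for every character in chars.
        for c in chars:
            if c in data:
                return True
        return False

    return findRe, findScan


if __name__ == '__main__':
    for size in (10, 100, 1000, len(uppers)):
        findRe, findScan = make_finders(uppers[:size])
        t_re = min(timeit.repeat(findRe, number=10, repeat=3)) / 10
        t_scan = min(timeit.repeat(findScan, number=10, repeat=3)) / 10
        print(f'{size:5d} chars: regex {t_re * 1e6:9.1f} usec, '
              f'scan {t_scan * 1e6:9.1f} usec')
```

If the reasoning above holds, the scan column should grow with the number of characters while the regex column stays roughly flat.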
I'm-a-junkie-for-timings-ly-yrs,
Kent