[Tutor] file.read..... Abort Problem
kent37 at tds.net
Thu Oct 20 22:21:25 CEST 2005
Danny Yoo wrote:
> On Thu, 20 Oct 2005, Tomas Markus wrote:
>>what is the most effective way to check a file for not allowed
>>characters or how to check it for allowed only characters (which might
>>be i.e. ASCII only).
> If the file is small enough to fit into memory, you might use regular
> expressions as a sledgehammer. See:
> for a small tutorial on regular expressions. But unless performance is a
> real concern, doing a character-by-character scan shouldn't be too
I was going to ask why you think a regex is a sledgehammer for this one. Then I decided to try the two alternatives and found that it is actually faster to scan for the individual characters than to use a regex that looks for them all at once!
Here is a program that scans a string for test chars, either using a single regex search or by individually searching for the test chars. The test data set doesn't include any of the test chars so it is a worst case (neither scan terminates early):
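import re, string

# Python 2 code; on Python 3 use string.ascii_letters instead of string.letters
data = string.letters * 2500
testchars = string.digits + string.whitespace
testRe = re.compile('[' + testchars + ']')

def findRe():
    return testRe.search(data) is not None

def findScan():
    for c in testchars:
        if c in data:
            return True
    return False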
and here are the results of timing calls to findRe() and findScan():
F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findRe()"
100 loops, best of 3: 2.29 msec per loop
F:\Tutor>python -m timeit -s "from FindAny import findRe, findScan" "findScan()"
100 loops, best of 3: 2.04 msec per loop
Surprised the heck out of me!
When in doubt, measure! When you think you know, measure anyway; you are probably wrong!
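
For anyone who wants to repeat the measurement on a current interpreter, here is a minimal sketch of the same comparison adapted to Python 3. It is an adaptation, not the program from the post: the snake_case names, the best_time() helper and the use of the timeit module from inside the script are assumptions made for illustration. No timings are quoted because they will differ by machine and Python version.

```python
# Python 3 adaptation of the comparison above (names are illustrative).
import re
import string
import timeit

# Same worst-case data set: none of the test chars occur in it.
data = string.ascii_letters * 2500
testchars = string.digits + string.whitespace
test_re = re.compile('[' + testchars + ']')


def find_re():
    """Look for any test char with a single regex search."""
    return test_re.search(data) is not None


def find_scan():
    """Look for each test char with its own substring search."""
    return any(c in data for c in testchars)


def best_time(func, number=100, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


if __name__ == "__main__":
    for func in (find_re, find_scan):
        msec = best_time(func) * 1e3
        print("%s: %.3f msec per call" % (func.__name__, msec))
```

Both functions return False on this data set, so neither search stops early. Changing data so that it contains one of the test chars is a quick way to see how the two approaches compare when a match exists.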