Faster 'if char in string' test
Peter Otten
__peter__ at web.de
Thu Jun 24 12:12:22 EDT 2004
Kamilche wrote:
> he 'dict' solution proposed wouldn't work because the data I'm
> testing is in string form, and the overhead of translating the string
> to a dict before testing would swamp the results. So would using a
> set, because timings show a set is 4x slower than a dict.
>
> Unless I'm misunderstanding Peter's suggestion. Did you mean
> translating the string into a dict, then using a 'if boguschar in
> dict' test?
Look for validate_dict() to see what I meant. I've since found a faster
alternative using regular expressions.
<code>
import itertools, re, string
_validchars = string.ascii_letters + string.digits + \
"!@#$%^&*()`~-_=+[{]}\\|;:\'\",<.>/?\t "
_allchars = string.maketrans("", "")
_invalidchars = _allchars.translate(_allchars, _validchars)
_invaliddict = dict(itertools.izip(_invalidchars, itertools.repeat(None)))
valid = "This is a string to test for invalid characters."
invalid = valid + _invalidchars + valid
massaged = (_invalidchars + valid) * 100
rInvalid = re.compile("[%s]" % re.escape(_invalidchars))
def validate_list(s, invalid=_invalidchars):
for c in s:
if c in invalid:
return False
return True
def validate_dict(s, invalid=_invaliddict):
for c in s:
if c in invalid:
return False
return True
def validate_translate(s):
return len(s.translate(_allchars, _invalidchars)) == len(s)
def validate_regex(s, rInvalid=rInvalid):
return not rInvalid.search(s)
def validate_noop(s, valid=valid):
return s is valid
if __name__ == "__main__":
import timeit
tests = [f for k, f in globals().items() if k.startswith("validate_")]
for f in tests:
assert f(valid)
assert not f(invalid)
for f in tests:
name = f.__name__
print name
for data in ["valid", "invalid", "massaged"]:
print " ", data,
timeit.main([
"-sfrom findcharspeed import %s as f, %s as d" % (name, data),
"f(d)"])
</code>
Here are my timings:
validate_regex
valid 100000 loops, best of 3: 2.24 usec per loop
invalid 100000 loops, best of 3: 2.37 usec per loop
massaged 1000000 loops, best of 3: 1.61 usec per loop
validate_dict
valid 100000 loops, best of 3: 10.8 usec per loop
invalid 100000 loops, best of 3: 10.7 usec per loop
massaged 1000000 loops, best of 3: 0.99 usec per loop
validate_translate
valid 100000 loops, best of 3: 2.62 usec per loop
invalid 100000 loops, best of 3: 3.68 usec per loop
massaged 10000 loops, best of 3: 57.6 usec per loop
validate_list
valid 100000 loops, best of 3: 14.2 usec per loop
invalid 100000 loops, best of 3: 14 usec per loop
massaged 1000000 loops, best of 3: 1.02 usec per loop
validate_noop
valid 1000000 loops, best of 3: 0.582 usec per loop
invalid 1000000 loops, best of 3: 0.622 usec per loop
massaged 1000000 loops, best of 3: 0.634 usec per loop
Note how badly your solution using str.translate() may perform for certain
data due to its failure to shortcut.
Peter
More information about the Python-list
mailing list