Replace Several Items
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Wed Aug 13 23:45:06 EDT 2008
On Thu, 14 Aug 2008 01:54:55 +0000, Steven D'Aprano wrote:
> Knowing full well that in Python it is relatively hard to guess what is
> fast and what is slow, I'll make my guess, from fastest to slowest:
>
> 1. repeated replace
> 2. repeated use of the form
> "if ch in my_string: my_string = my_string.replace(ch, "")"
> 3. re.sub with literal replacement
> 4. re.sub with callback (lambda m: "")
I added an extra test, which I expected to be fastest of all: using the
string.translate() function.
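For anyone reading this on Python 3, where string.translate() and string.maketrans() no longer exist, here is a minimal sketch of the same idea using the str methods instead. The helper name delete_chars is made up for illustration, and this is not the code I timed; the timings in this post are for the Python 2.5 version in the test code.

```python
# Python 3 sketch of deleting characters with translate().
# Assumption: delete_chars is a hypothetical name, not part of the timed script.
def delete_chars(s, chars):
    # The third argument to str.maketrans lists the characters
    # that translate() should drop from the string.
    table = str.maketrans('', '', chars)
    return s.translate(table)

print(delete_chars("abcd.abcd-abcd/abcd", ".-/?"))  # prints abcdabcdabcdabcd
```

The translation table is built once per call here; if the same set of characters is deleted many times, the table could be built once and reused.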
Here are my results, as generated with the timeit module under Python 2.5:
$ python delchars.py
Replacing 72 chars from a string of length 216
[(5.3256440162658691, 'delchars5'), (10.688904047012329, 'delchars2'),
(10.85448694229126, 'delchars1'), (67.739475965499878, 'delchars3'),
(120.5037829875946, 'delchars4')]
Based on these results, the fastest to slowest techniques are:
1. string translate (delchars5)
2. repeated replace with a test (delchars2)
3. repeated replace without a test (delchars1)
4. re.sub with literal replacement (delchars3)
5. re.sub with callback (delchars4)
However, the two versions using replace are quite close, and the difference
is possibly not significant. I imagine that it would be easy to find test
cases where they were in the opposite order.
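One way to probe that is to time the two replace variants on inputs where every character to delete is present and on inputs where none is. The following is only a rough sketch in Python 3 syntax, with made-up function names and made-up test strings; I make no claim about which variant wins on any given machine or Python version.

```python
from timeit import timeit

# Hypothetical names; these mirror the two replace-based approaches.
def replace_plain(s, chars):
    for c in chars:
        s = s.replace(c, '')
    return s

def replace_tested(s, chars):
    for c in chars:
        if c in s:  # only call replace when the character occurs
            s = s.replace(c, '')
    return s

chars = ".-/?"
# Assumed test inputs, chosen to exercise the two extremes.
inputs = {
    "all present": "a.b-c/d?" * 50,
    "none present": "abcdabcd" * 50,
}

for label, s in inputs.items():
    for func in (replace_plain, replace_tested):
        seconds = timeit(lambda: func(s, chars), number=10000)
        print("%-12s %-14s %.4f" % (label, func.__name__, seconds))
```

Running it several times and taking the minimum for each pairing, as the Timer.repeat() call in the test code does, gives a steadier comparison than a single run.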
While I'm gratified that my prediction was so close to the results I
found, I welcome any suggestions for better, faster, or more efficient code.
Test code follows:
==================================================
import re, string

def delchars1(s, chars):
    for c in chars:
        s = s.replace(c, '')
    return s

def delchars2(s, chars):
    for c in chars:
        if c in s:
            s = s.replace(c, '')
    return s

def delchars3(s, chars):
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub('', s)

def delchars4(s, chars):
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub(lambda m: '', s)

def delchars5(s, chars):
    return string.translate(s, string.maketrans('', ''), chars)

funcs = [delchars1, delchars2, delchars3, delchars4, delchars5]

def test_same(s, chars, known_result):
    results = [f(s, chars) for f in funcs]
    for i in range(len(results)):
        if results[i] != known_result:
            msg = "function %s incorrectly gives %s" \
                % (funcs[i], results[i])
            raise AssertionError(msg)

s = "abcd.abcd-abcd/abcd"
chars = ".-/?"
test_same(s, chars, "abcd"*4)
# try something a little bigger
s = s*2 + "abcd..--//" + "a.b.c.d.a-b-c-d-a/b/c/d/"
s *= 3
test_same(s, chars, "abcd"*36)
# now do the timing tests
from timeit import Timer
t1 = Timer("delchars1(s, chars)",
    "from __main__ import delchars1, s, chars")
t2 = Timer("delchars2(s, chars)",
    "from __main__ import delchars2, s, chars")
t3 = Timer("delchars3(s, chars)",
    "from __main__ import delchars3, s, chars")
t4 = Timer("delchars4(s, chars)",
    "from __main__ import delchars4, s, chars")
t5 = Timer("delchars5(s, chars)",
    "from __main__ import delchars5, s, chars")
times = [min(t.repeat()) for t in (t1, t2, t3, t4, t5)]
results = zip(times, [f.__name__ for f in funcs])
results.sort()
n = sum(s.count(c) for c in chars)
print "Replacing %d chars from a string of length %d" % (n, len(s))
print results
==================================================
--
Steven