Replace Several Items

Steven D'Aprano steve at
Thu Aug 14 05:45:06 CEST 2008

On Thu, 14 Aug 2008 01:54:55 +0000, Steven D'Aprano wrote:

> Knowing full well that in Python it is relatively hard to guess what is
> fast and what is slow, I'll make my guess of fastest to slowest:
> 1. repeated replace
> 2. repeated use of the form
>    if ch in my_string: my_string = my_string.replace(ch, "")
> 3. re.sub with literal replacement
> 4. re.sub with callback (lambda m: "")

I added an extra test, which I expected to be fastest of all: using the 
string.translate() function.

Here are my results, as generated with the timeit module under Python 2.5:

$ python
Replacing 72 chars from a string of length 216
[(5.3256440162658691, 'delchars5'), (10.688904047012329, 'delchars2'), 
(10.85448694229126, 'delchars1'), (67.739475965499878, 'delchars3'), 
(120.5037829875946, 'delchars4')]

Based on these results, the fastest to slowest techniques are:

1. string translate (delchars5)
2. repeated replace with a test (delchars2)
3. repeated replace without a test (delchars1)
4. re.sub with literal replacement (delchars3)
5. re.sub with callback (delchars4)

However, the two versions using replace are quite close, and the 
difference between them is possibly not significant. I imagine that it 
would be easy to find test cases where they came out in the opposite order.
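
One candidate for such a case, which I have not measured: when every 
character in chars actually occurs in s, the "in" test in delchars2 never 
saves a call to replace, so it is pure extra work. The sketch below 
compares the two functions on two made-up inputs of length 216, named 
"present" and "absent". Those names, the inputs and the best_time helper 
are illustrations only, not part of the test code in this post, and the 
sketch assumes Python 3 (print as a function, Timer given a callable). 
No timings are shown because they will vary by machine and version.

```python
from timeit import Timer

def delchars1(s, chars):
    # Always call replace, even if c is absent from s.
    for c in chars:
        s = s.replace(c, '')
    return s

def delchars2(s, chars):
    # Only call replace when c actually occurs in s.
    for c in chars:
        if c in s:
            s = s.replace(c, '')
    return s

# Hypothetical inputs: every deleted character occurs in `present`,
# and none of them occurs in `absent`. Both have length 216.
chars = ".-/?"
present = "a.b-c/d?" * 27
absent = "abcdabcd" * 27

def best_time(func, s, number=10000):
    # Best of three runs, each calling func(s, chars) `number` times.
    timer = Timer(lambda: func(s, chars))
    return min(timer.repeat(repeat=3, number=number))

for label, s in (("all present", present), ("all absent", absent)):
    # Both functions must agree before their timings are worth comparing.
    assert delchars1(s, chars) == delchars2(s, chars)
    print(label, best_time(delchars1, s), best_time(delchars2, s))
```

If the guess is right, delchars1 should come out ahead on the first input 
and delchars2 on the second, but treat that as something to check rather 
than a result.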

While I'm gratified that my prediction was so close to the results I 
found, I welcome any suggestions for better, faster, or more efficient 
code.

Test code follows:


import re, string

def delchars1(s, chars):
    for c in chars:
        s = s.replace(c, '')
    return s

def delchars2(s, chars):
    for c in chars:
        if c in s:
            s = s.replace(c, '')
    return s

def delchars3(s, chars):
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub('', s)

def delchars4(s, chars):
    chars = re.escape(chars)
    x = re.compile(r'[%s]' % chars)
    return x.sub(lambda m: '', s)

def delchars5(s, chars):
    return string.translate(s, string.maketrans('', ''), chars)

funcs = [delchars1, delchars2, delchars3, delchars4, delchars5]

def test_same(s, chars, known_result):
    results = [f(s, chars) for f in funcs]
    for i in range(len(results)):
        if results[i] != known_result:
            msg = "function %s incorrectly gives %s" \
            % (funcs[i], results[i])
            raise AssertionError(msg)

s = "abcd.abcd-abcd/abcd"
chars = ".-/?"
test_same(s, chars, "abcd"*4)

# try something a little bigger
s = s*2 + "abcd..--//" + "a.b.c.d.a-b-c-d-a/b/c/d/"
s *= 3
test_same(s, chars, "abcd"*36)

# now do the timing tests

from timeit import Timer
t1 = Timer("delchars1(s, chars)", 
    "from __main__ import delchars1, s, chars")
t2 = Timer("delchars2(s, chars)", 
    "from __main__ import delchars2, s, chars")
t3 = Timer("delchars3(s, chars)", 
    "from __main__ import delchars3, s, chars")
t4 = Timer("delchars4(s, chars)", 
    "from __main__ import delchars4, s, chars")
t5 = Timer("delchars5(s, chars)", 
    "from __main__ import delchars5, s, chars")

times = [min(t.repeat()) for t in (t1, t2, t3, t4, t5)]
results = zip(times, [f.__name__ for f in funcs])

n = sum(s.count(c) for c in chars)
print "Replacing %d chars from a string of length %d" % (n, len(s))
print results
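
A note for anyone running this under Python 3, which is an assumption on 
my part since the timings above are from 2.5: the module-level functions 
string.translate and string.maketrans are gone there, so delchars5 will 
not run as written. A minimal sketch of the equivalent, using the str 
methods instead, might look like this. The name delchars5_py3 is made up 
for illustration, and it assumes s and chars are str, not bytes.

```python
def delchars5_py3(s, chars):
    # The third argument to str.maketrans lists characters that are
    # mapped to None; str.translate then deletes them from s.
    table = str.maketrans('', '', chars)
    return s.translate(table)

s = "abcd.abcd-abcd/abcd"
chars = ".-/?"
print(delchars5_py3(s, chars))  # prints abcdabcdabcdabcd
```

I have not timed this version, so the ranking above should not be assumed 
to carry over unchanged.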


