best way to replace first word in string?

Ron Adam rrr at ronadam.com
Sat Oct 22 23:59:10 EDT 2005


Steven D'Aprano wrote:

> On Sat, 22 Oct 2005 21:41:58 +0000, Ron Adam wrote:
> 
> 
>>Don't forget a string can be sliced.  In this case testing before you 
>>leap is a win.   ;-)
> 
> 
> Not much of a win: only a factor of two, and unlikely to hold in all
> cases. Imagine trying it on *really long* strings with the first space
> close to the far end: the split-and-join algorithm has to walk the string
> once, while your test-then-index algorithm has to walk it twice.
> 
> So for a mere factor of two benefit on short strings, I'd vote for the
> less complex split-and-join version, although it is just a matter of
> personal preference.
> 

Guess again...  Are the results below what you were expecting?

Notice the join adds a space to the end if the source string is a single 
word.  But I allowed for that by adding one in the same case for the 
index method.
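
To make the trailing-space case concrete, here is a minimal sketch that copies the two function bodies from the script further down and calls them on a one-word and a multi-word string. The sample strings are made up for illustration.

```python
def replace_word1(source, newword):
    """Split/join version: replace the first word of source with newword."""
    return newword + " " + " ".join(source.split(None, 1)[1:])

def replace_word2(source, newword):
    """Test-then-index version: replace the first word of source with newword."""
    if ' ' in source:
        return newword + source[source.index(' '):]
    return newword + ' '   # space added only to match the join result

# Illustrative inputs, not taken from the benchmark.
single = replace_word1("hello", "replace")
multi = replace_word1("hello world again", "replace")
print(repr(single))   # 'replace '  (note the trailing space)
print(repr(multi))    # 'replace world again'
```

For the one-word input, `source.split(None, 1)[1:]` is an empty list, so the join contributes nothing and only the separator space after `newword` remains.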

The big win I was talking about was when there are no spaces in the string. 
The index version can then just return the replacement.

Each figure is that method's time as a percentage of the other method's time.  Smaller is better.

Type 1 = no spaces
Type 2 = space at 10% of length
Type 3 = space at 90% of length

Type: Length

Type 1: 10        split/join: 317.38%  index: 31.51%
Type 2: 10        split/join: 212.02%  index: 47.17%
Type 3: 10        split/join: 186.33%  index: 53.67%
Type 1: 100       split/join: 581.75%  index: 17.19%
Type 2: 100       split/join: 306.25%  index: 32.65%
Type 3: 100       split/join: 238.81%  index: 41.87%
Type 1: 1000      split/join: 1909.40%  index: 5.24%
Type 2: 1000      split/join: 892.02%  index: 11.21%
Type 3: 1000      split/join: 515.44%  index: 19.40%
Type 1: 10000     split/join: 3390.22%  index: 2.95%
Type 2: 10000     split/join: 2263.21%  index: 4.42%
Type 3: 10000     split/join: 650.30%  index: 15.38%
Type 1: 100000    split/join: 3342.08%  index: 2.99%
Type 2: 100000    split/join: 1175.51%  index: 8.51%
Type 3: 100000    split/join: 677.77%  index: 14.75%
Type 1: 1000000   split/join: 3159.27%  index: 3.17%
Type 2: 1000000   split/join: 867.39%  index: 11.53%
Type 3: 1000000   split/join: 679.47%  index: 14.72%
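
As a reading aid for the table, the sketch below shows how the two columns relate. It uses the same arithmetic as the print statement in the script (`t1/t2*100.0` and `t2/t1*100.0`); the helper name `relative_percentages` and the timings are hypothetical, chosen only for illustration.

```python
def relative_percentages(t1, t2):
    """Return (t1 as a percentage of t2, t2 as a percentage of t1)."""
    return t1 / t2 * 100.0, t2 / t1 * 100.0

# Hypothetical timings in seconds: split/join takes 3.0, index takes 1.0.
split_join_pct, index_pct = relative_percentages(3.0, 1.0)
print("split/join: %.2f%%  index: %.2f%%" % (split_join_pct, index_pct))
# prints: split/join: 300.00%  index: 33.33%
```

The two columns are reciprocals of each other, so they carry the same information. In the first row, 317.38% means split/join took about 3.17 times as long as index, and 100/3.1738 gives the 31.51% in the other column.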




import time
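def test(func, source):
     """Time repeated calls of func on source; return (last result, seconds)."""
     t = time.clock()
     # Scale the call count so each run handles about 6,000,000 characters.
     n = 6000000//len(source)
     s = ''
     for i in xrange(n):
         s = func(source, "replace")
     tt = time.clock()-t
     return s, tt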

def replace_word1(source, newword):
     """Replace the first word of source with newword."""
     return newword + " " + " ".join(source.split(None, 1)[1:])

def replace_word2(source, newword):
     """Replace the first word of source with newword."""
     if ' ' in source:
         return newword + source[source.index(' '):]
     return newword + ' '   # space needed to match join results


def makestrings(n):
     s1 = 'abcdefghij' * (n//10)
     i, j = n//10, n-n//10
     s2 = s1[:i] + ' ' + s1[i:] + 'd.'    # space near front
     s3 = s1[:j] + ' ' + s1[j:] + 'd.'    # space near end
     return [s1,s2,s3]

for n in [10,100,1000,10000,100000,1000000]:
     for sn,s in enumerate(makestrings(n)):
         r1, t1 = test(replace_word1, s)
         r2, t2 = test(replace_word2, s)
         assert r1 == r2
         print "Type %i: %-8i  split/join: %.2f%%  index: %.2f%%" \
                % (sn+1, n, t1/t2*100.0, t2/t1*100.0)
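
The script above is Python 2: it relies on `xrange`, the `print` statement and `time.clock()`, which was removed in Python 3.8. For anyone wanting to rerun the comparison today, here is a rough Python 3 sketch of the same benchmark. It is an adaptation, not the original code: `time.perf_counter()` stands in for `time.clock()`, and the names `timed`, `run` and `total_chars`, the smaller default workload and the shorter list of lengths are assumptions made to keep it quick.

```python
import time

def replace_word1(source, newword):
    """Split/join version."""
    return newword + " " + " ".join(source.split(None, 1)[1:])

def replace_word2(source, newword):
    """Test-then-index version."""
    if ' ' in source:
        return newword + source[source.index(' '):]
    return newword + ' '   # space needed to match join results

def timed(func, source, total_chars=600000):
    """Call func often enough to process about total_chars characters."""
    n = max(1, total_chars // len(source))
    start = time.perf_counter()
    result = ''
    for _ in range(n):
        result = func(source, "replace")
    return result, time.perf_counter() - start

def makestrings(n):
    """Build the three test strings: no space, space near front, space near end."""
    s1 = 'abcdefghij' * (n // 10)
    i, j = n // 10, n - n // 10
    s2 = s1[:i] + ' ' + s1[i:] + 'd.'
    s3 = s1[:j] + ' ' + s1[j:] + 'd.'
    return [s1, s2, s3]

def run(lengths=(10, 100, 1000)):
    """Return one (type, length, split/join %, index %) row per test string."""
    rows = []
    for n in lengths:
        for sn, s in enumerate(makestrings(n), 1):
            r1, t1 = timed(replace_word1, s)
            r2, t2 = timed(replace_word2, s)
            assert r1 == r2
            rows.append((sn, n, t1 / t2 * 100.0, t2 / t1 * 100.0))
    return rows

for sn, n, p1, p2 in run():
    print("Type %i: %-8i  split/join: %.2f%%  index: %.2f%%" % (sn, n, p1, p2))
```

Timings from a modern interpreter will not match the 2005 figures, so treat the output as a fresh measurement rather than a reproduction of the table.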











