question about speed of sequential string replacement vs regex or

Ian Kelly ian.g.kelly at gmail.com
Wed Sep 28 19:48:48 CEST 2011


On Wed, Sep 28, 2011 at 3:28 AM, Xah Lee <xahlee at gmail.com> wrote:
> curious question.
>
> suppose you have 300 different strings and they need all be replaced
> to say "aaa".
>
> is it faster to replace each one sequentially (i.e. replace first
> string to aaa, then do the 2nd, 3rd,...)
> , or is it faster to use a regex with “or” them all and do replace one
> shot? (i.e. "1ststr|2ndstr|3rdstr|..." -> aaa)
>
> let's say the sourceString this replacement to be done on is 500k
> chars.
>
> Anyone? i suppose the answer will be similar for perl, python, ruby.
>
> btw, the origin of this question is about writing a emacs lisp
> function that replace ~250 html named entities to unicode char.

I haven't timed it at the scale you're talking about, but for Python I
expect regex will be your best bet:

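# Python 3.2: Supposing the match strings and replacements are
# in a dict stored as `repls`...

import re

# Try longer keys first: alternation takes the first alternative that
# matches, so a key that is a prefix of another would otherwise win.
keys = sorted(repls, key=len, reverse=True)
pattern = '|'.join(map(re.escape, keys))
# The callback looks up the replacement for whichever key matched.
new_str = re.sub(pattern, lambda m: repls[m.group()], old_str)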
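For the HTML entity case, here is a small sketch of how the one-pass regex behaves compared with chained str.replace calls. The three-entry `repls` dict and the helper names `replace_one_pass` and `replace_sequentially` are made up for illustration. Besides speed, the two approaches can give different results, because a sequential pass can re-replace text produced by an earlier pass.

```python
import re

# Hypothetical sample mapping; the real table would have ~250 entries.
repls = {"&amp;": "&", "&lt;": "<", "&gt;": ">"}

def replace_one_pass(text):
    # One alternation over all keys; each match is looked up in the dict.
    pattern = "|".join(map(re.escape, repls))
    return re.sub(pattern, lambda m: repls[m.group()], text)

def replace_sequentially(text):
    # One str.replace call per key, in a fixed order.
    for old in ("&amp;", "&lt;", "&gt;"):
        text = text.replace(old, repls[old])
    return text

print(replace_one_pass("a &lt; b &amp;&amp; c &gt; d"))
# "&amp;lt;" is an escaped "&lt;": one pass yields "&lt;", while the
# sequential version decodes it a second time.
print(replace_one_pass("&amp;lt;"))
print(replace_sequentially("&amp;lt;"))
```

With the sequential version the result depends on the order of the calls, so the one-pass form is probably the safer choice for entity decoding.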

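The problem with doing 300 str.replace calls is that each call scans
the whole string again, and each one that finds a match builds a new
string, so you can end up with as many as 300 intermediate strings
that are created and then collected.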
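Since I haven't timed it, here is a rough sketch of how one might measure both approaches at about the scale in question. The test data is invented (300 made-up keys, a source string of roughly 500k characters), and the function names are only for illustration. Real results will depend on the keys, on how often they occur, and on the Python version, so treat any numbers as indicative only.

```python
import re
import timeit

# Invented test data: 300 distinct keys, all replaced by "aaa",
# and a source string of roughly 500k characters.
repls = {"key%03d;" % i: "aaa" for i in range(300)}
old_str = "".join("some text key%03d; " % (i % 300) for i in range(28000))

def replace_sequential(text, repls):
    # One str.replace call per key; each call rescans the string.
    for old, new in repls.items():
        text = text.replace(old, new)
    return text

def replace_regex(text, repls):
    # Longest keys first, so a key that is a prefix of another
    # cannot shadow the longer one.
    keys = sorted(repls, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: repls[m.group()], text)

# Sanity check: for this data both approaches give the same result.
assert replace_sequential(old_str, repls) == replace_regex(old_str, repls)

for fn in (replace_sequential, replace_regex):
    seconds = timeit.timeit(lambda: fn(old_str, repls), number=3)
    print("%s: %.3f s for 3 runs" % (fn.__name__, seconds))
```

If the replacement runs many times, compiling the pattern once outside the function would keep the compile cost out of the measurement.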

Cheers,
Ian


