question about speed of sequential string replacement vs regex or
ian.g.kelly at gmail.com
Wed Sep 28 19:48:48 CEST 2011
On Wed, Sep 28, 2011 at 3:28 AM, Xah Lee <xahlee at gmail.com> wrote:
> curious question.
> suppose you have 300 different strings and they need all be replaced
> to say "aaa".
> is it faster to replace each one sequentially (i.e. replace first
> string to aaa, then do the 2nd, 3rd,...)
> , or is it faster to use a regex with “or” them all and do replace one
> shot? (i.e. "1ststr|2ndstr|3rdstr|..." -> aaa)
> let's say the sourceString this replacement to be done on is 500k
> Anyone? i suppose the answer will be similar for perl, python, ruby.
> btw, the origin of this question is about writing a emacs lisp
> function that replace ~250 html named entities to unicode char.
I haven't timed it at the scale you're talking about, but for Python I
expect regex will be your best bet:
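# Python 3.2: Supposing the match strings and replacements are
# in a dict stored as `repls`...
import re
# Longest keys first, so a key that is a prefix of another cannot shadow it.
keys = sorted(repls, key=len, reverse=True)
pattern = '|'.join(map(re.escape, keys))
new_str = re.sub(pattern, lambda m: repls[m.group()], old_str)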
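The problem with doing 300 str.replace calls is that each call scans
the whole string again, and every call that finds a match builds a new
copy of it, so up to 300 intermediate strings would be created and
then collected.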
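Since I haven't timed it, here is a rough sketch for measuring both approaches yourself. Everything in it is made up for illustration: the helper names (replace_sequential, replace_regex, make_test_data), the fake keys of the form &e0; ... &e299;, and the filler text. Results will depend on your Python version and on what the real keys look like, so treat the numbers as a starting point only.

```python
import re
import timeit


def replace_sequential(text, repls):
    """Apply one str.replace call per key, in dict order."""
    for old, new in repls.items():
        text = text.replace(old, new)
    return text


def replace_regex(text, repls):
    """Replace every key in a single pass with one alternation pattern."""
    # Longest keys first, so a key that is a prefix of another cannot shadow it.
    keys = sorted(repls, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)))
    return pattern.sub(lambda m: repls[m.group()], text)


def make_test_data(n_keys=300, target_size=500000):
    """Build hypothetical keys and a source string of at least target_size chars."""
    # Assumption: keys look like HTML entities and all map to "aaa".
    repls = {'&e%d;' % i: 'aaa' for i in range(n_keys)}
    chunk = ' some text '.join(repls)  # every key appears once per chunk
    text = chunk * (target_size // len(chunk) + 1)
    return text, repls


if __name__ == '__main__':
    text, repls = make_test_data()
    # Both approaches should agree on this data before we compare speed.
    assert replace_sequential(text, repls) == replace_regex(text, repls)
    for func in (replace_sequential, replace_regex):
        seconds = timeit.timeit(lambda: func(text, repls), number=3)
        print('%s: %.3f s for 3 runs' % (func.__name__, seconds))
```

Note that the two approaches are not always interchangeable: sequential replacement rescans its own output, so a replacement that happens to contain a later key gets replaced again, while the single regex pass never looks at text it has already substituted.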