when does re take over from string.replace ?

Tim Peters tim_one at email.msn.com
Fri May 28 03:31:59 EDT 1999


[sweeting at neuronet.com.my]
> The Python docs say to use the string module instead of the re
> module for simple tasks (and I'll readily admit that I'm glad to
> hear that). Now I have a translation that goes somewhat like this :
>
> oldwordlist = ['value1', 'value2', 'value3' ..... ]
> replacelist = ['newvalue1', 'newvalue2', 'newvalue3' ...... ]
> text = "about 2-3,000 characters of text........."
>
> for i in range (0, len(oldwordlist)) :
>     text = string.replace(text, oldwordlist[i], replacelist[i])
>
>
> The list of words to be replaced is usually 10-30 long.
> The bulk text on which this translation occurs is about 2,000
> characters.  It works fine and I've no problems but since this
> is being used often in my app, I figured I should try to
> optimise it.

Why?  Regardless of how often it's being used, do you have actual timings
that show it *needs* to be sped up?

> I have no idea how fast the built-in functions like
> string.replace

Ah -- I guess that answers the question above <wink>.

> are but at what stage should I be looking at doing this with a more
> complicated method that uses one pass instead of the current method
> that requires 10-30 passes depending on the replacelist. Would re be
> better suited even now ?

I doubt re would be faster even if you had a thousand strings to replace and
your text averaged a megabyte.  Try fleshing out the re code and see what it
takes to make that work.  Then time both ways in isolation ("from time
import clock").  With any regexp-based approach, you're going to have to pop
back into Python every time you get a match, in order to figure out the
appropriate replacement text.  This can boost your 10-30 calls now to 100s
of calls, depending on how many matches there are in total (you didn't say,
but it's a crucial aspect of the problem).  Each string.replace runs
entirely at C speed; parts of re don't (string.replace is *much* faster than
re.sub), and you're trying to do something re can't do on its own anyway.
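
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the word pairs and sample text are made up, the helper names (replace_loop, make_one_pass) are hypothetical, and it assumes a current Python 3, where the str.replace method stands in for string.replace and timeit stands in for time.clock. The one-pass version shows the per-match callback into Python that any regexp-based approach needs in order to look up the replacement text.

```python
import re
import timeit

# Hypothetical data: the original lists and text were elided in the post.
PAIRS = [("cat", "feline"), ("dog", "canine"),
         ("bird", "avian"), ("fish", "piscine")]
TEXT = "the cat chased the bird while the dog watched the fish. " * 40


def replace_loop(text, pairs):
    """One full pass over the text per word, each pass running in C."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


def make_one_pass(pairs):
    """Build a single-pass replacer from one alternation pattern."""
    table = dict(pairs)
    # Longest words first, so a longer word beats any shorter prefix of it.
    words = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words))

    def replace_one_pass(text):
        # The lambda is called back in Python once for every match.
        return pattern.sub(lambda m: table[m.group(0)], text)

    return replace_one_pass


one_pass = make_one_pass(PAIRS)

if __name__ == "__main__":
    # Time both ways in isolation; the numbers depend on interpreter,
    # machine, and above all on how many matches the text contains.
    runs = 200
    t_loop = timeit.timeit(lambda: replace_loop(TEXT, PAIRS), number=runs)
    t_re = timeit.timeit(lambda: one_pass(TEXT), number=runs)
    print("str.replace loop: %.4f s for %d runs" % (t_loop, runs))
    print("one-pass re.sub:  %.4f s for %d runs" % (t_re, runs))
```

The two functions agree on this sample only because no replacement contains one of the search words and no search word is a prefix of another. When that does not hold, the loop can re-replace text produced by an earlier pass while the one-pass version cannot, so check the results as well as the timings before switching.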

If it turns out you need a huge speed boost, probably best to look into
Marc-Andre Lemburg's mxTextTools extension.  You'll pay big-time for a huge
speedup, though.

if-it's-faster-than-you-can-do-it-by-hand-it's-fast-enough<wink>-ly y'rs  -
tim





