string.join is abysmally slow

Gerhard Häring gerhard.nospam at bigfoot.de
Sun Apr 15 18:28:48 EDT 2001


On Mon, 16 Apr 2001 10:59:11 +1200 (NZST), Graham Guttocks wrote:
>Greetings,
>
>I've run into a performance problem in one of my functions, and wonder
>if I could get some recommendations on how to speed things up.
>
>What I'm trying to do is read in a textfile containing e-mail
>addresses, one per line, and use them to build a regular expression
>object in the form "address1|address2|address3|addressN" to search
>against.
>
>I'm using string.join to concatenate the addresses together, separated
>by a `|'.  The problem is that string.join is unacceptably slow in
>this task.  The following program takes 37 seconds on a PIII/700 to
>process a 239-line file!

Yes, the number one rule of optimization is not to make assumptions but to
profile instead :-) I didn't do that, but I would bet it's re.compile being
called once per line, 239 times in all and each time on a longer pattern, that
wastes the most time. Just do a single re.compile after the loop has built the
complete regular expression and it should be a lot faster.
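
A minimal sketch of that suggestion, assuming a modern Python 3 where
str.join replaces string.join: collect the addresses first, then join and
compile exactly once. The name build_address_regex is hypothetical, and
escaping each address with re.escape and dropping the surrounding quote
characters are my assumptions, not part of the quoted program.

```python
import re

def build_address_regex(lines):
    """Return one compiled, case-insensitive pattern matching any listed address."""
    addresses = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith('#'):
            continue
        # Assumption: addresses are literal text, so escape regex metacharacters.
        addresses.append(re.escape(line))
    # "address1|address2|...|addressN", joined and compiled a single time.
    # Note: with no addresses at all, the empty pattern matches everything.
    return re.compile('|'.join(addresses), re.I)
```

It could be used as `reo = build_address_regex(open(textfile))`, with
textfile being the same file name as in the quoted program.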

Gerhard

>
>--------------------------------------------------------------------
>
>import fileinput, re, string
>list = []
>
>for line in fileinput.input(textfile):
>    # Comment or blank line?
>    if line == '' or line[0] in '#':
>        continue
>    else:
>        list.append(string.strip(line))
>        # "address1|address2|address3|addressN"
>        regex = string.join(list,'|')
>        regex = '"' + regex + '"'
>        reo = re.compile(regex, re.I)
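
To check the guess instead of assuming it, both variants can be timed. This
is a rough sketch, assuming Python 3 and made-up addresses
(user0@example.com and so on) in place of the real file. The function names
are hypothetical, and absolute timings will vary with machine and Python
version.

```python
import re
import time

# Assumption: synthetic addresses stand in for the 239-line file.
addresses = [re.escape("user%d@example.com" % i) for i in range(239)]

def compile_every_line(addrs):
    """Mimic the quoted loop: rejoin and recompile after each append."""
    collected = []
    reo = None
    for addr in addrs:
        collected.append(addr)
        reo = re.compile('|'.join(collected), re.I)
    return reo

def compile_once(addrs):
    """Join all addresses and compile a single time."""
    return re.compile('|'.join(addrs), re.I)

def timed(func, addrs):
    """Return (result, elapsed seconds) for one call of func."""
    re.purge()  # clear re's pattern cache so neither run benefits from the other
    start = time.perf_counter()
    result = func(addrs)
    return result, time.perf_counter() - start

slow_reo, slow_secs = timed(compile_every_line, addresses)
fast_reo, fast_secs = timed(compile_once, addresses)
print("compile per line: %.4f s" % slow_secs)
print("compile once:     %.4f s" % fast_secs)
```

Both functions end up with the same final pattern; only the number of
re.compile calls differs, so the gap between the two timings shows how much
of the cost comes from compiling rather than from joining.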

-- 
mail:   gerhard <at> bigfoot <dot> de
web:    http://highqualdev.com


