string.join is abysmally slow

Jose' Sebrosa sebrosa at artenumerica.com
Sun Apr 15 23:03:22 EDT 2001


Graham Guttocks wrote:
> 
> Greetings,
> 
> I've run into a performance problem in one of my functions, and wonder
> if I could get some recommendations on how to speed things up.
> 
> What I'm trying to do is read in a textfile containing e-mail
> addresses, one per line, and use them to build a regular expression
> object in the form "address1|address2|address3|addressN" to search
> against.
> 
> I'm using string.join to concatenate the addresses together, separated
> by a `|'.  The problem is that string.join is unacceptably slow in
> this task.  The following program takes 37 seconds on a PIII/700 to
> process a 239-line file!
> 
> --------------------------------------------------------------------
> 
> import fileinput, re, string
> list = []
> 
> for line in fileinput.input(textfile):
>     # Comment or blank line?
>     if line == '' or line[0] in '#':
>         continue
>     else:
>         list.append(string.strip(line))
>         # "address1|address2|address3|addressN"
>         regex = string.join(list,'|')
>         regex = '"' + regex + '"'
>         reo = re.compile(regex, re.I)
> 
> --------------------------------------------------------------------

I ran your code on a 240-line input file and it finished almost instantly on a
PII/233 Linux box.

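Anyway, I believe you want to move the last three lines of code outside of
the for loop. As written, the join and the re.compile are repeated for every
address read, each time over the whole list built so far.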
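
A minimal sketch of that change, under a few assumptions of mine: it uses the
string methods of current Python rather than the string module, the helper
name build_address_regex and the sample addresses are made up for
illustration, and the re.escape call and the dropped literal quotes are my own
additions rather than part of your program.

```python
import re

def build_address_regex(lines):
    """Collect the addresses first, then join and compile exactly once."""
    addresses = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith('#'):
            continue
        # Assumption: re.escape is not in the original program; it keeps
        # characters such as '.' from acting as regex operators.
        addresses.append(re.escape(stripped))
    # These steps now run once, after the loop, instead of once per line.
    regex = '|'.join(addresses)
    return re.compile(regex, re.I)

# Hypothetical sample input standing in for the lines of the text file.
sample = ["# comment\n", "\n", "alice@example.com\n", "Bob@Example.org\n"]
reo = build_address_regex(sample)
```

With this shape the join and the compile each happen once per file, so the
cost grows with the number of addresses rather than with its square. Note that
an input with no addresses yields an empty pattern, which matches anything.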

Sebrosa


