string.join is abysmally slow
Jose' Sebrosa
sebrosa at artenumerica.com
Sun Apr 15 23:03:22 EDT 2001
Graham Guttocks wrote:
>
> Greetings,
>
> I've run into a performance problem in one of my functions, and wonder
> if I could get some recommendations on how to speed things up.
>
> What I'm trying to do is read in a textfile containing e-mail
> addresses, one per line, and use them to build a regular expression
> object in the form "address1|address2|address3|addressN" to search
> against.
>
> I'm using string.join to concatenate the addresses together, separated
> by a `|'. The problem is that string.join is unacceptably slow in
> this task. The following program takes 37 seconds on a PIII/700 to
> process a 239-line file!
>
> --------------------------------------------------------------------
>
> import fileinput, re, string
> list = []
>
> for line in fileinput.input(textfile):
>     # Comment or blank line?
>     if line == '' or line[0] in '#':
>         continue
>     else:
>         list.append(string.strip(line))
>     # "address1|address2|address3|addressN"
>     regex = string.join(list,'|')
>     regex = '"' + regex + '"'
>     reo = re.compile(regex, re.I)
>
> --------------------------------------------------------------------
I tried to run your code with a 240-line input file, and it ran in no time on a
PII/233 Linux box.
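Anyway, I believe you want to move the last three lines of code outside the
for loop. As written, they rebuild the joined string and recompile the whole
pattern once per input line, so the total work grows roughly quadratically
with the number of addresses.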
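A minimal sketch of that restructuring, written in current Python rather than the 2001 dialect (`str.join` instead of `string.join`, which no longer exists in Python 3). The function name `build_address_regex` is hypothetical, and two changes go beyond the reply: `re.escape` is added so that dots in addresses match literally, and the `"` characters the original wrapped around the pattern are dropped, on the assumption that literal quote marks were not meant to be part of the match.

```python
import re


def build_address_regex(lines):
    """Compile one case-insensitive alternation from address lines.

    Hypothetical helper: blank lines and lines starting with '#' are
    skipped, matching the intent of the original loop.
    """
    addresses = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        # re.escape (an addition here) makes '.', '+', etc. match literally.
        addresses.append(re.escape(line))
    # Join and compile exactly once, after the loop has finished.
    return re.compile('|'.join(addresses), re.I)


sample = ["# blocked senders\n", "alice@example.com\n", "\n",
          "Bob.Smith@example.org\n"]
reo = build_address_regex(sample)
print(bool(reo.search("mail from BOB.SMITH@EXAMPLE.ORG")))  # True
```

With a file, the same call would be `reo = build_address_regex(fileinput.input(textfile))`. One caveat: if the file contains no addresses, the joined pattern is empty and matches every string, so a caller may want to check for that case.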
Sebrosa