string.join is abysmally slow
Gerhard Häring
gerhard.nospam at bigfoot.de
Sun Apr 15 18:28:48 EDT 2001
On Mon, 16 Apr 2001 10:59:11 +1200 (NZST), Graham Guttocks wrote:
>Greetings,
>
>I've run into a performance problem in one of my functions, and wonder
>if I could get some recommendations on how to speed things up.
>
>What I'm trying to do is read in a textfile containing e-mail
>addresses, one per line, and use them to build a regular expression
>object in the form "address1|address2|address3|addressN" to search
>against.
>
>I'm using string.join to concatenate the addresses together, separated
>by a `|'. The problem is that string.join is unacceptably slow in
>this task. The following program takes 37 seconds on a PIII/700 to
>process a 239-line file!
Yes, the number one rule of optimization is not to make assumptions but to
profile instead :-) I didn't do that either, but I would bet it's the re.compile
being called once per line (239 times) that wastes the most time; string.join is
also rerun over the whole, growing list on every pass. Build the list first, then
do one string.join and one re.compile after the loop, and it should be a lot
faster.
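
Roughly what I have in mind, as an untested-against-your-data sketch in current
Python syntax. The function name build_address_regex and the sample addresses
are made up for illustration. Two things differ from your code on purpose and
are my assumptions: I left out the surrounding double quotes, guessing they are
not meant to be part of the pattern, and I added re.escape because characters
like '.' and '+' in addresses are regex metacharacters.

```python
import re

def build_address_regex(lines):
    """Collect addresses from an iterable of lines, then compile one pattern."""
    addresses = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith('#'):
            continue
        # Assumption: addresses should match literally, so escape them.
        addresses.append(re.escape(stripped))
    # Join and compile exactly once, after the loop has finished.
    return re.compile('|'.join(addresses), re.I)

# Hypothetical input standing in for the lines of the text file.
sample_lines = ["# comment\n", "\n", "foo@example.com\n", "Bar@Example.org\n"]
reo = build_address_regex(sample_lines)
```

With a real file you would pass the open file object (or fileinput.input(...))
instead of sample_lines. Note that an input with no addresses at all yields an
empty pattern, which matches everything, so you may want to check for that.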
Gerhard
>
>--------------------------------------------------------------------
>
>import fileinput, re, string
>list = []
>
>for line in fileinput.input(textfile):
>    # Comment or blank line?
>    if line == '' or line[0] in '#':
>        continue
>    else:
>        list.append(string.strip(line))
>        # "address1|address2|address3|addressN"
>        regex = string.join(list,'|')
>        regex = '"' + regex + '"'
>        reo = re.compile(regex, re.I)
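
To measure instead of guessing, something like the following timing sketch
could be used. It is only an illustration under assumptions: the 239 addresses
are generated, not your real data, the function names are invented here, and it
is written in current Python syntax rather than with the string module. It
compares compiling inside the loop, as in the quoted code, with compiling once
at the end.

```python
import re
import timeit

# Hypothetical stand-in for the 239-line address file.
addresses = ["user%d@example.com" % i for i in range(239)]

def compile_every_iteration():
    """Mimic the quoted loop: rejoin and recompile after every address."""
    seen = []
    reo = None
    for addr in addresses:
        seen.append(re.escape(addr))
        reo = re.compile('|'.join(seen), re.I)
    return reo

def compile_once():
    """Build the full pattern first, then compile a single time."""
    return re.compile('|'.join(re.escape(a) for a in addresses), re.I)

def time_it(func):
    """Time one call of func, starting from an empty regex cache."""
    def run():
        re.purge()  # clear re's internal cache so compilation really happens
        return func()
    return timeit.timeit(run, number=1)

slow = time_it(compile_every_iteration)
fast = time_it(compile_once)
print("compile in loop: %.4fs, compile once: %.4fs" % (slow, fast))
```

Both functions end up with the same final pattern; only the amount of repeated
work differs. The absolute numbers depend on the machine and Python version, so
treat them as a relative comparison only.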
--
mail: gerhard <at> bigfoot <dot> de
web: http://highqualdev.com