string.join is abysmally slow

Ixokai news at myNOSPAM.org
Sun Apr 15 19:29:09 EDT 2001


Hello,

    I'd think that there has to be a better way to go about the problem. I
mean, I'd imagine it's slow because strings are immutable; every time you
append something new, it has to allocate a new block of memory of the
proper size, copy everything over, then deallocate the old one if it's not
referenced anymore. This is just a guess, mind you.

    If your file is strictly one address per line, why not use 'list =
file.readlines()' and then 'if email in list:'?  Both operations take a
fraction of a second on my P2 400 MHz. I tested it with a mock-up file of
239 pretend email addresses I generated.
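
    A minimal sketch of that lookup, written for a current Python rather
than the 2001-era interpreter. The helper name load_addresses and the sample
data are made up for illustration. One thing to watch: readlines() keeps the
trailing newline on every line, so the lines need stripping before a plain
'in' test will match. A set is used here instead of a list only because
membership tests on it are cheaper; a list works the same way.

```python
import io

def load_addresses(fileobj):
    """Return the set of addresses in fileobj (hypothetical helper).

    Blank lines and lines starting with '#' are skipped.
    """
    addresses = set()
    for line in fileobj:
        line = line.strip()          # drop the trailing newline and padding
        if not line or line.startswith('#'):
            continue
        addresses.add(line.lower())  # lower-case for case-insensitive lookup
    return addresses

# Stand-in for the real address file (made-up data).
sample = io.StringIO("# comment\nalice@example.com\n\nBob@Example.org\n")
known = load_addresses(sample)

email = 'BOB@example.org'
if email.lower() in known:
    print('match')
```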

    Then again... after loading the entire file into a list of 'lines',
'regex = "|".join(lines)' is also nearly instantaneous for me. What version
of Python are you using?
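
    For comparison, here is a rough sketch of doing the join and the
compile a single time, after the loop has collected everything. In the
quoted program both calls sit inside the for loop, so they run once per
address; whether that accounts for the 37 seconds is only an assumption on
my part. The function name build_address_regex is hypothetical, and
re.escape is an addition of mine so that the '.' in an address is matched
literally.

```python
import re

def build_address_regex(lines):
    """Compile one case-insensitive alternation from an iterable of lines.

    Hypothetical helper: skips blank lines and '#' comments, then joins
    and compiles once. Note that an empty input yields an empty pattern,
    which matches any string.
    """
    addresses = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        addresses.append(re.escape(line))  # treat '.' and friends literally
    # "address1|address2|address3|addressN", built a single time
    return re.compile('|'.join(addresses), re.I)

# Made-up sample lines standing in for the address file.
lines = ["# comment\n", "alice@example.com\n", "\n", "bob@example.org\n"]
reo = build_address_regex(lines)

if reo.search("From: ALICE@example.com"):
    print('match')
```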

--Stephen
(to reply, remove 'NOSPAM' and replace with 'seraph')


"Graham Guttocks" <graham_guttocks at yahoo.co.nz> wrote in message
news:mailman.987375622.11843.python-list at python.org...
> Greetings,
>
> I've run into a performance problem in one of my functions, and wonder
> if I could get some recommendations on how to speed things up.
>
> What I'm trying to do is read in a textfile containing e-mail
> addresses, one per line, and use them to build a regular expression
> object in the form "address1|address2|address3|addressN" to search
> against.
>
> I'm using string.join to concatenate the addresses together, separated
> by a `|'.  The problem is that string.join is unacceptably slow in
> this task.  The following program takes 37 seconds on a PIII/700 to
> process a 239-line file!
>
> --------------------------------------------------------------------
>
> import fileinput, re, string
> list = []
>
> for line in fileinput.input(textfile):
>     # Comment or blank line?
>     if line == '' or line[0] in '#':
>         continue
>     else:
>         list.append(string.strip(line))
>         # "address1|address2|address3|addressN"
>         regex = string.join(list,'|')
>         regex = '"' + regex + '"'
>         reo = re.compile(regex, re.I)
>
> --------------------------------------------------------------------



