[Tutor] Re: Speeding up file processing?

Lee Harr missive at hotmail.com
Tue Nov 11 19:47:36 EST 2003


import re

f = open("today.test") # the server log file

fields = []

for line in f:
    if re.match('^shirky.com', line): # find hits from my site
        fields = line.split()
        try: referer = fields[11] # grab the referer
        except: continue          # continue if there is a mangled line
        referer = re.sub('"', '', referer)
        if re.search("shirky", referer): continue # ignore internal links
        if re.search("-", referer):      continue # ...and email clicks
        referer = re.sub("www.", "", referer)
        print referer

I am not sure, but I think string methods will be faster than using
all those regular expressions:

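if line.startswith('shirky.com'):          # instead of re.match
referer = referer.replace('"', '')         # instead of re.sub (strings are immutable, so reassign)
if referer.find('shirky') >= 0: continue   # instead of re.search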

Makes the code easier to read (for me) anyhow.
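
Put together, the whole loop might look something like the sketch below. This is only a sketch: the function name external_referers and its site/internal parameters are made up for illustration, and it is written as a generator so it can be tried on a list of lines as well as on an open file. One difference to be aware of: the "." in the regular expressions matches any character, while startswith() and replace() only match a literal dot.

```python
def external_referers(lines, site="shirky.com", internal="shirky"):
    """Yield cleaned-up external referers from an iterable of log lines.

    The function name and the site/internal parameters are hypothetical;
    they just stand in for the hard-coded strings in the original loop.
    """
    for line in lines:
        if not line.startswith(site):      # only hits from my site
            continue
        fields = line.split()
        if len(fields) < 12:               # mangled line, no referer field
            continue
        referer = fields[11].replace('"', '')
        if internal in referer:            # ignore internal links
            continue
        if "-" in referer:                 # ...and email clicks
            continue
        yield referer.replace("www.", "")  # literal "www." only


# Usage with the log file from the original post would be roughly:
#     for referer in external_referers(open("today.test")):
#         print(referer)
```

Checking the length of fields up front replaces the bare try/except, so other errors are not silently swallowed.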

If you try it, let us know if it helps...
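
One way to check whether it helps is to time both versions on the same line with the timeit module. The sketch below assumes a made-up sample log line (SAMPLE) whose 12th whitespace-separated field is the referer; the function names with_regex, with_strings and compare are hypothetical too. It makes no claim about which version wins, since that depends on the data and the Python version.

```python
import re
import timeit

# Made-up log line for illustration; field 11 (counting from 0) is the referer.
SAMPLE = ('shirky.com 1.2.3.4 - - [11/Nov/2003:19:47:36 -0500] '
          '"GET /x HTTP/1.0" 200 1234 "http://www.example.com/page" "Mozilla"')


def with_regex(line):
    """Process one line using the regular expressions from the original loop."""
    if not re.match('^shirky.com', line):
        return None
    fields = line.split()
    try:
        referer = fields[11]
    except IndexError:
        return None
    referer = re.sub('"', '', referer)
    if re.search("shirky", referer):
        return None
    if re.search("-", referer):
        return None
    return re.sub("www.", "", referer)


def with_strings(line):
    """Process one line using string methods only."""
    if not line.startswith('shirky.com'):
        return None
    fields = line.split()
    if len(fields) < 12:
        return None
    referer = fields[11].replace('"', '')
    if referer.find('shirky') >= 0 or referer.find('-') >= 0:
        return None
    return referer.replace("www.", "")


def compare(number=100000):
    """Return (regex_seconds, string_seconds) for `number` calls of each."""
    regex_time = timeit.timeit(lambda: with_regex(SAMPLE), number=number)
    string_time = timeit.timeit(lambda: with_strings(SAMPLE), number=number)
    return regex_time, string_time
```

Calling compare() and printing the two numbers gives a rough comparison; timing a run over the real log file would be the more convincing test.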
