[2.5] Regex doesn't support MULTILINE?
Jay Loden
python at jayloden.com
Sun Jul 22 01:18:22 EDT 2007
Gilles Ganault wrote:
> Problem is, when I add re.DOTALL, the search takes less than a second
> for a 500KB file... and about 1 min 30 s for a file that's 1MB, with both
> files holding similar contents.
>
> Why such a huge difference in performance?
>
> ========= Using Re =============
> import re
> import time
>
> pattern = r"<span class=.?defaut.?>(\d+:\d+).*?</span>"
>
> pages = ["500KB.html","1MB.html"]
>
> #Veeeeeeeeeeery slow when parsing 1MB file !
> p = re.compile(pattern,re.IGNORECASE|re.MULTILINE|re.DOTALL)
> #p = re.compile(pattern,re.IGNORECASE|re.MULTILINE)
>
> for page in pages:
>     f = open(page, "r")
>     response = f.read()
>     f.close()
>
>     start = time.strftime("%H:%M:%S", time.localtime(time.time()))
>     print "before findall @ " + start
>     packed = p.findall(response)
>     if packed:
>         for item in packed:
>             print item
> ===========================
>
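Some background on the flags in the quoted compile() call, since the subject line asks about MULTILINE: re.MULTILINE only changes what ^ and $ match (the start and end of each line instead of the whole string), and this pattern uses neither anchor, so it has no effect here. re.DOTALL is the flag that changes behaviour: it lets . (and therefore .*?) match newline characters, so a single match may run across line breaks. A minimal sketch of that difference, using a small made-up HTML string (SAMPLE is hypothetical, not taken from the original files) and written for Python 3 rather than 2.5:

```python
import re

# Same pattern as the quoted script, written as a raw string.
PATTERN = r"<span class=.?defaut.?>(\d+:\d+).*?</span>"

# Hypothetical sample: the second span's closing tag is on the next line.
SAMPLE = (
    '<span class="defaut">10:30 news</span>\n'
    '<span class="defaut">11:45 sport\n'
    'more text</span>\n'
)

ignorecase_only = re.compile(PATTERN, re.IGNORECASE)
without_dotall = re.compile(PATTERN, re.IGNORECASE | re.MULTILINE)
with_dotall = re.compile(PATTERN, re.IGNORECASE | re.MULTILINE | re.DOTALL)

# Without DOTALL, ".*?" stops at "\n", so the second span cannot match.
print(without_dotall.findall(SAMPLE))   # ['10:30']
# With DOTALL, ".*?" may cross the newline to reach "</span>".
print(with_dotall.findall(SAMPLE))      # ['10:30', '11:45']
# MULTILINE makes no difference: the pattern contains no ^ or $.
print(ignorecase_only.findall(SAMPLE))  # ['10:30']
```

In other words, MULTILINE is supported but irrelevant to this pattern; DOTALL is the flag whose presence changes how far each .*? is allowed to scan.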
I don't know whether it will make a performance difference, but since you're only saving the result of p.findall() in order to iterate over it, you might as well use p.finditer() instead. Note that finditer() yields match objects rather than strings, so use group(1) to print the same captured text that findall() returned:

for item in p.finditer(response):
    print item.group(1)

At least then it can start printing as soon as it hits a match instead of needing to find all the matches first.
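A minimal sketch checking that the group(1) values from finditer() are the same items findall() returns, and that finditer() hands matches back one at a time. It runs on a hypothetical in-memory string rather than the original HTML files, and is written for Python 3 rather than 2.5:

```python
import re

# Pattern from the quoted script as a raw string; MULTILINE is omitted
# because the pattern has no ^ or $ anchors.
p = re.compile(r"<span class=.?defaut.?>(\d+:\d+).*?</span>",
               re.IGNORECASE | re.DOTALL)

# Hypothetical stand-in for the contents of one HTML file.
response = ('<SPAN class="defaut">08:15 a</SPAN>\n'
            '<span class=defaut>21:05 b</span>\n')

# findall() scans the whole string and returns a list of group(1) strings.
all_at_once = p.findall(response)

# finditer() returns an iterator of match objects; each next() call
# searches only as far as the next match.
matches = p.finditer(response)
first = next(matches)
print(first.group(1))  # 08:15
rest = [m.group(1) for m in matches]

print(all_at_once)              # ['08:15', '21:05']
print([first.group(1)] + rest)  # ['08:15', '21:05']
```

This only changes when results become available, not necessarily how long the complete scan takes.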
-Jay
More information about the Python-list mailing list