[2.5] Regex doesn't support MULTILINE?
python at jayloden.com
Sun Jul 22 07:18:22 CEST 2007
Gilles Ganault wrote:
> Problem is, when I add re.DOTALL, the search takes less than a second
> for a 500KB file... and about 1 min 30 s for a file that's 1MB, with both
> files holding similar contents.
> Why such a huge difference in performance?
> ========= Using Re =============
> import re
> import time
> pattern = "<span class=.?defaut.?>(\d+:\d+).*?</span>"
> pages = ["500KB.html","1MB.html"]
> #Veeeeeeeeeeery slow when parsing 1MB file !
> p = re.compile(pattern,re.IGNORECASE|re.MULTILINE|re.DOTALL)
> #p = re.compile(pattern,re.IGNORECASE|re.MULTILINE)
> for page in pages:
>     f = open(page, "r")
>     response = f.read()
>     start = time.strftime("%H:%M:%S", time.localtime(time.time()))
>     print "before findall @ " + start
>     packed = p.findall(response)
>     if packed:
>         for item in packed:
>             print item
I don't know whether it will make a performance difference, but since you only store the result of p.findall() in a variable in order to iterate over it, you might as well use p.finditer() instead:
    for item in p.finditer(response):
        print item.group(1)
At least then it can start printing as soon as it hits a match instead of needing to find all the matches first. Note that finditer() yields match objects rather than strings, so you need item.group(1) to get the captured text.
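Here is a minimal sketch of the finditer() approach, written so that it runs on Python 3 (the quoted script is Python 2.5). The helper name iter_times and the sample HTML string are made up for illustration; the pattern is the one from the quoted script, written as a raw string.

```python
import re

# Pattern from the quoted script, as a raw string so the \d escapes
# reach the regex engine unchanged.
PATTERN = re.compile(
    r"<span class=.?defaut.?>(\d+:\d+).*?</span>",
    re.IGNORECASE | re.DOTALL,
)


def iter_times(html):
    """Yield each captured time lazily, one match at a time.

    iter_times is a hypothetical helper, not part of the original script.
    """
    for match in PATTERN.finditer(html):
        # finditer() yields match objects; group(1) is the captured text.
        yield match.group(1)


# Made-up sample input; the second span contains a newline, which
# .*? can only cross because re.DOTALL is set.
sample = (
    '<span class="defaut">12:30 foo</span>\n'
    "<span class='defaut'>07:05\nbar</span>"
)

for time_text in iter_times(sample):
    print(time_text)
```

Because iter_times is a generator, each value is printed as soon as its match is found, whereas findall() builds the complete list before the loop starts. This changes when output appears, not necessarily how long the whole search takes.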