idiom for RE matching

Thu Jul 19 19:03:23 EDT 2007

On 7/19/07, Gordon Airporte <JHoover at fbi.gov> wrote:
>
> I have some code which relies on running each line of a file through a
> large number of regexes which may or may not apply. For each pattern I
> want to match I've been writing
>
> gotit = mypattern.findall(line)

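Try the iterator function finditer instead of findall. finditer returns an
iterator of match objects rather than building the whole result list up front.
To see the difference, run the code below, calling either findIter or findAll
(one at a time) in the for loop. You may see at least 4x better performance,
but measure it on your own patterns and data.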

-----------------------------------------------------------------------------------
import re
import time

m = re.compile(r'(\d+/\d+/\d+)')
line = "Today's date is 21/07/2007 then yesterday's  20/07/2007"

def findIter(line):
    # keep the iterator returned by finditer so the next line can consume it
    g = m.finditer(line)
    glist = [x.group(0) for x in g]

def findAll(line):
    glist = m.findall(line)

start = time.time()
for i in xrange(1000000):
    #findIter(line)
    findAll(line)
end = time.time()

print end-start

--------------------------------------------------------------------------------------------------------
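
As a cross-check on the comparison above, here is a minimal sketch that first
confirms both approaches return the same matches and then times them with the
standard timeit module. It uses the same pattern and sample line; the names
date_re, dates_via_finditer and dates_via_findall are made up for this
illustration, and it is written to run on a current Python rather than the
2.x syntax used above. Treat the timings as indicative only, since they will
vary with the pattern, the input and the interpreter version.

```python
import re
import timeit

# Same pattern and sample line as in the post above.
date_re = re.compile(r'(\d+/\d+/\d+)')
line = "Today's date is 21/07/2007 then yesterday's  20/07/2007"

def dates_via_finditer(text):
    # finditer yields match objects lazily; group(0) is the whole match.
    return [m.group(0) for m in date_re.finditer(text)]

def dates_via_findall(text):
    # findall builds the complete list in one call; with a single
    # capturing group it returns the text of that group.
    return date_re.findall(text)

# Both approaches should agree before their speed is compared.
assert dates_via_finditer(line) == dates_via_findall(line)

if __name__ == "__main__":
    for fn in (dates_via_finditer, dates_via_findall):
        elapsed = timeit.timeit(lambda: fn(line), number=100000)
        print(fn.__name__, elapsed)
```

If you only need the first match, or want to stop early, the iterator form
lets you break out of the loop without scanning the rest of the line.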