regexp non-greedy matching bug?

Fredrik Lundh fredrik at pythonware.com
Mon Dec 5 09:31:56 CET 2005


Aahz wrote:

> While you're technically correct, I've been bitten too many times by
> forgetting whether to use match() or search().  I've fixed that problem
> by choosing to always use search() and combine with ^ as appropriate.

that's a bit suboptimal, though, at least for cases where failed matches
are rather common:

C:\>timeit -s "import re; p = re.compile('b')" "p.match('a'*100)"
100000 loops, best of 3: 6.14 usec per loop

C:\>timeit -s "import re; p = re.compile('^b')" "p.match('a'*100)"
100000 loops, best of 3: 6.25 usec per loop

C:\>timeit -s "import re; p = re.compile('^b')" "p.search('a'*100)"
100000 loops, best of 3: 15.4 usec per loop

(afaik, search doesn't have any heuristics for figuring out if it can skip
the search, so it'll check ^ against all available positions)
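for anyone who wants to repeat the comparison from inside a script rather than from the command line, here is a minimal sketch using the standard timeit module. the helper name time_case and the labels are made up for illustration, and the lambda wrapper adds a little call overhead to every case, so only the relative numbers mean anything. absolute timings will differ by machine and Python version.

```python
import re
import timeit

# subject string with no 'b' in it, so every attempt below fails
text = 'a' * 100

# the three pattern/method combinations timed above
cases = [
    ("match,  'b'", re.compile('b').match),
    ("match,  '^b'", re.compile('^b').match),
    ("search, '^b'", re.compile('^b').search),
]

def time_case(func, number=100000, repeat=3):
    """Return the best-of-`repeat` time per call, in microseconds.

    time_case is a hypothetical helper, not part of re or timeit.
    """
    totals = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    return min(totals) / number * 1e6

if __name__ == '__main__':
    for label, func in cases:
        print('%-14s %.2f usec per loop' % (label, time_case(func)))
```

all three calls return None on this input, so the sketch only measures how long each one takes to give up.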

on the other hand, benchmarking REs always gives confusing
results:

C:\>timeit -s "import re; p = re.compile('b')" "p.search('a'*100)"
100000 loops, best of 3: 4.32 usec per loop

(should this really be *faster* than match for this case ?)

</F>





