Relative performance of comparable regular expressions

John Machin sjmachin at lexicon.net
Tue Jan 13 04:15:18 EST 2009


On Jan 13, 7:24 pm, "Barak, Ron" <Ron.Ba... at lsi.com> wrote:
> Hi,
>
> I have a question about relative performance of comparable regular expressions.
>
> I have large log files that start with three-letter month names (non-Unicode).
>
> Which would give better performance, matching with "^[a-zA-Z]{3}", or with "^\S{3}" ?

(1) If you want to match at the start of a line, use re.match()
*without* the pointless "^". Don't use re.search() with a pattern
starting with "^": it won't be any faster than re.match(), and it could be
a lot worse, because re.search() doesn't know to stop when the match at
the first position fails; it goes on to try every later position:

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.match(text)"
1000000 loops, best of 3: 1.15 usec per loop

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.search(text)"
100000 loops, best of 3: 4.47 usec per loop

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*1000" "rx.search(text)"
10000 loops, best of 3: 34.1 usec per loop
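For anyone who prefers a script to the command line, here is a rough Python 3 sketch of the same experiment (the numbers above came from Python 2.6). It first checks that match() and search() agree on the result, then times both. Treat the output as indicative only: absolute numbers, and possibly the size of the gap, depend on the Python version and machine, and newer re engines may handle a leading "^" in search() differently.

```python
import re
import timeit

# Same pattern as the command-line timings; with match() the "^" is
# redundant, because match() only ever tries position 0.
rx = re.compile('^AB')

for n in (100, 1000):
    text = 'Z' * n
    # Both calls fail on this text, so only the cost differs, not the answer.
    assert rx.match(text) is None and rx.search(text) is None
    # Best of 3 runs of 10000 calls each.
    t_match = min(timeit.repeat(lambda: rx.match(text), number=10000, repeat=3))
    t_search = min(timeit.repeat(lambda: rx.search(text), number=10000, repeat=3))
    print(f"n={n}: match {t_match:.4f}s, search {t_search:.4f}s per 10000 calls")
```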

(2) I think you mean "^\s{3}" not "^\S{3}"
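For reference, \s matches a whitespace character and \S matches any non-whitespace character, so it is worth checking which one actually fits the start of a real log line. A minimal check, assuming a hypothetical syslog-style line (the real log format isn't shown in the thread):

```python
import re

# Hypothetical log line; substitute a line from the real files.
line = "Jan 13 04:15:18 host app: started"

m_nonspace = re.match(r"\S{3}", line)  # \S: any non-whitespace character
m_space = re.match(r"\s{3}", line)     # \s: any whitespace character

print(m_nonspace.group() if m_nonspace else None)  # Jan
print(m_space)                                     # None
```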

(3) Now that you've seen how to do timings, over to you :-)
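One way to run that comparison from a script is sketched below. It is only a sketch under stated assumptions: the sample lines are made up, it uses Python 3, and it follows point (1) by calling re.match() on patterns without the "^". It checks that both patterns accept every sample line before timing them.

```python
import re
import timeit

# Hypothetical sample lines; replace with lines taken from the real log files.
lines = [
    "Jan 13 04:15:18 host app: started",
    "Feb  2 11:00:01 host app: stopped",
] * 500

# No "^": re.match() already anchors at the start of the string.
candidates = {
    "[a-zA-Z]{3}": re.compile(r"[a-zA-Z]{3}"),
    r"\S{3}": re.compile(r"\S{3}"),
}

for label, rx in candidates.items():
    # Sanity check: both patterns should accept every sample line.
    assert all(rx.match(line) for line in lines)
    # Best of 3 runs, each making 100 passes over the sample lines.
    best = min(timeit.repeat(lambda: [rx.match(line) for line in lines],
                             number=100, repeat=3))
    print(f"{label:>12}: {best:.4f}s")
```

Keep in mind that the two patterns are not interchangeable: \S{3} also accepts "123" or "---", so the faster one is only the better choice if no line can start with something other than a month name.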

> Also, which is better (if different at all): "\d\d" or "\d{2}" ?
> Also, would matching "." be different (performance-wise) than matching the actual character, e.g. matching ":" ?
> And lastly, at the end of a line, is there any performance difference between "(.+)$" and "(.+)"?
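The same approach covers the remaining questions. The sketch below is a Python 3 harness built on hypothetical sample text (a time field and the tail of a log line, with the trailing newline a line read from a file would keep). For each pair it first asserts that both patterns produce the same match on that text, so that only speed is being compared, then times them.

```python
import re
import timeit

# Hypothetical sample text; lines read from a file keep their trailing "\n".
time_field = "04:15:18"
tail = "host app: started\n"

# (description, pattern A, pattern B, text to match against)
pairs = [
    ("two digits", r"\d\d", r"\d{2}", time_field),
    ("separator", r"\d\d.\d\d.\d\d", r"\d\d:\d\d:\d\d", time_field),
    ("rest of line", r"(.+)$", r"(.+)", tail),
]

for name, pat_a, pat_b, text in pairs:
    rx_a, rx_b = re.compile(pat_a), re.compile(pat_b)
    # "." stops before "\n" and "$" also matches just before a final "\n",
    # so each pair should give the same match on this text.
    assert rx_a.match(text).group() == rx_b.match(text).group()
    t_a = min(timeit.repeat(lambda: rx_a.match(text), number=20000, repeat=3))
    t_b = min(timeit.repeat(lambda: rx_b.match(text), number=20000, repeat=3))
    print(f"{name}: {pat_a!r} {t_a:.4f}s vs {pat_b!r} {t_b:.4f}s")
```

As with the month names, equal results on one sample don't make the patterns equivalent in general: "\d\d.\d\d.\d\d" also matches "04-15-18", which the ":" version rejects.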

Cheers,
John



More information about the Python-list mailing list