Relative performance of comparable regular expressions

Steve Holden steve at
Tue Jan 13 06:47:05 EST 2009

John Machin wrote:
> On Jan 13, 7:24 pm, "Barak, Ron" <Ron.Ba... at> wrote:
>> Hi,
>> I have a question about relative performance of comparable regular expressions.
>> I have large log files that start with three-letter month names (non-unicode).
>> Which would give better performance, matching with  "^[a-zA-Z]{3}", or with "^\S{3}" ?
> (1) If you want to match at the start of a line, use re.match()
> *without* the pointless "^". Don't use re.search() with a pattern
> starting with "^" -- it won't be any faster than re.match() and it could
> be a lot worse; re.search() doesn't know to stop if the first match fails:
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.match(text)"
> 1000000 loops, best of 3: 1.15 usec per loop
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.search(text)"
> 100000 loops, best of 3: 4.47 usec per loop
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*1000" "rx.search(text)"
> 10000 loops, best of 3: 34.1 usec per loop
> (2) I think you mean "^\s{3}" not "^\S{3}"
> (3) Now that you've seen how to do timings, over to you :-)
>> Also, which is better (if different at all): "\d\d" or "\d{2}" ?
>> Also, would matching "." be different (performance-wise) than matching the actual character, e.g. matching ":" ?
>> And lastly, at the end of a line, is there any performance difference between "(.+)$" and "(.+)"
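The remaining questions can be settled the same way, by timing each pair
of patterns on your own data. Here is a minimal sketch using the timeit
module from a script rather than the command line. The sample strings, the
helper name time_match and the loop count are all assumptions made for
illustration, not anything taken from your logs:

```python
import re
import timeit

# Hypothetical sample inputs; substitute real lines from the log files.
month_text = "Dec 12 2009 10:15:32 host daemon: something happened"
digit_text = "12 2009 10:15:32"

# (pattern, text) pairs taken from the questions quoted above.
candidates = [
    (r"[a-zA-Z]{3}", month_text),
    (r"\S{3}", month_text),
    (r"\d\d", digit_text),
    (r"\d{2}", digit_text),
]


def time_match(pattern, text, number=20000):
    """Return the seconds taken by `number` calls of rx.match(text)."""
    rx = re.compile(pattern)  # compile once, outside the timed loop
    return timeit.timeit(lambda: rx.match(text), number=number)


results = {}
for pattern, text in candidates:
    results[pattern] = time_match(pattern, text)
    print("%-14s %.4f s" % (pattern, results[pattern]))
```

The absolute numbers depend on the machine and the Python version, so only
the differences within each pair mean anything.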
Of course if the log strings all begin with a string like "Dec 12 2009
...." then you don't need regular expressions at all - just pull the
characters out using their positions and slicing. The month would be
string[0:3] and so on.
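
A minimal sketch of the slicing approach, assuming every line really does
use the fixed "Mon DD YYYY" layout at the start. The sample line and the
offsets for the day and year are assumptions based on that layout:

```python
# Hypothetical log line in the fixed-width layout described above.
line = "Dec 12 2009 10:15:32 some message"

month = line[0:3]   # characters 0-2
day = line[4:6]     # characters 4-5, after the first space
year = line[7:11]   # characters 7-10, after the second space

print(month, day, year)
```

If the day field is not always two characters wide, the later offsets
shift and line.split() would be the safer choice.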

Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC    
