Relative performance of comparable regular expressions

Steve Holden steve at holdenweb.com
Tue Jan 13 12:47:05 CET 2009


John Machin wrote:
> On Jan 13, 7:24 pm, "Barak, Ron" <Ron.Ba... at lsi.com> wrote:
>> Hi,
>>
>> I have a question about relative performance of comparable regular expressions.
>>
>> I have large log files whose lines start with three-letter month names (non-Unicode).
>>
>> Which would give better performance, matching with "^[a-zA-Z]{3}", or with "^\S{3}"?
> 
> (1) If you want to match at the start of a line, use re.match()
> *without* the pointless "^". Don't use re.search with a pattern
> starting with "^" -- it won't be any faster and it could be a lot
> worse; re.search doesn't know to stop if the first match fails:
> 
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.match(text)"
> 1000000 loops, best of 3: 1.15 usec per loop
> 
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.search(text)"
> 100000 loops, best of 3: 4.47 usec per loop
> 
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*1000" "rx.search(text)"
> 10000 loops, best of 3: 34.1 usec per loop
> 
> (2) I think you mean "^\s{3}" not "^\S{3}"
> 
> (3) Now that you've seen how to do timings, over to you :-)
> 
>> Also, which is better (if different at all): "\d\d" or "\d{2}" ?
>> Also, would matching "." be different (performance-wise) than matching the actual character, e.g. matching ":" ?
>> And lastly, at the end of a line, is there any performance difference between "(.+)$" and "(.+)"?
> 
Of course if the log strings all begin with a string like "Dec 12 2009
...." then you don't need regular expressions at all - just pull the
characters out using their positions and slicing. The month would be
string[0:3] and so on.
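
As a rough sketch of the slicing idea, assuming every line really does begin with a three-letter month name: the sample line, the function names and the compiled pattern below are made up for illustration, and the timing loop is only there so you can compare the two approaches on your own data.

```python
import re
import timeit

# Hypothetical sample line; a real log format may differ.
LINE = "Dec 12 2009 12:47:05 some log message"

# No leading "^" is needed because match() only tries position 0.
MONTH_RX = re.compile(r"[a-zA-Z]{3}")


def month_by_slice(line):
    """Return the first three characters, assumed to be the month name."""
    return line[0:3]


def month_by_regex(line):
    """Return the three leading letters, or None if the line does not start with them."""
    m = MONTH_RX.match(line)
    return m.group(0) if m else None


if __name__ == "__main__":
    # Time each approach on the same line; absolute numbers depend on the machine.
    for func in (month_by_slice, month_by_regex):
        elapsed = timeit.timeit(lambda: func(LINE), number=100000)
        print(func.__name__, elapsed)
```

Note that the slice does no validation at all: it returns whatever the first three characters happen to be, whereas the regex version returns None for a line that does not start with three letters.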

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/



