re.search much slower than grep on some regular expressions
Peter Otten
__peter__ at web.de
Sat Jul 5 01:58:14 EDT 2008
John Nagle wrote:
> Henning_Thornblad wrote:
>> What can be the cause of the large difference between re.search and
>> grep?
>>
>> This script takes about 5 min to run on my computer:
>> #!/usr/bin/env python
>> import re
>>
>> row=""
>> for a in range(156000):
>>     row+="a"
>> print(re.search('[^ "=]*/',row))
>>
>>
>> While doing a simple grep:
>> grep '[^ "=]*/' input   (input contains 156,000 'a' characters in
>> one row)
>> doesn't even take a second.
>>
>> Is this a bug in python?
>>
>> Thanks...
>> Henning Thornblad
>
> You're recompiling the regular expression on each use.
> Use "re.compile" before the loop to do it once.
Now that's premature optimization :-)
Apart from the fact that re.search() is executed only once in the above
script, the re library caches compiled patterns, so even if the re.search()
call were in a loop, the overhead would be a few microseconds for the cache
lookup.
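A minimal sketch of the caching point, assuming Python 3. It compares the module-level re.search() against a precompiled pattern. PATTERN_TEXT, ROW and the two helper functions are illustrative names, not from the thread. The re module documentation says recently used patterns are cached, so after the first call the two variants should differ only by the cache lookup. Actual timings depend on the machine and Python version.

```python
import re
import timeit

# Illustrative pattern and input (hypothetical; not the 156,000-character row from the thread).
PATTERN_TEXT = r'[^ "=]*/'
ROW = "abc def/ghi"

def with_module_function():
    # re.search() compiles PATTERN_TEXT on first use and then reuses
    # the compiled object from the re module's internal cache.
    return re.search(PATTERN_TEXT, ROW)

# Compile once up front, as suggested earlier in the thread.
precompiled = re.compile(PATTERN_TEXT)

def with_precompiled():
    return precompiled.search(ROW)

if __name__ == "__main__":
    calls = 100_000
    for fn in (with_module_function, with_precompiled):
        seconds = timeit.timeit(fn, number=calls)
        print(f"{fn.__name__}: {seconds / calls * 1e6:.2f} us per call")
```

Both functions return the same match ('def/'). Precompiling mainly skips the per-call cache lookup and makes reuse explicit. It does not change how the matcher itself scans the input.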
Peter