re.search much slower than grep on some regular expressions
fernandes.fd at gmail.com
Fri Jul 4 22:43:11 CEST 2008
On Fri, Jul 4, 2008 at 8:36 AM, Peter Otten <__peter__ at web.de> wrote:
> Henning_Thornblad wrote:
>> What can be the cause of the large difference between re.search and grep?
> grep uses a smarter algorithm ;)
>> This script takes about 5 min to run on my computer:
>> #!/usr/bin/env python
>> import re
>> # Build one row of 156000 'a' characters (no '/' anywhere in it).
>> row = ""
>> for a in range(156000):
>>     row += "a"
>> print(re.search('[^ "=]*/', row))
>> While doing a simple grep:
>> grep '[^ "=]*/' input (input contains 156.000 a in
>> one row)
>> doesn't even take a second.
>> Is this a bug in python?
> You could call this a performance bug, but it's not common enough in real
> code to get the necessary brain cycles from the core developers.
> So you can either write a patch yourself or use a workaround.
> re.search('[^ "=]*/', row) if "/" in row else None
> might be good enough.
Wow... I'm rather surprised at how slow this is... using re.match
yields much quicker results, but of course it's not quite the same as
re.search.
Incidentally, if you add the '/' to "row" at the end of the string,
re.search returns instantly with a match object.
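The cases mentioned so far can be compared with a rough timing sketch
like the one below. It assumes Python 3 and uses a much shorter row than
the 156000 characters in the original report so that the slow case still
finishes quickly. The names N, timed and guarded_search are made up for
this illustration. re.match only tries the pattern at position 0, which
is presumably why it returns quickly, while re.search retries at every
start position when the row contains no '/'.

```python
import re
import time

PATTERN = re.compile(r'[^ "=]*/')


def guarded_search(row):
    # Workaround quoted above: skip the regex entirely when no '/' is present.
    return PATTERN.search(row) if "/" in row else None


def timed(func, *args):
    # Return the call's result together with the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Assumption: a shorter row than in the original report, to keep the run brief.
N = 2000
row = "a" * N

no_slash, t_search = timed(PATTERN.search, row)          # retried at every offset
anchored, t_match = timed(PATTERN.match, row)            # tried at offset 0 only
with_slash, t_slash = timed(PATTERN.search, row + "/")   # matches on the first try
guarded, t_guard = timed(guarded_search, row)            # regex never runs

print("search, no '/':   %.6f s" % t_search)
print("match, no '/':    %.6f s" % t_match)
print("search, with '/': %.6f s" % t_slash)
print("guarded search:   %.6f s" % t_guard)
```

Raising N should make the gap between the first timing and the other
three grow, but the actual numbers depend on the machine and the Python
version.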
I'm not versed enough in regex to tell if this is a bug or not
(although I suspect it is), but why would you say this particular
regex isn't common enough in real code?