re.search much slower then grep on some regular expressions

Fri Jul 4 12:40:47 EDT 2008

On Jul 4, 1:36 pm, Peter Otten <__pete... at web.de> wrote:
> Henning_Thornblad wrote:
> > What can be the cause of the large difference between re.search and
> > grep?
>
> grep uses a smarter algorithm ;)
>
>
>
> > This script takes about 5 min to run on my computer:
> > #!/usr/bin/env python
> > import re
>
> > row=""
> > for a in range(156000):
> >     row+="a"
> > print re.search('[^ "=]*/',row)
>
> > While doing a simple grep:
> > grep '[^ "=]*/' input                  (input contains 156.000 a in
> > one row)
> > doesn't even take a second.
>
> > Is this a bug in python?
>
> You could call this a performance bug, but it's not common enough in real
> code to get the necessary brain cycles from the core developers.
> So you can either write a patch yourself or use a workaround.
>
> re.search('[^ "=]*/', row) if "/" in row else None
>
> might be good enough.
>
> Peter

It is not a smarter algorithm that is used in grep. Python RE's have
more capabilities than grep RE's which need a slower, more complex
algorithm.
You could argue that if the costly RE features are not used then maybe
simpler, faster algorithms should be automatically swapped in but ....

- Paddy.