re.search much slower than grep on some regular expressions
paddy3118 at googlemail.com
Fri Jul 4 18:40:47 CEST 2008
On Jul 4, 1:36 pm, Peter Otten <__pete... at web.de> wrote:
> Henning_Thornblad wrote:
> > What can be the cause of the large difference between re.search and
> > grep?
> grep uses a smarter algorithm ;)
> > This script takes about 5 min to run on my computer:
> > #!/usr/bin/env python
> > import re
> > row=""
> > for a in range(156000):
> >     row+="a"
> > print re.search('[^ "=]*/',row)
> > While doing a simple grep:
> > grep '[^ "=]*/' input (input contains 156.000 a in
> > one row)
> > doesn't even take a second.
> > Is this a bug in python?
> You could call this a performance bug, but it's not common enough in real
> code to get the necessary brain cycles from the core developers.
> So you can either write a patch yourself or use a workaround.
> re.search('[^ "=]*/', row) if "/" in row else None
> might be good enough.
It is not a smarter algorithm that is used in grep. Python RE's have
more capabilities than grep RE's, which need a slower, more complex
algorithm.
You could argue that if the costly RE features are not used then maybe
simpler, faster algorithms should be automatically swapped in but ....
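
For anyone who wants to see the effect without waiting five minutes, here
is a rough sketch, not a benchmark. The names PATTERN, guarded_search and
time_search are my own and appear nowhere in the thread. It assumes the
slowdown comes from the backtracking matcher retrying the greedy
[^ "=]* run from every start position, so doubling the input should take
much more than twice as long. It also wraps Peter's "/" pre-check in a
function. The code is written to run unchanged on Python 2 and 3.

```python
import re
import time

# The pattern from the original post, compiled once.
PATTERN = re.compile(r'[^ "=]*/')

def guarded_search(row):
    """Peter's workaround: the pattern cannot match without a "/",
    so skip the expensive search when the row has none."""
    if "/" not in row:
        return None
    return PATTERN.search(row)

def time_search(n):
    """Time an unguarded search over a row of n 'a' characters."""
    row = "a" * n
    start = time.time()
    result = PATTERN.search(row)
    return result, time.time() - start

# Small sizes only; the original 156000-character row is far slower.
for n in (1000, 2000, 4000):
    result, elapsed = time_search(n)
    print("%d chars: %.4f s, match=%r" % (n, elapsed, result))
```

The pre-check only helps when no "/" is present at all. A long row with a
single "/" near the start of otherwise unmatched text would still go
through the normal search.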