re.search much slower then grep on some regular expressions
pavlovevidence at gmail.com
Sat Jul 5 10:21:41 CEST 2008
On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner"
<basti.wies... at gmx.net> wrote:
> Paddy <paddy3... at googlemail.com>:
> > On Jul 4, 1:36 pm, Peter Otten <__pete... at web.de> wrote:
> >> Henning_Thornblad wrote:
> >> > What can be the cause of the large difference between re.search and
> >> > grep?
> >> grep uses a smarter algorithm ;)
> >> > This script takes about 5 min to run on my computer:
> >> > #!/usr/bin/env python
> >> > import re
> >> > row=""
> >> > for a in range(156000):
> >> > row+="a"
> >> > print re.search('[^ "=]*/',row)
> >> > While doing a simple grep:
> >> > grep '[^ "=]*/' input (input contains 156.000 a in
> >> > one row)
> >> > doesn't even take a second.
> >> > Is this a bug in python?
> >> You could call this a performance bug, but it's not common enough in real
> >> code to get the necessary brain cycles from the core developers.
> >> So you can either write a patch yourself or use a workaround.
> >> re.search('[^ "=]*/', row) if "/" in row else None
> >> might be good enough.
> >> Peter
> > It is not a smarter algorithm that is used in grep. Python RE's have
> > more capabilities than grep RE's which need a slower, more complex
> > algorithm.
> FWIW, grep itself can confirm this statement. The following command roughly
> takes as long as Python's re.search:
> # grep -P '[^ "=]*/' input
> -P tells grep to use real perl-compatible regular expressions.
This confirms that a particular engine might not be optimized for it,
but it's not necessarily a reflection that the engine is more complex.
I'm not sure of the specifics and maybe that is the case, but it could
also be a case of a different codebase which is optimized differently.
More information about the Python-list