re.search much slower than grep on some regular expressions

Kris Kennaway kris at FreeBSD.org
Tue Jul 8 09:58:31 EDT 2008


samwyse wrote:
> On Jul 4, 6:43 am, Henning_Thornblad <Henning.Thornb... at gmail.com>
> wrote:
>> What can be the cause of the large difference between re.search and
>> grep?
> 
>> While doing a simple grep:
>> grep '[^ "=]*/' input
>> (input contains 156,000 'a' characters on a single line)
>> doesn't even take a second.
>>
>> Is this a bug in python?
> 
> You might want to look at Plex.
> http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
> 
> "Another advantage of Plex is that it compiles all of the regular
> expressions into a single DFA. Once that's done, the input can be
> processed in a time proportional to the number of characters to be
> scanned, and independent of the number or complexity of the regular
> expressions. Python's existing regular expression matchers do not have
> this property."

Very interesting!  Thanks very much for the pointer.
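
For anyone following the thread, the linear-time property described in
the quote can be illustrated with a minimal sketch. This is not Plex and
not a general DFA compiler: it is a hand-written single-pass scan for
the one pattern from the original question, and the names linear_search
and agrees_with_re are hypothetical. It assumes the goal is to find the
same span that re.search(r'[^ "=]*/', text) would return.

```python
import re

PATTERN = re.compile(r'[^ "=]*/')
BLOCKED = set(' "=')  # characters excluded by the class [^ "=]


def linear_search(text):
    """Single pass over text; returns the (start, end) span that
    PATTERN.search(text) would find, or None if there is no match.

    The text is treated as runs of allowed characters separated by
    blocked ones. The leftmost greedy match starts at the beginning of
    the first run containing '/', and ends just after the last '/' in
    that run.
    """
    run_start = 0      # index where the current run of allowed chars began
    last_slash = -1    # index of the last '/' seen in the current run
    for i, ch in enumerate(text):
        if ch in BLOCKED:
            if last_slash >= 0:
                return (run_start, last_slash + 1)
            run_start = i + 1          # next run starts after the blocked char
        elif ch == '/':
            last_slash = i
    if last_slash >= 0:
        return (run_start, last_slash + 1)
    return None


def agrees_with_re(text):
    """Check the sketch against the re module on a (short) input."""
    m = PATTERN.search(text)
    return linear_search(text) == (m.span() if m else None)
```

Each character is examined once, so the 156,000-character input with no
'/' is rejected after a single scan. re.search, by contrast, retries the
greedy class from every starting position, which is where the slowdown
on this input comes from.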

Kris



