Regex trouble

andrew cooke andrew at
Wed Apr 1 17:00:51 CEST 2009

".*?" is a "not greedy" match, which is significantly more difficult to
handle than a normal ".*".  so the performance will depend on quite
complex details of how the regular expression engine is implemented.  it
wouldn't surprise me if perl was better here, because it comes from a
background with a much stronger emphasis on regular expressions.

i know that not an exact answer, but unless you find the author of the re
library i am not sure you will get a much better explanation.  it comes
down to whether the regular expression can be implemented using a
deterministic finite automaton (rather than an indeterministic one).  if
you look at the table at the bottom of then i believe (i am not
100% sure) that the ".*?" match requires at least a pushdown automota,
while ".*" can be handled with a simple finite automaton.

disclaimer: this is all fairly new to me as i just recently implemented a
regular expression matcher myself, and i may be wrong on some of the


akshat agarwal wrote:
> Hi,
> I am trying to use the following snippet of code to print a regex match.
> s = '01234567890123456789x012'
> pat = r'(.*?x|[^a]+)*y'
> mo =, s)
> if mo is not None:
>     print
> By adding one character before the 'x' in the input string, the time taken
> to print the match doubles. This behaviour is not observed in perl. I am
> curious to know about the difference the in regex implementations of perl
> and python which causes this.
> Thanks
> --

More information about the Python-list mailing list