Regex trouble

andrew cooke andrew at
Wed Apr 1 17:02:55 CEST 2009

more exactly, my guess is perl has a special case for this that avoids
doing a search over all possible matchers via the pushdown stack.

andrew cooke wrote:
> ".*?" is a "not greedy" match, which is significantly more difficult to
> handle than a normal ".*".  so the performance will depend on quite
> complex details of how the regular expression engine is implemented.  it
> wouldn't surprise me if perl was better here, because it comes from a
> background with a much stronger emphasis on regular expressions.
> i know that not an exact answer, but unless you find the author of the re
> library i am not sure you will get a much better explanation.  it comes
> down to whether the regular expression can be implemented using a
> deterministic finite automaton (rather than an indeterministic one).  if
> you look at the table at the bottom of
> then i believe (i am not
> 100% sure) that the ".*?" match requires at least a pushdown automota,
> while ".*" can be handled with a simple finite automaton.
> disclaimer: this is all fairly new to me as i just recently implemented a
> regular expression matcher myself, and i may be wrong on some of the
> details.
> andrew
> akshat agarwal wrote:
>> Hi,
>> I am trying to use the following snippet of code to print a regex match.
>> s = '01234567890123456789x012'
>> pat = r'(.*?x|[^a]+)*y'
>> mo =, s)
>> if mo is not None:
>>     print
>> By adding one character before the 'x' in the input string, the time
>> taken
>> to print the match doubles. This behaviour is not observed in perl. I am
>> curious to know about the difference the in regex implementations of
>> perl
>> and python which causes this.
>> Thanks
>> --

More information about the Python-list mailing list