Thanks Andrew. I was also of the same view that perl handled this via some special cases.<br><br><div class="gmail_quote">On Wed, Apr 1, 2009 at 8:32 PM, andrew cooke <span dir="ltr"><<a href="mailto:andrew@acooke.org">andrew@acooke.org</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>


more exactly, my guess is perl has a special case for this that avoids<br>


doing a search over all possible matchers via the pushdown stack.<br>


<div><div></div><div class="h5"><br>


andrew cooke wrote:<br>


><br>


> ".*?" is a "not greedy" match, which is significantly more difficult to<br>


> handle than a normal ".*".  so the performance will depend on quite<br>


> complex details of how the regular expression engine is implemented.  it<br>


> wouldn't surprise me if perl was better here, because it comes from a<br>


> background with a much stronger emphasis on regular expressions.<br>


><br>


> i know that not an exact answer, but unless you find the author of the re<br>


> library i am not sure you will get a much better explanation.  it comes<br>


> down to whether the regular expression can be implemented using a<br>


> deterministic finite automaton (rather than an indeterministic one).  if<br>


> you look at the table at the bottom of<br>


> <a href="http://en.wikipedia.org/wiki/Finite_state_machine" target="_blank">http://en.wikipedia.org/wiki/Finite_state_machine</a> then i believe (i am not<br>


> 100% sure) that the ".*?" match requires at least a pushdown automota,<br>


> while ".*" can be handled with a simple finite automaton.<br>


><br>


> disclaimer: this is all fairly new to me as i just recently implemented a<br>


> regular expression matcher myself, and i may be wrong on some of the<br>


> details.<br>


><br>


> andrew<br>


><br>


><br>


> akshat agarwal wrote:<br>


>> Hi,<br>


>><br>


>> I am trying to use the following snippet of code to print a regex match.<br>


>><br>


>> s = '01234567890123456789x012'<br>


>><br>


>> pat = r'(.*?x|[^a]+)*y'<br>


>><br>


>> mo = re.search(pat, s)<br>


>> if mo is not None:<br>


>>     print mo.group(0)<br>


>><br>


>> By adding one character before the 'x' in the input string, the time<br>


>> taken<br>


>> to print the match doubles. This behaviour is not observed in perl. I am<br>


>> curious to know about the difference the in regex implementations of<br>


>> perl<br>


>> and python which causes this.<br>


>><br>


>> Thanks<br>


>> --<br>


>> <a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br>


>><br>


><br>


><br>


<br>


<br>


</div></div></blockquote></div><br>