Regex trouble

andrew cooke andrew at acooke.org
Wed Apr 1 11:00:51 EDT 2009


".*?" is a "not greedy" match, which is significantly more difficult to
handle than a normal ".*".  so the performance will depend on quite
complex details of how the regular expression engine is implemented.  it
wouldn't surprise me if perl was better here, because it comes from a
background with a much stronger emphasis on regular expressions.

i know that not an exact answer, but unless you find the author of the re
library i am not sure you will get a much better explanation.  it comes
down to whether the regular expression can be implemented using a
deterministic finite automaton (rather than an indeterministic one).  if
you look at the table at the bottom of
http://en.wikipedia.org/wiki/Finite_state_machine then i believe (i am not
100% sure) that the ".*?" match requires at least a pushdown automota,
while ".*" can be handled with a simple finite automaton.

disclaimer: this is all fairly new to me as i just recently implemented a
regular expression matcher myself, and i may be wrong on some of the
details.

andrew


akshat agarwal wrote:
> Hi,
>
> I am trying to use the following snippet of code to print a regex match.
>
> s = '01234567890123456789x012'
>
> pat = r'(.*?x|[^a]+)*y'
>
> mo = re.search(pat, s)
> if mo is not None:
>     print mo.group(0)
>
> By adding one character before the 'x' in the input string, the time taken
> to print the match doubles. This behaviour is not observed in perl. I am
> curious to know about the difference the in regex implementations of perl
> and python which causes this.
>
> Thanks
> --
> http://mail.python.org/mailman/listinfo/python-list
>





More information about the Python-list mailing list