Regex trouble

andrew cooke andrew at acooke.org
Wed Apr 1 11:02:55 EDT 2009


more exactly, my guess is perl has a special case for this that avoids
doing a search over all possible matchers via the pushdown stack.

andrew cooke wrote:
>
> ".*?" is a "not greedy" match, which is significantly more difficult to
> handle than a normal ".*".  so the performance will depend on quite
> complex details of how the regular expression engine is implemented.  it
> wouldn't surprise me if perl was better here, because it comes from a
> background with a much stronger emphasis on regular expressions.
>
> i know that not an exact answer, but unless you find the author of the re
> library i am not sure you will get a much better explanation.  it comes
> down to whether the regular expression can be implemented using a
> deterministic finite automaton (rather than an indeterministic one).  if
> you look at the table at the bottom of
> http://en.wikipedia.org/wiki/Finite_state_machine then i believe (i am not
> 100% sure) that the ".*?" match requires at least a pushdown automota,
> while ".*" can be handled with a simple finite automaton.
>
> disclaimer: this is all fairly new to me as i just recently implemented a
> regular expression matcher myself, and i may be wrong on some of the
> details.
>
> andrew
>
>
> akshat agarwal wrote:
>> Hi,
>>
>> I am trying to use the following snippet of code to print a regex match.
>>
>> s = '01234567890123456789x012'
>>
>> pat = r'(.*?x|[^a]+)*y'
>>
>> mo = re.search(pat, s)
>> if mo is not None:
>>     print mo.group(0)
>>
>> By adding one character before the 'x' in the input string, the time
>> taken
>> to print the match doubles. This behaviour is not observed in perl. I am
>> curious to know about the difference the in regex implementations of
>> perl
>> and python which causes this.
>>
>> Thanks
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
>
>





More information about the Python-list mailing list