andrew at acooke.org
Wed Apr 1 17:02:55 CEST 2009
more exactly, my guess is perl has a special case for this that avoids
doing a search over all possible matchers via the pushdown stack.
andrew cooke wrote:
> ".*?" is a "not greedy" match, which is significantly more difficult to
> handle than a normal ".*". so the performance will depend on quite
> complex details of how the regular expression engine is implemented. it
> wouldn't surprise me if perl was better here, because it comes from a
> background with a much stronger emphasis on regular expressions.
> i know that not an exact answer, but unless you find the author of the re
> library i am not sure you will get a much better explanation. it comes
> down to whether the regular expression can be implemented using a
> deterministic finite automaton (rather than an indeterministic one). if
> you look at the table at the bottom of
> http://en.wikipedia.org/wiki/Finite_state_machine then i believe (i am not
> 100% sure) that the ".*?" match requires at least a pushdown automota,
> while ".*" can be handled with a simple finite automaton.
> disclaimer: this is all fairly new to me as i just recently implemented a
> regular expression matcher myself, and i may be wrong on some of the
> akshat agarwal wrote:
>> I am trying to use the following snippet of code to print a regex match.
>> s = '01234567890123456789x012'
>> pat = r'(.*?x|[^a]+)*y'
>> mo = re.search(pat, s)
>> if mo is not None:
>> print mo.group(0)
>> By adding one character before the 'x' in the input string, the time
>> to print the match doubles. This behaviour is not observed in perl. I am
>> curious to know about the difference the in regex implementations of
>> and python which causes this.
More information about the Python-list