split hangs

davide at mercurio.localdomain davide at mercurio.localdomain
Fri Dec 29 10:48:07 EST 2000


In comp.lang.python, you wrote:

>The tail end of the output is clear:  adding another character to the string
>doubles the time it takes to figure out that the regexp doesn't match.  This
>isn't a bug, it's the way Python (and Perl, and Emacs, and ...) regexps
>work.  A full explanation is quite involved; you can find one in Jonathan
>Friedl's excellent book "Mastering Regular Expressions" (O'Reilly).

I tried the equivalent regexp in emacs
  <\([^<>"]+\("[^"]*"\)?\)+>
with bigger and bigger files and it does not seem to give an exponential
behaviour.

In everything I read regexp matching is given as a linear time problem in
n + m (where n is the size of the testing string and m the size of the
regular expression).

I am a python newbye and I could not find the reason for this exp 
behaviour. The regular expressions seems pretty standard to me, 
and so there should be no need to go beyond regular languages and DFA. 

Would you explain why this algorithm is needed in python?
Do really the other implementation (grep, perl, emacs, ecc...)
use the same algorithm?

Thanks,

				Davide




More information about the Python-list mailing list