Anomalous behaviour when compiling regular expressions?
Fredrik Lundh
fredrik at pythonware.com
Mon Mar 13 06:23:42 EST 2006
Harvey.Thomas at informa.com wrote:
> >>> import re
> >>> r = re.compile('(a|b*)+')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "c:\python24\lib\sre.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "c:\python24\lib\sre.py", line 227, in _compile
>     raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>
> but
>
> >>> r = re.compile('(a|b*c*)+')
> >>> r.match('def').group()
> ''
>
> Why is there a difference in behaviour between the two cases? Surely the
> two cases are equivalent to:
>
> >>> r = re.compile('(a|b)*')
> >>> r.match('def').group()
> ''
>
> and
>
> >>> r = re.compile('(a|b|c)*')
> >>> r.match('def').group()
> ''
your definition of "equivalent" is a bit unusual:
>>> re.match("(a|b*c*)+", "abc").groups()
('',)
>>> re.match("(a|b)*", "abc").groups()
('b',)
>>> re.match("(a|b|c)*", "abc").groups()
('c',)
that you don't get an error for
> >>> r = re.compile('(a|b*c*)+')
> >>> r.match('def').group()
might be a compiler bug. running it on 2.3 gives you another error,
though:
>>> re.match("(a|b*c*)+", "abc").groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\python23\lib\sre.py", line 132, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded
(a repeated group with a min-length of zero can match the empty string
at the same position an unlimited number of times, which is, in general,
not what you want)
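
for anyone who wants to compare the four patterns on their own interpreter, here is a minimal sketch. the helper name try_groups is made up for this example and is not part of the re module. what you get for the first two patterns depends on the Python version (a compile error, a runtime error on some old releases, or a match), so no output is shown for those; only the last two patterns behave the same everywhere I know of.

```python
import re

def try_groups(pattern, text):
    """Return match.groups() for pattern matched at the start of text,
    None if there is no match, or an "error: ..." string if the pattern
    does not compile. (Hypothetical helper, for illustration only.)"""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # e.g. "nothing to repeat" on interpreters that reject the pattern
        return "error: %s" % exc
    m = compiled.match(text)
    return m.groups() if m else None

# The first two patterns repeat a group that can match the empty string;
# the last two repeat a group that always consumes one character.
for pattern in ("(a|b*)+", "(a|b*c*)+", "(a|b)*", "(a|b|c)*"):
    print("%-12r %r" % (pattern, try_groups(pattern, "abc")))
```

if the repeated group is rewritten so that every alternative consumes at least one character, as in the last two patterns, the ambiguity goes away and the captured group is simply the last character matched.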
</F>