Anomalous behaviour when compiling regular expressions?

Harvey.Thomas at Harvey.Thomas at
Mon Mar 13 15:16:44 CET 2006

Fredrik wrote:

> your definition of "equivalent" is a bit unusual:
> >>> re.match("(a|b*c*)+", "abc").groups()
> ('',)
> >>> re.match("(a|b)*", "abc").groups()
> ('b',)
> >>> re.match("(a|b|c)*", "abc").groups()
> ('c',)
> that you don't get an error for
> >>> r = re.compile('(a|b*c*)+')
> >>> r.match('def').group()
> might be a compiler bug.  running it on 2.3 gives you another error,
> though:
> >>> re.match("(a|b*c*)+", "abc").groups()
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
>  File "C:\python23\lib\", line 132, in match
>    return _compile(pattern, flags).match(string)
> RuntimeError: maximum recursion limit exceeded
> (a repeated group with a min-length of zero can match anything an
> infinite number of times, which is, in general, not what you want)

I agree to a certain extent, but by analogy

<!ELEMENT a (b | c*)+>

is a valid SGML/XML element declaration, and there is a lot of (superficial?)
similarity between DTD content models and REs. The element declaration
would (should?) normally be written as

<!ELEMENT a (b | c)*>
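
To make the comparison concrete, here is a minimal sketch, assuming a current Python 3 re module (older releases such as 2.3 may behave differently, as the quoted traceback shows). The names nested, flat and spans are only illustrative. It checks that the two ways of writing the pattern consume the same text, even though the captured group differs:

```python
import re

# Two patterns that accept the same strings over {a, b, c}.
nested = re.compile(r"(a|b*c*)+")  # repeated group whose body can match ""
flat = re.compile(r"(a|b|c)*")     # the conventional way to write it


def spans(pattern, text):
    """Return the text consumed by pattern.match, or None if nothing matches."""
    m = pattern.match(text)
    return None if m is None else m.group(0)


samples = ["abc", "aabbcc", "cba", "def", ""]
for text in samples:
    # Both patterns consume the same prefix of each sample.
    print(repr(text), repr(spans(nested, text)), repr(spans(flat, text)))

# A repeated group keeps only the text of its last iteration, which is
# why the quoted session reports ('c',) here rather than the whole match.
print(flat.match("abc").groups())
```

So "equivalent" holds for the overall match, but not for what ends up in the group, which is the point Fredrik was making.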
