Nothing to repeat

Terry Reedy tjreedy at udel.edu
Sun Jan 9 13:58:09 EST 2011


On 1/9/2011 11:49 AM, Tom Anderson wrote:
> Hello everyone, long time no see,
>
> This is probably not a Python problem, but rather a regular expressions
> problem.
>
> I want, for the sake of arguments, to match strings comprising any
> number of occurrences of 'spa', each interspersed by any number of
> occurrences of the 'm'. 'any number' includes zero, so the whole pattern
> should match the empty string.

Are you sure? A pattern that matches the empty string finds a match in every string.
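
A minimal sketch of what I mean, using "(spa|m)*" purely as an illustration. It assumes a newer interpreter than the 2.6 session quoted below, since re.fullmatch only exists in Python 3.4 and later:

```python
import re

pattern = re.compile("(spa|m)*")

# search() succeeds even on a string containing no 'spa' or 'm',
# because the pattern is allowed to match zero characters.
hit = pattern.search("xyz")
print(repr(hit.group(0)))         # the empty string

# fullmatch() requires the whole string to be consumed.
print(pattern.fullmatch("xyz"))   # None
print(pattern.fullmatch("spammspa") is not None)
```

So whether "matches the empty string" is a problem depends on whether you search, match at the start, or require the whole string to match.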

> Here's the conversation Python and i had about it:
>
> Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:16)
> [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import re
>>>> re.compile("(spa|m*)*")


I believe the precedence rule that * binds tighter than | (not in the doc)
makes this re the same as "(spa|(m)*)*", which gives the same error
traceback. I believe that for this, re first compiles (spa)* and then
((m)*)*, and the latter gives the same traceback. Either would seem to
match strings of 'm's without any 'spa', which is not your spec.
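
The precedence point can be checked without the outer repeat. A small sketch, again assuming Python 3.4+ for re.fullmatch; the names loose and grouped are just mine for illustration:

```python
import re

# '*' applies only to the 'm', so this reads as: "spa" OR zero or more 'm'.
loose = re.compile("spa|m*")
# Parentheses make '*' apply to the whole alternation.
grouped = re.compile("(spa|m)*")

print(loose.fullmatch("spa") is not None)       # True
print(loose.fullmatch("mmm") is not None)       # True
print(loose.fullmatch("spaspa") is not None)    # False
print(grouped.fullmatch("spaspa") is not None)  # True
print(grouped.fullmatch("mmm") is not None)     # True: 'm's with no 'spa'
```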

"((spa|m)*)*" does compile, so it is not the nesting itself.

The doc does not give the formal grammar for Python re's, so it is hard 
to pinpoint which informal rule is violated, or if indeed the error is a 
bug. Someone else may do better.

> Now, i could actually rewrite this particular pattern as '(spa|m)*'.

That also does not match your spec.
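
If I read the spec as "'m's may appear only between two 'spa's" (that reading is my assumption, and you may mean something else), a pattern along these lines might be closer. This is only a sketch, written for Python 3.4+ so that fullmatch is available:

```python
import re

# Assumed reading: zero or more 'spa', with any run of 'm' allowed
# only between two consecutive 'spa's. The outer '?' admits the empty string.
spec = re.compile("(?:spa(?:m*spa)*)?")

for s in ["", "spa", "spaspa", "spammmspa", "mmm", "spam", "mspa"]:
    print(repr(s), spec.fullmatch(s) is not None)
```

Under that reading the first four strings are accepted and the last three are rejected, whereas '(spa|m)*' accepts all seven.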

> Any thoughts on what i should do? Do i have to bite the bullet and apply
> some cleverness in my pattern generation to avoid situations like this?

Well, it has to generate legal re's according to the engine you are 
using (with whatever bugs and limitations it has).
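
One cheap way to find out what the engine accepts is to ask it. A minimal sketch; is_legal_re is a hypothetical helper name of mine, not something in the re module:

```python
import re

def is_legal_re(pattern):
    """Return True if this interpreter's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # re.error is what the module raises for patterns it rejects,
        # including the "nothing to repeat" case.
        return False
    return True

print(is_legal_re("(spa|m)*"))   # True
print(is_legal_re("*"))          # False: nothing to repeat
```

Your generator could run each candidate through such a check and fall back to a simpler form when it fails. What it reports for "(spa|m*)*" will depend on the Python version you run it on.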

-- 
Terry Jan Reedy



