regex (?!..) problem

MRAB python at mrabarnett.plus.com
Mon Oct 5 03:51:21 EDT 2009


Wolfgang Rohdewald wrote:
> Hi,
> 
> I want to match a string only if a word (C1 in this example) appears
> at most once in it. This is what I tried:
> 
>>>> re.match(r'(.*?C1)((?!.*C1))','C1b1b1b1 b3b3b3b3 C1C2C3').groups()
> ('C1b1b1b1 b3b3b3b3 C1', '')
>>>> re.match(r'(.*?C1)','C1b1b1b1 b3b3b3b3 C1C2C3').groups()
> ('C1',)
> 
> but this should not have matched. Why is the .*? behaving greedy
> if followed by (?!.*C1)? I would have expected that re first 
> evaluates (.*?C1) before proceeding at all.
> 
> I also tried:
> 
>>>> re.search(r'(.*?C1(?!.*C1))','C1b1b1b1 b3b3b3b3 C1C2C3C4').groups()
> ('C1b1b1b1 b3b3b3b3 C1',)
> 
> with the same problem.
> 
> How could this be done?
> 
You're currently looking for a C1 that's not followed by another C1; the
solution is to check first, at the start of the string, whether there are two:

>>> re.match(r'(?!.*?C1.*?C1)(.*?C1)','C1b1b1b1 b3b3b3b3 C1C2C3').groups()
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    re.match(r'(?!.*?C1.*?C1)(.*?C1)','C1b1b1b1 b3b3b3b3 C1C2C3').groups()
AttributeError: 'NoneType' object has no attribute 'groups'
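A runnable sketch of the same check, for comparison (the names ORIGINAL, AT_MOST_ONE_C1, ANCHORED and first_c1_if_unique are made up for illustration). In the original pattern the lookahead is only tested after the lazy .*? has reached a C1, so when it fails the engine simply backtracks and extends .*? to the last C1, which then passes. Putting the lookahead first makes it a test on the whole string instead. Two caveats: re.match pins the lookahead to position 0, whereas re.search would retry at index 1 (past the first C1) and succeed there, so anchor with \A if you use search; and the capturing (.*?C1) still requires one C1, so a string with none does not match. It also assumes single-line input, since without re.DOTALL '.' does not cross newlines.

```python
import re

# Original attempt: the lookahead is only tested after a C1 has been found,
# so when it fails the lazy .*? backtracks and extends to the *last* C1.
ORIGINAL = re.compile(r'(.*?C1)((?!.*C1))')

# Suggested fix: the negative lookahead runs first and rejects the match
# if C1 occurs twice anywhere after the match position.
AT_MOST_ONE_C1 = re.compile(r'(?!.*?C1.*?C1)(.*?C1)')

# Same pattern anchored with \A, so it is also safe to use with search().
ANCHORED = re.compile(r'\A(?!.*?C1.*?C1)(.*?C1)')


def first_c1_if_unique(text):
    """Hypothetical helper: text up to C1 if C1 occurs exactly once, else None."""
    m = AT_MOST_ONE_C1.match(text)  # match() only tries position 0
    return m.group(1) if m else None


twice = 'C1b1b1b1 b3b3b3b3 C1C2C3'
once = 'b1b1 C1C2C3'

print(ORIGINAL.match(twice).groups())      # ('C1b1b1b1 b3b3b3b3 C1', '')
print(first_c1_if_unique(twice))           # None
print(first_c1_if_unique(once))            # b1b1 C1
print(first_c1_if_unique('b1b1 b3b3'))     # None: no C1 at all

# search() without the anchor retries at index 1, past the first C1, and succeeds:
print(AT_MOST_ONE_C1.search(twice) is not None)  # True
print(ANCHORED.search(twice))                    # None
```

If a string with no C1 at all should also pass ("at most once"), the bare lookahead re.match(r'(?!.*?C1.*?C1)', text) is enough: it is a zero-width match that succeeds whenever there are fewer than two occurrences.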
