[Tutor] regex: don't match embedded quotes

Peter Otten __peter__ at web.de
Tue Jun 11 13:29:23 CEST 2013


Albert-Jan Roskam wrote:

>> I have written a regex that is supposed to match correctly quoted (single
>> quotes on each side, or double quotes on each side) text. It works, but

> Okay, I am having bloodshot eyes now, but I think I've got it:
 
>>>> matches = re.finditer("(?P<quote>['\"])(?P<comment>(?<!(?P=quote)).*?)(?P=quote)", s)
>>>> [match.group("comment") for match in matches]
> ['test', 'blah', 'difficult "One"']
> 
> In other words: The 'comment' group should be preceded by a negative
> lookbehind (?<!) on the 'quote' group, followed by a non-greedy match of
> anything (.*?). Not sure if ".*?" is a good idea, i.e.
> zero-or-more-of-anything.

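I think a non-greedy match is sufficient; you don't need the look-behind.
The backreference (?P=quote) only matches the character that opened the
string, so the other kind of quote is consumed by .*? like any other
character: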

>>> s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."
>>> matches = re.finditer("(?P<quote>['\"])(?P<comment>.*?)(?P=quote)", s)
>>> [match.group("comment") for match in matches]
['test', 'blah', 'difficult "One"']
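
For reuse, the same pattern can be compiled once and wrapped in a function.
This is only a sketch: the names QUOTED and quoted_parts are made up for
illustration, and the pattern is written as a raw triple-quoted string so
the double quote inside the character class needs no backslash.

```python
import re

# Same regex as in the session above: an opening quote of either kind,
# a non-greedy body, then the same quote character again.
QUOTED = re.compile(r"""(?P<quote>['"])(?P<comment>.*?)(?P=quote)""")


def quoted_parts(text):
    """Return the text found between matching quotes, without the quotes."""
    return [match.group("comment") for match in QUOTED.finditer(text)]


s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."
print(quoted_parts(s))

# A single quote inside a double-quoted string is left alone.
print(quoted_parts('say "it\'s fine" now'))

# Caveat: an apostrophe outside any quoted string is taken as an opening
# quote, so the pairing goes wrong for input like this.
print(quoted_parts("don't 'x'"))
```

The last call shows the limit of the approach: the regex cannot tell an
apostrophe from an opening single quote, so text such as "don't" before a
quoted string throws the pairing off.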



