[Tutor] regex: don't match embedded quotes
Peter Otten
__peter__ at web.de
Tue Jun 11 13:29:23 CEST 2013
Albert-Jan Roskam wrote:
>> I have written a regex that is supposed to match correctly quoted (single
>> quotes on each side, or double quotes on each side) text. It works, but
> Okay, my eyes are bloodshot by now, but I think I've got it:
> >>> matches = re.finditer(
> ...     "(?P<quote>['\"])(?P<comment>(?<!(?P=quote)).*?)(?P=quote)", s)
> >>> [match.group("comment") for match in matches]
> ['test', 'blah', 'difficult "One"']
>
> In other words: the 'comment' group should be preceded by a negative
> lookbehind (?<!) on the 'quote' group, followed by a non-greedy match of
> anything (.*?). Not sure if ".*?" is a good idea, i.e.
> zero-or-more-of-anything.
I think a non-greedy match is sufficient; you don't need the look-behind:
>>> s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."
>>> matches = re.finditer("(?P<quote>['\"])(?P<comment>.*?)(?P=quote)", s)
>>> [match.group("comment") for match in matches]
['test', 'blah', 'difficult "One"']
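
The same idea as a small standalone script, in case it is easier to play with than the interactive session. This is only a sketch: the names QUOTED and quoted_comments are made up for illustration, and the pattern is the one shown above, written as a raw triple-quoted string so the quote characters need no escaping.

```python
import re

# Same pattern as above. QUOTED and quoted_comments are illustrative names.
# (?P<quote>['"])   remembers which quote character opened the string
# (?P<comment>.*?)  non-greedy: stops at the first possible closing quote
# (?P=quote)        the closing quote must be the same character as the opener
QUOTED = re.compile(r"""(?P<quote>['"])(?P<comment>.*?)(?P=quote)""")


def quoted_comments(text):
    """Return the text found between matching single or double quotes."""
    return [match.group("comment") for match in QUOTED.finditer(text)]


s = "some enumeration 1 'test' 2 'blah' 3 'difficult \"One\"'."
print(quoted_comments(s))  # ['test', 'blah', 'difficult "One"']

# The other kind of quote may appear inside, because only the opening
# character can close the match.
print(quoted_comments('say "it\'s fine" now'))  # ["it's fine"]
```

Note that "." does not match a newline unless re.DOTALL is passed, so as written this only finds quoted text that starts and ends on the same line.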