make RE more clever to avoid inappropriate: sre_constants.error: redefinition of group name

attn.steven.kuo at gmail.com
Thu Mar 29 11:13:39 EDT 2007


On Mar 29, 7:22 am, "aspineux" <aspin... at gmail.com> wrote:
> I want to parse
>
> 'foo@bar' or '<foo@bar>' and get the email address foo@bar
>
> the regex is
>
> r'<\w+@\w+>|\w+@\w+'
>
> now, I want to give it a name
>
> r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
>
> sre_constants.error: redefinition of group name 'email' as group 2;
> was group 1
>
> BUT because I use a |, only one of the two groups named 'email' can ever match!
>
> Any comments?
>
> PS: I know the solution for this case is to use  r'(?P<lt><)?(?P<email>
> \w+@\w+)(?(lt)>)'



Regular expressions, alternation, named groups ... oh my!

It tends to get quite complex, especially if you need
to reject cases where the string contains a left bracket
but not the right one, or vice versa.

>>> import re
>>> pattern = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')
>>> for email in ('foo@bar' , '<foo@bar>', '<start@without_end_bracket'):
...     matched = pattern.search(email)
...     if matched is not None:
...         print(matched.group('email'))
...
foo@bar
<foo@bar>

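If the brackets should not be part of the result, one possible workaround
for the "redefinition of group name" error is to give the two alternatives
different names and pick whichever one took part in the match. This is only
a sketch: the group names 'bracketed' and 'bare' and the helper
extract_email are made up for illustration, and \w+@\w+ is the same
simplified notion of an address used above.

```python
import re

# Distinct group names sidestep the "redefinition of group name" error.
# The lookarounds on the second alternative reject a bare address that has
# a stray '<' directly before it or a stray '>' directly after it.
PATTERN = re.compile(
    r'<(?P<bracketed>\w+@\w+)>|(?<!<)\b(?P<bare>\w+@\w+)\b(?!>)'
)


def extract_email(text):
    """Return the address without brackets, or None if nothing matches."""
    matched = PATTERN.search(text)
    if matched is None:
        return None
    # Only one alternative can take part in a match; the other group is None.
    return matched.group('bracketed') or matched.group('bare')


for text in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    print(extract_email(text))
```

The first two strings should both give foo@bar and the third should give
None, at the cost of having to check two groups instead of one.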

I suggest you try some other solution (maybe pyparsing).
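
For comparison, the conditional pattern from the PS of the quoted message
can be checked against the same inputs. As far as I can tell, with a plain
search() it still finds an address inside a string with an unbalanced
bracket, because the search simply starts one character later. Anchoring
the pattern rejects that case, under the assumption that the whole string
is supposed to be a single address. The names CONDITIONAL, UNANCHORED,
ANCHORED and email_or_none are made up for this sketch.

```python
import re

# Pattern from the PS of the quoted message: the closing '>' is required
# only when the optional group 'lt' matched an opening '<'.
CONDITIONAL = r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'

UNANCHORED = re.compile(CONDITIONAL)
# Assumption: the whole input string is expected to be one address.
ANCHORED = re.compile(r'\A' + CONDITIONAL + r'\Z')


def email_or_none(pattern, text):
    """Return the 'email' group of the first match, or None."""
    matched = pattern.search(text)
    return matched.group('email') if matched else None


for text in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    print('%r: unanchored=%r anchored=%r' % (
        text,
        email_or_none(UNANCHORED, text),
        email_or_none(ANCHORED, text),
    ))
```

Anchoring only helps when the address is the entire input; for addresses
embedded in longer text, lookarounds or a real parser are still needed.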

--
Hope this helps,
Steven



