make RE more clever to avoid inappropriate: sre_constants.error: redefinition of group name

attn.steven.kuo at
Thu Mar 29 17:13:39 CEST 2007

On Mar 29, 7:22 am, "aspineux" <aspin... at> wrote:
> I want to parse
> 'foo@bar' or '<foo@bar>' and get the email address foo@bar
> the regex is
> r'<\w+@\w+>|\w+@\w+'
> now, I want to give it a name
> r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
> sre_constants.error: redefinition of group name 'email' as group 2;
> was group 1
> BUT because I use a | , I will get only one group named 'email' !
> Any comment ?
> PS: I know the solution for this case is to use
> r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
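
For anyone following along, here is a rough sketch of the situation described in the question. It assumes a current Python 3, where the same exception is exposed as re.error. The group names 'bracketed' and 'plain' and the helper extract_email are made up for illustration; they are one possible workaround, not something from the original post.

```python
import re

# The pattern from the question: the name 'email' is used in both branches,
# which the re module rejects at compile time.
try:
    re.compile(r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)')
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

# Hypothetical workaround: give each branch its own name and merge afterwards.
SPLIT = re.compile(r'<(?P<bracketed>\w+@\w+)>|(?P<plain>\w+@\w+)')

def extract_email(text):
    """Return the address from whichever branch matched, or None."""
    m = SPLIT.search(text)
    if m is None:
        return None
    # Only one of the two groups can be set for a given match.
    return m.group('bracketed') or m.group('plain')

print(compile_error)
print(extract_email('<foo@bar>'))
print(extract_email('foo@bar'))
```

The cost of this approach is that the caller has to know about both names, which is exactly what the question was hoping to avoid.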

Regular expressions, alternation, named groups ... oh my!

It tends to get quite complex, especially if you need
to reject cases where the string contains a left bracket
but not the right one, or vice versa.
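
To make the unbalanced-bracket problem concrete, here is a small sketch using the conditional pattern from the PS of the question. The helper names find_email and exact_email are hypothetical, and fullmatch assumes Python 3.4 or later. With search, the pattern simply skips a stray '<' and matches what follows; anchoring the whole string is one way to reject such input.

```python
import re

# Conditional pattern quoted in the question: '>' is required only if '<' matched.
COND = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')

def find_email(text):
    """Search anywhere in the text; a stray bracket is silently skipped."""
    m = COND.search(text)
    return m.group('email') if m else None

def exact_email(text):
    """Require the whole string to match, so unbalanced brackets are rejected."""
    m = COND.fullmatch(text)
    return m.group('email') if m else None

for text in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    print(repr(text), find_email(text), exact_email(text))
```

Anchoring only helps when the address is the entire string. For addresses embedded in longer text, lookarounds or a real parser are still needed.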

>>> import re
>>> pattern = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')
>>> for email in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
...     matched = pattern.search(email)
...     if matched is not None:
...         print(matched.group('email'))
...
foo@bar
<foo@bar>
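
Note that with this pattern the named group still includes the angle brackets when they are present. If only the bare address is wanted, one possible follow-up step is to strip them afterwards. This is a minimal sketch; the helper name bare_address is hypothetical.

```python
import re

# Same pattern as in the session above.
PATTERN = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')

def bare_address(text):
    """Return the matched address without surrounding brackets, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # \w never matches '<' or '>', so stripping them cannot damage the address.
    return m.group('email').strip('<>')

for text in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    print(repr(text), bare_address(text))
```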

I suggest you try some other solution (maybe pyparsing).

Hope this helps,
