make RE more clever to avoid inappropriate : sre_constants.error: redefinition of group name
attn.steven.kuo at gmail.com
Thu Mar 29 11:13:39 EDT 2007
On Mar 29, 7:22 am, "aspineux" <aspin... at gmail.com> wrote:
> I want to parse
>
> 'foo@bar' or '<foo@bar>' and get the email address foo@bar
>
> the regex is
>
> r'<\w+@\w+>|\w+@\w+'
>
> now, I want to give it a name
>
> r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
>
> sre_constants.error: redefinition of group name 'email' as group 2;
> was group 1
>
> BUT because I use a | , I will get only one group named 'email' !
>
> Any comment ?
>
> PS: I know the solution for this case is to use
> r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
Regular expressions, alternation, named groups ... oh my!
It tends to get quite complex especially if you need
to reject cases where the string contains a left bracket
and not the right, or vice versa.
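To illustrate the point, here is a minimal sketch (assuming Python 3 and the
standard re module; the helper name find_email is made up for this example).
It runs the conditional pattern from the quoted PS and shows that it still
finds an address when the opening bracket has no closing partner, because
the optional '<' can simply be skipped:

```python
import re

# The conditional pattern from the quoted PS: '>' is required only
# when the optional '<' group actually took part in the match.
conditional = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')

def find_email(text):
    """Return the 'email' group of the first match, or None (hypothetical helper)."""
    matched = conditional.search(text)
    return matched.group('email') if matched else None

for text in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    # The last input is unbalanced, yet search() retries one character
    # later, where 'lt' is unset and no '>' is demanded.
    print(repr(text), '->', find_email(text))
```

So the conditional form extracts the address, but it does not by itself
reject the unbalanced input.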
>>> import re
>>> pattern = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')
>>> for email in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
...     matched = pattern.search(email)
...     if matched is not None:
...         print(matched.group('email'))
...
foo@bar
<foo@bar>
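Note that the 'email' group above keeps the angle brackets when they are
present. If only the bare address is wanted, one possible workaround (a
sketch, not the only way) is to give each alternative its own group name and
take whichever one matched. The names 'bracketed' and 'bare' and the helper
extract_email are invented for this example:

```python
import re

# Two distinct group names sidestep the "redefinition of group name" error.
# 'bracketed' and 'bare' are hypothetical names chosen for this sketch.
pattern = re.compile(
    r'<(?P<bracketed>\w+@\w+)>'                # <foo@bar>, brackets excluded from the group
    r'|(?<!<)\b(?P<bare>\w+@\w+)\b(?!>)'       # foo@bar with no bracket on either side
)

def extract_email(text):
    """Return the address without brackets, or None if nothing acceptable matched."""
    matched = pattern.search(text)
    if matched is None:
        return None
    # Exactly one of the two groups took part in the match; the other is None.
    return matched.group('bracketed') or matched.group('bare')

for text in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    print(repr(text), '->', extract_email(text))
```

The lookarounds are the same ones used in the pattern above, so the
unbalanced string is still rejected.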
I suggest you try some other solution (maybe pyparsing).
--
Hope this helps,
Steven