Possible regex match bug (re module)

Tim Peters tim_one at email.msn.com
Mon Apr 5 21:06:45 EDT 1999


[Randall Hopper]
>      Re doesn't handle named groups in alternative patterns like it
> seems it should.  Given an alternative pattern with a particular group
> name in each, it only assigns the match if the group name matches the
> last alternative.

re should raise an exception here -- it was never intended to allow your
pattern.  The deal is that symbolic group names are no more than that:
names for numbered groups.  Like so:

>>> import re
>>> p = re.compile('(---(?P<id>[^-]*)---)|(===(?P<id>[^=]*)===)')
>>> p.groupindex
{'id': 4}
>>>

The groupindex member maps a symbolic name to the numeric group for which
it's an alias, and so in this pattern referring to group "id" is identical
to referring to group number 4.  That explains everything you've seen.  re
should instead notice that it already had a definition for name "id", and
complain about the redefinition.
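
To make the name-to-number aliasing concrete, here is a minimal sketch. The group names id1 and id2 are hypothetical, chosen only so that the pattern has no duplicate name and compiles on any version of re. It checks that looking a group up by name and by its number gives the same result, and it probes whether the re in use rejects the duplicate-name pattern (later releases of re do, raising re.error; the 1999 version shown above did not).

```python
import re

# Hypothetical distinct names (id1, id2) so no name is defined twice.
p = re.compile('(---(?P<id1>[^-]*)---)|(===(?P<id2>[^=]*)===)')

# groupindex maps each symbolic name to the number of the group it aliases.
index = dict(p.groupindex)

m = p.match('===def===')
# A name is only an alias: lookup by name and by number agree.
same = m.group('id2') == m.group(index['id2'])

# Probe how this version of re treats a redefined group name.
try:
    re.compile('(---(?P<id>[^-]*)---)|(===(?P<id>[^=]*)===)')
    redefinition_rejected = False
except re.error:
    redefinition_rejected = True
```

With the distinct names, index comes out as {'id1': 2, 'id2': 4}, which shows why a second "id" in the original pattern ended up pointing at group 4.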

Same as in Perl, you're going to have to write a hairier regexp with only
one interesting group, or give the interesting groups different names and
sort them out after the match (in an alternation involving named groups, at
most one will be non-None after a match).  Here's a discouraging <wink>
example of the former approach:

>>> p = re.compile(r"([-=])\1\1(?P<id>((?!\1).)*)\1\1\1").match
>>> p("---abc---").group("id")
'abc'
>>> p("===def===").group("id")
'def'
>>> print(p("===ghi---"))
None
>>> p("------").group("id")
''
>>> p("---=---").group("id")
'='
>>> print(p("===a=b==="))
None
>>>
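
For comparison, the other approach mentioned above (different names, sorted out after the match) might look roughly like this. It is only a sketch: the names id_dash and id_eq and the helper find_id are made up for illustration, and it relies on at most one of the named groups being non-None after a match.

```python
import re

# Hypothetical names: one per alternative, since a name can alias only one group.
p = re.compile(r'---(?P<id_dash>[^-]*)---|===(?P<id_eq>[^=]*)===')

def find_id(text):
    """Return the text captured by whichever alternative matched, or None."""
    m = p.match(text)
    if m is None:
        return None
    # At most one named group participated in the match; the other is None.
    for name in ('id_dash', 'id_eq'):
        value = m.group(name)
        if value is not None:
            return value
    return None
```

Note that this simpler pattern is not quite equivalent to the backreference version: it forbids the delimiter character anywhere inside the id, so find_id("---=---") still gives '=' but an id like "a-b" between dashes cannot match either way.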

if-regexps-are-your-friends-you'd-hate-to-meet-your-enemies-ly y'rs  - tim





