why are re group names so restrictive?

Skip Montanaro skip at pobox.com
Fri May 9 15:56:58 EDT 2003


Why does re restrict group names to valid Python identifiers?  Names seem to
be used only where strings are allowed, so the character set should only need
to exclude ">" and ")" (and possibly "<" and "(" for symmetry).  If I try to
create a group name which doesn't look like a Python identifier, re.compile
complains (as documented):

    >>> regex = re.compile("(?P<p+>.*)")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/Users/skip/local/lib/python2.3/sre.py", line 179, in compile
        return _compile(pattern, flags)
      File "/Users/skip/local/lib/python2.3/sre.py", line 229, in _compile
        raise error, v # invalid expression
    sre_constants.error: bad character in group name
    >>> regex = re.compile("(?P<9p>.*)")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/Users/skip/local/lib/python2.3/sre.py", line 179, in compile
        return _compile(pattern, flags)
      File "/Users/skip/local/lib/python2.3/sre.py", line 229, in _compile
        raise error, v # invalid expression
    sre_constants.error: bad character in group name
    >>> regex = re.compile("(?P<p9>.*)")
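
For reference, here is a small sketch of the places I know of where a group
name shows up.  In each case it is plain string data: inside the pattern,
inside a replacement template, or as a string argument or dictionary key.
The group name "word" and the sample text are just made up for illustration.

```python
import re

# The name appears inside the pattern, both to define the group
# and to refer back to it with (?P=word).
regex = re.compile(r"(?P<word>\w+) (?P=word)")
mat = regex.match("abc abc")

# The name is passed as an ordinary string argument.
print(mat.group("word"))

# The name is a dictionary key in groupdict() and groupindex.
print(mat.groupdict())
print(regex.groupindex["word"])

# The name appears inside a replacement template.
print(regex.sub(r"\g<word>", "abc abc"))
```

None of these uses requires the name to be spelled as an identifier in
Python source code.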

Are there some contexts where group names are used like Python identifiers
which force this restriction?  I could understand the restriction if groups
could be accessed as attributes of a match object, e.g.:

    >>> regex = re.compile("(?P<p9>.*)")
    >>> mat = regex.match("abc 123")
    >>> mat.group(1)
    'abc 123'
    >>> mat.group("p9")
    'abc 123'
    >>> mat.p9
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: p9

but that isn't possible.
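
To make the hypothetical concrete, here is a minimal sketch of what
attribute-style access could look like.  AttrMatch is a wrapper class I made
up for illustration; it is not part of re.  Only with something like this
would a group name have to be a valid identifier, since mat.p9 can be written
in source code but a name like "p+" could not be.

```python
import re


class AttrMatch:
    """Hypothetical wrapper (not part of re): named groups as attributes."""

    def __init__(self, match):
        self._match = match

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        # Refuse private names so a missing _match cannot cause recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._match.group(name)
        except IndexError:
            # group() raises IndexError for an unknown group name.
            raise AttributeError(name)


regex = re.compile(r"(?P<p9>.*)")
mat = AttrMatch(regex.match("abc 123"))
print(mat.p9)
```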

Thx,

Skip




