[docs] [issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

Ezio Melotti report at bugs.python.org
Sun Jan 29 16:32:27 CET 2012


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

[\w] should definitely work, but [\B] doesn't match anything useful. It just fails silently, because it is equivalent neither to \B nor to [B]:
>>> re.match(r'foo\B', 'foobar')  # on a non-word-boundary -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[B]', 'fooBar')  # same as r'fooB'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\B]', 'foobar')  # not equivalent to \B
>>> re.match(r'foo[\B]', 'fooBar')  # not equivalent to [B]

The same is true for \Z and \A:
>>> re.match(r'foo\Z', 'foo')  # end of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[Z]', 'fooZ')  # same as r'fooZ'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\Z]', 'foo')  # not equivalent to \Z
>>> re.match(r'foo[\Z]', 'fooZ')  # not equivalent to [Z]
>>>
>>> re.match(r'\Afoo', 'foo')  # beginning of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'[A]foo', 'Afoo')  # same as r'Afoo'
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'[\A]foo', 'foo')  # not equivalent to \A
>>> re.match(r'[\A]foo', 'Afoo')  # not equivalent to [A]
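
The three cases above can also be checked in one loop. This is only a sketch: try_match is a helper name made up for this example, and the result for the escaped class is deliberately not stated, because it depends on the Python version (some versions return no match, as in the sessions above, while others may reject the pattern with re.error).

```python
import re

def try_match(pattern, text):
    # Hypothetical helper: True/False for match/no match,
    # or an "error: ..." string if the pattern itself is rejected.
    try:
        return re.match(pattern, text) is not None
    except re.error as exc:
        return "error: %s" % exc

for letter in "ABZ":
    plain_class = "[%s]" % letter      # e.g. [A]
    escaped_class = "[\\%s]" % letter  # e.g. [\A]
    print(plain_class, "against", repr(letter), "->", try_match(plain_class, letter))
    print(escaped_class, "against", repr(letter), "->", try_match(escaped_class, letter))
```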

Inside [], \b switches from word boundary to backspace:
>>> re.match(r'foo\b', 'foobar')  # not on a word boundary -- no match
>>> re.match(r'foo\b', 'foo bar')  # on a word boundary -- matches fine
<_sre.SRE_Match object at 0xb74a4ec8>
>>> re.match(r'foo[\b]', 'foo bar')  # not equivalent to \b
>>> re.match(r'foo[\b]', 'foo\bbar')  # matches backspace
<_sre.SRE_Match object at 0xb76dd3d8>
>>> re.match(r'foo([\b])', 'foo\bbar').group(1)
'\x08'

Given that \b doesn't keep its word-boundary meaning inside [], \B, \A and \Z shouldn't keep their special meanings there either (besides, I can't see how having these inside [] would be of any use).
On the other hand, I'm not sure they should be equivalent to B, A and Z either.  There are several escape sequences of the form \X (where X is an upper- or lower-case letter) that are not equivalent to X (\a, \b, \d, \f, \s, \x, \w, \D, \S, \W, ...).
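
As a rough illustration of that point, the sketch below pairs each escape, placed inside [], with a character that it matches and that is not the bare letter. The sample characters are chosen for this example only, and \x is left out because it needs hex digits after it.

```python
import re

# (escape inside a class, a character it matches that is not the bare letter)
samples = [
    (r"[\a]", "\x07"),  # BEL
    (r"[\b]", "\x08"),  # backspace
    (r"[\f]", "\x0c"),  # form feed
    (r"[\d]", "5"),     # a digit
    (r"[\s]", " "),     # whitespace
    (r"[\w]", "_"),     # a word character
    (r"[\D]", "x"),     # a non-digit
    (r"[\S]", "x"),     # a non-whitespace character
    (r"[\W]", "!"),     # a non-word character
]

results = {}
for pattern, char in samples:
    results[pattern] = re.match(pattern, char) is not None
    print(pattern, "matches", repr(char), "->", results[pattern])
```

Each line should report True, which shows that none of these classes is just the letter X.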
Raising an error along the lines of "[\A] doesn't do what you think it does; use [A] instead" might be a better option.  (In case anyone is wondering about re.escape: I just checked, and it doesn't escape letters.)  Even though this is technically backward incompatible, any pattern that has \A, \B or \Z inside [] can be considered buggy IMHO, unless someone can come up with a valid use case where they do something useful.
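
To make the suggestion concrete, here is a minimal sketch of such a check written outside the re module. It is not how the re parser works: find_anchor_escapes_in_class and check_pattern are names invented for this example, the error wording is only illustrative, and the scanner ignores details such as verbose-mode comments.

```python
import re

ANCHOR_LETTERS = "ABZ"  # the escapes discussed in this report

def find_anchor_escapes_in_class(pattern):
    # Hypothetical scanner: return (index, escape) pairs for every
    # backslash-A, backslash-B or backslash-Z found inside [...].
    found = []
    in_class = False
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            if in_class and pattern[i + 1] in ANCHOR_LETTERS:
                found.append((i, pattern[i:i + 2]))
            i += 2  # skip the escaped character
            continue
        if not in_class and ch == "[":
            in_class = True
            i += 1
            if i < n and pattern[i] == "^":  # negated class
                i += 1
            if i < n and pattern[i] == "]":  # a leading ] is a literal
                i += 1
            continue
        if in_class and ch == "]":
            in_class = False
        i += 1
    return found

def check_pattern(pattern):
    # Hypothetical wrapper: reject the pattern instead of failing silently.
    hits = find_anchor_escapes_in_class(pattern)
    if hits:
        index, escape = hits[0]
        raise ValueError(
            "%s inside [] at position %d is not an anchor; "
            "use [%s] for the literal letter" % (escape, index, escape[1])
        )
    return pattern

print(find_anchor_escapes_in_class(r"foo[\B]"))  # one hit, at index 4
print(find_anchor_escapes_in_class(r"\Afoo\Z"))  # no hits: anchors outside []
print(re.escape("ABZ"))                          # letters are left unescaped
```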

----------
assignee: docs at python -> 

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13899>
_______________________________________

