[docs] [issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
Ezio Melotti
report at bugs.python.org
Sun Jan 29 16:32:27 CET 2012
Ezio Melotti <ezio.melotti at gmail.com> added the comment:
[\w] should definitely work, but [\B] doesn't seem to match anything useful, and it fails silently because it is equivalent neither to \B nor to [B]:
>>> re.match(r'foo\B', 'foobar') # on a non-word-boundary -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[B]', 'fooBar') # same as r'fooB'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\B]', 'foobar') # not equivalent to \B
>>> re.match(r'foo[\B]', 'fooBar') # not equivalent to [B]
The same is true for \Z and \A:
>>> re.match(r'foo\Z', 'foo') # end of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[Z]', 'fooZ') # same as r'fooZ'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\Z]', 'foo') # not equivalent to \Z
>>> re.match(r'foo[\Z]', 'fooZ') # not equivalent to [Z]
>>>
>>> re.match(r'\Afoo', 'foo') # beginning of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'[A]foo', 'Afoo') # same as r'Afoo'
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'[\A]foo', 'foo') # not equivalent to \A
>>> re.match(r'[\A]foo', 'Afoo') # not equivalent to [A]
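The sessions above come from a 2012-era interpreter (note the `_sre.SRE_Match` reprs), and how [\A], [\B] and [\Z] are handled may differ between Python versions. The sketch below is one way to probe the interpreter you are actually running: it reports either that the pattern is rejected at compile time or whether the one-item class matches the bare letter. `probe_class_escape` is a hypothetical helper name, and `fullmatch` assumes Python 3.4 or later.

```python
import re

def probe_class_escape(letter):
    r"""Report how this interpreter handles the one-item class [\X].

    Returns 'error' if re rejects the pattern at compile time; otherwise a
    pair (matches the bare letter X, matches the empty string).
    """
    pattern = '[\\' + letter + ']'
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 'error'
    return (compiled.fullmatch(letter) is not None,
            compiled.fullmatch('') is not None)

# Print whatever behaviour this interpreter shows for the three anchors.
for letter in 'ABZ':
    print(letter, probe_class_escape(letter))
```

A result of (False, False) corresponds to the "matches nothing" behaviour shown in the sessions above.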
Inside [], \b switches from word boundary to backspace:
>>> re.match(r'foo\b', 'foobar') # not on a word boundary -- no matches
>>> re.match(r'foo\b', 'foo bar') # on a word boundary -- matches fine
<_sre.SRE_Match object at 0xb74a4ec8>
>>> re.match(r'foo[\b]', 'foo bar') # not equivalent to \b
>>> re.match(r'foo[\b]', 'foo\bbar') # matches backspace
<_sre.SRE_Match object at 0xb76dd3d8>
>>> re.match(r'foo([\b])', 'foo\bbar').group(1)
'\x08'
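To see both readings of \b side by side, the sketch below runs the word-boundary pattern and the one-item class against the same inputs. `b_readings` is a hypothetical helper name used only for illustration.

```python
import re

def b_readings(text):
    r"""Report which reading of \b succeeds right after 'foo' in text.

    Outside a class, r'\b' is a zero-width word-boundary assertion; inside a
    class, r'[\b]' matches the backspace character '\x08'.
    """
    return {
        'word_boundary': re.match(r'foo\b', text) is not None,
        'backspace': re.match(r'foo[\b]', text) is not None,
    }

print(b_readings('foobar'))      # → {'word_boundary': False, 'backspace': False}
print(b_readings('foo bar'))     # → {'word_boundary': True, 'backspace': False}
print(b_readings('foo\x08bar'))  # → {'word_boundary': True, 'backspace': True}
```

The last input satisfies both, because the backspace is itself a non-word character and so also creates a word boundary after 'foo'.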
Given that \b doesn't keep its word-boundary meaning inside [], \B (and \A and \Z) shouldn't keep their special meanings either (also because I can't see how having them inside [] would be of any use).
On the other hand, I'm not sure they should be equivalent to B, A and Z either. There are several escape sequences of the form \X (where X is an upper- or lower-case letter) that are not equivalent to X (\a, \b, \d, \f, \s, \x, \w, \D, \S, \W, ...).
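One way to check that list is to find, for each escape, a character that [\X] matches but [X] does not. The sample characters and the helper names below are illustrative choices, not part of the re module; `fullmatch` assumes Python 3.4 or later.

```python
import re

# (escape as written inside [], a sample character that [\X] matches)
CASES = [
    (r'\a', '\x07'),   # bell
    (r'\b', '\x08'),   # backspace
    (r'\f', '\x0c'),   # form feed
    (r'\x41', 'A'),    # hex escape for 'A'
    (r'\d', '7'),      # any digit
    (r'\D', '!'),      # any non-digit
    (r'\s', ' '),      # any whitespace
    (r'\S', 'x'),      # any non-whitespace
    (r'\w', '_'),      # any word character
    (r'\W', '-'),      # any non-word character
]

def class_matches(item, text):
    """True if the one-item class '[' + item + ']' matches the whole of text."""
    return re.fullmatch('[' + item + ']', text) is not None

def differs_from_bare_letter(escape, sample):
    r"""True if [\X] matches sample but [X] does not, so the two classes differ."""
    return class_matches(escape, sample) and not class_matches(escape[1], sample)

for escape, sample in CASES:
    print(escape, differs_from_bare_letter(escape, sample))  # True for every row
```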
Raising an error that says something like "I don't think [\A] does what you think it does; use [A] instead." might be a better option (and in case anyone is wondering about re.escape, I just checked and it doesn't escape letters). Even if this is technically backward incompatible, any pattern that has \A, \B or \Z inside [] can be considered buggy IMHO (unless someone can come up with a valid use case where they do something useful).
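Until the re module itself complains, the check proposed here could be approximated outside it. The sketch below is a hypothetical lint helper (not part of re) that scans a pattern string left to right, tracks whether it is inside [...], and reports where \A, \B or \Z appears inside a class. It assumes Python's rule that a ']' immediately after '[' or '[^' is a literal, and it does not validate the rest of the pattern.

```python
ANCHOR_LETTERS = 'ABZ'

def find_anchor_escapes_in_classes(pattern):
    r"""Return (offset, escape) pairs for \A, \B or \Z written inside [...].

    Hypothetical lint helper: a simple scan, not a full regex parser.
    """
    hits = []
    in_class = False
    class_start = -1   # index just after the opening '[' (and optional '^')
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if in_class and nxt in ANCHOR_LETTERS:
                hits.append((i, '\\' + nxt))
            i += 2   # skip the backslash and the escaped character
            continue
        if not in_class and ch == '[':
            in_class = True
            i += 1
            if i < len(pattern) and pattern[i] == '^':
                i += 1
            class_start = i
            continue
        if in_class and ch == ']' and i != class_start:
            in_class = False   # a ']' right after '[' or '[^' stays literal
        i += 1
    return hits

print(find_anchor_escapes_in_classes(r'[\A]foo'))    # → [(0 + 1, '\\A')] i.e. [(1, '\\A')]
print(find_anchor_escapes_in_classes(r'\Afoo\Z'))    # → []
print(find_anchor_escapes_in_classes(r'foo[\b\B]'))  # → [(6, '\\B')]
```

A caller could raise on a non-empty result, in the spirit of the error message suggested above. Since re.escape leaves letters alone (re.escape('ABZ') returns 'ABZ'), escaping user input cannot introduce these sequences by accident.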
----------
assignee: docs at python ->
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13899>
_______________________________________