[docs] [issue28450] Misleading/inaccurate documentation about unknown escape sequences in regular expressions

Barry A. Warsaw report at bugs.python.org
Tue Nov 22 15:28:47 EST 2016


Barry A. Warsaw added the comment:

On Nov 22, 2016, at 07:28 PM, Serhiy Storchaka wrote:

>The reason for disallowing some undefined escapes is the same as in pattern
>strings: this would allow us to introduce new special escape sequences.

I'll note that, technically speaking, you can still introduce new escapes for
repl without breaking the documented contract.  All the docs say is that
"unknown escapes such as \& are left alone", but they don't list which escapes
count as unknown.  So if new escapes are added in Python 3.7, and they are
transformed in repl, that would be allowed.

I'll also note that not *all* unknown sequences are rejected now, only
backslashes followed by an ASCII letter.  So \& is still probably left alone,
while \s is now rejected.  That does add to the confusion, although the
deprecation note in the re.sub() documentation does document the new behavior
correctly.
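
To make the distinction concrete, here is a small sketch of the two cases in a
replacement string.  It assumes an interpreter where backslash plus ASCII
letter is already an error in repl (Python 3.7 and later, as far as I know);
on 3.6 the second call should only emit a DeprecationWarning, so the outcome
there would differ.

```python
import re

# "\&" is not a defined escape and "&" is not an ASCII letter, so the
# replacement template keeps both characters exactly as written.
kept = re.sub(r"x", r"\&", "axb")
print(kept)  # a\&b

# "\s" is a backslash followed by an ASCII letter with no meaning in a
# replacement template, so it is rejected instead of being left alone.
try:
    outcome = re.sub(r"x", r"\s", "axb")
except re.error as exc:
    outcome = "re.error: {}".format(exc)
print(outcome)
```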

On Nov 22, 2016, at 07:55 PM, R. David Murray wrote:

>There is still the argument that we shouldn't break 2.7 compatibility
>unnecessarily until 2.7 is out of maintenance.  That is: warnings are good,
>removals are bad.  (I haven't read through this issue, so I may be off base.)

This is also a reasonable argument, but not one I've thought about since I'm
using Python 2 only rarely these days.

On Nov 22, 2016, at 07:34 PM, Serhiy Storchaka wrote:

>If you insist I could revert converting warnings to errors (only in
>replacement string or all?) in 3.6.

The pattern argument is a regular expression string, so it already follows the
syntax described in §6.2.1 Regular Expression Syntax.  But I think a reading of
that section (and the "special sequences" bit that follows) could also argue
that unknown escapes shouldn't throw an error.

>But I think they should be left as errors in 3.7. The earlier we make
>undefined escapes errors, the earlier we can define new special escape
>sequences without confusing users. It is bad if an escape sequence is valid
>in two Python versions but has a different meaning.

Perhaps so, but I do think this is a tricky question from a compatibility
point of view.  One possible option, although it's late in the cycle, would
be to introduce a new flag so the user could tell re exactly what behavior
they want.  The default would have to be backward compatible (i.e. leave
unknown sequences alone), but there could be, say, an re.STRICTESCAPES flag
that would cause the error to be thrown.
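
To be clear, no such flag exists in re; the name is only a suggestion.  As a
rough sketch of the check such an opt-in might perform on a replacement
string, here is a standalone helper.  The function name, and the set of
letters treated as known (my reading of the escapes re.sub() currently
defines for repl), are assumptions for illustration only.

```python
import re
import string

# Assumption: the letter escapes with a defined meaning in a replacement
# template are \a \b \f \n \r \t \v, plus \g<name> group references.
KNOWN_LETTER_ESCAPES = set("abfnrtvg")


def unknown_letter_escapes(repl):
    """Return the backslash + ASCII-letter escapes in repl with no defined meaning.

    Hypothetical helper, not part of the re module.
    """
    found = []
    # Scanning backslash pairs left to right consumes an escaped backslash as
    # one unit, so a letter that follows it is not mistaken for an escape.
    for match in re.finditer(r"\\(.)", repl, re.DOTALL):
        char = match.group(1)
        if char in string.ascii_letters and char not in KNOWN_LETTER_ESCAPES:
            found.append(match.group(0))
    return found


print(unknown_letter_escapes(r"\s and \& and \n"))
```

A caller wanting strict behavior could raise when the returned list is
non-empty, while the default path would pass repl through untouched.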

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28450>
_______________________________________

