On 8/12/2019 12:08 AM, Serhiy Storchaka wrote:
Currently a raw literal cannot end in a single backslash (e.g. in r"C:\User\"). Although there are reasons for this, it is an old gotcha, and there are many closed issues about it. This question is even included in the FAQ.
Hmm. I didn't find it in the documentation, and after searching for it several ways in the FAQ, I wasn't able to find it there either.
The most common workarounds are:
r"C:\User" "\\"
and
r"C:\User\ "[:-1]
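As a quick check of the two workarounds above, here is a minimal sketch (the variable names are made up for illustration). It assumes a current CPython, where the direct spelling is rejected at compile time.

```python
# The two workarounds quoted above build the same 8-character string.
a = r"C:\User" "\\"        # implicit concatenation with a normal literal
b = r"C:\User\ "[:-1]      # trailing space sliced off afterwards

assert a == b == "C:\\User\\"
assert len(a) == 8 and a.endswith("\\")

# Under the current rules the direct spelling does not compile: the
# backslash keeps the closing quote from terminating the literal.
src = 'r"C:\\User\\"'      # the source text  r"C:\User\"
try:
    compile(src, "<example>", "eval")
except SyntaxError:
    rejected = True
else:
    rejected = False
```

The first workaround is resolved at compile time, while the slice in the second is an ordinary runtime expression as far as the language definition is concerned.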
I tried to experiment. It was easy to make the parser allow a trailing backslash character. It was more difficult to change the Python implementation in the tokenize module. But this change breaks existing code in more places than I expected. 14 Python files in the stdlib (not counting tokenize.py) will need to be fixed. In all cases it is a regular expression.
A few examples:
1. r"([\"\\])"
If only one type of quote is used in a string, we can just use the other kind of quote to create the string literal and remove the escaping.
r'(["\\])'
2. r'(\'[^\']*\'|"[^"]*"|...'
If different types of quotes are used in different parts of a string, we can use implicit concatenation of string literals created with different quotes (in any case, such a regular expression is long and should be split over several lines at semantic boundaries).
r"('[^']*'|" r'"[^"]*"|' r'...'
3. r"([^.'\"\\#]\b|^)"
You can also use triple quotes if the string contains both types of quotes together.
r"""([^.'"\\#]\b|^)"""
4. In rare cases a multiline raw string literal can contain both `'''` and `"""`. In this case you can use implicit concatenation of string literals created with different triple quotes.
See https://github.com/python/cpython/pull/15217 .
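The rewrites in examples 1 to 3 change the string value (the backslash before the quote disappears) but should not change what the regular expression matches. The sketch below checks that on a few sample inputs. The helper `same_matches` and the sample texts are made up for illustration, and the example 2 pair is only the fragment shown above with the elided tail left out, not the full stdlib pattern.

```python
import re

# (current spelling, rewritten spelling) pairs based on examples 1-3.
pairs = [
    (r"([\"\\])", r'(["\\])'),                        # example 1
    (r'\'[^\']*\'|"[^"]*"', r"'[^']*'|" r'"[^"]*"'),  # fragment of example 2
    (r"([^.'\"\\#]\b|^)", r"""([^.'"\\#]\b|^)"""),    # example 3
]

# Illustrative inputs mixing both quote kinds, backslashes, '.' and '#'.
samples = ['say "hi"', "it's", "a\\b", "x.y#z", "", 'mix \'a\' "b" \\']


def same_matches(old, new, text):
    """Compare match spans and groups of two patterns on one text."""
    def run(pattern):
        return [(m.span(), m.groups()) for m in re.finditer(pattern, text)]
    return run(old) == run(new)


# In a regular expression, \" and \' simply match the quote character,
# so dropping the backslash is expected to leave the matches unchanged.
results = [same_matches(old, new, s) for old, new in pairs for s in samples]
values_differ = all(old != new for old, new in pairs)
```

A check like this only covers the sampled inputs, so it is a sanity test for a mechanical rewrite rather than a proof of equivalence.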
I do not think we are ready for such a breaking change. It will break more code than forbidding unrecognized escape sequences, and the required fixes are less trivial.
Thanks for your investigation, Serhiy.
Point 3 seems like the easiest way to convert most regular expressions containing \" or \' from r"..." form to v"""...""", without disturbing the internal gibberish in the regular expression, and without needing significant analysis.
Regarding point 4, if it is a string literal used as a regexp, internal triple quotes can be recoded as "{3} and '{3} . But whether or not it is used as a regexp, I fail to find a syntax that permits the creation of a multiline raw string containing both "'''" and '"""', without using implicit concatenation. Since implicit concatenation must already be in use for that case, converting from raw string to verbatim string is straightforward.
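To make the point 4 cases concrete, here is a minimal sketch using today's raw strings (the proposed v"""...""" form is not valid syntax in any released Python, so it is not used here). The names `text`, `pattern` and `both` are made up for illustration.

```python
import re

# A target text that contains both kinds of triple quotes.
text = "'''" + " and " + '"""'

# Regexp case: '{3} and "{3} stand in for the triple quotes, so a single
# triple-quoted raw literal can hold the whole pattern.
pattern = r"""('{3}|"{3})"""
found = re.findall(pattern, text)

# Non-regexp case: implicit concatenation of raw literals that use
# different triple quotes, one per kind of embedded triple quote.
both = r"""'''""" r''' and ''' r'''"""'''
```

Here `found` holds the two triple-quote runs in order, and `both` equals `text`.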