[Tutor] use of raw strings with regular expression patterns
Cameron Simpson
cs at cskk.id.au
Sat Nov 7 16:42:02 EST 2020
On 06Nov2020 22:33, Manprit Singh <manpritsinghece at gmail.com> wrote:
>As you know there are some special characters in regular expressions,
>like:
>\A, \B, \b, \d, \D, \s, \S, \w, \W, \Z
>
>Is it necessary to use raw string notation like r'\A' while using re
>patterns made up of these characters?
Another thing not mentioned in the replies is the backslash itself.
The advantage of a raw string is that when you write a backslash, it is
part of the string as-is.
So to get a single backslash into the result using a regular string, you
need to write:
\\
In a raw string, you just write:
\
exactly as you want things.
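A small sketch of the two spellings, using "\d" from the question as the sample text. It only compares string values, so nothing here depends on the re module.

```python
# Two spellings of the same two-character string: a backslash, then "d".
doubled = "\\d"  # ordinary string: "\\" is the escape for one backslash
raw = r"\d"      # raw string: the backslash is kept exactly as typed

print(doubled == raw)  # True
print(len(raw))        # 2
print(list(raw))       # ['\\', 'd']

# Caveat: a raw string cannot end in an odd number of backslashes, so r"\"
# on its own is a SyntaxError; write "\\" for a lone backslash instead.
```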
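Now, it happens that in a regular string a backslash _not_ followed by a
special character (e.g. "n" as in "\n", a newline) is preserved, so such
backslashes get through to the final string anyway. (Python 3.6 and later
do warn about these unrecognised escapes: a DeprecationWarning, which
became a SyntaxWarning in 3.12.) But the moment you _do_ follow the
backslash with such a character, the pair is consumed and translated into
a single character.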
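To make the translation concrete, here are a few recognised escapes next to their raw spellings. The "\b" pair is worth noticing for regexps: in an ordinary string it is the backspace character (\x08), not the word-boundary sequence from the question.

```python
# Recognised escapes in an ordinary string are translated to one character;
# the raw spelling keeps the backslash and the letter (two characters).
pairs = [("\n", r"\n"), ("\t", r"\t"), ("\b", r"\b")]
for ordinary, raw in pairs:
    print(repr(ordinary), len(ordinary), "|", repr(raw), len(raw))

# Output:
# '\n' 1 | '\\n' 2
# '\t' 1 | '\\t' 2
# '\x08' 1 | '\\b' 2
```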
Example:
\h
Ordinary string: '\h' -> \h
Raw string: r'\h' -> \h
A backslash and an "h" in the result.
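One way to watch the '\h' case, including the warning newer Pythons attach to it, is to compile the literal from source text at run time. `literal_value` is a hypothetical helper written for this illustration only; `compile`, `eval` and `warnings.catch_warnings` are standard, and `eval` should only ever see trusted text like this.

```python
import warnings

def literal_value(source):
    """Evaluate a string literal given as source text (hypothetical helper).

    Returns the resulting string and the names of any warning categories
    emitted while compiling it.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        value = eval(compile(source, "<literal>", "eval"))
    return value, [w.category.__name__ for w in caught]

# The source text is itself a raw string, so this file contains no invalid
# escape of its own; the warning only fires inside compile().
value, warned = literal_value(r"'\h'")
print(repr(value))  # '\\h' : the backslash and the "h" are both kept
print(warned)       # SyntaxWarning on 3.12+, DeprecationWarning on 3.6 to 3.11
```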
But:
\n
Ordinary string: '\n' -> newline
Raw string: r'\n' -> \n
A newline in the result for the former, a backslash and an "n" for the
latter.
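For regexps specifically, "\n" happens to be harmless either way, because the re module understands the two-character sequence backslash-n itself. A minimal sketch showing both spellings splitting the same text:

```python
import re

text = "first line\nsecond line"

# Ordinary string: Python turns "\n" into a newline before re sees it,
# so the pattern is a literal newline character.
print(re.split("\n", text))   # ['first line', 'second line']

# Raw string: re receives a backslash and "n" and interprets that escape
# itself, so the result is the same.
print(re.split(r"\n", text))  # ['first line', 'second line']
```

The two spellings only diverge for escapes that Python and re read differently, such as "\b" (backspace to Python, word boundary to re).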
So the advantage of the raw string is _reliably preserving the
backslash_.
Wherever you intend backslashes to appear in the resulting string, it is
recommended to use a "raw" string in Python, for this reliability.
The two common situations are regexps, where a backslash introduces
special sequences such as character classes (\d, \w) and assertions
(\b, \A), and Windows file paths, where the backslash is the file
separator.
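A short sketch of both situations. The sample text and the path C:\new\table.txt are made up for illustration; `re.findall` and `pathlib.PureWindowsPath` are standard library and behave the same on any operating system.

```python
import re
from pathlib import PureWindowsPath

text = "cat concatenate cat"

# Ordinary string: "\b" is a backspace (\x08), so this pattern looks for
# backspace characters around "cat" and finds nothing.
print(re.findall("\bcat\b", text))   # []

# Raw string: re receives backslash-b, its word-boundary assertion, so only
# the standalone words match, not the "cat" inside "concatenate".
print(re.findall(r"\bcat\b", text))  # ['cat', 'cat']

# Windows path: in an ordinary string "\n" and "\t" become a newline and a
# tab, silently corrupting the path; the raw string keeps every separator.
print("\n" in "C:\new\table.txt")    # True
path = PureWindowsPath(r"C:\new\table.txt")
print(path.parts)                    # ('C:\\', 'new', 'table.txt')
```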
Cheers,
Cameron Simpson <cs at cskk.id.au>