[Tutor] use of raw strings with regular expression patterns
Manprit Singh
manpritsinghece at gmail.com
Sun Nov 8 08:28:54 EST 2020
Dear Sir,
I have one more very basic question.
Suppose I have to remove every "a" from the string s1.
>>> import re
>>> s1 = "saaaaregaaaaamaaaa"
>>> re.sub(r"a+", "", s1)
'sregm'
>>> re.sub(r"a", "", s1)
'sregm'
I have solved this with two patterns, one of which includes a "+", meaning
one or more repetitions of the preceding RE. I am confused about which
pattern should be chosen for this particular case.
Regards
Manprit Singh
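
On the question above: both patterns give the same result for this input, so the difference is only in how many matches are made. A minimal sketch, using re.subn (which returns the new string together with the number of substitutions) to make that visible; str.replace is included as a regex-free alternative for a fixed character, and the variable names are only illustrative:

```python
import re

s1 = "saaaaregaaaaamaaaa"

# r"a+" treats each run of consecutive "a"s as one match,
# so there is one substitution per run.
result_runs, count_runs = re.subn(r"a+", "", s1)

# r"a" matches every "a" separately,
# so there is one substitution per character.
result_each, count_each = re.subn(r"a", "", s1)

print(result_runs, count_runs)   # sregm 3
print(result_each, count_each)   # same string, one substitution per "a"

# For a single literal character no regular expression is needed.
print(s1.replace("a", ""))       # sregm
```

Either pattern is correct here. The "+" form matters when the replacement is not empty, since it then replaces a whole run once rather than once per character.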
On Sun, Nov 8, 2020 at 3:12 AM Cameron Simpson <cs at cskk.id.au> wrote:
> On 06Nov2020 22:33, Manprit Singh <manpritsinghece at gmail.com> wrote:
> >As you know, there are some special characters in regular expressions,
> >like:
> >\A, \B, \b, \d, \D, \s, \S, \w, \W, \Z
> >
> >Is it necessary to use raw string notation like r'\A' while using re
> >patterns made up of these characters?
>
> Another thing not mentioned in the replies is the backslash itself.
>
> The advantage of a raw string is that when you write a backslash, it is
> part of the string as-is.
>
> So to put a backslash in a regular string, so that it is part of the
> result, you would need to write:
>
> \\
>
> In a raw string, you just write:
>
> \
>
> exactly as you want things.
>
> Now, it happens that in a regular string a backslash _not_ followed by a
> special character (eg "n" for "\n", a newline) is preserved. So they get
> through to the final string anyway. But the moment you _do_ follow the
> backslash with such a character, it is consumed and the character
> translated.
>
> Example:
>
> \h
>
> Ordinary string: '\h' -> \h
> Raw string: r'\h' -> \h
> A backslash and an "h" in the result.
>
> But:
>
> \n
>
> Ordinary string: '\n' -> newline
> Raw string: r'\n' -> \n
> A newline in the result for the former, a backslash and an "n" for the
> latter.
>
> So the advantage of the raw string is _reliably preserving the
> backslash_.
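
The behaviour described above can be checked directly by comparing the literals. A minimal sketch (the variable names are only illustrative):

```python
# "\n" in an ordinary literal is a single character: a newline.
ordinary = "\n"
# In a raw literal the backslash is kept, giving two characters.
raw = r"\n"

print(len(ordinary))   # 1
print(len(raw))        # 2

# A doubled backslash in an ordinary literal produces the same
# two-character string as the raw spelling.
print("\\n" == r"\n")  # True
print(repr(raw))       # '\\n'
```

The '\h' case is left out of the code on purpose: recent Python versions warn about unrecognised escapes in ordinary literals (a DeprecationWarning since 3.6, a SyntaxWarning from 3.12), which is one more reason to prefer the raw form.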
>
> For any situation where backslashes are intended in the resulting string
> it is recommended to use a "raw" string in Python, for this reliability.
>
> The two common situations are regexps, where backslash introduces special
> character classes, and Windows file paths, where backslash is the file
> separator.
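
Both situations can be illustrated in a few lines. This is a sketch only; the sample text and the Windows path are hypothetical:

```python
import re

text = "a cat sat"

# r"\bcat\b": the regex engine receives \b and treats it as a word boundary.
print(re.findall(r"\bcat\b", text))   # ['cat']

# "\bcat\b": Python turns each \b into a backspace character (\x08)
# before the regex engine sees the pattern, so nothing matches.
print(re.findall("\bcat\b", text))    # []

# Hypothetical Windows path: in an ordinary literal "\n" and "\t" would be
# translated to a newline and a tab, so the raw form is used to keep them.
path = r"C:\new\table.txt"
print(path)                           # C:\new\table.txt
print(path == "C:\\new\\table.txt")   # True
```

One caveat: a raw literal cannot end in a single backslash, so r"C:\dir\" is a syntax error.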
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>