Correct syntax for pathological re.search()
MRAB
python at mrabarnett.plus.com
Fri Oct 11 20:37:55 EDT 2024
On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
> Is there some utility function out there that can be called to show what the
> regular expression you typed in will look like by the time it is ready to be
> used?
>
> Obviously, life is not that simple as it can go through multiple layers with
> each dealing with a layer of backslashes.
>
> But for simple cases, ...
>
Yes. It's called 'print'. :-)
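
As a minimal sketch of that suggestion (the variable names are made up for
illustration), printing a pattern shows the characters the regex engine
receives once Python has processed the string literal:

```python
# Hypothetical names, for illustration only.
non_raw = "\\\\sout{"   # four backslashes in a normal string literal
raw = r"\\sout{"        # two backslashes in a raw string literal

print(non_raw)  # prints \\sout{
print(raw)      # prints \\sout{

# Both literals produce the same 7-character pattern.
same_pattern = non_raw == raw
print(same_pattern, len(raw))  # prints True 7
```

This only covers the simple case of a single layer of string-literal
escaping; it says nothing about further layers such as a shell or a
config file.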
>
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On
> Behalf Of Gilmeh Serda via Python-list
> Sent: Friday, October 11, 2024 10:44 AM
> To: python-list at python.org
> Subject: Re: Correct syntax for pathological re.search()
>
> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>
>> I'm trying to discard lines that include the string "\sout{" (which is
>> TeX, for those who are curious). I have tried:
>> if not re.search("\sout{", line):
>> if not re.search("\sout\{", line):
>> if not re.search("\\sout{", line):
>> if not re.search("\\sout\{", line):
>>
>> But the lines with that string keep coming through. What is the right
>> syntax to properly escape the backslash and the left curly bracket?
>
> $ python
> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> s = r"testing \sout{WHADDEVVA}"
> >>> re.search(r"\\sout{", s)
> <re.Match object; span=(8, 14), match='\\sout{'>
>
> You want a literal backslash, hence you need to escape it at both levels.
>
> It is not enough to escape the "\s" as "\\s", because that only takes care
> of Python's demands for escaping "\". You also need to escape the "\" for
> the RegEx as well, or it will read it as "\s", which is the RegEx for any
> whitespace character. Your search then doesn't match, because it looks
> for a whitespace character followed by "out{".
>
> Therefore, you need to escape it either as per my example, or by using
> four "\" and no "r" in front of the first quote, which also works:
>
> >>> re.search("\\\\sout{", s)
> <re.Match object; span=(8, 14), match='\\sout{'>
>
> You don't need to escape the curly braces. We call them "seagull wings"
> where I live.
>
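
To see the difference between the attempts, here is a small sketch that
reuses the sample string from the session quoted above. The patterns are
written as raw strings so that they show what the regex engine receives in
each case; the result names are illustrative assumptions.

```python
import re

s = r"testing \sout{WHADDEVVA}"

# What the engine receives from the literals "\sout{" and "\\sout{":
# \s is the whitespace class, so the backslash in s is never matched.
whitespace_class = re.search(r"\sout{", s)

# What the engine receives from r"\\sout{" or "\\\\sout{":
# an escaped (literal) backslash followed by "sout{".
literal_backslash = re.search(r"\\sout{", s)

print(whitespace_class)          # None
print(literal_backslash.span())  # (8, 14)
```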