Correct syntax for pathological re.search()

MRAB python at mrabarnett.plus.com
Fri Oct 11 20:37:55 EDT 2024


On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
> Is there some utility function out there that can be called to show what the
> regular expression you typed in will look like by the time it is ready to be
> used?
> 
> Obviously, life is not that simple as it can go through multiple layers with
> each dealing with a layer of backslashes.
> 
> But for simple cases, ...
> 
Yes. It's called 'print'. :-)
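
As a minimal sketch of that idea (the variable names and the sample string are only illustrative), printing a pattern shows the text that is left after Python has processed the string literal, which is what the regex engine then receives:

```python
import re

sample = r"testing \sout{WHADDEVVA}"   # illustrative input containing a literal backslash

raw_pattern = r"\\sout{"       # raw string: both backslashes reach the regex engine
plain_pattern = "\\\\sout{"    # non-raw: Python halves the four backslashes to two
one_level = "\\sout{"          # only escaped for Python; the regex engine sees \s

print(raw_pattern)             # prints \\sout{
print(plain_pattern)           # prints \\sout{
print(one_level)               # prints \sout{  (regex reads \s as "whitespace")

print(re.search(raw_pattern, sample))   # a match object
print(re.search(one_level, sample))     # None
```

If the printed text still shows two backslashes, the regex engine will treat them as one literal backslash.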
> 
> 
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On
> Behalf Of Gilmeh Serda via Python-list
> Sent: Friday, October 11, 2024 10:44 AM
> To: python-list at python.org
> Subject: Re: Correct syntax for pathological re.search()
> 
> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
> 
>> I'm trying to discard lines that include the string "\sout{" (which is
>> TeX, for those who are curious). I have tried:
>>    if not re.search("\sout{", line):
>>    if not re.search("\sout\{", line):
>>    if not re.search("\\sout{", line):
>>    if not re.search("\\sout\{", line):
>> 
>> But the lines with that string keep coming through. What is the right
>> syntax to properly escape the backslash and the left curly bracket?
> 
> $ python
> Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> s = r"testing \sout{WHADDEVVA}"
> >>> re.search(r"\\sout{", s)
> <re.Match object; span=(8, 14), match='\\sout{'>
> 
> You want a literal backslash, hence you need to escape it twice: once for
> Python and once for the RegEx.
> 
> It is not enough to escape the "\s" as "\\s", because that only takes care
> of Python's demands for escaping "\". You also need to escape the "\" for
> the RegEx, or it will read it as "\s", which is the RegEx for any
> whitespace character. Your search then doesn't match, because it is
> looking for a whitespace character followed by "out{".
> 
> Therefore, you need to escape it either as per my example, or by using
> four "\" and no "r" in front of the first quote, which also works:
> 
> >>> re.search("\\\\sout{", s)
> <re.Match object; span=(8, 14), match='\\sout{'>
> 
> You don't need to escape the curly braces. We call them "seagull wings"
> where I live.
> 
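For a fixed piece of text like this one, another option is to let the standard library do the regex-level escaping. The following is only a sketch, assuming the lines are already in a list (the sample lines and names are made up for illustration); re.escape() handles the regex layer, and the raw string handles the Python layer:

```python
import re

needle = r"\sout{"             # the literal text: backslash, s, o, u, t, {
pattern = re.escape(needle)    # backslash-escapes the regex metacharacters in needle

# Hypothetical input lines, standing in for the file being filtered
lines = [r"keep this line", r"drop \sout{this} one"]

# Keep only the lines that do not contain the literal text
kept = [ln for ln in lines if not re.search(pattern, ln)]
print(kept)
```

Printing `pattern` shows exactly what re.escape() produced, which ties back to the 'print' suggestion.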


