Correct syntax for pathological re.search()

Thomas Passin list1 at tompassin.net
Sat Oct 12 09:06:54 EDT 2024


On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show what
>> the regular expression you typed in will look like by the time it is
>> ready to be used?
>>
>> Obviously, life is not that simple as it can go through multiple layers
>> with each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>>
> Yes. It's called 'print'. :-)

There is a section in the Python docs about this backslash subject.  It's 
titled "The Backslash Plague" in

https://docs.python.org/3/howto/regex.html

You can also inspect the compiled expression to see what string it 
received after all the escaping:

>>> import re
>>>
>>> re_string = '\\w+\\\\sub'
>>> re_pattern = re.compile(re_string)
>>>
>>> # Should look as if we had used r'\w+\\sub'
>>> print(re_pattern.pattern)
\w+\\sub
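
For the simple cases, a minimal sketch along the same lines: write the
pattern both ways, print it to see what the regex engine actually
receives, and let re.escape() from the standard library add the
regex-level escaping. The variable names are only illustrative, and the
sample text is borrowed from the example quoted further down.

```python
import re

text = r"testing \sout{WHADDEVVA}"   # contains one literal backslash

# Two spellings of the same pattern: raw string vs. doubled backslashes.
raw_pattern = r"\\sout\{"
plain_pattern = "\\\\sout\\{"
assert raw_pattern == plain_pattern

# What the regex engine actually receives after Python's own escaping.
print(raw_pattern)        # \\sout\{

# re.escape() derives the regex-level escaping from the literal text.
escaped = re.escape(r"\sout{")
print(escaped)            # \\sout\{

match = re.search(escaped, text)
print(match.span())       # (8, 14)
```

Escaping the "{" is harmless here even though it is not strictly needed;
re.escape() simply escapes every character that can be special.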


>> -----Original Message-----
>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org>
>> On Behalf Of Gilmeh Serda via Python-list
>> Sent: Friday, October 11, 2024 10:44 AM
>> To: python-list at python.org
>> Subject: Re: Correct syntax for pathological re.search()
>>
>> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>>
>>> I'm trying to discard lines that include the string "\sout{" (which is
>>> TeX, for those who are curious). I have tried:
>>>    if not re.search("\sout{", line):
>>>    if not re.search("\sout\{", line):
>>>    if not re.search("\\sout{", line):
>>>    if not re.search("\\sout\{", line):
>>>
>>> But the lines with that string keep coming through. What is the right
>>> syntax to properly escape the backslash and the left curly bracket?
>>
>> $ python
>> Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import re
>>>>> s = r"testing \sout{WHADDEVVA}"
>>>>> re.search(r"\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You want a literal backslash, hence, you need to escape everything.
>>
>> It is not enough to escape the "\s" as "\\s", because that only takes care
>> of Python's demands for escaping "\". You also need to escape the "\" for
>> the RegEx as well, or it will read it as "\s", which is the RegEx for any
>> whitespace character. Your search then doesn't match, because it is
>> looking for whitespace followed by "out{".
>>
>> Therefore, you need to escape it either as per my example, or by using
>> four "\" and no "r" in front of the first quote, which also works:
>>
>>>>> re.search("\\\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You don't need to escape the curly braces. We call them "seagull wings"
>> where I live.
>>
> 
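
Putting the thread together, a hedged sketch of the original task
(discarding lines that contain the literal text \sout{). The names SOUT,
keep_lines and sample are made up for illustration, and the sample lines
are invented test data, not taken from anyone's real input.

```python
import re

# Compile once; the raw string keeps the regex-level escaping visible.
# Regex meaning: one literal backslash, then "sout", then a literal "{".
SOUT = re.compile(r"\\sout\{")

def keep_lines(lines):
    # Keep only the lines that do NOT contain a backslash followed by "sout{".
    return [line for line in lines if not SOUT.search(line)]

sample = [
    r"plain text",
    r"struck \sout{old words} out",
    r"shout out{loud}",   # no backslash, so this line must be kept
]

kept = keep_lines(sample)
print(kept)
```

The third sample line is the kind of text the unescaped pattern "\sout{"
would have matched by accident, since the regex engine reads "\s" there
as whitespace. For a fixed string like this one, a plain substring test
such as r"\sout{" in line would also work, with no regex escaping at all.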


