Correct syntax for pathological re.search()
Thomas Passin
list1 at tompassin.net
Sat Oct 12 09:06:54 EDT 2024
On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show what the
>> regular expression you typed in will look like by the time it is ready to be
>> used?
>>
>> Obviously, life is not that simple as it can go through multiple layers with
>> each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>>
> Yes. It's called 'print'. :-)
There is a section in the Python docs about this backslash problem, titled
"The Backslash Plague", in the Regular Expression HOWTO:
https://docs.python.org/3/howto/regex.html
You can also inspect the compiled expression to see what string it
received after all the escaping:
>>> import re
>>>
>>> re_string = '\\w+\\\\sub'
>>> re_pattern = re.compile(re_string)
>>>
>>> # Should look as if we had used r'\w+\\sub'
>>> print(re_pattern.pattern)
\w+\\sub
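To look at each layer separately, here is a minimal sketch. repr() shows the
string the way Python source would spell it (each real backslash doubled),
print() shows the characters the re module actually receives, and .pattern on
the compiled object shows the string re.compile() was given. The helper name
show_layers is just for illustration; it is not part of the standard library.

```python
import re

def show_layers(pattern: str) -> re.Pattern:
    """Hypothetical helper: print a regex string at each escaping layer."""
    # repr() re-adds Python's escaping, so each real backslash appears doubled
    print("as a Python literal:", repr(pattern))
    # print() shows the actual characters the regex engine will parse
    print("regex engine sees:  ", pattern)
    compiled = re.compile(pattern)
    # .pattern holds the exact string that re.compile() was given
    print("compiled .pattern:  ", compiled.pattern)
    return compiled

# Two spellings of the same 8-character pattern: \w+\\sub
plain = show_layers('\\w+\\\\sub')
raw = show_layers(r'\w+\\sub')
print(plain.pattern == raw.pattern)  # True

# In the regex, \\ matches exactly one literal backslash in the subject string
print(raw.search('abc\\sub'))
```

Either spelling works; the raw string just saves one layer of doubling.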
>> -----Original Message-----
>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of Gilmeh Serda via Python-list
>> Sent: Friday, October 11, 2024 10:44 AM
>> To: python-list at python.org
>> Subject: Re: Correct syntax for pathological re.search()
>>
>> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>>
>>> I'm trying to discard lines that include the string "\sout{" (which is
>>> TeX, for those who are curious). I have tried:
>>>     if not re.search("\sout{", line):
>>>     if not re.search("\sout\{", line):
>>>     if not re.search("\\sout{", line):
>>>     if not re.search("\\sout\{", line):
>>>
>>> But the lines with that string keep coming through. What is the right
>>> syntax to properly escape the backslash and the left curly bracket?
>>
>> $ python
>> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import re
>>>>> s = r"testing \sout{WHADDEVVA}"
>>>>> re.search(r"\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You want a literal backslash, so you need to escape it for both Python and
>> the RegEx engine.
>>
>> It is not enough to escape the "\s" as "\\s", because that only takes care
>> of Python's demands for escaping "\". You also need to escape the "\" for
>> the RegEx, or it will read it as "\s", which is the RegEx for a whitespace
>> character. Your search therefore doesn't match, because it reads it as a
>> search for whitespace followed by "out{".
>>
>> Therefore, you need to escape it either as per my example, or by using
>> four "\" and no "r" in front of the first quote, which also works:
>>
>>>>> re.search("\\\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You don't need to escape the curly braces. We call them "seagull wings"
>> where I live.
>>
>
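Putting the thread together, here is a minimal sketch that tries the spellings
from the original question against a sample line (assumed to be the same one
used in Gilmeh's session). Because Python leaves an unknown escape such as \s
in place, "\sout{" and "\\sout{" are the same string, as are "\sout\{" and
"\\sout\{"; the sketch uses only the warning-free spellings, since Python 3.12
emits a SyntaxWarning for invalid escapes. re.escape() is one standard-library
way to add the regex-level escaping automatically, and when no regex feature
is needed, a plain `in` test avoids the regex layer entirely.

```python
import re

line = r"testing \sout{WHADDEVVA}"

# The attempts from the question, in their warning-free spellings.
# Each reaches the regex engine as \s (any whitespace) followed by "out".
broken = ["\\sout{", "\\sout\\{"]
for pattern in broken:
    print(repr(pattern), "->", re.search(pattern, line))  # None each time

working = [
    r"\\sout{",            # raw string: regex sees \\ = one literal backslash
    "\\\\sout{",           # no r prefix: four backslashes in the source
    re.escape(r"\sout{"),  # let re.escape() add the regex-level escaping
]
for pattern in working:
    print(repr(pattern), "->", re.search(pattern, line) is not None)  # True

# For a fixed string, a substring test needs no regex escaping at all
print(r"\sout{" in line)  # True
```

For a fixed string like this, the substring test is usually the clearest
choice; the regex forms matter once the pattern needs wildcards or anchors.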