[Tutor] Regex/Raw String confusion

Jim Byrnes jf_byrnes at comcast.net
Wed Aug 3 21:54:50 EDT 2016


On 08/03/2016 06:21 PM, Alan Gauld via Tutor wrote:
> On 03/08/16 20:49, Jim Byrnes wrote:
>
>> Regular Expressions he talks about the Python escape character being a
>> '\' and regex using a lot of backslashes.
>
> In effect there are two levels of escaping, Python's and
> the regex processor's. Unfortunately they both use backslash!
> Python applies its level of escape first then passes the
> modified string to the regex engine which processes the
> remaining regex escapes. It is confusing and one reason
> you should avoid complex regexes if possible.
>
>> by putting an r before the first quote of the string value, you can
>> mark the string as a raw string, which does not escape characters.</quote>
>
> This stops Python from trying to process the escapes.
> The raw string is then passed to the regex which will
> process the backslash escapes that it recognises.
>
>> A couple of pages later he talks about parentheses having special
>> meaning in regex and what to do if they are in your text.
>>
>> <quote>In this case, you need to escape the ( and ) characters with a
>> backslash. The \( and \) escape characters in the raw string passed to
>> re.compile() will match actual parenthesis characters.</quote>
>
> These are regex escape characters. If you did not have the r in front
> you would need to double escape them:
>
> \\( and \\)
>
> So by using the raw string you avoid the initial layer of
> escaping by the Python interpreter and only need to worry
> about the regex parser - which is more than enough for anyone
> to worry about!
>

OK, thanks.  The book did not mention two levels of escaping.  With what
you told me in mind I reread that section, and the book may have hinted
at it, but I would never have realized it without knowing what you just said.
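
To check my understanding of the two levels, here is a small sketch of my own (not from the book; the sample text is made up). It compares the double-escaped normal string with the raw string, then hands the result to the regex engine:

```python
import re

# Level 1: Python's own string-literal escapes.
# In a normal literal, '\\' is one backslash, so these two are the same string.
normal = '\\(\\d\\d\\d\\)'
raw = r'\(\d\d\d\)'
print(normal == raw)   # True
print(len(raw))        # 10 (five backslashes plus five other characters)

# Level 2: the regex engine receives backslash + '(' and treats it
# as a literal parenthesis rather than the start of a group.
print(re.search(raw, 'call (415) now').group())   # (415)
```

If I have this right, the raw string only switches off level 1; the regex engine still sees every backslash.
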

Is the second example a special case?

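import re

# Group 1 is the area code with its literal parentheses, group 2 the number.
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is: (415) 555-4242.')
print(mo.group(1))  # (415)
print()
print(mo.group(2))  # 555-4242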

I ask because it produces the same results with or without the ' r '.
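
My guess, and it is only a guess, is that \( and \d are not escapes Python itself recognizes, so it leaves the backslash in place and the regex engine gets the same string either way. The sketch below is my own attempt to test that. It uses eval() on the literal's source text with warnings silenced, because I believe newer Python versions warn about unrecognized escapes. I chose \b as the contrasting case because Python does recognize it (as a backspace):

```python
import re
import warnings

# Source text of the pattern, held in a raw string so nothing is altered here.
body = r"(\(\d\d\d\)) (\d\d\d-\d\d\d\d)"

with warnings.catch_warnings():
    warnings.simplefilter("ignore")    # unrecognized escapes may trigger a warning
    plain = eval("'" + body + "'")     # what Python makes of the non-raw literal
raw = eval("r'" + body + "'")          # what Python makes of the raw literal

print(plain == raw)    # True: Python kept the backslashes in both

# Contrast: '\b' IS a Python escape (backspace), so the r matters here.
print('\b' == r'\b')                   # False
print(len('\b'), len(r'\b'))           # 1 2
print(re.search('\bcat', 'a cat'))     # None (looks for backspace + 'cat')
print(re.search(r'\bcat', 'a cat').group())   # cat (regex word boundary)
```

So perhaps it is not a special case so much as luck in which escapes the pattern happens to use?
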

Regards,  Jim


