[Tutor] Regex/Raw String confusion
Jim Byrnes
jf_byrnes at comcast.net
Wed Aug 3 21:54:50 EDT 2016
On 08/03/2016 06:21 PM, Alan Gauld via Tutor wrote:
> On 03/08/16 20:49, Jim Byrnes wrote:
>
>> Regular Expressions he talks about the python escape character being a
>> '\' and regex using a lot of backslashes.
>
> In effect there are two levels of escape character, python and
> the regex processor. Unfortunately they both use backslash!
> Python applies its level of escape first then passes the
> modified string to the regex engine which processes the
> remaining regex escapes. It is confusing and one reason
> you should avoid complex regexes if possible.
>
>> by putting an r before the first quote of the string value, you can
>> mark the string as a raw string, which does not escape characters.</quote>
>
> This avoids python trying to process the escapes.
> The raw string is then passed to the regex which will
> process the backslash escapes that it recognises.
>
>> A couple of pages later he talks about parentheses having special
>> meaning in regex and what to do if they are in your text.
>>
>> <quote>In this case, you need to escape the ( and ) characters with a
>> backslash. The \( and \) escape characters in the raw string passed to
>> re.compile() will match actual parenthesis characters.</quote>
>
> These are regex escape characters. If you did not have the r in front
> you would need to double escape them:
>
> \\( and \\)
>
> So by using the raw string you avoid the initial layer of
> escaping by the python interpreter and only need to worry
> about the regex parser - which is more than enough for anyone
> to worry about!
>
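Alan's point that the raw string keeps Python from processing the escapes is easiest to see with `\b`, which both layers treat as an escape but with different meanings: to Python it is a backspace character, to `re` it is a word boundary. A minimal sketch; the sample text and variable names are just for illustration:

```python
import re

text = 'the cat sat'

# In an ordinary string Python converts \b to a backspace (\x08) before
# re sees it, so this pattern looks for backspace characters that aren't there.
plain = re.search('\bcat\b', text)

# In a raw string the backslash survives, so re reads \b as a word boundary.
raw = re.search(r'\bcat\b', text)

print(plain)        # None
print(raw.group())  # cat
```

Writing `'\\bcat\\b'` with doubled backslashes would also match; the raw string just spares you that first layer of escaping.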
Ok thanks. The book did not mention two levels of escaping. With what
you told me in mind I reread that section, and the book may have hinted
at it, but I would never have realized it without knowing what you just said.
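The two levels of escaping show up clearly when matching a single literal backslash, which has to be escaped once for Python and once for `re`. The Windows-style path below is only sample text. The sketch also checks that doubling the backslashes, as in the `\\(` and `\\)` example, hands `re` exactly the same pattern as the raw string does:

```python
import re

text = r'C:\temp'  # raw, so this holds one real backslash (7 characters)

# Python turns '\\\\' into two backslashes; re then reads those two
# as one literal backslash.
plain = re.search('C:\\\\temp', text)
# With a raw string only the regex level of escaping is needed.
raw = re.search(r'C:\\temp', text)
print(plain.group() == raw.group() == text)  # True

# Doubled backslashes and a raw string produce the same pattern text.
doubled = re.compile('\\(\\d\\d\\d\\)')
raw_paren = re.compile(r'\(\d\d\d\)')
print(doubled.pattern == raw_paren.pattern)  # True
print(raw_paren.search('Call (415) 555-4242').group())  # (415)
```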
Is the second example a special case?
import re
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is: (415) 555-4242.')
print(mo.group(1))
print()
print(mo.group(2))
I ask because it produces the same results with or without the r prefix.
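One likely explanation for the identical results: Python only rewrites backslash sequences it recognises as its own escapes (`\n`, `\t`, `\b` and so on). `\(`, `\)` and `\d` are not among them, so Python leaves the backslash in place and `re` receives the same string either way. Python 3.6 and later flag such unrecognised escapes with a DeprecationWarning (a more visible SyntaxWarning from 3.12), which is a good reason to keep the r anyway. The sketch below checks this by evaluating a pattern as an ordinary string literal; the helper `as_plain_literal` is hypothetical, and `eval` is used purely for the demonstration:

```python
import warnings

def as_plain_literal(body):
    """Return what Python makes of body written inside ordinary quotes.

    Hypothetical demo helper: body must not contain a single quote.
    """
    with warnings.catch_warnings():
        # \d and \( are not Python escapes; recent Pythons warn about them,
        # so silence that warning for this check.
        warnings.simplefilter('ignore')
        return eval("'" + body + "'")

jim = r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)'
print(as_plain_literal(jim) == jim)                # True: nothing changed
print(as_plain_literal(r'\bcat\b') == r'\bcat\b')  # False: \b became a backspace
```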
Regards, Jim