[Tutor] Regex/Raw String confusion

Alan Gauld alan.gauld at yahoo.co.uk
Wed Aug 3 19:21:22 EDT 2016


On 03/08/16 20:49, Jim Byrnes wrote:

> Regular Expressions he talks about the python escape character being a 
> '\' and regex using a lot of backslashes.

In effect there are two levels of escape character: Python's and
the regex processor's. Unfortunately they both use backslash!
Python applies its level of escaping first, when it reads the
string literal, then passes the resulting string to the regex
engine, which processes the remaining regex escapes. It is
confusing, and one reason you should avoid complex regexes if possible.
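
To see the two levels at work, here is a small sketch you can paste
into the interpreter. The sample text and variable names are just
made up for illustration. It relies on \b meaning "backspace" to
Python but "word boundary" to the regex engine:

```python
import re

# Without the r prefix, Python converts \b to a backspace character
# (0x08) before the regex engine ever sees the string.
cooked = "\bcat"

# With the r prefix, the backslash and the b both survive, so the
# regex engine receives \b and treats it as a word boundary.
raw = r"\bcat"

print(len(cooked))  # 4 characters: backspace, c, a, t
print(len(raw))     # 5 characters: backslash, b, c, a, t

text = "a cat sat"
print(re.search(cooked, text))       # None, there is no backspace in text
print(re.search(raw, text).group())  # cat
```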

> by putting an r before the first quote of the string value, you can 
> mark the string as a raw string, which does not escape characters.</quote>

This stops Python from trying to process the escapes.
The raw string is then passed to the regex engine, which will
process the backslash escapes that it recognises.
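
A quick way to convince yourself is to look at what the raw string
actually contains. This is only a sketch, and the sample sentence is
invented:

```python
import re

# A raw string is still an ordinary str; the r prefix only changes
# how Python reads the literal.
pattern = r"\d+"

print(pattern)       # \d+
print(len(pattern))  # 3 characters: backslash, d, +

# The regex engine is what gives \d its meaning (a digit).
numbers = re.findall(pattern, "room 101, floor 7")
print(numbers)       # ['101', '7']
```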

> A couple of pages later he talks about parentheses having special 
> meaning in regex and what to do if they are in your text.
> 
> <quote>In this case, you need to escape the ( and ) characters with a
> backslash. The \( and \) escape characters in the raw string passed to
> re.compile() will match actual parenthesis characters.</quote>

These are regex escape characters. If you did not have the r in front
you would need to double the backslashes, so that Python's own
escaping leaves a single backslash for the regex engine:

\\( and \\)

So by using the raw string you avoid the initial layer of
escaping by the Python interpreter and only need to worry
about the regex parser - which is more than enough for anyone
to worry about!
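
As a rough illustration (the phone number and the names here are
invented, not taken from the book), the raw and the doubled forms
produce exactly the same string, and either one matches literal
parentheses:

```python
import re

# Raw string: one layer of escaping, the regex engine's.
raw_pattern = r"\((\d{3})\)"

# Ordinary string: every backslash doubled to get past Python first.
doubled_pattern = "\\((\\d{3})\\)"

print(raw_pattern == doubled_pattern)  # True

phone = re.compile(raw_pattern)
match = phone.search("Call (415) 555-1234 today")

print(match.group())   # (415)  the whole match, literal parentheses included
print(match.group(1))  # 415    the unescaped ( ) form a capture group
```

The escaped \( and \) match the actual characters, while the plain
( and ) inside them still do their usual grouping job.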

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



