When does the escape character work within raw strings?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri May 22 17:29:16 CEST 2009


On Fri, 22 May 2009 07:47:49 -0700, walterbyrd wrote:

> On May 21, 9:44 pm, "Rhodri James" <rho... at wildebst.demon.co.uk> wrote:
> 
>> Escaping the delimiting quote is the *one* time backslashes have a
>> special meaning in raw string literals.
> 
> If that were true, then wouldn't r'\b' be treated as two characters?

It is.

>>> len(r'\b')
2



>> This calls re.sub with a pattern string object that contains two
>> characters, a backslash followed by an 'n'.  This combination *does*
>> have a special meaning to the sub function, which does its own
>> translation of the pattern into a single newline character.
> 
> So when do I know when a raw string is treated as a raw string, and when
> it's not?

You have misunderstood. All strings are strings, but there are different 
ways to build a string. Raw strings are not different from ordinary 
strings, they're just a different way to *build* an ordinary string.

Here are four ways to make the same string, a backslash followed by a 
lowercase b:

"\\b"        # use an ordinary string, and escape the backslash
chr(92)+"b"  # use the chr() function
"\x5cb"      # use a hex escape
r"\b"        # use a raw string, no escaping needed

The results you get from all of those (and many, many more!) are equal: 
each one is the same two-character string. They're just written 
differently as source code.
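
A quick way to check this for yourself is to compare the four spellings 
directly. This is only a sketch; the names candidates, all_equal, lengths 
and code_points are mine, chosen for illustration.

```python
# Four spellings of the same two-character string: a backslash, then 'b'.
candidates = ["\\b", chr(92) + "b", "\x5cb", r"\b"]

# Every spelling compares equal to the first one.
all_equal = all(s == candidates[0] for s in candidates)

# Each string has two characters: code point 92 (backslash) and 98 ('b').
lengths = [len(s) for s in candidates]
code_points = [ord(c) for c in candidates[0]]

print(all_equal)    # True
print(lengths)      # [2, 2, 2, 2]
print(code_points)  # [92, 98]
```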

Now, in regular expressions, the RE engine expects to see special codes 
inside the string that have special meanings. For example, backslash 
followed by lowercase b has a special meaning: it matches a word 
boundary. So to create a string containing that regex, you can use any 
of the above (or any of the others). The RE engine doesn't know, and 
can't know, how you generated the regex. All it sees is a string 
containing a backslash followed by lowercase b.
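
To illustrate, here is a small sketch that builds the same pattern three 
different ways and hands each one to re.findall. The sample text and the 
variable names are made up for the example; the point is only that the 
results cannot differ, because the pattern strings are equal.

```python
import re

# The same regex, a word boundary on either side of "cat", built three ways.
patterns = [
    "\\bcat\\b",                        # ordinary string, escaped backslashes
    chr(92) + "bcat" + chr(92) + "b",   # built with chr()
    r"\bcat\b",                         # raw string
]

text = "cat concat cats cat"

# The RE engine only ever sees the finished string, so every
# spelling finds the same whole-word matches.
results = [re.findall(p, text) for p in patterns]

print(results)  # [['cat', 'cat'], ['cat', 'cat'], ['cat', 'cat']]
```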

But if you forget that Python uses backslash escapes in strings, and just 
write "\b", then the compiler creates the one-character string chr(8) 
(backspace, BS), which has no special meaning to the RE engine: it is 
matched as a literal backspace character.
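
The difference is easy to see side by side. Again, this is just a sketch 
with names and sample text of my own choosing, not anything from the 
earlier posts.

```python
import re

plain = "\b"    # ordinary literal: Python turns \b into one character
raw = r"\b"     # raw literal: the backslash and the 'b' are both kept

print(len(plain), ord(plain))  # 1 8
print(len(raw))                # 2

text = "one two"

# The raw pattern means "word boundary" to the RE engine:
# one at each end of each of the two words.
boundary_hits = len(re.findall(raw, text))

# The plain pattern is a literal backspace, and the text contains none.
backspace_hits = len(re.findall(plain, text))

# It does match when the text really contains a backspace character.
backspace_found = re.findall(plain, "a\x08b")

print(boundary_hits, backspace_hits, backspace_found)
```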


-- 
Steven


