When does the escape character work within raw strings?

Sat May 23 21:34:25 EDT 2009

On Sat, 23 May 2009 14:05:10 -0700, walterbyrd wrote:

> On May 22, 12:22 pm, "Rhodri James"
>> How do you know how a string object is going to be treated by any given
>> function?  Read the Fine Manual for that function.
> 
> So am I to understand that there is no consistency in string handling
> throughout the standard modules/objects/methods?

No, you have completely misunderstood.

> Seems to make python a lot more complicated than it needs to be, but
> okay.

No, you are imagining complexity that doesn't exist.

To the Python compiler, a string is a string is a string. The rules are 
very simple: you write a string literal using quotation marks to tell the 
compiler "the text between these delimiters is a literal string". Here 
are the delimiters understood by Python:

Regular strings, must be on a single line:
'  '  or  "  "

Regular strings, allowed to include multiple lines:
'''  '''  or  """  """

Raw strings, must be on a single line:
r'  '  or  r"  "

Raw strings, allowed to include multiple lines:
r'''  '''  or  r"""  """

Regular strings interpret backslash escapes specially: \c has special 
meaning depending on what c is. For example, \t is interpreted by the 
compiler as a tab, and \n is interpreted as a newline. Raw strings 
*don't* interpret backslashes specially (except that you can't end the 
raw string with an odd number of backslashes).
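To make the difference concrete, here is a minimal sketch, assuming Python 3. The variable names are only for illustration:

```python
# Regular string: the compiler turns \t into a single tab character.
regular = 'a\tb'
# Raw string: the backslash and the 't' are kept as two separate characters.
raw = r'a\tb'

print(len(regular))     # 3
print(len(raw))         # 4
print(raw == 'a\\tb')   # True: same text, spelled two ways

# The one exception: a raw string cannot end in an odd number of
# backslashes, so the source text r'abc\' does not compile.
try:
    compile("r'abc\\'", '<example>', 'eval')
except SyntaxError:
    ends_with_backslash_ok = False
else:
    ends_with_backslash_ok = True
print(ends_with_backslash_ok)   # False
```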

That is how you *create* string literals. It is 100% consistent all 
through Python: the rules apply in every module, in every function, 
everywhere, because the compiler creates the string before the function 
or module gets a chance to see the string.
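A small sketch of that point, assuming Python 3. The function describe() is a made-up helper, not part of any library; it only shows that a function receives the finished string and cannot tell which kind of literal produced it:

```python
def describe(s):
    # Hypothetical helper: returns the character codes of whatever
    # string object it is handed.
    return [ord(ch) for ch in s]

# Two different literals, one and the same string.
same_a = describe('C:\\temp')
same_b = describe(r'C:\temp')
print(same_a == same_b)   # True

# Different strings are different before the function is even called.
tab = describe('\t')      # one character: tab
two = describe(r'\t')     # two characters: backslash, then 't'
print(tab, two)           # [9] [92, 116]
```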

Having been created, how the string is *used* depends on the application. 
Inside a calculator application, the meaning of the literal string "x/y" 
would be very different than it would be inside an application dealing 
with file names. Python modules and functions are no different:

- the os module interprets many strings as file names according to the 
rules for your operating system: e.g. on Linux '/' separates the parts of 
a pathname. On Windows, either forward slashes or backslashes separate 
directories, and ':' separates the drive letter from the path.

- the glob module interprets strings according to the rules for shell 
globbing: e.g. '*' means 'match any number of any character', '?' means 
'match any single character'.

- the re module interprets strings according to the rules for regular 
expressions: e.g. '.*' means 'match any number of any character (except 
newline by default)' and '\d' (backslash-d) means 'match a single decimal 
digit'.

- the urllib and urllib2 modules interpret strings according to the rules 
of dealing with URLs.
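As a rough illustration of the list above, the sketch below hands the same strings to different modules, assuming Python 3. It uses fnmatch (the shell-style matching that glob relies on) and calls posixpath and ntpath directly, which are the modules os.path maps to on Linux and Windows respectively, so the result does not depend on where you run it:

```python
import fnmatch
import ntpath
import posixpath
import re

text = 'a*b'

# Shell-globbing rules: '*' matches any run of characters.
glob_hit = fnmatch.fnmatchcase('aXYZb', text)

# Regular-expression rules: '*' means "zero or more of the preceding
# item", so 'a*b' matches 'aaab' but not 'aXYZb'.
re_hit_xyz = re.fullmatch(text, 'aXYZb') is not None
re_hit_aaa = re.fullmatch(text, 'aaab') is not None

print(glob_hit, re_hit_xyz, re_hit_aaa)   # True False True

# File-name rules: the backslash separates directories on Windows,
# but is an ordinary character in a file name on Linux.
path = r'dir\file.txt'
posix_name = posixpath.basename(path)
windows_name = ntpath.basename(path)
print(posix_name)     # dir\file.txt
print(windows_name)   # file.txt
```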

In every case, you construct the string literals using the same rules, 
but the *meaning* of them differs according to the application. Because 
regular expressions give special meanings to literal backslashes, it is 
inconvenient to write many regexes as regular strings: you need to escape 
every backslash. That's where raw strings are more useful.
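For instance, a sketch assuming Python 3, with sample text invented for the illustration:

```python
import re

# The same regex, written as a regular string and as a raw string.
escaped = '\\d+\\.\\d+'   # every backslash has to be doubled
raw = r'\d+\.\d+'         # written the way the regex reads

print(escaped == raw)     # True
numbers = re.findall(raw, 'pi is 3.14, e is 2.72')
print(numbers)            # ['3.14', '2.72']

# Forgetting to escape can silently change the pattern: in a regular
# string '\b' is a backspace character, not the regex word boundary.
wrong = re.findall('\bcat\b', 'a cat sat')
right = re.findall(r'\bcat\b', 'a cat sat')
print(wrong, right)       # [] ['cat']
```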

-- 
Steven