backslash woes........

Duncan Booth duncan at NOSPAMrcp.co.uk
Tue Jul 10 09:18:08 EDT 2001


Martin Franklin <martin.franklin at westerngeco.com> wrote in
news:3B4ADD33.CA2836D1 at westerngeco.com: 

>> I think you maybe misunderstand what raw strings do. Raw strings
>> simply prevent any backslash character that is present in the string
>> from being interpreted as an escape sequence. They don't affect the
>> processing or use of the string in any way. Since none of your literal
>> strings contain backslashes there is no reason to use raw strings.
>> In regular expressions backslashes are special, but so are many other
>> characters that could appear in filenames, even on Unix. 
> 
> 
> You are right I don't understand...  My strings do include backslashes
> (they are windows filenames from os.path.walk())  I Have indeed changed
> to using string.replace() - having read the HOW TO on
> www.python.org.... and it seems to work (without using raw strings....)
> This all seems very confusing! 
> 

Let me try to explain. A raw string is a change in notation, not a change 
in the string itself. So r'%s' is exactly the same as '%s' or "%s" or 
'''%s''' or '\x25\x73', but r'\x25\x73' is a string containing 8 characters, 
two of which are backslashes.
If you write a string containing a backslash, e.g. 'c:\autoexec.bat' the 
backslash may be interpreted as beginning an escape sequence, so in this 
case you get 'c:\x07utoexec.bat' as the \a converts to a bell character. 
Writing r'c:\autoexec.bat' or 'c:\\autoexec.bat' gives you the same 
string, containing exactly 15 characters. Both of these are 
strings (there is no separate raw string type), and each of them contains 
exactly one backslash character:

>>> file1 = r'c:\autoexec.bat'
>>> file2 = 'c:\\autoexec.bat'
>>> print file1
c:\autoexec.bat
>>> print file2
c:\autoexec.bat
>>> print repr(file1)
'c:\\autoexec.bat'
>>> print repr(file2)
'c:\\autoexec.bat'
>>> print len(file1), len(file2)
15 15
>>> print type(file1), type(file2)
<type 'string'> <type 'string'>

In other words, the r prefix on a raw string simply changes the way 
the string literal is interpreted at compile time; it has no further effect 
on the processing of data after Python has compiled your code.
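
The equivalences claimed above can be checked directly. This is only a small sketch, written in Python 3 syntax rather than the print statements used in the session above; the variable names are made up for illustration.

```python
# Different notations, same string: the r prefix only affects how the
# literal is read, not the resulting object.
plain = '%s'
raw = r'%s'
hexed = '\x25\x73'        # escape sequences for '%' and 's'
raw_hexed = r'\x25\x73'   # raw: the backslashes stay in the string

same_notation = (plain == raw == hexed)
raw_hexed_len = len(raw_hexed)
raw_hexed_backslashes = raw_hexed.count('\\')

# Without the r prefix, \a is read as the bell character (0x07).
bell_path = 'c:\autoexec.bat'
raw_path = r'c:\autoexec.bat'

print(same_notation)                         # True
print(raw_hexed_len, raw_hexed_backslashes)  # 8 2
print(len(bell_path), len(raw_path))         # 14 15
print(bell_path[2] == '\x07')                # True
```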

If your program reads data from a file, or indeed gets it from anywhere 
else, then backslashes in that data have no special meaning. Only string 
literals in your source code undergo this special interpretation.
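
As a rough illustration of that point, the sketch below uses io.StringIO as a stand-in for a real file (an assumption made so the example is self-contained; the path in it is invented). The text read back contains a backslash followed by 'n' and a backslash followed by 't', and neither pair is turned into a newline or a tab.

```python
import io

# Stand-in for a file on disk. The raw literal is only used to get the
# backslashes into the "file"; once the data is there, nothing
# reinterprets them.
fake_file = io.StringIO(r'c:\new\table.txt')
line = fake_file.read()

has_newline = '\n' in line
has_tab = '\t' in line
backslash_count = line.count('\\')

print(line)                  # c:\new\table.txt
print(has_newline, has_tab)  # False False
print(backslash_count)       # 2
```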

The real confusion creeps in because backslash also has a special meaning 
in regular expressions. So to put a backslash into a regular expression you 
must escape it by preceding it with another backslash, and to write two 
backslashes in a string literal you must either use a raw string or write 4 
backslashes. So the string for a regular expression that matches one 
backslash followed by an 'x' could be written as:
    s = '\\\\x'
    s = r'\\x'
    s = re.escape('\\x')
    s = re.escape(r'\x')
In all of these s ends up as the same three-character string: two 
backslashes followed by an 'x'.
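
A quick way to convince yourself that the four spellings agree is to compare them and then try the pattern on some text. This is a sketch in Python 3 syntax; the subject string is an invented example, not one of the original poster's filenames.

```python
import re

# Four ways of writing the same regular expression source text.
spellings = [
    '\\\\x',
    r'\\x',
    re.escape('\\x'),
    re.escape(r'\x'),
]

all_same = len(set(spellings)) == 1
pattern_len = len(spellings[0])

# The pattern matches one literal backslash followed by 'x'.
subject = r'c:\xyz'
match = re.search(spellings[0], subject)
matched_text = match.group(0)

print(all_same, pattern_len)   # True 3
print(matched_text)            # \x
print(match.start())           # 2
```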

Why the 'x'? Because for reasons that escape me, raw strings cannot end 
with a single backslash (or, more generally, with an odd number of them):
>>> r'\\'
'\\\\'
>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: invalid token
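
If you do need a string that ends in a backslash, the usual workarounds are to add the final backslash with an ordinary literal or to skip the raw string altogether. The sketch below (Python 3 syntax; the helper compiles() and the directory name are made up for illustration) also uses the built-in compile() to check which raw literals are accepted, since the exact error message varies between Python versions.

```python
def compiles(source):
    """Return True if source is accepted as a Python expression."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False

# The comments show the source text that is actually being compiled.
one_backslash = compiles("r'\\'")           # r'\'
two_backslashes = compiles("r'\\\\'")       # r'\\'
three_backslashes = compiles("r'\\\\\\'")   # r'\\\'

# Two ways to get a directory name with a trailing backslash.
dir1 = r'c:\temp' + '\\'
dir2 = 'c:\\temp\\'

print(one_backslash, two_backslashes, three_backslashes)  # False True False
print(dir1 == dir2, len(dir1))                            # True 8
```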

I hope this makes things a bit clearer.
-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


