backslash woes........
Duncan Booth
duncan at NOSPAMrcp.co.uk
Tue Jul 10 09:18:08 EDT 2001
Martin Franklin <martin.franklin at westerngeco.com> wrote in
news:3B4ADD33.CA2836D1 at westerngeco.com:
>> I think you may misunderstand what raw strings do. Raw strings
>> simply prevent any backslash character that is present in the string
>> from being interpreted as an escape sequence. They don't affect the
>> processing or use of the string in any way. Since none of your literal
>> strings contain backslashes there is no reason to use raw strings.
>> In regular expressions backslashes are special, but so are many other
>> characters that could appear in filenames, even on Unix.
>
>
> You are right, I don't understand... My strings do include backslashes
> (they are Windows filenames from os.path.walk()). I have indeed changed
> to using string.replace(), having read the HOWTO on
> www.python.org, and it seems to work (without using raw strings).
> This all seems very confusing!
>
Let me try to explain. A raw string is a change in notation, not a change
in the string itself. So r'%s' is exactly the same as '%s' or "%s" or
'''%s''' or '\x25\x73', but r'\x25\x73' is a string containing 8 characters,
two of which are backslashes.
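
To make that concrete, here is a small sketch you can run to check the claim. It is written with print() calls so that it runs on a current Python 3 interpreter (an assumption on my part; the session later in this post uses the older print statement), and the variable names are only for illustration.

```python
# Different spellings of the same two-character string: '%' followed by 's'.
plain = '%s'
double_quoted = "%s"
triple_quoted = '''%s'''
hex_escaped = '\x25\x73'   # \x25 is '%', \x73 is 's'
raw = r'%s'                # no backslashes, so the r prefix changes nothing

all_same = plain == double_quoted == triple_quoted == hex_escaped == raw

# With the r prefix, the backslashes are kept as ordinary characters.
raw_hex = r'\x25\x73'
backslash_count = raw_hex.count('\\')

print(all_same)            # True
print(len(raw_hex))        # 8
print(backslash_count)     # 2
```

The comparison only looks at the resulting string objects, which is the point: once the literal has been compiled, nothing records which notation was used.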
If you write a string literal containing a backslash, e.g. 'c:\autoexec.bat',
the backslash may be interpreted as beginning an escape sequence, so in this
case you get 'c:\x07utoexec.bat' because \a converts to a bell character.
Writing r'c:\autoexec.bat' or 'c:\\autoexec.bat' gives you the same
string, containing exactly 15 characters. Both of these are
strings (there is no separate raw string type), and each of them contains
exactly one backslash character:
>>> file1 = r'c:\autoexec.bat'
>>> file2 = 'c:\\autoexec.bat'
>>> print file1
c:\autoexec.bat
>>> print file2
c:\autoexec.bat
>>> print repr(file1)
'c:\\autoexec.bat'
>>> print repr(file2)
'c:\\autoexec.bat'
>>> print len(file1), len(file2)
15 15
>>> print type(file1), type(file2)
<type 'string'> <type 'string'>
In other words, the r prefix on a raw string simply changes the way
the string literal is interpreted at compile time; it has no further effect on
the processing of data after Python has compiled your code.
If your program reads data from a file, or indeed gets it from anywhere else,
then backslashes in that data have no special meaning. Only string literals in
your source code are subject to this special interpretation.
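
A rough way to convince yourself of this is sketched below. It assumes a current Python 3 interpreter and uses io.StringIO as a stand-in for a real file; the file contents are built with chr(92) so that no backslash appears in a literal at all.

```python
import io

# chr(92) is the backslash character, so no escape processing is involved here.
contents = 'c:' + chr(92) + 'autoexec.bat\n'

# io.StringIO stands in for a real file (an assumption made for this sketch).
fake_file = io.StringIO(contents)
line = fake_file.readline().rstrip('\n')

print(line)                         # c:\autoexec.bat
print(len(line))                    # 15
print(line == r'c:\autoexec.bat')   # True
print('\x07' in line)               # False: the \a was never turned into a bell
```

The backslash followed by 'a' arrives as two ordinary characters, because escape sequences are only processed when a literal in source code is compiled.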
The real confusion creeps in because backslash also has a special meaning
in regular expressions. So to put a backslash into a regular expression you
must escape it by preceding it with another backslash, and to write two
backslashes in a string literal you must either use a raw string or write 4
backslashes. So the string for a regular expression that matches one
backslash followed by an 'x' could be written as:
s = '\\\\x'
s = r'\\x'
s = re.escape('\\x')
s = re.escape(r'\x')
In all of these, s ends up as the same three-character string: two
backslashes followed by an 'x'.
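
If you want to check this yourself, something like the following sketch should do it. It assumes a current Python 3 interpreter, and the test strings 'foo\xbar' and 'foo xbar' are made up purely for illustration.

```python
import re

# Four spellings of the same pattern: backslash, backslash, 'x'.
patterns = [
    '\\\\x',
    r'\\x',
    re.escape('\\x'),
    re.escape(r'\x'),
]

all_equal = len(set(patterns)) == 1
pattern_length = len(patterns[0])

# The pattern matches one literal backslash followed by 'x'.
match = re.search(patterns[0], r'foo\xbar')
no_match = re.search(patterns[0], 'foo xbar')

print(all_equal)        # True
print(pattern_length)   # 3
print(match.start())    # 3
print(no_match)         # None
```

Using re.escape() is the least error-prone of the four when the text to match comes from somewhere else, such as a filename, since it also takes care of the other characters that are special in regular expressions.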
Why the 'x'? Because for reasons that escape me, raw string literals cannot
end with a single backslash (or with any odd number of backslashes):
>>> r'\\'
'\\\\'
>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: invalid token
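
If you do need a string that ends in a backslash, one possible workaround (a sketch only, with a made-up directory name) is to write the trailing backslash as an ordinary escaped literal and concatenate it:

```python
# A raw literal cannot end in a lone backslash, so add it separately.
dir1 = r'c:\temp' + '\\'    # raw part plus an ordinary escaped backslash
dir2 = 'c:\\temp\\'         # ordinary literal with doubled backslashes

print(dir1 == dir2)         # True
print(len(dir1))            # 8
print(dir1.endswith('\\'))  # True
```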
I hope this makes things a bit clearer.
--
Duncan Booth duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?