Raw string substitution problem

Rhodri James rhodri at wildebst.demon.co.uk
Thu Dec 17 19:59:12 EST 2009


On Thu, 17 Dec 2009 20:18:12 -0000, Alan G Isaac <alan.isaac at gmail.com>  
wrote:

> So is the bottom line the following?
> A string replacement is not just "converted"
> as described in the documentation, essentially
> it is compiled?

That depends entirely on what you mean.

> But that cannot quite be right.  E.g., \b will be a back
> space not a word boundary.  So then the question arises
> again, why isn't '\\' a backslash? Just because?
> Why does it not get the "obvious" conversion?

'\\' *is* a backslash.  That string containing a single backslash is then  
processed by the re module which sees a backslash, tries to interpret it  
as an escape, fails and barfs.
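
A minimal sketch of that failure, assuming Python 3 and its re.error exception; the variable names are purely illustrative, and the two workarounds at the end are just common options, not the only ones:

```python
import re

backslash = '\\'   # two characters in the source, one backslash at run time

# Used as a replacement template, the lone backslash starts an escape
# with nothing after it, so re rejects the template.
try:
    re.sub('a', backslash, 'abc')
    failed = False
except re.error:
    failed = True

# Template of two backslashes: re reads it as one escaped backslash.
doubled = re.sub('a', '\\\\', 'abc')

# A function's return value is inserted as is, not parsed as a template.
via_func = re.sub('a', lambda m: backslash, 'abc')
```

Both doubled and via_func end up as a backslash followed by 'bc'.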

"re.compile('a\\nc')" passes a sequence of four characters to re.compile:  
'a', '\', 'n' and 'c'.  re.compile() then does its own interpretation:  
'a' passes through as is, '\' flags an escape which combined with 'n'  
produces the newline character (0x0a), and 'c' passes through as is.

"re.compile('a\nc')" by contrast passes a sequence of three characters to  
re.compile: 'a', 0x0a and 'c'.  re.compile() does its own interpretation,  
which happens not to change any of the characters, resulting in the same  
regular expression as before.
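
The two cases can be checked side by side. This is only a sketch, assuming Python 3; the names escaped, literal and target are made up for the illustration:

```python
import re

escaped = 'a\\nc'   # four characters: a, backslash, n, c
literal = 'a\nc'    # three characters: a, newline (0x0a), c

# Build the text to match without relying on any escape at all.
target = 'a' + chr(0x0a) + 'c'

# In the first case re turns backslash + n into a newline; in the second
# the newline was already there before re ever saw the string.
m1 = re.compile(escaped).fullmatch(target)
m2 = re.compile(literal).fullmatch(target)

# A raw string literal is another way to spell the four-character version.
same_as_raw = (r'a\nc' == escaped)
```

Both m1 and m2 are match objects, even though the strings handed to re.compile() differ in length.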

Your problem is that you are conflating the compile-time processing of  
string literals with the run-time processing of strings specific to re.

-- 
Rhodri James *-* Wildebeeste Herder to the Masses


