Raw string substitution problem

MRAB python at mrabarnett.plus.com
Thu Dec 17 15:51:04 EST 2009


Alan G Isaac wrote:
> On 12/17/2009 2:45 PM, MRAB wrote:
>> re.compile('a\\nc') _does_ compile to the same regex as
>> re.compile('a\nc').
>>
>> However, regex objects never compare equal to each other, so, strictly
>> speaking, re.compile('a\nc') != re.compile('a\nc').
>>
>> However, having said that, the re module contains a cache (keyed on the
>> string and options supplied), so the first re.compile('a\nc') will put
>> the regex object in the cache and the second re.compile('a\nc') will
>> return that same regex object from the cache. If you clear the cache in
>> between the two calls (do re._cache.clear()) you'll get two different
>> regex objects which won't compare equal even though they are to all
>> intents identical.
> 
> 
> OK, this is helpful.
> (I did check equality but did not understand
> I got True only because re used caching.)
> So is the bottom line the following?
> A string replacement is not just "converted"
> as described in the documentation, essentially
> it is compiled?
> 
> But that cannot quite be right.  E.g., \b will be a back
> space not a word boundary.  So then the question arises
> again, why isn't '\\' a backslash? Just because?
> Why does it not get the "obvious" conversion?
> 
If you give the re module a string containing \b (a backslash followed
by the letter b), e.g. '\\b' or r'\b', then it will compile it to a word
boundary if it's in a regex string or a backspace if it's in a
replacement string. This is different from giving the re module a string
which actually contains a backspace character, e.g. '\b'.
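
A quick sketch of the three cases, in case it helps. The sample strings
and variable names are made up for illustration only:

```python
import re

# Regex string containing backslash + 'b': compiled to a word boundary.
# Only the standalone words match, not the 'cat' inside 'concat'.
word_boundary = re.findall(r'\bcat\b', 'cat concat cat')

# Replacement string containing backslash + 'b': turned into a backspace.
replaced = re.sub('-', r'\b', 'a-b')

# An ordinary string literal '\b' already IS a backspace (one character)
# before the re module ever sees it, so as a regex it simply matches a
# backspace character.
literal_backspace = '\b'
backspace_hits = re.findall(literal_backspace, 'a\x08b')

print(word_boundary)         # ['cat', 'cat']
print(repr(replaced))        # 'a\x08b'
print(len(literal_backspace), backspace_hits == ['\x08'])  # 1 True
```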

Because the re module uses backslashes for escaping, you'll need to
escape a literal backslash with a backslash in the string you give it.
But string literals also use backslashes for escaping, so you'll need to
escape each of those backslashes with a backslash too. A single literal
backslash therefore ends up as '\\\\' in the source code (or r'\\' if
you use a raw string literal).
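
To make the two layers of escaping concrete, here is a small sketch. The
input 'a/b' and the variable names are just examples I picked:

```python
import re

# The re module must receive TWO characters (backslash, backslash) in the
# replacement string in order to insert ONE backslash.
repl_plain = '\\\\'   # ordinary literal: four typed, two in the string
repl_raw = r'\\'      # raw literal: two typed, two in the string
same_string = (repl_plain == repl_raw)

# Replace the slash with a single backslash.
result = re.sub('/', repl_raw, 'a/b')

# A function replacement is used as-is, with no escape processing, so it
# only needs the string-literal layer of escaping.
result_func = re.sub('/', lambda m: '\\', 'a/b')

print(same_string, len(repl_raw))   # True 2
print(result, len(result))          # a\b 3
print(result == result_func)        # True
```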


