Raw string substitution problem

Alan G Isaac alan.isaac at gmail.com
Thu Dec 17 12:54:07 EST 2009


> Alan G Isaac <alan.isaac at gmail.com> wrote:
>>           >>>  re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a','123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
>>           True
>> Why are the first two strings being treated as if they are the last one?
  

On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
> They aren't.  The last string is different.

Of course it is different.
That is the basis of my question.
Why is it being treated as if it is the same?
(See the end of this post.)


> Alan G Isaac <alan.isaac at gmail.com> wrote:
>> More simply, consider::
>>
>>           >>>  re.sub('abc', '\\', '123abcdefg')
>>           Traceback (most recent call last):
>>             File "<stdin>", line 1, in <module>
>>             File "C:\Python26\lib\re.py", line 151, in sub
>>               return _compile(pattern, 0).sub(repl, string, count)
>>             File "C:\Python26\lib\re.py", line 273, in _subx
>>               template = _compile_repl(template, pattern)
>>             File "C:\Python26\lib\re.py", line 260, in _compile_repl
>>               raise error, v # invalid expression
>>           sre_constants.error: bogus escape (end of line)
>>
>> Why is this the proper handling of what one might think would be an
>> obvious substitution?


On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
> Is this what you want?  What you have is a re expression consisting of
> a single backslash that doesn't escape anything (EOL) so it barfs.
>          >>> re.sub('abc', r'\\', '123abcdefg')
>          '123\\defg'
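
To make the two cases above concrete, here is a minimal sketch. It assumes (my reading of the behaviour, not something the docs spell out) that a string replacement is treated as a template with its own escape processing, while the return value of a function is used as-is. The names text, lone_backslash_ok, escaped and via_function are only for illustration.

```python
import re

text = '123abcdefg'

# '\\' is a one-character Python string: a lone backslash.  Used as a
# replacement template it is an unfinished escape, so re.sub raises re.error.
try:
    re.sub('abc', '\\', text)
    lone_backslash_ok = True
except re.error:
    lone_backslash_ok = False

# r'\\' is two backslashes; the template processing reduces them to one.
escaped = re.sub('abc', r'\\', text)

# The return value of a callable is not processed as a template.
via_function = re.sub('abc', lambda m: '\\', text)

print(lone_backslash_ok)        # False
print(escaped == via_function)  # True
print(len(escaped))             # 8
```

Note that the '123\\defg' shown in the interpreter is the repr of the result; the string itself holds a single backslash.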


Turning again to the documentation:
         "if it is a string, any backslash escapes in it are processed.
         That is, \n is converted to a single newline character, \r is
         converted to a linefeed, and so forth."
So why is '\n' converted to a newline but '\\' does not become a literal
backslash?  OK, I don't do much string processing, so perhaps this is where
I am missing the point: how is the replacement being "converted"?
(As Peter's example shows, if you supply the replacement via
a function, this does not happen.) You suggest it is just a matter of
it being an re, but::

         >>> re.sub('abc', 'a\\nc','1abcd') == re.sub('abc', 'a\nc','1abcd')
         True
         >>> re.compile('a\\nc') == re.compile('a\nc')
         False

So I have two strings that are not the same, nor do they compile
equivalently, yet apparently they are "converted" to something
equivalent for the substitution. Why? Is my question clearer?
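
A small sketch of what seems to be going on, under the assumption that there are two separate layers of escape handling: the Python string literal is parsed first, and then re.sub processes the replacement string a second time as a template. The variable names are hypothetical and only serve the illustration.

```python
import re

two_char_escape = 'a\\nc'   # four characters: a, backslash, n, c
real_newline = 'a\nc'       # three characters: a, newline, c

# As replacement templates: the backslash-n pair is converted to a newline,
# and an actual newline passes through unchanged, so the results agree.
out_escape = re.sub('abc', two_char_escape, '1abcd')
out_newline = re.sub('abc', real_newline, '1abcd')

# Supplied via a function, the same four characters are inserted literally.
out_function = re.sub('abc', lambda m: two_char_escape, '1abcd')

# As patterns, the two compiled objects differ, yet both match a newline.
both_match = all(re.match(p, 'a\nc') for p in (two_char_escape, real_newline))

print(len(two_char_escape), len(real_newline))  # 4 3
print(out_escape == out_newline)                # True
print(repr(out_function))                       # '1a\\ncd'
print(both_match)                               # True
```

If this reading is right, the "conversion" is simply the second layer: by the time re.sub has processed the template, both replacement strings contain a real newline.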

If the answer looks too obvious to state, assume I'm missing it anyway
and please state it.  As I said, I seldom use the re module.

Alan Isaac


