Raw string substitution problem

Alan G Isaac alan.isaac at gmail.com
Thu Dec 17 10:08:00 EST 2009


> On Wed, 16 Dec 2009 11:09:32 -0300, Ed Keith <e_d_k at yahoo.com> wrote:
>
>> I am having a problem when substituting a raw string. When I do the
>> following:
>>
>> re.sub('abc', r'a\nb\nc', '123abcdefg')
>>
>> I get
>>
>> """
>> 123a
>> b
>> cdefg
>> """
>>
>> what I want is
>>
>> r'123a\nb\ncdefg'

  
On 12/16/2009 9:35 AM, Gabriel Genellina wrote:
>  From http://docs.python.org/library/re.html#re.sub
>
> re.sub(pattern, repl, string[, count])
>
> ...repl can be a string or a function; if
> it is a string, any backslash escapes in
> it are processed. That is, \n is converted
> to a single newline character, \r is
> converted to a carriage return, and so forth.
>
> So you'll have to double your backslashes:



I'm not persuaded that the docs are clear.  Consider:

         >>> 'ab\\ncd' == r'ab\ncd'
         True

Naturally enough.  So I think one of two things is true:

1. this is a documentation bug (i.e., the documentation
    fails to warn about this unexpected behavior for raw strings), or
2. this is a bug in re.sub (i.e., raw strings are not handled
    correctly when used as replacements)

I vote for 2.
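
To make the comparison above concrete, here is a minimal sketch (assuming
only the standard re module; the variable names are mine, purely for
illustration).  It suggests that the raw literal and the doubled-backslash
literal are the same string by the time re.sub sees them, so re.sub has no
way to treat them differently:

```python
import re

raw = r'a\nb\nc'        # raw literal: a backslash followed by the letter n
escaped = 'a\\nb\\nc'   # ordinary literal with doubled backslashes

# Both literals produce the same seven-character string.
same_value = (raw == escaped)
length = len(raw)

# re.sub receives identical values, so it processes the backslash
# escapes in both and inserts real newline characters.
from_raw = re.sub('abc', raw, '123abcdefg')
from_escaped = re.sub('abc', escaped, '123abcdefg')

print(same_value, length)
print(repr(from_raw))
print(from_raw == from_escaped)
```

In other words, "raw" is a property of the literal in the source code, not
of the resulting string object.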

Peter's use of a function highlights just how odd this is:
getting the raw string via a function produces a different
result than providing it directly.  If this is really the
way things ought to be, I'd appreciate a clear explanation
of why.
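
For reference, a small sketch of the difference I mean, assuming the
standard re module and reusing the strings from Ed's original message (the
variable names and the lambda are my own illustration, not Peter's exact
code):

```python
import re

repl = r'a\nb\nc'    # the replacement text from the original question
text = '123abcdefg'

# String replacement: re.sub processes the escapes, so \n becomes a newline.
as_string = re.sub('abc', repl, text)

# Function replacement: the returned string is inserted verbatim.
as_function = re.sub('abc', lambda match: repl, text)

# Doubling each backslash first makes the string form match the function form.
as_doubled = re.sub('abc', repl.replace('\\', '\\\\'), text)

print(repr(as_string))
print(repr(as_function))
print(repr(as_doubled))
```

The function form and the doubled-backslash form both give what Ed asked
for, r'123a\nb\ncdefg'; only the plain string form turns \n into a newline.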

Alan Isaac



