Raw string substitution problem

Wed Dec 16 14:54:32 EST 2009

Gabriel Genellina wrote:

> On Wed, 16 Dec 2009 14:51:08 -0300, Peter Otten <__peter__ at web.de>
> wrote:
> 
>> Ed Keith wrote:
>>
>>> --- On Wed, 12/16/09, Gabriel Genellina <gagsl-py2 at yahoo.com.ar> wrote:
>>>
>>>> Ed Keith <e_d_k at yahoo.com> wrote:
>>>>
>>>> > I am having a problem when substituting a raw string.
>>>> > When I do the following:
>>>> >
>>>> > re.sub('abc', r'a\nb\nc', '123abcdefg')
>>>> >
>>>> > I get
>>>> >
>>>> > """
>>>> > 123a
>>>> > b
>>>> > cdefg
>>>> > """
>>>> >
>>>> > what I want is
>>>> >
>>>> > r'123a\nb\ncdefg'
>>>>
>>>> So you'll have to double your backslashes:
>>>>
>>>> py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
>>>> '123a\\nb\\ncdefg'
>>>>
>>> That is going to be a nontrivial exercise. I have control over the
>>> pattern, but the texts to be substituted and substituted into will be
>>> read from user-supplied files. I need to reproduce the exact text that
>>> is read from the file.
>>
>> There is a helper function re.escape() that you can use to sanitize the
>> substitution:
>>
>> >>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
>> 123a\nb\ncdefg
> 
> Unfortunately re.escape does much more than that:
> 
> py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
> 123a\.b\.cdefg

Sorry, I didn't think of that.
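
If only the backslashes need protecting, a narrower alternative to
re.escape() is to double just those before handing the text to re.sub().
A minimal sketch, assuming Python 3 syntax (print as a function);
escape_backslashes is a hypothetical helper name, not something the re
module provides:

```python
import re

def escape_backslashes(replacement):
    """Double every backslash so re.sub() treats the text literally."""
    return replacement.replace("\\", "\\\\")

# Dots and other punctuation are left alone; only backslashes change.
result = re.sub("abc", escape_backslashes(r"a\nb.c\1"), "123abcdefg")
print(result)
```

Since the backslash is the only character re.sub() treats specially in a
replacement string, nothing else should need escaping.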

> I think the string_escape encoding is what the OP needs:
> 
> py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"), '123abcdefg')
> 123a\n(b.c)\nddefg

Another possibility:

>>> print re.sub('abc', lambda m: r'a\nb\n.c\a', '123abcdefg')
123a\nb\n.c\adefg
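
This works because re.sub() inserts the return value of a callable
replacement as is, without processing backslash escapes or group
references. For text read from a user-supplied file, the same idea could
be wrapped up roughly as below. This is a sketch in Python 3 syntax;
substitute_literal is a hypothetical name chosen for illustration:

```python
import re

def substitute_literal(pattern, replacement, text):
    """Replace every match of pattern in text with replacement, verbatim."""
    # A callable's return value is not parsed as a replacement template,
    # so backslashes and group references come through unchanged.
    return re.sub(pattern, lambda match: replacement, text)

result = substitute_literal("abc", r"a\nb\n.c\a", "123abcdefg")
print(result)
```

The replacement text needs no preprocessing this way, so whatever was
read from the file is what ends up in the output.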

Peter