[Python-Dev] Omission in re.sub?

Sun Dec 11 21:47:48 CET 2011

On 11/12/2011 20:27, Guido van Rossum wrote:
> On Sun, Dec 11, 2011 at 12:12 PM, MRAB<python at mrabarnett.plus.com>
> wrote:
>> I've just come across an omission in re.sub which I hadn't noticed
>> before.
>>
>> In re.sub the replacement string can contain escape sequences, for
>> example:
>>
>> >>> repr(re.sub(r"x", r"\n", "axb"))
>> "'a\\nb'"
>>
>> However:
>>
>> >>> repr(re.sub(r"x", r"\x0A", "axb"))
>> "'a\\\\x0Ab'"
>>
>> Yes, it doesn't recognise "\xNN".
>>
>> Is there a reason for this?
>>
>> The regex module does the same, but is there any objection to me
>> fixing it in the regex module? (I'm thinking about compatibility
>> with re here.)
>
> As long as there's a way to place a single backslash in the output
> this seems fine to me, though I'm not sure it's important. Of course
> it will likely break some test... the test will then have to be
> fixed.
>
> I can't remember why we did this -- is there a full list of all the
> escapes that re.sub() interprets somewhere? I thought it was pretty
> limited. Maybe it's the related list of escapes that are supported
> in regular expressions?
>
The documentation says: """That is, \n is converted to a single newline
character, \r is converted to a linefeed [sic: carriage return], and so forth."""

All of the other escape sequences work as expected, except for \uNNNN
and \UNNNNNNNN, which aren't supported at all in re.

I should probably also add \N{...} to the list for completeness.
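
For illustration, here is a minimal sketch of the template escapes that
re.sub() does interpret, including the single-backslash case Guido asked
about. It deliberately avoids passing r"\x0A" as a template, since how
unrecognised escapes are treated may differ between Python versions. The
variable names are only for this example, and the callable replacement is
one possible workaround, not something proposed in this thread.

```python
import re

# "\n" in the replacement template becomes a real newline character.
newline_result = re.sub(r"x", r"\n", "axb")

# A doubled backslash in the template yields one literal backslash.
backslash_result = re.sub(r"x", r"\\", "axb")

# A callable replacement bypasses template escape processing entirely,
# so its return value (built here from an ordinary Python string
# literal) is inserted as-is.
hex_result = re.sub(r"x", lambda m: "\x0A", "axb")

print(repr(newline_result))
print(repr(backslash_result))
print(repr(hex_result))
```

With a callable, any character can be produced without relying on which
escapes the template parser happens to recognise.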