Different behaviour of regexp in 3.6.0b2

Fri Oct 14 13:01:53 EDT 2016

Lele Gaifax wrote:

> Peter Otten <__peter__ at web.de> writes:
> 
>> Lele Gaifax wrote:
>>
>>> The original intent is to replace spaces within a string with the
>>> regular expression \s+ (see
>>> ...
>>> According to the documentation
>>> (https://docs.python.org/3.6/library/re.html#re.sub) “unknown escapes
>>> [in the repl argument] such as \& are left alone”.
> 
>> According to
>>
>> https://docs.python.org/dev/library/re.html#re.sub
>>
>> rejection of \s is intentional
>>
>> """
>> Changed in version 3.6: Unknown escapes consisting of '\' and an ASCII
>> letter now are errors.
>> """
> 
> So, how am I supposed to achieve the mentioned intent? By doubling the
> escape in the replacement?

If the replacement contains no escape sequences that re.sub() is meant to
process, you can escape it wholesale:

>>> re.sub(r'\s+', re.escape(r'\s+'), 'foo bar')
'foo\\s\\+bar'

OK, that escaped too much (the '+' picked up a backslash as well). Second
attempt:

>>> re.sub(r'\s+', lambda m: r'\s+', 'foo bar')
'foo\\s+bar'

Better? If that's too much work at runtime:

>>> def double_bs(s): return "\\\\".join(s.split("\\"))
... 
>>> re.sub(r'\s+', double_bs(r'\s+'), 'foo bar')
'foo\\s+bar'
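
For reference, here is a minimal sketch that puts the failure and the two
workarounds side by side. The helper names literal_sub() and
double_backslashes() are made up for illustration; the second is meant to
behave like double_bs() above, just spelled with str.replace(). It assumes
Python 3.6 or later, where the unescaped template is rejected.

```python
import re


def literal_sub(pattern, replacement, text):
    """Replace every match of pattern with replacement, taken literally."""
    # The result of a callable replacement is inserted as-is; re.sub()
    # does not process backslash escapes in it.
    return re.sub(pattern, lambda m: replacement, text)


def double_backslashes(s):
    """Double every backslash so a template yields s unchanged."""
    # Backslash is the only character that introduces an escape in a
    # replacement template, so doubling it is enough.
    return s.replace("\\", "\\\\")


# On Python 3.6+ an unknown escape made of '\' and an ASCII letter is an error.
try:
    re.sub(r'\s+', r'\s+', 'foo bar')
except re.error as exc:
    print("rejected:", exc)

print(literal_sub(r'\s+', r'\s+', 'foo bar'))                  # foo\s+bar
print(re.sub(r'\s+', double_backslashes(r'\s+'), 'foo bar'))   # foo\s+bar
```

The callable version avoids touching the replacement at all; the doubling
version does its work once, up front, which may matter if the same
replacement is reused for many substitutions.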

>> though IMHO the traceback needs a cleanup.
> 
> And the documentation as well, to clarify the fact immediately, without
> assuming one will scroll down to the "changed in version" part (at least,
> that is what seems to be the rule in other parts of the manual).
> 
> Thank you,
> ciao, lele.